Monday, 22 June 2009

Tile-based memory layout

After a long time, here is another update. The next logical step in development is a tile-based memory layout, which allows large, unique, non-repeating landscapes. Here is a first screenshot showing the tiles.
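The post does not describe how the tiles are addressed. The sketch below shows one common way to do it, assuming the landscape is split into square tiles over x and z with full-height voxel columns. The tile size `kTileSize`, the names `toTileAddress` and `residentSlot`, and the toroidal pool of resident tiles are all hypothetical and are not taken from the engine.

```cpp
#include <cassert>

// Hypothetical tile edge length in voxels (x and z); the engine's real value is not stated.
constexpr int kTileSize = 64;

struct TileAddress {
    int tileX, tileZ;    // which tile of the landscape the voxel column belongs to
    int localX, localZ;  // voxel column position inside that tile
};

// Floor division; plain '/' truncates toward zero, which is wrong for negative coordinates.
constexpr int floorDiv(int a, int b) {
    return (a >= 0) ? a / b : -((-a + b - 1) / b);
}

// Split a world-space voxel column (x, z) into tile coordinates and a local offset.
TileAddress toTileAddress(int worldX, int worldZ) {
    TileAddress t{};
    t.tileX = floorDiv(worldX, kTileSize);
    t.tileZ = floorDiv(worldZ, kTileSize);
    t.localX = worldX - t.tileX * kTileSize;
    t.localZ = worldZ - t.tileZ * kTileSize;
    assert(t.localX >= 0 && t.localX < kTileSize);
    assert(t.localZ >= 0 && t.localZ < kTileSize);
    return t;
}

// Map a tile to a slot in a fixed pool of poolW x poolH resident tiles (toroidal wrap),
// so a moving camera only ever needs a bounded amount of tile memory.
int residentSlot(int tileX, int tileZ, int poolW, int poolH) {
    assert(poolW > 0 && poolH > 0);
    const int sx = ((tileX % poolW) + poolW) % poolW;
    const int sz = ((tileZ % poolH) + poolH) % poolH;
    return sz * poolW + sx;
}
```

With this layout every tile in the world can hold different data, which keeps the landscape non-repeating. Only the tiles near the camera need to occupy the pool, and a tile that scrolls out of view frees its slot for the tile that scrolls in on the opposite side.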

Tuesday, 31 March 2009

More Videos

Here are two videos showing the Happy Buddha scene (1024x2048x1024).
High-quality video: Buddha AVI [mirror]

The demo download updated today (first link in the list on the right side)
also includes the endless Buddha executable.



Monday, 30 March 2009

Video

For those of you who cannot run the demo for some reason, I have captured a short video of it. You can watch it in the embedded player below or download the larger, higher-quality version to see more detail.

Landscape AVI [mirror]

Saturday, 28 March 2009

CUDA optimizations II

Today I would like to share a few interesting references on optimizing CUDA code. There is a lot of overlap among these presentations, but reading through them still gives you new ideas about what is possible.

1.) Optimization Techniques for Large Data Structures on CUDA
2.) AstroGPU - CUDA Optimization Part I
3.) AstroGPU - CUDA Optimization Part II
4.) CUDA Programming Notes
5.) NVISION08: Advanced CUDA: Optimizing to Get 20x Performance
6.) Top 5 Optimization Strategies for CUDA
7.) CUDA at MIT - IAP2009

According to foil 3 of the first presentation, the GPU should give an average speedup of about 10x over the CPU if the algorithm can be fully SIMD-parallelized (GPU: GTX 280, 933 GFLOPS / 141.7 GB/s memory bandwidth; CPU: Intel Core 2 QX9650, 96 GFLOPS / 12.8 GB/s memory bandwidth).
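The factor of 10 follows directly from the quoted peak figures. A compute-bound kernel can at best gain the ratio of peak FLOP rates, and a bandwidth-bound kernel can at best gain the ratio of peak memory bandwidths. The helper names below are only for illustration.

```cpp
#include <cassert>

// Peak figures as quoted from foil 3 of the first presentation.
struct PeakSpec {
    double gflops;    // peak single-precision GFLOPS
    double gbPerSec;  // peak memory bandwidth in GB/s
};

constexpr PeakSpec kGTX280{933.0, 141.7};
constexpr PeakSpec kQX9650{96.0, 12.8};

// Best-case speedup if a kernel is limited by arithmetic throughput.
double computeBoundSpeedup(const PeakSpec& gpu, const PeakSpec& cpu) {
    assert(cpu.gflops > 0.0);
    return gpu.gflops / cpu.gflops;
}

// Best-case speedup if a kernel is limited by memory bandwidth.
double memoryBoundSpeedup(const PeakSpec& gpu, const PeakSpec& cpu) {
    assert(cpu.gbPerSec > 0.0);
    return gpu.gbPerSec / cpu.gbPerSec;
}
```

This gives about 9.7x for compute-bound code and about 11.1x for bandwidth-bound code. The 96 GFLOPS figure appears to assume all four cores at 3.0 GHz, each issuing 4-wide SSE additions and multiplications (4 x 3.0 x 8 = 96). A CPU baseline that is single-threaded or not vectorized would run far below that peak, and this is one possible reading of much larger reported speedups. The presentations themselves do not state this.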

Looking at NVIDIA's CUDA page, I am often surprised to see algorithms reportedly sped up by 100x or more compared to the CPU. Given the numbers above, that is rather hard to believe.

Monday, 23 March 2009

New Benchmark Version

Today I ported the CUDA version to the CPU (multicore). It is included in the updated demo:

[-Download-] (CUDA 2.1 required, driver version 181.20 or newer)

The first results (fps for Single / Repeated / Repeated 2xAA):

CPU (Pentium D, 3 GHz): 3 / 1.2 / 0.6
CPU (Intel Core 2 Quad Q6600, 4x 3 GHz): 15 / 8 / 5
GPU (GeForce 8800 GTS): 33 / 24 / 17
GPU (GeForce GTX 285): 44 / 34 / 36
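To compare the devices directly, the fps values can be turned into speedup ratios for each scene variant. This is a small sketch that uses only the numbers listed above. The struct and function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <string>

// Frame rates from the results above: {Single, Repeated, Repeated 2xAA}.
struct Result {
    std::string device;
    std::array<double, 3> fps;
};

const std::array<Result, 4> kResults = {{
    {"Pentium D, 3 GHz",  {3.0, 1.2, 0.6}},
    {"Core 2 Quad Q6600", {15.0, 8.0, 5.0}},
    {"GeForce 8800 GTS",  {33.0, 24.0, 17.0}},
    {"GeForce GTX 285",   {44.0, 34.0, 36.0}},
}};

// Speedup of device a over device b for one scene variant
// (0 = Single, 1 = Repeated, 2 = Repeated 2xAA).
double speedup(const Result& a, const Result& b, int variant) {
    assert(variant >= 0 && variant < 3);
    assert(b.fps[variant] > 0.0);
    return a.fps[variant] / b.fps[variant];
}
```

For the GTX 285 against the Q6600 this gives roughly 2.9x, 4.25x and 7.2x for the three variants, so the gap widens as the per-pixel workload grows.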

This time the scene is the complex version of the one shown in the pictures below
(spherescape_complex.rle4).

I guess the low CPU performance is mostly due to the large number of floating-point operations. Switching the calculations to integer arithmetic might improve the speed. Still, this is the fairest comparison possible, since the CPU and GPU execute the same C++ code.
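One common way to move such calculations to integers is fixed-point arithmetic. The sketch below uses a 16.16 format and an integer-only ray march over a simple 1D heightfield. The heightfield and all names (`Fixed`, `marchHeightfield`, and so on) are hypothetical simplifications and not the demo's actual data structures. Whether this would beat floating point on a Pentium D has not been measured.

```cpp
#include <cassert>
#include <cstdint>

// 16.16 fixed point: upper 16 bits hold the integer part, lower 16 bits the fraction.
using Fixed = std::int32_t;
constexpr int kFracBits = 16;
constexpr Fixed kOne = Fixed{1} << kFracBits;

// Conversion from double (truncates toward zero; fine for this illustration).
constexpr Fixed toFixed(double v) { return static_cast<Fixed>(v * kOne); }
// Integer part; relies on arithmetic right shift for negative values, as mainstream compilers do.
constexpr int toInt(Fixed f) { return f >> kFracBits; }

// Product of two fixed-point values, using a 64-bit intermediate to avoid overflow.
constexpr Fixed mul(Fixed a, Fixed b) {
    return static_cast<Fixed>((static_cast<std::int64_t>(a) * b) >> kFracBits);
}

// Hypothetical integer-only inner loop: march a ray across a 1D heightfield, one column
// per step, and return the first column where the ray is at or below the terrain (-1 if none).
int marchHeightfield(const int* heights, int columns, Fixed startHeight, Fixed slopePerColumn) {
    assert(columns >= 0 && (heights != nullptr || columns == 0));
    Fixed h = startHeight;
    for (int x = 0; x < columns; ++x) {
        if (toInt(h) <= heights[x]) return x;  // ray has hit the terrain
        h += slopePerColumn;                   // a single integer add per step
    }
    return -1;
}
```

The inner loop then contains only integer additions, comparisons and shifts. Multiplications such as perspective scaling would go through `mul` and its 64-bit intermediate.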

Thursday, 19 March 2009

CPU vs. GPU

Today I compared the CPU with the GPU to see whether it was really worth the effort to write everything in CUDA rather than for the CPU. [detailed pics] [-CPU-Demo-]

The opponents:
CPU: Pentium D, 3.0 GHz, 1 GB vs.
GPU: NVIDIA GTX 285, 1 GB

In the first round the CPU holds up reasonably well: the GPU is only 3x faster than the CPU.

In the second round, however, the GPU already wins by a factor of 7.3:1.

In the third round the CPU loses all ground, and the GPU wins by about 20:1 (47.5:2.4).

Finally, it would be interesting to know why the GPU does not scale linearly at all. I have no idea why the frame rate does not halve when the computations are doubled, or vice versa.
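One possible explanation, offered only as a hypothesis, is a work-independent cost per frame, such as kernel launch or displaying the result. If the frame time is a fixed part plus a part proportional to the work, doubling the work does not halve the frame rate. The model and its parameters below are illustrative assumptions and are not measured values.

```cpp
#include <cassert>

// Hypothetical frame-time model: fixed per-frame cost plus a cost proportional to the work.
struct FrameModel {
    double fixedMs;    // work-independent cost, e.g. launch and display overhead (assumed)
    double msPerUnit;  // cost of one unit of rendering work
};

double frameMs(const FrameModel& m, double work) { return m.fixedMs + m.msPerUnit * work; }
double fps(const FrameModel& m, double work) { return 1000.0 / frameMs(m, work); }

// Recover both parameters from two measurements (work1, fps1) and (work2, fps2).
FrameModel fitModel(double work1, double fps1, double work2, double fps2) {
    assert(work1 != work2 && fps1 > 0.0 && fps2 > 0.0);
    const double t1 = 1000.0 / fps1;
    const double t2 = 1000.0 / fps2;
    const double perUnit = (t2 - t1) / (work2 - work1);
    return {t1 - perUnit * work1, perUnit};
}
```

With `fixedMs = 10` and `msPerUnit = 10`, one unit of work runs at 50 fps, but two units run at about 33 fps instead of 25. Fitting two real measurements this way would show whether a fixed per-frame cost is large enough to explain the behaviour. If the model does not fit, other effects are more likely, such as the GPU hiding memory latency better when it has more threads to run.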

Wednesday, 18 March 2009

Demo with 2x AA

Small update: the demo linked below now also includes 2xAA (not 2x2!), which significantly reduces aliasing on distant pixels. On the 8800 GTS it is quite slow right now, but on the GTX 285 I found almost no difference from the normal version.
For the GTS, I may consider applying AA only to distant geometry to increase the speed.
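2xAA means two rays per pixel instead of the four of a 2x2 grid. The sketch below also applies the idea of anti-aliasing only distant geometry. The second ray is cast only when the first one hits something beyond a depth threshold. The `TraceFn` callback, the diagonal sub-pixel offsets and the threshold `aaMinDepth` are assumptions made for illustration, not the demo's actual code.

```cpp
#include <cassert>
#include <functional>

struct Sample {
    float r, g, b;  // colour
    float depth;    // distance to the hit point
};

// Hypothetical ray-caster callback: colour and hit distance for a sub-pixel position.
using TraceFn = std::function<Sample(float px, float py)>;

// 2xAA with two diagonal sub-pixel samples, applied only when the first sample
// hits geometry at or beyond aaMinDepth (near geometry keeps a single ray).
Sample shadePixel(const TraceFn& trace, int x, int y, float aaMinDepth) {
    assert(static_cast<bool>(trace));
    const Sample a = trace(x + 0.25f, y + 0.25f);
    if (a.depth < aaMinDepth) return a;      // near geometry: one ray is enough
    const Sample b = trace(x + 0.75f, y + 0.75f);  // distant geometry: second ray
    return {(a.r + b.r) * 0.5f, (a.g + b.g) * 0.5f, (a.b + b.b) * 0.5f,
            a.depth < b.depth ? a.depth : b.depth};
}
```

Because aliasing is worst where many voxels fall into one pixel, spending the second ray only on distant hits should keep most of the quality gain while saving rays on nearby geometry. How much time this saves on the 8800 GTS would depend on the scene.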