Here are two videos showing the Happy Buddha scene (1024x2048x1024).
High-quality video here: Buddha AVI [mirror]
The updated demo download from today (right side, first position in the links)
also includes the endless Buddha executable.
Tuesday, 31 March 2009
Monday, 30 March 2009
Video
For those of you who cannot run the demo for some reason, I just captured a short video of it. You can watch it in the window below or download the larger, better-quality version to see more detail.
Landscape AVI [mirror]
Saturday, 28 March 2009
CUDA optimizations II
Today I would like to share a couple of interesting references on optimizing CUDA code. There are many similarities among these presentations, but they are still worth reading, as going through them gives you new ideas about what is possible.
1.) Optimization Techniques for Large Data Structures on CUDA
2.) AstroGPU - CUDA Optimization Part I
3.) AstroGPU - CUDA Optimization Part II
4.) CUDA Programming Notes
5.) NVISION08: Advanced CUDA: Optimizing to Get 20x Performance
6.) Top 5 Optimization Strategies for CUDA
7.) CUDA at MIT - IAP2009
Looking at slide 3 of the first presentation, using the GPU should give an average speedup of about a factor of 10 over the CPU if the algorithm can be fully SIMD-parallelized (GPU: GTX280, 933 GFlops, 141.7 GB/s memory bandwidth; CPU: Intel Core 2 QX9650, 96 GFlops, 12.8 GB/s memory bandwidth).
Looking at NVIDIA's CUDA page, I am often surprised to see that some algorithms are reported to have been sped up by 100x or even more compared to the CPU. Given the numbers above, this seems rather hard to believe.
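The factor of 10 follows directly from the peak figures quoted above. Here is a small sketch of that arithmetic; the names are made up for illustration, and peak numbers are only an upper bound, not measured performance:

```python
# Peak figures as quoted in the post (slide 3 of the first presentation).
GPU_GFLOPS, GPU_BANDWIDTH_GBS = 933.0, 141.7   # GTX280
CPU_GFLOPS, CPU_BANDWIDTH_GBS = 96.0, 12.8     # Intel Core 2 QX9650


def peak_ratio(gpu: float, cpu: float) -> float:
    """Ratio of a GPU peak figure to the matching CPU peak figure."""
    return gpu / cpu


compute_ratio = peak_ratio(GPU_GFLOPS, CPU_GFLOPS)
bandwidth_ratio = peak_ratio(GPU_BANDWIDTH_GBS, CPU_BANDWIDTH_GBS)

print(f"compute:   {compute_ratio:.1f}x")    # compute:   9.7x
print(f"bandwidth: {bandwidth_ratio:.1f}x")  # bandwidth: 11.1x
```

Both ratios land close to 10x, which is why a reported 100x looks surprising when set against these hardware peaks alone.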
Monday, 23 March 2009
New Benchmark Version
Today I ported the CUDA version to the CPU (multicore); it is included in the updated demo.
[-Download-] (CUDA 2.1 Required - Driver version 181.20 or newer )
The first results so far are:
CPU (3 GHz Pentium D) - Single/Repeated/Repeated 2xAA: 3/1.2/0.6 fps
CPU (Intel Core2 Quad Q6600, 4x 3 GHz) - Single/Repeated/Repeated 2xAA: 15/8/5 fps
GPU (8800GTS) - Single/Repeated/Repeated 2xAA: 33/24/17 fps
GPU (285GTX) - Single/Repeated/Repeated 2xAA: 44/34/36 fps
This time the scene is the complex version of the one shown in the pictures below (spherescape_complex.rle4).
I guess the low CPU performance is mostly due to the many floating-point operations. Changing the calculations to integer arithmetic might improve the speed. For now, however, this is the fairest possible comparison, since the CPU and the GPU execute the same C++ code.
Thursday, 19 March 2009
CPU vs. GPU
Today I compared CPU and GPU performance to see whether it was really worth the effort to write everything in CUDA rather than for the CPU. [detailed pics] [-CPU-Demo-]
The opponents:
CPU: 3.0 GHz Pentium D, 1 GB vs.
GPU: NVIDIA GTX285, 1 GB
In the first round the CPU performs well compared to the GPU: the GPU is just 3x faster than the CPU.
In the second round, however, the GPU already beats the CPU by a factor of 7.3:1.
In the third round the CPU loses all ground and the GPU wins by about 20:1 (47.5:2.4).
Finally, it would be interesting to know why the GPU does not scale linearly at all. I have no idea why the frame rate is not halved when the computations are doubled, or vice versa.
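One candidate explanation, untested here and only an assumption, is a fixed per-frame cost (for example kernel launches, copies, or display) that does not grow with the ray-casting work. The sketch below uses a hypothetical `fps` helper and made-up timings purely to show what such a cost does to the scaling:

```python
def fps(work_units: float, ms_per_unit: float, fixed_ms: float) -> float:
    """Frames per second when a frame costs fixed_ms plus work-proportional time."""
    frame_ms = fixed_ms + work_units * ms_per_unit
    return 1000.0 / frame_ms


# Hypothetical numbers, chosen only for illustration (not measured in the demo).
MS_PER_UNIT = 10.0

for fixed_ms in (0.0, 15.0):
    base = fps(1.0, MS_PER_UNIT, fixed_ms)
    doubled = fps(2.0, MS_PER_UNIT, fixed_ms)
    print(f"fixed {fixed_ms:4.1f} ms: {base:5.1f} fps -> {doubled:5.1f} fps "
          f"(ratio {doubled / base:.2f})")
```

With no fixed cost, doubling the work halves the frame rate (ratio 0.50). With a 15 ms fixed cost, the same doubling only drops the rate to about 0.71 of the original. Whether this is what happens in the demo would have to be checked by profiling.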
Wednesday, 18 March 2009
Demo with 2x AA
Small update: the demo linked below now also includes 2xAA (not 2x2!), which significantly reduces the aliasing of distant pixels. On the 8800 GTS it is quite slow right now, but on the GTX285 I found almost no difference compared to the normal version.
For the GTS I may try applying AA only to distant geometry to increase the speed.
Tuesday, 17 March 2009
Now the algorithm works entirely on the GPU
Today I finished moving the ray generation to the GPU, saving another 1-4 ms as well as an unnecessary memcopy. Silhouette smoothing is also working well, together with basic anti-aliasing (so far only for GTX2xx cards).
As for the smoothing, I tried two variants (left) and found that the one in the middle looks best so far. The unsmoothed original (top) is too edgy, and the one on the bottom smooths too much for the tree scene, which makes nearby geometry look like a 2D impostor.
The updated demo is here: [-download-] (CUDA 2.1)
It now also includes softening for the Buddha and dragon scenes.
For the experienced among you, the shader folder contains the GLSL shader (soft.frag). You can experiment a bit by modifying the smoothing.
Sunday, 15 March 2009
Silhouette Smoothing
Saturday, 14 March 2009
Soft Voxels II
Today I improved the filtering a bit. The softening looks nicer than yesterday (it is also a little slower). [-dl-new shaders-]
I'm still not sure whether soft voxels look better than hard-edged voxels in general. They give the impression of missing detail and low resolution, both of which are unwanted.
Real filtering to approximate the surface would be better.
Friday, 13 March 2009
Soft Voxels
Thursday, 12 March 2009
New Release
Today it's time for a new release. Major mapping bugs are fixed and the colors look better now (I hope).
[-Demo Version v2-] (CUDA 2.1)
I also posted the demo as IOTD on GDev, as I think it is worth seeing.
[-link-]
Tuesday, 10 March 2009
Happy Buddha reloaded
Any limit?
View distance set to 4,000,000 and still interactive (18 fps). Having unique voxels everywhere is a problem in this case, however.
Here we can also see an advantage of the RLE structure: it is very easy to generate procedural mountains. It might be possible with octree raycasting too, but right now I have no idea how that could be done easily.
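To illustrate why run-length encoding suits procedural terrain, here is a minimal sketch. The post does not describe the actual layout of the demo's RLE data, so the `(start, length)` run layout and the helpers `terrain_height`, `column_runs`, and `is_solid` are all assumptions made for illustration:

```python
import math
from typing import List, Tuple

Run = Tuple[int, int]  # (start_height, length) of one solid span; assumed layout


def terrain_height(x: int, z: int, max_height: int = 64) -> int:
    """Hypothetical procedural height function built from two waves."""
    h = 0.5 + 0.25 * math.sin(x * 0.05) + 0.25 * math.cos(z * 0.07)
    return max(1, int(h * max_height))


def column_runs(x: int, z: int) -> List[Run]:
    """A mountain column is solid from the ground up, so one run describes it."""
    return [(0, terrain_height(x, z))]


def is_solid(runs: List[Run], y: int) -> bool:
    """Test a voxel height against the runs of its column."""
    return any(start <= y < start + length for start, length in runs)


# Columns are computed on demand, so a far-away position costs no extra storage.
runs = column_runs(1_000_000, 2_500_000)
print(runs, is_solid(runs, 0), is_solid(runs, 10**6))
```

Each terrain column collapses to a single run computed from its (x, z) position, which is what makes very large view distances cheap to fill in this sketch. It also shows the drawback mentioned above: every voxel comes from the same function, so nothing is unique unless extra data is stored.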
Monday, 9 March 2009
Anti-Aliasing
Friday, 6 March 2009
Maximal complexity?
Thursday, 5 March 2009
Better Performance