Dienstag, 21. Mai 2013

Parallel all permutations - non-recursive - GPU optimal

This post is a general one - not voxel related.

Some time ago I developed a a simple, fast and yet non-recursive method to directly compute the n'th permutation. Thats useful especially when working on the graphics card with CUDA for parallel evaluations. (it might have been published somewhere already - so if you know the reference, you can post it as a comment)

The algorithm is based on the following property. Lets say there are 6 characters in the string to permute. Then the result is 6! = 720 combinations. In the first column of the results, we get 720/6=120 lines of each character. This means if the index counts from 0 to 719, then the first character to be removed is computed as remove_index = index/120.

For the next column, 5 numbers are left. This means, that 120 = 5 * 24 lines of the same character - and so on. 

We the following list of removes:
column0: remove_index = (a/120)%5;
column1: remove_index = (a/24)%4;
column2: remove_index = (a/6)%3;
column3: remove_index = (a/2)%2;
column4: remove_index = (a/1)%1;

Code snipplet:

std::string default = "12345";
int perm=1, digits=default.size();
for (int i=1;i<=digits;perm*=i++);
for (int a=0;a
<perm;a++)
{
std::string avail=default;
for (int b=digits,div=perm;b>0; b--)
{
div/=b;
int index = (a/div)%b;
printf("%c", avail[index] );
//avail[index]=avail[b-1]; // non-lexigraphic but fast
avail.erase(index,1) ; // lexigraphically correct
}
printf("\n");
}
printf("permutations:%d\n",perm);

In the code, perm defines the Lehmer code (wiki)

Dienstag, 19. Februar 2013

Another slight speedup

Further optimization helped to speed up a bit more.

Samstag, 16. Februar 2013

Another speedup: now ~70fps@1920x1024

Seems working on the algorithm pays off. Todays version reaches an even higher performance.
Also better performance for the forest scene as well (960x512@~100fps)






Mittwoch, 13. Februar 2013

OpenCL Happy Buddha Raycasting 1920x1024@50fps

Here a first glimpse of  a new optimization. The good old Happy Buddha scene (1024^3) raycasted at 50 fps at almost HD resolution (1920x1024). Video below. For 960x512 it renders at 150 fps.





Donnerstag, 7. Februar 2013

Instancing: Combined AABB + Voxel raycasting

Here a first combined test by raycasting 10.000 to 100.000 models.
Due to the mixed kernel, performance is not as good as pure octree.







Samstag, 25. August 2012

Blending two Voxel Scenes

Here a experiment to apply blendshapes to a voxel scene. Its much slower, still runs at interactive framerates (20-30). The video lags as the framecapture@1024x768 took quite a lot of CPU time.


OpenCL Voxel Octree Raycasting

Here the same scene visualized using OpenCL octree raycasting. Eventhough the persistent threads method is not used, its around 20-100% faster than the previous splatting (depending on the scene). The code is ported from my previous CUDA raycaster. Hardware is again the GTX580M.

Each method has its advantages though. While octree raycasting works better for static objects, splatting is better for dynamic scenes.