Sunday, April 12, 2015
Work in Progress on Outstar (SSAO / Bugfixes)
After a year, I finally found time to continue development. Here is a shot of a castle I built in about one day, including the sculpting of some new blocks. Note that there are also rooms behind the windows where you can walk around inside. The new version also contains a lot of bugfixes and is getting closer to being ready for a demo release. SSAO (the lowermost image) is in progress, but not yet to my satisfaction. The complete castle's octree is stored in 2.1 MB on disk. The title of the game will be Outstar; you can soon check out http://www.outstar.net . (The 20 fps in the screenshots is because I captured them on my notebook rather than on my desktop PC.)
Thursday, May 22, 2014
Voxel Engine Update - 18 Level Octree & Copy Paste in Action
Copy and paste works well now. Areas of voxels can be grouped into new entities for placing them in the scene. The entire voxel data of the screenshot below takes just 1.3 MB on disk, which is quite OK. Thanks to my new graphics card, voxel raycasting and terrain rendering now also run smoothly at 60+ fps at full HD resolution. Also worth a note: the raycasting method has been updated to allow octrees with more than 16 levels - the scene below is built in an 18-level octree. Now precision problems begin to arise, which need to be solved.
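For context on the precision issue: with 18 levels the grid has 2^18 = 262,144 leaf cells per axis, so near the far side of the grid a 32-bit float only keeps about 6 mantissa bits below the cell size. One possible remedy, sketched below with assumed names and an assumed 18.13 bit split (this is not the engine's code), is to keep traversal coordinates in fixed point:

    // Minimal sketch (assumptions, not the engine's code): keeping traversal
    // coordinates in fixed point so deep octrees don't lose sub-cell precision.
    #include <cstdint>

    // 18 octree levels -> 2^18 = 262,144 leaf cells per axis.
    static const int LEVELS    = 18;
    static const int GRID_SIZE = 1 << LEVELS;

    // A 32-bit float has a 24-bit mantissa, so near the far side of the grid only
    // 24 - 18 = 6 bits remain below the leaf-cell size; ray steps finer than
    // 1/64 of a cell get rounded away. An 18.13 fixed-point value keeps a constant
    // resolution of 1/8192 of a cell everywhere.
    static const int FRAC_BITS = 13;               // 18 + 13 = 31 bits, fits int32
    typedef int32_t fixed_t;

    inline fixed_t toFixed(float worldCoord)       // world space assumed in [0, GRID_SIZE)
    {
        return (fixed_t)(worldCoord * (float)(1 << FRAC_BITS));
    }

    inline int leafCell(fixed_t p)                 // leaf cell index along one axis
    {
        return (int)(p >> FRAC_BITS);
    }

    inline int childBit(fixed_t p, int level)      // 0/1 child selector at a given octree level
    {
        return (leafCell(p) >> (LEVELS - 1 - level)) & 1;
    }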
Tuesday, May 13, 2014
Quadric Mesh Simplification with Source Code
In the past days I have written a quadric-based mesh simplification program. Searching the internet, I couldn't find any code that was free to use, not unnecessarily bloated, fast and memory efficient, even though the quadric-based method is almost 20 years old. I therefore decided to write one myself.
The result is short and easy to adapt to your project (about 300 lines for the main part, contained in Simplify.h). You can fetch the C++ project with source via the download link below.
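The core of the approach is the classic quadric error metric. Below is a minimal sketch of the error quadric, with my own naming and not a copy of the code in Simplify.h: each plane contributes a 4x4 quadric, quadrics of adjacent faces are summed per vertex, and the error of a candidate position is evaluated against that sum.

    // Minimal sketch of an error quadric (my own naming, not the code in Simplify.h).
    // A plane a*x + b*y + c*z + d = 0 yields the 4x4 quadric Q = p * p^T with
    // p = (a, b, c, d). Q is symmetric, so 10 values suffice instead of 16.
    struct SymmetricMatrix
    {
        double m[10]; // m[0..9] = xx, xy, xz, xd, yy, yz, yd, zz, zd, dd

        SymmetricMatrix() { for (int i = 0; i < 10; i++) m[i] = 0; }

        // Quadric of a single plane (a,b,c,d), with a*a + b*b + c*c = 1.
        static SymmetricMatrix fromPlane(double a, double b, double c, double d)
        {
            SymmetricMatrix q;
            q.m[0]=a*a; q.m[1]=a*b; q.m[2]=a*c; q.m[3]=a*d;
                        q.m[4]=b*b; q.m[5]=b*c; q.m[6]=b*d;
                                    q.m[7]=c*c; q.m[8]=c*d;
                                                q.m[9]=d*d;
            return q;
        }

        // Quadrics of the planes around a vertex simply add up.
        SymmetricMatrix operator+(const SymmetricMatrix& o) const
        {
            SymmetricMatrix r;
            for (int i = 0; i < 10; i++) r.m[i] = m[i] + o.m[i];
            return r;
        }

        // Squared-distance error of point (x,y,z): v^T * Q * v with v = (x,y,z,1).
        double error(double x, double y, double z) const
        {
            return   m[0]*x*x + 2*m[1]*x*y + 2*m[2]*x*z + 2*m[3]*x
                   + m[4]*y*y + 2*m[5]*y*z + 2*m[6]*y
                   + m[7]*z*z + 2*m[8]*z
                   + m[9];
        }
    };

An edge collapse is then scored with the sum of the two vertices' quadrics, and collapses whose error lies below a growing threshold are performed in each pass, which is what makes the approach faster than maintaining a globally sorted queue.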
Features / Summary:
- Threshold based, and therefore faster than sorting-based methods
- Since the quadric matrices are symmetric, only 10 elements are stored and computed per matrix instead of 16
- Non-closed meshes are supported by giving mesh borders extra treatment
- Simplifies 2,000,000 triangles to 20,000 triangles in 3 seconds on a Core i7
- MIT License
- MS Visual Studio 2012, C++
Update, Sept. 20th, 2014: improved quality of reduced borders
Download Source and Data
The code is about 4x-7x faster than Meshlab, which is already fast. Using multi-core programming, it could be made even faster.
Here is a comparison between Meshlab, QSlim and this method:
Program output (left) and Meshlab (right). Note that Meshlab produces floating teeth and loses detail around the eyes, nose and mouth regions. The reduction was 85k -> 3k triangles.
Original (left), this code (middle) and Meshlab (right)
Here is another comparison: program output (left) and Meshlab (right).
Saturday, May 3, 2014
Raycaster Speed-Up of up to 400% by Image Warping (Re-Projection)
Introduction: Since real-time raytracing is getting faster, as with the Brigade raytracer for example, I believe this technology can be an important contribution to the area, as it might bring raytracing one step closer to being usable in video games.
Algorithm: A technique I have been working on for a while now exploits the temporal coherence between two consecutive rendered images to speed up ray-casting. The idea is to store the x-, y- and z-coordinates of each pixel in the scene in a coordinate buffer and re-project them into the following frame using the differential view matrix. The resulting image looks like Fig.1.
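For illustration, here is a minimal C++ sketch of that re-projection step, assuming world-space positions in the coordinate buffer and a row-major view-projection matrix for the new frame; the names are mine, not the engine's:

    // Minimal sketch (assumed names/layout, not the engine's code): re-projecting a
    // cached world-space position into the next frame's screen coordinates.
    struct Vec4 { float x, y, z, w; };

    // Row-major 4x4 matrix * vector.
    Vec4 transform(const float m[16], const Vec4& v)
    {
        Vec4 r;
        r.x = m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w;
        r.y = m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w;
        r.z = m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w;
        r.w = m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w;
        return r;
    }

    // Returns false if the cached pixel lands outside the new screen.
    bool reproject(const float viewProjNew[16],   // view-projection of the new frame
                   float wx, float wy, float wz,  // world position stored for the old pixel
                   int width, int height,
                   int& outX, int& outY)
    {
        Vec4 p; p.x = wx; p.y = wy; p.z = wz; p.w = 1.0f;
        Vec4 clip = transform(viewProjNew, p);
        if (clip.w <= 0.0f) return false;             // behind the camera
        float ndcX = clip.x / clip.w;                 // perspective divide
        float ndcY = clip.y / clip.w;
        if (ndcX < -1 || ndcX > 1 || ndcY < -1 || ndcY > 1) return false;
        outX = (int)((ndcX * 0.5f + 0.5f) * width);   // viewport mapping
        outY = (int)((1.0f - (ndcY * 0.5f + 0.5f)) * height);
        return true;
    }

When two cached pixels land on the same target pixel, the nearer one would have to win, so a depth comparison per write is needed on top of this.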
The method then gathers empty 2x2 pixel blocks on the screen and stores them in an index buffer for raycasting the holes; raycasting single pixels would be too inefficient. Small holes remaining after the hole-filling pass are closed by a simple image filter. To improve the overall quality, the method updates the screen in tiles (8x4) by raycasting an entire tile and overwriting the cache; this way, the entire cache is refreshed after 32 frames. Further, a triple-buffer system is used: two image caches which are copied to alternately, and one buffer that is written to. This is done because it often happens that a pixel is overwritten in one frame but becomes visible again already in the next frame. Therefore, before the hole filling starts, the two cache buffers are projected into the main image buffer.
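A minimal sketch of the hole-gathering pass, assuming a per-pixel coverage mask and reading "empty 2x2 blocks" as blocks whose four pixels all missed during re-projection; the names are placeholders, not the actual implementation:

    // Minimal sketch (assumed names): collect empty 2x2 pixel blocks into an index
    // buffer so only those blocks are raycast in the hole-filling pass.
    #include <vector>

    struct Block { int x, y; };                        // top-left corner of a 2x2 block

    // coverage[y * width + x] != 0 means the pixel was filled by re-projection.
    std::vector<Block> gatherEmptyBlocks(const unsigned char* coverage,
                                         int width, int height)
    {
        std::vector<Block> holes;
        for (int y = 0; y + 1 < height; y += 2)
        {
            for (int x = 0; x + 1 < width; x += 2)
            {
                bool anyFilled = coverage[ y      * width + x    ] ||
                                 coverage[ y      * width + x + 1] ||
                                 coverage[(y + 1) * width + x    ] ||
                                 coverage[(y + 1) * width + x + 1];
                if (!anyFilled)
                {
                    Block b; b.x = x; b.y = y;
                    holes.push_back(b);                // one raycast job per empty block
                }
            }
        }
        return holes;
    }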
Limitations: The method of course also comes with limitations. The speed-up obviously depends on the motion in the scene, and the method is only suitable for primary rays and for pixel properties that remain constant over multiple frames, such as static ambient lighting. Further, during fast motion, the silhouettes of geometry close to the camera tend to lose precision, and geometry in the background does not move as smoothly as when the scene is fully raytraced every frame. Future work might include suitable image filters to reduce these effects.
Results: Most of the pixels can be re-used with this technique. As only a fraction of the original rays need to be cast, the speed-up is significant, up to 5x the original speed, depending on the scene (see Fig.2 - Fig.4). The resolution for this test was 1024x768, and the GPU was an NVIDIA GeForce GTX 765M.
Here are also two videos showing this technique in action: Video1 Video2
(I uploaded them a while ago)
Finally, a few papers for further reading:
Exploiting Temporal Coherence in Ray Casted Walkthroughs
Iterative Image Warping
A Shared-Scene-Graph Image-Warping Architecture for VR: Low Latency versus Image Quality
Three-Dimensional Image Warping on Programmable Graphics Hardware
Accelerating Real-Time Shading with Reverse Reprojection Caching
Fig.1: Result after basic re-projection
Fig.2: Original version
Fig.3: With re-projection enabled, in motion
Fig.4: With re-projection enabled, standing still
Wednesday, April 30, 2014
Polygon Rendering vs. Voxel Raycasting
Since I started the project, I have always wondered whether it wouldn't be better to use polygons/triangles rather than voxels. To verify this decision, I created a reasonably complex test scene from multiple instances of the same block and rendered it once with triangles using display lists and LOD, and once using voxel raycasting (which has LOD inherently). The triangle block was created by exporting the voxels as a polygon mesh and then using Meshlab to reduce the triangles and create multiple LOD levels.
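As an illustration of the triangle path, here is a minimal sketch of per-instance LOD selection with display lists in legacy OpenGL; the distance thresholds and names are assumptions, not the benchmark's actual values:

    // Minimal sketch (assumed thresholds/names): picking a display list per block
    // instance by camera distance, legacy OpenGL.
    #ifdef _WIN32
    #include <windows.h>                     // required before gl.h on Windows
    #endif
    #include <GL/gl.h>
    #include <cmath>

    // lodLists[0] = full-resolution mesh, higher indices = reduced Meshlab LODs.
    void drawBlockInstance(const GLuint lodLists[3],
                           float bx, float by, float bz,      // block position
                           float cx, float cy, float cz)      // camera position
    {
        float dx = bx - cx, dy = by - cy, dz = bz - cz;
        float dist = std::sqrt(dx * dx + dy * dy + dz * dz);

        int lod = 0;
        if (dist > 200.0f)      lod = 2;     // far: coarsest mesh
        else if (dist > 50.0f)  lod = 1;     // medium distance

        glPushMatrix();
        glTranslatef(bx, by, bz);
        glCallList(lodLists[lod]);           // pre-built display list for this LOD
        glPopMatrix();
    }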
It turns out that triangles are much faster for a small to medium number of instances (it is about equal at 10x10x10 = 1000 instances), but with a large number of 8000 instances (40x5x40 blocks, 32 million triangles), voxel raycasting is significantly faster (40 vs. 13 fps).
For the triangles, I haven't used any sophisticated occlusion culling. It would therefore be interesting to see how much more performance could be achieved using the voxel-based Umbra or the software-based rasterization presented by Intel.
For those of you who would like to experiment with occlusion culling, the source & data of the polygon benchmark can be downloaded [here].
Raycasting, outside
Raycasting, inside
Triangles, outside
Triangles, inside
Saturday, April 26, 2014
NVIDIA GTX580M vs ATI Radeon R9 270x @1920x1024
Today I tried to run the demo on a friend's newer ATI R9 270x card. The result was better than expected, as the ATI card achieved more than 100 fps most of the time. On the GTX 580M, the framerate was around 40 outside and ~30 indoors.
New features in this version: FXAA & loading / saving of level data.
ATI Radeon R9 270x
NVIDIA GTX 580M