Friday, January 28, 2011

Update 5 Cornell Box Pong: looking for help

For the past couple of days, I've been scratching my head over how to modify the tokaspt source code (gl_scene.cc, to be more specific) to cram in some 2D gameplay physics (circle/circle collision detection) and direct user interaction with the paddles, unfortunately without much success. I've come to the disheartening conclusion that I know too little about C++ to write useful working code for the game (I know how to program very basic routine-like stuff with loops, and I hoped that would suffice, but alas). If anyone is interested in helping out with this project, I would be eternally grateful! You can contact me at the address on this page: http://i55.tinypic.com/2ppfqma.jpg

UPDATE: for the game physics of Cornell Box Pong, I've decided to use Box2D, an excellent, easy-to-learn, open source 2D physics engine used by many iPhone games. It has an advanced continuous collision detection model and lets you specify parameters such as gravity, friction, the simulation rate and the accuracy of the physics.
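
Just to give an idea of how little code the Box2D side of things should take, here is a minimal sketch written against the Box2D 2.x API. The radii, positions, velocities and iteration counts are placeholder values of my own, not the final game settings, and the last comment only hints at how the result would be pushed into the renderer:

```cpp
// Minimal sketch of the planned Box2D setup for Cornell Box Pong
// (all numeric values are placeholders, not final game settings).
#include <Box2D/Box2D.h>

int main()
{
    b2World world(b2Vec2(0.0f, 0.0f));          // no gravity: classic Pong physics

    // The Pong ball: a dynamic circle with perfectly elastic bounces.
    b2BodyDef ballDef;
    ballDef.type = b2_dynamicBody;
    ballDef.position.Set(0.0f, 5.0f);
    b2Body* ball = world.CreateBody(&ballDef);

    b2CircleShape ballShape;
    ballShape.m_radius = 1.0f;

    b2FixtureDef ballFixture;
    ballFixture.shape = &ballShape;
    ballFixture.density = 1.0f;
    ballFixture.restitution = 1.0f;             // bounce without losing energy
    ball->CreateFixture(&ballFixture);
    ball->SetLinearVelocity(b2Vec2(6.0f, 2.0f));

    // A paddle: a kinematic circle, moved directly by player input and
    // unaffected by collisions (it shouldn't get pushed around by the ball).
    b2BodyDef paddleDef;
    paddleDef.type = b2_kinematicBody;
    paddleDef.position.Set(-9.0f, 5.0f);
    b2Body* paddle = world.CreateBody(&paddleDef);

    b2CircleShape paddleShape;
    paddleShape.m_radius = 2.0f;
    paddle->CreateFixture(&paddleShape, 0.0f);

    // Each frame: step the physics, then copy the body positions into the
    // corresponding tokaspt sphere centers before rendering the next frame.
    for (int frame = 0; frame < 60; ++frame)
    {
        world.Step(1.0f / 60.0f, 8, 3);         // timestep, velocity iters, position iters
        b2Vec2 p = ball->GetPosition();
        // sphere[BALL].center = { p.x, p.y, BALL_DEPTH };  // hand-off to the renderer (pseudo-step)
    }
    return 0;
}
```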

Once this project is done, I plan to integrate Bullet, the open source 3D physics engine, into tokaspt, so that real-time path traced, physics-driven animations like the one in this awesome video become possible.

New videos of the Brigade real-time path tracer!

Development on the Brigade path tracer by Jacco Bikker and Dietger van Antwerpen is going strong and has made great progress, as shown by these videos running on a system with two hexacore Intel CPUs (12 cores total) and 2 GTX 470 GPUs. All of the scenes are path traced in real time with multiple bounces at 10-12 fps.

Minecraft level with HDR skydome lighting (this time with global illumination) (24 spp):
http://www.youtube.com/watch?v=wAGEZnFVTos

simple scene with HDR skydome lighting and refraction (24 spp):
http://www.youtube.com/watch?v=A4bkT-QTQXY

Buddha and Dragon scene, multi-bounce lighting (8spp):
http://www.youtube.com/watch?v=3qvdO4TxU-8

Escher scene, lots of areas with indirect lighting (8 spp):
http://www.youtube.com/watch?v=tLREB7dp1F4

Same Escher scene but with significantly reduced noise (8 spp):
http://www.youtube.com/watch?v=lGnvuL8kq_Q

The last video in particular shows quite convincingly that real-time path tracing for games (at moderate framerates and resolutions) is almost ready for prime time!

Tuesday, January 25, 2011

Update 4 on Cornell Box Pong

Here is the latest animation. I've made the reflecting sphere in the background bigger, so you can clearly see the Pong ball bouncing its way through the scene. This is still a simulation; the animation consists of 28 frames (3 seconds of render time per frame on a GeForce 8600M GT).



And 2 youtube videos:

http://www.youtube.com/watch?v=Ze1U8X4Awuc
http://www.youtube.com/watch?v=rCqXd6bAw0k

Download scene (needs tokaspt to run)

The benefits of real-time path tracing are obvious in this scene and would be impossible or very difficult to achieve with rasterization-based techniques:

- refraction in the glass sphere in front
- reflection on curved surfaces
- diffuse interreflection, showing color bleeding on the background spheres and on the Pong ball
- soft shadows behind the background spheres and under the Pong ball when touching the ground
- true ambient occlusion when the balls approach the ceiling or floor (not faked as with SSAO, or the much better quality AOV)
- indirect lighting (ceiling, parts of the back wall in shadow)
- anti-aliasing (multiple stochastic samples per pixel)

Now I want to focus on getting the game code ready.

Update: I've made the room higher, and the walls, ceiling and floor are now all convex, which should simplify the collision detection:

Sunday, January 23, 2011

Update 3 on Cornell Box Pong

I uploaded a smoother animation (twice the framerate, just click on the image to see the whole picture). This is just a simulation of what the final game should look like. Just to whet game developers' appetite for the coming age of real-time path traced graphics ;-)



I also added a flash effect when the ball hits the paddles (the paddle simply emits white light for a moment). Lots of different and fancy effects could be added to make it more arcade- and pinball-machine-like: changing the color of the main light source, making the balls in the background bounce up and down while changing their color and emitter properties on the fly, changing the color of the side walls dynamically to indicate the difficulty level, enlarging or narrowing the room by moving the side walls, adding more than one ball to the game, ...
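
For what it's worth, the flash effect itself only needs a couple of lines of game logic. Here's a rough sketch of the idea; the Sphere struct and field names are placeholders of my own, not the actual tokaspt types:

```cpp
// Sketch of the paddle flash effect: when the ball hits a paddle, temporarily turn
// the paddle sphere into a white light emitter, then switch it off a few frames later.
struct Vec    { float x, y, z; Vec(float a = 0, float b = 0, float c = 0) : x(a), y(b), z(c) {} };
struct Sphere { Vec center; float radius; Vec color; Vec emission; };

void on_paddle_hit(Sphere& paddle, int& flash_frames_left)
{
    paddle.emission   = Vec(8.0f, 8.0f, 8.0f);   // bright white flash
    flash_frames_left = 10;                      // keep it lit for ~10 frames
}

void update_flash(Sphere& paddle, int& flash_frames_left)
{
    if (flash_frames_left > 0 && --flash_frames_left == 0)
        paddle.emission = Vec(0.0f, 0.0f, 0.0f); // back to a plain diffuse paddle
}
```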




And another one with bouncing spheres in the background. Notice the dynamically changing soft shadows and ambient occlusion in the corners behind the spheres.



I also posted a video on YouTube showing navigation in the Pong scene on my GeForce 8600M GT. I set the quality at 32 spppp (samples per pixel per pass): http://www.youtube.com/watch?v=OOpLyv7gvBE. Even on this very low-end GPU, you can still achieve pretty high-quality interactive scene navigation at 0.86 fps.

Update 2 on real-time path traced Cornell Box Pong

I have made a simulation of the gameplay of Cornell Box Pong, running at a (simulated) 9 fps (click here or on the animation to see the whole thing instead of only the left half):



This is the quality that could be expected in real time (10 fps) when running on a high-end Fermi card (GTX 480, 570, 580). Notice the color bleeding from the green and red walls onto the white spheres, and how the yellow ball is reflected in the paddles and refracted in the glass sphere in front. I could also animate the spheres in the background to show off the color bleeding even better; the side or back walls could change color during gameplay as a nice visual effect (e.g. when the Pong ball hits one of the side walls); the paddles could emit light when bouncing the ball back; etc. The goal is to make a very simple game, with simple geometry but with real-time, fully dynamic, photorealistic lighting, demonstrating ultra-high-quality dynamic GI effects that are only possible with real-time path tracing.

These are the frames making up the animation in full "simulated real-time" quality (each frame rendered for about 3 seconds on my laptop's GeForce 8600M GT; they should render in less than 100 milliseconds on a GTX 580):
















The new scene can be downloaded from http://www.2shared.com/file/xiJ1VeSl/pongscene2.html (it needs tokaspt, the extremely fast CUDA path tracer). Still thinking about how to program the gameplay. Stay tuned!

Friday, January 21, 2011

Update 1 on the real-time path traced Cornell Box Pong

First update on my real-time path traced game project!!! :D
Here's a screenshot of the scene:



Everything you see in the picture is either a sphere or a part of a (very large) sphere. Ceiling, floor, side walls and back wall are in fact huge intersecting spheres, giving the impression of planes. The circular light source in the ceiling is also a part of a light emitting sphere protruding through the ceiling. The Pong bats are also parts of spheres protruding through the side walls. I included some diffuse spheres to show off the color bleeding and the obligatory reflective sphere as well.
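
For readers who haven't looked at smallpt or tokaspt: describing such a scene really is just a short list of spheres. Below is an illustrative sketch in the spirit of smallpt's Cornell box; the radii, positions and colors are representative values, not the exact ones used in my Pong scene:

```cpp
// Sketch of a "Cornell box" built entirely from spheres, in the spirit of smallpt/tokaspt.
// A sphere with a huge radius looks locally flat, so it can stand in for a wall.
// All values below are illustrative, not the exact ones used in the Pong scene.
struct Vec    { double x, y, z; Vec(double a = 0, double b = 0, double c = 0) : x(a), y(b), z(c) {} };
struct Sphere { double radius; Vec position, emission, color; };

Sphere scene[] = {
    { 1e5, Vec( 1e5 + 1,  40.8, 81.6), Vec(), Vec(0.75, 0.25, 0.25) },  // left wall  (red)
    { 1e5, Vec(-1e5 + 99, 40.8, 81.6), Vec(), Vec(0.25, 0.75, 0.25) },  // right wall (green)
    { 1e5, Vec(50, 40.8,  1e5),        Vec(), Vec(0.75, 0.75, 0.75) },  // back wall
    { 1e5, Vec(50,  1e5,  81.6),       Vec(), Vec(0.75, 0.75, 0.75) },  // floor
    { 1e5, Vec(50, -1e5 + 81.6, 81.6), Vec(), Vec(0.75, 0.75, 0.75) },  // ceiling
    { 600, Vec(50, 681.3, 81.6),       Vec(12, 12, 12), Vec() },        // light: big emitting sphere
                                                                        // protruding through the ceiling
    {  12, Vec(50, 20, 50),            Vec(), Vec(0.99, 0.99, 0.2) },   // the yellow Pong ball
};
```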

I ran into trouble making the Pong bats as described in my previous post, so I decided to make the bats by using just one sphere per bat instead of two. The sketch below shows how it’s done:



In order to make the path traced image converge as fast as possible (in under 100 milliseconds, to allow for some playability), I made the light source in the ceiling bigger. I think you should be able to get 30 fps in this scene on a GTX 580 with 64 samples per pixel per frame at the default resolution. (If you have one of these cards or another Fermi-based card, please leave a comment with your performance statistics; press "p" in tokaspt to see the fps counter and to change the spppp.)

The above Pong scene can be downloaded here: http://www.2shared.com/file/PqfY-sBV/pongscene.html Place the file in the tokaspt folder and open it from within tokaspt by pressing F9. You also need tokaspt and a CUDA enabled GPU.

On to the gameplay, which still needs to be implemented. The gameplay mechanics are extremely simple: all the movement of the spheres happens in 2D, just like in the original Pong game: the ball moves in a vertical plane between the Pong bats, and the bats can only move up or down. Only 3 points need to be changed per frame: the centers of the 2 spheres making up the bats (only up and down) and the center of the blue ball. The blue ball bounces off the ceiling and the floor in the direction of the side walls. If the player or the computer fails to bounce the ball back with the bats and the ball hits the red sphere (red wall) or the green sphere (green wall), the game is lost for that player and another game begins. Since everything happens in 2D, this is just a matter of simple collision detection between two circles. There are plenty of 2D Pong games on the net with open source code (single player and multiplayer), so I only have to copy one of those and change the tokaspt source. Should be a piece of cake, except that I haven't done anything like this before :)
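
The collision step I have in mind boils down to a handful of lines per frame. Here's a sketch of the circle/circle test and bounce (my own illustrative code, not taken from tokaspt or from any existing Pong source):

```cpp
// Sketch of the per-frame 2D collision step between the Pong ball and one bat,
// both treated as circles in the vertical playing plane.
#include <cmath>

struct Circle { float x, y, r; };      // 2D center + radius
struct Vel    { float x, y; };

// Returns true and reflects the ball's velocity if the two circles overlap.
bool bounce_off(Circle& ball, Vel& v, const Circle& bat)
{
    float dx = ball.x - bat.x;
    float dy = ball.y - bat.y;
    float dist2 = dx * dx + dy * dy;
    float rsum  = ball.r + bat.r;
    if (dist2 >= rsum * rsum)                      // no overlap: nothing to do
        return false;

    float dist = std::sqrt(dist2);
    float nx = dx / dist, ny = dy / dist;          // collision normal (bat -> ball)

    // Reflect the velocity about the normal: v' = v - 2 (v.n) n
    float vdotn = v.x * nx + v.y * ny;
    v.x -= 2.0f * vdotn * nx;
    v.y -= 2.0f * vdotn * ny;

    // Push the ball out of the bat so it doesn't get stuck inside it next frame.
    ball.x = bat.x + nx * rsum;
    ball.y = bat.y + ny * rsum;
    return true;
}
```

The resulting 2D center then simply gets copied into the corresponding sphere's 3D position before the next frame is path traced (the depth coordinate stays fixed).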

Monday, January 17, 2011

Real-time path traced Cornell Box Pong



I've started working on a real-time path traced version of Pong, one of the very first video games. I'm planning to use a modified version of the Cornell box, built on the framework of tokaspt, the excellent real-time CUDA path tracer by Thierry Berger-Perrin, which is based on smallpt by Kevin Beason. The tokaspt path tracer is extremely fast (much, much faster than iray, Octane or Brigade) and converges to a noise-free image in a matter of milliseconds, because a) scenes are very simple and only use spheres as primitives (there are no planes; even the walls of the Cornell box are just parts of huge spheres with very large radii) and b) occupancy (execution unit usage) is maximized by minimizing register and shared memory usage and going for one main pass (hence minimizing memory traffic). There are no acceleration structures, so geometry should stay simple (a very limited number of spheres) to keep everything real-time. On the other hand, this allows for completely dynamic scenes, since having no acceleration structure also means there is nothing to rebuild or update when objects move.
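
To illustrate why the lack of an acceleration structure isn't a problem for this kind of scene: intersecting a ray with the whole scene is just a short loop over all spheres, each tested analytically. A sketch in the style of smallpt, not the actual tokaspt kernel:

```cpp
// Sketch of the "no acceleration structure" approach: every ray is tested against
// every sphere analytically. With only a few dozen spheres this loop is so cheap
// that the scene can be fully dynamic.
#include <cmath>

struct Vec    { double x, y, z; Vec(double a = 0, double b = 0, double c = 0) : x(a), y(b), z(c) {} };
struct Ray    { Vec o, d; };                 // origin, normalized direction
struct Sphere { double rad; Vec pos; };

// Distance along the ray to the sphere, or 0 on a miss (the quadratic is solved directly).
double intersect(const Sphere& s, const Ray& r)
{
    Vec op(s.pos.x - r.o.x, s.pos.y - r.o.y, s.pos.z - r.o.z);
    double b   = op.x * r.d.x + op.y * r.d.y + op.z * r.d.z;
    double det = b * b - (op.x * op.x + op.y * op.y + op.z * op.z) + s.rad * s.rad;
    if (det < 0) return 0;
    det = std::sqrt(det);
    double eps = 1e-4, t;
    return (t = b - det) > eps ? t : ((t = b + det) > eps ? t : 0);
}

// Brute-force scene intersection: no BVH, no kd-tree, just a linear scan.
int intersect_scene(const Sphere* scene, int n, const Ray& r, double& t_hit)
{
    int id = -1;
    t_hit = 1e20;
    for (int i = 0; i < n; ++i) {
        double t = intersect(scene[i], r);
        if (t > 0 && t < t_hit) { t_hit = t; id = i; }
    }
    return id;                               // index of the closest sphere, or -1 for a miss
}
```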

The plan is as follows: use the Cornell box as the "playing area". The bouncing ball will be a diffuse, specular or light emitting sphere. The rectangular boxes cannot be made out of triangles (tokaspt only supports spheres as primitives) and should instead be made from the intersection of two overlapping spheres, creating a lens-shaped object as in the picture below (grey part):


The "boxes" will thus have curved surfaces on which the ball can bounce off:



Potential problems:

- I have very little programming experience
- my development hardware is ancient (a GeForce 8600M GT), but even with this old card I can get very fast convergence in the scenes included in tokaspt
- making the gameplay code work and, above all, fun: 2D physics with collision detection between the ball and the lens-shaped boxes (the ping pong bats), ceiling and floor, and the ball bouncing back and forth between the boxes at progressively higher speeds to steadily increase the difficulty level (I found some open source code for a basic Pong game here, so this part shouldn't be too difficult)
- all the code should be executed on the GPU

Let's see how far I can get this. Hopefully some screenshots will follow soon.

Saturday, January 15, 2011

Nvidia's Project Denver to appear first in Maxwell GPU in 2013

Just found this piece in an article from The Register:

Nvidia is not providing much in the way of detail about Project Denver, but Andy Keane, general manager of Tesla supercomputing at Nvidia, told El Reg that Nvidia was slated to deliver its Denver cores concurrent with the Maxwell series of GPUs, which are due in 2013. As we previously reported, Nvidia's Kepler family of GPUs, implemented in 28 nanometer processes, are due this year, delivering somewhere between three and four times the gigaflops per watt of the current "Fermi" generation of GPUs. The Maxwell GPUs are expected to offer about 16 times the gigaflops per watt of the Fermi. (The Register is wrong here, the chart actually showed 16x Gigaflops per Watt over Tesla or GT200 cards)(Nvidia has not said what wafer baking process will be used for the Maxwells, but everyone is guessing either 22 or 20 nanometers).

While Keane would not say how many ARM cores would be bundled on the Maxwell GPUs, he did confirm that Nvidia would be putting a multicore chip on the GPUs and hinted that it would be considerably more than the two cores used on the Tegra 2 SoCs. "We are going to choose the number of cores that are right for the application," says Keane.


A multicore ARM CPU integrated into the GPU, nice!

Which algorithm is the best choice for real-time path tracing?

I did some very basic research into which rendering method could be the most practical and fastest GI method AND deliver the highest quality for things like interactive photorealistic walkthroughs and games. These are the candidates with their strengths and weaknesses:

instant radiosity

- fast
- only useful for diffuse and semi-glossy scenes
- performance deteriorates quickly in glossy scenes
- many artefacts due to light bleeding through, singularity effects, clamping, ...

unidirectional path tracing (PT)

- best for exteriors (mostly direct lighting)
- not so good for interiors with much indirect lighting and small light sources
- very slow for caustics

bidirectional path tracing (BDPT)

- best for interiors (indirect lighting, small light sources)
- fast caustics
- very slow for reflected caustics

Metropolis light transport (MLT) + BDPT

- best for interiors (indirect lighting, small light sources)
- especially useful for scenes with very difficult lighting (e.g. through a keyhole, light splitting through prism)
- faster for reflected caustics

energy redistribution path tracing

- mix of Monte Carlo PT and MLT
- best for interiors (indirect lighting, small light sources)
- much faster than PT for scenes with very difficult lighting (e.g. light coming through a small opening, lighting the scene indirectly)
- fast caustics
- not so fast for glossy materials
- problems with detailed geometry

photon mapping

- best for indoor scenes
- biased, artefacts, splotchy, low frequency noise
- fast, but not progressive
- large memory footprint
- very useful for caustics + reflected caustics

stochastic progressive photon mapping

- best for indoor scenes
- fast and progressive
- very small memory footprint
- handles all kinds of caustics robustly


I also found this comment from vlado (V-Ray developer) on the V-Ray forums regarding Metropolis light transport:
"I came to the conclusion that MLT is way overrated. It can be very useful in some special situations, but for most everyday scenarios, it performs (much) worse than a well-implemented path tracer. This is because MLT cannot take advantage of any sort of sample ordering (e.g. quasi-Monte Carlo sampling, or the Schlick sequence that we use, or N-rooks sampling etc). A MLT renderer must fall back to pure random numbers which greatly increases the noise for many simple scenes (like an open skylight scene)."

BDPT with quasi-Monte Carlo (QMC) sampling for indoor scenes and PT with QMC for outdoor scenes seem to be the best candidates for real-time path traced games. Two-way path tracing could be a very interesting alternative as well. Caustics are a nice effect for perfectly physically correct rendering, but they are really not that important in most scenes and can generally be ignored for real-time purposes, where convergence speed is of the utmost importance.
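
To make the "sample ordering" advantage from vlado's comment a bit more concrete: a QMC path tracer replaces pure random numbers with a low-discrepancy sequence, such as the Halton sequence, which fills the sample space much more evenly and therefore converges with less noise. A minimal illustrative sketch, not taken from any particular renderer:

```cpp
// Minimal sketch of a low-discrepancy (quasi-Monte Carlo) sampler: the Halton sequence.
// Sample i in base b is generated by mirroring i's base-b digits around the radix point,
// which spreads consecutive samples evenly over [0,1) instead of clumping like random numbers.
#include <cstdio>

double halton(int i, int base)
{
    double f = 1.0, result = 0.0;
    while (i > 0) {
        f /= base;
        result += f * (i % base);
        i /= base;
    }
    return result;
}

int main()
{
    // A 2D low-discrepancy point set: base 2 for the first dimension, base 3 for the second.
    // In a path tracer these pairs would drive e.g. the pixel sub-position or BRDF sampling.
    for (int i = 1; i <= 8; ++i)
        std::printf("sample %d: (%.4f, %.4f)\n", i, halton(i, 2), halton(i, 3));
    return 0;
}
```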

Friday, January 14, 2011

Carmack excited about Nvidia's Project Denver, continues ray tracing research

Nvidia's recently announced move into CPU territory with Project Denver has been very well received by the general public. Game developers (who have known about this project for over a year) like John Carmack have also expressed interest. Most people are tired of the current x86 duopoly held by AMD and Intel, and for them this archaic, 30+ year old legacy technology cannot die fast enough. Carmack is especially happy about Nvidia's choice of ARM, because he is already familiar with coding for the ARM CPUs in mobile devices like Apple's iPhone and devices running Google's Android.

From his Twitter account:
"I have quite a bit of confidence that Nvidia will be able to make a good ARM core. Probably fun for their engineers."

"Goal for today: parallel implementation of my TraceWorld Kd tree builder"

"10mtri model got 2.5x faster on 1 thread, 19x faster on 24 (hyper)threads."

"Amdahl’s law is biting pretty hard at the start, with only being able to fan out one additional thread per node processed."
As can be seen from these tweets, he has also restarted his research on ray tracing. One specific thing to note is that he's talking about Amdahl's law. I first saw this law in a Siggraph 2008 presentation on parallelism by Jon Olick, and it is something that will hamper traditional rasterization more than raycasting/raytracing. From Wikipedia (Amdahl's law):
"The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if 95% of the program can be parallelized, then the theoretical maximum speedup using parallel computing would be 20 times faster, no matter how many processors are used. "
And a few thoughts related to Amdahl's law from Carmack's talk at QuakeCon 2010 in August (see http://raytracey.blogspot.com/2010/08/is-carmack-working-on-ray-tracing-based.html), implying that current GPUs, no matter how powerful, can only speed up the code to a certain extent, and that scalability on multi-core CPUs is better (because, unlike the GPU, multi-core CPUs can speed up the serial parts of the code as well):
"so I’m going through a couple of stages of optimizing our internal raytracer, (TreeWorld used for precomputing the lightmaps and megatextures, not for real-time purposes) this is making things faster and the interesting thing about the processing was, what we found was, it’s still a fair estimate that the GPUs are going to be five times faster at some task than the CPUs. But now everybody has 8 core systems and we’re finding that a lot of the stuff running software on this system turned out to be faster than running the GPU version on the same system. And that winds up being because we get killed by Amdahl’s law there where you’re throwing the very latest and greatest GPU and your kernel amount (?) goes ten times faster. The scalability there is still incredibly great, but all of this other stuff that you’re dealing with of virtualizing of textures and managing all of that did not get that much faster. So we found that the 8 core systems were great and now we’re looking at 24 thread systems where you’ve got dual thread six core dual socket systems. It’s an incredible amount of computing power and that comes around another important topic where PC scalability is really back now "
Nvidia's Project Denver is very important in this respect and will bring the theoretical maximum speedup (limited by Amdahl's law) much closer to reality, because the CPU cores and GPU cores are located on the same chip and don't suffer from bandwidth restrictions between them. The ARM CPU cores will take care of the latency-sensitive sequential parts of the code, while the CUDA cores will happily blast through the parallel code. For ray tracing in particular, this means that the ARM CPU cores will be able to dynamically build acceleration structures and speed up tree traversal for highly irregular workloads with random access, while the plentiful CUDA cores do ray-triangle intersection and BRDF shading at amazing speeds. This will make the Denver chip a fully programmable ray tracing platform which greatly accelerates all stages of the ray tracing pipeline. In short, a wet dream for ray tracing enthusiasts like myself :D! Based on the power-efficient ARM architecture, I think that Denver-derived chips will also be the platform of choice for cloud gaming services, for which the heat and power inefficiency of the currently used x86 CPUs are a huge problem.

Monday, January 10, 2011

Arnold render to have full GPU acceleration in a few years

I've come across this very interesting interview about Arnold render, Sony Pictures Imageworks' primary production renderer for CG feature films: http://www.3dworldmag.com/2011/01/07/pros-and-cons-of-gpu-accelerated-rendering/

Some excerpts from the interview:

"The first target for that backend is the CPU, and that’s what we’re using now in production. But the design goals of OSL include having a GPU backend, and if you were to browse on the discussion lists for OSL right now, you would see people working on GPU-accelerated renderers. So that could happen in future: that a component of the rendering could happen on the GPU, even for something like Arnold."

"it doesn’t make sense to cram the kinds of scenes we throw at Arnold every day, with tens of thousands of piece of geometry and millions of textures, at the GPU. Not today. Maybe in a few years it will."

Arnold render is a unidirectional path tracer, so it makes a perfect fit for acceleration by GPUs. "Maybe in a few years it will" could be a reference to Project Denver. When Project Denver materializes in future high-end GPUs from Nvidia, there will be a massive speed-up for production renderers like Arnold and other biased and unbiased renderers. The implications for rendering companies will be huge: all renderers will become greatly accelerated and there will no longer be a CPU rendering camp and a GPU rendering camp. Everyone will want to run their renderer on this super-Denver-chip. GPU renderers like Octane, V-Ray RT GPU and iray will have a headstart on this new platform. Real-time rendering (e.g. CryEngine 4) and offline rendering (e.g. Arnold) will converge much faster since they will be using the same hardware.

AMD and Intel will not sit still and recently launched Fusion and Sandy Bridge, which basically follow the same philosophy as Project Denver, but coming from the other side: while Nvidia is adding CPU cores to the GPU, AMD and Intel are adding GPU cores to the CPU. Which approach is better remains to be seen, but I think that Nvidia will have the better performing product as usual. Eventually there will no longer be a distinction between CPUs and GPUs, since they will all be merged on the same chip: a few latency-optimized cores (today's CPU cores) which process the parts of the code that are inherently serial and impossible to parallelize, and thousands of throughput-optimized cores (today's GPU cores or stream processors) which handle the parallel parts of the code, all on the same chip using the same shared memory pool.

The coming years will be very exciting for offline and real-time graphics, in particular for raytracing based rendering. Photon mapping for example is a perfect candidate that could become real-time in a couple of years.

Thursday, January 6, 2011

Nvidia is building its own CPU!!!

This is HUGE news!! Nvidia today announced Project Denver at CES, an ARM-based CPU core manufactured by Nvidia which will be integrated into the GPU. This high-end GPU/CPU chip will provide the killer platform for real-time path tracing and will pave the way for truly real-time (30fps, 1080p) path traced games with photorealistic quality graphics.

Bill Dally, chief scientist at Nvidia, already hinted that future GPUs from Nvidia will be incorporating ARM-based CPU cores on the same chip as the GPU. Now it's official (Project Denver will first appear in the Maxwell GPU, see http://raytracey.blogspot.com/2011/01/nvidias-project-denver-to-appear-first.html)! There's an interesting blog post from Bill Dally on http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/. Some paragraphs which are relevant to GPU ray tracing/path tracing:
"As you may have seen, NVIDIA announced today that it is developing high-performance ARM-based CPUs designed to power future products ranging from personal computers to servers and supercomputers.

Known under the internal codename “Project Denver,” this initiative features an NVIDIA CPU running the ARM instruction set, which will be fully integrated on the same chip as the NVIDIA GPU. This initiative is extremely important for NVIDIA and the computing industry for several reasons.

NVIDIA’s project Denver will usher in a new era for computing by extending the performance range of the ARM instruction-set architecture, enabling the ARM architecture to cover a larger portion of the computing space. Coupled with an NVIDIA GPU, it will provide the heterogeneous computing platform of the future by combining a standard architecture with awesome performance and energy efficiency."

"An ARM processor coupled with an NVIDIA GPU represents the computing platform of the future. A high-performance CPU with a standard instruction set will run the serial parts of applications and provide compatibility while a highly-parallel, highly-efficient GPU will run the parallel portions of programs."

I wonder what Intel's and AMD's answer will be. High-end versions of Fusion and Sandy Bridge/LRB/Knight's Ferry? Either way, it's clear that all of "the big 3" are now pursuing CPU/GPU hybrid chips. Bidirectional path tracing, Markov chain Monte Carlo rendering methods (such as Metropolis light transport and ERPT) and photon mapping will benefit enormously in performance on these hybrid architectures because, being partially sequential, these algorithms are an ideal match for such chips (although with clever parallelization tricks they can already run fast on current GPUs, see MLT on GPU and photon mapping on GPU). Very complex procedural shaders will run much faster, and superfast acceleration structure rebuilding (which is inherently sequential but can be parallelized to a great extent) will allow real-time ray tracing of thousands and even millions (see the HLBVH paper by Pantaleoni and Luebke) of dynamic objects simultaneously. GPU and CPU will share the same memory pool, so no more slow PCIe transfers are needed. Project Denver is in essence exactly what Neoptica (a think tank of top graphics engineers acquired by Intel in 2007) had in mind (http://pharr.org/matt/talks/graphicshardware.pdf). The irony is that Neoptica's vision was intended for Larrabee, but now it's Nvidia that will make it real with the Denver project.

With Nvidia soon producing its own CPUs, competition will become fierce. From now on, Nvidia is not just a GPU company anymore, but is targeting the same PC crowd as Intel and AMD. The concepts of "GPU" and "CPU" will slowly vanish in favor of hybrid architectures, like LRB, Fusion and future Nvidia products (Kepler/Maxwell???). And there is also Imagination Technologies, which will incorporate hardware accelerated ray tracing in PowerVR GPUs. Exciting times ahead! :-)

Thursday, December 30, 2010

2010, an excellent year for raytracing!

What an exciting year this has been, for raytracing at least. There has been a huge buzz around accelerated ray tracing and unbiased rendering, in which the GPU has played a pivotal role. A little overview:

- Octane Render is publicly announced. A demo is released which lets many people experience high quality unbiased GPU rendering for the first time. Unparalleled quality and amazing render times, even on a low-end 8800 GTX, catch many by surprise.

- Arion, the GPU sibling of Random Control's Fryrender, is announced shortly after Octane. Touts hybrid CPU+GPU unbiased rendering as a distinguishing feature. The product eventually releases at a prohibitively expensive price (1000€ for 1 multi-GPU license)

- Luxrender's OpenCL-based GPU renderer SmallLuxGPU integrates stochastic progressive photon mapping, an unbiased rendering method which excels at caustic-heavy scenes

- Brigade path tracer is announced, a hybrid (CPU+GPU) real-time path tracer aimed at games. Very optimized, very fast, user-defined quality, first path tracer with support for dynamic objects. GI quality greatly surpasses virtual point light/instant radiosity based methods and even photon mapping, can theoretically handle all types of BRDF, is artefact free (except for noise) and nearly real-time. No screen-space limitations. The biggest advantage over other methods is progressive rendering which instantly gives a good idea of the final converged image (some filtering and LOD scheme, similar to VoxLOD, could produce very high quality results in real-time). Very promising, it could be the best option for high-quality dynamic global illumination in games in 2 to 3 years.

- release of the Nvidia Fermi GPU: caches and other enhancements (e.g. concurrent kernel execution) give ray tracing tasks an enormous boost, up to 3.5x faster in scenes with many incoherent rays compared to the previous architecture. Design Garage, an excellent tech demo featuring GPU path tracing, is released alongside the cards

- Siggraph 2010 puts heavy focus on GPU rendering

- GTC 2010: Nvidia organizes a whole bunch of GPU ray tracing sessions covering OptiX, iray, etc.

- John Carmack re-expresses interest in real-time ray tracing as an alternative rendering method for next-generation games (besides sparse voxel octrees). He even started twittering about his GPU ray tracing experiments in OpenCL: http://raytracey.blogspot.com/2010/08/is-carmack-working-on-ray-tracing-based.html

- GPU rendering gets more and more criticized by the CPU rendering crowd (Luxology, Next Limit, their userbase, ...) feeling the threat of decreased revenue

- release of mental ray's iray

- release of V-Ray RT GPU, the product that started the GPU rendering revolution

- Caustic Graphics is bought by Imagination Technologies, the maker of PowerVR GPU. A surprising and potentially successful move for both companies. Hardware accelerated real-time path tracing at very high sampling rates (higher than on Nvidia Fermi) could become possible. PowerVR GPUs are integrated in Apple TV, iPad, iPhone and iPod Touch, so this is certainly something to keep an eye on in 2011. Caustic doesn't disappoint when it comes to hype and drama :)

- one of my most burning questions since the revelation of path tracing on GPU, "is the GPU capable of more sophisticated and efficient rendering algorithms than brute force path tracing?" got answered just a few weeks ago, thanks to Dietger van Antwerpen and his superb work on GPU-based Metropolis light transport and energy redistribution path tracing.

All in all, 2010 was great for me and delivered a lot to write about. Hopefully 2011 will be at least equally exciting. Some wild speculation of what might happen:

- Metropolis light transport starts to appear in commercial GPU rendering software (very high probability for Octane)
- more news about Intel's Knight's Corner/Ferry with maybe some performance numbers (unlikely)
- Nvidia launches Kepler at the end of 2011, offering 3x the path tracing performance of Fermi (too good to be true?)
- PowerVR GPU maker and Caustic Graphics bring hardware accelerated real-time path tracing to a mass audience through Apple mobile products (would be great)
- Luxology and Maxwell Render reluctantly embrace GPU rendering (LOL)
- finally a glimpse of OTOY's real-time path tracing (fingers crossed)
- Brigade path tracer gains exposure and awareness with the release of the first path traced game in history (highly possible)
- ...

Joy!

Monday, December 27, 2010

Global illumination with Markov Chain Monte Carlo rendering in Nvidia Optix 2.1 + Metropolis Light Transport with participating media on GPUs

Optix 2.1 was released a few days ago and includes a Markov Chain Monte Carlo (MCMC) sample, which only works on Fermi cards (New sample: MCMC - Markov Chain Monte Carlo method rendering. A global illumination solution that requires an SM 2.0 class device (e.g. Fermi) or higher).

MCMC rendering methods, such as MLT (Metropolis light transport) and ERPT (energy redistribution path tracing), are partially sequential, because each path of a Markov chain depends on the previous path, and are therefore more difficult to parallelize for GPUs than standard Monte Carlo algorithms. This is an image of the new MCMC sampler included in the new Optix SDK, which can be downloaded from http://developer.nvidia.com/object/optix-download.html.
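
To make that sequential nature concrete, here is a bare-bones sketch of the Metropolis step at the heart of these methods, in the spirit of Kelemen-style primary sample space MLT. This is my own illustrative code with placeholder evaluate/mutate functions, not the OptiX sample:

```cpp
// Bare-bones sketch of the Markov chain at the heart of MLT/ERPT-style rendering:
// each new sample is a mutation of the previous one, accepted with a probability based on
// the ratio of path contributions. This dependence on the previous sample is what makes
// these methods harder to parallelize on GPUs than plain Monte Carlo path tracing.
#include <cstdlib>

struct PathSample { double u[16]; };   // e.g. the random numbers defining a path (primary sample space)

double rand01() { return std::rand() / (RAND_MAX + 1.0); }

// Placeholder "path contribution": a real renderer would trace the path and return its brightness.
double evaluate(const PathSample& s) { return 0.1 + s.u[0] * s.u[1]; }

// Placeholder mutation: perturb the sample slightly (a real implementation would mix
// small and large mutations as in Kelemen et al.).
PathSample mutate(const PathSample& s)
{
    PathSample m = s;
    for (int i = 0; i < 16; ++i) {
        m.u[i] += 0.05 * (rand01() - 0.5);
        if (m.u[i] <  0.0) m.u[i] += 1.0;   // wrap back into [0,1)
        if (m.u[i] >= 1.0) m.u[i] -= 1.0;
    }
    return m;
}

void run_chain(PathSample current, int num_mutations)
{
    double current_f = evaluate(current);
    for (int i = 0; i < num_mutations; ++i) {
        PathSample proposed   = mutate(current);    // depends on the PREVIOUS sample
        double     proposed_f = evaluate(proposed);

        double accept = proposed_f / current_f;     // Metropolis acceptance ratio
        if (accept >= 1.0 || rand01() < accept) {
            current   = proposed;                   // accept the mutation
            current_f = proposed_f;
        }
        // a real renderer would accumulate (splat) the current sample's contribution here
    }
}
```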




There is also an update on the Kelemen-style Metropolis Light Transport GPU renderer from Dietger van Antwerpen. He has released this new video showing Metropolis light transport with participating media running on the GPU: http://www.youtube.com/watch?v=3Xo0qVT3nxg



This scene is straight from the original Metropolis light transport paper by Veach and Guibas (http://graphics.stanford.edu/papers/metro/metro.pdf). Participating media (like fog, smoke and god rays) are among the most difficult and compute intensive phenomena to simulate accurately with global illumination, because they are essentially volumetric effects in which light scattering occurs. Subsurface scattering belongs to the same category of expensive, difficult-to-render volumetric effects. The video shows it can now be done in almost real-time with MLT, which is pretty impressive!

Friday, December 24, 2010

Move over OTOY, here comes the new AMD tech demo!

June 2008: Radeon HD 4870 launches with the OTOY/Cinema 2.0/Ruby tech demo featuring voxel raytracing. It can't get much closer to photorealism than this... or can it?

December 2010: Radeon HD 6970 launches with this craptastic tech demo. Talk about progress. Laughable fire effects, crude physics with only a few dozen dynamic objects, pathetic Xbox 1 city model and lighting, uninspired Mecha design. It may be just a tech demo but this is just a disgrace for a high-tech GPU company. Well done AMD! Now where the hell is that Cinema 2.0 Ruby demo you promised dammit? My HD 4890 is almost EOL and already LOL :p

Sunday, December 19, 2010

GPU-accelerated biased and unbiased rendering

Since seeing the face-meltingly awesome YouTube video of Kelemen-style MLT + bidirectional path tracing running on a GPU, I'm quite convinced that most (if not all) unbiased rendering algorithms can be accelerated on the GPU. Here's a list of the most common unbiased algorithms which have been ported successfully to the GPU:

- unidirectional (standard) path tracing: used by Octane, Arion, V-Ray RT GPU, iray, SmallLuxGPU, OptiX, Brigade, Indigo Render, a bunch of renderers integrating iray, etc. Jan Novak is one of the first to report a working implementation of path tracing on the GPU (implemented with CUDA on a GTX 285, https://dip.felk.cvut.cz/browse/pdfcache/novakj8_2009dipl.pdf). The very first paper reporting GPU path tracing is "Stochastic path tracing on consumer graphics cards" from 2008 by Huwe and Hemmerling (implemented in GLSL).
- bidirectional path tracing (BDPT): http://www.youtube.com/watch?v=70uNjjplYzA, I think Jan Novak, Vlastimil Havran and Carsten Dachsbacher made this work as well in their paper "Path regeneration for interactive path tracing"
- Metropolis Light Transport (MLT)+BDPT: http://www.youtube.com/watch?v=70uNjjplYzA
- energy redistribution path tracing (ERPT): http://www.youtube.com/watch?v=c7wTaW46gzA, http://www.youtube.com/watch?v=d9X_PhFIL1o
- (stochastic) progressive photon mapping (SPPM): used by SmallLuxGPU, there's also a GPU-optimised parallellised version on Toshiya Hachisuka's website, CUDA http://www.youtube.com/watch?v=zg9NcCw53iA, OpenCL http://www.youtube.com/watch?v=O5WvidnhC-8

Octane, my fav unbiased GPU renderer, will also implement an MLT-like rendering algorithm in the next version (beta 2.3 version 6), which is "coming soon". I gathered some interesting quotes from radiance (Octane's main developer) regarding MLT in Octane:

“We are working on a firefly/caustic capable and efficient rendering algorithm, it's not strictly MLT but a heavily modified version of it. Trust me, this is the last big feature we need to implement to have a capable renderer, so it's our highest priority feature to finish.”

“MLT is an algorithm that's much more efficient at rendering complex scenes, not so efficient at simple, directly lit scenes (eg objects in the open). However MLT does sample away the fireflies.”

“The fireflies are a normal side effect of unbiased rendering, they are reflective or refractive caustics. We're working on new algorithms in the next version that will solve this as it will compute these caustics better.”

“they are caustics, long paths with a high contribution, a side effect of unbiased path tracing. MLT will solve this problem which is in development and slated for beta 2.3”

“the pathtracing kernel already does caustics, it's just not very efficient without MLT, which will be in the next 2.3 release.”

“lights (mesh emitters) are hard to find with our current algorithms, rendertimes will severely improve with the new MLT replacement that's coming soon.”

“it will render more efficiently [once] we have portals/MLT/bidir.”

“All exteriors render in a few minutes clean in octane currently (if you have a decent GPU like a medium range GTX260 or better). Interiors is more difficult, requires MLT and ultimately bidir path tracing. However, with plain brute force pathtracing octane is the same or slightly faster than a MLT/Bidir complex/heavily matured [CPU] engine, which gives good promise for the future, as we're working on those features asap.”

With all unbiased rendering techniques soon possible and greatly accelerated on the GPU, what about GPU acceleration of biased production rendering techniques (such as photon mapping and irradiance caching)? There have been a lot of academic research papers on this subject (e.g. Purcell, Rui Wang and Kun Zhou, Fabianowski and Dingliani, McGuire and Luebke, ...), but since it's a lot trickier to parallelize photon mapping and irradiance caching than unbiased algorithms while still obtaining production quality, it's still not quite ready for integration in commercial software. But this will change very soon imo: on the ompf forum I've found a link to a very impressive video showing very high-quality CUDA-accelerated photon mapping: http://www.youtube.com/watch?v=ZTuos2lzQpM.

This is a render of Sponza, 800x800 resolution, rendered in 11.5 seconds on 1 GTX 470! (image taken from http://kaikaiwang.blogspot.com/):

11 seconds for this quality and resolution on just one GPU is pretty amazing if you ask me. I'm sure that further optimizations could bring the rendertime down to 1 second. The video also shows real-time interaction (scale, rotate, move, delete) with objects from the scenery (something that could be extended to support many dynamic objects via HLBVH). I could see this being very useful for real-time production quality global illumination using a hybrid of path tracing for exteriors and photon mapping for interiors, caustics, point lights.

Just like 2010 was the year of GPU-accelerated unbiased rendering, I think 2011 will become the year of heavily GPU-accelerated biased rendering (photon mapping in particular).

Wednesday, December 15, 2010

Real-time Metropolis Light Transport on the GPU: it works!!!!


This is probably the most significant news since the introduction of real-time path tracing on the GPU. I've been wondering for quite a while whether MLT (Metropolis Light Transport) would be able to run on current GPU architectures. MLT is a more efficient and more complex algorithm than path tracing for rendering certain scenes which are predominantly indirectly lit (e.g. light coming through a narrow opening, such as a half-closed door, and illuminating a room), a case in which path tracing has much difficulty finding "important" contributing light paths. For this reason, it is the rendering method of choice for professional unbiased renderers like Maxwell Render, Fryrender, Luxrender, Indigo Render and Kerkythea Render.

Dietger van Antwerpen, an IGAD student who co-developed the Brigade path tracer and who also managed to make ERPT (energy redistribution path tracing) run in real-time on a Fermi GPU, has posted two utterly stunning and quite unbelievable videos of his latest progress:

- video 1 showing a comparison between real-time ERPT and path tracing on the GPU:

ERPT on the left, standard path tracing (PT) on the right. Light is coming in from a narrow opening, a scenario in which PT has a hard time finding light paths and converging, because it samples the environment randomly. ERPT shares properties with MLT: once it finds an important light path, it samples nearby paths via small mutations of the found path, so convergence is much faster.

- video 2 showing Kelemen-style MLT (an improvement on the original MLT algorithm) running in real-time on the GPU. The video description mentions Kelemen-style MLT on top of bidirectional path tracing (BDPT) with multiple importance sampling, pretty amazing.
Kelemen-MLT after 10 seconds of rendering at 1280x720 on a single GTX 470. The beautiful caustics are possible due to bidirectional path tracing+MLT and are much more difficult to obtain with standard path tracing.

These videos are ultimate proof that current GPUs are capable of more complex rendering algorithms than brute-force standard path tracing and can potentially accelerate the very same algorithms used in the major unbiased CPU renderers. This bodes very well for GPU renderers like Octane (which has its own MLT-like algorithm), V-Ray RT GPU, SmallLuxGPU and iray.

If Dietger decides to implement these in the Brigade path tracer we could be seeing (quasi) noise-free, real-time path traced (or better "real-time BDPT with MLT" traced) games much sooner than expected. Verrrry exciting stuff!! I think some rendering companies would hire this guy instantly.

Friday, December 10, 2010

Voxels again

Just encountered a very nice blog about voxel rendering, sparse voxel octrees and massive procedural terrain rendering: http://procworld.blogspot.com

The author has made some videos of the tech using OpenCL, showing the great detail that can be achieved when using voxels: http://www.youtube.com/watch?v=PzmsCC6hetM, http://www.youtube.com/watch?v=oZ6x_jbZ2GA. It does look a bit like the Atomontage engine.

Wednesday, December 1, 2010

OnLive just works, even for those darn Europeans!

I think this deserves its own post. Someone (Anonymous) told me that the OnLive service can be accessed and played from EU countries as well, so I gave it a try and downloaded and installed the tiny OnLive plug-in. To my surprise, I actually got it working on a pretty old PC (just a Pentium 4 at 3GHz). I was flabbergasted. I'm about 6000 miles away from the OnLive servers and it's still running! The quality of the video stream was more than decent and smoother than when I try to decode 720p YouTube videos, which my system just cannot handle.

My first impression: I love it, and now I'm absolutely positive that this is the very near future for video games. It's a joy to watch others play and to start playing the same game within seconds! I've tried some Borderlands, Splinter Cell Conviction and FEAR 2. There is some lag, because I'm about 6000 miles away from the OnLive server (I got a warning during log-in that my connection has huge latency), but I could nevertheless still enjoy the game. About half a second (or less) passes between hitting the shoot key and seeing the gun actually shoot, and the same delay applies when moving your character. I must say though that I got used to the delay after a while and anticipated my moves by half a second. My brain noticed the delay during the first minutes of play, but I forgot about it after a while and just enjoyed the game. I think that if I can enjoy an OnLive game from 6000 miles away, then US players, who live much closer to the OnLive servers, have got to have an awesome experience. The lag could also be due to my own ancient PC (which is not even dual core) or to the local network infrastructure here in Belgium, even though I have a pretty high-bandwidth connection. I can't wait until they deploy their EU servers. Image quality is very variable; I guess that's partly because of my PC, which cannot decode the video stream fast enough. FEAR 2 looked very sharp though. The image looks best when you're not moving the camera and just stare at the scene. The recently announced MicroConsole seems to offer very good image quality from what I've read.

I think that cloud gaming will give an enormous boost to the graphics side of games and that photorealistic games will be here much sooner thanks to cloud rendering and its inherent rendering efficiency (especially when using ray tracing, see the interview with Jules Urbach). My biggest gripe with consoles like the Xbox and PlayStation is that they stall graphics development for the duration of the console cycle (around 5 years), especially the latest round of consoles. With the exception of Crysis, PC games don't make full use of the latest GPUs, which are much more powerful than the consoles. I ran 3DMark05 just a few days ago, and it struck me that this 5-year-old benchmark still looks superior to any console game on the market. I truly hope that cloud gaming will get rid of the fixed console hardware and free up game developers (and graphics engineers in particular) to go nuts, because I'm sick of seeing yet another Unreal Engine 3 powered game.

I also think that OnLive will not be the only player and that there will be fierce competition between several cloud gaming services, each with their own exclusive games. I can imagine a future with multiple cloud gaming providers such as OnLive, OTOY, Gaikai, PlayStation Little Big Cloud, Activision Cloud of Duty, EA Battlefield Cloud of Honor, UbiCloud, MS Red Ringing Cloud of Death (offering Halo: RROD exclusively), Valve Strrream... To succeed they would have to be accessible for free (just like OnLive is now), without monthly subscription fees.

All in all, it's an awesome experience and it's going to open up gaming for the masses and will give a new meaning to the word "video" game. The incredible ease of use (easier than downloading a song from the iTunes Store) will attract vast audiences and for this reason I think it's going to be much bigger than Wii and will completely shake up the next-gen console landscape (Wii2, PS4 and Xbox 2.5/720/RROD2/...). MS, Sony and Nintendo better think twice before releasing a brand new console.

Be it OnLive, OTOY, Gaikai or any other service, I, for one, welcome our new cloud gaming overlords!

Friday, November 26, 2010

Enter the singularity blog

This is just an awesome blog: http://entersingularity.wordpress.com/ (originally located at http://enterthesingularity.blogspot.com/, but moved to the new address in June 2010).

The author currently has a job at OnLive, the cloud-gaming service, but before working there he used to write about voxel raytracing, sparse voxel octrees and the like. I assume he's working on some super-secret game graphics technology involving voxels and ray tracing, targeted to run on the cloud. Another main theme of the blog is how the brain works and how and when artificial intelligence could match and even surpass the capacity of the human brain (something which I find even more interesting than real-time path tracing). To achieve this dream (or nightmare, depending on your view), you would need a gigantic amount of computing power, and cloud servers will probably be the first to fulfill that requirement. Could it be that one day OnLive will turn into SkyNet? ;-)

Such a superhuman intelligence could help scientists think about cures for cancer, stem cell research, nanotechnology and the Israeli-Palestinian conflict, and could even help them understand their wives ;-).