Monday, July 30, 2018

Nvidia gearing up to unleash real-time ray tracing to the masses

In the last two months, Nvidia roped in several high-profile, world-class ray tracing experts (most of them with a CPU ray tracing background):

Matt Pharr

One of the authors of the Physically Based Rendering books (www.pbrt.org, some say it's the bible for Monte Carlo ray tracing). Before joining Nvidia, he was working at Google with Paul Debevec on Daydream VR, light fields and Seurat (https://www.blog.google/products/google-ar-vr/experimenting-light-fields/), none of which took off in a big way for some reason.

Before Google, he worked at Intel on Larrabee (Intel's failed attempt at making a GPGPU for real-time ray tracing and rasterisation that could compete with Nvidia GPUs) and on ISPC, a specialised compiler intended to extract maximum parallelism from the new Intel chips with AVX extensions. He described his time at Intel in great detail on his blog: http://pharr.org/matt/blog/2018/04/30/ispc-all.html (it sounds like an awful company to work for).

Intel also bought Neoptica, Matt's startup, which was supposed to research new and interesting rendering techniques for hybrid CPU/GPU chip architectures like the PS3's Cell.


Ingo Wald

Pioneering researcher in the field of real-time ray tracing from the Saarbrücken computer graphics group in Germany, who later moved to Intel and the University of Utah to work on very high performance CPU-based ray tracing frameworks such as Embree (used in Corona Render and Cycles) and Ospray.

His PhD thesis "Real-time ray tracing and interactive global illumination" from 2004 describes a real-time GI renderer running on a cluster of commodity PCs and hardware-accelerated ray tracing (OpenRT) on a custom fixed-function ray tracing chip (SaarCOR).

Ingo contributed a lot to the development of high quality ray tracing acceleration structures (built with the surface area heuristic).


Eric Haines

Main author of the famous Real-time Rendering blog, who until recently worked for Autodesk. He also used to maintain the Real-time Raytracing Realm and the Ray Tracing News.


What connects these people is that real-time ray tracing runs in their blood, so having them all united under one roof is bound to produce fireworks.

With these recent hires and initiatives such as RTX (Nvidia's ray tracing API), it seems Nvidia will be pushing real-time ray tracing into the mainstream very soon. I'm excited to finally see it all come together. I'm pretty sure ray tracing will soon be everywhere, and its quality and ease of use will displace rasterisation-based technologies (which is also the reason I started this blog exactly ten years ago).






Senior Real Time Ray Tracing Engineer
NVIDIA, Santa Clara, CA, US

Job description

Are you a real-time rendering engineer looking to work on real-time ray tracing to redefine the look of video games and professional graphics applications? Are you a ray tracing expert looking to transform real-time graphics as we lead the convergence with film? Do you feel at home in complex video game codebases built on the latest GPU hardware and GPU software APIs before anybody else gets to try them?

At NVIDIA we are developing the most forward-looking real-time rendering technology combining traditional graphics techniques with real-time ray tracing enabled by NVIDIA's RTX technology. We work at all levels of the stack, from the hardware and driver software, to the engine and application level code. This allows us to take on problems that others can only dream of solving at this point.

We are looking for Real Time Rendering Software Engineers who are passionate about pushing the limits of what is possible with the best GPUs and who share our forward-looking vision of real-time rendering using real-time ray tracing.

In this position you will work with some of the world leading real-time ray tracing and rendering experts, developer technology engineers and GPU system software engineers. Your work will impact a number of products being worked on at NVIDIA and outside NVIDIA. These include the NVIDIA Drive Constellation autonomous vehicle simulator, NVIDIA Isaac virtual simulator for robotics, and NVIDIA Holodeck collaborative design virtual environment. Outside NVIDIA our work is laying the foundation for future video games and other rendering applications using real-time ray tracing. The first examples of this impact are the NVIDIA GameWorks Ray Tracing denoising modules and much of the technology featured in our NVIDIA RTX demos at GDC 2018.


What You Will Be Doing
  • Implementing new rendering techniques in a game engine using real-time ray tracing with NVIDIA RTX technology 
  • Improving the performance and quality of techniques you or others developed 
  • Ensuring that the rendering techniques are robust and work well for the content needs of products using them 

What We Need To See
  • Strong knowledge of C++ 
  • BS/MS or higher degree in Computer Science or related field with 5+ years of experience 
  • Up to date knowledge of real-time rendering and offline rendering algorithms and research 
  • Experience with ray tracing in real-time or offline 
  • Knowledge of the GPU Graphics Pipeline and GPU architecture 
  • Experience with GPU Graphics and Compute programming APIs such as Direct3D 11, Direct3D 12, DirectX Raytracing, Vulkan, OpenGL, CUDA, OpenCL or OptiX 
  • Experience writing shader code in HLSL or GLSL for these APIs 
  • Experience debugging, profiling and optimizing rendering code on GPUs 
  • Comfortable with a complex game engine codebase, such as Unreal Engine 4, Lumberyard, CryEngine or Unity 
  • Familiar with the math commonly used in real-time rendering 
  • Familiar with multi-threaded programming techniques 
  • Can do attitude, with the will to dive into existing code and do what it takes to accomplish your job 
  • Ability to work well with others in a team of deeply passionate individuals who respect each other

Sunday, July 22, 2018

Accelerating path tracing by using the BVH as multiresolution geometry

Before continuing the tutorial series, let's have a look at a simple but effective way to speed up path tracing. The idea is quite simple: like an octree, a bounding volume hierarchy (BVH) can double as both a ray tracing acceleration structure and a way to represent the scene geometry at multiple levels of detail (multi-resolution geometry representation). Specifically the axis-aligned bounding boxes (AABB) of the BVH nodes at different depths in the tree serve as a more or less crude approximation of the geometry.

Low detail geometry enables much faster ray intersections and can be useful when light effects don't require full geometric accuracy, for example in the case of motion blur, glossy (blurry) reflections, soft shadows, ambient occlusion and global illumination with diffuse bounced lighting. Especially when geometry is not directly visible in the view frustum or in specular (mirror-like) reflections, using geometry proxies can provide a significant speedup (depending on the error tolerance) at an almost imperceptible loss in quality.

Advantages of using the BVH itself as multi-resolution LOD geometry representation:
  • doesn't require an additional scene voxelisation step (the BVH itself provides the LOD), making it less memory hungry
  • skips expensive triangle intersection when possible
  • performs only ray/box intersections (as opposed to having a mix of ray/triangle and ray/box intersections) which is more efficient on the GPU (avoids thread divergence) 
  • the BVH is stored in the GPU's cached texture memory (which is faster than the global memory that stores the triangles)
  • BVH nodes can store extra attributes like smoothed normals, interpolated colours and on-the-fly generated GI
(Note: AFAIK low-level access to the acceleration structure is not provided by APIs like OptiX/RTX and DXR; this has to be written in CUDA, ISPC or OpenCL)

The renderer determines the appropriate level of detail based on the distance from the camera for primary rays, or on the distance from the ray origin and the ray type for secondary rays (glossy/reflection, shadow, AO or GI rays). The following screenshots show the bounding boxes of the BVH nodes from depth 1 (depth 0 is the root node) up to depth 12:

BVH level 1 (BVH level 0 is just the bunny's bounding box)
BVH level 2
BVH level 3
BVH level 4
BVH level 5
BVH level 6
BVH level 7
BVH level 8
BVH level 9
BVH level 10
BVH level 11
BVH level 12 (this level contains mostly inner BVH nodes, but also a few leaf nodes)
The screenshot below shows the bottom-most BVH level (i.e. leaf nodes only, hence some holes are apparent):

Visualizing the BVH leaf nodes (bottom-most BVH level)
Normals are axis-aligned by default, but can be precomputed per AABB vertex (and stored at low precision) by averaging the normals of the AABBs a node contains, with the leaf nodes averaging the normals of their triangles.
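
The post's actual code isn't published yet (see the TODO below), so purely as an illustration, here is what a depth/size-limited BVH traversal could look like in OpenCL-style C. The node layout, the field names and the lod_scale heuristic are assumptions on my part, not the implementation used for the screenshots; lod_scale would typically be chosen much larger for diffuse GI, AO and soft shadow rays than for primary or mirror rays.

/* illustrative node layout: an AABB, child indices (-1 marks a leaf) and the
   averaged attributes mentioned above (all names are assumptions) */
struct BVHNode {
 float3 aabb_min, aabb_max;
 int left, right;
 float3 avg_normal;   /* normal averaged over the contained geometry */
 float3 avg_color;    /* averaged surface colour */
};

/* standard slab test: returns the entry distance along the ray, or -1.0f if the box is missed */
float intersect_aabb(__global const struct BVHNode* n, float3 orig, float3 inv_dir)
{
 float3 t0 = (n->aabb_min - orig) * inv_dir;
 float3 t1 = (n->aabb_max - orig) * inv_dir;
 float3 tnear = fmin(t0, t1);
 float3 tfar = fmax(t0, t1);
 float tmin = fmax(fmax(tnear.x, tnear.y), tnear.z);
 float tmax = fmin(fmin(tfar.x, tfar.y), tfar.z);
 return (tmax >= fmax(tmin, 0.0f)) ? tmin : -1.0f;
}

/* traversal that accepts a node's box as proxy geometry once the box is small enough
   for the requested level of detail (lod_scale grows with ray distance and ray type) */
bool trace_bvh_lod(__global const struct BVHNode* nodes, float3 orig, float3 dir,
                   float lod_scale, float3* hit_normal, float3* hit_color, float* hit_t)
{
 float3 inv_dir = (float3)(1.0f, 1.0f, 1.0f) / dir;
 int stack[64];
 int sp = 0;
 stack[sp++] = 0;                        /* start at the root node */
 bool hit = false;
 float best_t = 1e20f;

 while (sp > 0) {
  __global const struct BVHNode* n = &nodes[stack[--sp]];
  float t = intersect_aabb(n, orig, inv_dir);
  if (t < 0.0f || t > best_t) continue;  /* box missed, or behind a closer accepted hit */

  float3 extent = n->aabb_max - n->aabb_min;
  float size = fmax(fmax(extent.x, extent.y), extent.z);

  if (n->left < 0 || size < lod_scale * t) {
   /* leaf reached or box small enough at this distance: accept it as proxy geometry
      (a full implementation would intersect the leaf triangles when full accuracy is needed) */
   best_t = t;
   hit = true;
   *hit_normal = n->avg_normal;
   *hit_color = n->avg_color;
   *hit_t = t;
  } else {
   stack[sp++] = n->left;
   stack[sp++] = n->right;
  }
 }
 return hit;
}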

TODO: upload code to GitHub or an alternative non-MS repo and post the link, propose fixes to fill the holes, present benchmark results (8x speedup), get more timtams

Friday, June 1, 2018

Real-time path tracing on a 40 megapixel screen

The Blue Brain Project is a Switzerland based computational neuroscience project which aims to demystify how the brain works by simulating a biologically accurate brain using a state-of-the-art supercomputer. The simulation runs at multiple scales and goes from the whole brain level down to the tiny molecules which transport signals from one cell to another (neurotransmitters). The knowledge gathered from such an ultra-detailed simulation can be applied to advance neuroengineering and medical fields.

To visualize these detailed brain simulations, we have been working on a high performance rendering engine, aptly named "Brayns". Brayns uses ray tracing to render massively complex scenes comprised of trillions of molecules interacting in real-time on a supercomputer. The core ray tracing intersection kernels in Brayns are based on Intel's Embree and Ospray high performance ray tracing libraries, which are optimised to render on recent Intel CPUs (such as the Skylake architecture). These CPUs are basically a GPU in CPU disguise (as they are based on Intel's defunct Larrabee GPU project), but can render massive scientific scenes in real-time as they can address over a terabyte of RAM. What makes these CPUs ultrafast at ray tracing is a neat feature called AVX-512 extensions, which can run several ray tracing calculations in parallel (in combination with ispc), resulting in blazingly fast CPU ray tracing performance which rivals that of a GPU and even beats it when the scene becomes very complex.

Besides using Intel's superfast ray tracing kernels, Brayns has lots of custom code optimisations which allow it to render a fully path traced scene in real-time. These are some of the features of Brayns:
  • hand optimised BVH traversal and geometry intersection kernels
  • real-time path traced diffuse global illumination
  • Optix real-time AI accelerated denoising
  • HDR environment map lighting
  • explicit direct lighting (next event estimation)
  • quasi-Monte Carlo sampling
  • volume rendering
  • procedural geometry
  • signed distance fields raymarching 
  • instancing, which allows visualising billions of dynamic molecules in real-time
  • stereoscopic omnidirectional 3D rendering
  • efficient loading and rendering of multi-terabyte datasets
  • linear scaling across many nodes
  • optimised for real-time distributed rendering on a cluster with high speed network interconnection
  • ultra-low latency streaming to high resolution display walls and VR caves
  • modular architecture which makes it ideal for experimenting with new rendering techniques
  • optional noise and gluten free rendering
Below is a screenshot of an early real-time path tracing test on a 40 megapixel curved screen powered by seven 4K projectors: 

Real-time path traced scene on a 8 m by 3 m (25 by 10 ft) semi-cylindrical display,
powered by seven 4K projectors (40 megapixels in total)

Seeing this scene projected lifesize in photorealistic detail on a 180 degree stereoscopic 3D screen and interacting with it in real-time is quite a breathtaking experience. Having 3D molecules zooming past the observer will be the next milestone. I haven't felt this thrilled about path tracing in quite some time.



Technical/Medical/Scientific 3D artists wanted 


We are currently looking for technical 3D artists to join our team to produce immersive neuroscientific 3D content. If this sounds interesting to you, get in touch by emailing me at sam.lapere@live.be

Friday, December 22, 2017

Freedom of noise: Nvidia releases OptiX 5.0 with real-time AI denoiser

2018 will be bookmarked as a turning point for Monte Carlo rendering due to the wide availability of fast, high quality denoising algorithms, which can be attributed in large part to Nvidia Research: Nvidia just released OptiX 5.0 to developers, containing a new GPU-accelerated "AI denoiser" that works as a post-processing filter.



In contrast to traditional denoising filters, this new denoiser was trained using machine learning on a database of thousands of rendered image pairs (the noisy and noise-free renders of the same scenes), providing the denoiser with a "memory": instead of calculating the reconstructed image from scratch (as a regular noise filter would do), it "remembers" the solution from having encountered similar looking noisy input during training and makes a best guess. That guess is often very close to the converged image but not exact, although the guesses progressively get better as the image refines and more data becomes available. By looking up the solution in its memory, the AI denoiser bypasses most of the costly calculations needed to reconstruct the image and as a result works pretty much in real-time.

The OptiX 5.0 SDK contains a sample program of a simple path tracer with the denoiser running on top (as a post-process). The results are nothing short of stunning: noise disappears completely, even difficult indirectly lit surfaces like refractive (glass) objects and shadowy areas clear up remarkably fast, and the image progressively gets closer to the ground truth.

The OptiX denoiser works great for glass and dark, indirectly lit areas

The denoiser is based on the Nvidia research paper "Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder". The relentless Karoly Zsolnai from Two-minute papers made an excellent video about this paper:



While in general the denoiser does a fantastic job, it's not yet optimised to deal with areas that converge fast, and in some instances overblurs and fails to preserve texture detail as shown in the screen grab below. The blurring of texture detail improves over time with more iterations, but perhaps this initial overblurring can be solved with more training samples for the denoiser:

Overblurring of textures
The denoiser is provided free for commercial use (royalty-free), but requires an Nvidia GPU. It works with both CPU and GPU rendering engines and is already implemented in Iray (Nvidia's own GPU renderer), V-Ray (by Chaos Group), Redshift Render and Clarisse (a CPU based renderer for VFX by Isotropix).

Some videos of the denoiser in action in Optix, V-Ray, Redshift and Clarisse:

Optix 5.0: youtu.be/l-5NVNgT70U



Iray: youtu.be/yPJaWvxnYrg

This video shows the denoiser in action in Iray and provides a high level explanation of the deep learning algorithm behind the OptiX/Iray denoiser:



V-Ray 4.0: youtu.be/nvA4GQAPiTc




Redshift: youtu.be/ofcCQdIZAd8 (and a post from Redshift's Panos explaining the implementation in Redshift)


ClarisseFX: youtu.be/elWx5d7c_DI



Other renderers like Cycles and Corona already have their own built-in denoisers, but will probably benefit from the OptiX denoiser as well (especially Corona which was acquired by Chaos Group in September 2017).

The OptiX team has indicated that they are researching an optimised version of this filter for use in interactive to real-time photorealistic rendering, which might find its way into game engines. Real-time noise-free photorealistic rendering is tantalisingly close.

Sunday, July 9, 2017

Towards real-time path tracing: An Efficient Denoising Algorithm for Global Illumination

July is a great month for rendering enthusiasts: there's of course Siggraph, but the most exciting conference is High Performance Graphics, which focuses on (real-time) ray tracing. One of the more interesting sounding papers is titled: "Towards real-time path tracing: An Efficient Denoising Algorithm for Global Illumination" by Mara, McGuire, Bitterli and Jarosz, which was released a couple of days ago. The paper, video and source code can be found at


Abstract 
We propose a hybrid ray-tracing/rasterization strategy for realtime rendering enabled by a fast new denoising method. We factor global illumination into direct light at rasterized primary surfaces and two indirect lighting terms, each estimated with one pathtraced sample per pixel. Our factorization enables efficient (biased) reconstruction by denoising light without blurring materials. We demonstrate denoising in under 10 ms per 1280×720 frame, compare results against the leading offline denoising methods, and include a supplement with source code, video, and data.

While the premise of the paper sounds incredibly exciting, the results are disappointing. The denoising filter does a great job filtering out almost all the noise (apart from some noise which is still visible in reflections), but at the same time it kills pretty much all the realism that path tracing is famous for, producing flat and lifeless images. Even the first Crysis from 10 years ago (the first game with SSAO) looks distinctly better. I don't think applying such aggressive filtering algorithms to a path tracer will convince game developers to make the switch to path traced rendering anytime soon. A comparison with ground truth reference images (rendered to 5000 samples or more) is also lacking for some reason.

At the same conference, a very similar paper will be presented titled "Spatiotemporal Variance-Guided Filtering: Real-Time Reconstruction for Path-Traced Global Illumination". 

Abstract 
We introduce a reconstruction algorithm that generates a temporally stable sequence of images from one path-per-pixel global illumination. To handle such noisy input, we use temporal accumulation to increase the effective sample count and spatiotemporal luminance variance estimates to drive a hierarchical, image-space wavelet filter. This hierarchy allows us to distinguish between noise and detail at multiple scales using luminance variance.  
Physically-based light transport is a longstanding goal for real-time computer graphics. While modern games use limited forms of ray tracing, physically-based Monte Carlo global illumination does not meet their 30 Hz minimal performance requirement. Looking ahead to fully dynamic, real-time path tracing, we expect this to only be feasible using a small number of paths per pixel. As such, image reconstruction using low sample counts is key to bringing path tracing to real-time. When compared to prior interactive reconstruction filters, our work gives approximately 10x more temporally stable results, matches reference images 5-47% better (according to SSIM), and runs in just 10 ms (+/- 15%) on modern graphics hardware at 1920x1080 resolution.
It's going to be interesting to see if the method in this paper produces more convincing results than the other paper. Either way, HPG has a bunch more interesting papers which are worth keeping an eye on.

UPDATE (16 July): Christoph Schied from Nvidia and KIT, emailed me a link to the paper's preprint and video at http://cg.ivd.kit.edu/svgf.php Thanks Christoph!

Video screengrab:


I'm not convinced by the quality of filtered path traced rendering at 1 sample per pixel, but perhaps the improvements in spatiotemporal stability of this noise filter can be quite helpful for filtering animated sequences at higher sample rates.

UPDATE (23 July): There is another denoising paper out from Nvidia: "Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder" which uses machine learning to reconstruct the image.


Abstract 
We describe a machine learning technique for reconstructing image sequences rendered using Monte Carlo methods. Our primary focus is on reconstruction of global illumination with extremely low sampling budgets at interactive rates. Motivated by recent advances in image restoration with deep convolutional networks, we propose a variant of these networks better suited to the class of noise present in Monte Carlo rendering. We allow for much larger pixel neighborhoods to be taken into account, while also improving execution speed by an order of magnitude. Our primary contribution is the addition of recurrent connections to the network in order to drastically improve temporal stability for sequences of sparsely sampled input images. Our method also has the desirable property of automatically modeling relationships based on auxiliary per-pixel input channels, such as depth and normals. We show significantly higher quality results compared to existing methods that run at comparable speeds, and furthermore argue a clear path for making our method run at realtime rates in the near future.

Sunday, May 21, 2017

Practical light field rendering tutorial with Cycles

This week Google announced "Seurat", a novel surface lightfield rendering technology which would enable "real-time cinema-quality, photorealistic graphics" on mobile VR devices, developed in collaboration with ILMxLab:


The technology captures all light rays in a scene by pre-rendering it from many different viewpoints. During runtime, entirely new viewpoints are created by interpolating those viewpoints on-the-fly, resulting in photoreal reflections and lighting in real-time (http://www.roadtovr.com/googles-seurat-surface-light-field-tech-graphical-breakthrough-mobile-vr/).

At almost the same time, Disney released a paper called "Real-time rendering with compressed animated light fields", demonstrating the feasibility of rendering a Pixar-quality 3D movie in real-time, where the viewer can actually be part of the scene and walk in between scene elements or characters (along a predetermined camera path):


Light field rendering in itself is not a new technique and has actually been around for more than 20 years, but it has only recently become practical. The first paper was released at Siggraph 1996 ("Light field rendering" by Mark Levoy and Pat Hanrahan) and the method has since been incrementally improved by others. Stanford University compiled an entire archive of light fields to accompany the 1996 Siggraph paper, which can be found at http://graphics.stanford.edu/software/lightpack/lifs.html. A more up-to-date archive of photography-based light fields can be found at http://lightfield.stanford.edu/lfs.html

One of the first movies that showed a practical use for light fields is The Matrix from 1999, where an array of cameras firing at the same time (or in rapid succession) made it possible to pan around an actor to create a super slow motion effect ("bullet time"):

Bullet time in The Matrix (1999)

Rendering the light field

Instead of attempting to explain the theory behind light fields (for which there are plenty of excellent online sources), the main focus of this post is to show how to quickly get started with rendering a synthetic light field using Blender Cycles and some open-source plug-ins. If you're interested in a crash course on light fields, check out Joan Charmant's video tutorial below, which explains the basics of implementing a light field renderer:


The following video demonstrates light fields rendered with Cycles:



Rendering a light field is actually surprisingly easy with Blender's Cycles and doesn't require much technical expertise (besides knowing how to build the plugins). For this tutorial, we'll use a couple of open source plug-ins:

1) The first one is the light field camera grid add-on for Blender made by Katrin Honauer and Ole Johanssen from Heidelberg University in Germany: 


This plug-in sets up a camera grid in Blender and renders the scene from each camera using the Cycles path tracing engine. Good results can be obtained with a grid of 17 by 17 cameras with a distance of 10 cm between neighbouring cameras. For high quality, a 33-by-33 camera grid with an inter-camera distance of 5 cm is recommended.

3-by-3 camera grid with their overlapping frustums

2) The second tool is the light field encoder and WebGL based light field viewer, created by Michal Polko, found at https://github.com/mpk/lightfield (build instructions are included in the readme file).

This plugin takes in all the images generated by the first plug-in and compresses them by keeping some keyframes and encoding only the deltas for the remaining intermediary frames. The viewer is WebGL-based and makes use of virtual texturing (similar to Carmack's mega-textures) for fast, on-the-fly reconstruction of new viewpoints from pre-rendered viewpoints (via hardware-accelerated bilinear interpolation on the GPU).
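
To make the reconstruction step a bit more concrete, here is a minimal C sketch of the underlying idea, ignoring the compression, virtual texturing and refocusing parts. The sample_view() helper is hypothetical and simply stands in for a lookup into the pre-rendered camera images; none of this is taken from the viewer's actual source.

#include <math.h>

typedef struct { float r, g, b; } Color;

/* hypothetical stand-in: in the real viewer this would fetch pixel (px, py) of the
   image rendered by grid camera (cam_x, cam_y) from the (virtual) texture atlas */
Color sample_view(int cam_x, int cam_y, int px, int py)
{
 (void)cam_x; (void)cam_y; (void)px; (void)py;
 Color c = { 0.5f, 0.5f, 0.5f };   /* placeholder colour */
 return c;
}

/* reconstruct one pixel of a novel viewpoint at fractional grid position (view_x, view_y),
   e.g. (7.3, 11.6) inside a 17x17 grid, by bilinearly blending the four nearest views */
Color reconstruct_pixel(float view_x, float view_y, int px, int py)
{
 int x0 = (int)floorf(view_x);
 int y0 = (int)floorf(view_y);
 float fx = view_x - (float)x0;   /* horizontal blend weight */
 float fy = view_y - (float)y0;   /* vertical blend weight */

 Color c00 = sample_view(x0,     y0,     px, py);
 Color c10 = sample_view(x0 + 1, y0,     px, py);
 Color c01 = sample_view(x0,     y0 + 1, px, py);
 Color c11 = sample_view(x0 + 1, y0 + 1, px, py);

 /* bilinear weights: cameras closer to the requested viewpoint contribute more */
 Color out;
 out.r = (1-fx)*(1-fy)*c00.r + fx*(1-fy)*c10.r + (1-fx)*fy*c01.r + fx*fy*c11.r;
 out.g = (1-fx)*(1-fy)*c00.g + fx*(1-fy)*c10.g + (1-fx)*fy*c01.g + fx*fy*c11.g;
 out.b = (1-fx)*(1-fy)*c00.b + fx*(1-fy)*c10.b + (1-fx)*fy*c01.b + fx*fy*c11.b;
 return out;
}

On the GPU, the same blend falls out of hardware bilinear texture filtering, which is why the viewer can reconstruct new viewpoints essentially for free.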


Results and Live Demo

A live online demo of the light field with the dragon can be seen here: 


You can change the viewpoint (within the limits of the original camera grid) and refocus the image in real-time by clicking on the image.  




I rendered the Stanford dragon using a 17 by 17 camera grid and a distance of 5 cm between adjacent cameras. The light field was created by rendering the scene from 289 (17x17) different camera viewpoints, which took about 6 minutes in total (about 1 to 2 seconds render time per 512x512 image on a good GPU). The 289 renders are then highly compressed (for this scene, the 107 MB large batch of 289 images was compressed down to only 3 MB!).

A depth map is also created at the same time and enables on-the-fly refocusing of the image by interpolating information from several images.

A later tutorial will add a bit more freedom to the camera, allowing for rotation and zooming.

Wednesday, January 11, 2017

OpenCL path tracing tutorial 3: OpenGL viewport, interactive camera and defocus blur

Just a link to the source code on Github for now; I'll update this post with a more detailed description when I find a bit more time:



Part 1: Setting up an OpenGL window

https://github.com/straaljager/OpenCL-path-tracing-tutorial-3-Part-1




Part 2: Adding an interactive camera, depth of field and progressive rendering

https://github.com/straaljager/OpenCL-path-tracing-tutorial-3-Part-2



Thanks to Erich Loftis and Brandon Miles for useful tips on improving the generation of random numbers in OpenCL to avoid the distracting artefacts (showing up as a sawtooth pattern) when using defocus blur (still not perfect but much better than before).

The next tutorial will cover rendering of triangles and triangle meshes.

Monday, November 28, 2016

Wanted: GPU rendering developers

I'm working for an international company with very large (<Trump voice>"YUUUUUGE"<\Trump voice>) industry partners.

We are currently looking for excellent developers with experience in GPU rendering (path tracing) for a new project.

Our ideal candidates have either a:
  • Bachelor in Computer Science, Computer/Software Engineering or Physics with a minimum of 2 years of work experience in a relevant field, or
  • Master in Computer Science, Computer/Software Engineering or Physics, or
  • PhD in a relevant field
and a strong interest in physically based rendering and ray tracing.


Self-taught programmers are encouraged to apply if they meet the following requirements:
  • you breathe rendering and have Monte Carlo simulations running through your blood
  • you have a copy of PBRT (www.pbrt.org, version 3 was released just last week) on your bedside table
  • provable experience working with open source rendering frameworks such as PBRT, LuxRender, Cycles, AMD RadeonRays or with a commercial renderer will earn you extra brownie points
  • 5+ years of experience with C++
  • experience with CUDA or OpenCL
  • experience with version control systems and working on large projects
  • proven rendering track record (publications, Github projects, blog)

Other requirements:
  • insatiable hunger to innovate
  • a "can do" attitude
  • strong work ethic and focus on results
  • continuous self-learner
  • work well in a team
  • work independently and able to take direction
  • ability to communicate effectively
  • comfortable speaking English
  • own initiatives and original ideas are highly encouraged
  • willing to relocate to New Zealand

What we offer:
  • unique location in one of the most beautiful and greenest countries in the world
  • be part of a small, high-performance team 
  • competitive salary
  • jandals, marmite and hokey pokey ice cream

For more information, contact me at sam.lapere@live.be

If you are interested, send your CV and cover letter to sam.lapere@live.be. Applications will close on 16 December or when we find the right people. (update: spots are filling up quickly, so we moved the closing date forward by five days)

Monday, November 14, 2016

OpenCL path tracing tutorial 2: path tracing spheres

This tutorial consists of two parts: the first part will describe how to ray trace one sphere using OpenCL, while the second part covers path tracing of a scene made of spheres. The tutorial will be light on ray tracing/path tracing theory (there are plenty of excellent resources available online such as Scratch-a-Pixel) and will focus instead on the practical implementation of rendering algorithms in OpenCL. The end result will be a rendered image featuring realistic light effects such as indirect lighting, diffuse colour bleeding and soft shadows, all achieved with just a few lines of code:



Part 1: Ray tracing a sphere

Computing a test image on the OpenCL device

The host (CPU) sets up the OpenCL environment and launches the OpenCL kernel which will be executed on the OpenCL device (GPU or CPU) in parallel. Each work item (or thread) on the device will calculate one pixel of the image. There will thus be as many work items in the global pool as there are pixels in the image. Each work item has a unique ID which distinguishes it from all other work items in the global pool of threads and which is obtained with get_global_id(0).

The X- and Y-coordinates of each pixel can be computed by using that pixel's unique work item ID:
  • x-coordinate: divide the work item ID by the image width and take the remainder
  • y-coordinate: divide the work item ID by the image width (integer division)
Remapping the x and y coordinates from the [0, width] range for x and the [0, height] range for y to the [0, 1] range for both, and plugging those values into the red and green channels respectively, yields the following gradient image (the image is saved in PPM format, which can be opened with e.g. IrfanView or Gimp):


The OpenCL code to generate this image:


__kernel void render_kernel(__global float3* output, int width, int height)
{
 const int work_item_id = get_global_id(0); /* the unique global id of the work item for the current pixel */
 int x = work_item_id % width; /* x-coordinate of the pixel */
 int y = work_item_id / width; /* y-coordinate of the pixel */
 float fx = (float)x / (float)width; /* convert int to float in range [0-1] */
 float fy = (float)y / (float)height; /* convert int to float in range [0-1] */
 output[work_item_id] = (float3)(fx, fy, 0); /* simple interpolated colour gradient based on pixel coordinates */
}

Now let's use the OpenCL device for some ray tracing.


Ray tracing a sphere with OpenCL

We first define a Ray and a Sphere struct in the OpenCL code:

A Ray has 
  • an origin in 3D space (3 floats for x, y, z coordinates) 
  • a direction in 3D space (3 floats for the x, y, z coordinates of the 3D vector)
A Sphere has 
  • a radius
  • a position in 3D space (3 floats for x, y, z coordinates), 
  • an object colour (3 floats for the Red, Green and Blue channel) 
  • an emission colour (again 3 floats for each of the RGB channels)

struct Ray{
 float3 origin;
 float3 dir;
};

struct Sphere{
 float radius;
 float3 pos;
 float3 emi;
 float3 color;
};

Camera ray generation

Rays are shot from the camera (which is in a fixed position for this tutorial) through an imaginary grid of pixels into the scene, where they intersect with 3D objects (in this case spheres). For each pixel in the image, we will generate one camera ray (also called a primary ray, view ray or eye ray) and follow or trace it into the scene. For camera rays, the ray origin is the camera position and the ray direction is the vector connecting the camera and the pixel on the screen.

Source: Wikipedia


The OpenCL code for generating a camera ray:


struct Ray createCamRay(const int x_coord, const int y_coord, const int width, const int height){

 float fx = (float)x_coord / (float)width;  /* convert int in range [0 - width] to float in range [0-1] */
 float fy = (float)y_coord / (float)height; /* convert int in range [0 - height] to float in range [0-1] */

 /* calculate aspect ratio */
 float aspect_ratio = (float)(width) / (float)(height);
 float fx2 = (fx - 0.5f) * aspect_ratio;
 float fy2 = fy - 0.5f;

 /* determine position of pixel on screen */
 float3 pixel_pos = (float3)(fx2, -fy2, 0.0f);

 /* create camera ray*/
 struct Ray ray;
 ray.origin = (float3)(0.0f, 0.0f, 40.0f); /* fixed camera position */
 ray.dir = normalize(pixel_pos - ray.origin);

 return ray;
}




Ray-sphere intersection

To find the intersection of a ray with a sphere, we need the parametric equation of a line, which expresses any point along the ray as the ray origin plus the ray direction scaled by the distance parameter "t":

intersection point = ray origin + ray direction * t

The equation of a sphere follows from the Pythagorean theorem in 3D (all points on the surface of a sphere are located at a distance of radius r from its center): 

(sphere surface point - sphere center)² = radius²

In the case of a sphere centered at the origin (with coordinates [0,0,0]), the vector [sphere surface point - sphere center] reduces to the coordinates of a point on the sphere's surface (the intersection point). Combining both equations then gives

(ray origin + ray direction * t)² = radius²

Expanding this equation into a quadratic equation of the form ax² + bx + c = 0 where
  • a = (ray direction) . (ray direction)  
  • b = 2 * (ray direction) . (ray origin to sphere center) 
  • c = (ray origin to sphere center) . (ray origin to sphere center) - radius² 
yields solutions for t (the distance to the point where the ray intersects the sphere) given by the quadratic formula t = (−b ± √(b² − 4ac)) / 2a (where b² − 4ac is called the discriminant).

Depending on whether the discriminant is negative, zero or positive, there can be zero (ray misses sphere), one (ray just touches the sphere at one point) or two solutions (ray fully intersects the sphere at two points) respectively. The distance t can be positive (intersection in front of ray origin) or negative (intersection behind ray origin). The details of the mathematical derivation are explained in this Scratch-a-Pixel article.

The ray-sphere intersection algorithm is optimised by omitting the "a" coefficient in the quadratic formula, because its value is the dot product of the normalised ray direction with itself, which equals 1. Taking the square root of the discriminant (an expensive operation) is only performed when the discriminant is non-negative.
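
Concretely, with a = 1 the code below also folds the factor 2 into b (the "half-b" convention), so the variables in the snippet correspond to

b = (C − O) · d,    c = (C − O) · (C − O) − radius²,    t = b ± √(b² − c)

where O is the ray origin, C the sphere centre and d the normalised ray direction: rayToCenter in the code is C − O, and disc = b² − c plays the role of the (scaled) discriminant.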


bool intersect_sphere(const struct Sphere* sphere, const struct Ray* ray, float* t)
{
 float3 rayToCenter = sphere->pos - ray->origin;

 /* calculate coefficients b and c of the (simplified) quadratic equation */

 /* float a = dot(ray->dir, ray->dir); // ray direction is normalised, dot product simplifies to 1 */
 float b = dot(rayToCenter, ray->dir);
 float c = dot(rayToCenter, rayToCenter) - sphere->radius*sphere->radius;
 float disc = b * b - c; /* discriminant of quadratic formula */

 /* solve for t (distance to hitpoint along ray) */

 if (disc < 0.0f) return false;   /* ray misses the sphere */
 *t = b - sqrt(disc);             /* nearest intersection */

 if (*t < 0.0f){                  /* nearest intersection lies behind the ray origin */
  *t = b + sqrt(disc);            /* try the far intersection instead */
  if (*t < 0.0f) return false;    /* sphere is entirely behind the ray */
 }

 return true;
}


Scene initialisation

For simplicity, in this first part of the tutorial the scene will be initialised on the device in the kernel function (in the second part the scene will be initialised on the host and passed to OpenCL, which is more flexible and memory efficient, but also requires more care with regard to memory alignment and the use of memory address spaces). Every work item will thus have a local copy of the scene (in this case one sphere).

__kernel void render_kernel(__global float3* output, int width, int height)
{
 const int work_item_id = get_global_id(0); /* the unique global id of the work item for the current pixel */
 int x_coord = work_item_id % width; /* x-coordinate of the pixel */
 int y_coord = work_item_id / width; /* y-coordinate of the pixel */
 float fy = (float)y_coord / (float)height; /* normalised y-coordinate, used for the background gradient below */

 /* create a camera ray */
 struct Ray camray = createCamRay(x_coord, y_coord, width, height);

 /* create and initialise a sphere */
 struct Sphere sphere1;
 sphere1.radius = 0.4f;
 sphere1.pos = (float3)(0.0f, 0.0f, 3.0f);
 sphere1.color = (float3)(0.9f, 0.3f, 0.0f);

 /* intersect ray with sphere */
 float t = 1e20;
 intersect_sphere(&sphere1, &camray, &t);

 /* if ray misses sphere, return background colour 
 background colour is a blue-ish gradient dependent on image height */
 if (t > 1e19){ 
  output[work_item_id] = (float3)(fy * 0.1f, fy * 0.3f, 0.3f);
  return;
 }

 /* if ray hits the sphere, it will return the sphere colour*/
 output[work_item_id] = sphere1.color;
}



Running the ray tracer 

Now we've got everything we need to start ray tracing! Let's begin with a plain colour sphere. When the ray misses the sphere, the background colour is returned:


A more interesting sphere with cosine-weighted colours, giving the impression of front lighting.


To achieve this effect we need to calculate the angle between the ray hitting the sphere surface and the normal at that point. The sphere normal at a specific intersection point on the surface is just the normalised vector (with unit length) going from the sphere center to that intersection point.

 float3 hitpoint = camray.origin + camray.dir * t;
 float3 normal = normalize(hitpoint - sphere1.pos);
 float cosine_factor = dot(normal, camray.dir) * -1.0f;

 output[work_item_id] = sphere1.color * cosine_factor;


Adding a stripe pattern by multiplying the colour with the sine of the hitpoint's height:


Screen-door effect using sine functions for both x and y-directions
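
The exact shader code for these two variations isn't listed in the post. Replacing the final output line of the kernel with one of the following gives a rough idea (the frequency 30.0f is an arbitrary choice, not necessarily the value used for the screenshots):

 /* stripe pattern: modulate the cosine-shaded colour with the sine of the hitpoint height */
 output[work_item_id] = sphere1.color * cosine_factor * (0.5f + 0.5f * sin(hitpoint.y * 30.0f));

 /* screen-door effect: multiply sine patterns in both the x and y directions */
 output[work_item_id] = sphere1.color * cosine_factor *
                        (0.5f + 0.5f * sin(hitpoint.x * 30.0f)) *
                        (0.5f + 0.5f * sin(hitpoint.y * 30.0f));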


Showing the surface normals (calculated in the code snippet above) as colours:



Source code

https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-1-Raytracing-a-sphere


Download demo (works on AMD, Nvidia and Intel)

The executable demo will render the above images.

https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-1-Raytracing-a-sphere/releases/tag/1.0



Part 2: Path tracing spheres

Very quick overview of ray tracing and path tracing

The following section covers the background of the ray tracing process in a very simplified way, but should be sufficient to understand the code in this tutorial. Scratch-a-Pixel provides a much more detailed explanation of ray tracing.  

Ray tracing is a general term that encompasses ray casting, Whitted ray tracing, distribution ray tracing and path tracing. So far, we have only traced rays from the camera (so called "camera rays", "eye rays" or "primary rays") into the scene, a process called ray casting, resulting in plainly coloured images with no lighting. In order to achieve effects like shadows and reflections, new rays must be generated at the points where the camera rays intersect with the scene. These secondary rays can be shadow rays, reflection rays, transmission rays (for refractions), ambient occlusion rays or diffuse interreflection rays (for indirect lighting/global illumination). For example, shadow rays used for direct lighting are generated to point directly towards a light source while reflection rays are pointed in (or near) the direction of the reflection vector. For now we will skip direct lighting to generate shadows and go straight to path tracing, which is strangely enough easier to code, creates more realistic and prettier pictures and is just more fun.

In (plain) path tracing, rays are shot from the camera and bounce off the surface of scene objects in a random direction (like a high-energy bouncing ball), forming a chain of random rays connected together into a path. If the path hits a light emitting object such as a light source, it will return a colour which depends on the surface colours of all the objects encountered so far along the path, the colour of the light emitters, the angles at which the path hit a surface and the angles at which the path bounced off a surface. These ideas form the essence of the "rendering equation", proposed in a paper with the same name by Jim Kajiya in 1986.
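
In its compact hemisphere form, the rendering equation states that the light leaving a surface point x in direction ω_o equals the emitted light plus all incoming light reflected towards ω_o, weighted by the material's BRDF and the cosine of the incident angle:

L_o(x, ω_o) = L_e(x, ω_o) + ∫_Ω f_r(x, ω_i, ω_o) L_i(x, ω_i) (n · ω_i) dω_i

A path tracer estimates this integral with Monte Carlo sampling: every random bounce samples one incoming direction ω_i, which is exactly what the trace() function further below does with its mask and accumulated colour.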

Since the directions of the rays in a path are generated randomly, some paths will hit a light source while others won't, resulting in noise ("variance" in statistics due to random sampling). The noise can be reduced by shooting many random paths per pixel (= taking many samples) and averaging the results.


Implementation of (plain) path tracing in OpenCL       

The code for the path tracer is based on smallpt from Kevin Beason and is largely the same as the ray tracer code from part 1 of this tutorial, with some important differences on the host side:

- the scene is initialised on the host (CPU) side, which requires a host version of the Sphere struct. Correct memory alignment in the host struct is very important to avoid shifting of values and wrongly initialised variables in the OpenCL struct, especially when  using OpenCL's built-in data types such as float3 and float4. If necessary, the struct should be padded with dummy variables to ensure memory alignment (the total size of the struct must be a multiple of the size of float4).

struct Sphere
{
 cl_float radius;
 cl_float dummy1;   
 cl_float dummy2;
 cl_float dummy3;
 cl_float3 position;
 cl_float3 color;
 cl_float3 emission;
};

- the scene (an array of spheres) is copied from the host to the OpenCL device into global memory (using CL_MEM_READ_WRITE) or constant memory (using CL_MEM_READ_ONLY):

// initialise scene
 const int sphere_count = 9;
 Sphere cpu_spheres[sphere_count];
 initScene(cpu_spheres);

 // Create buffers on the OpenCL device for the image and the scene
 cl_output = Buffer(context, CL_MEM_WRITE_ONLY, image_width * image_height * sizeof(cl_float3));
 cl_spheres = Buffer(context, CL_MEM_READ_ONLY, sphere_count * sizeof(Sphere));
 queue.enqueueWriteBuffer(cl_spheres, CL_TRUE, 0, sphere_count * sizeof(Sphere), cpu_spheres);

- explicit memory management: once the scene is on the device, its pointer can be passed on to other device functions preceded by the keyword "__global" or "__constant".

- the host code automatically determines the local size of the kernel work group (the number of work items or "threads" per work group) by calling the OpenCL function kernel.getWorkGroupInfo(device)


The actual path tracing function

- iterative path tracing function: since OpenCL does not support recursion, the trace() function traces paths iteratively (instead of recursively) using a loop with a fixed number of bounces (iterations), representing path depth.

- each path starts off with an "accumulated colour" initialised to black and a "mask colour" initialised to pure white. The mask colour "collects" surface colours along its path by multiplication. The accumulated colour accumulates light from emitters along its path by adding emitted colours multiplied by the mask colour.

- generating random ray directions: new rays start at the hitpoint and get shot in a random direction by sampling a random point on the hemisphere above the surface hitpoint. For each new ray, a local orthogonal uvw-coordinate system and two random numbers are generated: one to pick a random value on the horizon for the azimuth, the other for the altitude (with the zenith being the highest point)

- diffuse materials: the code for this tutorial only supports diffuse materials, which reflect incident light almost uniformly in all directions (in the hemisphere above the hitpoint)

- cosine-weighted importance sampling: because diffuse light reflection is not truly uniform, the light contribution from rays that are pointing away from the surface plane and closer to the surface normal is greater. Cosine-weighted importance sampling favours rays that are pointing away from the surface plane by multiplying their colour with the cosine of the angle between the surface normal and the ray direction.

- while ray tracing can get away with tracing only one ray per pixel to render a good image (more are needed for anti-aliasing and blurry effects like depth-of-field and glossy reflections), the inherently noisy nature of path tracing requires tracing of many paths per pixel (samples per pixel) and averaging the results to reduce noise to an acceptable level.
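
The per-pixel sample loop and the random number generator are not shown in the code excerpt below. A minimal sketch of how the kernel could seed the RNG and average many paths per pixel follows; the sample count, the seed offsets and the kernel argument order are illustrative choices, and the multiply-with-carry hash is just one common RNG for GPU path tracers (the helper actually used by the tutorial's trace() function may differ). Note that trace() below declares the seed pointers as const int*; since the hash must advance the seeds, a complete program would drop the const or cast accordingly.

/* random number generator: returns a float in [0, 1) and updates the two seeds */
float get_random(int* seed0, int* seed1)
{
 unsigned int s0 = (unsigned int)(*seed0);
 unsigned int s1 = (unsigned int)(*seed1);

 /* hash the seeds (multiply-with-carry) */
 s0 = 36969 * (s0 & 65535) + (s0 >> 16);
 s1 = 18000 * (s1 & 65535) + (s1 >> 16);
 *seed0 = (int)s0;
 *seed1 = (int)s1;

 /* combine the hashes and reinterpret the bits as a float in [2, 4) */
 unsigned int ires = (s0 << 16) + s1;
 union { float f; unsigned int ui; } res;
 res.ui = (ires & 0x007fffff) | 0x40000000;

 return (res.f - 2.0f) / 2.0f;
}

#define SAMPLES 64   /* illustrative sample count */

__kernel void render_kernel(__constant Sphere* spheres, const int width, const int height,
                            const int sphere_count, __global float3* output)
{
 /* pixel coordinates, as in part 1 */
 const int work_item_id = get_global_id(0);
 int x_coord = work_item_id % width;
 int y_coord = work_item_id / width;

 /* seed the RNG differently for every pixel (the offsets are arbitrary) */
 int seed0 = x_coord + 457;
 int seed1 = y_coord + 12345;

 /* primary ray from the camera function of part 1 (Ray/Sphere typedefs as used by trace) */
 Ray camray = createCamRay(x_coord, y_coord, width, height);

 /* shoot many random paths per pixel and average the results to reduce noise */
 float3 finalcolor = (float3)(0.0f, 0.0f, 0.0f);
 float invSamples = 1.0f / SAMPLES;

 for (int i = 0; i < SAMPLES; i++)
  finalcolor += trace(spheres, &camray, sphere_count, &seed0, &seed1) * invSamples;

 output[work_item_id] = finalcolor;
}

The trace() function called above is the iterative path tracing function listed below.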

float3 trace(__constant Sphere* spheres, const Ray* camray, const int sphere_count, const int* seed0, const int* seed1){

 Ray ray = *camray;

 float3 accum_color = (float3)(0.0f, 0.0f, 0.0f);
 float3 mask = (float3)(1.0f, 1.0f, 1.0f);

 for (int bounces = 0; bounces < 8; bounces++){

  float t;   /* distance to intersection */
  int hitsphere_id = 0; /* index of intersected sphere */

  /* if ray misses scene, return background colour */
  if (!intersect_scene(spheres, &ray, &t, &hitsphere_id, sphere_count))
   return accum_color += mask * (float3)(0.15f, 0.15f, 0.25f);

  /* else, we've got a hit! Fetch the closest hit sphere */
  Sphere hitsphere = spheres[hitsphere_id]; /* version with local copy of sphere */

  /* compute the hitpoint using the ray equation */
  float3 hitpoint = ray.origin + ray.dir * t;
  
  /* compute the surface normal and flip it if necessary to face the incoming ray */
  float3 normal = normalize(hitpoint - hitsphere.pos); 
  float3 normal_facing = dot(normal, ray.dir) < 0.0f ? normal : normal * (-1.0f);

  /* compute two random numbers to pick a random point on the hemisphere above the hitpoint*/
  float rand1 = 2.0f * PI * get_random(seed0, seed1);
  float rand2 = get_random(seed0, seed1);
  float rand2s = sqrt(rand2);

  /* create a local orthogonal coordinate frame centered at the hitpoint */
  float3 w = normal_facing;
  float3 axis = fabs(w.x) > 0.1f ? (float3)(0.0f, 1.0f, 0.0f) : (float3)(1.0f, 0.0f, 0.0f);
  float3 u = normalize(cross(axis, w));
  float3 v = cross(w, u);

  /* use the coordinte frame and random numbers to compute the next ray direction */
  float3 newdir = normalize(u * cos(rand1)*rand2s + v*sin(rand1)*rand2s + w*sqrt(1.0f - rand2));

  /* add a very small offset to the hitpoint to prevent self intersection */
  ray.origin = hitpoint + normal_facing * EPSILON;
  ray.dir = newdir;

  /* add the colour and light contributions to the accumulated colour */
  accum_color += mask * hitsphere.emission; 

  /* the mask colour picks up surface colours at each bounce */
  mask *= hitsphere.color; 
  
  /* perform cosine-weighted importance sampling for diffuse surfaces*/
  mask *= dot(newdir, normal_facing); 
 }

 return accum_color;
}



A screenshot made with the code above (also see the screenshot at the top of this post). Notice the colour bleeding (bounced colour reflected from the floor onto the spheres), soft shadows and lighting coming from the background.



Source code

https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-2-Path-tracing-spheres


Downloadable demo (for AMD, Nvidia and Intel platforms, Windows only)

https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-2-Path-tracing-spheres/releases/tag/1.0


Useful resources

- Scratch-a-pixel is an excellent free online resource to learn about the theory behind ray tracing and path tracing. Many code samples (in C++) are also provided. This article gives a great introduction to global illumination and path tracing.

- smallpt by Kevin Beason is a great little CPU path tracer in 100 lines of code. It formed the inspiration for the Cornell box scene and for many parts of the OpenCL code.


Up next

The next tutorial will cover the implementation of an interactive OpenGL viewport with a progressively refining image and an interactive camera with anti-aliasing and depth-of-field.