"There is nothing so permanent in Washington as a temporary government program."
- Milton Friedman

Framework 4 (Last updated: October 25, 2019)
Framework 3 (Last updated: February 6, 2017)
Framework 2 (Last updated: October 8, 2006)
Framework (Last updated: October 8, 2006)
Libraries (Last updated: September 16, 2004)
Really old framework (Last updated: September 16, 2004)
Order Independent Translucency
Tuesday, December 11, 2007 | Permalink

Executable
Source code
OrderIndependentTranslucency.zip (2.1 MB)

Required:
Direct3D 10
This demo implements order independent translucency using a Stencil Routed A-Buffer, which is explained in this exemplarily short paper. The basic idea is that you hijack MSAA for the purpose of storing multiple incoming fragments in each pixel. Each stencil sample is initially cleared to a separate value and then the geometry is rasterized with multisampling disabled. The stencil test will then route the incoming fragments one by one to each of the available samples, as long as there are unwritten samples available. This allows up to 8 layers to be stored with current hardware. Once the buffer has been filled it is sorted in back-to-front order and blended. The paper suggests using bitonic sort; however, this demo uses odd-even mergesort instead because that requires fewer compare-and-swap operations to be performed (19 instead of 24).

This demo uses D3D10, but it could have benefited from D3D10.1 in at least two ways. First of all, multisampled depth buffers can't be used for texturing in D3D10, so this demo uses a separate render target for this purpose. Also, now the depth bits of the depth-stencil buffer is entirely unused, which is wasteful. Secondly, and probably more important, is that multisampled buffers can't be CopyResource'd in D3D10. Currently a significant chunk of the frame time is consumed just initializing the stencil buffer. A better way to handle this would be to initially set up a stencil clear-buffer just once, and then clear the active stencil buffer by copying that stencil clear-buffer into it. A copy is likely a good deal faster than 8 fullscreen passes with a sample write mask, which is required now.

The advantage of using this technique compared to depth peeling is that you don't need multiple passes. If you want up to 8 layers, you need 8 passes with depth peeling, whereas this technique only needs a single pass, plus a sort pass. The disadvantage is that once the buffer is full fragments will be discarded. If you limit yourself to 8 layers with depth peeling, you'll get the 8 front-most layers, whereas with this technique the layers are destroyed based on the order they arrived, meaning that the front-most layer could be the one that got discarded if you're unlucky, which of course is a lot more disturbing than dropping layers in the back, which may not be noticable at all.

This demo should run on Radeon HD 2000 series and up and GeForce 8000 series and up.