SV_PrimitiveID without the perf hit

On Nvidia and AMD GPUs, accessing SV_PrimitiveID in a pixel shader carries a fairly substantial performance hit.

For the scene shown, rasterizing the visibility buffer went from 3.35 ms to 5.45 ms just by accessing SV_PrimitiveID in the pixel shader. This scene had 30.5 million triangles rasterized per frame.


The thing is, you kind of need the primitive ID for visibility buffers...

It turns out that you can access SV_PrimitiveID without the performance hit on Nvidia if you do a little song and dance involving a so-called fast geometry shader.

You need NVAPI, Nvidia's driver extension library.

It provides a function, NvAPI_D3D11_CreateFastGeometryShader, that you can use instead of the standard D3D11 CreateGeometryShader.

If you feed it a GS that follows a restricted set of rules, it will produce a very fast GS without the problems usually associated with geometry shaders. Oddly enough, you can access SV_PrimitiveID in this GS and pass it down to the PS, and what do you know, no performance hit.

So yes, adding a GS that does almost nothing makes the shader complete in roughly 60% of the time (3.35 ms instead of 5.45 ms).
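As a quick sanity check on that figure, here is a small C++ sketch of the arithmetic using only the timings and triangle count quoted above. It assumes the fast GS path lands back at the 3.35 ms baseline, which is what "no performance hit" suggests. The function and constant names are made up for illustration.

```cpp
// Numbers quoted in the post for the test scene.
constexpr double kTrianglesPerFrame  = 30.5e6; // triangles rasterized per frame
constexpr double kMsBaseline         = 3.35;   // no SV_PrimitiveID in the PS
constexpr double kMsWithPrimitiveId  = 5.45;   // SV_PrimitiveID read in the PS

// Fraction of the slow time that the fast path takes.
constexpr double TimeRatio(double fastMs, double slowMs) {
    return fastMs / slowMs;
}

// Rasterization throughput in triangles per second for a given pass time.
constexpr double TrianglesPerSecond(double triangles, double ms) {
    return triangles / (ms * 1e-3);
}

// 3.35 / 5.45 is a bit over 0.61, i.e. "about 60% of the time".
static_assert(TimeRatio(kMsBaseline, kMsWithPrimitiveId) > 0.61, "ratio too low");
static_assert(TimeRatio(kMsBaseline, kMsWithPrimitiveId) < 0.62, "ratio too high");
```

In throughput terms that works out to roughly 9.1 billion triangles per second at 3.35 ms versus roughly 5.6 billion at 5.45 ms.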

 

What about AMD?

An alternative to emitting the primitive ID is to pass down the 3 vertex IDs and write those out, but this requires a larger visibility buffer (64 bits instead of 32 bits).

AMD has driver extensions for accessing any of the 3 vertices from the PS, so you can pass down SV_VertexID and then read all three and write them out. With D3D12 this functionality is standard.

On Nvidia I can toggle between the triangle ID and the 3 vertex ID variants and I do not see any performance difference (RTX 2060), so performance appears to be on par.
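To make the size difference concrete, here is a minimal CPU-side C++ sketch of one way three vertex IDs could be packed into a single 64-bit visibility buffer texel. The 21-bits-per-index layout and all of the names below are my own assumptions for illustration, not something a driver or API dictates; a real shader would do the equivalent shifts and masks in HLSL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: 21 bits per vertex index, so 63 of the 64 bits are used.
// This caps the vertex buffer at 2^21 (about 2 million) vertices per draw.
constexpr uint32_t kVertexBits = 21;
constexpr uint64_t kVertexMask = (1ull << kVertexBits) - 1;

struct TriangleVertexIds {
    uint32_t v0, v1, v2;
};

// Pack three vertex IDs into one 64-bit value (v0 in the low bits).
inline uint64_t PackVertexIds(TriangleVertexIds t) {
    assert(t.v0 <= kVertexMask && t.v1 <= kVertexMask && t.v2 <= kVertexMask);
    return  uint64_t(t.v0)
         | (uint64_t(t.v1) << kVertexBits)
         | (uint64_t(t.v2) << (2 * kVertexBits));
}

// Recover the three vertex IDs when resolving the visibility buffer.
inline TriangleVertexIds UnpackVertexIds(uint64_t packed) {
    return { uint32_t( packed                       & kVertexMask),
             uint32_t((packed >> kVertexBits)       & kVertexMask),
             uint32_t((packed >> (2 * kVertexBits)) & kVertexMask) };
}
```

A single primitive ID fits comfortably in 32 bits, whereas three indices of any useful range do not, which is where the 64-bit requirement comes from.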

Intel? 

Supposedly Intel has no performance issues accessing SV_PrimitiveID in the PS, but I do not have an Intel GPU to test on.


Why does SV_PrimitiveID cause such performance issues on Nvidia etc.?

It is unclear to me exactly what is going on, but one thing I've noticed is that vertex cache optimizations (tipsify/forsyth/random) do nothing when SV_PrimitiveID is accessed in the PS on Nvidia.

So perhaps it is somehow disabling the vertex cache.