Something that seems to come up again and again is the topic of linear vs. non-linear depth. If you take a look at the standard DirectX projection matrix and do the math for the $z$ component, you’ll end up with something like this

$$z' = \frac{z_f}{z_f - z_n}\left(1 - \frac{z_n}{z}\right)$$

where $z$ is the depth value before projection, $z'$ is the depth value after projection and $z_n$, $z_f$ correspond to the near and far planes. So projection actually transforms $z$ into some variation of $1/z$. The reason for this is simple: GPUs rasterize primitives in screen space and interpolate attribute data linearly in screen space as well. Linear depth in view space, however, becomes non-linear after projection and thus cannot be correctly interpolated by simple linear interpolators. Conversely, it turns out that $1/z$ is linear in screen space. This is actually quite easy to see: Assume a plane in view space

$$Ax + By + Cz = D$$

Perspective projection transforms view space $x$ and $y$ coordinates to (up to the constant scale factors of the projection matrix)

$$x' = \frac{x}{z}, \qquad y' = \frac{y}{z}$$

Inserting these equations into the original plane equation yields

$$Ax'z + By'z + Cz = D$$

which gives us

$$\frac{1}{z} = \frac{A}{D}x' + \frac{B}{D}y' + \frac{C}{D}$$

clearly showing that $1/z$ is a linear function of screen space $x'$ and $y'$. This is illustrated quite nicely in this blog post by rendering `ddy(z')` as color to the screen. The same holds for other generic attributes like texture coordinates: The GPU cannot directly interpolate $u$ and $v$, but will interpolate $u/z$ and $v/z$ instead. The attribute value will then be reconstructed per pixel by multiplying by $z$.
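The linearity argument can be checked numerically. The following is a minimal sketch, not the rasterizer's actual implementation: the plane coefficients, the two sample points and the helper names (`plane_z`, `project`, `lerp`) are assumptions made up for illustration, and the projection is reduced to the bare perspective divide.

```python
# Hypothetical plane A*x + B*y + C*z = D in view space (values chosen for illustration).
A, B, C, D = 0.3, -0.2, 1.0, 10.0

def plane_z(x, y):
    """View-space depth of the plane at view-space position (x, y)."""
    return (D - A * x - B * y) / C

def project(x, y, z):
    """Perspective divide only; the projection matrix scale factors are omitted."""
    return x / z, y / z

def lerp(a, b, t):
    """Plain linear interpolation between a and b."""
    return a + (b - a) * t

# Two points on the plane, each carrying a texture coordinate u.
x0, y0, u0 = -2.0, 1.0, 0.0
x1, y1, u1 = 4.0, -3.0, 1.0
z0, z1 = plane_z(x0, y0), plane_z(x1, y1)
sx0, sy0 = project(x0, y0, z0)
sx1, sy1 = project(x1, y1, z1)

# Halfway between the two projected points in screen space.
t = 0.5
sx, sy = lerp(sx0, sx1, t), lerp(sy0, sy1, t)

# Ground truth: 1/z from the plane equation rewritten in screen-space coordinates.
true_inv_z = (A * sx + B * sy + C) / D

# Interpolating 1/z linearly in screen space reproduces the ground truth ...
lerped_inv_z = lerp(1.0 / z0, 1.0 / z1, t)
# ... while interpolating z itself does not.
naive_z = lerp(z0, z1, t)

# Attributes: interpolate u/z linearly, then multiply by the reconstructed z.
u_correct = lerp(u0 / z0, u1 / z1, t) / lerped_inv_z

print("true z:             ", 1.0 / true_inv_z)
print("z from lerped 1/z:  ", 1.0 / lerped_inv_z)
print("naively lerped z:   ", naive_z)
print("perspective-correct u:", u_correct)
```

The screen-space midpoint does not correspond to the view-space midpoint of the segment, which is why the naively interpolated depth is off while the interpolated $1/z$ is not.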
Now that we have established that the value that ends up in the depth buffer is not the depth $z$ but rather something related to $1/z$, one might ask what kind of effect this will have on depth precision. After all, $1/z$ is a highly non-linear function that will significantly warp the original depth values. Check out the graph below: I plotted the resulting $z'$ over the view space depth range for different near plane values $z_n$:

Notice how steep the function is on the first couple of meters. Almost the entire $[0, 1]$ output interval is spent on the first couple of meters.
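To get a feeling for the numbers without the plot, here is a small sketch that evaluates the DirectX-style mapping $z' = \frac{z_f}{z_f - z_n}(1 - \frac{z_n}{z})$ and reports how much of the $[0, 1]$ range is already consumed 10 meters behind the near plane. The far plane of 1000 and the three near plane values are assumptions picked for illustration, not the values used for the original graph.

```python
def project_depth(z, z_near, z_far):
    """Post-projection depth z' in [0, 1] for view-space depth z (DirectX convention)."""
    return z_far / (z_far - z_near) * (1.0 - z_near / z)

Z_FAR = 1000.0  # assumed far plane distance

# Fraction of the [0, 1] depth range used by the first 10 units behind the near plane.
fractions = {}
for z_near in (0.1, 1.0, 10.0):  # assumed near plane distances
    fractions[z_near] = project_depth(z_near + 10.0, z_near, Z_FAR)
    print(f"z_near = {z_near:5.1f}: first 10 units use "
          f"{100.0 * fractions[z_near]:.2f}% of the depth range")
```

The smaller the near plane distance, the larger the share of the output range that is spent right in front of the camera.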
In order to test this result empirically I wrote a small program that samples the view space depth range at regular intervals on the GPU, calculates the depth value after projection and writes it to a depth buffer of choice. The buffer is then read back to the CPU and view space depth is reconstructed for each sample. This allows us to calculate the error of original depth value vs. reconstructed depth value. Here are the results for the formats `DXGI_FORMAT_D16_UNORM` and `DXGI_FORMAT_D32_FLOAT` for a fixed near plane $z_n$ and far plane $z_f$:
Note how the error for `DXGI_FORMAT_D16_UNORM` quickly approaches ridiculous proportions; 16 bit integer depth in combination with a projective transform is definitely a no-go! Here’s another plot to illustrate the error of `DXGI_FORMAT_D32_FLOAT` in more detail:

Much better, though at the extremes we still get an error of over 100 meters. With some care though, this can be greatly reduced: The shape of the hyperbolic curve is largely determined by the near plane distance $z_n$. Even a slight increase of $z_n$ reduces the maximal error considerably.
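The experiment can be approximated on the CPU. The sketch below is not the original test program: it does the projection in double precision rather than on the GPU, emulates the two storage formats with hypothetical `quantize_unorm16` and `quantize_float32` helpers, and uses assumed plane distances (near 0.1 or 1.0, far 10000), so the absolute numbers will differ from the plots. It should still show the same trends.

```python
import struct

def project_depth(z, z_near, z_far):
    """View-space depth -> post-projection depth in [0, 1] (DirectX convention)."""
    return z_far / (z_far - z_near) * (1.0 - z_near / z)

def unproject_depth(zp, z_near, z_far):
    """Inverse of project_depth: post-projection depth -> view-space depth."""
    return z_near * z_far / (z_far - zp * (z_far - z_near))

def quantize_unorm16(v):
    """Emulate storage in a 16 bit unsigned normalized integer."""
    return round(v * 65535.0) / 65535.0

def quantize_float32(v):
    """Emulate storage in a 32 bit float by round-tripping through 4 bytes."""
    return struct.unpack("<f", struct.pack("<f", v))[0]

def max_error(quantize, z_near, z_far, samples=10000):
    """Largest |z - reconstructed z| over regularly spaced view-space depths."""
    worst = 0.0
    for i in range(samples):
        z = z_near + (z_far - z_near) * i / (samples - 1)
        stored = quantize(project_depth(z, z_near, z_far))
        worst = max(worst, abs(z - unproject_depth(stored, z_near, z_far)))
    return worst

Z_FAR = 10000.0  # assumed far plane distance
err_unorm16 = max_error(quantize_unorm16, 0.1, Z_FAR)
err_float32 = max_error(quantize_float32, 0.1, Z_FAR)
err_float32_far_near = max_error(quantize_float32, 1.0, Z_FAR)

print("D16_UNORM-like, z_near = 0.1:", err_unorm16)
print("D32_FLOAT-like, z_near = 0.1:", err_float32)
print("D32_FLOAT-like, z_near = 1.0:", err_float32_far_near)
```

Pushing the assumed near plane from 0.1 to 1.0 shrinks the worst-case error of the float emulation by roughly an order of magnitude, in line with the observation about the near plane above.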
I also tested `DXGI_FORMAT_D24_UNORM_S8_UINT` but the results were so close to `DXGI_FORMAT_D32_FLOAT` that I can only conclude that the driver internally maps the depth format to 32 bit float. Not that much of a surprise, this is exactly what the AMD GCN architecture does as well.
- First of all: Make sure that your near plane is as far away from the camera as you can afford. This will flatten the hyperbolic curve and provide much better depth precision far away from the viewer.
- Unless you are in some crazy setting with hundreds of kilometers of view distance and you are going for sub-centimeter depth resolution, `DXGI_FORMAT_D32_FLOAT` should be good enough and on modern GPUs should come at no additional cost compared to `DXGI_FORMAT_D24_UNORM_S8_UINT`.
- `DXGI_FORMAT_D16_UNORM` isn’t really a choice for projective transforms. It can be quite valuable for orthographic projections though (for example sun shadow maps), reducing bandwidth by half compared to a 32 bit format.
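As a rough back-of-the-envelope check of these recommendations, one can estimate the view-space size of a single depth buffer step. For the perspective mapping $z' = \frac{z_f}{z_f - z_n}(1 - \frac{z_n}{z})$ the derivative is $dz/dz' = z^2 (z_f - z_n) / (z_f z_n)$, while an orthographic projection maps depth linearly. The sketch below uses hypothetical helper names and an assumed depth range of 0.5 to 500 units; it is a first-order estimate, not a measurement.

```python
def perspective_step(z, z_near, z_far, bits=16):
    """Approximate view-space size of one UNORM depth step at distance z (perspective)."""
    dz_dzp = z * z * (z_far - z_near) / (z_far * z_near)
    return dz_dzp / (2 ** bits - 1)

def orthographic_step(z_near, z_far, bits=16):
    """View-space size of one UNORM depth step for an orthographic projection."""
    return (z_far - z_near) / (2 ** bits - 1)

Z_NEAR, Z_FAR = 0.5, 500.0  # assumed depth range

step_persp = perspective_step(400.0, Z_NEAR, Z_FAR)
step_persp_far_near = perspective_step(400.0, 5.0, Z_FAR)
step_ortho = orthographic_step(Z_NEAR, Z_FAR)

print("16 bit perspective step at z = 400, z_near = 0.5:", step_persp)
print("16 bit perspective step at z = 400, z_near = 5.0:", step_persp_far_near)
print("16 bit orthographic step (uniform):              ", step_ortho)
```

With these assumed values the 16 bit perspective step at a distance of 400 is on the order of meters, moving the near plane out by a factor of ten shrinks it by about the same factor, and the orthographic step stays below a centimeter everywhere.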
And if you really, really need linear depth you can write it via the `SV_DEPTH` semantic in the pixel shader. Beware though, you’ll lose early Z unless you use the variant `SV_DepthLessEqual`. Check out this blog post for more details. In most cases though I would argue that non-linear depth is just fine.