Depth Precision

In a previous post I discussed the use of reciprocal depth 1/z in depth buffer generation. I gave some figures showing the problematic hyperbolic depth value distribution in the depth buffer and its dependence on the near plane.

Let’s expand a bit on that and investigate strategies to better distribute depth values. In the following I will be using a right-handed coordinate system, i.e. -z points forward, and (as usual) vectors are multiplied from the right. View space depth is denoted z and the near/far planes are at z_n and z_f.

Standard Depth

The standard DirectX projection matrix \mathbf{P}, as produced by D3DXMatrixPerspectiveFovRH, transforms view space positions \mathbf{v} = (x, y, z, 1) into clip space positions \mathbf{v'}

  \mathbf{v'} = \mathbf{P} \mathbf{v} = \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & \frac{z_f}{z_n-z_f} & \frac{z_n z_f}{z_n-z_f} \\ 0 & 0 & -1 & 0\end{pmatrix} \mathbf{v}

and results in depth buffer values

  z'=\frac{\frac{z_f}{z_n-z_f} z + \frac{z_n z_f}{z_n-z_f}}{-z}

As shown before, this can cause a significant warp of the resulting depth values due to the division by z.
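
As a concrete illustration, here is a minimal C++ sketch (not code from any SDK) that applies the third and fourth row of \mathbf{P} to a view space position and performs the perspective divide:

  // Standard depth buffer value for view space depth z < 0 (right-handed,
  // -z points forward); zn/zf are the positive near/far distances.
  float standardDepth(float z, float zn, float zf)
  {
      float zClip = zf / (zn - zf) * z + zn * zf / (zn - zf); // third row of P
      float wClip = -z;                                       // fourth row of P
      return zClip / wClip; // maps -zn -> 0 and -zf -> 1
  }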

Reverse Depth

Reverse depth aims to better distribute depth values by reversing clip space: Instead of mapping [z_n,z_f] \mapsto [0,1], the projection matrix is adjusted to produce [z_n,z_f] \mapsto [1,0]. This can be achieved by multiplying the projection matrix with a simple ‘z reversal’ matrix, yielding

  \mathbf{v'} = \begin{pmatrix}1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 1\end{pmatrix} \mathbf{P} \mathbf{v} = \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & -\frac{z_f}{z_n-z_f}-1& -\frac{z_n z_f}{z_n-z_f} \\ 0 & 0 & -1 & 0\end{pmatrix} \mathbf{v}

The advantage of this mapping is that it’s much better suited to storing depth values in floating point format: Close to the near plane, where similar view depth values are pushed far apart by the hyperbolic depth distribution, not much floating point precision is required. It is thus safe to map these values to the vicinity of 1.0 where the floating point exponent is ‘locked’. Similar values near the far plane on the other hand are compressed to even closer clip space values and thus benefit from the extremely precise range around 0.0. Interestingly this results in the least precision in the middle area of the view space depth range.
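
The corresponding evaluation for the reversed matrix is just as simple; as a sketch (the result is equivalent to 1 minus the standard depth value):

  // Reverse depth buffer value: third row of the reversed projection matrix
  // followed by the perspective divide. Maps -zn -> 1 and -zf -> 0.
  float reverseDepth(float z, float zn, float zf)
  {
      float zClip = (-zf / (zn - zf) - 1.0f) * z - zn * zf / (zn - zf);
      float wClip = -z;
      return zClip / wClip; // == 1 - standardDepth(z, zn, zf)
  }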

Linear depth (i.e. W-buffer)

As the name already suggests, the idea here is to write out the depth value itself, normalized to [0, 1] range:

  z' = \frac{-z - z_n}{z_f - z_n}

Unfortunately, without explicit hardware support, this method causes significant performance overhead as it requires depth export from pixel shaders.

Asymptotic behaviour

For situations where extremely large view distances are required, one can let z_f approach infinity. For standard depth we get

  \lim \limits_{z_f \to \infty} \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & \frac{z_f}{z_n-z_f} & \frac{z_n z_f}{z_n-z_f} \\ 0 & 0 & -1 & 0\end{pmatrix} = \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & -1 & -z_n \\ 0 & 0 & -1 & 0\end{pmatrix}

and for reverse depth

  \lim \limits_{z_f \to \infty} \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & -\frac{z_f}{z_n-z_f}-1& -\frac{z_n z_f}{z_n-z_f} \\ 0 & 0 & -1 & 0\end{pmatrix} = \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & 0 & z_n \\ 0 & 0 & -1 & 0\end{pmatrix}

Note how the matrices are suddenly free of approximate numerical computations, resulting in fewer rounding and truncation errors.
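
Writing out the depth values produced by these limit matrices, with w = -z denoting the positive view space distance, makes the simplification explicit:

  z'_{\text{standard}} = \frac{-z - z_n}{-z} = 1 - \frac{z_n}{w}, \qquad z'_{\text{reverse}} = \frac{z_n}{-z} = \frac{z_n}{w}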

Precision Analysis

With all these possibilities, which one is the best fit for a given scenario? I’ll try to answer this question by comparing depth resolution for each projection type. The idea is simple: View space depth is sampled in regular intervals between z_n and z_f. Each sampled depth value is transformed into clip space and then into the selected buffer format. The next adjacent buffer value is then projected back into view space; the difference to the original sample gives the minimum view space distance two objects need to be apart so they don’t map to the same buffer value – hence the minimum separation required to avoid z-fighting.
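
The following C++ sketch illustrates this procedure for standard depth and a 16 bit integer buffer (an arbitrary choice for illustration; other projections and formats follow the same pattern):

  #include <cmath>
  #include <cstdint>
  #include <cstdio>

  const float zn = 0.1f, zf = 10000.0f;

  // view space z is negative in front of the camera
  float project(float z)   { return (zf / (zn - zf) * z + zn * zf / (zn - zf)) / -z; }
  float unproject(float b) { return -zn * zf / (b * (zn - zf) + zf); }

  int main()
  {
      const float scale = 65535.0f; // D16_UNORM quantization
      const float samples[] = { 1.0f, 10.0f, 100.0f, 1000.0f, 5000.0f };
      for (float d : samples)
      {
          uint32_t bits = (uint32_t)std::round(project(-d) * scale);
          float z0 = unproject(bits / scale);       // value stored in the buffer
          float z1 = unproject((bits + 1) / scale); // next adjacent buffer value
          // minimum separation that still maps to distinct buffer values
          printf("depth %7.1f -> resolution %g\n", d, std::fabs(z1 - z0));
      }
  }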

The following graph overlays depth resolution for the different projection methods. Click the legend on top of the graph to toggle individual curves. The slider on the bottom lets you restrict the displayed depth range. The zoom button resamples the graph for the current depth range. Use the reset button to reset the depth range.



Results depend a lot on the buffer type. For floating point buffers, reverse depth clearly beats the other candidates: It has the lowest error rates and is impressively stable w.r.t. extremely close near planes. As the distance between the near and far plane increases, the term z_n-z_f drops more and more mantissa bits of z_n, effectively making the projection converge gracefully to the infinite far plane reverse projection. On top of that, on the AMD GCN architecture floating point depth comes at the same cost as 24-bit integer depth.

With integer buffers, linear Z would be the method of choice if it didn’t entail the massive performance overhead of exporting depth from the pixel shader. Reverse depth seems to perform slightly better than standard depth, especially towards the far plane, but both methods share the sensitivity to small near plane values.

Reverse depth on OpenGL

In order to get proper reverse depth working on OpenGL one needs to work around OpenGL’s definition of [-1,1] clip space. Ways to do so are either the ARB_clip_control extension or a combination of glDepthRangedNV and a custom far clipping plane. The result should then be the same as reverse depth on DirectX. I included the Reversed OpenGL curve to visualise the depth resolution if clip space is kept at [-1,1] and the regular window transform glDepthRange(0,1) is used to transform clip space values into buffer values.
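
Assuming a context with OpenGL 4.5 (or the ARB_clip_control extension) and a floating point depth attachment, the setup boils down to switching clip space to [0,1], flipping the depth test and clearing depth to 0; a minimal sketch:

  // Assumes the GL function pointers have been loaded by the usual means.
  void setupReverseDepth()
  {
      glClipControl(GL_LOWER_LEFT, GL_ZERO_TO_ONE); // D3D-style [0,1] clip space
      glDepthFunc(GL_GREATER);                      // nearer fragments now have larger depth
      glClearDepth(0.0);                            // the 'far' clear value is 0 instead of 1
      glClear(GL_DEPTH_BUFFER_BIT);
      // ...render with the reverse projection matrix from above
  }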


Raspberry Pi

So, I got myself a new gadget for Christmas: a Raspberry Pi Model B. For those of you who are not familiar: It’s basically a credit card sized computer with the computing power of a cell phone. Well, a somewhat slow cell phone, to tell the truth. Not really the right device for any fancy computing. One really cool thing though: the integrated BCM2835 chip has a built-in GPU with support for full HD video encoding/decoding and OpenGL ES 2.0. And all at a power consumption of about 3.5 watts. Perfect for a little, always-on media center to stream YouTube videos and music or watch movies!

First thing, I loaded XBian, a Raspberry Pi optimized Linux distribution with XBMC preinstalled, onto a memory card and booted up. It worked like a charm: out of the box you get a full media center solution with support for video playback and audio streaming, plus plugins that let you browse YouTube and Dailymotion. Sweet!

IR control

Up to that point I had used my mouse as input and this was getting a bit tedious: a mouse to control the Raspberry, one remote to control the TV and another one to control the sound bar. Way too many devices with a two year old kid running around ;). So I decided to simplify the system. The Raspberry has eight general purpose input/output (GPIO) ports. They can quite easily be programmed and thus used to control all sorts of electronic circuits. At first I hooked up an IR receiver as shown in this Adafruit tutorial. Once I had configured LIRC, this allowed me to control XBMC with any remote control. Next I built an IR sender circuit as described here, which allowed me to send IR signals as well. As a final step I configured my programmable remote to mimic some device unknown to both the TV and the sound bar, and used lircrc in combination with irsend to control both. Let me illustrate with an example: The volume up button on my remote sends a volume up command which only the Raspberry can understand. Based on the contents of lircrc, the Raspberry then sends the appropriate volume up command for the sound bar. Likewise, a channel up command is translated to a command for the TV. This allowed me to get rid of two of the three input devices, as everything can now be controlled with a single remote.

I should probably add that I struggled for quite a while to get this to work reliably. Sometimes the TV would not react, sometimes multiple button presses were sent, and so on. It finally turned out that the issue was related to the low processing power of the CPU: under heavy load, lircd would be interrupted by the OS quite often in order to run other tasks and would thus mistime the IR impulses. The issue went away once I increased the priority of the lircd daemon.

More fun stuff

This project was so much fun that I started to look into something else: What about monitoring system and network stats? So I wrote a little Python script that would grab internet transfer stats from my router, and temperature and free memory from the on-chip sensors of the Raspberry. The script would run every 5 minutes and log the results to a MySQL database. I then installed the Apache web server and made a little website that shows the data in some neat charts. Some weeks later I also hooked up a temperature and humidity sensor to one of the free GPIOs.

And as the last project (for now :) ) I built a custom tracker for my cell phone: I configured my phone to send an HTTP POST request with its current location to my public IP address (which I set up via DynDNS). A PHP script then extracts the location and stores it in a database. Another small webpage displays the data via the Google Maps API.


And that’s it for now. Definitely a lot of fun and a nice departure from graphics programming. Note that I’ve left out a lot of the gory details, like the components in the electronic circuits, or patching the LIRC GPIO driver to fix some problems with IR interference from energy saving lamps. Don’t hesitate to contact me in case you’re interested and I can send you some more details.

Gaussian Kernel Calculator

Did you ever wonder how some algorithm would perform with a slightly different Gaussian blur kernel? Well, then this page might come in handy: Just enter the desired standard deviation \sigma and the kernel size n (all units in pixels) and press the “Calculate Kernel” button. You’ll get the corresponding kernel weights for use in a one pass or two pass blur algorithm in two neat tables below.


One dimensional Kernel

This kernel is useful for a two pass algorithm: First perform a horizontal blur with the weights below and then perform a vertical blur on the resulting image (or vice versa).

0.06136 0.24477 0.38774 0.24477 0.06136
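
As an illustration of the two pass approach, here is a C++ sketch of a separable blur over a single channel float image using the five weights above (clamping at the borders is an assumption of the sketch, not part of the calculator):

  #include <algorithm>
  #include <vector>

  void blurSeparable(std::vector<float>& img, int w, int h)
  {
      static const float k[5] = { 0.06136f, 0.24477f, 0.38774f, 0.24477f, 0.06136f };
      std::vector<float> tmp(img.size());

      // horizontal pass: img -> tmp
      for (int y = 0; y < h; ++y)
          for (int x = 0; x < w; ++x)
          {
              float sum = 0.0f;
              for (int i = -2; i <= 2; ++i)
                  sum += k[i + 2] * img[y * w + std::clamp(x + i, 0, w - 1)];
              tmp[y * w + x] = sum;
          }

      // vertical pass on the horizontally blurred image: tmp -> img
      for (int y = 0; y < h; ++y)
          for (int x = 0; x < w; ++x)
          {
              float sum = 0.0f;
              for (int i = -2; i <= 2; ++i)
                  sum += k[i + 2] * tmp[std::clamp(y + i, 0, h - 1) * w + x];
              img[y * w + x] = sum;
          }
  }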

Two dimensional Kernel

The weights below can be used directly in a single pass blur algorithm: n^2 samples per pixel.

0.003765 0.015019 0.023792 0.015019 0.003765
0.015019 0.059912 0.094907 0.059912 0.015019
0.023792 0.094907 0.150342 0.094907 0.023792
0.015019 0.059912 0.094907 0.059912 0.015019
0.003765 0.015019 0.023792 0.015019 0.003765

Analysis & Implementation Details

Below you can find a plot of the continuous distribution function and the discrete kernel approximation. One thing to look out for is the tails of the distribution vs. the kernel support: For the current configuration we have 1.24% of the curve’s area outside the discrete kernel. Note that the weights are renormalized such that the sum of all weights is one. Or in other words: the probability mass outside the discrete kernel is redistributed evenly to all pixels within the kernel.

The weights are calculated by numerical integration of the continuous Gaussian distribution over each discrete kernel tap. Take a look at the JavaScript source in case you are interested.
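
The following C++ sketch mirrors that computation; instead of integrating numerically it evaluates the integral over each one pixel wide tap analytically via std::erf, which for \sigma = 1 and n = 5 reproduces the weights above:

  #include <cmath>
  #include <vector>

  std::vector<double> gaussianKernel1D(double sigma, int size /* odd */)
  {
      // integral of the Gaussian density from -infinity up to x
      auto cdf = [sigma](double x) { return 0.5 * (1.0 + std::erf(x / (sigma * std::sqrt(2.0)))); };

      std::vector<double> w(size);
      double sum = 0.0;
      int r = size / 2;
      for (int i = -r; i <= r; ++i)
      {
          w[i + r] = cdf(i + 0.5) - cdf(i - 0.5); // area over the tap [i-0.5, i+0.5]
          sum += w[i + r];
      }
      for (double& wi : w)
          wi /= sum; // renormalize: redistribute the truncated tails
      return w;
  }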


Linear Depth

Something that seems to come up again and again is the topic of linear vs. non-linear depth. If you take a look at the standard DirectX projection matrix and do the math for the z component, you’ll end up with something like this

  z' = \frac{z_f}{z_f - z_n} (1 - \frac{z_n}{z})

where z is the depth value before projection, z' is the depth value after projection and z_n, z_f correspond to the near and far planes. So projection actually transforms z into some variation of 1/z. The reason for this is simple: GPUs rasterize primitives in screen space and interpolate attribute data linearly in screen space as well. Linear depth z in view space, however, becomes non-linear under projection and thus cannot be correctly interpolated by a simple linear interpolator. 1/z, in contrast, turns out to be linear in screen space. This is actually quite easy to see: Assume a plane in view space

  Ax + By + Cz = D

Perspective projection transforms view space x and y coordinates to

  x' = \frac{x}{z}, \qquad y' = \frac{y}{z}

Inserting these equations into the original plane equation yields

  A x' z + B y' z + C z = D

which gives us

  \frac{1}{z} = \frac{A}{D} x' + \frac{B}{D} y' + \frac{C}{D}

clearly showing that 1/z is a linear function of the screen space coordinates x' and y'. This is illustrated quite nicely in this blog post by rendering ddx(z') and ddy(z') as color to the screen. The same holds for other generic attributes like texture coordinates: The GPU cannot directly interpolate u and v, but interpolates u/z and v/z instead. The attribute value is then reconstructed per pixel by dividing by the interpolated 1/z, i.e. multiplying by z.
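
A small sketch of the resulting interpolation scheme: interpolate u/z and 1/z linearly (here along a single screen space edge with parameter t), then recover the attribute per pixel:

  struct Vertex { float u; float z; }; // attribute value and view space depth

  float interpolateU(const Vertex& a, const Vertex& b, float t /* linear in screen space */)
  {
      float uOverZ = (1 - t) * (a.u / a.z) + t * (b.u / b.z);
      float invZ   = (1 - t) * (1.0f / a.z) + t * (1.0f / b.z);
      return uOverZ / invZ; // divide by the interpolated 1/z, i.e. multiply by z
  }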

Depth Precision

Now that we have established that the value that ends up in the depth buffer is not the depth itself but rather something related to 1/z, one might ask what effect this has on depth precision. After all, 1/z is a highly non-linear function that will significantly warp the original depth values. Check out the graph below: I plotted the resulting z' for the view space depth range z \in [0, 100] and different near plane values z_n. Notice how steep the function is over the first couple of meters: almost the entire output range z' \in [0, 0.99] is spent there.

In order to test this result empirically I wrote a small program that samples the range z \in [z_n, z_f] in regular intervals on the GPU, calculates the depth value z' after projection and writes it to a depth buffer of choice. The buffer is then read back to the CPU and view space depth is reconstructed for each sample. This allows us to calculate the error between the original and the reconstructed depth value. Here are the results for the formats DXGI_FORMAT_D16_UNORM and DXGI_FORMAT_D32_FLOAT with the following configuration: z_n = 0.1, z_f = 10000:
Note how the error for DXGI_FORMAT_D16_UNORM quickly approaches ridiculous proportions; 16 bit integer depth in combination with a projective transform is definitely a no-go! A second plot showing the error of DXGI_FORMAT_D32_FLOAT in more detail looks much better, though at the extremes we still get an error of over 100 meters. With some care this can be greatly reduced: The shape of the hyperbolic z' curve is largely determined by the near plane distance z_n, and even a slight change from z_n=0.1 to z_n=0.25 reduces the maximal error from 1.4\% down to 0.26\%.
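
For reference, reconstructing view space depth from a buffer value in this test simply inverts the projection formula from above:

  z = \frac{z_n z_f}{z_f - z'(z_f - z_n)}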

I also tested DXGI_FORMAT_D24_UNORM_S8_UINT but the results were so close to DXGI_FORMAT_D32_FLOAT that I can only conclude that the driver internally maps this depth format to 32 bit float. Not that much of a surprise; this is exactly what the AMD GCN architecture does as well.

Practical Considerations

  • First of all: Make sure that your near plane is as far away from the camera as you can afford. This will flatten the hyperbolic 1/z curve and provide much better depth precision far away from the viewer.
  • Unless you are in some crazy setting with hundreds of kilometers view distance and you are going for sub centimeter depth resolution, DXGI_FORMAT_D32_FLOAT should be good enough and on modern GPUs should come at no additional cost compared to DXGI_FORMAT_D24_UNORM_S8_UINT.
  • DXGI_FORMAT_D16_UNORM isn’t really a choice for projective transforms. It can be quite valuable for orthographic projections though (for example sun shadow maps), reducing bandwidth by half compared to a 32 bit format.

Linear Depth

And if you really, really need linear depth you can write it via the SV_Depth semantic in the pixel shader. Beware though: you’ll lose early Z rejection unless you use the conservative variants SV_DepthGreaterEqual or SV_DepthLessEqual. Check out this blog post for more details. In most cases though I would argue that non-linear depth is just fine.