Depth Precision

In a previous post I discussed the use of reciprocal depth 1/z in depth buffer generation. I gave some figures showing the problematic hyperbolic distribution of depth values in the depth buffer and its dependence on the near plane.

Let’s expand a bit on that and investigate strategies to better distribute depth values. In the following I will be using a right-handed coordinate system, i.e. -z points forward and (as usual) vectors are multiplied from the right. View space depth is denoted z and the near/far plane distances are z_n and z_f.

Standard Depth

The standard DirectX projection matrix \mathbf{P}, as produced by D3DXMatrixPerspectiveFovRH, transforms view space positions \mathbf{v} = (x, y, z, 1) into clip space positions \mathbf{v'}

  \mathbf{v'} = \mathbf{P} \mathbf{v} = \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & \frac{z_f}{z_n-z_f} & \frac{z_n z_f}{z_n-z_f} \\ 0 & 0 & -1 & 0\end{pmatrix} \mathbf{v}

and results in depth buffer values

  z'=\frac{\frac{z_f}{z_n-z_f} z + \frac{z_n z_f}{z_n-z_f}}{-z}

As shown before, this can cause a significant warp of the resulting depth values due to the division by z.
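To make the warp concrete, here is a minimal sketch that evaluates the formula above on the CPU. The function name StandardDepth and the use of double precision are my own assumptions for illustration; viewZ is the (negative) view space depth and zNear/zFar are positive plane distances.

```cpp
#include <cassert>
#include <cmath>

// Standard [0,1] depth for a right-handed view space (camera looks down -z).
// viewZ is negative in front of the camera; zNear and zFar are positive distances.
// Hypothetical helper, not part of any library.
double StandardDepth(double viewZ, double zNear, double zFar)
{
    assert(zNear > 0.0 && zFar > zNear && viewZ < 0.0);
    const double a = zFar / (zNear - zFar);          // third row, third column of P
    const double b = zNear * zFar / (zNear - zFar);  // third row, fourth column of P
    return (a * viewZ + b) / -viewZ;                 // clip space z divided by clip space w
}
```

With z_n = 1 and z_f = 100, a point at distance 2 already lands above 0.5: half of the depth range is spent between z_n and 2 z_n.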

Reverse Depth

Reverse depth aims to better distribute depth values by reversing clip space: Instead of mapping [z_n,z_f] \mapsto [0,1], the projection matrix is adjusted to produce [z_n,z_f] \mapsto [1,0]. This can be achieved by multiplying the projection matrix with a simple ‘z reversal’ matrix, yielding

  \mathbf{v'} = \begin{pmatrix}1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 1\end{pmatrix} \mathbf{P} \mathbf{v} = \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & -\frac{z_f}{z_n-z_f}-1& -\frac{z_n z_f}{z_n-z_f} \\ 0 & 0 & -1 & 0\end{pmatrix} \mathbf{v}

The advantage of this mapping is that it’s much better suited to storing depth values in floating point format: Close to the near plane, where similar view depth values are pushed far apart by the hyperbolic depth distribution, not much floating point precision is required. It is thus safe to map these values to the vicinity of 1.0 where the floating point exponent is ‘locked’. Similar values near the far plane on the other hand are compressed to even closer clip space values and thus benefit from the extremely precise range around 0.0. Interestingly this results in the least precision in the middle area of the view space depth range.
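The effect can be sketched with a few lines of C++. This is only an illustration under my own assumptions: depth is computed in double precision and then rounded to a 32-bit float as a stand-in for the buffer format, and the names StandardDepth, ReverseDepth and StoreAsFloat32 are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Standard mapping [zNear, zFar] -> [0, 1] (clip z divided by clip w).
double StandardDepth(double viewZ, double zNear, double zFar)
{
    assert(zNear > 0.0 && zFar > zNear && viewZ < 0.0);
    const double a = zFar / (zNear - zFar);
    const double b = zNear * zFar / (zNear - zFar);
    return (a * viewZ + b) / -viewZ;
}

// Reverse mapping [zNear, zFar] -> [1, 0]. The third matrix row simplifies to
// ( zNear / (zFar - zNear), zNear * zFar / (zFar - zNear) ).
double ReverseDepth(double viewZ, double zNear, double zFar)
{
    assert(zNear > 0.0 && zFar > zNear && viewZ < 0.0);
    const double c = zNear / (zFar - zNear);
    const double d = zNear * zFar / (zFar - zNear);
    return (c * viewZ + d) / -viewZ;
}

// Rounds a depth value to what a 32-bit float buffer would store (assumption:
// the buffer simply holds the nearest float).
float StoreAsFloat32(double depth)
{
    return static_cast<float>(depth);
}
```

For z_n = 0.1 and z_f = 10000, two surfaces at distances 5000 and 5001 end up in the same float value with standard depth, while reverse depth keeps them apart.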

Linear depth (i.e. W-Buffer)

As the name already suggests, the idea here is to write out the depth value itself, normalized to [0, 1] range:

  z' = \frac{-z - z_n}{z_f-z_n}

Unfortunately, without explicit hardware support, this method causes significant performance overhead as it requires depth export from pixel shaders.
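As a rough sketch of what linear depth buys you: the view distance is normalised so that the near plane maps to 0 and the far plane to 1, and with an integer buffer the resolution is the same everywhere. The helper names below are hypothetical, and the integer buffer is assumed to store 2^bits - 1 evenly spaced steps.

```cpp
#include <cassert>
#include <cstdint>

// Linear depth: view distance (-viewZ) normalised to [0, 1].
double LinearDepth(double viewZ, double zNear, double zFar)
{
    assert(zNear > 0.0 && zFar > zNear && viewZ < 0.0);
    return (-viewZ - zNear) / (zFar - zNear);
}

// Smallest view space distance an integer buffer with the given bit count
// can resolve; constant over the whole depth range.
double LinearDepthResolution(double zNear, double zFar, int bits)
{
    assert(bits > 0 && bits < 64);
    const std::uint64_t steps = (std::uint64_t(1) << bits) - 1;
    return (zFar - zNear) / static_cast<double>(steps);
}
```

For z_n = 1, z_f = 100 and a 24-bit buffer this gives a uniform resolution of roughly 5.9e-6 view space units.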

Asymptotic behaviour

For situations where extremely large view distances are required, one can let z_f approach infinity. For standard depth we get

  \lim \limits_{z_f \to \infty} \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & \frac{z_f}{z_n-z_f} & \frac{z_n z_f}{z_n-z_f} \\ 0 & 0 & -1 & 0\end{pmatrix} = \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & -1 & -z_n \\ 0 & 0 & -1 & 0\end{pmatrix}

and for reverse depth

  \lim \limits_{z_f \to \infty} \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & -\frac{z_f}{z_n-z_f}-1& -\frac{z_n z_f}{z_n-z_f} \\ 0 & 0 & -1 & 0\end{pmatrix} = \begin{pmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & 0 & z_n \\ 0 & 0 & -1 & 0\end{pmatrix}

Note how the matrices are suddenly free of approximate numerical computations, resulting in fewer rounding and truncation errors.
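Written out per component, the two limit matrices reduce to very little arithmetic. The following is a small sketch with made-up function names; viewZ is the negative view space depth and zNear a positive distance.

```cpp
#include <cassert>

// Standard depth with the far plane at infinity: third row (0, 0, -1, -zNear).
double StandardDepthInfinite(double viewZ, double zNear)
{
    assert(zNear > 0.0 && viewZ < 0.0);
    return (-1.0 * viewZ - zNear) / -viewZ;
}

// Reverse depth with the far plane at infinity: third row (0, 0, 0, zNear).
// The only operation left is a single division.
double ReverseDepthInfinite(double viewZ, double zNear)
{
    assert(zNear > 0.0 && viewZ < 0.0);
    return zNear / -viewZ;
}
```

In the reverse case the depth value is simply z_n divided by the view distance, so arbitrarily distant geometry still maps to a positive value just above 0.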

Precision Analysis

With all these possibilities, which one will be the best fit for a given scenario? I try to answer this question by comparing depth resolution for each projection type. The idea is simple: View space depth is sampled in regular intervals between z_n and z_f. Each sampled depth value is transformed into clip space and then into the selected buffer format. The next adjacent buffer value is then projected back into view space. This gives the minimum view space distance two objects need to be apart so they don’t map to the same buffer value, hence the minimum separation required to avoid z-fighting.
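For a 32-bit float buffer, the procedure can be sketched roughly as follows. This is not the code behind the graph; it is a simplified illustration that assumes the depth value is computed in double precision and then rounded to float, and all names (DepthMapping, MinSeparationFloat32, ...) are invented for this example.

```cpp
#include <cassert>
#include <cmath>

// depth(dist) = k / dist + m, where dist = -z is the positive view distance.
struct DepthMapping
{
    double k;
    double m;
};

DepthMapping MakeStandardMapping(double zNear, double zFar)
{
    assert(zNear > 0.0 && zFar > zNear);
    return { -zNear * zFar / (zFar - zNear), zFar / (zFar - zNear) };
}

DepthMapping MakeReverseMapping(double zNear, double zFar)
{
    assert(zNear > 0.0 && zFar > zNear);
    return { zNear * zFar / (zFar - zNear), -zNear / (zFar - zNear) };
}

double ToDepth(const DepthMapping& p, double dist)     { return p.k / dist + p.m; }
double ToDistance(const DepthMapping& p, double depth) { return p.k / (depth - p.m); }

// Minimum view space separation at distance dist for a 32-bit float buffer.
double MinSeparationFloat32(const DepthMapping& p, double dist)
{
    // Value that actually ends up in the buffer.
    const float stored = static_cast<float>(ToDepth(p, dist));
    // Step to the adjacent representable value in the direction of larger
    // distances: reverse depth (k > 0) decreases with distance, standard increases.
    const float farther = (p.k > 0.0) ? -1.0f : 2.0f;
    const float next = std::nextafter(stored, farther);
    // Project both buffer values back into view space.
    return std::fabs(ToDistance(p, next) - ToDistance(p, stored));
}
```

With z_n = 0.1 and z_f = 10000, evaluating this at a distance of 5000 gives a separation of several view space units for standard depth and well below a millimetre-scale 1e-3 units for reverse depth.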

The following graph overlays depth resolution for the different projection methods.

[Interactive graph: depth resolution per projection method, with controls for near plane, far plane and buffer type]


Results depend a lot on the buffer type. For floating point buffers, reverse depth clearly beats the other candidates: It has the lowest error and is impressively stable w.r.t. extremely close near planes. As the distance between the near and far plane increases, z_n-z_f starts to drop more and more mantissa bits of z_n, effectively making the projection converge gracefully to the reverse infinite far plane projection. On top of that, on the AMD GCN architecture floating point depth comes at the same cost as 24-bit integer depth.

With integer buffers, linear Z would be the method of choice if it didn’t entail the massive performance overhead. Reverse depth seems to perform slightly better than standard depth, especially towards the far plane. But both methods share the sensitivity to small near plane values.

Reverse depth on OpenGL

In order to get proper reverse depth working on OpenGL one needs to work around OpenGL’s definition of [-1,1] clip space. Ways to do so would be either the extension ARB_clip_control or a combination of glDepthRangedNV and a custom far clipping plane. The result should then be the same as reverse depth on DirectX. I included the Reversed OpenGL curve to visualise the depth resolution if clip space is kept at [-1,1] and the regular window transform glDepthRange(0,1) is used to transform clip space values into buffer values.


Raspberry Pi

So, I got myself a new gadget for Christmas: A Raspberry Pi model B. For those of you who are not familiar: It’s basically a credit card sized computer with the computing power of a cell phone. Well, a somewhat slow cell phone to tell the truth. Not really the right device to do any fancy computing. One really cool thing though: the integrated BCM2835 chip has a built-in GPU with support for full HD video encoding/decoding and OpenGL ES 2.0. And all at a power consumption of about 3.5 Watts. Perfect for a little, always on, media center to stream YouTube videos and music or watch movies!

First thing, I loaded XBian, a Raspberry Pi optimized Linux distribution with XBMC preinstalled, onto a memory card and booted up. Worked like a charm: out of the box you get a full media center solution with support for video playback and audio streaming and plugins that let you browse YouTube and Dailymotion. Sweet!

IR control

Up to that point I had used my mouse as input and this was getting a bit tedious: A mouse to control the Raspberry, one remote to control the TV and another one to control the sound bar. Way too many devices with a two year old kid running around ;). So I decided to simplify the system. The Raspberry has eight general purpose input/output (GPIO) ports. They can quite easily be programmed and thus used to control all sorts of electronic circuits. At first I hooked up an IR receiver as shown in this Adafruit tutorial. Once I had configured LIRC this allowed me to control XBMC with any remote control. Next I built an IR sender circuit as described here. This allowed me to send IR signals as well. As a final step I configured my programmable remote to mimic some device unknown to TV and sound bar and used lircrc in combination with irsend to control both. Let me illustrate with an example: The volume up button on my remote sends a volume up command which only the Raspberry can understand. Based on the contents of lircrc, the Raspberry then sends the appropriate volume up command for the sound bar. Likewise, a channel up command will be translated to a command for the TV. This allowed me to get rid of two of the three input devices as everything can now be controlled with one remote only.

I should probably add that I struggled for quite a while to get this to work reliably. Sometimes the TV would not react, sometimes multiple button presses were sent etc. Finally it turned out that the issue was related to the low processing power of the CPU: Under heavy load, lircd would be interrupted by the OS quite often in order to run other tasks and thus mistimed the IR impulses. The issue went away once I increased the priority of the lircd daemon.

More fun stuff

This project was so much fun that I started to look into something else: What about supervising system and network stats? So I wrote a little Python script that would grab internet transfer stats from my router, and temperature and free memory from the on-chip sensors in the Raspberry. The script would run every 5 minutes and log the results to a MySQL database. I then installed the Apache web server and made a little website that shows the data in some neat charts. Some weeks later I also hooked up a temperature and humidity sensor to one of the free GPIOs.

And as the last project (for now 🙂 ) I built a custom tracker for my cell phone: I configured my phone to send an HTTP POST request with the current location to my public IP address (which I set up via dyndns). A PHP script then extracts the location and stores it to a database. Another small webpage then displays the data via the Google Maps API.

[Screenshots: stats, tracking]

And that’s it for now. Definitely a lot of fun and a nice departure from graphics programming. Note that I’ve left out a lot of the gory details, like the components in the electronic circuits, or patching the lirc GPIO driver to fix some problems with IR interference from energy saving lamps. Don’t hesitate to contact me in case you’re interested and I can send you some more details.

WordPress Customization

Now that my work on the Xbox One game Ryse is done, I can finally invest some time in my blog again. Yay! For starters I have cleaned up the website code, as initially (due to time constraints) I had made all my changes directly to the WordPress source. Not a good idea, as these changes have to be painfully reapplied every time the software is updated.

Child Themes

A very convenient way to customize a WordPress site is the use of child themes. The concept is simple: You create a new theme that inherits the settings of an existing theme and override whatever you need to. To do so, simply create a folder for your theme in the wp-content/themes directory and put a file called style.css inside. Here’s the beginning of my style.css:

/*
 Theme Name:     Twenty Eleven Customized
 Author:         Theodor Mader
 Author URI:
 Template:       twentyeleven
 Version:        1.0.0
*/

@import url("../twentyeleven/style.css");

/* =Theme customization starts here
-------------------------------------------------------------- */

Note that the Theme Name and the Template entries are required. You can now overwrite the original theme’s styles by simply adding your code after the import statement.

Syntax Highlighter

I had also made quite some changes to the Syntax Highlighter Evolved plugin to better fit my blog’s style and contents. Fortunately the plugin is nicely customizable and with only a bunch of lines of code you can load a custom style sheet and custom syntax definitions.

In order to get a custom style sheet into Syntax Highlighter, you simply need to add a filter for syntaxhighlighter_themes and add the name of your style sheet to the array of style sheets that gets passed in. Don’t forget to register your stylesheet with WordPress as well, for example via wp_register_style. Syntax definitions (or ‘brushes’ as they are called in Syntax Highlighter) can be added in a similar way via the syntaxhighlighter_brushes filter. Since these are JavaScript files they also need to be registered with WordPress via wp_register_script. For the sake of simplicity, I added the PHP code to functions.php of my child theme and do the script/style registration via the init action. Here’s the relevant code from my functions.php:

////////////// syntax highlighter customizations /////////////////

add_action( 'init', 'sh_register_customizations' );
add_filter( 'syntaxhighlighter_themes', 'sh_add_custom_style' );
add_filter( 'syntaxhighlighter_brushes', 'sh_add_custom_brushes' );

// Register with wordpress
function sh_register_customizations()
{
  $sh_customizations_uri = get_stylesheet_directory_uri() . '/syntaxhighlighter/';

  // customized sh style
  wp_register_style('syntaxhighlighter-theme-default_custom', $sh_customizations_uri . 'shCoreDefaultCustom.css', array('syntaxhighlighter-core'), '0.1');

  // custom sh brushes
  wp_register_script( 'syntaxhighlighter-brush-cg', $sh_customizations_uri . 'shBrushCg.js', array('syntaxhighlighter-core'), '0.1' );
  wp_register_script( 'syntaxhighlighter-brush-cppcustom', $sh_customizations_uri . 'shBrushCppCustom.js', array('syntaxhighlighter-core'), '0.1' );
  wp_register_script( 'syntaxhighlighter-brush-csharpcustom', $sh_customizations_uri . 'shBrushCSharpCustom.js', array('syntaxhighlighter-core'), '0.1' );
}

// Add the custom theme to the list of available themes
function sh_add_custom_style($themes)
{
  $themes['default_custom'] = 'Default Custom';
  return $themes;
}

// Add the custom brushes to the list of available brushes
function sh_add_custom_brushes( $brushes )
{
  $brushes['cg'] = 'cg';
  $brushes['cppcustom'] = 'cppcustom';
  $brushes['csharpcustom'] = 'csharpcustom';
  return $brushes;
}


Oh, and here’s a neat trick I picked up on another website: You can expand the syntaxhighlighter code boxes by adding for example width: 150%; (on hover) and a transition like transition: width 0.2s ease 0s; to the .syntaxhighlighter style.

.syntaxhighlighter {
    transition: width 0.2s ease 0s;
}

.syntaxhighlighter:hover {
    width: 150% !important;
}

Note that this scales up the boxes by a fixed 150% instead of the exact required size but unfortunately I haven’t found a way to do the proper scaling without some form of scripting. You can toggle overflow: visible; on hover but so far I couldn’t figure out how to make this a smooth transition via CSS alone.


The model renderer v.0.1

All right, so here I was, having defined a couple of DataConverters like so:

// Used for positions, normals, tangents and bitangents and 3D texcoords
struct aiVector3DConverter : public ArrayDataConverter<aiVector3D>

// Used for colors
struct aiColor4DConverter : public ArrayDataConverter<aiColor4D>

// Used for 2D texcoords
struct aiVector3DToFloat2Converter : public ArrayDataConverter<aiVector3D>

These simple definitions allowed me to convert Vertex positions, normals, Colors and UVs to a DirectX 9 compatible data format. After model loading, I simply create a list of converters for each data stream in the mesh and then gather all vertex elements (via CopyType()) and the actual data (via CopyData()) into the corresponding buffers:

// build vertex declaration
std::vector<D3DVERTEXELEMENT9> vertexElements;
for( unsigned int i=0; i<converters.size(); ++i )
    converters[i]->CopyType( vertexElements );

vertexElements.push_back( endElement );

context->Device()->CreateVertexDeclaration( &vertexElements[0],
                                            &result.m_pVertexDeclaration );

// now create vertex buffer
const int vertexBufferSize = importedMesh->mNumVertices * result.m_VertexSize;
context->Device()->CreateVertexBuffer( vertexBufferSize, 0, 0,
    D3DPOOL_DEFAULT, &result.m_pVertexBuffer, NULL );

BYTE* vertexData;
result.m_pVertexBuffer->Lock( 0, 0, reinterpret_cast<void**>( &vertexData ), 0 );

BYTE* curOffset = reinterpret_cast<BYTE*>( vertexData );

// let every converter write its element of every vertex
for( unsigned int v=0; v<importedMesh->mNumVertices; ++v )
    for( unsigned int i=0; i<converters.size(); ++i )
        converters[i]->CopyData( &meshData.mVertexData[v * meshData.mVertexSize], v );

result.m_VertexCount = importedMesh->mNumVertices;

Done! Simple as that. Only the creation of all these converters is still somewhat messy with a check for existence of every stream type. But well… I figured I wouldn’t overcomplicate stuff.

So, given the vertex declaration and the vertex buffer, the next step was to create an index buffer. I won’t bother you with the code here as it is really straightforward. The only part that is worth mentioning is that I decided to use 16-bit indices if possible, i.e. if the number of vertices is less than 0xFFFF = 65535. This is actually a really simple thing to do but quite effective in saving video memory on lots of models. I’ve rarely come across meshes with more than 65535 vertices per material – at least on consoles.
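The index size decision can be sketched like this. The function names are hypothetical and the snippet deliberately avoids any DirectX types; in the real code the result would select between the 16-bit and 32-bit index buffer formats (D3DFMT_INDEX16 and D3DFMT_INDEX32 in Direct3D 9).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes per index: 16-bit indices if every vertex index fits, 32-bit otherwise.
std::size_t IndexSizeInBytes(std::size_t vertexCount)
{
    return (vertexCount < 0xFFFF) ? sizeof(std::uint16_t) : sizeof(std::uint32_t);
}

// Total size of the index buffer for a mesh with the given counts.
std::size_t IndexBufferSizeInBytes(std::size_t vertexCount, std::size_t indexCount)
{
    return indexCount * IndexSizeInBytes(vertexCount);
}
```

A mesh with 1000 vertices and 3000 indices needs 6000 bytes this way, half of what 32-bit indices would take.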

Almost done now! What was missing now was a simple shader and code to set up the GPU.
In terms of shading I decided to go bare bones:

float4x4 WorldViewProjection : WORLDVIEWPROJECTION;
float4 LightDirection        : LIGHTDIRECTION;

struct VertexShaderInput
{
    float4 Position : POSITION;
    float3 Normal   : NORMAL;
    float2 TexCoord : TEXCOORD0;
};

struct VertexShaderOutput
{
    float4 Position : POSITION0;
    float3 Normal   : TEXCOORD1;
    float2 TexCoord : TEXCOORD0;
};

VertexShaderOutput Model_VS( VertexShaderInput input )
{
    VertexShaderOutput result;
    result.Position = mul( float4(input.Position.xyz,1), WorldViewProjection );
    result.Normal = input.Normal.xyz;
    result.TexCoord = input.TexCoord;

    return result;
}

float4 Model_PS( VertexShaderOutput input ) : COLOR0
{
    const float ambient = 0.3f;
    float diffuse = saturate(dot(input.Normal, LightDirection.xyz) );

    // hard coded red material
    return float4(float3(1,0,0) * (ambient + diffuse),1);
}

technique Model
{
    pass P0
    {
        VertexShader = compile vs_2_0 Model_VS();
        PixelShader = compile ps_2_0 Model_PS();
    }
}

So basically a simple lambert shader with some hard coded material parameters. Loading the shader using D3DX is really simple, a call to D3DXCreateEffectFromFile() is enough. And finally we can add the code for actually rendering the model:

void Model::Render( RenderContext* context )
{
    for( unsigned int m=0; m<mMeshes.size(); ++m )
    {
        auto& mesh = mMeshes[m];

        D3DXHANDLE hTechnique = mesh.m_pEffect->GetTechniqueByName( "Model" );
        D3DXHANDLE hWorldViewProjection = 
            mesh.m_pEffect->GetParameterBySemantic( NULL, "WORLDVIEWPROJECTION" );
        D3DXHANDLE hLightDirection = 
            mesh.m_pEffect->GetParameterBySemantic( NULL, "LIGHTDIRECTION" );

        context->Device()->SetVertexDeclaration( mesh.m_pVertexDeclaration );
        context->Device()->SetStreamSource( 0, mesh.m_pVertexBuffer, 
            0, mesh.m_VertexSize );
        context->Device()->SetIndices( mesh.m_pIndexBuffer );

        mesh.m_pEffect->SetMatrix( hWorldViewProjection, 
            &(context->GetViewMatrix() * context->GetProjectionMatrix()).data );

        D3DXVECTOR4 lightDirection = D3DXVECTOR4( 0, 1, 0, 1 );
        mesh.m_pEffect->SetVector( hLightDirection, &lightDirection );

        mesh.m_pEffect->SetTechnique( hTechnique );

        UINT cPasses;
        mesh.m_pEffect->Begin( &cPasses, 0 );

        for( unsigned int iPass = 0; iPass < cPasses; iPass++ )
        {
            mesh.m_pEffect->BeginPass( iPass );

            HRESULT hr = context->Device()->DrawIndexedPrimitive( D3DPT_TRIANGLELIST, 
                            0, 0, mesh.m_VertexCount, 0, mesh.m_TriangleCount );

            // every BeginPass() needs a matching EndPass()
            mesh.m_pEffect->EndPass();
        }

        // ... and every Begin() a matching End()
        mesh.m_pEffect->End();
    }

    context->Device()->SetStreamSource( 0, NULL, 0, 0 );
    context->Device()->SetIndices( NULL );
}

And that’s it! Lots of code this time 🙂


Import data conversion

Now that I had a basic application running and managed to import model data, I needed to look into how to get said data into a DirectX compatible format. So, without going into too many details: For each mesh I needed a Vertex Buffer, a Vertex Declaration and an Index Buffer. Since each mesh can come with a variety of different vertex data, I opted to implement a somewhat generic approach: I added an abstraction layer called DataConverter. A DataConverter has an input data type (either an Assimp value type or an array of Assimp values) and a function CopyData() which will convert the data to a desired type and store it into a target array. Data Converters also know about the corresponding DirectX vertex formats: they can generate a list of D3DVERTEXELEMENT9 elements and copy them to an array via the function CopyType(). Here’s the interface definition for a basic Data Converter:

class DataConverter
{
protected:
    // accessed by the derived converters
    D3DDECLUSAGE mUsageType;
    int mUsageIndex;
    int mOffset;

public:
    DataConverter( D3DDECLUSAGE usageType, int usageIndex, int offsetInBytes )
        : mUsageType( usageType )
        , mUsageIndex( usageIndex )
        , mOffset( offsetInBytes ) {}

    virtual ~DataConverter() {}

    virtual int Size() = 0;
    virtual void CopyType( std::vector<D3DVERTEXELEMENT9>& out_Type ) = 0;
    virtual void CopyData( BYTE* destination, int index ) = 0;
};

In order to make working with Assimp value arrays easier I derived an ArrayDataConverter from DataConverter:

template< typename T >
class ArrayDataConverter : public DataConverter
{
protected:
    const T* mSourceData;
    const int mSourceSize;

public:
    ArrayDataConverter( D3DDECLUSAGE usageType, int usageIndex, int& offsetInBytes,
            const T* sourceData, int sourceSize )
        : DataConverter( usageType, usageIndex, offsetInBytes )
        , mSourceData( sourceData )
        , mSourceSize( sourceSize ) {}

    // bounds checked access to the source array
    const T& GetElement( int index )
    {
        assert( index >= 0 && index < mSourceSize );
        return mSourceData[index];
    }
};

which basically adds a GetElement() function that lets us retrieve an element from the underlying source data. So, given these two definitions, here’s the simple and elegant definition of the aiVector3DToFloat2Converter:

struct aiVector3DToFloat2Converter : public ArrayDataConverter<aiVector3D>
{
    aiVector3DToFloat2Converter( D3DDECLUSAGE usageType, int usageIndex,
        int& offsetInBytes, const aiVector3D* sourceData, int sourceSize )
    : ArrayDataConverter<aiVector3D>( usageType, usageIndex, offsetInBytes, 
        sourceData, sourceSize ) 
    {}

    int Size() { return 2 * sizeof(float); }

    virtual void CopyType( std::vector<D3DVERTEXELEMENT9>& out_Type )
    {
        D3DVERTEXELEMENT9 _result = 
        {
            0, mOffset, D3DDECLTYPE_FLOAT2,
            D3DDECLMETHOD_DEFAULT, mUsageType, mUsageIndex 
        };

        out_Type.push_back( _result );
    }

    void CopyData( BYTE* destination, int elementIndex )
    {
        const aiVector3D& element = GetElement( elementIndex );

        // only the first two components are written
        float data[] = { element[0], element[1] };
        memcpy( destination + mOffset, data, Size() );
    }
};

Some more praise to the Assimp developers: The scene data is stored in a very understandable and easy to access format. Only took me a couple of hours to get that into DirectX compatible buffers. Great work guys!!