Gaussian Kernel Calculator

Did you ever wonder how some algorithm would perform with a slightly different Gaussian blur kernel? Well then this page might come in handy: just enter the desired standard deviation \sigma and the kernel size n (all units in pixels) and press the “Calculate Kernel” button. You’ll get the corresponding kernel weights for use in a one or two pass blur algorithm in two neat tables below.


One dimensional Kernel

This kernel is useful for a two pass algorithm: First perform a horizontal blur with the weights below and then perform a vertical blur on the resulting image (or vice versa).

0.06136 0.24477 0.38774 0.24477 0.06136

Two dimensional Kernel

The weights below can be used directly in a single pass blur algorithm, at the cost of n^2 samples per pixel.

0.003765 0.015019 0.023792 0.015019 0.003765
0.015019 0.059912 0.094907 0.059912 0.015019
0.023792 0.094907 0.150342 0.094907 0.023792
0.015019 0.059912 0.094907 0.059912 0.015019
0.003765 0.015019 0.023792 0.015019 0.003765

Analysis & Implementation Details

Below you can find a plot of the continuous distribution function and the discrete kernel approximation. One thing to look out for is the tails of the distribution vs. the kernel support: for the current configuration, 1.24% of the curve’s area lies outside the discrete kernel. Note that the weights are renormalized such that the sum of all weights is one. In other words: the probability mass outside the discrete kernel is redistributed across the taps within the kernel in proportion to their weights.

The weights are calculated by numerical integration of the continuous Gaussian distribution over each discrete kernel tap. Take a look at the JavaScript source in case you are interested.
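The calculation can be sketched in a few lines of Python. This is not the page’s JavaScript source: instead of numerical integration it uses the closed-form Gaussian integral via the error function, and the function names (gaussian_cdf, kernel_1d, kernel_2d) are made up for this example.

```python
import math

def gaussian_cdf(x, sigma):
    """Cumulative distribution of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def kernel_1d(sigma, n):
    """Integrate the Gaussian over each of the n unit-wide taps, then renormalize."""
    half = n / 2.0
    # Tap i covers the pixel interval [i - half, i + 1 - half].
    raw = [gaussian_cdf(i + 1 - half, sigma) - gaussian_cdf(i - half, sigma)
           for i in range(n)]
    covered = sum(raw)                    # area of the curve inside the kernel support
    weights = [w / covered for w in raw]  # renormalize so the weights sum to one
    return weights, 1.0 - covered         # weights and the truncated tail mass

def kernel_2d(sigma, n):
    """The 2D kernel is the outer product of the 1D kernel with itself."""
    w, _ = kernel_1d(sigma, n)
    return [[a * b for b in w] for a in w]

weights, tail = kernel_1d(1.0, 5)
print([round(w, 5) for w in weights])
print(f"{tail:.2%} of the area lies outside the kernel")
```

With \sigma = 1 and n = 5 this reproduces the weights in the two tables above, which suggests those tables were generated with exactly these settings.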


Linear Depth

Something that seems to come up again and again is the topic of linear vs. non-linear depth. If you take a look at the standard DirectX projection matrix and do the math for the z component, you’ll end up with something like this

  z' = \frac{z_f}{z_f - z_n} (1 - \frac{z_n}{z})

where z is the depth value before projection, z' is the depth value after projection and z_n, z_f correspond to the near and far planes. So projection actually transforms z into some variation of 1/z. The reason for this is simple: GPUs rasterize primitives in screen space and interpolate attribute data linearly in screen space as well. Linear depth z in view space, however, becomes non-linear after projection and thus cannot be correctly interpolated by simple linear interpolators. Conversely, it turns out that 1/z is linear in screen space. This is actually quite easy to see: Assume a plane in view space

  Ax + By + Cz = D

Perspective projection transforms view space x and y coordinates to

  x' = \frac{x}{z}, \qquad y' = \frac{y}{z}

Inserting these equations into the original plane equation yields

  A x' z + B y' z + C z = D

which gives us

  \frac{1}{z} = \frac{A}{D} x' + \frac{B}{D} y' + \frac{C}{D}

clearly showing that 1/z is a linear function of screen space x' and y'. This is illustrated quite nicely in this blog post by rendering ddx(z') and ddy(z') as color to the screen. The same holds for other generic attributes like texture coordinates: The GPU cannot directly interpolate u and v, but will interpolate u/z and v/z instead. The attribute value will then be reconstructed per pixel by multiplying by z.
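The derivation can be checked numerically. The following Python sketch walks along an edge between two view space points (the coordinates are arbitrary example values) and compares the true depth at a given screen space position with what linear interpolation of 1/z and of z would give. All function names are made up for this example.

```python
def project(p):
    """Perspective projection of a view space point (x, y, z) to screen space (x', y')."""
    x, y, z = p
    return (x / z, y / z)

def lerp(a, b, t):
    return a + (b - a) * t

# Two view space end points of an edge (hypothetical example values).
p0 = (-1.0, 0.5, 2.0)
p1 = (3.0, -1.0, 10.0)
s0, s1 = project(p0), project(p1)

def true_depth_at_screen_t(t):
    """View space depth of the edge point whose projection lies at screen space
    parameter t between s0 and s1."""
    target = lerp(s0[0], s1[0], t)
    dx, dz = p1[0] - p0[0], p1[2] - p0[2]
    # Solve (x0 + u*dx) / (z0 + u*dz) = target for the view space parameter u.
    u = (target * p0[2] - p0[0]) / (dx - target * dz)
    return lerp(p0[2], p1[2], u)

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    z_true = true_depth_at_screen_t(t)
    z_from_inverse = 1.0 / lerp(1.0 / p0[2], 1.0 / p1[2], t)  # interpolate 1/z
    z_naive = lerp(p0[2], p1[2], t)                           # interpolate z directly
    print(f"t = {t:.2f}  true z = {z_true:.4f}  via 1/z = {z_from_inverse:.4f}  naive = {z_naive:.4f}")
```

Interpolating 1/z in screen space recovers the true depth, while interpolating z directly is only correct at the end points.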

Depth Precision

Now that we have established that the value that ends up in the depth buffer is not the depth but rather something related to 1/z, one might ask what kind of effect this will have on depth precision. After all, 1/z is a highly non-linear function that will significantly warp the original depth values. Check out the graph below: I plotted the resulting z' for the view space depth range z \in \{0,\dots,100\} for different near plane values z_n:

[Figure znwarp2: z' over view space depth z for different near plane values z_n]

Notice how steep the function is on the first couple of meters. Almost the entire interval z'\in\{0,\dots,0.99\} is spent on the first couple of meters.
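To put a number on this, here is a small Python sketch that inverts the projection formula from above and reports at which view space depth z' reaches 0.99. The far plane z_f = 100 and the listed near plane values are assumptions chosen to match the plotted range, not values taken from the original figure.

```python
def projected_depth(z, z_near, z_far):
    """DirectX-style post-projection depth z' for view space depth z."""
    return z_far / (z_far - z_near) * (1.0 - z_near / z)

def view_depth_for(z_proj, z_near, z_far):
    """Inverse mapping: the view space depth that projects to z_proj."""
    return z_near * z_far / (z_far - z_proj * (z_far - z_near))

z_far = 100.0  # assumed far plane
for z_near in (0.1, 0.5, 1.0):
    z_at_99 = view_depth_for(0.99, z_near, z_far)
    print(f"z_n = {z_near}: z' reaches 0.99 at z = {z_at_99:.2f}")
```

The smaller the near plane distance, the earlier z' saturates, which is why moving the near plane out has such a large effect on precision.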

In order to test this result empirically I wrote a small program that will sample the range z \in \{z_n,\dots,z_f\} in regular intervals on the GPU, calculate the depth value z' after projection and write it to some depth buffer of choice. The buffer is then read back to the CPU and view space depth is reconstructed for each sample. This allows us to calculate the error of original depth value vs. reconstructed depth value. Here are the results for the formats DXGI_FORMAT_D16_UNORM and DXGI_FORMAT_D32_FLOAT with the following configuration: z_n = 0.1, z_f = 10000:

[Figure D16U_D32F: reconstruction error for DXGI_FORMAT_D16_UNORM and DXGI_FORMAT_D32_FLOAT]

Note how the error for DXGI_FORMAT_D16_UNORM quickly approaches ridiculous proportions; 16 bit integer depth in combination with a projective transform is definitely a no go! Here’s another plot to illustrate the error of DXGI_FORMAT_D32_FLOAT in more detail:

[Figure D32F: reconstruction error for DXGI_FORMAT_D32_FLOAT]

Much better, though at the extremes we still get an error of over 100 meters. With some care though, this can be greatly reduced: The shape of the hyperbolic z' curve is largely determined by the near plane distance z_n. Even a slight change from z_n=0.1 to z_n=0.25 reduces the maximal error from 1.4\% down to 0.26\%.
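The experiment can be approximated on the CPU. The sketch below is not the original test program: it does the projection in double precision and only simulates the storage step, quantizing z' to a 16 bit unsigned normalized value or rounding it to a 32 bit float, so the absolute numbers will differ from the GPU measurements. The sample depths and all function names are assumptions for illustration.

```python
import struct

def projected_depth(z, z_near, z_far):
    """DirectX-style post-projection depth z' for view space depth z."""
    return z_far / (z_far - z_near) * (1.0 - z_near / z)

def view_depth_for(z_proj, z_near, z_far):
    """Reconstruct view space depth from a stored z' value."""
    return z_near * z_far / (z_far - z_proj * (z_far - z_near))

def store_d16_unorm(value):
    """Quantize a value in [0, 1] to 16 bit unsigned normalized and back."""
    return round(value * 65535.0) / 65535.0

def store_d32_float(value):
    """Round a Python double to the nearest 32 bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]

def reconstruction_error(z, z_near, z_far, store):
    """Absolute error between the original and the reconstructed view space depth."""
    z_proj = store(projected_depth(z, z_near, z_far))
    return abs(view_depth_for(z_proj, z_near, z_far) - z)

z_near, z_far = 0.1, 10000.0
for z in (10.0, 100.0, 1000.0, 5000.0):
    e16 = reconstruction_error(z, z_near, z_far, store_d16_unorm)
    e32 = reconstruction_error(z, z_near, z_far, store_d32_float)
    print(f"z = {z:7.1f}  D16_UNORM error = {e16:10.3f}  D32_FLOAT error = {e32:10.6f}")
```

Even in this idealized setting the 16 bit format is off by a large fraction of the depth value at long distances, while the 32 bit float format stays far closer to the original depth.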

I also tested DXGI_FORMAT_D24_UNORM_S8_UINT but the results were so close to DXGI_FORMAT_D32_FLOAT that I can only conclude that the driver internally maps the depth format to 32 bit float. Not that much of a surprise, this is exactly what the AMD GCN architecture does as well.

Practical Considerations

  • First of all: Make sure that your near plane is as far away from the camera as you can afford. This will flatten the hyperbolic 1/z curve and provide much better depth precision far away from the viewer.
  • Unless you are in some crazy setting with hundreds of kilometers view distance and you are going for sub centimeter depth resolution, DXGI_FORMAT_D32_FLOAT should be good enough and on modern GPUs should come at no additional cost compared to DXGI_FORMAT_D24_UNORM_S8_UINT.
  • DXGI_FORMAT_D16_UNORM isn’t really a choice for projective transforms. It can be quite valuable for orthographic projections though (for example sun shadow maps), reducing bandwidth by half compared to a 32 bit format.

Linear Depth

And if you really, really need linear depth, you can write it via the SV_DEPTH semantic in the pixel shader. Beware though, you’ll lose early Z unless you use one of the conservative variants SV_DepthGreaterEqual or SV_DepthLessEqual. Check out this blog post for more details. In most cases though I would argue that non-linear depth is just fine.


Transform Axis Aligned Bounding Boxes

Axis aligned bounding boxes are quite often used in computer graphics, for example for fast collision detection. Often, you need to transform a given axis aligned bounding box by some matrix and then convert it into an axis aligned bounding box in the resulting coordinate space again. Here’s an efficient algorithm to do so:

public static BoundingBox transformBoundingBox(BoundingBox boundingBox, Matrix m)
{
    var xa = m.Right * boundingBox.Min.X;
    var xb = m.Right * boundingBox.Max.X;

    var ya = m.Up * boundingBox.Min.Y;
    var yb = m.Up * boundingBox.Max.Y;

    var za = m.Backward * boundingBox.Min.Z;
    var zb = m.Backward * boundingBox.Max.Z;

    return new BoundingBox(
        Vector3.Min(xa, xb) + Vector3.Min(ya, yb) + Vector3.Min(za, zb) + m.Translation,
        Vector3.Max(xa, xb) + Vector3.Max(ya, yb) + Vector3.Max(za, zb) + m.Translation
    );
}

Note that Matrix.Right returns the Matrix’s first column vector, i.e. (M_{11} M_{21} M_{31} M_{41})^T and Matrix.Up and Matrix.Backward return the second and third column vectors respectively. So looking at the source code above, all you have to do is multiply the bounding box min and max values in each coordinate direction by the corresponding Matrix column vector and then combine them to generate the result. Why does this work?

Let’s think about the definition of an axis aligned bounding box: It can be described by a center position \mathbf{c} = (c_x c_y c_z)^T and offsets for each coordinate axis \mathbf{r} = (r_x r_y r_z)^T. We can then write the box’s delimiting positions as follows:

\mathbf{B} = \left[ \mathbf{\min} \begin{pmatrix} c_x \pm r_x \\ c_y \pm r_y \\ c_z \pm r_z \end{pmatrix}, \mathbf{\max} \begin{pmatrix} c_x \pm r_x \\ c_y \pm r_y \\ c_z \pm r_z \end{pmatrix} \right] = \left[ \mathbf{c} - \mathbf{r}, \mathbf{c} + \mathbf{r} \right]

Note that the \mathbf{\min} and \mathbf{\max} operators are applied for each dimension independently. Since each dimension’s offset in \mathbf{r} is added and subtracted independently of the other dimensions as well, the \mathbf{\min} and \mathbf{\max} operators range over 2*2*2 = 2^3 = 8 corner positions. So, let’s transform the positions by some matrix \mathbf{M} and then convert the resulting corners into an axis aligned bounding box \mathbf{B}' again by finding the minimum and maximum values in each coordinate direction:

\mathbf{B}' = \left[\mathbf{\min} \bigg(\mathbf{M}\begin{pmatrix} c_x \pm r_x \\ c_y \pm r_y \\ c_z \pm r_z \\ 1 \end{pmatrix} \bigg), \mathbf{\max} \bigg( \mathbf{M}\begin{pmatrix} c_x \pm r_x \\ c_y \pm r_y \\ c_z \pm r_z \\ 1 \end{pmatrix} \bigg) \right]

In my previous post Rant: Matrix Layouts I pointed out that we can split up matrix vector products into a sum over the product of the vector’s components with individual matrix columns. Let’s try this here, just for the \mathbf{\min} vector \mathbf{B}'_{min} of \mathbf{B}' first:

\mathbf{\min} \bigg(\mathbf{M}\begin{pmatrix} c_x \pm r_x \\ c_y \pm r_y \\ c_z \pm r_z \\ 1 \end{pmatrix}\bigg) = \mathbf{\min} \bigg( \mathbf{M}_{|1}(c_x \pm r_x) + \mathbf{M}_{|2}(c_y \pm r_y) + \mathbf{M}_{|3}(c_z \pm r_z) + \mathbf{M}_{|4}\bigg)

Note how something magic happened here: each component of \mathbf{c} and \mathbf{r} appears only once in the resulting equation and we only have two possible values for each component! Now we just need to split up the \mathbf{\min} operator:

\mathbf{B}'_{min} = \mathbf{\min} \bigg( \mathbf{M}_{|1}(c_x \pm r_x) \bigg) + \mathbf{\min} \bigg(\mathbf{M}_{|2}(c_y \pm r_y)\bigg) + \mathbf{\min} \bigg(\mathbf{M}_{|3}(c_z \pm r_z)\bigg) + \mathbf{M}_{|4}

and we are done: We successfully decomposed the minimum over all eight bounding box corners into a component-wise minimum and a sum over all dimensions. Obviously, the same procedure holds for the maximum bounding box extent as well. Have another look at the code above: It performs exactly this computation.
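To convince yourself that the decomposition is valid, you can compare it against the brute-force approach of transforming all eight corners. The Python sketch below does this for one example matrix. It uses the column vector convention of the formulas above with a 3x4 affine matrix stored as three rows of [a, b, c, t]; the matrix, the box and the function names are made-up example values, not part of the XNA code.

```python
from itertools import product

def transform_point(m, p):
    """Apply a 3x4 affine matrix (three rows of [a, b, c, t]) to a point."""
    return tuple(sum(m[r][c] * p[c] for c in range(3)) + m[r][3] for r in range(3))

def transform_aabb(m, box_min, box_max):
    """Column-wise AABB transform: per box axis, take the min/max of
    column * min and column * max, then sum the results."""
    new_min = [m[r][3] for r in range(3)]   # start with the translation column
    new_max = [m[r][3] for r in range(3)]
    for c in range(3):                      # one matrix column per box axis
        for r in range(3):
            a = m[r][c] * box_min[c]
            b = m[r][c] * box_max[c]
            new_min[r] += min(a, b)
            new_max[r] += max(a, b)
    return tuple(new_min), tuple(new_max)

def transform_aabb_brute_force(m, box_min, box_max):
    """Reference: transform all eight corners and take the component-wise extremes."""
    corners = [transform_point(m, c) for c in product(*zip(box_min, box_max))]
    return (tuple(min(c[i] for c in corners) for i in range(3)),
            tuple(max(c[i] for c in corners) for i in range(3)))

m = [[0.5, -2.0, 0.0, 1.0],
     [1.5, 0.25, -1.0, -3.0],
     [0.0, 1.0, 2.0, 0.5]]
box_min, box_max = (-1.0, 0.0, 2.0), (3.0, 1.0, 4.0)

print(transform_aabb(m, box_min, box_max))
print(transform_aabb_brute_force(m, box_min, box_max))
```

Both functions yield the same box, but the column-wise version needs 18 multiplications instead of eight full point transforms.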


Frustum Splits

Recently, I have picked up work on one of my older projects again: Bounce. While being quite an elegant demonstration of collision detection via Binary Space Partitioning Trees, it lacks quite a bit on the visual side. So I decided to add some real-time shadows to the scene. I decided to implement cascaded shadow mapping as this is one of the most widely used approaches for rendering real-time shadows in games. I will post my thoughts on this blog and, of course, share the code. So check back every now and then in case you’re interested!

When it comes to cascaded shadow mapping, the first thing you have to do is split the view frustum into multiple parts. Each split will receive its own shadow map based on the idea that splits closer to the viewer cover less area and hence offer higher shadow map resolution. Check out the image below: The view frustum is partitioned into four splits where each split is rendered in a different color.

But how can you calculate the coordinates of each sub frustum given the current camera? Consider the camera representation used in most games: a view matrix \mathbf{V} which transforms the geometry into view (eye) space and a projection matrix \mathbf{P}, combined with perspective division which projects the resulting view space geometry into clip space. So a world space vertex \mathbf{v} is transformed into its clip space position \mathbf{v}' like so:

  \tilde{\mathbf{v}} = \mathbf{V} \mathbf{P} \mathbf{v} \qquad \mathbf{v}' = \tilde{\mathbf{v}}/\tilde{v}_w \qquad \qquad (1)

In DirectX clip space is defined as \{-1,\dots,1\} \times \{-1,\dots,1\} \times \{0,\dots,1\}. Being a (scaled) unit cube, it’s really easy to split clip space into sub frustums: Simply pick the corners (\pm 1, \pm 1) and the desired split depths (n,f) and the following points define your axis aligned frustum box:

\mathbf{p}_{min} = \begin{pmatrix}-1\\-1\\n\end{pmatrix} \qquad \mathbf{p}_{max} = \begin{pmatrix}1 \\ 1 \\ f \end{pmatrix}

Note that a little care has to be taken when picking the split depth values, as the distribution of z values in clip space is non-linear. Anyway, having seen that it’s easy to define the split frustums in clip space, all we need to do now is convert clip space positions back to world space. This can be done by multiplying the clip space position with the inverse view and projection transforms and subsequently converting the result from homogeneous coordinates to regular three dimensional coordinates:

\tilde{\mathbf{v}} = (\mathbf{V} \mathbf{P})^{-1} \mathbf{v}' \qquad \mathbf{v} = \tilde{\mathbf{v}}/\tilde{v}_w

Let’s see some code! The following function computes the corners of a split frustum in world space, given the distances of the near and far planes in clip space:

public IEnumerable<Vector3> splitFrustum(float clipSpaceNear, float clipSpaceFar, 
                                            Matrix viewProjectionInverse)
{
    var clipCorners = new[]
    {
        new Vector3( -1,  1, clipSpaceNear ),
        new Vector3(  1,  1, clipSpaceNear ), 
        new Vector3(  1, -1, clipSpaceNear ),
        new Vector3( -1, -1, clipSpaceNear ), 
        new Vector3( -1,  1, clipSpaceFar  ),
        new Vector3(  1,  1, clipSpaceFar  ),
        new Vector3(  1, -1, clipSpaceFar  ),
        new Vector3( -1, -1, clipSpaceFar  )
    };

    return clipCorners.Select(v =>
    {
        var vt = Vector4.Transform(v, viewProjectionInverse);
        vt /= vt.W;

        return new Vector3(vt.X, vt.Y, vt.Z);
    });
}

The only downside of this method is that we need to know the values clipSpaceNear and clipSpaceFar of the near and far plane in clip space – usually you only know them in view space. Not much of an issue though, as we can use formula (1) to convert view space depth into clip space.

float[] viewSpaceDepth = {-50.0f, -500.0f};
var clipSpaceDepth = viewSpaceDepth.Select(c =>
{
    var d = Vector4.Transform(new Vector3(0, 0, c), camera.projectionMatrix);
    return d.W != 0 ? d.Z / d.W : 0; 
}).ToArray();   // materialize the sequence so it can be indexed below

Matrix viewProjInverse = Matrix.Invert(camera.viewMatrix * camera.projectionMatrix);
var frustumCorners = splitFrustum(clipSpaceDepth[0], clipSpaceDepth[1], 
                                  viewProjInverse);
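For readers without XNA at hand, here is the same idea as a self-contained Python sketch. It differs from the code above in a few labeled ways: it uses column vectors, a left-handed DirectX-style projection (so view space depths are positive rather than negative), an identity view matrix, and a small hand-written Gauss-Jordan inverse. All names and the camera parameters are assumptions for illustration.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def mat_inverse(m):
    """Gauss-Jordan elimination with partial pivoting on a 4x4 matrix."""
    n = 4
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def perspective(fov_y, aspect, z_near, z_far):
    """Left-handed, DirectX-style projection for column vectors (depth in [0, 1])."""
    h = 1.0 / math.tan(fov_y / 2.0)
    q = z_far / (z_far - z_near)
    return [[h / aspect, 0.0, 0.0, 0.0],
            [0.0, h, 0.0, 0.0],
            [0.0, 0.0, q, -q * z_near],
            [0.0, 0.0, 1.0, 0.0]]

def view_to_clip_depth(proj, z_view):
    """Project a point on the view axis and return its depth after the w divide."""
    x, y, z, w = mat_vec(proj, [0.0, 0.0, z_view, 1.0])
    return z / w

def split_frustum(clip_near, clip_far, view_proj_inverse):
    """Unproject the eight corners of the clip space box back to world space."""
    corners = []
    for z in (clip_near, clip_far):
        for x, y in ((-1, 1), (1, 1), (1, -1), (-1, -1)):
            vx, vy, vz, vw = mat_vec(view_proj_inverse, [x, y, z, 1.0])
            corners.append((vx / vw, vy / vw, vz / vw))
    return corners

proj = perspective(math.radians(90.0), 1.0, 1.0, 1000.0)
view = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # identity camera
inv = mat_inverse(mat_mul(proj, view))
near = view_to_clip_depth(proj, 50.0)
far = view_to_clip_depth(proj, 500.0)
corners = split_frustum(near, far, inv)
for c in corners:
    print(tuple(round(v, 3) for v in c))
```

With a 90 degree field of view and an aspect ratio of one, the frustum half-extent at a given depth equals that depth, which makes the resulting corners easy to verify by eye.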

One of the big advantages of this method is the fact that it works with arbitrary projection transforms, like for example orthographic projections as shown in the image below:



In the 2011 SIGGRAPH presentation Spherical Skinning with Dual-Quaternions and QTangents, Crytek proposed a highly efficient way of representing tangent space per vertex. Instead of storing the basis vectors, they store a quaternion representing the tangent space rotation and reconstruct the basis vectors in the shader. Since this is a simple and straightforward idea, I decided to implement it in the anima skinning sample.

As a first step, the tangent space vectors need to be converted into quaternion form. I did so by assembling a rotation matrix that represents the rotation of an arbitrary world space vector into tangent space:

\mathbf{T} = \left(\begin{array}{l}\mathbf{B}_{binormal} \\ \mathbf{B}_{tangent} \\ \mathbf{B}_{normal} \end{array}\right)

where \mathbf{B}_{binormal} denotes the tangent space binormal vector (in row form!) etc. This matrix can then be converted into a quaternion by one of the standard algorithms. Note, however, that special care has to be taken during this step: First of all, \mathbf{T} might not be a proper rotation matrix. It is reasonable to assume that orthonormality is guaranteed, i.e.

\mathbf{T}\mathbf{T}^T = \mathbf{T}^T \mathbf{T} = \mathbf{I}

for the identity matrix \mathbf{I}, but \mathbf{T} might still encode a reflection. If this is the case, the matrix to quaternion conversion will fail because unit quaternions cannot encode reflections. Fortunately, the matrix can be converted into a regular rotation by simply picking any basis vector and reflecting it as well. In my case I chose to always reflect the \mathbf{B}_{normal} vector. Have a look at the figure below which illustrates the case where \mathbf{B}_{binormal} is reflected.

Note that after the reflection of the normal, handedness is restored again. Of course, reconstructing the resulting quaternion yields a tangent space with a flipped normal vector, so we need to un-flip it first before we can use it in any shading algorithms! Thus, we need to store a flag along with our quaternions that indicates if the normal vector needs to be flipped after reconstruction.

And this is where the smart guys from Crytek scored their points: Realizing that for a given quaternion \mathbf{q} its negation -\mathbf{q} represents the exact same rotation (albeit in opposite direction, but that won’t bother us here) we can enforce non-negativity of any quaternion element without impact on the reconstructed tangent space. But this also means that we can use the sign of that very component to store our reflection flag! This brings our memory requirements for the whole tangent frame down to 4 floating point values. Not bad! The only thing that can go wrong now is when the chosen quaternion element is zero. This should not be an issue in theory because IEEE 754 makes a distinction between +0 and -0, but GPUs don’t always stick to this rule. In this case we can set the value of this component to a very small bias, say 1e-7. Here’s how I implemented the tangent space matrix to QTangent conversion:

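// generate tangent frame rotation matrix
Math::Matrix3x4 tangentFrame(
    binormal.x,    binormal.y,    binormal.z,    0,
    tangent.x,     tangent.y,     tangent.z,     0,
    normal.x,      normal.y,      normal.z,      0
);

// flip y axis in case the tangent frame encodes a reflection
// (the element accessors [row][col] and .x/.y/.z/.w used below are assumed;
// adapt them to your math library)
float scale = tangentFrame.Determinant() < 0 ? 1.0f : -1.0f;

tangentFrame[2][0] *= scale;
tangentFrame[2][1] *= scale;
tangentFrame[2][2] *= scale;

// convert to quaternion
Math::Quaternion tangentFrameQuaternion = tangentFrame;

// make sure we don't end up with 0 as w component
{
    const float threshold = 0.000001f;
    const float renormalization = sqrt( 1.0f - threshold * threshold );

    if( abs( tangentFrameQuaternion.w ) < threshold )
    {
        tangentFrameQuaternion.w = tangentFrameQuaternion.w > 0
                                            ? threshold
                                            : -threshold;
        tangentFrameQuaternion.x *= renormalization;
        tangentFrameQuaternion.y *= renormalization;
        tangentFrameQuaternion.z *= renormalization;
    }
}

// encode reflection into quaternion's w element by making sign of w negative
// if y axis needs to be flipped, positive otherwise
float qs = (scale<0 && tangentFrameQuaternion.w>0.f) || 
           (scale>0 && tangentFrameQuaternion.w<0) ? -1.f : 1.f;

tangentFrameQuaternion.x *= qs;
tangentFrameQuaternion.y *= qs;
tangentFrameQuaternion.z *= qs;
tangentFrameQuaternion.w *= qs;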
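The sign trick can also be looked at in isolation. The following Python sketch is not a port of the C++ code above: it assumes quaternions stored as (x, y, z, w), takes the reflection flag as an explicit argument, and uses made-up function names. It shows that negating a quaternion leaves the reconstructed matrix untouched, so the sign of w is free to carry the flag.

```python
import math

def quat_to_matrix(q):
    """Rows of the rotation matrix for a unit quaternion (x, y, z, w)."""
    x, y, z, w = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def encode_qtangent(q, reflected, threshold=1e-6):
    """Store the reflection flag in the sign of w: negative w means 'flip the normal'."""
    x, y, z, w = q
    if abs(w) < threshold:
        # Push w away from zero so its sign survives, and rescale xyz so the
        # quaternion stays (approximately) unit length.
        scale = math.sqrt(1.0 - threshold * threshold)
        x, y, z, w = x * scale, y * scale, z * scale, threshold
    # q and -q describe the same rotation, so the overall sign can carry the flag.
    if (w < 0) != reflected:
        x, y, z, w = -x, -y, -z, -w
    return (x, y, z, w)

def decode_qtangent(q):
    """Return the rotation matrix and the reflection flag."""
    return quat_to_matrix(q), q[3] < 0

s = math.sqrt(0.5)
q = (0.0, s, 0.0, s)  # 90 degree rotation about the y axis
for reflected in (False, True):
    encoded = encode_qtangent(q, reflected)
    matrix, flag = decode_qtangent(encoded)
    print(encoded, flag, matrix == quat_to_matrix(q))
```

Every matrix entry is built from products of two quaternion components, which is why flipping the sign of all four components cancels out exactly.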

On a side note: As implemented in the code above, reflection properties of a matrix can be detected via the matrix’s determinant, i.e. if a matrix \mathbf{T} encodes a reflection then

det(\mathbf{T}) = -1

holds. In order to obtain the world space tangent frame vectors we need to rotate the tangent frame quaternion into world space first. This can be done by concatenating it with the vertex’s blended bone transform. In the case of dual quaternion bone transforms we simply need to multiply the real part of the bone dual quaternion \hat{\mathbf{q}} = \mathbf{q}_r + \epsilon \mathbf{q}_d with the tangent space quaternion \mathbf{t}

\mathbf{t'} = \mathbf{q}_r * \mathbf{t}

The tangent space vectors can then be reconstructed via a standard quaternion to matrix routine. Note that it is advisable to only reconstruct two of the three basis vectors and recompute the third via a cross product in order to guarantee orthonormality.

float2x3 QuaternionToTangentBitangent( float4 q )
{
    return float2x3(
      1-2*(q.y*q.y + q.z*q.z),  2*(q.x*q.y + q.w*q.z),    2*(q.x*q.z - q.w*q.y),
      2*(q.x*q.y - q.w*q.z),    1-2*(q.x*q.x + q.z*q.z),  2*(q.y*q.z + q.w*q.x)
    );
}

float3x3 GetTangentFrame( float4 worldTransform, TangentFrame tangentFrame )
{
    float4 q = QuaternionMultiply( worldTransform,  tangentFrame.Rotation );
    float2x3 tBt = QuaternionToTangentBitangent( q );

    // third basis vector via cross product, flipped if the stored w is negative
    return float3x3(
        tBt[0],
        tBt[1],
        cross(tBt[0],tBt[1]) * (tangentFrame.Rotation.w < 0 ? -1 : 1)
    );
}
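The reconstruction can be tried out on the CPU as well. The Python sketch below mirrors the shader logic under a few assumptions: quaternions are stored as (x, y, z, w), QuaternionMultiply is taken to be the Hamilton product, and the function names are Python stand-ins rather than the sample’s actual helpers.

```python
import math

def quat_mul(a, b):
    """Hamilton product a * b for quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz)

def tangent_bitangent(q):
    """The same two rows the shader's QuaternionToTangentBitangent returns."""
    x, y, z, w = q
    return ((1 - 2 * (y * y + z * z), 2 * (x * y + w * z), 2 * (x * z - w * y)),
            (2 * (x * y - w * z), 1 - 2 * (x * x + z * z), 2 * (y * z + w * x)))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def get_tangent_frame(world_rotation, qtangent):
    """Rotate the stored frame into world space and rebuild the third vector."""
    q = quat_mul(world_rotation, qtangent)
    t, b = tangent_bitangent(q)
    # The sign of the *stored* w decides whether the normal has to be flipped.
    sign = -1.0 if qtangent[3] < 0 else 1.0
    n = tuple(sign * c for c in cross(t, b))
    return t, b, n

identity = (0.0, 0.0, 0.0, 1.0)
half = math.radians(30.0) / 2.0
world = (0.0, 0.0, math.sin(half), math.cos(half))  # 30 degrees about the z axis

print(get_tangent_frame(identity, identity))
print(get_tangent_frame(world, (0.0, 0.0, 0.0, -1.0)))
```

Because the third vector comes from a cross product of the first two, the frame stays orthogonal even if the interpolated quaternion is slightly denormalized.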