As mentioned in my previous post, cascaded shadow mapping is a technique to shadow large areas at reasonable memory and run-time cost. The basic idea is simple: Split the view frustum into several sub frustums and render a shadow map for each split. Since, naturally, splits closer to the viewer cover less area in world space (i.e. under perspective projection) you get better shadow map resolution close to the camera and less resolution further away. Check out the image below: Each split is visualized in a different color and the corresponding shadow map has been filled with a block pattern. All shadow maps are of the same resolution. You can clearly see how shadow maps closer to the viewer (red and green) allocate much higher detail per world space unit than shadow maps further away from the viewer (blue and yellow). This gives us a highly desired quality: Lots of resolution/details close to the viewer and less far away – without the need for a huge shadow map.

So how can we get this done? First we have to take the same steps as in regular shadow mapping: Define the shadow casting light i.e. light direction and position and the corresponding projection into the shadow map. In my case I chose to simulate sun light which can be represented by a simple orthogonal projection. I create the shadow transform like so:

// shadow view matrix
mShadowView = Matrix.CreateLookAt(mLightPosition, mLightPosition + mLightDirection, Vector3.Up);

// determine shadow projection based on model bounding sphere
{

var min = center - new Vector3(arena.BoundingSphere.Radius);
var max = center + new Vector3(arena.BoundingSphere.Radius);

mShadowProjection = Matrix.CreateOrthographicOffCenter( min.X, max.X, min.Y, max.Y, -max.Z, -min.Z);
}



So the shadow view matrix is a simple look at matrix placed at the position of the light mLightPosition and looking towards mLightDirection. As Up vector I just use the y-axis, which works fine as long as mLightDirection is not pointing straight upwards. Using this definition I can then transform the model’s bounding sphere center into shadow space (i.e. compute the position of the bounding sphere center relative to the light) and get the minimum axis aligned bounding box (in shadow space) encompassing the sphere by just adding and subtracting the bounding sphere radius along each coordinate axis. The resulting vectors min and max give us thus the shadow frustum we need to project into our shadow maps in order to cover the whole model. Note that view space looks along the negative z axis so we need to negate the Z values.

The resulting matrix mShadowTransform gives us thus the projection of world space coordinates into our shadow map. Now, in cascaded shadow mapping we have not one, but multiple shadows maps. We therefore need to define multiple versions of mShadowTransform, one for each shadow split. And we also need to align the splits to the view frustum. So let’s start with the view frustum: Lets say we split the frustum at constant distances from the viewer, say the first split should range from -1 to -100, the second split from -100 to -300 and the third split from -300 to -600. Given these split distances we can use the functionality described in the post Frustum Splits to figure out the world space positions of the sub frustum corners for each split. Once we know the world space positions for each split we’re almost done: We just need to adjust the projection matrix to focus on the the split frustum.

// determine clip space split distances
var splitDistances = new[] {-1, -100.0f, -300.0f, -600 }
.Select(d =>
{
var c = Vector4.Transform(new Vector3(0, 0, d), camera.projectionMatrix);
return c.Z / c.W;
})
.ToArray();

// determine split projections
var splitData = Enumerable.Range(0, mNumShadowSplits).Select(i =>
{
var n = splitDistances[i];
var f = splitDistances[i + 1];

// get frustum split corners and transform into shadow space
var frustumCorners = splitFrustum(n, f)

var min = frustumCorners.Aggregate((v1, v2) => Vector3.Min(v1, v2));
var max = frustumCorners.Aggregate((v1, v2) => Vector3.Max(v1, v2));

// determine the min/max z values based on arena bounding box
var minZ = -arenaBB.Max.Z;
var maxZ = -arenaBB.Min.Z;

// return orthographic projection
return new
{
Distance = f,
Projection = Matrix.CreateOrthographicOffCenter(min.X, max.X, min.Y, max.Y, minZ, maxZ)
};
}).ToArray();

// compute final split transforms


This gives us an array ShadowSplitProjections of matrices where each matrix projects the scene into a shadow map, from the view of the light. Note that I also store the clip space split distances, measured in view space from the camera’s center of projection. Have a look at the images below:

The images on the left show the view frustum split into three sub frustums, while the images on the right show the shadow transform’s frustum for each split.

This concludes the first step of implementing cascaded shadow mapping. In the next post I’ll show how the finally render the shadow maps and how to project them onto the scene. Hope you find it as interesting as I do 🙂

# Transform Axis Aligned Bounding Boxes

Axis aligned bounding boxes are quite often used in computer graphics, for example for fast collision detection. Often, you need to transform a given axis aligned bounding box by some matrix and then convert it into an axis aligned bounding box in the resulting coordinate space again. Here’s an efficient algorithm to do so:

public static BoundingBox transformBoundingBox(BoundingBox boundingBox, Matrix m)
{
var xa = m.Right * boundingBox.Min.X;
var xb = m.Right * boundingBox.Max.X;

var ya = m.Up * boundingBox.Min.Y;
var yb = m.Up * boundingBox.Max.Y;

var za = m.Backward * boundingBox.Min.Z;
var zb = m.Backward * boundingBox.Max.Z;

return new BoundingBox(
Vector3.Min(xa, xb) + Vector3.Min(ya, yb) + Vector3.Min(za, zb) + m.Translation,
Vector3.Max(xa, xb) + Vector3.Max(ya, yb) + Vector3.Max(za, zb) + m.Translation
);


Note that Matrix.Right returns the Matrix’s first column vector, i.e. $(M_{11} M_{21} M_{31} M_{41})^T$ and Matrix.Up and Matrix.Backward return the second and third column vectors respectively. So looking at the source code above, all you have to do is multiply the bounding box min and max values in each coordinate direction by the corresponding Matrix column vector and then combine them to generate the result. Why does this work?

Let’s think about the definition of an axis aligned bounding box: It can be described by a center position $\mathbf{c} = (c_x c_y c_z)^T$ and offsets for each coordinate axis $\mathbf{r} = (r_x r_y r_z)^T$. We can then write the box’s delimiting positions as follows:

$\mathbf{B} = \left[ \mathbf{\min} \begin{pmatrix} c_x \pm r_x \\ c_y \pm r_y \\ c_z \pm r_z \end{pmatrix}, \mathbf{\max} \begin{pmatrix} c_x \pm r_x \\ c_y \pm r_y \\ c_z \pm r_z \end{pmatrix} \right] = \left[ \mathbf{c} - \mathbf{r}, \mathbf{c} + \mathbf{r} \right]$

Note the $\mathbf{\min}$ and $\mathbf{\max}$ operators are applied for each dimension independently. Since each dimension’s offset $\mathbf{r}$ is added and subtracted independently from the other dimensions as well the $\mathbf{\min}$, $\mathbf{\max}$ operators go over $2*2*2 = 2^3 = 8$ corner positions. So, lets transform the positions by some matrix $\mathbf{M}$ and then convert the resulting corners into an axis aligned bounding box $\mathbf{B}'$ again by finding the minimum and maximum values in each coordinate direction:

$\mathbf{B}' = \left[\mathbf{\min} \bigg(\mathbf{M}\begin{pmatrix} c_x \pm r_x \\ c_y \pm r_y \\ c_z \pm r_z \\ 1 \end{pmatrix} \bigg), \mathbf{\max} \bigg( \mathbf{M}\begin{pmatrix} c_x \pm r_x \\ c_y \pm r_y \\ c_z \pm r_z \\ 1 \end{pmatrix} \bigg) \right]$

In my previous post Rant: Matrix Layouts I pointed out that we can split up matrix vector products into a sum over the product of the vector’s components with individual matrix columns. Let’s try this here, just for the $\mathbf{\min}$ vector $\mathbf{B}'_{min}$ of $\mathbf{B}'$ first:

$\mathbf{\min} \bigg(\mathbf{M}\begin{pmatrix} c_x \pm r_x \\ c_y \pm r_y \\ c_z \pm r_z \\ 1 \end{pmatrix}\bigg) = \mathbf{\min} \bigg( \mathbf{M}_{|1}(c_x \pm r_x) + \mathbf{M}_{|2}(c_y \pm r_y) + \mathbf{M}_{|3}(c_z \pm r_z) + \mathbf{M}_{|4}\bigg)$

Note how something magic happened here: each component of $\mathbf{c}$ and $\mathbf{r}$ appears only once in the resulting equation and we only have two possible values for each component! Now we just need to split up the $\mathbf{\min}$ operator:

$\mathbf{B}_{min} = \mathbf{\min} \bigg( \mathbf{M}_{|1}(c_x \pm r_x) \bigg) + \mathbf{\min} \bigg(\mathbf{M}_{|2}(c_y \pm r_y)\bigg) + \mathbf{\min} \bigg(\mathbf{M}_{|3}(c_z \pm r_z)\bigg) + \mathbf{M}_{|4}$

and we are done: We successfully decomposed the minimum over all eight bounding box coordinates into a component-wise minimum and a sum over all dimensions. Obviously, the same procedure holds for the maximum bounding box extent as well. Have another look at the code above again: It performs the very computation mentioned just before.

# Frustum Splits

Recently, I have picked up work on one of my older projects again: Bounce. While being quite an elegant demonstration of collision detection via Binary Space Partitioning Trees, it lacks quite a bit on the visual side. So I decided to add some real-time shadows to the scene. I decided to implement cascaded shadow mapping as this is one of the most widely used approaches for rendering real-time shadows in games. I will post my thoughts on this blog and, of course, share the code. So check back every now and then in case you’re interested!

When it comes to cascaded shadow mapping, the first thing you have to do is split the view frustum into multiple parts. Each split will receive its own shadow map based on the idea that splits closer to the viewer cover less area and hence offer higher shadow map resolution. Check out the image blow: The view frustum is partitioned into four splits where each split is rendered in a different color.

But how can you calculate the coordinates of each sub frustum given the current camera? Consider the camera representation used in most games: a view matrix $\mathbf{V}$ which transforms the geometry into view (eye) space and a projection matrix $\mathbf{P}$, combined with perspective division which projects the resulting view space geometry into clip space. So a world space vertex $\mathbf{v}$ is transformed into it’s clip space position $\mathbf{v}'$ like so:

$\label{test} \tilde{\mathbf{v}} = \mathbf{V} \mathbf{P} \mathbf{v} \qquad \mathbf{v}' = \tilde{\mathbf{v}}/\tilde{v}_w \qquad \qquad (1)$

In DirectX clip space is defined as $\{-1,\dots,1\} \times \{-1,\dots,1\} \times \{0,\dots,1\}$. Being a (scaled) unit cube, it’s really easy to split clip space into sub frustums: Simply pick the corners $(\pm 1, \pm 1)$ and the desired split depths $(n,f)$ and the following points define your axis aligned frustum box:

$\mathbf{p}_{min} = \begin{pmatrix}-1\\-1\\n\end{pmatrix} \qquad \mathbf{p}_{max} = \begin{pmatrix}1 \\ 1 \\ f \end{pmatrix}$

Note that a little care has to be taken when picking the split depth values, as the distribution of z values in clip space is non-linear. Anyway, so having seen that its easy to define the split frustums in clip space, all we need to do now is convert clip space positions back to world space. This can be done by multiplying the clip space position with the inverse view and projection transforms and subsequently converting the result from homogenious coordinates to regular three dimensional coordinates:

$\tilde{\mathbf{v}} = (\mathbf{V} \mathbf{P})^{-1} \mathbf{v}' \qquad \mathbf{v} = \tilde{\mathbf{v}}/\tilde{v}_w$

Let’s see some code! The following function computes the corners of a split frustum in world space, given the distances of the near and far planes in clip space:

public IEnumerable<Vector3> splitFrustum(float clipSpaceNear, float clipSpaceFar,
Matrix viewProjectionInverse)
{
var clipCorners = new[]
{
new Vector3( -1,  1, clipSpaceNear ),
new Vector3(  1,  1, clipSpaceNear ),
new Vector3(  1, -1, clipSpaceNear ),
new Vector3( -1, -1, clipSpaceNear ),
new Vector3( -1,  1, clipSpaceFar  ),
new Vector3(  1,  1, clipSpaceFar  ),
new Vector3(  1, -1, clipSpaceFar  ),
new Vector3( -1, -1, clipSpaceFar  )
};

return clipCorners.Select(v =>
{
var vt = Vector4.Transform(v, viewProjectionInverse);
vt /= vt.W;

return new Vector3(vt.X, vt.Y, vt.Z);
});
}


The only downside of this method is that we need to know the values clipSpaceNear and clipSpaceFar of the near and far plane in clip space – usually you only know them in view space. Not much of an issue though, as we can use formula $(1)$ to convert view space depth into clip space.

float[] viewSpaceDepth = {-50.0f, -500.0f};
var clipSpaceDepth = viewSpaceDepth.Select(c =>
{
var d = Vector4.Transform(new Vector3(0, 0, c), camera.projectionMatrix);
return d.W != 0 ? d.Z / d.W : 0;
}).ToArray();

Matrix viewProjInverse = Matrix.Invert(camera.viewMatrix * camera.projectionMatrix);
var frustumCorners = splitFrustum(clipSpaceDepth[0], clipSpaceDepth[1],
viewProjInverse).ToArray();


One of the big advantages of this method is the fact that it works with arbitrary projection transforms, like for example orthographic projections as shown in the image below: