Frustum Splits

Recently, I have picked up work on one of my older projects again: Bounce. While being quite an elegant demonstration of collision detection via Binary Space Partitioning Trees, it lacks quite a bit on the visual side. So I decided to add some real-time shadows to the scene. I decided to implement cascaded shadow mapping as this is one of the most widely used approaches for rendering real-time shadows in games. I will post my thoughts on this blog and, of course, share the code. So check back every now and then in case you’re interested!

When it comes to cascaded shadow mapping, the first thing you have to do is split the view frustum into multiple parts. Each split will receive its own shadow map based on the idea that splits closer to the viewer cover less area and hence offer higher shadow map resolution. Check out the image blow: The view frustum is partitioned into four splits where each split is rendered in a different color.

But how can you calculate the coordinates of each sub frustum given the current camera? Consider the camera representation used in most games: a view matrix $\mathbf{V}$ which transforms the geometry into view (eye) space and a projection matrix $\mathbf{P}$, combined with perspective division which projects the resulting view space geometry into clip space. So a world space vertex $\mathbf{v}$ is transformed into it’s clip space position $\mathbf{v}'$ like so:

$\label{test} \tilde{\mathbf{v}} = \mathbf{V} \mathbf{P} \mathbf{v} \qquad \mathbf{v}' = \tilde{\mathbf{v}}/\tilde{v}_w \qquad \qquad (1)$

In DirectX clip space is defined as $\{-1,\dots,1\} \times \{-1,\dots,1\} \times \{0,\dots,1\}$. Being a (scaled) unit cube, it’s really easy to split clip space into sub frustums: Simply pick the corners $(\pm 1, \pm 1)$ and the desired split depths $(n,f)$ and the following points define your axis aligned frustum box:

$\mathbf{p}_{min} = \begin{pmatrix}-1\\-1\\n\end{pmatrix} \qquad \mathbf{p}_{max} = \begin{pmatrix}1 \\ 1 \\ f \end{pmatrix}$

Note that a little care has to be taken when picking the split depth values, as the distribution of z values in clip space is non-linear. Anyway, so having seen that its easy to define the split frustums in clip space, all we need to do now is convert clip space positions back to world space. This can be done by multiplying the clip space position with the inverse view and projection transforms and subsequently converting the result from homogenious coordinates to regular three dimensional coordinates:

$\tilde{\mathbf{v}} = (\mathbf{V} \mathbf{P})^{-1} \mathbf{v}' \qquad \mathbf{v} = \tilde{\mathbf{v}}/\tilde{v}_w$

Let’s see some code! The following function computes the corners of a split frustum in world space, given the distances of the near and far planes in clip space:

public IEnumerable<Vector3> splitFrustum(float clipSpaceNear, float clipSpaceFar,
Matrix viewProjectionInverse)
{
var clipCorners = new[]
{
new Vector3( -1,  1, clipSpaceNear ),
new Vector3(  1,  1, clipSpaceNear ),
new Vector3(  1, -1, clipSpaceNear ),
new Vector3( -1, -1, clipSpaceNear ),
new Vector3( -1,  1, clipSpaceFar  ),
new Vector3(  1,  1, clipSpaceFar  ),
new Vector3(  1, -1, clipSpaceFar  ),
new Vector3( -1, -1, clipSpaceFar  )
};

return clipCorners.Select(v =>
{
var vt = Vector4.Transform(v, viewProjectionInverse);
vt /= vt.W;

return new Vector3(vt.X, vt.Y, vt.Z);
});
}


The only downside of this method is that we need to know the values clipSpaceNear and clipSpaceFar of the near and far plane in clip space – usually you only know them in view space. Not much of an issue though, as we can use formula $(1)$ to convert view space depth into clip space.

float[] viewSpaceDepth = {-50.0f, -500.0f};
var clipSpaceDepth = viewSpaceDepth.Select(c =>
{
var d = Vector4.Transform(new Vector3(0, 0, c), camera.projectionMatrix);
return d.W != 0 ? d.Z / d.W : 0;
}).ToArray();

Matrix viewProjInverse = Matrix.Invert(camera.viewMatrix * camera.projectionMatrix);
var frustumCorners = splitFrustum(clipSpaceDepth[0], clipSpaceDepth[1],
viewProjInverse).ToArray();


One of the big advantages of this method is the fact that it works with arbitrary projection transforms, like for example orthographic projections as shown in the image below: