Frustum Splits

Recently, I have picked up work on one of my older projects again: Bounce. While being quite an elegant demonstration of collision detection via Binary Space Partitioning Trees, it lacks quite a bit on the visual side. So I decided to add some real-time shadows to the scene. I decided to implement cascaded shadow mapping as this is one of the most widely used approaches for rendering real-time shadows in games. I will post my thoughts on this blog and, of course, share the code. So check back every now and then in case you’re interested!

When it comes to cascaded shadow mapping, the first thing you have to do is split the view frustum into multiple parts. Each split will receive its own shadow map based on the idea that splits closer to the viewer cover less area and hence offer higher shadow map resolution. Check out the image blow: The view frustum is partitioned into four splits where each split is rendered in a different color.

But how can you calculate the coordinates of each sub frustum given the current camera? Consider the camera representation used in most games: a view matrix \mathbf{V} which transforms the geometry into view (eye) space and a projection matrix \mathbf{P}, combined with perspective division which projects the resulting view space geometry into clip space. So a world space vertex \mathbf{v} is transformed into it’s clip space position \mathbf{v}' like so:

  \label{test} \tilde{\mathbf{v}} = \mathbf{V} \mathbf{P} \mathbf{v} \qquad \mathbf{v}' = \tilde{\mathbf{v}}/\tilde{v}_w \qquad \qquad (1)

In DirectX clip space is defined as \{-1,\dots,1\} \times \{-1,\dots,1\} \times \{0,\dots,1\}. Being a (scaled) unit cube, it’s really easy to split clip space into sub frustums: Simply pick the corners (\pm 1, \pm 1) and the desired split depths (n,f) and the following points define your axis aligned frustum box:

\mathbf{p}_{min} = \begin{pmatrix}-1\\-1\\n\end{pmatrix} \qquad \mathbf{p}_{max} = \begin{pmatrix}1 \\ 1 \\ f \end{pmatrix}

Note that a little care has to be taken when picking the split depth values, as the distribution of z values in clip space is non-linear. Anyway, so having seen that its easy to define the split frustums in clip space, all we need to do now is convert clip space positions back to world space. This can be done by multiplying the clip space position with the inverse view and projection transforms and subsequently converting the result from homogenious coordinates to regular three dimensional coordinates:

\tilde{\mathbf{v}} = (\mathbf{V} \mathbf{P})^{-1} \mathbf{v}' \qquad \mathbf{v} = \tilde{\mathbf{v}}/\tilde{v}_w

Let’s see some code! The following function computes the corners of a split frustum in world space, given the distances of the near and far planes in clip space:

public IEnumerable<Vector3> splitFrustum(float clipSpaceNear, float clipSpaceFar, 
                                            Matrix viewProjectionInverse)
    var clipCorners = new[]
        new Vector3( -1,  1, clipSpaceNear ),
        new Vector3(  1,  1, clipSpaceNear ), 
        new Vector3(  1, -1, clipSpaceNear ),
        new Vector3( -1, -1, clipSpaceNear ), 
        new Vector3( -1,  1, clipSpaceFar  ),
        new Vector3(  1,  1, clipSpaceFar  ),
        new Vector3(  1, -1, clipSpaceFar  ),
        new Vector3( -1, -1, clipSpaceFar  )

    return clipCorners.Select(v =>
        var vt = Vector4.Transform(v, viewProjectionInverse);
        vt /= vt.W;

        return new Vector3(vt.X, vt.Y, vt.Z);

The only downside of this method is that we need to know the values clipSpaceNear and clipSpaceFar of the near and far plane in clip space – usually you only know them in view space. Not much of an issue though, as we can use formula (1) to convert view space depth into clip space.

float[] viewSpaceDepth = {-50.0f, -500.0f};
var clipSpaceDepth = viewSpaceDepth.Select(c =>
    var d = Vector4.Transform(new Vector3(0, 0, c), camera.projectionMatrix);
    return d.W != 0 ? d.Z / d.W : 0; 

Matrix viewProjInverse = Matrix.Invert(camera.viewMatrix * camera.projectionMatrix);
var frustumCorners = splitFrustum(clipSpaceDepth[0], clipSpaceDepth[1], 

One of the big advantages of this method is the fact that it works with arbitrary projection transforms, like for example orthographic projections as shown in the image below:



Published in the 2011 Siggraph Presentation Spherical Skinning with Dual-Quaternions and QTangents, Crytek proposed a highly efficient way of representing tangent space per vertex. Instead of storing the basis vectors, they store a quaternion representing the tangent space rotation and reconstruct the basis vectors in the shader. Being a simple and straight forward idea, I decided to implement this approach in the anima skinning sample.

As a first step, the tangent space vectors need to be converted into quaternion form. I did so by assembling a rotation matrix that represents the rotation of an arbitrary world space vector into tangent space:

\mathbf{T} = \left(\begin{array}{l}\mathbf{B}_{binormal} \\ \mathbf{B}_{tangent} \\ \mathbf{B}_{normal} \end{array}\right)

where \mathbf{B}_{binormal} denotes the tangent space binormal vector (in row form!) etc. This matrix can then be converted into a quaternion by one of the standard algorithms. Note, however, that special care has to be taken during this step: First of all, \mathbf{T} might not be a proper rotation matrix. It is reasonable to assume that orthonormality is guaranteed, i.e.

\mathbf{T}\mathbf{T}^T = \mathbf{T}^T \mathbf{T} = \mathbf{I}

for the identity matrix \mathbf{I}, but \mathbf{T} might still encode a reflection. If this is the case the matrix to quaternion will fail because unit quaternions cannot encode reflections. Fortunately, the matrix can be converted into a regular rotation by simply picking any basis vector and reflecting it as well. In my case I chose to always reflect the \mathbf{B}_{normal} vector. Have a look at the figure blow which illustrates the case where \mathbf{B}_{binormal} is reflected.

Note that after the reflection of the normal, handedness is restored again. Of course, reconstructing the resulting quaternion yields a tangent space with a flipped normal vector, so we need to un-flip it first before we can use it in any shading algorithms! Thus, we need to store a flag along with our quaternions that indicates if the normal vector need to be flipped after reconstruction. And this is where the smart guys from Crytek scored their points: Realizing that for a given quaternion \mathbf{q} it’s negate -\mathbf{q} represents the exact same rotation (albeit in opposite direction, but that won’t bother us here) we can enforce non negativity of any quaternion element without impact on the reconstructed tangent space. But this also means that we can use the sign of that very component to store our reflection flag! This brings our memory requirements for the whole tangent frame down to 4 floating point values. Not bad! The only thing that can go wrong now is when the chosen quaternion element is zero. Should not be an issue in theory because IEEE 754 makes a distinction between +0 and -0 but GPUs don’t always stick to this rule. In this case we can set the value of this component to a very small bias, say 1e-7. Here’s how I implemented the tangent space matrix to QTangent conversion:

// generate tangent frame rotation matrix
Math::Matrix3x4 tangentFrame(
    binormal.x,    binormal.y,    binormal.z,    0,
    tangent.x,     tangent.y,     tangent.z,     0,
    normal.x,      normal.y,      normal.z,      0

// flip y axis in case the tangent frame encodes a reflection
float scale = tangentFrame.Determinant() < 0 ? 1.0f : -1.0f;[2][0] *= scale;[2][1] *= scale;[2][2] *= scale;

// convert to quaternion
Math::Quaternion tangentFrameQuaternion = tangentFrame;

// make sure we don't end up with 0 as w component
    const float threshold = 0.000001f;
    const float renomalization = sqrt( 1.0f - threshold * threshold );

    if( abs(     { = > 0
                                            ? threshold
                                            : -threshold; *= renomalization; *= renomalization; *= renomalization;

// encode reflection into quaternion's w element by making sign of w negative
// if y axis needs to be flipped, positive otherwise
float qs = (scale<0 &&>0.f) || 
           (scale>0 &&<0) ? -1.f : 1.f; *= qs; *= qs; *= qs; *= qs;

On a side note: As implemented in the code above, reflection properties of a matrix can be detected via the matrix’s determinant, i.e. if a matrix \mathbf{T} encodes a reflection then

det(\mathbf{T}) = -1

holds. In order to obtain the world space tangent frame vectors we need to rotate the tangent frame quaternion into world space first. This can be done by concatenating it with the vertex’s blended bone transform. In the case of dual quaternion bone transforms we simply need to multiply the real part of the bone dual quaternion \hat{\mathbf{q}} = \mathbf{q}_r + \epsilon \mathbf{q}_d with the tangent space quaternion \mathbf{t}

\mathbf{t'} = \mathbf{q}_r * \mathbf{t}

The tangent space vectors can then be reconstructed via a standard quaternion to matrix routine. Note that it is advisable to only reconstruct two of the three basis vectors and recompute the third via a cross product in order to guarantee orthonormality.

float2x3 QuaternionToTangentBitangent( float4 q )
    return float2x3(
      1-2*(q.y*q.y + q.z*q.z),  2*(q.x*q.y + q.w*q.z),    2*(q.x*q.z - q.w*q.y),
      2*(q.x*q.y - q.w*q.z),    1-2*(q.x*q.x + q.z*q.z),  2*(q.y*q.z + q.w*q.x)

float3x3 GetTangentFrame( float4 worldTransform, TangentFrame tangentFrame )
    float4 q = QuaternionMultiply( worldTransform,  tangentFrame.Rotation );
    float2x3 tBt = QuaternionToTangentBitangent( q );

    return float3x3(
        cross(tBt[0],tBt[1]) * (tangentFrame.Rotation.w < 0 ? -1 : 1)


Normal Mapping

Normal mapping is a highly popular technique in video game development and real time rendering applications where the polygon count needs to be kept as low as possible. It allows us, at reasonable cost, to add high frequency surface detail to arbitrary surfaces of low polygon count. The basic idea is simple: Encode the extra detail in an additional texture and use during shading.

In the case of normal mapping we simulate additional surface detail by storing the orientation of the surface normal at each texel. This gives us much more fine grained normal information than vertex normals as the texture covers the model in a much more fine-grained fashion. The straight forward way to accomplish this would be to simply store the exact surface normal at each texel. At run-time all we’d have to do is sample the texture in the fragment shader and replace the interpolated normal with the sampled one. This approach is usually called Object Space Normal Mapping as the normals are defined relative to the whole object. While being simple and efficient, this approach has some significant draw-backs: We cannot animate the vertices of the model as this would change their world space orientation and we cannot reuse the same map on mirrored geometry. As a result most games and applications use Tangent Space Normal Mapping where, as the name implies, the normals are specified relative the the object’s local tangent space.

Have a look at the image above: We define local tangent space by the orthogonal basis vectors ‘normal’, ‘tangent’ and ‘binormal’. Relative to this coordinate system, we can represent any other normal vector (in red the image above) by a linear combination of the three basis vectors:

\mathbf{n}' = b \mathbf{B}_{binormal} + t \mathbf{B}_{tangent} + n \mathbf{B}_{normal}

where \mathbf{B}_{binormal} denotes the basis vector called ‘binormal’, etc. We will thus store in our normal map the coefficients of the linear combination: \{b, t, n\} and evaluate the equation above in the fragment shader. Though, where will we get the basis vectors from? Usually the tangent space basis vectors are computed during model export or import. The standard procedure does so by aligning tangent and binormal with the direction of the UV coordinates and computing the normal vector by a cross product of tangent and binormal. Details can be found in various publications. Once the tangent space basis vectors are known for each vertex, they are passed to the vertex shader. The vertex shader transforms the vectors into world space according to it’s associated world space transform and then passes them on to the fragment shader. So.. Let’s have a look at some code:

In order to get normal, tangent and binormal data into the DirectX vertex buffer I added converters for each vector to the mesh importer:

if( mesh->HasTangentsAndBitangents() )
    converters.push_back( new aiVector3DConverter( D3DDECLUSAGE_TEXCOORD, 5, 
        offset, mesh->mNormals, vertexCount ) );
    offset += converters.back()->Size();

    converters.push_back( new aiVector3DConverter( D3DDECLUSAGE_TEXCOORD, 6, 
        offset, mesh->mTangents, vertexCount ) );
    offset += converters.back()->Size();

    converters.push_back( new aiVector3DConverter( D3DDECLUSAGE_TEXCOORD, 7, 
        offset, mesh->mBitangents, vertexCount ) );
    offset += converters.back()->Size();

Note that they are sent to the GPU via the TEXCOORD5TEXCOORD7 semantics. The vertex shader input needed to reflect this fact, so I extended the VertexShaderInput struct with the definition of tangent frame data:

struct TangentFrame
    float3 Normal     : TEXCOORD5;
    float3 Tangent    : TEXCOORD6;
    float3 Binormal   : TEXCOORD7;

struct VertexShaderInput
    float4 Position                    : POSITION;
    TangentFrame TangentFrameData;
    float2 TexCoord                    : TEXCOORD0;
    float4 BlendWeights                : BLENDWEIGHT0;
    uint4 BlendIndices                 : BLENDINDICES0;

The Vertex shader itself now needs to transform the tangent space vectors into world space and pass them on to the pixel shader:

struct VertexShaderOutput
    float4 Position                    : POSITION0;
    float2 TexCoord                    : TEXCOORD0;
    float3x3 TangentFrame              : TEXCOORD2;

VertexShaderOutput Model_VS( VertexShaderInput input )
    VertexShaderOutput result;
    result.TexCoord = input.TexCoord;

    float3x4 blendedTransform =
        BoneTransforms[input.BlendIndices.x] * input.BlendWeights.x +
        BoneTransforms[input.BlendIndices.y] * input.BlendWeights.y +
        BoneTransforms[input.BlendIndices.z] * input.BlendWeights.z +
        BoneTransforms[input.BlendIndices.w] * input.BlendWeights.w;

    // position
        float4 posH = float4(, 1.f );
        float4 blendedPosition = mul( posH, blendedTransform );

        result.Position =  mul( blendedPosition, ViewProjection );

    // tangent space
        float3x3 tf = float3x3(

        result.TangentFrame = mul( tf, transpose((float3x3)blendedTransform) );

    return result;

The tangent space vectors, now in world space, are then interpolated across the current polygon and then passed on to the fragment shader. In the fragment shader we sample the normal map and compute the surface normal according to the equation above. Beware of an implementation detail: Depending on the texture format, individual texels store values in the range of \{0,\dots,1\}. Our tangent space coefficients, however, can range from \{-1\dots1\}. So we need to remap the texel values before plugging them into the equation above:

float4 Model_PS( VertexShaderOutput input ) : COLOR0
    float3 textureNormal = tex2D( NormalSampler, input.TexCoord ).xyz*2-1;
    float3 normal = normalize( mul( textureNormal, input.TangentFrame ) );

    const float ambient = 0.75f;
    float diffuseFactor = 0.25f;

    float diffuse = dot(normal, );
    float4 textureColor = tex2D( DiffuseSampler, input.TexCoord );

    return textureColor * (ambient + diffuse * diffuseFactor);

Lets look at the result in pictures: Below you can see the albedo texture for frank’s shoes and the corresponding normal map. 
Next you can see the frank model with a simple lambert shader with and without normal map. You can clearly make out the additional detail the normal map adds to the model.


Dual Quaternion Skinning

In the last two posts we’ve reviewed skeletal animation and smooth skinning based on matrices (which, by incidence, is usually called linear blend skinning), so now we’re ready to look at an alternative: dual quaternion skinning. Linear blend skinning has some serious disadvantages that come down to one simple truth: The weighted average W' of all bone transforms that influence one vertex

\mathbf{v}' = \sum_i w_i \mathbf{W}_i(\mathbf{v}) = \underbrace{(\sum_i w_i \mathbf{W}_i )}_{\mathbf{W}'}(\mathbf{v}) = \mathbf{W}'(\mathbf{v})

is simply not a satisfying method of blending the transforms of multiple bones. I am going to discuss the issues in more detail in another post later, but let’s just say for now that the linear blending of rotation matrices does result in a number of undesired artifacts.

Dual Quaternions provide a solution for some of these artifacts, as the interpolation of rotations also blends the rotation centers. Le me point you to the following paper, which provides some more in-depth explanation as well as an upper error bound on the accuracy of rotation interpolation.

Without going into too much detail, let me point out that dual quaternions are an extension to regular quaternions, where two quaternions \mathbf{q}_r and \mathbf{q}_d are concatenated via the dual element \epsilon:

\hat{\mathbf{q}} = \mathbf{q}_r + \epsilon \mathbf{q}_d, \quad \epsilon^2 = 0

What makes dual quaternions so interesting is the fact that each unit dual quaternion can be interpreted as a rigid body tranform representing a rotation and a translation. I’m going to spare you with the formulas for addition, multiplication, inverse etc of dual quaternions here but lets just say that knowing the fundamental property of \epsilon^2 = 0 we can easily derive them ourselves. As the paper also points out, the concatenation of two rigid body transforms represented by unit dual quaternions is accomplished by multiplying both dual quaternions. The weighted blending of multiple unit dual quaternions corresponds to an approximate interpolation of the corresponding transforms. Perfect circumstances for skinning!!

So lets get our hands dirty and dive into some code: In principle, all I am doing is replace the current code representation of a joint transform (4×3 Matrix) by a Dual Quaternion. Since I wanted to keep the linear blend skinning code path and I wanted to avoid copy/pasting lots of code, I decided to templetize the shared code, making the joint transform type a variable parameter.So as a first step I updated the skeleton class:

template<class BoneTransform>
class SkeletonGeneric
    std::vector<int> mParents;
    std::vector<BoneTransform> mTransforms;
    std::vector<BoneTransform> mBindingTransforms;

    virtual BoneTransform Concatenate( const BoneTransform& first, 
                                       const BoneTransform& second ) const
        return second * first;


Note the new function Concatenate(), which is from now on used everywhere where two bone transforms need to be concatenated. I provide a default implementation that will just apply the multiplication operator: BoneTransform::operator*, but this definition can be easily overridden to do more specialized stuff.

Next, I defined a dual quaternion class and implemented all the basic dual quaternion math. Have a look at math.cpp in case you’re interested. The dual quaternion class also posesses a couple of specialized constructors that allow conversion of animation data (translation, rotation pairs) and binding transform data (4×3 matrix) into dual quaternion form. I also had to update the renderer and the UI in order to support switching between linear blend skinning and dual quaternion skinning on keypress, but I am going to leave out these boring details.

These steps concluded the CPU implementation: evaluating animation channels, computing joint world space transforms and uploading the transform data to the GPU work just the same as before. The only difference being that the amount of data that needs to be sent to the GPU every frame is quite a bit smaller: 8 float values instead of 12 per joint.

On the GPU side I needed to update the code for blending the bone transforms and the transformation of the vertex position and normal to work with dual quaternions. Blending is quite simple: Compute a weighted average of the bone transforms and renormalize:

float2x4 GetBlendedDualQuaternion( uint4 boneIndices, float4 boneWeights )
    float2x4 dq0 = GetBoneDualQuaternion( boneIndices.x );
    float2x4 dq1 = GetBoneDualQuaternion( boneIndices.y );
    float2x4 dq2 = GetBoneDualQuaternion( boneIndices.z );
    float2x4 dq3 = GetBoneDualQuaternion( boneIndices.w );

    float2x4 blendedDQ =
        dq0 * boneWeights.x +
        dq1 * boneWeights.y +
        dq2 * boneWeights.z +
        dq3 * boneWeights.w;

    float normDQ = length(blendedDQ[0]);
    return blendedDQ / normDQ;

Simliar to regular quaternions, a vertex is transformed via the ‘sandwich’ type operation:

\mathbf{\hat{v}}' = \mathbf{\hat{q}}\mathbf{\hat{v}}\overline{\mathbf{\hat{q}}}^*

where \hat{\mathbf{v}} is the vertex in dual quaternion form (just a translation) and \overline{\hat{q}}^* is the conjugate of q (well actually both types of conjugates that exist for dual quaternions). As shown in the paper, this operation can be simplified to

\mathbf{v}' = \mathbf{v} + 2 \overrightarrow{\mathbf{q}}_r \times ( \overrightarrow{\mathbf{q}}_r \times \mathbf{v} + \dot{q}_r \mathbf{v} ) + 2 [ \dot{q}_r \overrightarrow{\mathbf{q}}_d - \dot{q}_d \overrightarrow{\mathbf{q}}_r + \overrightarrow{\mathbf{q}}_r \times \overrightarrow{\mathbf{q}}_d ]

where \overrightarrow{\mathbf{q}} denotes the ‘imaginary’ or ‘vector’ part of quaternion \mathbf{q} and \dot{q} denotes the scalar part. Assuming that our dual quaternions only ever perform rotations and translations, the transform for the normal can be written as

\mathbf{n}' = \mathbf{n} + 2 \overrightarrow{\mathbf{q}}_r \times ( \overrightarrow{\mathbf{q}}_r \times \mathbf{n} + \dot{q}_r \mathbf{n} )

which corresponds to only evaluating the rotational part of the dual quaternion. So given these definitions I can now present the relevant parts of the vertex shader:

float3 transformPositionDQ( float3 position, float4 realDQ, float4 dualDQ )
    return position +
        2 * cross(, cross(, position) + realDQ.w*position ) +
        2 * (realDQ.w * - dualDQ.w * + 

float3 transformNormalDQ( float3 normal, float4 realDQ, float4 dualDQ )
    return normal + 2.0 * cross(, cross(, normal ) + 
                          realDQ.w * normal );

VertexShaderOutput DualQuaternionSkinning_VS( VertexShaderInput input )
    VertexShaderOutput result;
    result.TexCoord = input.TexCoord;
    result.Color = float4( 0.75, 0.75, 0.75, 1 );

    // blend bone DQs
    float2x4 blendedDQ
        = GetBlendedDualQuaternion( input.BlendIndices, input.BlendWeights );

    // transform position into clip space
    float3 blendedPosition = transformPositionDQ(, 
        blendedDQ[0], blendedDQ[1] );

    result.Position = mul( float4(, 1), ViewProjection );

    // transform normal vector into world space
    result.Normal = transformNormalDQ( input.Normal, blendedDQ[0], blendedDQ[1] );
    return result;


Skinning Reviewed

Last time we’ve reviewed skeletal animation. Lets now get into how we can deform a mesh based on that skeleton. What we want to achieve is that each mesh’s vertex moves along when the skeleton is animated. So the simplest idea is attach each vertex to one bone and transform the vertex’s position by that bone’s world transform. I.e the transformed position of a vertex v, attached to bone i with world space transform W_i is

\mathbf{v}' = \mathbf{W}_i( \mathbf{v} )

Remember though: Joint transforms always expect data relative to the joint’s local coordinate space. This means that we need to store the mesh’s vertices in the local coordinate system of their associated joint! Not a great restriction because our modeling tool can do that for us during export, but that also means that we cannot render the geometry without the skeleton.

Now, attaching each vertex to one bone has some serious limitations: It’s simply not a realistic model for most common materials. This is especially true for materials like skin or cloth. Consider human skin in joint areas: it stretches and wrinkles all over the place. So let’s make our model a little bit more complicated: Let’s allow each vertex to be influenced by more than one bone. Let us thus, for each vertex, define a list of bones \{\mathbf{W}_1,\dots,\mathbf{W}_N\} the vertex is attached to and a weight factor w_i, i \in \{1,\dots,N\} for each bone in that list. Provided that the weight factors are normalized \sum_i w_i = 1, we can then simply transform the vertex once by each bone and sum up the result, weighted by the weight factor

\mathbf{v}' = \sum_i w_i \mathbf{W}_i( \mathbf{v} )

Beware though: Aren’t we forgetting something? Right, joint transforms expect their data relative to their local coordinate system. But how can we accomplish that, given that each vertex can only be defined relative to one coordinates system? Of course, we could store N copies of the mesh but that would be a huge waste of memory. So let’s instead store our vertices in world space and transform them into joint local space whenever required. This can be accomplished by adding yet another transformation for each bone, the so called Binding Transform \mathbf{B}_i. This yields

\mathbf{v}' = \sum_i w_i \mathbf{W}_i * \mathbf{B}_i( v )

You might ask yourself: How the heck am I going to come up with these binding transforms? Well, the answer is simple: The binding transforms are just a snapshot of the skeleton’s world space joint transform at the time of binding. So whenever the artist poses the model and then binds the skin, the world space joint transforms are recorded for each bone and stored. Note that since a bone’s world space transform \mathbf{W}_i maps data from joint local space to world space, the corresponding binding transform is actually the inverse of \mathbf{W}_i

\mathbf{B}_i = \mathbf{W}_i^{-1}

Let’s look at some code now, hopefully this will help make things more clear. First we need to extend the existing skeleton class to store the binding transform for each bone.

class Skeleton
    std::vector<aiMatrix4x4> mParents;
    std::vector<aiMatrix4x4> mBindingTransforms;
    std::vector<int> mTransforms;


Now we can modify the getWorldTransform method to take into account the binding transform

aiMatrix4x4 Skeleton::getWorldTransform( int bone ) const
    int p = mParents[bone];
    aiMatrix4x4 result = mTransforms[bone] * mBindingTransforms[bone];

    while( p >= 0 )
        result = mTransforms[p] * result;
        p = mParents[p];

    return result;

Next, each vertex needs to know the index of the bones it is attached to, as well as the weighting factors. So I extended the vertex declaration to include a UBYTE4 vector storing up to four bone indices per vertex and a FLOAT4 vector storing the corresponding weight factors. In fact, I created a new type of data converter, which computes both values at once. Currently I am passing the bone indices and weights via the D3DDECLUSAGE_BLENDINDICES and D3DDECLUSAGE_BLENDWEIGHT semantics to the vertex shader. The bone matrices themselves are passed to the shader via an array of vertex shader constants. The vertex shader now looks like this (unimportant parts stripped for the sake of clarity):

float4 LightDirection : LIGHTDIRECTION;
float4x4 BoneTransforms[16] : BONETRANSFORMS;

struct VertexShaderInput
    float4 Position            : POSITION;
    float3 Normal              : NORMAL;
    float2 TexCoord            : TEXCOORD0;
    float4 BlendWeights        : BLENDWEIGHT0;
    uint4 BlendIndices         : BLENDINDICES0;

VertexShaderOutput Model_VS( VertexShaderInput input )
    VertexShaderOutput result;
    result.Normal =;
    result.TexCoord = input.TexCoord;
    result.Color = input.BlendWeights;

    float4 posH = float4(, 1.f );
    float4 blendedPosition =
        mul( posH, BoneTransforms[input.BlendIndices.x] ) * 
            input.BlendWeights.x + 
        mul( posH, BoneTransforms[input.BlendIndices.y] ) * 
            input.BlendWeights.y + 
        mul( posH, BoneTransforms[input.BlendIndices.z] ) * 
            input.BlendWeights.z + 
        mul( posH, BoneTransforms[input.BlendIndices.w] ) *

    result.Position =  mul( blendedPosition, ViewProjection );
    return result;