# Dual Quaternion Skinning

In the last two posts we’ve reviewed skeletal animation and smooth skinning based on matrices (which, by incidence, is usually called linear blend skinning), so now we’re ready to look at an alternative: dual quaternion skinning. Linear blend skinning has some serious disadvantages that come down to one simple truth: The weighted average $W'$ of all bone transforms that influence one vertex

$\mathbf{v}' = \sum_i w_i \mathbf{W}_i(\mathbf{v}) = \underbrace{(\sum_i w_i \mathbf{W}_i )}_{\mathbf{W}'}(\mathbf{v}) = \mathbf{W}'(\mathbf{v})$

is simply not a satisfying method of blending the transforms of multiple bones. I am going to discuss the issues in more detail in another post later, but let’s just say for now that the linear blending of rotation matrices does result in a number of undesired artifacts.

Dual Quaternions provide a solution for some of these artifacts, as the interpolation of rotations also blends the rotation centers. Le me point you to the following paper, which provides some more in-depth explanation as well as an upper error bound on the accuracy of rotation interpolation.

Without going into too much detail, let me point out that dual quaternions are an extension to regular quaternions, where two quaternions $\mathbf{q}_r$ and $\mathbf{q}_d$ are concatenated via the dual element $\epsilon$:

$\hat{\mathbf{q}} = \mathbf{q}_r + \epsilon \mathbf{q}_d, \quad \epsilon^2 = 0$

What makes dual quaternions so interesting is the fact that each unit dual quaternion can be interpreted as a rigid body tranform representing a rotation and a translation. I’m going to spare you with the formulas for addition, multiplication, inverse etc of dual quaternions here but lets just say that knowing the fundamental property of $\epsilon^2 = 0$ we can easily derive them ourselves. As the paper also points out, the concatenation of two rigid body transforms represented by unit dual quaternions is accomplished by multiplying both dual quaternions. The weighted blending of multiple unit dual quaternions corresponds to an approximate interpolation of the corresponding transforms. Perfect circumstances for skinning!!

So lets get our hands dirty and dive into some code: In principle, all I am doing is replace the current code representation of a joint transform (4×3 Matrix) by a Dual Quaternion. Since I wanted to keep the linear blend skinning code path and I wanted to avoid copy/pasting lots of code, I decided to templetize the shared code, making the joint transform type a variable parameter.So as a first step I updated the skeleton class:

template<class BoneTransform>
class SkeletonGeneric
{
std::vector<int> mParents;
std::vector<BoneTransform> mTransforms;
std::vector<BoneTransform> mBindingTransforms;

virtual BoneTransform Concatenate( const BoneTransform& first,
const BoneTransform& second ) const
{
return second * first;
}

...
};


Note the new function Concatenate(), which is from now on used everywhere where two bone transforms need to be concatenated. I provide a default implementation that will just apply the multiplication operator: BoneTransform::operator*, but this definition can be easily overridden to do more specialized stuff.

Next, I defined a dual quaternion class and implemented all the basic dual quaternion math. Have a look at math.cpp in case you’re interested. The dual quaternion class also posesses a couple of specialized constructors that allow conversion of animation data (translation, rotation pairs) and binding transform data (4×3 matrix) into dual quaternion form. I also had to update the renderer and the UI in order to support switching between linear blend skinning and dual quaternion skinning on keypress, but I am going to leave out these boring details.

These steps concluded the CPU implementation: evaluating animation channels, computing joint world space transforms and uploading the transform data to the GPU work just the same as before. The only difference being that the amount of data that needs to be sent to the GPU every frame is quite a bit smaller: 8 float values instead of 12 per joint.

On the GPU side I needed to update the code for blending the bone transforms and the transformation of the vertex position and normal to work with dual quaternions. Blending is quite simple: Compute a weighted average of the bone transforms and renormalize:

float2x4 GetBlendedDualQuaternion( uint4 boneIndices, float4 boneWeights )
{
float2x4 dq0 = GetBoneDualQuaternion( boneIndices.x );
float2x4 dq1 = GetBoneDualQuaternion( boneIndices.y );
float2x4 dq2 = GetBoneDualQuaternion( boneIndices.z );
float2x4 dq3 = GetBoneDualQuaternion( boneIndices.w );

float2x4 blendedDQ =
dq0 * boneWeights.x +
dq1 * boneWeights.y +
dq2 * boneWeights.z +
dq3 * boneWeights.w;

float normDQ = length(blendedDQ[0]);
return blendedDQ / normDQ;
}


Simliar to regular quaternions, a vertex is transformed via the ‘sandwich’ type operation:

$\mathbf{\hat{v}}' = \mathbf{\hat{q}}\mathbf{\hat{v}}\overline{\mathbf{\hat{q}}}^*$

where $\hat{\mathbf{v}}$ is the vertex in dual quaternion form (just a translation) and $\overline{\hat{q}}^*$ is the conjugate of $q$ (well actually both types of conjugates that exist for dual quaternions). As shown in the paper, this operation can be simplified to

$\mathbf{v}' = \mathbf{v} + 2 \overrightarrow{\mathbf{q}}_r \times ( \overrightarrow{\mathbf{q}}_r \times \mathbf{v} + \dot{q}_r \mathbf{v} ) + 2 [ \dot{q}_r \overrightarrow{\mathbf{q}}_d - \dot{q}_d \overrightarrow{\mathbf{q}}_r + \overrightarrow{\mathbf{q}}_r \times \overrightarrow{\mathbf{q}}_d ]$

where $\overrightarrow{\mathbf{q}}$ denotes the ‘imaginary’ or ‘vector’ part of quaternion $\mathbf{q}$ and $\dot{q}$ denotes the scalar part. Assuming that our dual quaternions only ever perform rotations and translations, the transform for the normal can be written as

$\mathbf{n}' = \mathbf{n} + 2 \overrightarrow{\mathbf{q}}_r \times ( \overrightarrow{\mathbf{q}}_r \times \mathbf{n} + \dot{q}_r \mathbf{n} )$

which corresponds to only evaluating the rotational part of the dual quaternion. So given these definitions I can now present the relevant parts of the vertex shader:

float3 transformPositionDQ( float3 position, float4 realDQ, float4 dualDQ )
{
return position +
2 * cross( realDQ.xyz, cross(realDQ.xyz, position) + realDQ.w*position ) +
2 * (realDQ.w * dualDQ.xyz - dualDQ.w * realDQ.xyz +
cross( realDQ.xyz, dualDQ.xyz));
}

float3 transformNormalDQ( float3 normal, float4 realDQ, float4 dualDQ )
{
return normal + 2.0 * cross( realDQ.xyz, cross( realDQ.xyz, normal ) +
realDQ.w * normal );
}

{
result.TexCoord = input.TexCoord;
result.Color = float4( 0.75, 0.75, 0.75, 1 );

// blend bone DQs
float2x4 blendedDQ
= GetBlendedDualQuaternion( input.BlendIndices, input.BlendWeights );

// transform position into clip space
float3 blendedPosition = transformPositionDQ( input.Position.xyz,
blendedDQ[0], blendedDQ[1] );

result.Position = mul( float4( blendedPosition.xyz, 1), ViewProjection );

// transform normal vector into world space
result.Normal = transformNormalDQ( input.Normal, blendedDQ[0], blendedDQ[1] );
return result;
}


# Skinning Reviewed

Last time we’ve reviewed skeletal animation. Lets now get into how we can deform a mesh based on that skeleton. What we want to achieve is that each mesh’s vertex moves along when the skeleton is animated. So the simplest idea is attach each vertex to one bone and transform the vertex’s position by that bone’s world transform. I.e the transformed position of a vertex $v$, attached to bone $i$ with world space transform $W_i$ is

$\mathbf{v}' = \mathbf{W}_i( \mathbf{v} )$

Remember though: Joint transforms always expect data relative to the joint’s local coordinate space. This means that we need to store the mesh’s vertices in the local coordinate system of their associated joint! Not a great restriction because our modeling tool can do that for us during export, but that also means that we cannot render the geometry without the skeleton.

Now, attaching each vertex to one bone has some serious limitations: It’s simply not a realistic model for most common materials. This is especially true for materials like skin or cloth. Consider human skin in joint areas: it stretches and wrinkles all over the place. So let’s make our model a little bit more complicated: Let’s allow each vertex to be influenced by more than one bone. Let us thus, for each vertex, define a list of bones $\{\mathbf{W}_1,\dots,\mathbf{W}_N\}$ the vertex is attached to and a weight factor $w_i, i \in \{1,\dots,N\}$ for each bone in that list. Provided that the weight factors are normalized $\sum_i w_i = 1$, we can then simply transform the vertex once by each bone and sum up the result, weighted by the weight factor

$\mathbf{v}' = \sum_i w_i \mathbf{W}_i( \mathbf{v} )$

Beware though: Aren’t we forgetting something? Right, joint transforms expect their data relative to their local coordinate system. But how can we accomplish that, given that each vertex can only be defined relative to one coordinates system? Of course, we could store N copies of the mesh but that would be a huge waste of memory. So let’s instead store our vertices in world space and transform them into joint local space whenever required. This can be accomplished by adding yet another transformation for each bone, the so called Binding Transform $\mathbf{B}_i$. This yields

$\mathbf{v}' = \sum_i w_i \mathbf{W}_i * \mathbf{B}_i( v )$

You might ask yourself: How the heck am I going to come up with these binding transforms? Well, the answer is simple: The binding transforms are just a snapshot of the skeleton’s world space joint transform at the time of binding. So whenever the artist poses the model and then binds the skin, the world space joint transforms are recorded for each bone and stored. Note that since a bone’s world space transform $\mathbf{W}_i$ maps data from joint local space to world space, the corresponding binding transform is actually the inverse of $\mathbf{W}_i$

$\mathbf{B}_i = \mathbf{W}_i^{-1}$

Let’s look at some code now, hopefully this will help make things more clear. First we need to extend the existing skeleton class to store the binding transform for each bone.

class Skeleton
{
std::vector<aiMatrix4x4> mParents;
std::vector<aiMatrix4x4> mBindingTransforms;
std::vector<int> mTransforms;

â€¦
};


Now we can modify the getWorldTransform method to take into account the binding transform

aiMatrix4x4 Skeleton::getWorldTransform( int bone ) const
{
int p = mParents[bone];
aiMatrix4x4 result = mTransforms[bone] * mBindingTransforms[bone];

while( p >= 0 )
{
result = mTransforms[p] * result;
p = mParents[p];
}

return result;
}


Next, each vertex needs to know the index of the bones it is attached to, as well as the weighting factors. So I extended the vertex declaration to include a UBYTE4 vector storing up to four bone indices per vertex and a FLOAT4 vector storing the corresponding weight factors. In fact, I created a new type of data converter, which computes both values at once. Currently I am passing the bone indices and weights via the D3DDECLUSAGE_BLENDINDICES and D3DDECLUSAGE_BLENDWEIGHT semantics to the vertex shader. The bone matrices themselves are passed to the shader via an array of vertex shader constants. The vertex shader now looks like this (unimportant parts stripped for the sake of clarity):

float4 LightDirection : LIGHTDIRECTION;
float4x4 BoneTransforms[16] : BONETRANSFORMS;

{
float4 Position            : POSITION;
float3 Normal              : NORMAL;
float2 TexCoord            : TEXCOORD0;
float4 BlendWeights        : BLENDWEIGHT0;
uint4 BlendIndices         : BLENDINDICES0;
};

{
result.Normal = input.Normal.xyz;
result.TexCoord = input.TexCoord;
result.Color = input.BlendWeights;

float4 posH = float4( input.Position.xyz, 1.f );

float4 blendedPosition =
mul( posH, BoneTransforms[input.BlendIndices.x] ) *
input.BlendWeights.x +
mul( posH, BoneTransforms[input.BlendIndices.y] ) *
input.BlendWeights.y +
mul( posH, BoneTransforms[input.BlendIndices.z] ) *
input.BlendWeights.z +
mul( posH, BoneTransforms[input.BlendIndices.w] ) *
input.BlendWeights.w;

result.Position =  mul( blendedPosition, ViewProjection );
return result;
}


# Skeletal Animation Reviewed

And now it’s getting interesting: Let’s look into how we can animate our models! The by far most popular method for animating rigid bodies in real-time is called skeletal animation, mostly due to its simplicity and run-time efficiency. As the name already suggests: We associate our model with a skeleton and perform all deformations based on this skeleton. Notice the similarity to human beings: our motions are constrained and guided by a skeleton as well.

Taking a closer look at the human skeleton, we can identify two major parts: joints and bones. Bones are rigid structures that don’t deform: Put too much pressure on a bone and it will break – something almost every one of us has experienced already :(. Joints in turn are dynamic: they can rotate with various degrees of freedom (e.g. shoulder joint vs. elbow joint). Each bone is attached to at least one joint and thus a rotation of the joint will cause a rotation of the attached bones. If you rotate your elbow joint for example you’ll see that the directly attached bones (Radius and Ulna, which connect the elbow joint with the wrist joint) are rotated around the elbow joint’s local rotation axis. But also the indirectly attached bones like in your fingers and wrist are rotated around the same axis as well. Our skeleton thus defines a hierarchy where the rotation of one joint will also rotate all joints (and bones) in the hierarchy ‘below’. Lets illustrate this with a simple example: Take a simple cylinder (1). We create a skeleton consisting of 4 bones and 5 joints and attach it to the cylinder (2). The skeleton hierarchy is simple: each bone has one child (3). Now let’s rotate joint 2 by a couple of degrees. All joints below joint 2 will rotate around the local rotation axis defined by joint 2, resulting in the deformed cylinder as shown in (4).

Left to right: cylinder (1), cylinder with skeleton (2), skeleton hierarchy (3), deformed cylinder (4)

So let’s recap what we’ve gathered so far:

• A skeleton represents a hierarchy of joints and bones
• Each joint can rotate around a local rotation axis
• The rotation of a given joint will cause rotation of all joints in the hierarchy below
• The mesh is bound to the hierarchy such that it will deform with it.

Now how can we model this mathematically? Let’s look at our wrist joint again and remember that it is connected to the elbow joint via the Radius bone. Consider a point in the local coordinate system of the wrist joint: We can express it’s position relative to the local coordinate system of the elbow joint by rotating it around the wrist joint and then translating it along the Radius bone:

$\mathbf{p}' = \mathbf{R}_{Wrist}(\mathbf{p} ) + \mathbf{T}_{Radius}$

Going one step up the hierarchy, we can express it’s position relative to the local coordinate system of the shoulder joint by rotating $\mathbf{p}'$ around the elbow joint and translating it along the Humerus bone:

$\mathbf{q}' = \mathbf{R}_{Elbow}( \mathbf{p}' ) + \mathbf{T}_{Humerus}$

Inserting $\mathbf{p}'$ into the formula for $\mathbf{q}'$ yields

$\mathbf{q}' = \mathbf{R}_{Elbow}( \mathbf{R}_{Wrist}( \mathbf{p} ) + \mathbf{T}_{Radius} ) + \mathbf{T}_{Humerus}$

which simply corresponds to a concatenation of the transforms for the wrist joint and the elbow joint. This example shows that if we express each joint’s transform in the local coordinate system of it’s parent, it is extremely simple to transform a point local to one joint into world space: all we need to do is concatenate the joint’s transform with the transforms of it’s predecessors in the skeleton hierarchy and transform the point by the result.
Formally, defining the rotation and translation transform of joint $i$ as $\mathbf{A}_i$ and the concatenation of two transforms as $*$ we can write the world space transform of joint $i$ as

$\mathbf{W}_i = \mathbf{A}_1 * \mathbf{A}_2 * \dots * \mathbf{A}_i$

for the sequence $0,1,2,..,i-1$ of parents of joint $i$. This gives us a simple method of computing the world space transforms of all joints in the skeleton. In order to deform the skeleton over time, all we need to do is animate the local transforms $\mathbf{A}_i$ of the joints. This is usually done via keyframe animation, where we store a sequence of deformation parameters over time. For example, if we want to animate the bend of the elbow joint we’d store the sequence of rotation values for the elbow joint along with the corresponding animation time as a sequence of (rotation,time) values. At runtime we’d search the sequence for the rotation value that best matches the current animation time and then change the rotation of $\mathbf{A}_i$ to represent that value.

Please note that we’ve implicitly encoded the rotation of a given joint and the translation of it’s corresponding bone into one transform. Thus, in the so defined skeleton we don’t need to distinguish between bones and joints anymore and we will use the term bone and joint interchangeably.

Let’s look at some code now: I defined a character’s skeleton as a simple class that stores the list of joint transforms as a stl vector of 4×4 Matrices. In order to represent the hierarchy structure, I store the index of each joint’s parent as integer value in another stl vector:

class Skeleton
{
std::vector<aiMatrix4x4> mParents;
std::vector<int> mTransforms;

â€¦
};


In this convention we can find the transform of a given joint $i$ in mTransforms[i] and it’s parent in mParents[i]. If a given joint doesn’t have a parent I set mParents[i] = -1;. As discussed above, we can now simply compute a joint’s world transform by concatenating it’s local transform with the transforms of all parents:

aiMatrix4x4 Skeleton::getWorldTransform( int bone ) const
{
int p = mParents[bone];
aiMatrix4x4 result = mTransforms[bone];

while( p <= 0 )
{
result = mTransforms[p] * result;
p = mParents[p];
}

return result;
}


# The model renderer v.0.1

Allright, so here I was, having defined a couple of DataConverters like so:

// Used for positions, normals, tangents and bitangents and 3D texcoords
struct aiVector3DConverter : public ArrayDataConverter

// Used for colors
struct aiColor4DConverter : public ArrayDataConverter

// Used for 2D texcoords
struct aiVector3DToFloat2Converter : public ArrayDataConverter


These simple definitions allowed me to convert Vertex positions, normals, Colors and UVs to a DirectX 9 compatible data format. After model loading, I simply create a list of converters for each data stream in the mesh and then gather all vertex elements (via CopyType()) and the actual data (via CopyData()) into the corresponding buffers:


// build vertex declaration
std::vector vertexElements;
{
for( unsigned int i=0; i<converters.size(); ++i )
converters[i]->CopyType( vertexElements );

D3DVERTEXELEMENT9 endElement = D3DDECL_END();
vertexElements.push_back( endElement );

context->Device()->CreateVertexDeclaration( &vertexElements[0],
&result.m_pVertexDeclaration );
}

// now create vertex buffer
{
const int vertexBufferSize = importedMesh->mNumVertices * result.m_VertexSize;
context->Device()->CreateVertexBuffer( vertexBufferSize, 0, 0,
D3DPOOL_DEFAULT, &result.m_pVertexBuffer, NULL );

BYTE* vertexData;
result.m_pVertexBuffer->Lock( 0, 0, reinterpret_cast( &vertexData ), 0 );

BYTE* curOffset = reinterpret_cast( vertexData );

for( unsigned int v=0; vmNumVertices; ++v )
{
for( unsigned int i=0; i<converters.size(); ++i )
converters[i]->CopyData( &meshData.mVertexData[v * meshData.mVertexSize], v );
}

result.m_pVertexBuffer->Unlock();
result.m_VertexCount = importedMesh->mNumVertices;
}


Done! Simple as that. Only the creation of all these converters is still somewhat messy with a check for existence of every stream type. But well… I figured I won’t overcomplicate stuff.

So, given the vertex declaration and the vertex buffer the next step was to create an index buffer. I won’t bother you with the code here as it is really straight forward. The only part that is worth mentioning is that I decided to use 16bit indices if possible i.e. if the number of vertices is less than 0xFFFF = 65535. This is actually a really simple thing to do but quite effective in saving video memory on lots of models. I’ve rarely come across meshes with more than 65535 vertices per material – at least on consoles.

Almost done now! What was missing now was a simple shader and code to set up the GPU.
In terms of shading I decided to go bare bones:

float4x4 WorldViewProjection : WORLDVIEWPROJECTION;
float4 LightDirection        : LIGHTDIRECTION;

{
float4 Position : POSITION;
float3 Normal   : NORMAL;
float2 TexCoord : TEXCOORD0;
};

{
float4 Position : POSITION0;
float3 Normal   : TEXCOORD1;
float2 TexCoord : TEXCOORD0;
};

{
result.Position = mul( float4(input.Position.xyz,1), WorldViewProjection );
result.Normal = input.Normal.xyz;
result.TexCoord = input.TexCoord;

return result;
}

float4 Model_PS( VertexShaderOutput input ) : COLOR0
{
const float ambient = 0.3f;
float diffuse = saturate(dot(input.Normal, LightDirection.xyz ) );

return float4(float3(1,0,0) * (ambient + diffuse),1);
}

technique Model
{
pass P0
{
}
}


So basically a simple lambert shader with some hard coded material parameters. Loading the shader using D3DX is really simple, a call to D3DXCreateEffectFromFile() is enough. And finally we can add the code for actually rendering the model:

void Model::Render( RenderContext* context )
{
for( unsigned int m=0; m<mMeshes.size(); ++m )
{
D3DXHANDLE hTechnique = mGetTechniqueByName( "Model" );
D3DXHANDLE hWorldViewProjection =
mesh.m_pEffect->GetParameterBySemantic( NULL, "WORLDVIEWPROJECTION" );
D3DXHANDLE hLightDirection =
mesh.m_pEffect->GetParameterBySemantic( NULL, "LIGHTDIRECTION" );

context->Device()->SetVertexDeclaration( mesh.m_pVertexDeclaration );
context->Device()->SetStreamSource( 0, mesh.m_pVertexBuffer,
0, mesh.m_VertexSize );
context->Device()->SetIndices( mesh.m_pIndexBuffer );

mesh.m_pEffect->SetMatrix( hWorldViewProjection,
&(context->GetViewMatrix() * context->GetProjectionMatrix()).data );

D3DXVECTOR4 lightDirection = D3DXVECTOR4( 0, 1, 0, 1 );
mesh.m_pEffect->SetVector( hLightDirection, &lightDirection );

mesh.m_pEffect->SetTechnique( hTechnique );

UINT cPasses;
mesh.m_pEffect->Begin( &cPasses, 0 );

for( unsigned int iPass = 0; iPass < cPasses; iPass++ )
{
mesh.m_pEffect->BeginPass( iPass );

HRESULT hr = context->Device()->DrawIndexedPrimitive( D3DPT_TRIANGLELIST,
0, 0, mesh.m_VertexCount, 0, mesh.m_TriangleCount );

mesh.m_pEffect->EndPass();
}

mesh.m_pEffect->End();
}

context->Device()->SetStreamSource( 0, NULL, 0, 0 );
context->Device()->SetIndices( NULL );
}


And that’s it! Lots of code this time 🙂