Normal Mapping

Normal mapping is a highly popular technique in video game development and real-time rendering applications where the polygon count needs to be kept as low as possible. It allows us, at reasonable cost, to add high-frequency surface detail to arbitrary surfaces of low polygon count. The basic idea is simple: encode the extra detail in an additional texture and use it during shading.

In the case of normal mapping we simulate additional surface detail by storing the orientation of the surface normal at each texel. This gives us much more fine-grained normal information than vertex normals, as the texture covers the model much more densely. The straightforward way to accomplish this would be to simply store the exact surface normal at each texel. At run time all we’d have to do is sample the texture in the fragment shader and replace the interpolated normal with the sampled one. This approach is usually called Object Space Normal Mapping, as the normals are defined relative to the whole object. While simple and efficient, this approach has some significant drawbacks: we cannot animate the vertices of the model, as this would change their world space orientation, and we cannot reuse the same map on mirrored geometry. As a result, most games and applications use Tangent Space Normal Mapping where, as the name implies, the normals are specified relative to the object’s local tangent space.

Have a look at the image above: We define the local tangent space by the orthogonal basis vectors ‘normal’, ‘tangent’ and ‘binormal’. Relative to this coordinate system, we can represent any other normal vector (in red in the image above) by a linear combination of the three basis vectors:

\mathbf{n}' = b \mathbf{B}_{binormal} + t \mathbf{B}_{tangent} + n \mathbf{B}_{normal}

where \mathbf{B}_{binormal} denotes the basis vector called ‘binormal’, etc. We will thus store in our normal map the coefficients of the linear combination, \{b, t, n\}, and evaluate the equation above in the fragment shader. But where do we get the basis vectors from? Usually the tangent space basis vectors are computed during model export or import. The standard procedure aligns tangent and binormal with the directions of the UV coordinates and computes the normal vector as the cross product of tangent and binormal. Details can be found in various publications. Once the tangent space basis vectors are known for each vertex, they are passed to the vertex shader. The vertex shader transforms the vectors into world space according to its associated world space transform and then passes them on to the fragment shader. So let’s have a look at some code:
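The actual basis computation is beyond the scope of this post, but a minimal per-triangle version of that standard procedure could look like the following C++ sketch (hypothetical Vec2/Vec3 helpers; real importers additionally average the frames over shared vertices and orthonormalize them):

```cpp
struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

static Vec3 sub( Vec3 a, Vec3 b )    { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3 scale( Vec3 a, float s ) { return { a.x * s, a.y * s, a.z * s }; }
static Vec3 cross( Vec3 a, Vec3 b )
{
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Per-triangle tangent frame: the tangent follows the U direction, the
// binormal follows the V direction, and the normal is their cross product.
void computeTangentFrame( Vec3 p0, Vec3 p1, Vec3 p2,
                          Vec2 uv0, Vec2 uv1, Vec2 uv2,
                          Vec3& tangent, Vec3& binormal, Vec3& normal )
{
    Vec3 e1 = sub( p1, p0 ), e2 = sub( p2, p0 );
    float du1 = uv1.x - uv0.x, dv1 = uv1.y - uv0.y;
    float du2 = uv2.x - uv0.x, dv2 = uv2.y - uv0.y;

    float r = 1.0f / ( du1 * dv2 - du2 * dv1 );   // assumes non-degenerate UVs
    tangent  = scale( sub( scale( e1, dv2 ), scale( e2, dv1 ) ), r );
    binormal = scale( sub( scale( e2, du1 ), scale( e1, du2 ) ), r );
    normal   = cross( tangent, binormal );
}
```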

In order to get normal, tangent and binormal data into the DirectX vertex buffer I added converters for each vector to the mesh importer:

if( mesh->HasTangentsAndBitangents() )
{
    converters.push_back( new aiVector3DConverter( D3DDECLUSAGE_TEXCOORD, 5, 
        offset, mesh->mNormals, vertexCount ) );
    offset += converters.back()->Size();

    converters.push_back( new aiVector3DConverter( D3DDECLUSAGE_TEXCOORD, 6, 
        offset, mesh->mTangents, vertexCount ) );
    offset += converters.back()->Size();

    converters.push_back( new aiVector3DConverter( D3DDECLUSAGE_TEXCOORD, 7, 
        offset, mesh->mBitangents, vertexCount ) );
    offset += converters.back()->Size();
}

Note that they are sent to the GPU via the TEXCOORD5 to TEXCOORD7 semantics. The vertex shader input needed to reflect this fact, so I extended the VertexShaderInput struct with the definition of the tangent frame data:

struct TangentFrame
{
    float3 Normal     : TEXCOORD5;
    float3 Tangent    : TEXCOORD6;
    float3 Binormal   : TEXCOORD7;
};

struct VertexShaderInput
{
    float4 Position                    : POSITION;
    TangentFrame TangentFrameData;
    float2 TexCoord                    : TEXCOORD0;
    float4 BlendWeights                : BLENDWEIGHT0;
    uint4 BlendIndices                 : BLENDINDICES0;
};

The vertex shader itself now needs to transform the tangent space vectors into world space and pass them on to the pixel shader:

struct VertexShaderOutput
{
    float4 Position                    : POSITION0;
    float2 TexCoord                    : TEXCOORD0;
    float3x3 TangentFrame              : TEXCOORD2;
};

VertexShaderOutput Model_VS( VertexShaderInput input )
{
    VertexShaderOutput result;
    result.TexCoord = input.TexCoord;

    float3x4 blendedTransform =
        BoneTransforms[input.BlendIndices.x] * input.BlendWeights.x +
        BoneTransforms[input.BlendIndices.y] * input.BlendWeights.y +
        BoneTransforms[input.BlendIndices.z] * input.BlendWeights.z +
        BoneTransforms[input.BlendIndices.w] * input.BlendWeights.w;

    // position
    float4 posH = float4( input.Position.xyz, 1.f );
    float3 blendedPosition = mul( blendedTransform, posH );

    result.Position = mul( float4( blendedPosition, 1.f ), ViewProjection );

    // tangent space: rows are the basis vectors in {binormal, tangent, normal}
    // order, matching the storage order of the map's channels
    float3x3 tf = float3x3(
        input.TangentFrameData.Binormal,
        input.TangentFrameData.Tangent,
        input.TangentFrameData.Normal );

    result.TangentFrame = mul( tf, transpose( (float3x3)blendedTransform ) );

    return result;
}

The tangent space vectors, now in world space, are interpolated across the current polygon and passed on to the fragment shader. In the fragment shader we sample the normal map and compute the surface normal according to the equation above. Beware of an implementation detail: depending on the texture format, individual texels store values in the range [0, 1]. Our tangent space coefficients, however, can range over [-1, 1]. So we need to remap the texel values before plugging them into the equation above:

float4 Model_PS( VertexShaderOutput input ) : COLOR0
{
    float3 textureNormal = tex2D( NormalSampler, input.TexCoord ).xyz * 2 - 1;
    float3 normal = normalize( mul( textureNormal, input.TangentFrame ) );

    const float ambient = 0.75f;
    float diffuseFactor = 0.25f;

    float diffuse = dot( normal, LightDirection.xyz );
    float4 textureColor = tex2D( DiffuseSampler, input.TexCoord );

    return textureColor * (ambient + diffuse * diffuseFactor);
}

Let’s look at the result in pictures: below you can see the albedo texture for frank’s shoes and the corresponding normal map. 
Next you can see the frank model rendered with a simple Lambert shader, with and without the normal map. You can clearly make out the additional detail the normal map adds to the model.


Dual Quaternion Skinning

In the last two posts we’ve reviewed skeletal animation and smooth skinning based on matrices (which, incidentally, is usually called linear blend skinning), so now we’re ready to look at an alternative: dual quaternion skinning. Linear blend skinning has some serious disadvantages that come down to one simple truth: The weighted average \mathbf{W}' of all bone transforms that influence one vertex

\mathbf{v}' = \sum_i w_i \mathbf{W}_i(\mathbf{v}) = \underbrace{(\sum_i w_i \mathbf{W}_i )}_{\mathbf{W}'}(\mathbf{v}) = \mathbf{W}'(\mathbf{v})

is simply not a satisfying method of blending the transforms of multiple bones. I am going to discuss the issues in more detail in a later post, but let’s just say for now that linearly blending rotation matrices results in a number of undesired artifacts.

Dual Quaternions provide a solution for some of these artifacts, as the interpolation of rotations also blends the rotation centers. Let me point you to the following paper, which provides a more in-depth explanation as well as an upper error bound on the accuracy of the rotation interpolation.

Without going into too much detail, let me point out that dual quaternions are an extension to regular quaternions, where two quaternions \mathbf{q}_r and \mathbf{q}_d are concatenated via the dual element \epsilon:

\hat{\mathbf{q}} = \mathbf{q}_r + \epsilon \mathbf{q}_d, \quad \epsilon^2 = 0

What makes dual quaternions so interesting is the fact that each unit dual quaternion can be interpreted as a rigid body transform representing a rotation and a translation. I am going to spare you the formulas for addition, multiplication, inversion etc. of dual quaternions here, but let’s just say that knowing the fundamental property \epsilon^2 = 0 we can easily derive them ourselves. As the paper also points out, the concatenation of two rigid body transforms represented by unit dual quaternions is accomplished by multiplying the two dual quaternions. The weighted blending of multiple unit dual quaternions corresponds to an approximate interpolation of the corresponding transforms. Perfect circumstances for skinning!
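To make this concrete, here is a small C++ sketch (hypothetical Quat/DualQuat types, scalar-first convention; not the project’s actual math.cpp) deriving the product rule from \epsilon^2 = 0: the \epsilon^2 \mathbf{q}_{d1} \mathbf{q}_{d2} term vanishes, leaving \hat{\mathbf{q}}_1 \hat{\mathbf{q}}_2 = \mathbf{q}_{r1}\mathbf{q}_{r2} + \epsilon (\mathbf{q}_{r1}\mathbf{q}_{d2} + \mathbf{q}_{d1}\mathbf{q}_{r2}):

```cpp
struct Quat { float w, x, y, z; };

// Hamilton product of two quaternions.
static Quat mul( const Quat& a, const Quat& b )
{
    return { a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
             a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
             a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
             a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w };
}

static Quat add( const Quat& a, const Quat& b )
{
    return { a.w + b.w, a.x + b.x, a.y + b.y, a.z + b.z };
}

struct DualQuat { Quat real, dual; };

// (r1 + e d1)(r2 + e d2) = r1 r2 + e (r1 d2 + d1 r2), since e^2 = 0.
static DualQuat mul( const DualQuat& a, const DualQuat& b )
{
    return { mul( a.real, b.real ),
             add( mul( a.real, b.dual ), mul( a.dual, b.real ) ) };
}

// Unit dual quaternion for a pure translation t: real = 1, dual = (0, t/2).
static DualQuat fromTranslation( float tx, float ty, float tz )
{
    return { { 1, 0, 0, 0 }, { 0, tx * 0.5f, ty * 0.5f, tz * 0.5f } };
}
```

Composing two pure translations this way yields, as expected, the translation by the sum of the two offsets.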

So let’s get our hands dirty and dive into some code: in principle, all I am doing is replacing the current code representation of a joint transform (a 4×3 matrix) with a dual quaternion. Since I wanted to keep the linear blend skinning code path and avoid copy/pasting lots of code, I decided to templatize the shared code, making the joint transform type a template parameter. So as a first step I updated the skeleton class:

template<class BoneTransform>
class SkeletonGeneric
{
    std::vector<int> mParents;
    std::vector<BoneTransform> mTransforms;
    std::vector<BoneTransform> mBindingTransforms;

    virtual BoneTransform Concatenate( const BoneTransform& first, 
                                       const BoneTransform& second ) const
    {
        return second * first;
    }
};

Note the new function Concatenate(), which is from now on used wherever two bone transforms need to be concatenated. I provide a default implementation that just applies the multiplication operator BoneTransform::operator*, but this definition can easily be overridden to do more specialized things.
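To illustrate, here is a stripped-down sketch (toy code, not the project’s actual classes) of how the world transform computation can be written once against the template, with every composition routed through Concatenate(); a translation-only transform type stands in for the real matrices and dual quaternions:

```cpp
#include <vector>

// Toy bone transform: a pure 1-D translation, composed via operator*.
struct Offset
{
    float x;
    Offset operator*( const Offset& o ) const { return { x + o.x }; }
};

template<class BoneTransform>
class SkeletonGeneric
{
public:
    std::vector<int> mParents;
    std::vector<BoneTransform> mTransforms;

    virtual BoneTransform Concatenate( const BoneTransform& first,
                                       const BoneTransform& second ) const
    {
        return second * first;
    }

    // Walk up the hierarchy, composing exclusively through Concatenate().
    BoneTransform getWorldTransform( int bone ) const
    {
        BoneTransform result = mTransforms[bone];
        for( int p = mParents[bone]; p >= 0; p = mParents[p] )
            result = Concatenate( result, mTransforms[p] );
        return result;
    }

    virtual ~SkeletonGeneric() {}
};
```

A dual quaternion specialization can then override Concatenate() alone, e.g. to renormalize after each product, without touching any traversal code.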

Next, I defined a dual quaternion class and implemented all the basic dual quaternion math. Have a look at math.cpp in case you’re interested. The dual quaternion class also possesses a couple of specialized constructors that allow conversion of animation data (translation/rotation pairs) and binding transform data (a 4×3 matrix) into dual quaternion form. I also had to update the renderer and the UI in order to support switching between linear blend skinning and dual quaternion skinning on keypress, but I am going to leave out these boring details.

These steps concluded the CPU implementation: evaluating animation channels, computing joint world space transforms and uploading the transform data to the GPU work just the same as before. The only difference is that the amount of data that needs to be sent to the GPU every frame is quite a bit smaller: 8 float values per joint instead of 12.

On the GPU side I needed to update the code for blending the bone transforms and the transformation of the vertex position and normal to work with dual quaternions. Blending is quite simple: Compute a weighted average of the bone transforms and renormalize:

float2x4 GetBlendedDualQuaternion( uint4 boneIndices, float4 boneWeights )
{
    float2x4 dq0 = GetBoneDualQuaternion( boneIndices.x );
    float2x4 dq1 = GetBoneDualQuaternion( boneIndices.y );
    float2x4 dq2 = GetBoneDualQuaternion( boneIndices.z );
    float2x4 dq3 = GetBoneDualQuaternion( boneIndices.w );

    float2x4 blendedDQ =
        dq0 * boneWeights.x +
        dq1 * boneWeights.y +
        dq2 * boneWeights.z +
        dq3 * boneWeights.w;

    float normDQ = length(blendedDQ[0]);
    return blendedDQ / normDQ;
}

Similar to regular quaternions, a vertex is transformed via the ‘sandwich’ type operation:

\mathbf{\hat{v}}' = \mathbf{\hat{q}}\mathbf{\hat{v}}\overline{\mathbf{\hat{q}}}^*

where \hat{\mathbf{v}} is the vertex in dual quaternion form (just a translation) and \overline{\hat{\mathbf{q}}}^* is the conjugate of \hat{\mathbf{q}} (well, actually the combination of both types of conjugates that exist for dual quaternions). As shown in the paper, this operation can be simplified to

\mathbf{v}' = \mathbf{v} + 2 \overrightarrow{\mathbf{q}}_r \times ( \overrightarrow{\mathbf{q}}_r \times \mathbf{v} + \dot{q}_r \mathbf{v} ) + 2 [ \dot{q}_r \overrightarrow{\mathbf{q}}_d - \dot{q}_d \overrightarrow{\mathbf{q}}_r + \overrightarrow{\mathbf{q}}_r \times \overrightarrow{\mathbf{q}}_d ]

where \overrightarrow{\mathbf{q}} denotes the ‘imaginary’ or ‘vector’ part of quaternion \mathbf{q} and \dot{q} denotes the scalar part. Assuming that our dual quaternions only ever perform rotations and translations, the transform for the normal can be written as

\mathbf{n}' = \mathbf{n} + 2 \overrightarrow{\mathbf{q}}_r \times ( \overrightarrow{\mathbf{q}}_r \times \mathbf{n} + \dot{q}_r \mathbf{n} )

which corresponds to only evaluating the rotational part of the dual quaternion. So given these definitions I can now present the relevant parts of the vertex shader:

float3 transformPositionDQ( float3 position, float4 realDQ, float4 dualDQ )
{
    return position +
        2 * cross( realDQ.xyz, cross( realDQ.xyz, position ) + realDQ.w * position ) +
        2 * ( realDQ.w * dualDQ.xyz - dualDQ.w * realDQ.xyz + 
              cross( realDQ.xyz, dualDQ.xyz ) );
}

float3 transformNormalDQ( float3 normal, float4 realDQ, float4 dualDQ )
{
    return normal + 2.0 * cross( realDQ.xyz, cross( realDQ.xyz, normal ) + 
                          realDQ.w * normal );
}

VertexShaderOutput DualQuaternionSkinning_VS( VertexShaderInput input )
{
    VertexShaderOutput result;
    result.TexCoord = input.TexCoord;
    result.Color = float4( 0.75, 0.75, 0.75, 1 );

    // blend bone DQs
    float2x4 blendedDQ
        = GetBlendedDualQuaternion( input.BlendIndices, input.BlendWeights );

    // transform position into clip space
    float3 blendedPosition = transformPositionDQ( input.Position.xyz, 
        blendedDQ[0], blendedDQ[1] );

    result.Position = mul( float4( blendedPosition, 1 ), ViewProjection );

    // transform normal vector into world space
    result.Normal = transformNormalDQ( input.Normal, blendedDQ[0], blendedDQ[1] );
    return result;
}
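Since these formulas are easy to get wrong, a CPU-side sanity check is worthwhile. The following C++ sketch (hypothetical helper types, not part of the project) mirrors transformPositionDQ, with the quaternions stored as vector part plus scalar part just like the .xyz/.w layout in the shader:

```cpp
struct V3 { float x, y, z; };

static V3 add( V3 a, V3 b )     { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static V3 muls( V3 a, float s ) { return { a.x * s, a.y * s, a.z * s }; }
static V3 cross( V3 a, V3 b )
{
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Quaternion with vector part v (= .xyz) and scalar part w (= .w).
struct Q { V3 v; float w; };

// Mirrors the HLSL transformPositionDQ above.
static V3 transformPositionDQ( V3 p, Q real, Q dual )
{
    V3 rot = muls( cross( real.v, add( cross( real.v, p ), muls( p, real.w ) ) ), 2.0f );
    V3 trn = muls( add( add( muls( dual.v, real.w ), muls( real.v, -dual.w ) ),
                        cross( real.v, dual.v ) ), 2.0f );
    return add( add( p, rot ), trn );
}

// Dual part for rotation q_r plus translation t: q_d = 0.5 * (0, t) * q_r,
// using (w1,v1)(w2,v2) = (w1 w2 - v1.v2, w1 v2 + w2 v1 + v1 x v2).
static Q dualPart( Q qr, V3 t )
{
    Q tq = { t, 0.0f };
    Q r;
    r.w = tq.w * qr.w - ( tq.v.x * qr.v.x + tq.v.y * qr.v.y + tq.v.z * qr.v.z );
    r.v = add( add( muls( qr.v, tq.w ), muls( tq.v, qr.w ) ), cross( tq.v, qr.v ) );
    return { muls( r.v, 0.5f ), r.w * 0.5f };
}
```

For a 90 degree rotation about z plus a translation of (1, 2, 3), the point (1, 0, 0) rotates to (0, 1, 0) and then translates to (1, 3, 3), which is exactly what the simplified formula produces.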


Skinning Reviewed

Last time we reviewed skeletal animation. Let’s now get into how we can deform a mesh based on that skeleton. What we want to achieve is that each vertex of the mesh moves along when the skeleton is animated. The simplest idea is to attach each vertex to exactly one bone and transform the vertex’s position by that bone’s world transform. I.e., the transformed position of a vertex \mathbf{v}, attached to bone i with world space transform \mathbf{W}_i, is

\mathbf{v}' = \mathbf{W}_i( \mathbf{v} )

Remember though: Joint transforms always expect data relative to the joint’s local coordinate space. This means that we need to store the mesh’s vertices in the local coordinate system of their associated joint! Not a great restriction because our modeling tool can do that for us during export, but that also means that we cannot render the geometry without the skeleton.

Now, attaching each vertex to one bone has some serious limitations: It’s simply not a realistic model for most common materials. This is especially true for materials like skin or cloth. Consider human skin in joint areas: it stretches and wrinkles all over the place. So let’s make our model a little bit more complicated: Let’s allow each vertex to be influenced by more than one bone. Let us thus, for each vertex, define a list of bones \{\mathbf{W}_1,\dots,\mathbf{W}_N\} the vertex is attached to and a weight factor w_i, i \in \{1,\dots,N\} for each bone in that list. Provided that the weight factors are normalized \sum_i w_i = 1, we can then simply transform the vertex once by each bone and sum up the result, weighted by the weight factor

\mathbf{v}' = \sum_i w_i \mathbf{W}_i( \mathbf{v} )

Beware though: aren’t we forgetting something? Right, joint transforms expect their data relative to their local coordinate system. But how can we accomplish that, given that each vertex can only be defined relative to one coordinate system? Of course, we could store N copies of the mesh, but that would be a huge waste of memory. So let’s instead store our vertices in world space and transform them into joint local space whenever required. This can be accomplished by adding yet another transformation for each bone, the so-called Binding Transform \mathbf{B}_i. This yields

\mathbf{v}' = \sum_i w_i \mathbf{W}_i * \mathbf{B}_i( \mathbf{v} )

You might ask yourself: How the heck am I going to come up with these binding transforms? Well, the answer is simple: The binding transforms are just a snapshot of the skeleton’s world space joint transform at the time of binding. So whenever the artist poses the model and then binds the skin, the world space joint transforms are recorded for each bone and stored. Note that since a bone’s world space transform \mathbf{W}_i maps data from joint local space to world space, the corresponding binding transform is actually the inverse of \mathbf{W}_i

\mathbf{B}_i = \mathbf{W}_i^{-1}

Let’s look at some code now; hopefully this will make things clearer. First we need to extend the existing skeleton class to store the binding transform for each bone.

class Skeleton
{
    std::vector<int> mParents;
    std::vector<aiMatrix4x4> mTransforms;
    std::vector<aiMatrix4x4> mBindingTransforms;
};


Now we can modify the getWorldTransform method to take the binding transform into account:

aiMatrix4x4 Skeleton::getWorldTransform( int bone ) const
{
    int p = mParents[bone];
    aiMatrix4x4 result = mTransforms[bone] * mBindingTransforms[bone];

    while( p >= 0 )
    {
        result = mTransforms[p] * result;
        p = mParents[p];
    }
    return result;
}

Next, each vertex needs to know the index of the bones it is attached to, as well as the weighting factors. So I extended the vertex declaration to include a UBYTE4 vector storing up to four bone indices per vertex and a FLOAT4 vector storing the corresponding weight factors. In fact, I created a new type of data converter, which computes both values at once. Currently I am passing the bone indices and weights via the D3DDECLUSAGE_BLENDINDICES and D3DDECLUSAGE_BLENDWEIGHT semantics to the vertex shader. The bone matrices themselves are passed to the shader via an array of vertex shader constants. The vertex shader now looks like this (unimportant parts stripped for the sake of clarity):

float4 LightDirection : LIGHTDIRECTION;
float4x4 BoneTransforms[16] : BONETRANSFORMS;

struct VertexShaderInput
{
    float4 Position            : POSITION;
    float3 Normal              : NORMAL;
    float2 TexCoord            : TEXCOORD0;
    float4 BlendWeights        : BLENDWEIGHT0;
    uint4 BlendIndices         : BLENDINDICES0;
};

VertexShaderOutput Model_VS( VertexShaderInput input )
{
    VertexShaderOutput result;
    result.Normal = input.Normal;
    result.TexCoord = input.TexCoord;
    result.Color = input.BlendWeights;

    float4 posH = float4( input.Position.xyz, 1.f );
    float4 blendedPosition =
        mul( posH, BoneTransforms[input.BlendIndices.x] ) * 
            input.BlendWeights.x + 
        mul( posH, BoneTransforms[input.BlendIndices.y] ) * 
            input.BlendWeights.y + 
        mul( posH, BoneTransforms[input.BlendIndices.z] ) * 
            input.BlendWeights.z + 
        mul( posH, BoneTransforms[input.BlendIndices.w] ) *
            input.BlendWeights.w;

    result.Position = mul( blendedPosition, ViewProjection );
    return result;
}


Skeletal Animation Reviewed

And now it’s getting interesting: let’s look into how we can animate our models! By far the most popular method for animating characters in real time is skeletal animation, mostly due to its simplicity and run-time efficiency. As the name already suggests, we associate our model with a skeleton and perform all deformations based on this skeleton. Notice the similarity to human beings: our motions are constrained and guided by a skeleton as well.

Taking a closer look at the human skeleton, we can identify two major parts: joints and bones. Bones are rigid structures that don’t deform: put too much pressure on a bone and it will break – something almost every one of us has experienced already :(. Joints in turn are dynamic: they can rotate with various degrees of freedom (e.g. shoulder joint vs. elbow joint). Each bone is attached to at least one joint, and thus a rotation of the joint will cause a rotation of the attached bones. If you rotate your elbow joint, for example, you’ll see that the directly attached bones (Radius and Ulna, which connect the elbow joint with the wrist joint) are rotated around the elbow joint’s local rotation axis. But the indirectly attached bones, like those in your fingers and wrist, are rotated around the same axis as well. Our skeleton thus defines a hierarchy where the rotation of one joint will also rotate all joints (and bones) in the hierarchy ‘below’. Let’s illustrate this with a simple example: take a simple cylinder (1). We create a skeleton consisting of 4 bones and 5 joints and attach it to the cylinder (2). The skeleton hierarchy is simple: each bone has one child (3). Now let’s rotate joint 2 by a couple of degrees. All joints below joint 2 will rotate around the local rotation axis defined by joint 2, resulting in the deformed cylinder shown in (4).

Left to right: cylinder (1), cylinder with skeleton (2), skeleton hierarchy (3), deformed cylinder (4)

So let’s recap what we’ve gathered so far:

  • A skeleton represents a hierarchy of joints and bones
  • Each joint can rotate around a local rotation axis
  • The rotation of a given joint will cause rotation of all joints in the hierarchy below
  • The mesh is bound to the hierarchy such that it will deform with it.

Now how can we model this mathematically? Let’s look at our wrist joint again and remember that it is connected to the elbow joint via the Radius bone. Consider a point \mathbf{p} in the local coordinate system of the wrist joint: we can express its position relative to the local coordinate system of the elbow joint by rotating it around the wrist joint and then translating it along the Radius bone:

\mathbf{p}' = \mathbf{R}_{Wrist}(\mathbf{p} ) + \mathbf{T}_{Radius}

Going one step up the hierarchy, we can express its position relative to the local coordinate system of the shoulder joint by rotating \mathbf{p}' around the elbow joint and translating it along the Humerus bone:

\mathbf{q}' = \mathbf{R}_{Elbow}( \mathbf{p}' ) + \mathbf{T}_{Humerus}

Inserting \mathbf{p}' into the formula for \mathbf{q}' yields

\mathbf{q}' = \mathbf{R}_{Elbow}( \mathbf{R}_{Wrist}( \mathbf{p} ) + \mathbf{T}_{Radius} ) + \mathbf{T}_{Humerus}

which simply corresponds to a concatenation of the transforms for the wrist joint and the elbow joint. This example shows that if we express each joint’s transform in the local coordinate system of its parent, it is extremely simple to transform a point local to one joint into world space: all we need to do is concatenate the joint’s transform with the transforms of its predecessors in the skeleton hierarchy and transform the point by the result.
Formally, defining the rotation and translation transform of joint i as \mathbf{A}_i and the concatenation of two transforms as * we can write the world space transform of joint i as

\mathbf{W}_i = \mathbf{A}_1 * \mathbf{A}_2 * \dots * \mathbf{A}_i

for the chain 1, 2, \dots, i-1 of parents of joint i. This gives us a simple method of computing the world space transforms of all joints in the skeleton. In order to deform the skeleton over time, all we need to do is animate the local transforms \mathbf{A}_i of the joints. This is usually done via keyframe animation, where we store a sequence of deformation parameters over time. For example, if we want to animate the bend of the elbow joint, we’d store the sequence of rotation values for the elbow joint along with the corresponding animation times as a sequence of (rotation, time) values. At runtime we’d search the sequence for the rotation value that best matches the current animation time and then change the rotation of \mathbf{A}_i to represent that value.

Please note that we’ve implicitly encoded the rotation of a given joint and the translation of its corresponding bone into one transform. Thus, in the skeleton defined this way we don’t need to distinguish between bones and joints anymore, and we will use the terms bone and joint interchangeably.

Let’s look at some code now: I defined a character’s skeleton as a simple class that stores the list of joint transforms as an STL vector of 4×4 matrices. In order to represent the hierarchy structure, I store the index of each joint’s parent as an integer value in another STL vector:

class Skeleton
{
    std::vector<int> mParents;
    std::vector<aiMatrix4x4> mTransforms;
};


In this convention we can find the transform of a given joint i in mTransforms[i] and its parent in mParents[i]. If a given joint doesn’t have a parent, I set mParents[i] = -1. As discussed above, we can now simply compute a joint’s world transform by concatenating its local transform with the transforms of all its parents:

aiMatrix4x4 Skeleton::getWorldTransform( int bone ) const
{
    int p = mParents[bone];
    aiMatrix4x4 result = mTransforms[bone];

    while( p >= 0 )
    {
        result = mTransforms[p] * result;
        p = mParents[p];
    }
    return result;
}


Rant: Matrix Layouts

Matrices are, by definition, simple mathematical structures consisting of a couple of numbers organized into rows and columns. To the graphics programmer, the 4×4 matrix type (4 rows and 4 columns) is usually of most interest due to its ability to perform arbitrary linear transforms on a given (homogeneous) 3D vector. Let’s define some notation first. Let \mathbf{M} be a 4×4 matrix as follows:

\mathbf{M} = \begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} &m_{32} & m_{33} & m_{34} \\ m_{41} & m_{42} & m_{43} & m_{44} \end{pmatrix} = \begin{pmatrix}m_{11} & m_{21} & m_{31} & m_{41} \\ m_{12} & m_{22} & m_{32} & m_{42} \\ m_{13} &m_{23} & m_{33} & m_{43} \\ m_{14} & m_{24} & m_{34} & m_{44}\end{pmatrix}^T

where the T denotes the transpose, which swaps rows and columns of a given matrix.
Multiplication with a homogeneous vector \mathbf{v} = \begin{pmatrix}v_x & v_y & v_z & v_w\end{pmatrix}^T from the right can then be defined as follows

\mathbf{M} \mathbf{v} = \begin{pmatrix}m_{11} v_x + m_{12} v_y + m_{13} v_z + m_{14} v_w \\ m_{21} v_x + m_{22} v_y + m_{23} v_z + m_{24} v_w \\ m_{31} v_x + m_{32} v_y + m_{33} v_z + m_{34} v_w \\ m_{41} v_x + m_{42} v_y + m_{43} v_z + m_{44} v_w \end{pmatrix} = \mathbf{m_{|1}} v_x + \mathbf{m_{|2}} v_y + \mathbf{m_{|3}} v_z + \mathbf{m_{|4}} v_w

where \mathbf{m_{|i}} denotes the i-th matrix column. Multiplying a matrix by a vector from the right thus corresponds to summing the matrix columns, weighted by the vector’s components. Conversely, multiplying a matrix by a vector from the left can be seen as a weighted sum of the matrix rows:

\mathbf{v}^T \mathbf{M} = \begin{pmatrix}m_{11} v_x + m_{21} v_y + m_{31} v_z + m_{41} v_w \\ m_{12} v_x + m_{22} v_y + m_{32} v_z + m_{42} v_w\\ m_{13} v_x + m_{23} v_y + m_{33} v_z + m_{43} v_w\\ m_{14} v_x + m_{24} v_y + m_{34} v_z + m_{44} v_w \end{pmatrix} = \mathbf{m_{\overline{1}}} v_x + \mathbf{m_{\overline{2}}} v_y + \mathbf{m_{\overline{3}}} v_z + \mathbf{m_{\overline{4}}} v_w

where \mathbf{m_{\overline{i}}} denotes the i-th matrix row. Since the transpose operator swaps a matrix’s rows and columns, we can conclude that multiplying a vector by a matrix from the right is equivalent to multiplying the same vector with the matrix’s transpose from the left. This works in accordance with the transposition rules

(\mathbf{M_1} \mathbf{M_2})^T = \mathbf{M_2}^T \mathbf{M_1}^T\quad\text{and}\quad(\mathbf{M}^T)^T = \mathbf{M}

Given these definitions, let’s now look at the matrix type we encounter most in graphics programming: combined rotation and translation transforms. Matrices of this form consist of a 3×3 rotational part \mathbf{R} and a 3×1 translational part \mathbf{t}:

\mathbf{M} = \begin{pmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0} & 1 \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1\end{pmatrix}

Note that this definition already implicitly defines the multiplication order – it only works when multiplying from the right. Let \mathbf{v} = (v_x, v_y, v_z, 1)^T:

\mathbf{v'} = \mathbf{M} \mathbf{v} = \mathbf{R} \mathbf{v} + \mathbf{t}

If we wanted to multiply \mathbf{v} from the left we’d have to use the transposition rules:

\mathbf{v'}^T = (\mathbf{M} \mathbf{v})^T = \mathbf{v}^T \mathbf{M}^T

which again illustrates the point I made before: multiplying a matrix by a vector from the right is equivalent to multiplying its transpose by the vector from the left. Why am I insisting on this fact so much? Guess what: different companies have, as usual, not been able to agree on a common standard. As a result, DirectX assumes multiplication from the left while OpenGL and Assimp assume multiplication from the right. This means that we can’t just take a matrix from Assimp to DirectX: we first have to transpose it.

Now, why not make things even more complicated: let’s think about how we actually store our matrices in code! Since computer memory is addressed in a linear fashion, we have to map the values of a matrix to a one-dimensional string of numbers. We can do so in two ways: iterate through the matrix column by column (column-major) or row by row (row-major).

[m_{11} m_{12} m_{13} m_{14} m_{21} \cdots]\xleftarrow{\text{row major}}\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} &m_{32} & m_{33} & m_{34} \\ m_{41} & m_{42} & m_{43} & m_{44} \end{pmatrix}\xrightarrow{\text{column major}}[m_{11} m_{21} m_{31} m_{41} m_{12} \cdots]

And guess what: once again the difference between the two corresponds to a transposition, i.e. if you read a row-major matrix as column-major you’ll actually get that matrix’s transpose. So let the fun begin and convert matrices from one software package to another: the first mistake you can make is to mix up the memory layouts when copying the matrix data. This will result in all your matrices being transposed. The second thing that can go wrong is the multiplication order: multiply from the left instead of from the right or vice versa, and you get yet another transposition. And the cruel thing is, if you are ‘in luck’ and make both mistakes at once, you might never notice, because the double transposition cancels out as shown before. Only once you need to access individual matrix values will you realize that something is off, and you might scratch your head for a while looking for the reason. So it really pays off to figure out matrix memory layout and multiplication order before starting to mix matrices from two different software packages.
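The layout/transpose relationship is easy to demonstrate in a few lines of C++ (toy accessors, 2×2 for brevity): the same four floats, read once as row-major and once as column-major, produce a matrix and its transpose.

```cpp
// Interpret the same linear storage under both conventions.
static float rowMajorAt( const float* m, int row, int col, int cols )
{
    return m[row * cols + col];
}

static float colMajorAt( const float* m, int row, int col, int rows )
{
    return m[col * rows + row];
}
```

For the storage { 1, 2, 3, 4 }, the row-major reading gives ((1, 2), (3, 4)) while the column-major reading gives ((1, 3), (2, 4)): every element satisfies colMajorAt(r, c) == rowMajorAt(c, r), which is exactly the transpose.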