Rant: Matrix Layouts

Matrices are, by definition, simple mathematical structures consisting of numbers organized into rows and columns. To the graphics programmer, the 4×4 matrix type (4 rows and 4 columns) is usually of most interest due to its ability to perform arbitrary linear transforms on a given (homogeneous) 3D vector. Let’s define some notation first. Let M be a 4×4 matrix as follows:

\mathbf{M} = \begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} &m_{32} & m_{33} & m_{34} \\ m_{41} & m_{42} & m_{43} & m_{44} \end{pmatrix} = \begin{pmatrix}m_{11} & m_{21} & m_{31} & m_{41} \\ m_{12} & m_{22} & m_{32} & m_{42} \\ m_{13} &m_{23} & m_{33} & m_{43} \\ m_{14} & m_{24} & m_{34} & m_{44}\end{pmatrix}^T

where the T denotes the transpose, which swaps rows and columns of a given matrix.
Multiplication with a homogeneous vector \mathbf{v} = \begin{pmatrix}v_x & v_y & v_z & v_w\end{pmatrix}^T from the right can then be defined as follows:

\mathbf{M} \mathbf{v} = \begin{pmatrix}m_{11} v_x + m_{12} v_y + m_{13} v_z + m_{14} v_w \\ m_{21} v_x + m_{22} v_y + m_{23} v_z + m_{24} v_w \\ m_{31} v_x + m_{32} v_y + m_{33} v_z + m_{34} v_w \\ m_{41} v_x + m_{42} v_y + m_{43} v_z + m_{44} v_w \end{pmatrix} = \mathbf{m_{|1}} v_x + \mathbf{m_{|2}} v_y + \mathbf{m_{|3}} v_z + \mathbf{m_{|4}} v_w

where \mathbf{m_{|i}} denotes the i-th matrix column. The multiplication of a vector with a matrix from the right corresponds thus to the sum of the matrix columns, weighted by the vector’s components. Conversely, multiplying a matrix with a vector from the left can be seen as a weighted sum of the matrix rows:

\mathbf{v}^T \mathbf{M} = \begin{pmatrix}m_{11} v_x + m_{21} v_y + m_{31} v_z + m_{41} v_w \\ m_{12} v_x + m_{22} v_y + m_{32} v_z + m_{42} v_w\\ m_{13} v_x + m_{23} v_y + m_{33} v_z + m_{43} v_w\\ m_{14} v_x + m_{24} v_y + m_{34} v_z + m_{44} v_w \end{pmatrix}^T = \mathbf{m_{\overline{1}}} v_x + \mathbf{m_{\overline{2}}} v_y + \mathbf{m_{\overline{3}}} v_z + \mathbf{m_{\overline{4}}} v_w

where \mathbf{m_{\overline{i}}} denotes the i-th matrix row. Since the transpose operator swaps a matrix’s rows and columns, we can conclude that multiplying a vector by a matrix from the right is equivalent to multiplying the same vector with the matrix’s transpose from the left (the result is the same vector, just written as a row instead of a column). This works in accordance with the transposition rules:

(\mathbf{M_1} \mathbf{M_2})^T = \mathbf{M_2}^T \mathbf{M_1}^T\quad\text{and}\quad(\mathbf{M}^T)^T = \mathbf{M}
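To see the two weighted sums side by side, here is a minimal sketch. The types `Vec4` and `Mat4` and the functions `MulRight`, `MulLeft` and `Transpose` are made up for this illustration and don't belong to any library; `m[row][col]` is assumed to hold the element in the given row and column.

```cpp
#include <array>
#include <cassert>

// Hypothetical minimal types for illustration; m[row][col] indexes the matrix.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;

// M * v: sum of the columns of M, weighted by the components of v.
Vec4 MulRight(const Mat4& m, const Vec4& v)
{
    Vec4 result{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            result[row] += m[row][col] * v[col];  // adds v[col] * (column col)
    return result;
}

// v^T * M: sum of the rows of M, weighted by the components of v.
Vec4 MulLeft(const Vec4& v, const Mat4& m)
{
    Vec4 result{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            result[col] += v[row] * m[row][col];  // adds v[row] * (row row)
    return result;
}

// Swaps rows and columns.
Mat4 Transpose(const Mat4& m)
{
    Mat4 t{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            t[col][row] = m[row][col];
    return t;
}
```

With these definitions, `MulRight(m, v)` and `MulLeft(v, Transpose(m))` produce the same four numbers, while `MulLeft(v, m)` in general does not.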

Given these definitions, let’s now look at the matrix type we encounter most in graphics programming: combined rotation and translation transforms. Matrices of this form consist of a 3×3 rotational part \mathbf{R} and a 3×1 translational part \mathbf{t}:

\mathbf{M} = \begin{pmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0} & 1 \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1\end{pmatrix}

Note that this definition already implicitly defines the multiplication order – it only works when multiplying from the right. Let \mathbf{v} = (v_x, v_y, v_z, 1)^T:

\mathbf{v'} = \mathbf{M} \mathbf{v} = \mathbf{R} \mathbf{v} + \mathbf{t}

If we wanted to multiply \mathbf{v} from the left we’d have to use the transposition rules:

\mathbf{v'}^T = (\mathbf{M} \mathbf{v})^T = \mathbf{v}^T \mathbf{M}^T

which again illustrates the point I made before: multiplying a matrix by a vector from the right is equivalent to multiplying its transpose by that vector from the left. Why am I insisting on this fact so much? Guess what: different companies have, as usual, not been able to agree on a common standard. As a result, DirectX assumes multiplication from the left and OpenGL and Assimp assume multiplication from the right. This means that we can’t just take a matrix from Assimp to DirectX, we first have to transpose it.
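Here is a small sketch of what goes wrong when a rotation-plus-translation matrix built for multiplication from the right is multiplied from the left without transposing it first. All names (`MakeRigid`, `TransformColumn`, `TransformRow`, `Transpose`) are invented for this example and are not Assimp or DirectX functions; the rotation is hard-coded to 90° about the z axis to keep the numbers exact.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][col]

// Builds M = (R t; 0 1) with the translation in the last COLUMN,
// i.e. laid out for multiplication from the right.
// R is a fixed 90 degree rotation about the z axis.
Mat4 MakeRigid(float tx, float ty, float tz)
{
    Mat4 m = {{ {{0.0f, -1.0f, 0.0f, tx}},
                {{1.0f,  0.0f, 0.0f, ty}},
                {{0.0f,  0.0f, 1.0f, tz}},
                {{0.0f,  0.0f, 0.0f, 1.0f}} }};
    return m;
}

// Column-vector convention: v' = M * v.
Vec4 TransformColumn(const Mat4& m, const Vec4& v)
{
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[row][col] * v[col];
    return r;
}

// Row-vector convention: v'^T = v^T * M.
Vec4 TransformRow(const Vec4& v, const Mat4& m)
{
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[col] += v[row] * m[row][col];
    return r;
}

Mat4 Transpose(const Mat4& m)
{
    Mat4 t{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            t[col][row] = m[row][col];
    return t;
}
```

For the point (1, 0, 0, 1) and a translation of (10, 20, 30), `TransformColumn` and `TransformRow` with the transposed matrix both give (10, 21, 30, 1). `TransformRow` with the untransposed matrix gives (0, -1, 0, 11): the rotation is inverted, the translation is lost and w is no longer 1.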

Now, why not make things even more complicated: Let’s think about how we actually store our matrices in code! Since computer memory is addressed in a linear fashion we have to map the values of a matrix to a one-dimensional string of numbers. We can do so in two ways: iterate through the matrix column by column (column-major) or row by row (row-major).

[m_{11} m_{12} m_{13} m_{14} m_{21} \cdots]\xleftarrow{\text{row major}}\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} &m_{32} & m_{33} & m_{34} \\ m_{41} & m_{42} & m_{43} & m_{44} \end{pmatrix}\xrightarrow{\text{column major}}[m_{11} m_{21} m_{31} m_{41} m_{12} \cdots]

And guess what: Once again the difference between the two corresponds to a transposition, i.e. if you read a row-major matrix as column-major you’ll actually have that matrix’s transpose. So let the fun begin, let’s convert matrices from one software package to another: the first mistake you can make is to mix up the memory layouts when copying the matrix data. This will result in all your matrices being transposed. The second thing that can go wrong is the multiplication order: multiplying from the left instead of from the right or vice versa. Yet another transposition of the matrix. And the cruel thing is, if you are ‘in luck’ and you make both mistakes at once you might actually never notice, because double transposition cancels out as shown before. Only once you need to access individual matrix values will you realize that something is off, and you might scratch your head for a while looking for the reason. So it really pays off to figure out matrix memory layout and multiplication order before starting to mix matrices of two different software packages.
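The two memory layouts boil down to two index formulas. The sketch below uses zero-based rows and columns; the names `Flat16`, `RowMajorIndex`, `ColMajorIndex` and `RowMajorToColMajor` are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Flat16 = std::array<float, 16>;  // a 4x4 matrix stored as 16 consecutive floats

// Position of element (row, col) in a row-major buffer: rows are contiguous.
constexpr std::size_t RowMajorIndex(std::size_t row, std::size_t col)
{
    return row * 4 + col;
}

// Position of element (row, col) in a column-major buffer: columns are contiguous.
constexpr std::size_t ColMajorIndex(std::size_t row, std::size_t col)
{
    return col * 4 + row;
}

// Re-stores the same matrix in the other layout. On the flat buffer this is
// exactly a transposition, so applying it twice gives back the input.
Flat16 RowMajorToColMajor(const Flat16& src)
{
    Flat16 dst{};
    for (std::size_t row = 0; row < 4; ++row)
        for (std::size_t col = 0; col < 4; ++col)
            dst[ColMajorIndex(row, col)] = src[RowMajorIndex(row, col)];
    return dst;
}
```

Reading a row-major buffer through `ColMajorIndex(row, col)` returns the element that actually sits at (col, row), which is the accidental transpose described above.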

Import data conversion

Now that I had a basic application running and managed to import model data, I needed to look into how to get said data into a DirectX compatible format. So, without going into too many details: For each mesh I needed a Vertex Buffer, a Vertex Declaration and an Index Buffer. Since each mesh can come with a variety of different vertex data, I opted to implement a somewhat generic approach: I added an abstraction layer called DataConverter. A DataConverter has an input data type (either an Assimp value type or an array of Assimp values) and a function CopyData() which will convert the data to a desired type and store it into a target array. DataConverters also know about the corresponding DirectX vertex formats: they can generate a list of D3DVERTEXELEMENT9 elements and copy them to an array via the function CopyType(). Here’s the interface definition for a basic DataConverter:

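class DataConverter
{
protected:
    D3DDECLUSAGE mUsageType;  // semantic of the element (position, normal, ...)
    int mUsageIndex;          // index for semantics that occur more than once
    int mOffset;              // byte offset of the element within one vertex

public:
    DataConverter( D3DDECLUSAGE usageType, int usageIndex, int offsetInBytes )
        : mUsageType( usageType )
        , mUsageIndex( usageIndex )
        , mOffset( offsetInBytes ) {}

    // Converters are used through base class pointers, so destroy them virtually.
    virtual ~DataConverter() {}

    virtual int Size() = 0;
    virtual void CopyType( std::vector<D3DVERTEXELEMENT9>& out_Type ) = 0;
    virtual void CopyData( BYTE* destination, int index ) = 0;
};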

In order to make working with Assimp value arrays easier I derived an ArrayDataConverter from DataConverter:

template< typename T >
class ArrayDataConverter : public DataConverter
{
    const T* mSourceData;
    const int mSourceSize;

public:
    ArrayDataConverter( D3DDECLUSAGE usageType, int usageIndex, int& offsetInBytes,
            const T* sourceData, int sourceSize )
        : DataConverter( usageType, usageIndex, offsetInBytes )
        , mSourceData( sourceData )
        , mSourceSize( sourceSize ) {}

    // Bounds-checked access to the underlying source array.
    const T& GetElement( int index )
    {
        assert( index >= 0 && index < mSourceSize );
        return mSourceData[index];
    }
};

which basically adds a GetElement() function that lets us retrieve an element from the underlying source data. So, given these two definitions, here’s the simple and elegant definition of the aiVector3DToFloat2Converter:

struct aiVector3DToFloat2Converter : public ArrayDataConverter<aiVector3D>
{
    aiVector3DToFloat2Converter( D3DDECLUSAGE usageType, int usageIndex,
        int& offsetInBytes, const aiVector3D* sourceData, int sourceSize )
    : ArrayDataConverter<aiVector3D>( usageType, usageIndex, offsetInBytes,
        sourceData, sourceSize ) {}

    int Size() { return 2 * sizeof(float); }

    virtual void CopyType( std::vector<D3DVERTEXELEMENT9>& out_Type )
    {
        // Stream 0, two floats at mOffset; the casts match the WORD/BYTE fields.
        D3DVERTEXELEMENT9 _result =
        {
            0, static_cast<WORD>( mOffset ), D3DDECLTYPE_FLOAT2,
            D3DDECLMETHOD_DEFAULT, static_cast<BYTE>( mUsageType ),
            static_cast<BYTE>( mUsageIndex )
        };

        out_Type.push_back( _result );
    }

    void CopyData( BYTE* destination, int elementIndex )
    {
        const aiVector3D& element = GetElement( elementIndex );

        // Keep x and y, drop z.
        float data[] = { element[0], element[1] };
        memcpy( destination + mOffset, data, Size() );
    }
};

Some more praise to the Assimp developers: The scene data is stored in a very understandable and easy to access format. Only took me a couple of hours to get that into DirectX compatible buffers. Great work guys!!
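For context, a set of converters like the ones above could be driven by a loop along the following lines. This is only a sketch under assumptions: `Converter` and `FloatConverter` are simplified stand-ins that drop all DirectX and Assimp types so the example compiles on its own, and `PackVertices` is a hypothetical helper, not code from the actual project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

using Byte = unsigned char;  // stand-in for the Windows BYTE typedef

// Stand-in for DataConverter, reduced to what the packing loop needs.
struct Converter
{
    int mOffset;  // byte offset of this element within one vertex
    explicit Converter(int offsetInBytes) : mOffset(offsetInBytes) {}
    virtual ~Converter() {}
    virtual int Size() const = 0;
    virtual void CopyData(Byte* destination, int index) const = 0;
};

// Hypothetical converter: copies the first mCount floats of each source element.
struct FloatConverter : Converter
{
    const float* mSource;
    int mFloatsPerElement;
    int mCount;

    FloatConverter(int offsetInBytes, const float* source,
                   int floatsPerElement, int count)
        : Converter(offsetInBytes), mSource(source)
        , mFloatsPerElement(floatsPerElement), mCount(count) {}

    int Size() const override { return static_cast<int>(mCount * sizeof(float)); }

    void CopyData(Byte* destination, int index) const override
    {
        std::memcpy(destination + mOffset,
                    mSource + index * mFloatsPerElement, Size());
    }
};

// Packs vertexCount vertices into one interleaved buffer. The vertex stride
// is the sum of all element sizes; each converter writes at its own offset.
std::vector<Byte> PackVertices(
    const std::vector<std::unique_ptr<Converter>>& converters, int vertexCount)
{
    int stride = 0;
    for (const auto& converter : converters)
        stride += converter->Size();

    std::vector<Byte> buffer(static_cast<std::size_t>(stride) *
                             static_cast<std::size_t>(vertexCount));
    for (int i = 0; i < vertexCount; ++i)
        for (const auto& converter : converters)
            converter->CopyData(
                buffer.data() + static_cast<std::size_t>(i) * stride, i);
    return buffer;
}
```

The resulting byte buffer is what would then be copied into a locked vertex buffer, with the matching vertex declaration collected from the converters' CopyType() calls.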


Model Import with Assimp

Here we go: I’ve decided to make the sample open source and host it on Google Code. Hopefully this step will raise public awareness and make it easier for other developers to download and reuse the code.

As a first step I’ve hijacked a DirectX 9 programming assignment I did a while ago and removed all assignment specific code. What I’m left with at this point is a bare bones program skeleton of a windowed DirectX 9 application, a frame rate timer and a very rudimentary display list implementation. I’ve also added Assimp to the project and the first test model import worked fine – well at least it didn’t throw any errors. I don’t have a renderer yet so there’s no way of verifying the imported data 🙂

The model import itself is surprisingly simple: Create an instance of Assimp::Importer and call Assimp::Importer::ReadFile() [The following code is mostly taken from the Assimp tutorial]

Assimp::Importer modelImporter;
const aiScene* scene = modelImporter.ReadFile( mFileName,
    aiProcess_CalcTangentSpace |
    aiProcess_Triangulate |
    aiProcess_JoinIdenticalVertices |
    aiProcess_ConvertToLeftHanded );

// If the import failed, report it
if( !scene )
{
    std::cout << modelImporter.GetErrorString() << std::endl;
    MessageBox( NULL, modelImporter.GetErrorString(), "import failed", MB_OK );
    return false;
}

That’s it. Obviously there’s a million parameters you can pass to ReadFile() but they are mostly self-explanatory. The most important one in my setting is aiProcess_ConvertToLeftHanded, because Assimp assumes a right-handed coordinate system whereas DirectX uses a left-handed coordinate system by default.

In case the import fails, Assimp::Importer::ReadFile() returns a NULL pointer and you can get extended error information via Assimp::Importer::GetErrorString().


New Project

Exciting stuff ahead: I’ve decided to start working on a new project!! To be honest: it’s about time, I haven’t worked on any personal stuff since my BSP tree sample and I’m itching to get my hands on some exciting project. Even better still: My friend Ryan Lewis, an amazing 3D artist and tech guy, is on board as well!

So, what will it be? Well, this time I’m going to tackle something new: skinned animation. A while ago, I came across an interesting paper about using dual quaternions to represent rigid body motion. As we all know, unit quaternions are an efficient way to represent rotations in space, eliminating a lot of the problems inherent to matrix representations. Unit dual quaternions extend the space of representable transforms to include translations as well. As it turns out, interpolating between rigid body motions can be approximated by something as simple as computing a weighted average of the corresponding unit dual quaternions and normalizing the result, which makes the use of dual quaternions for skinning very efficient and suitable for GPU implementation.
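To give an idea of how little code the weighted average takes, here is a rough CPU-side sketch. It is my own illustration rather than code from the paper: the `Quat` and `DualQuat` structs and all function names are made up, quaternions are stored as (w, x, y, z), and the blended real part is assumed to be non-zero. The sign flip deals with the fact that q and -q describe the same rotation.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Quat { float w, x, y, z; };
struct DualQuat { Quat real; Quat dual; };  // real: rotation, dual: translation part

// Hamilton product a * b.
Quat Mul(const Quat& a, const Quat& b)
{
    return { a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
             a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
             a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
             a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w };
}

Quat Scale(const Quat& q, float s) { return { q.w * s, q.x * s, q.y * s, q.z * s }; }
Quat Add(const Quat& a, const Quat& b) { return { a.w + b.w, a.x + b.x, a.y + b.y, a.z + b.z }; }
float Dot(const Quat& a, const Quat& b) { return a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z; }

// Unit dual quaternion for "rotate by unit quaternion q, then translate by t":
// real = q, dual = 0.5 * (0, t) * q.
DualQuat MakeDualQuat(const Quat& q, float tx, float ty, float tz)
{
    const Quat t = { 0.0f, tx, ty, tz };
    return { q, Scale(Mul(t, q), 0.5f) };
}

// Weighted average of unit dual quaternions followed by normalization.
DualQuat Blend(const std::vector<DualQuat>& dqs, const std::vector<float>& weights)
{
    assert(!dqs.empty() && dqs.size() == weights.size());
    DualQuat sum = { { 0.0f, 0.0f, 0.0f, 0.0f }, { 0.0f, 0.0f, 0.0f, 0.0f } };
    for (std::size_t i = 0; i < dqs.size(); ++i)
    {
        // Flip inputs that lie in the opposite hemisphere of the first one.
        const float w = Dot(dqs[i].real, dqs[0].real) < 0.0f ? -weights[i] : weights[i];
        sum.real = Add(sum.real, Scale(dqs[i].real, w));
        sum.dual = Add(sum.dual, Scale(dqs[i].dual, w));
    }
    // Normalize by the length of the real part (assumed non-zero).
    const float invLength = 1.0f / std::sqrt(Dot(sum.real, sum.real));
    return { Scale(sum.real, invLength), Scale(sum.dual, invLength) };
}

// Translation = vector part of 2 * dual * conjugate(real).
std::array<float, 3> GetTranslation(const DualQuat& dq)
{
    const Quat conj = { dq.real.w, -dq.real.x, -dq.real.y, -dq.real.z };
    const Quat t = Scale(Mul(dq.dual, conj), 2.0f);
    return {{ t.x, t.y, t.z }};
}
```

Blending two pure translations (2, 0, 0) and (0, 4, 0) with weights 0.5 each yields the translation (1, 2, 0), as one would expect from a plain average.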

Another more recent paper that caught my attention is Crytek’s definition of QTangents. While based on the pretty straightforward idea of representing tangent space by a unit quaternion, Crytek applied their engineering ingenuity and managed to losslessly compress the resulting unit quaternions down to four floating point components without introducing singularities. In their optimized setting this brings the data requirements for tangent spaces from eight 16-bit floating point values down to four 16-bit floating point values. Doesn’t sound like much, but consider the hundreds of thousands of vertices that get pushed through the graphics pipeline every frame. And remember, on consoles every byte of main and video memory counts!
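As a rough worked example (the count of 500,000 vertices is a made-up figure, not a number from the paper), going from eight to four 16-bit values saves 8 bytes per vertex:

8 \times 2\,\text{bytes} - 4 \times 2\,\text{bytes} = 8\,\text{bytes per vertex} \quad\Rightarrow\quad 500{,}000 \times 8\,\text{bytes} = 4{,}000{,}000\,\text{bytes} \approx 3.8\,\text{MiB}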