Maths - Matrix Calculus

Prerequisites

If you are not familiar with matrix algebra you may like to look at the following pages first:

matrix algebra

In the applications section it is important to realise that there are different standards and conventions, so it is a good idea to familiarise yourself with the standards used on this site:

standards

Matrix differentiation

To differentiate a matrix with respect to a variable, say 'x', we individually differentiate each element with respect to 'x'. So if:

[f(x)]=

f00(x)	f01(x)	f02(x)
f10(x)	f11(x)	f12(x)
f20(x)	f21(x)	f22(x)

then:

[ d f(x) / dx]=

d f00(x) / dx	d f01(x) / dx	d f02(x) / dx
d f10(x) / dx	d f11(x) / dx	d f12(x) / dx
d f20(x) / dx	d f21(x) / dx	d f22(x) / dx

So to give a more specific example if:

[f(x)]=

xⁿ	sin(x)	tan(x)
e^x	x²	x³
cos(x)	3	0

then:

[ d f(x) / dx]=

n*x^(n-1)	cos(x)	sec²(x)
e^x	2*x	3*x²
-sin(x)	0	0

So this is quite simple: provided that we can differentiate each element of a matrix, we can differentiate the whole matrix.
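The sketch below is a sanity check on the element-by-element rule. It compares the hand-derived derivative of the example matrix with a central-difference estimate of each element. It assumes n = 3 in the xⁿ entry and evaluates at x = 0.7. Both values and the function names are arbitrary choices made for illustration.

```python
import math


def f(x, n=3):
    """The example matrix [f(x)] from the text, with n fixed for illustration."""
    return [[x ** n, math.sin(x), math.tan(x)],
            [math.exp(x), x ** 2, x ** 3],
            [math.cos(x), 3.0, 0.0]]


def df_exact(x, n=3):
    """Element-by-element derivative [d f(x) / dx] as worked out by hand."""
    return [[n * x ** (n - 1), math.cos(x), 1.0 / math.cos(x) ** 2],  # sec^2(x)
            [math.exp(x), 2 * x, 3 * x ** 2],
            [-math.sin(x), 0.0, 0.0]]


def df_numeric(func, x, h=1e-6):
    """Differentiate every element of a matrix-valued function by central differences."""
    upper, lower = func(x + h), func(x - h)
    return [[(u - lo) / (2 * h) for u, lo in zip(row_u, row_lo)]
            for row_u, row_lo in zip(upper, lower)]


if __name__ == "__main__":
    x = 0.7
    for row_exact, row_approx in zip(df_exact(x), df_numeric(f, x)):
        print(["%.6f vs %.6f" % (e, a) for e, a in zip(row_exact, row_approx)])
```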

Matrix Differentiation with respect to another Matrix

Since one square matrix can be divided by another square matrix with a non-zero determinant (that is, multiplied by its inverse), which is not possible with vectors, we can define differentiation with respect to another matrix.

What are the rules of such differentiation? What applications does it have?

[Image: gears]

Could we use it in this situation? Imagine that two matrices represent the angular positions of two gears: can we differentiate one with respect to the other to get the ratio of the gears?

Jacobian matrix

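There are more complicated types of differentiation. One example is the Jacobian, which contains the partial derivative of every element of one vector (y) with respect to every element of another vector (x):

$$J(x_1, x_2, \ldots, x_n) = \begin{bmatrix} \partial y_1 / \partial x_1 & \cdots & \partial y_1 / \partial x_n \\ \vdots & \ddots & \vdots \\ \partial y_n / \partial x_1 & \cdots & \partial y_n / \partial x_n \end{bmatrix}$$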

This is related to the vector calculus operators grad, div and curl.
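The Jacobian can also be estimated numerically. The idea is to nudge one input at a time and take the difference of every output. This is a minimal sketch. The function `jacobian` and the two-variable mapping `example` are hypothetical illustrations and are not taken from the text.

```python
import math


def jacobian(func, x, h=1e-6):
    """Numerical Jacobian: J[i][j] approximates d y_i / d x_j.

    func maps a list of n floats to a list of m floats; each column j is
    estimated by a central difference in x_j.
    """
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        y_plus, y_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (y_plus[i] - y_minus[i]) / (2 * h)
    return J


def example(x):
    """Hypothetical mapping: y1 = x1^2 * x2, y2 = 5*x1 + sin(x2)."""
    x1, x2 = x
    return [x1 ** 2 * x2, 5 * x1 + math.sin(x2)]


if __name__ == "__main__":
    # Exact Jacobian at (1, 2): [[2*x1*x2, x1^2], [5, cos(x2)]] = [[4, 1], [5, cos 2]]
    print(jacobian(example, [1.0, 2.0]))
```

When the Jacobian is square, its trace (the sum of the ∂y_i/∂x_i entries) is the divergence of the vector field y(x). This is one of the links to div mentioned above.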

Applications

So far we have considered the simple element-by-element differentiation of a matrix described at the top of this page. However, when we start to look at applications, things become less simple.

For instance, with linear movement we just use v = dx/dt and treat the velocity v as being the same thing as dx/dt. With rotation, however, the equation is more complicated:
[~w]*[R(t)] = [d R(t) / dt]

This is also discussed on the angular velocity page. In particular, Mark Ioffe was kind enough to send me a proof of this (see this page).

What I want to do is understand the deeper reasons for this extra complexity. I think this involves these factors:

These are time-varying quantities. Of course v = dx/dt also works for time-varying quantities, but at least if we have constant velocity (and so constant linear momentum) then dx/dt will be constant. However, if [R(t)] represents the orientation of an object rotating at a constant angular velocity (and constant angular momentum), then [d R(t) / dt] will still vary with time. In contrast, [~w] = [d R(t) / dt]*[R(t)]^-1 will not vary with time, so [~w] is a better representation of angular velocity.
I think differentiation is related to the addition operation, but rotations are combined using matrix multiplication, not addition. By "differentiation is related to the addition operation" I mean this: when we add a small increment to time, we get a small increment to distance. Differentiation is the limit of the ratio of these increments as they tend to zero. So is there a mathematical theory that relates small incremental multiplications to conventional differentiation?

Two Dimensional Case

Rotation matrix is: [R]=

cos(a)	-sin(a)
sin(a)	cos(a)

We want the object to be rotating at a constant angular velocity of 'w'. Therefore we replace the angle 'a' with w*t as follows:

Rotation matrix is: [R]=

cos(wt)	-sin(wt)
sin(wt)	cos(wt)

So if we differentiate this with respect to time we get:

d[R] / d t=

-w*sin(wt)	-w*cos(wt)
w*cos(wt)	-w*sin(wt)

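So, as already described, this is a time-varying quantity even though the object is rotating at a constant angular velocity. However, we can factor this matrix as follows:

$$\begin{bmatrix} -w\sin(wt) & -w\cos(wt) \\ w\cos(wt) & -w\sin(wt) \end{bmatrix} = \begin{bmatrix} 0 & -w \\ w & 0 \end{bmatrix} \begin{bmatrix} \cos(wt) & -\sin(wt) \\ \sin(wt) & \cos(wt) \end{bmatrix}$$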

So if we let

[~w] =

0	-w
w	0

Then we get:

[d R(t) / dt] = [~w]*[R(t)]

So we now have a non-time-varying matrix to represent angular velocity.
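The claim that [d R(t) / dt]*[R(t)]^-1 does not vary with time can be checked numerically. This sketch assumes w = 2.5 rad/s, which is an arbitrary value. It estimates dR/dt by central differences, and the helper names are hypothetical. At every t it should return the constant matrix [[0, -w], [w, 0]], apart from small numerical error.

```python
import math


def rotation_2d(angle):
    """2D rotation matrix [R] for an angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def angular_velocity_matrix(w, t, h=1e-6):
    """Estimate [~w] = [dR/dt] * [R]^-1 for R(t) = rotation_2d(w * t).

    dR/dt is found by central differences; for a rotation matrix the
    inverse is simply the transpose.
    """
    plus, minus = rotation_2d(w * (t + h)), rotation_2d(w * (t - h))
    dR = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
          for i in range(2)]
    return matmul(dR, transpose(rotation_2d(w * t)))


if __name__ == "__main__":
    w = 2.5
    for t in (0.0, 0.3, 1.7):
        # dR/dt itself changes with t, but this product should not
        print(t, angular_velocity_matrix(w, t))
```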

Three Dimensional Case

$$\vec{a}(t) = [R(t)]\,\vec{l}$$

where:

a(t) = a point, represented by a vector in absolute coordinates, which is moving (i.e. is a function of time)
[R(t)] = a rotation matrix which is a function of time
l = a fixed point, represented by a vector in the object's local coordinates

In other words, take a fixed point on an object and transform it by multiplying it by a rotation matrix that is a function of time. The result is a vector which rotates as defined by the matrix.

If we want the velocity of this vector, then we need to differentiate the matrix to give:

$$\dot{\vec{a}}(t) = [\dot{R}(t)]\,\vec{l}$$

where:

ȧ(t) = the velocity of the point in the first equation
[Ṙ(t)] = the matrix from the first equation, differentiated with respect to time

We want to prove that:

$$\dot{R}(t) = [\tilde{w}]\,R(t)$$

In other words, that:

$$\dot{R}(t) = \begin{bmatrix} 0 & -w_a & w_h \\ w_a & 0 & -w_b \\ -w_h & w_b & 0 \end{bmatrix} R(t)$$

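R(t) can be expressed in terms of Euler angles (as explained on this page):

$$\dot{R}(t) = \begin{bmatrix} 0 & -w_a & w_h \\ w_a & 0 & -w_b \\ -w_h & w_b & 0 \end{bmatrix} \begin{bmatrix} ch\,ca & -ch\,sa\,cb + sh\,sb & ch\,sa\,sb + sh\,cb \\ sa & ca\,cb & -ca\,sb \\ -sh\,ca & sh\,sa\,cb + ch\,sb & -sh\,sa\,sb + ch\,cb \end{bmatrix}$$

Multiplying out:

$$\dot{R}(t) = \begin{bmatrix}
-w_a\,sa - w_h\,sh\,ca & -w_a\,ca\,cb + w_h\,sh\,sa\,cb + w_h\,ch\,sb & w_a\,ca\,sb - w_h\,sh\,sa\,sb + w_h\,ch\,cb \\
w_a\,ch\,ca + w_b\,sh\,ca & -w_a\,ch\,sa\,cb + w_a\,sh\,sb - w_b\,sh\,sa\,cb - w_b\,ch\,sb & w_a\,ch\,sa\,sb + w_a\,sh\,cb + w_b\,sh\,sa\,sb - w_b\,ch\,cb \\
-w_h\,ch\,ca + w_b\,sa & w_h\,ch\,sa\,cb - w_h\,sh\,sb + w_b\,ca\,cb & -w_h\,ch\,sa\,sb - w_h\,sh\,cb - w_b\,ca\,sb
\end{bmatrix}$$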

Assume that we have a matrix representing the orientation of a solid object. This matrix transforms relative coordinates into global coordinates as follows:

[point in global coordinates]=

m₀₀	m₀₁	m₀₂
m₁₀	m₁₁	m₁₂
m₂₀	m₂₁	m₂₂

[point in relative coordinates]

This matrix can be expressed in terms of Euler angles (as explained on this page) using standard aeroplane conventions:

$$\begin{bmatrix} ch\,ca & -ch\,sa\,cb + sh\,sb & ch\,sa\,sb + sh\,cb \\ sa & ca\,cb & -ca\,sb \\ -sh\,ca & sh\,sa\,cb + ch\,sb & -sh\,sa\,sb + ch\,cb \end{bmatrix}$$

where:

ch = cos(heading)
sh = sin(heading)
heading = rotation about y
ca = cos(attitude)
sa = sin(attitude)
attitude = angle about z (applied second)
cb = cos(bank)
sb = sin(bank)
bank = angle about x (applied last)

So we can get the rate of change of the point by differentiating the matrix:

[d(point in global coordinates)/dt]=

d(m₀₀)/dt	d(m₀₁)/dt	d(m₀₂)/dt
d(m₁₀)/dt	d(m₁₁)/dt	d(m₁₂)/dt
d(m₂₀)/dt	d(m₂₁)/dt	d(m₂₂)/dt

[point in relative coordinates]

So in terms of angles this gives:

$$[dR/dt] = \begin{bmatrix}
d(ch\,ca)/dt & -d(ch\,sa\,cb)/dt + d(sh\,sb)/dt & d(ch\,sa\,sb)/dt + d(sh\,cb)/dt \\
d(sa)/dt & d(ca\,cb)/dt & -d(ca\,sb)/dt \\
-d(sh\,ca)/dt & d(sh\,sa\,cb)/dt + d(ch\,sb)/dt & -d(sh\,sa\,sb)/dt + d(ch\,cb)/dt
\end{bmatrix}$$

using the following differentiation rules:

d(x*y) = x d(y) + y d(x)
d sx / dt = d sx / dx * d x / dt = cx * wx
d cx / dt = d cx / dx * d x / dt = -sx * wx

therefore d(x*y)/dt = x wy + y wx

where wx = dx/dt = angular velocity about x

so this gives:

d(cx cy) = cx d(cy) + cy d(cx) = - cx sy wy - cy sx wx

d(sx cy) = sx d(cy) + cy d(sx) = -sx sy wy + cy cx wx

d(cx sy) = cx d(sy) + sy d(cx) = cx cy wy - sy sx wx

d(sx sy) = sx d(sy) + sy d(sx) = sx cy wy + sy cx wx

d(cx sy cz) = cx sy d cz + d(cx sy) cz = - cx sy sz wz + cx cy cz wy - sy sx cz wx

d(cx sy sz) = cx sy d sz + d(cx sy) sz = cx sy cz wz + cx cy sz wy - sy sx sz wx

d(sx sy cz) = sx sy d cz + d(sx sy) cz = - sx sy sz wz + sx cy cz wy + cx sy cz wx

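$$[dR/dt] = \begin{bmatrix}
-ch\,sa\,wa - ca\,sh\,wh & ch\,sa\,sb\,wb - ch\,ca\,cb\,wa + sa\,sh\,cb\,wh + sh\,cb\,wb + sb\,ch\,wh & ch\,sa\,cb\,wb + ch\,ca\,sb\,wa - sa\,sh\,sb\,wh - sh\,sb\,wb + cb\,ch\,wh \\
ca\,wa & -ca\,sb\,wb - cb\,sa\,wa & -ca\,cb\,wb + sb\,sa\,wa \\
sh\,sa\,wa - ca\,ch\,wh & -sh\,sa\,sb\,wb + sh\,ca\,cb\,wa + ch\,sa\,cb\,wh + ch\,cb\,wb - sh\,sb\,wh & -sh\,sa\,cb\,wb - sh\,ca\,sb\,wa - ch\,sa\,sb\,wh - ch\,sb\,wb - sh\,cb\,wh
\end{bmatrix}$$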

This does not seem to give the right answer. Can anyone see what I'm doing wrong?
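The sketch below offers one way to test the derivation numerically. It estimates [dR/dt] by central differences and then forms [dR/dt] Rᵀ. If Ṙ(t) = [~w] R(t), this product is [~w]. The sketch assumes the three angles change linearly with time, and the helper names are hypothetical.

Consider the case where only the attitude changes (at 1 rad/s) and heading is held at 0.6 rad. The estimated matrix comes out skew-symmetric. However, the vector read from it is about (sin 0.6, 0, cos 0.6), not (w_b, w_h, w_a) = (0, 0, 1). This suggests that, in general, the rates of change of the Euler angles are not the same as the components of the angular velocity about the x, y and z axes. That may be the source of the mismatch above.

```python
import math


def matmul(A, B):
    """Multiply two 3x3 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def euler_matrix(heading, attitude, bank):
    """Heading/attitude/bank rotation matrix, as given in the text."""
    ch, sh = math.cos(heading), math.sin(heading)
    ca, sa = math.cos(attitude), math.sin(attitude)
    cb, sb = math.cos(bank), math.sin(bank)
    return [[ch * ca, -ch * sa * cb + sh * sb, ch * sa * sb + sh * cb],
            [sa, ca * cb, -ca * sb],
            [-sh * ca, sh * sa * cb + ch * sb, -sh * sa * sb + ch * cb]]


def angular_velocity(angles0, rates, t=0.0, step=1e-6):
    """Estimate [~w] = [dR/dt] R^T for angles that change linearly with time.

    angles0 = (heading, attitude, bank) at t = 0
    rates   = (d heading/dt, d attitude/dt, d bank/dt)
    Returns the 3x3 matrix and the vector (wx, wy, wz) read from it.
    """
    def R_at(time):
        return euler_matrix(*(a0 + r * time for a0, r in zip(angles0, rates)))

    plus, minus = R_at(t + step), R_at(t - step)
    dR = [[(plus[i][j] - minus[i][j]) / (2 * step) for j in range(3)]
          for i in range(3)]
    W = matmul(dR, transpose(R_at(t)))  # R^-1 = R^T for a rotation
    # Layout assumed: W = [[0, -wz, wy], [wz, 0, -wx], [-wy, wx, 0]]
    return W, (W[2][1], W[0][2], W[1][0])


if __name__ == "__main__":
    # Heading held at 0.6 rad, only the attitude angle changing at 1 rad/s.
    W, w = angular_velocity((0.6, 0.2, -0.4), (0.0, 1.0, 0.0))
    print(w, (math.sin(0.6), 0.0, math.cos(0.6)))
```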

Try this with global Euler angles:

We want to prove that:

$$\dot{R}(t) = [\tilde{w}]\,R(t)$$

In other words, that:

$$\dot{R}(t) = \begin{bmatrix} 0 & -w_z & w_y \\ w_z & 0 & -w_x \\ -w_y & w_x & 0 \end{bmatrix} R(t)$$

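R(t) can be expressed in terms of Euler angles (as explained on this page):

$$\dot{R}(t) = \begin{bmatrix} 0 & -w_z & w_y \\ w_z & 0 & -w_x \\ -w_y & w_x & 0 \end{bmatrix} \begin{bmatrix} cy\,cz & sx\,sy\,cz - cx\,sz & cx\,sy\,cz + sx\,sz \\ cy\,sz & sx\,sy\,sz + cx\,cz & cx\,sy\,sz - sx\,cz \\ -sy & sx\,cy & cx\,cy \end{bmatrix}$$

Multiplying out:

$$\dot{R}(t) = \begin{bmatrix}
-w_z\,cy\,sz - w_y\,sy & -w_z\,sx\,sy\,sz - w_z\,cx\,cz + w_y\,sx\,cy & -w_z\,cx\,sy\,sz + w_z\,sx\,cz + w_y\,cx\,cy \\
w_z\,cy\,cz + w_x\,sy & w_z\,sx\,sy\,cz - w_z\,cx\,sz - w_x\,sx\,cy & w_z\,cx\,sy\,cz + w_z\,sx\,sz - w_x\,cx\,cy \\
-w_y\,cy\,cz + w_x\,cy\,sz & -w_y\,sx\,sy\,cz + w_y\,cx\,sz + w_x\,sx\,sy\,sz + w_x\,cx\,cz & -w_y\,cx\,sy\,cz - w_y\,sx\,sz + w_x\,cx\,sy\,sz - w_x\,sx\,cz
\end{bmatrix}$$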

Assume that we have a matrix representing the orientation of a solid object. This matrix transforms relative coordinates into global coordinates as follows:

[point in global coordinates]=

m₀₀	m₀₁	m₀₂
m₁₀	m₁₁	m₁₂
m₂₀	m₂₁	m₂₂

[point in relative coordinates]

This matrix can be expressed in terms of Euler angles (as explained on this page) using standard aeroplane conventions:

$$\begin{bmatrix} cy\,cz & sx\,sy\,cz - cx\,sz & cx\,sy\,cz + sx\,sz \\ cy\,sz & sx\,sy\,sz + cx\,cz & cx\,sy\,sz - sx\,cz \\ -sy & sx\,cy & cx\,cy \end{bmatrix}$$

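where:

cx = cos(x), sx = sin(x), where x = angle of rotation about the x axis
cy = cos(y), sy = sin(y), where y = angle of rotation about the y axis
cz = cos(z), sz = sin(z), where z = angle of rotation about the z axis

This matrix is Rz(z)*Ry(y)*Rx(x). The rotation about x is applied first and the rotation about z last, all about the fixed global axes.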
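A typo in a single entry is easy to miss, so this small sketch checks the x, y, z matrix. It compares the matrix with the product Rz(z) Ry(y) Rx(x) of elementary rotations. It also confirms that R Rᵀ = I, which means the transposed matrix used in the next attempt is the inverse of this one. The sketch assumes the standard right-handed elementary rotations and uses the entry m₁₁ = sx sy sz + cx cz. The helper names and the test angles are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def xyz_matrix(x, y, z):
    """The x-y-z angle matrix from the text, with m11 = sx sy sz + cx cz."""
    cx, sx = math.cos(x), math.sin(x)
    cy, sy = math.cos(y), math.sin(y)
    cz, sz = math.cos(z), math.sin(z)
    return [[cy * cz, sx * sy * cz - cx * sz, cx * sy * cz + sx * sz],
            [cy * sz, sx * sy * sz + cx * cz, cx * sy * sz - sx * cz],
            [-sy, sx * cy, cx * cy]]


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


if __name__ == "__main__":
    x, y, z = 0.4, -1.1, 2.3
    R = xyz_matrix(x, y, z)
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    print(max_abs_diff(R, matmul(rot_z(z), matmul(rot_y(y), rot_x(x)))))
    print(max_abs_diff(matmul(R, transpose(R)), identity))
```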

So we can get the rate of change of the point by differentiating the matrix:

[d(point in global coordinates)/dt]=

d(m₀₀)/dt	d(m₀₁)/dt	d(m₀₂)/dt
d(m₁₀)/dt	d(m₁₁)/dt	d(m₁₂)/dt
d(m₂₀)/dt	d(m₂₁)/dt	d(m₂₂)/dt

[point in relative coordinates]

So in terms of angles this gives:

$$[dR/dt] = \begin{bmatrix}
d(cy\,cz)/dt & d(sx\,sy\,cz)/dt - d(cx\,sz)/dt & d(cx\,sy\,cz)/dt + d(sx\,sz)/dt \\
d(cy\,sz)/dt & d(sx\,sy\,sz)/dt + d(cx\,cz)/dt & d(cx\,sy\,sz)/dt - d(sx\,cz)/dt \\
-d(sy)/dt & d(sx\,cy)/dt & d(cx\,cy)/dt
\end{bmatrix}$$

using the following differentiation rules:

d(x*y) = x d(y) + y d(x)
d sx / dt = d sx / dx * d x / dt = cx * wx
d cx / dt = d cx / dx * d x / dt = -sx * wx

therefore d(x*y)/dt = x wy + y wx

where wx = dx/dt = angular velocity about x

so this gives:

d(cx cy) = cx d(cy) + cy d(cx) = - cx sy wy - cy sx wx

d(sx cy) = sx d(cy) + cy d(sx) = -sx sy wy + cy cx wx

d(cx sy) = cx d(sy) + sy d(cx) = cx cy wy - sy sx wx

d(sx sy) = sx d(sy) + sy d(sx) = sx cy wy + sy cx wx

d(cx sy cz) = cx sy d cz + d(cx sy) cz = - cx sy sz wz + cx cy cz wy - sy sx cz wx

d(cx sy sz) = cx sy d sz + d(cx sy) sz = cx sy cz wz + cx cy sz wy - sy sx sz wx

d(sx sy cz) = sx sy d cz + d(sx sy) cz = - sx sy sz wz + sx cy cz wy + cx sy cz wx

The first column of [dR/dt] is:

$$\begin{bmatrix} -cy\,sz\,wz - cz\,sy\,wy \\ cy\,cz\,wz - sz\,sy\,wy \\ -cy\,wy \end{bmatrix}$$

This does not seem to give the right answer. Can anyone see what I'm doing wrong?

Try this with global Euler angles:

We want to prove that:

$$\dot{R}(t) = [\tilde{w}]\,R(t)$$

In other words, that:

$$\dot{R}(t) = \begin{bmatrix} 0 & -w_z & w_y \\ w_z & 0 & -w_x \\ -w_y & w_x & 0 \end{bmatrix} R(t)$$

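R(t) can be expressed in terms of Euler angles (as explained on this page):

$$\dot{R}(t) = \begin{bmatrix} 0 & -w_z & w_y \\ w_z & 0 & -w_x \\ -w_y & w_x & 0 \end{bmatrix} \begin{bmatrix} cy\,cz & cy\,sz & -sy \\ sx\,sy\,cz - cx\,sz & sx\,sy\,sz + cx\,cz & sx\,cy \\ cx\,sy\,cz + sx\,sz & cx\,sy\,sz - sx\,cz & cx\,cy \end{bmatrix}$$

Multiplying out:

$$\dot{R}(t) = \begin{bmatrix}
-w_z\,sx\,sy\,cz + w_z\,cx\,sz + w_y\,cx\,sy\,cz + w_y\,sx\,sz & -w_z\,sx\,sy\,sz - w_z\,cx\,cz + w_y\,cx\,sy\,sz - w_y\,sx\,cz & -w_z\,sx\,cy + w_y\,cx\,cy \\
w_z\,cy\,cz - w_x\,cx\,sy\,cz - w_x\,sx\,sz & w_z\,cy\,sz - w_x\,cx\,sy\,sz + w_x\,sx\,cz & -w_z\,sy - w_x\,cx\,cy \\
-w_y\,cy\,cz + w_x\,sx\,sy\,cz - w_x\,cx\,sz & -w_y\,cy\,sz + w_x\,sx\,sy\,sz + w_x\,cx\,cz & w_y\,sy + w_x\,sx\,cy
\end{bmatrix}$$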

Assume that we have a matrix representing the orientation of a solid object. This matrix transforms relative coordinates into global coordinates as follows:

[point in global coordinates]=

m₀₀	m₀₁	m₀₂
m₁₀	m₁₁	m₁₂
m₂₀	m₂₁	m₂₂

[point in relative coordinates]

This matrix can be expressed in terms of Euler angles (as explained on this page) using standard aeroplane conventions:

$$\begin{bmatrix} cy\,cz & cy\,sz & -sy \\ sx\,sy\,cz - cx\,sz & sx\,sy\,sz + cx\,cz & sx\,cy \\ cx\,sy\,cz + sx\,sz & cx\,sy\,sz - sx\,cz & cx\,cy \end{bmatrix}$$

where:

cx = cos(x), sx = sin(x), where x = angle of rotation about the x axis
cy = cos(y), sy = sin(y), where y = angle of rotation about the y axis
cz = cos(z), sz = sin(z), where z = angle of rotation about the z axis

So we can get the rate of change of the point by differentiating the matrix:

[d(point in global coordinates)/dt]=

d(m₀₀)/dt	d(m₀₁)/dt	d(m₀₂)/dt
d(m₁₀)/dt	d(m₁₁)/dt	d(m₁₂)/dt
d(m₂₀)/dt	d(m₂₁)/dt	d(m₂₂)/dt

[point in relative coordinates]

So in terms of angles this gives:

$$[dR/dt] = \begin{bmatrix}
d(cy\,cz)/dt & d(cy\,sz)/dt & -d(sy)/dt \\
d(sx\,sy\,cz)/dt - d(cx\,sz)/dt & d(sx\,sy\,sz)/dt + d(cx\,cz)/dt & d(sx\,cy)/dt \\
d(cx\,sy\,cz)/dt + d(sx\,sz)/dt & d(cx\,sy\,sz)/dt - d(sx\,cz)/dt & d(cx\,cy)/dt
\end{bmatrix}$$

using the following differentiation rules:

d(x*y) = x d(y) + y d(x)
d sx / dt = d sx / dx * d x / dt = cx * wx
d cx / dt = d cx / dx * d x / dt = -sx * wx

therefore d(x*y)/dt = x wy + y wx

where wx = dx/dt = angular velocity about x

so this gives:

d(cx cy) = cx d(cy) + cy d(cx) = - cx sy wy - cy sx wx

d(sx cy) = sx d(cy) + cy d(sx) = -sx sy wy + cy cx wx

d(cx sy) = cx d(sy) + sy d(cx) = cx cy wy - sy sx wx

d(sx sy) = sx d(sy) + sy d(sx) = sx cy wy + sy cx wx

d(cx sy cz) = cx sy d cz + d(cx sy) cz = - cx sy sz wz + cx cy cz wy - sy sx cz wx

d(cx sy sz) = cx sy d sz + d(cx sy) sz = cx sy cz wz + cx cy sz wy - sy sx sz wx

d(sx sy cz) = sx sy d cz + d(sx sy) cz = - sx sy sz wz + sx cy cz wy + cx sy cz wx

The top-left element of [dR/dt] is:

d(cy cz)/dt = - cy sz wz - cz sy wy

This does not seem to give the right answer. Can anyone see what I'm doing wrong?

An example

Consider an object rotating about the x-axis with an angular velocity of w_x:

R(t)=

1	0	0
0	cos(w_x t)	-sin(w_x t)
0	sin(w_x t)	cos(w_x t)

So if we differentiate this matrix with respect to time (by individually differentiating each element), we get:

Ṙ(t) =

0	0	0
0	- w_x sin(w_x t)	- w_x cos(w_x t)
0	w_x cos(w_x t)	- w_x sin(w_x t)

Notice that the lower-right 2x2 part of this is just the corresponding part of R(t), multiplied by w_x and rotated by 90 degrees. So we can separate out these factors as follows:

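$$\dot{R}(t) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -w_x \\ 0 & w_x & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(w_x t) & -\sin(w_x t) \\ 0 & \sin(w_x t) & \cos(w_x t) \end{bmatrix}$$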

So in this case we can differentiate R(t) to give Ṙ(t) just by multiplying it by a constant matrix (one that is not a function of time).

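What if the object is rotating about the y axis? In this case:

$$\dot{R}_y(t) = \begin{bmatrix} 0 & 0 & w_y \\ 0 & 0 & 0 \\ -w_y & 0 & 0 \end{bmatrix} \begin{bmatrix} \cos(w_y t) & 0 & \sin(w_y t) \\ 0 & 1 & 0 \\ -\sin(w_y t) & 0 & \cos(w_y t) \end{bmatrix}$$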
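This sketch checks both single-axis cases numerically. It compares the element-by-element derivative with [W] R for each axis. It assumes R_y(θ) has +sin θ in its top-right entry, which matches the heading part of the aeroplane matrix used earlier. With that convention, dR_y/dt equals the product with the [W_y] shown above. With the opposite sign convention for R_y, the product comes out as -dR_y/dt. The values of w_x, w_y and t are arbitrary, and the function names are hypothetical.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    # Assumed sign convention: +sin in the top-right entry.
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def time_derivative(func, t, h=1e-6):
    """Element-by-element central-difference derivative of a matrix function of t."""
    plus, minus = func(t + h), func(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
            for rp, rm in zip(plus, minus)]


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def check(wx=0.8, wy=-1.3, t=0.45):
    """Largest entry of |dR_x/dt - W_x R_x| and of |dR_y/dt - W_y R_y|."""
    Wx = [[0.0, 0.0, 0.0], [0.0, 0.0, -wx], [0.0, wx, 0.0]]
    Wy = [[0.0, 0.0, wy], [0.0, 0.0, 0.0], [-wy, 0.0, 0.0]]
    err_x = max_abs_diff(time_derivative(lambda s: rot_x(wx * s), t),
                         matmul(Wx, rot_x(wx * t)))
    err_y = max_abs_diff(time_derivative(lambda s: rot_y(wy * s), t),
                         matmul(Wy, rot_y(wy * t)))
    return err_x, err_y


if __name__ == "__main__":
    print(check())  # both errors should be tiny (finite-difference noise only)
```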

What if the object is rotating about both the x axis and the y axis, so that R(t) = R_x(t) R_y(t)? In this case we can use the product rule,

d/dt (XY) = X * d/dt (Y) + d/dt (X) * Y

to give,

$$\dot{R}(t) = R_x(t)\,\dot{R}_y(t) + \dot{R}_x(t)\,R_y(t)$$

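$$\dot{R}(t) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(w_x t) & -\sin(w_x t) \\ 0 & \sin(w_x t) & \cos(w_x t) \end{bmatrix} \begin{bmatrix} 0 & 0 & w_y \\ 0 & 0 & 0 \\ -w_y & 0 & 0 \end{bmatrix} \begin{bmatrix} \cos(w_y t) & 0 & \sin(w_y t) \\ 0 & 1 & 0 \\ -\sin(w_y t) & 0 & \cos(w_y t) \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -w_x \\ 0 & w_x & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(w_x t) & -\sin(w_x t) \\ 0 & \sin(w_x t) & \cos(w_x t) \end{bmatrix} \begin{bmatrix} \cos(w_y t) & 0 & \sin(w_y t) \\ 0 & 1 & 0 \\ -\sin(w_y t) & 0 & \cos(w_y t) \end{bmatrix}$$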

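So,

$$\dot{R}(t) = R_x(t) \begin{bmatrix} 0 & 0 & w_y \\ 0 & 0 & 0 \\ -w_y & 0 & 0 \end{bmatrix} R_y(t) + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -w_x \\ 0 & w_x & 0 \end{bmatrix} R_x(t)\,R_y(t)$$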

I was hoping to show that:

$$\dot{R}(t) = \begin{bmatrix} 0 & 0 & w_y \\ 0 & 0 & -w_x \\ -w_y & w_x & 0 \end{bmatrix} R(t)$$

but I can't work out how to get there. Can anyone help?
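A numerical sketch may help locate the difficulty. It estimates [dR/dt] R(t)ᵀ for R(t) = R_x(t) R_y(t) and reads off the angular velocity vector.

For a rotation R, the identity R [v] Rᵀ = [R v] holds, where [v] is the skew-symmetric matrix of the vector v. Using it, the first term above becomes (R_x(t)[W_y]R_x(t)ᵀ) R(t). The matrix in brackets corresponds to the vector R_x(t)(0, w_y, 0) = (0, w_y cos(w_x t), w_y sin(w_x t)). The angular velocity should therefore be (w_x, w_y cos(w_x t), w_y sin(w_x t)), which agrees with the hoped-for matrix only at t = 0.

The sketch assumes R_y(θ) has +sin θ in its top-right entry. The names and values are illustrative.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    # Assumed convention: +sin in the top-right entry.
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def time_derivative(func, t, h=1e-6):
    """Element-by-element central-difference derivative of a matrix function."""
    plus, minus = func(t + h), func(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
            for rp, rm in zip(plus, minus)]


def angular_velocity_xy(wx, wy, t):
    """Read (w_x, w_y, w_z) from [dR/dt] R^T for R(t) = R_x(wx t) R_y(wy t)."""
    def R(s):
        return matmul(rot_x(wx * s), rot_y(wy * s))
    W = matmul(time_derivative(R, t), transpose(R(t)))
    # W = [[0, -wz, wy], [wz, 0, -wx], [-wy, wx, 0]]
    return (W[2][1], W[0][2], W[1][0])


if __name__ == "__main__":
    wx, wy = 0.8, -1.3
    for t in (0.0, 0.5, 1.0):
        predicted = (wx, wy * math.cos(wx * t), wy * math.sin(wx * t))
        print(t, angular_velocity_xy(wx, wy, t), predicted)
```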

And if the object is rotating about all three axes, then:

$$\dot{R}(t) = \begin{bmatrix} 0 & -w_z & w_y \\ w_z & 0 & -w_x \\ -w_y & w_x & 0 \end{bmatrix} R(t)$$

that is,

$$\dot{R}(t) = [\tilde{w}]\,R(t)$$

Derivation of


Reginald E. Bednar

January 26, 2004

Notation:

Form inertial to body coordinate transformation matrix and its inverse:

Use fact that position vector in body frame is a constant:

Form body frame unit vectors in inertial frame coordinates:

Form derivative of body frame unit vectors in inertial frame coordinates:

Express inertial to body coordinate transformation matrix and derivative with respect to time of body to inertial coordinate transformation matrix in terms of these vectors:

The relationship between a unit vector and its time derivative is given by:

Expressing this relationship in the body frame, this yields:

Using above, substituting (1,0,0), (0,1,0), and (0,0,1) successively for , get:

Noting that unit vectors in body frame are:

Want to compute:

Body frame unit vectors are related to each other by:

Now resolve each element of matrix:

Express the body rotation rate in terms of inertial frame coordinates:

Resulting in the matrix in terms of rates:

Thus this matrix is a function of the body rotation rate vector expressed in inertial frame coordinates. This vector is typically not used in applications; the expression using the body rotation rate vector expressed in body frame coordinates is used instead. The latter vector is directly obtained from inertial measurement unit data, i.e., from rate gyros.

Dan Piponi (rotations@sigfpe.com) has kindly sent me information about this, here:

If you rotate a vector v by a small angle around an axis, then the rotation can be written approximately as

v → v + w x v

('x' is the cross product)

where:

w is a vector in the direction of the axis of rotation, and the length of the vector is the size of the angle in radians (let's call this the 'rotation vector').
To prove this use the fact that for small angles, a, sin(a) is approximately a and do some basic trig.

Now define the function f(.) that converts vectors into skew-symmetric matrices by:

f(w) =

0	-w3	w2
w3	0	-w1
-w2	w1	0

where w = (w1, w2, w3).

Then it's easy to show by writing out components that w x v is f(w)v, i.e. doing a cross product with w is the same as multiplying by a certain matrix.

Define the 'inverse' function on matrices g(.) so that

So g(f(w))=w (though f(g(A))=A is only true if A is antisymmetric).
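The statement that w x v = f(w)v can be checked by writing out the components in code. This sketch assumes f(w) uses the same layout as the [~w] matrices elsewhere on this page. It defines g by reading back three of the off-diagonal entries. That definition of g is an assumption, because Dan's exact formula is not reproduced here and any g with g(f(w)) = w would serve. The test vectors are arbitrary.

```python
def f(w):
    """Skew-symmetric matrix of w = (w1, w2, w3), so that f(w) v == w x v."""
    w1, w2, w3 = w
    return [[0.0, -w3, w2],
            [w3, 0.0, -w1],
            [-w2, w1, 0.0]]


def g(A):
    """One possible 'inverse' of f: read (w1, w2, w3) back from a skew matrix."""
    return (A[2][1], A[0][2], A[1][0])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def matvec(A, v):
    return tuple(sum(A[i][k] * v[k] for k in range(3)) for i in range(3))


if __name__ == "__main__":
    w, v = (1.0, 2.0, 3.0), (-4.0, 0.5, 2.0)
    print(matvec(f(w), v), cross(w, v))  # the two should agree
    print(g(f(w)))                       # recovers w
```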

So now we can say that rotation around axis w by angle |w| is given by the matrix 1+f(w) for small |w|.

So suppose at time t the orientation of something is given by matrix A(t).
Then the change from t to t+dt must be a small rotation. In other words

A(t+dt)=(1+f(wdt))A(t) for small dt

where w is the angular velocity. (I hope you get that the rotation vector, for small dt, is given by wdt)
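The update A(t+dt) = (1+f(wdt))A(t) suggests one answer to the earlier question about relating small incremental multiplications to differentiation: a finite rotation can be built up by multiplying many small steps together. The sketch below assumes a constant angular velocity of 1 rad/s about z and uses f as defined in the text. It compares the product of the small steps with the exact rotation after 1 second; the helper names are hypothetical. Because (1+f(wdt)) is only a rotation to first order in dt, the error should shrink as the number of steps grows.

```python
import math


def f(w):
    """Skew-symmetric matrix of w, as described in the text."""
    w1, w2, w3 = w
    return [[0.0, -w3, w2], [w3, 0.0, -w1], [-w2, w1, 0.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def compose_small_rotations(w, total_time, steps):
    """Apply A(t + dt) = (1 + f(w dt)) A(t) repeatedly, starting from A(0) = 1."""
    dt = total_time / steps
    F = f(tuple(c * dt for c in w))
    step = [[(1.0 if i == j else 0.0) + F[i][j] for j in range(3)]
            for i in range(3)]
    A = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(steps):
        A = matmul(step, A)
    return A


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


if __name__ == "__main__":
    exact = rot_z(1.0)  # unit angular velocity about z for 1 second
    for n in (10, 100, 2000):
        approx = compose_small_rotations((0.0, 0.0, 1.0), 1.0, n)
        print(n, max_abs_diff(approx, exact))
```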

So f(w dt) = A(t+dt)A(t)^(-1) - 1

Hence w dt is g(A(t+dt)A(t)^(-1) - 1), so

w = lim g(A(t+dt)A(t)^(-1)-1)/dt as dt->0

Now A(t+dt) is, to first order, A + dt dA/dt. By dA/dt I mean simply the derivative of A with respect to time, where you differentiate each element of the matrix with respect to time.

So now we get

w = g(dA/dt A^(-1))

Simple eh?

Let's do an example. Consider the matrix

At t = 0

So w = (0 0 1).

We already know that the matrix A defines rotation about the z-axis so this is correct. I'll leave you to check the calculation at general t - you should get the same result as A corresponds to constant angular velocity.
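The example matrix itself is not reproduced above. This sketch therefore assumes it is the standard rotation by angle t about the z-axis, which is consistent with w = (0 0 1). It evaluates w = g(dA/dt A^(-1)) at several times. It estimates dA/dt by central differences and uses Aᵀ for A^(-1). Here g reads back three off-diagonal entries, which is one possible choice.

```python
import math


def A(t):
    """Assumed example matrix: rotation by angle t about the z-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(P):
    return [list(col) for col in zip(*P)]


def g(M):
    """Read (w1, w2, w3) back from a skew-symmetric matrix (one choice of g)."""
    return (M[2][1], M[0][2], M[1][0])


def angular_velocity(A_func, t, h=1e-6):
    """w = g(dA/dt A^(-1)); dA/dt by central differences, A^(-1) = A^T."""
    plus, minus = A_func(t + h), A_func(t - h)
    dA = [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
          for rp, rm in zip(plus, minus)]
    return g(matmul(dA, transpose(A_func(t))))


if __name__ == "__main__":
    for t in (0.0, 0.8, 2.5):
        print(t, angular_velocity(A, t))  # expected to stay close to (0, 0, 1)
```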

I've never actually seen this discussed in any textbook although if you generalise a bit this is an example of Lie algebra theory which is in many books. The operations f and g are sometimes known as the Hodge dual and are written as '*'.

A completely different approach to differentiating rotations is via geometric algebra and rotors. I don't see the need though as I think the above approach is fine. In fact I've used the above approach to differentiating rotations for finding minima and maxima of functions of rotations at work - traditionally the type of problem for which people use geometric algebra. But check the subject out anyway - search on 'geometric algebra hestenes' at google.com.

Have you ever studied the 'spinning top' mathematically? If you have, you may spot the connection between the above and the differential equation dv/dt = w x v.

Tell me what doesn't make sense!
--
Dan

Matrix integration

As with differentiation we can integrate a whole matrix by individually integrating each element. So if:

[f(x)]=

f00(x)	f01(x)	f02(x)
f10(x)	f11(x)	f12(x)
f20(x)	f21(x)	f22(x)

then:

[∫ f(x) dx] =

∫ f00(x) dx	∫ f01(x) dx	∫ f02(x) dx
∫ f10(x) dx	∫ f11(x) dx	∫ f12(x) dx
∫ f20(x) dx	∫ f21(x) dx	∫ f22(x) dx
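Numerically, element-by-element integration amounts to applying a one-dimensional quadrature rule to every entry. The sketch below uses the composite Simpson's rule, which is a swapped-in choice of method and does not come from the text. It integrates a hypothetical 2x2 matrix whose exact integral over [0, 1] is [[1/2, 1 - cos 1], [e - 1, 3]].

```python
import math


def integrate_matrix(func, a, b, n=1000):
    """Integrate a matrix-valued function over [a, b], element by element,
    using the composite Simpson's rule with n (even) sub-intervals."""
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = (b - a) / n
    first = func(a)
    total = [[0.0] * len(first[0]) for _ in first]
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        M = func(a + k * h)
        for i, row in enumerate(M):
            for j, value in enumerate(row):
                total[i][j] += weight * value
    return [[h / 3 * value for value in row] for row in total]


def example(x):
    """Hypothetical 2x2 example matrix."""
    return [[x, math.sin(x)], [math.exp(x), 3.0]]


if __name__ == "__main__":
    print(integrate_matrix(example, 0.0, 1.0))
```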


see also:

quaternion equivalent of this

Correspondence about this page

Forum Discussion with Tadd


This site may have errors. Don't use for critical systems.