Tuesday, 20 June 2017

When i grows up...

...\(i\) wants to be a quaternion.

We can think of the complex numbers as two dimensional vectors, with a rule for multiplication. This multiplication rule is actually very well behaved, and for some time mathematicians tried and failed to find a similarly well behaved multiplication rule for three dimensional vectors. In fact the failure was not due to any lack of ability: it's impossible to find such a well-behaved multiplication rule for these vectors. Then William Rowan Hamilton had a flash of inspiration which gave him the insight that although a multiplication rule couldn't be found for three dimensional vectors, it could be done for four dimensional vectors. He was so excited by the realization that he carved the basic algebraic identities into the stonework of the bridge he was crossing at that time. The structure resulting from the multiplication rule he discovered is the quaternions.

Hamilton then devoted an immense amount of time and effort to developing the mathematics of these quaternions and their physical applications. Unfortunately, he was far less effective a communicator of mathematics than he was an inventor (or discoverer) and quaternions were not adopted by the mathematical community.

So what is so nice about the complex number multiplication, what are the quaternions, and how well do the nice properties of the complex numbers generalize to them?

Geometry of complex multiplication

Multiplication of complex numbers is very rich geometrically. Probably the simplest observation we can make has to do with multiplication by \(i\). If \(z=a+ib\), so that \(z\) corresponds to the point \((a,b)\) in the plane, then \(iz=-b+ia\), which corresponds to the point \((-b,a)\): so multiplication by \(i\) has rotated this point by a right angle about the origin.

This is pretty, and suggests that a closer look is warranted, so let's take one.

The complex conjugate of \(z\) is \(\overline{z}=a-ib\), and the modulus of \(z\) is \[ |z| = \sqrt{z\overline{z}} = \sqrt{a^2+b^2}. \] Thinking of \(z\) as the point \((a,b)\), \(|z|\) is the distance to the origin, so it tells us the size of \(z\).

We also see that \(1/z=\overline{z}/|z|^2\) so that every complex number apart from zero has a multiplicative inverse. An excellent start. But complex multiplication has a more remarkable property still.

If \(z=a+ib\) and \(w=c+id\), then \[ \begin{split} |zw|^2&=|(ac-bd)+i(ad+bc)|^2\\ &=(ac-bd)^2+(ad+bc)^2\\ &=(ac)^2-2abcd+(bd)^2+(ad)^2+2abcd+(bc)^2\\ &=a^2c^2+b^2d^2+a^2d^2+b^2c^2\\ &=(a^2+b^2)(c^2+d^2)\\ &=|z|^2|w|^2 \end{split} \] so that \(|zw|=|z||w|\), so the size of the product is the product of the sizes.
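The identity is easy to check numerically. Here is a minimal sketch using Python's built-in complex type; the sample values are arbitrary choices, and the comparison uses a tolerance because the arithmetic is floating point.

```python
import math

# Arbitrary sample values; any two complex numbers would do.
z = complex(3, 4)    # a = 3, b = 4, so |z| = 5
w = complex(1, -2)   # c = 1, d = -2

lhs = abs(z * w)         # size of the product
rhs = abs(z) * abs(w)    # product of the sizes

# Floating point arithmetic, so compare with a tolerance rather than ==.
assert math.isclose(lhs, rhs)
```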

A special collection of complex numbers are the ones satisfying \(|z|=1\), so they form the unit circle in the complex plane. The point on the circle with polar coordinates \((1,\theta)\) has Cartesian coordinates \((\cos(\theta),\sin(\theta))\), and expressed as a complex number is \(\cos(\theta)+i\sin(\theta)\).

We can do a little more with these special complex numbers.

If \(z\) and \(w\) both have modulus \(1\), then so do \(1/z=\overline{z}\) and \(zw\), so the complex numbers of unit modulus are closed under multiplication and multiplicative inverse: when a collection of quantities has these properties we call it a group, and this particular group goes by the name \(U(1)\).

We can put this together in a very pretty way.

If \(z,w \in U(1)\) with \(z=\cos(\theta)+i\sin(\theta)\) and \(w= \cos(\phi)+i\sin(\phi)\), then \(zw=\cos(\theta+\phi)+i\sin(\theta+\phi)\). Depending on how you define things, this either follows from or gives a proof of the standard trigonometry addition formulae.
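A quick numerical sanity check of the angle-addition rule, as a sketch only: the helper name `unit` and the two angles are my own choices, not anything fixed by the discussion above.

```python
import cmath
import math

def unit(theta):
    """The point on the unit circle at angle theta: cos(theta) + i sin(theta)."""
    return complex(math.cos(theta), math.sin(theta))

theta, phi = 0.7, 1.9   # arbitrary angles in radians

# Multiplying two unit complex numbers adds their angles.
product = unit(theta) * unit(phi)
assert cmath.isclose(product, unit(theta + phi))

# Multiplying by i = unit(pi/2) is the quarter turn seen earlier.
assert cmath.isclose(1j * unit(theta), unit(theta + math.pi / 2))
```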

There's also a useful special case, already mentioned above. If \(|z|=1\), so \(z=\cos(\theta)+i\sin(\theta)\), then \(iz=-\sin(\theta)+i\cos(\theta)\), which associates with each point on the unit circle a unit vector pointing tangent to the circle in the anti-clockwise direction.

Three dimensional vectors

This is all so neat and elegant that a great deal of effort was invested into trying to do the same thing with three component vectors. However, it couldn't be done. There's no way of multiplying three dimensional vectors to give a three dimensional vector with all these nice properties.

All is not actually lost: it is possible to multiply three dimensional vectors in a useful way, but you have to be prepared to extend what you mean by multiplication. The result is geometric algebra, and it has many attractions. I'll come back to this later on, but for the moment let's just accept that you just can't define a nice multiplication on three dimensional vectors, and go on to see how you can do it with four dimensional vectors.

In fact we will see that just about everything described above for the complex numbers has an analogue in the quaternions. (Though sometimes in a surprising way.)


So, what is a quaternion?

One way of describing them is that a quaternion, \(q\), is a sum of the form \[ q=a+bi+cj+dk \] where \(i,j\) and \(k\) are a new kind of number, like the usual square root of minus one, but related in a particular way. We have \[ i^2=j^2=k^2 = ijk = -1 \] and then we carry out arithmetic by multiplying out the expressions and using these relationships. It's like complex arithmetic but with more component parts (and a more complicated set of rules for how these different square roots of minus one interact).

Just as with the complex numbers, we can think of these objects as vectors: but now they have four components, a real one and three imaginary ones.

We can also define a quaternionic conjugate, \[ \overline{q} = a-bi-cj-dk \] and then we quickly get \[ q\overline{q}=a^2+b^2+c^2+d^2=|q|^2 \] which gives the modulus, or size, of the quaternion, and \[ q^{-1} = \frac{\overline{q}}{|q|^2} \] is the inverse of \(q\).
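To make the rules concrete, here is a minimal sketch of quaternion arithmetic in Python. Storing a quaternion as a tuple `(a, b, c, d)` and the function names `qmul`, `qconj`, `qnorm2` and `qinv` are my own assumptions for illustration; the product formula is what you get by multiplying out and applying \(i^2=j^2=k^2=ijk=-1\). Exact fractions are used so that the checks hold without rounding.

```python
from fractions import Fraction

def qmul(p, q):
    """Product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def qconj(q):
    """Quaternionic conjugate: flip the sign of the three imaginary parts."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def qnorm2(q):
    """Squared modulus a^2 + b^2 + c^2 + d^2."""
    return sum(x * x for x in q)

def qinv(q):
    """Inverse conj(q) / |q|^2, computed exactly; q must be non-zero."""
    n2 = qnorm2(q)
    return tuple(Fraction(x) / n2 for x in qconj(q))

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
minus_one = (-1, 0, 0, 0)

# The defining relations i^2 = j^2 = k^2 = ijk = -1.
assert qmul(i, i) == qmul(j, j) == qmul(k, k) == qmul(qmul(i, j), k) == minus_one

q = (1, 2, 3, 4)
assert qmul(q, qconj(q)) == (qnorm2(q), 0, 0, 0)   # q * conj(q) = |q|^2
assert qmul(q, qinv(q)) == (1, 0, 0, 0)            # q * q^{-1} = 1
```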

There's a lot packed up into that.

The complex numbers are hiding in there

The quaternions contain the complex numbers. This is easy to see, because it's obvious that if you set \(c=0=d\) then what is left is just the arithmetic of complex numbers.

What's slightly less obvious is that the complex numbers sit in there in more than one way. If you set \(b=0=d\), you get exactly the same thing except that the square root of minus one is now \(j\) (which should please the electrical engineers), and if you set \(b=0=c\) it is now \(k\).

In fact, you can think of the quaternions as the result of building complex numbers out of complex numbers: if \(z\) and \(w\) are both complex numbers, then \(z+wj\) is a quaternion, as long as \(i,j\) and \(k\) satisfy the rules up above.

Vector algebra is hiding in there

The rules for multiplying \(i,j\) and \(k\) have some very evocative consequences.

If I multiply \(ijk=-1\) by \(i\) I get \(i^2jk=-i\), so that \(jk=i\), and similarly \(ij=k\), \(ki=j\).

If I square \(ij=k\) I get \(ijij=-1\), so multiplying on the left by \(i\) and on the right by \(j\) I get \(ji=-ij\), and similarly \(ik=-ki\), \(jk=-kj\).

The relations look a lot like what happens with the usual unit vectors \(\pmb{i,j,k}\) pointing in the \(x,y,z\) directions in three dimensional vector geometry. The square of each unit vector is like the dot product (except with the wrong sign), and the product of two different ones is just like the cross product.

In fact, this is such an important observation that I'll actually think of \(i,j\) and \(k\) as those unit vectors, and regard the purely imaginary quaternions as vectors in three dimensions, with the rules I gave up above for multiplication.

So, what do I get if I have two purely imaginary quaternions (or two three dimensional vectors) and multiply them together using the quaternion rule? If you haven't seen this before, it really is quite amazing. So let's start with \(q_1=a_1 i+b_1 j+c_1 k\) and \(q_2=a_2 i+b_2 j+c_2 k\). Then \[ \begin{split} q_1 q_2 &= (a_1 i+b_1 j+c_1 k)(a_2 i+b_2 j+c_2 k)\\ &= a_1a_2i^2 + b_1b_2 j^2 + c_1c_2 k^2 + a_1b_2 ij +a_1c_2ik + b_1a_2ji + b_1c_2jk+ c_1a_2 ki + c_1 b_2 kj\\ &= -(a_1a_2+b_1b_2+c_1c_2) +(b_1c_2-c_1b_2)i + (c_1a_2-a_1c_2)j+(a_1b_2-b_1a_2)k \end{split} \] and if we think of \(q_1\) and \(q_2\) as vectors, then \[ q_1 q_2 = -q_1 \cdot q_2 + q_1 \times q_2 \]
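The statement that the product of two purely imaginary quaternions is minus the dot product plus the cross product can be checked directly. This is a sketch under my own conventions: quaternions are tuples `(a, b, c, d)`, vectors are 3-tuples, and the helper names are hypothetical.

```python
def qmul(p, q):
    """Product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def dot(u, v):
    """Ordinary dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    """Ordinary cross product of two 3-vectors."""
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

u, v = (1, 2, 3), (4, 5, 6)   # arbitrary integer vectors

# Embed each vector as a purely imaginary quaternion (zero real part).
product = qmul((0, *u), (0, *v))

# Real part is minus the dot product, imaginary part is the cross product.
assert product == (-dot(u, v), *cross(u, v))
```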


I haven't mentioned this explicitly, but we do have to be a bit careful. There is a price to pay for extending the multiplication of two dimensional vectors to that of four dimensional vectors, and it is that the resulting multiplication is not commutative, so we have to pay attention to the order that they come in when multiplying: in general, \(q_1q_2 \neq q_2 q_1\).

So, for example, if we have two quaternions \(q\) and \(r\), and we want to solve the equation \[ qs=r \] for the unknown quaternion \(s\), we have to calculate \(q^{-1}r\), not \(r q^{-1}\) which is (usually) different.
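A small sketch showing that the order really matters when solving \(qs=r\). The tuple representation and the names `qmul` and `qinv` are assumptions made for this example, and exact fractions keep the comparison free of rounding error.

```python
from fractions import Fraction

def qmul(p, q):
    """Product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def qinv(q):
    """Inverse conj(q) / |q|^2 for a non-zero quaternion with integer parts."""
    a, b, c, d = q
    n2 = a*a + b*b + c*c + d*d
    return (Fraction(a, n2), Fraction(-b, n2), Fraction(-c, n2), Fraction(-d, n2))

q = (1, 2, 0, 0)   # 1 + 2i
r = (0, 0, 1, 0)   # j

s = qmul(qinv(q), r)       # multiply by the inverse on the left: solves q s = r
wrong = qmul(r, qinv(q))   # inverse on the wrong side

assert qmul(q, s) == r
assert qmul(q, wrong) != r
```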

But this is OK. It's no worse than the situation with matrices, and we all have to learn to cope with that. In fact, one can represent quaternions as \((4 \times 4)\) matrices with their entries made out of the real and imaginary parts so that the matrix multiplication matches the quaternion multiplication. An interesting challenge is to try to figure out how to do that, and the representation of the complex numbers as \((2 \times 2)\) matrices gives a good starting point.
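For reference, one standard way of writing that starting point (conventions vary, and this is a reminder rather than a solution to the challenge) is \[ a+ib \;\longleftrightarrow\; \begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \qquad \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \begin{pmatrix} c & -d \\ d & c \end{pmatrix} = \begin{pmatrix} ac-bd & -(ad+bc) \\ ad+bc & ac-bd \end{pmatrix} \] so the matrix product reproduces \((a+ib)(c+id)=(ac-bd)+i(ad+bc)\).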

What have the quaternions ever done for us?

OK, so we can invent a quaternion algebra, and it looks as if it has a fair bit of the usual vector algebra coded into it somehow. But what is it good for?

I think a piece of maths is worthwhile if it can be used to solve problems, or if it gives a new insight into something, or if it's just downright pretty. Quaternions meet all three criteria. In fact Hamilton spent a long time writing a multi-volume treatise on the quaternions and their applications. For good or ill, he didn't succeed in persuading the mainstream community of their utility, though they do have their proponents even now, and are heavily used in a few niche areas.

I am going to take a look at just two aspects of the quaternions. The first is an application, involving using them to represent rotations. The second is a bit of pure geometry, involving the sphere in four dimensions.


The unit complex numbers can be used to describe rotations in the plane, so you might expect, or at least hope, that unit quaternions (i.e. quaternions of modulus \(1\)) can be used to describe rotations in space.

They can, and what's more it can be done in a very beautiful way.

You might expect (I certainly did) that it would work like this. Given a point in three dimensional space, with coordinates \((x,y,z)\), I think of this as the purely imaginary quaternion \(X=xi+yj+zk\), then if \(q\) is any quaternion with \(|q|=1\), we see that \(|qX|=|q||X|=|X|\), so this preserves the size of \(X\) and it looks quite promising. But \(qX\) need not be purely imaginary, and it's quite hard to see just which rotation (if any) corresponds to a given \(q\), and whether this gives all rotations. Fortunately, there's a better way.

First, let's think what determines a rotation: we need an axis of rotation, and an angle of rotation. Let's say that the axis of rotation is given by the vector \(n=n_xi+n_yj+n_zk\) where \(|n|=1\), and the angle of rotation is \(\theta\). Then we build the quaternion \[ q(\theta) = \cos\left(\frac{\theta}{2}\right) + n \sin\left(\frac{\theta}{2}\right) \] Now, here comes the sneaky bit.

Take any point \((x,y,z)\) and represent it by the purely imaginary quaternion \(X=xi+yj+zk\). Then \[ qX\overline{q} \] is the result of rotating \(X\) about \(n\) by the angle \(\theta\).

That is a bold claim. You need a reason to believe it. So write \[ X=\alpha n + \beta m \] where \(m\) is orthogonal to \(n\). (You'll notice that I've completely given up distinguishing between vectors and purely imaginary quaternions.) Then a couple of lines of algebra show that \[ \begin{split} qX\overline{q} &= \left(\cos\left(\frac{\theta}{2}\right) + n \sin\left(\frac{\theta}{2}\right)\right) (\alpha n + \beta m) \left(\cos\left(\frac{\theta}{2}\right) - n \sin\left(\frac{\theta}{2}\right)\right)\\ &= \alpha n + \beta\left(m\cos(\theta)+(n \times m)\sin(\theta)\right) \end{split} \] which makes it explicit that the component of \(X\) parallel to \(n\) is unchanged, and the component perpendicular to \(n\) is rotated by the angle \(\theta\) in the plane determined by \(m\) and \(n \times m\), which is the plane perpendicular to \(n\), so \(X\) has been rotated about the axis of rotation \(n\), just as claimed above.

This is unreasonably nice. Not only is there a quaternion corresponding to any rotation, but we have an extremely simple recipe for building it out of the axis and angle of rotation.
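The recipe translates almost line for line into code. The following is a sketch rather than a polished implementation: the function names `qmul` and `rotate` are my own, quaternions are tuples `(a, b, c, d)`, and the axis is assumed to be a unit vector already (nothing here normalizes it).

```python
import math

def qmul(p, q):
    """Product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def rotate(point, axis, theta):
    """Rotate `point` about the unit vector `axis` by `theta` radians,
    using q X conj(q) with q = cos(theta/2) + axis * sin(theta/2)."""
    half = theta / 2
    s = math.sin(half)
    q = (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)
    q_conj = (q[0], -q[1], -q[2], -q[3])
    # Embed the point as a purely imaginary quaternion, then conjugate by q.
    _, x, y, z = qmul(qmul(q, (0.0, *point)), q_conj)
    return (x, y, z)

# A quarter turn about the z axis should carry the x axis to the y axis.
rotated = rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
expected = (0.0, 1.0, 0.0)
assert all(math.isclose(got, want, abs_tol=1e-12)
           for got, want in zip(rotated, expected))
```

The real part of the result is discarded here; by the calculation above it is zero up to rounding error.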

There is, of course, more. Not only is this a pretty way to represent rotations, it is a very useful one. If you want to control a robot arm, or model how a three dimensional object moves under the influence of a force, or calculate what the view looks like from a different position then you need an efficient way to represent rotations. Quaternions give us an efficient way to do this, and for this reason are used in robotics, in dynamical simulations of systems of molecules, and in computer graphics.

On top of all this, you can use this quaternion representation to obtain more insight into the structure of \(SO(3)\), the group of three dimensional rotations. I won't develop that idea here, but just leave you with the observation that if \(q\) is a unit quaternion, then \(-q\) gives the same rotation as \(q\), so there's something interesting going on.
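The observation about \(-q\) takes one line to verify, since real scalars commute with every quaternion and \(\overline{-q}=-\overline{q}\): \[ (-q)X\overline{(-q)} = (-1)^2\, qX\overline{q} = qX\overline{q}. \]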

I've just given a very brief description of how to get rotations out of the quaternions. For much more, from a rather different perspective, @mathforge gives a development of the quaternions from rotations here, including an implementation.

The 3 sphere

I described how the unit complex numbers give us a way of finding a vector tangent to each point in a circle in the two dimensional plane. We can use the unit quaternions to see a nice bit of geometry in four dimensions. Now, the unit quaternions are defined by the squares of their components adding up to \(1\), so they are the points a distance \(1\) from the origin in four dimensional space. This is an object sometimes called a hypersphere, but which I will call the three-sphere, and denote by \(S^3\).

So if we take a point on \(S^3\), and multiply it by \(i\), what do we get? \[ \begin{split} X&=a+bi+cj+dk\\ iX&=-b+ai-dj+ck \end{split} \] so we see that \(iX \cdot X=-ab+ba-cd+dc=0\). In other words, \(iX\) is orthogonal to \(X\). But that means that \(iX\) is a tangent vector to \(S^3\) at \(X\). (This is just the same as in two and three dimensions: the tangent space to a circle or a sphere at a point is given by all the vectors orthogonal to the position vector at that point.)
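Here is the same calculation as a short sketch in code, for one sample point on \(S^3\). The tuple representation and the name `qmul` are my own assumptions, and the sample point is chosen so that every value is exactly representable in floating point.

```python
def qmul(p, q):
    """Product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

# A point on the three-sphere: the squares of the components add up to 1.
X = (0.5, 0.5, 0.5, 0.5)
assert sum(x * x for x in X) == 1.0

# Multiply on the left by i.
iX = qmul((0, 1, 0, 0), X)

# iX = -b + ai - dj + ck, and it is orthogonal to X as a four dimensional vector.
assert iX == (-0.5, 0.5, -0.5, 0.5)
assert sum(p * q for p, q in zip(iX, X)) == 0.0
```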

This means that by this process we can obtain a unit tangent vector at each point on \(S^3\), which clearly varies continuously over the sphere, since small changes in \(X\) give small changes in \(iX\).

We can now extract a rather indirect way of seeing why we can't have a multiplication on three dimensional vectors which has all the nice properties of complex and quaternionic multiplication. If there were, we could use this to build a continuous, nowhere-vanishing tangent vector field on the surface of a sphere in three dimensions: but the hairy ball theorem tells us that this is impossible, so there can't be a multiplication of three dimensional vectors which behaves like this.

You can start from here and start asking questions such as: Which surfaces admit a non-vanishing vector field? How many linearly independent vector fields can there be? What kind of things go wrong when there can't be one? This leads into worlds of differential and algebraic topology, which is maybe a step or several too far for this article. Instead, I'll return to a couple of topics raised up above.

But what about...

Geometric algebra

An alternative approach to multiplying vectors is given by geometric algebra. For three-dimensional vectors, this works in the following way. \[ i^2=j^2=k^2=1, \qquad ij=-ji \text{ etc} \] and \(ij,jk,ki,ijk\) are new objects. With this in place, we can multiply vectors together, at the expense of everything taking place in an eight-dimensional space.

On the one hand, this gives a powerful tool for calculating with vectors, as well demonstrated in Jason Merrill's blog post and it is well-behaved in that every non-zero vector has a multiplicative inverse. On the other hand, it is not completely well-behaved algebraically. If \(\pmb{n}\) is a unit vector, then \((1+\pmb{n})(1-\pmb{n})=1-1=0\), so non-zero quantities can multiply together to give zero.
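Spelling out the intermediate steps: with \(\pmb{n}=n_x i+n_y j+n_z k\), the cross terms in \(\pmb{n}^2\) cancel in pairs because \(ij=-ji\) and so on, which leaves \(\pmb{n}^2=n_x^2+n_y^2+n_z^2=1\), and then \[ (1+\pmb{n})(1-\pmb{n}) = 1-\pmb{n}+\pmb{n}-\pmb{n}^2 = 1-\pmb{n}^2 = 0. \]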

Of course, neither geometric algebra nor quaternion algebra is right or wrong. Each gives a useful tool for working with vectors. I find the quaternions more aesthetically satisfying, but if I had to do a calculation I'd use whichever was the most convenient. (Assuming I could tell in advance, of course.)

Higher dimensions

So we've seen how to think of a pair of real numbers as a complex number; and how to think of a pair of complex numbers as a quaternion, though the result is no longer commutative. Can the trick be repeated? Can we take pairs of quaternions and get a multiplication on eight dimensional vectors?

Yes, we can, and the resulting objects are called octonions. This time the price we have to pay is that the multiplication is not associative. This is quite serious, as it means that we can't use matrices to represent these objects. Nevertheless, they give another interesting geometry which has been investigated both for its mathematical structure and in mathematical physics.
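The doubling trick can be written as a short recursive function. What follows is a sketch under stated assumptions: it uses one common convention for the Cayley-Dickson doubling rule, \((a,b)(c,d)=(ac-\overline{d}b,\; da+b\overline{c})\) with \(\overline{(a,b)}=(\overline{a},-b)\) (other references order the factors differently), elements are stored as flat lists of 1, 2, 4 or 8 integers, and the names `conj`, `mul` and `basis` are my own.

```python
def conj(x):
    """Conjugate of an element stored as a flat list of 1, 2, 4 or 8 reals."""
    if len(x) == 1:
        return list(x)
    h = len(x) // 2
    return conj(x[:h]) + [-t for t in x[h:]]

def mul(x, y):
    """Cayley-Dickson product: (a, b)(c, d) = (ac - conj(d) b, da + b conj(c))."""
    if len(x) == 1:
        return [x[0] * y[0]]
    h = len(x) // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    first = [p - q for p, q in zip(mul(a, c), mul(conj(d), b))]
    second = [p + q for p, q in zip(mul(d, a), mul(b, conj(c)))]
    return first + second

def basis(n, index):
    """The unit vector of length n with a 1 in position `index`."""
    return [1 if t == index else 0 for t in range(n)]

# Quaternions (n = 4): not commutative, but these products do associate.
i, j, k = basis(4, 1), basis(4, 2), basis(4, 3)
assert mul(i, j) == k and mul(j, i) == [-t for t in k]
assert mul(mul(i, j), k) == mul(i, mul(j, k))

# Octonions (n = 8): associativity fails for these three basis elements.
e1, e2, e4 = basis(8, 1), basis(8, 2), basis(8, 4)
assert mul(mul(e1, e2), e4) != mul(e1, mul(e2, e4))
```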

To finish with

I have to admit that hand calculation with quaternions can be a fairly tedious business. But you do get quite a lot of power in return, and I think that we should all have at least a passing acquaintance with them.

There is, of course, a wealth of material available on the internet, and Google can be your friend if you ask it nicely. If you don't already know about it, the Internet Archive is an amazing resource where you can access many out of print texts, including Hamilton's books and many of his research papers. I'll just add one more link, to Pertti Lounesto's Clifford algebras and spinors. This is an exceptionally careful text, which provides a general framework where both quaternionic and geometric algebra live, and also demonstrates many applications to physics.
