Wednesday, 28 June 2017

Finding greatest common divisors and all that.

Greatest Common Divisor

The problem is simple: given two positive integers, \(a\) and \(b\), how do we find \(\gcd(a,b)\), the greatest common divisor of \(a\) and \(b\) (i.e. the largest integer that \(a\) and \(b\) are both multiples of)?

Just about everybody learns how to do this at school. You break each of \(a\) and \(b\) down into a product of prime factors (where a prime number is one with exactly two factors, itself and \(1\)) and then multiply together all the common prime factors to get the biggest number that each is a multiple of. But this actually relies on quite a deep result. The procedure works because The Fundamental Theorem of Arithmetic says that any positive integer can be expressed as a product of primes, and it can only be done in one way (except for the order of the factors).

So, there are two issues here.

The first is in plain sight. It is the question of why you should believe the fundamental theorem of arithmetic. This might not seem like an issue. In fact, after a lot of experience in factorizing integers you may well feel that it is pretty obvious. But it isn't, and part of the intention of this post is to persuade you that it isn't.

The other is hidden, but comes into view when the numbers \(a\) and \(b\) get larger. It is the problem that working out the prime factorization of a number is really quite hard when the number is large.

There is, fortunately, an answer for both of these: Euclid's algorithm gives us a different way to compute the greatest common divisor of two integers, and, as a bonus, we can use it to prove that every integer has a unique prime factorization. This algorithm is to be found in Euclid's Elements, and is probably the oldest non-trivial algorithm in significant current use, being a vital ingredient in the RSA public key cryptosystem.

Euclid's Algorithm

The algorithm relies on a familiar fact from basic arithmetic, which goes by the appallingly inappropriate name of the division algorithm. It isn't an algorithm: an algorithm is a set of step-by-step instructions which tell you how to do something. This just tells you that a certain type of question has a particular type of answer. A couple of sources call it the division theorem because of this, but, alas, the terminology is very well established. I would rather be right than popular, though, so I will call it the division theorem.

But enough whining. What does this result say?

The Division Theorem If \(a\) and \(b\) are positive integers, then there is a unique positive integer \(q\), and a unique non-negative integer \(r \lt b\) such that \(a=qb+r\).

In other words, if you divide \(a\) by \(b\) you get a quotient and remainder, and there's only one right answer. This is something so entrenched in our experience that we take it very much for granted, and that's just what I'm going to do. But it has consequences.

If the remainder, \(r\), is zero, then \(a\) is a multiple of \(b\): we also say that \(b\) divides \(a\).

Back to the original question: I have two positive integers, \(a\) and \(b\), and I want to know the biggest number that divides into both of them. Let's suppose that \(a \gt b\). (If not, I can swap them over). Now, by the division theorem, I know that I can write \(a\) as \(qb+r\), where \(0\le r \lt b\). What's more, if \(d\) is any number that divides \(a\) and \(b\), since \(a-qb=r\), \(d\) divides \(r\). And since \(a=qb+r\), if \(d\) divides into \(b\) and \(r\), it also divides \(a\). That means that \(b\) and \(r\) have exactly the same common divisors as \(a\) and \(b\).

But something really useful has happened. I've replaced the problem of finding the greatest common divisor of \(a\) and \(b\) by finding the greatest common divisor of \(b\) and \(r\), and \(r\) is smaller than \(b\), which is smaller than \(a\). I've made the problem easier.

There are two possibilities: if \(r=0\), then \(b\) divides into \(a\), so it is the greatest common divisor of \(a\) and \(b\). If not, I can repeat the trick. Eventually the remainder will be \(0\), and I will have the greatest common divisor of my original \(a\) and \(b\).

This procedure is the essence of Euclid's algorithm. Let's see how it works with an example.

What is \(\gcd(60,25)\)? We calculate as follows: \[ \begin{split} 60 &= 2 \times 25 + 10\\ 25 &= 2 \times 10 + 5\\ 10 &= 2 \times 5 \end{split} \] This tells us that \(\gcd(60,25)=5\). But we get something not at all obvious from it. We can rewind the calculation like this: \[ \begin{split} 5 &= 25 - 2 \times 10\\ &= 25-2 \times (60-2 \times 25)\\ &= 5 \times 25 - 2 \times 60 \end{split} \] This tells us that we can express \(5\) as a combination of \(60\) and \(25\). But there's nothing special about these numbers: I could have rewound any appliation of Euclid's algorithm in just the same way.

So, given any \(a\) and \(b\), we can use Euclid's algorithm to find \(\gcd(a,b)\), and also to find integers \(m,n\) with the property that \(\gcd(a,b)=ma+nb\). This wasn't at all obvious how to do from the prime factorization approach. In fact, with the prime factorization approach it would never have occurred to me that this could even be done.

Thinking about this approach to finding the gcd of two numbers, we can see that \(\gcd(a,b)\) isn't just the biggest number that divides into both \(a\) and \(b\): it is actually a multiple of any other divisor. The reasoning is just what was used above: any divisor of \(a\) and \(b\) is also a divisor of \(r\), the remainder. Repeating this, we eventually find that it must be a divisor of the last non-zero remainder, i.e. it must be a divisor of \(\gcd(a,b)\).

You might wonder who cares? I think there are two immediate reasons to care, namely that this addresses both of the issues I raised earlier. The first reason is that you don't have to take it on trust that prime factorization works as advertised, since you can see just why this works. The second is more practical. It is that for large numbers, this algorithm is very fast: enormously fast compared to the best known algorithms for factorizing large numbers, which is important. It is the fact that factorizing large integers is (as far as we know) expensive while finding the greatest common divisor is cheap that makes RSA public key cryptography work.

There are other, less immediate reasons too. Using this, we can actually prove that prime factorization works as advertised. But before getting there, we need to see something useful that this tells us about prime numbers.

Prime numbers

Suppose that \(p\) is a prime number, and \(a\) is any number which is not a multiple of \(p\). Then we must have \(\gcd(a,p)=1\), which tells us that there are integers \(m,n\) such that \(ma+np=1\). Now comes the clever bit.

Let \(p\) be a prime number, and suppose that \(a\) and \(b\) are integers such that \(ab\) is a multiple of \(p\). Then either \(a\) or \(b\) must be a multiple of \(p\).

"But that's obvious!" you may be tempted to object. But if you think it's obvious, think about why: it's almost certainly because you're so accustomed to how prime factorization works. Unfortunately that's exactly the thing that I want to prove.

So, let's suppose that \(ab\) is a multiple of \(p\). If \(a\) is a multiple of \(p\), we're done. But what if \(a\) is not a multiple of \(p\)? Well, in that case \(\gcd(a,p)=1\), so there are integers \(m,n\) such that \(ma+np=1\). Now multiply this equation by \(b\). We get \[ b=mab+npb. \] But \(ab\) is a multiple of \(p\) and \(npb\) is a multiple of \(p\), so their sum - which is \(b\) - must also be a multiple of \(p\). That's what we needed.

Now we have everything we need to prove the

Fundamental Theorem of Arithmetic

At last, the proof

We can now consider an integer, \(n\). There are two possibilities: either \(n\) is a prime, or \(n\) is the product of two smaller factors. Then we think about each of them in turn. Eventually, we must reach a point where each of our factors is a prime.

But this is only part of the problem, and the easy part at that. How do we know that there is only one way to express \(n\) as a product of primes?

This is where we use that last fact. Suppose we have two prime factorizations of \(n\), say \[ n=p_1p_2 \ldots p_k = q_1q_2 \ldots q_l \] Then \(p_1\) is a factor of \(q_1q_2 \ldots q_l\). If \(p_1\) is not a factor of \(q_1\) (in which case \(p_1=q_1\)), then \(p_1\) is a factor of \(q_2\ldots q_l\), and by repeating this argument we find that \(p_1\) must be one of the \(q_i\). Then divide \(n\) by \(p_1\) and repeat the argument. Eventually we find that each of the prime factors in either factorization must also be a factor in the other factorization, so the two are actually the same (except possibly in a different order).

What is a prime number again?

I rushed past this idea up at the top, with the brief statement that a prime number is one with exactly two factors, itself and \(1\). This isn't quite satisfactory if we allow negative numbers, so let's be a little more general than that, and try to give a definition which will work in more generality.

We say that a number is a unit if it has a multiplicative inverse, and then that a factorization is proper if no factor is a unit. Then a prime is a number with no proper factorization. So this works when we allow negative numbers, since although we can write \(3=1\times 3 = -1 \times -3\), in each case one of the factors is a unit.

But this property wasn't really what we used to prove the fundamental theorem of arithmetic. We used this to show that any integer has a prime factorization, but not to show that it was unique. For that part of the argument, the important fact was that if a product is a multiple of a prime, then at least one of the factors in the product must be a multiple of the prime.

In fact there two ways of characterizing prime numbers.

One characterization is that \(p\) is prime if it has no proper factorization.

The other is that \(p\) is prime if, whenever \(ab\) is a multiple of \(p\) then at least one of \(a\) and \(b\) must be a multiple of \(p\). We saw above that every prime has this property: we just have to make sure that any number with this property is prime. But if \(n\) has a proper factorization into \(ab\) then \(n\) divides into \(ab\), but since \(n\) is larger than both \(a\) and \(b\) neither is a multiple of \(n\), so no number with a proper factorization can satisfy this property.

But it doesn't have to be like that.

Another number system

The integers aren't the only number system we could work with. For example, we might consider the Gaussian integers \(\mathbb{Z}[i]\), i.e. the complex numbers whose real and imaginar parts are both integers. More interestingly, we could, and we will, do something rather more exotic. We consider numbers of the form \(m+n\sqrt{-5}\), where \(m\) and \(n\) are both integers, and call this collection \(\mathbb{Z}[\sqrt{-5}]\). Note that as for the integers, the units here are just \(\pm 1\).

We now get something happening which cannot happen with the integers.

First, notice that \(|m+n\sqrt{-5}| = \sqrt{m^2+5n^2}\), and since these objects are complex numbers, if \(M,N \in \mathbb{Z}[\sqrt{-5}]\), \(|MN|=|M||N|\). Then in this number system, just as in \(\mathbb{Z}\), neither \(2\) nor \(3\) has a proper factorization; and neither do \(1 \pm \sqrt{-5}\). You can check this in each by writing down all the numbers with modulus less than the target value, and checking that you can't get the target by multiplying any collection of them together (unless one is a unit). But \[ 2 \times 3 = (1+\sqrt{-5})(1-\sqrt{-5}) = 6. \] So in this number system we see that \(1+\sqrt{-5}\) is a factor of \(2 \times 3\) but it is not a factor or either \(2\) or \(3\).

So in this case we do not get a unique factorization. It can't really be obvious that there is only one way to factorize a number into a product of terms which can't be factorized any further, because there are number systems where it doesn't happen!

Prime versus irreducible

In this more general context, it is necessary to split up our two notions of prime, because they no longer coincide. The standard way to do this is to call a number with no proper factorization an irreducible number, and to define a prime to be a number which has the property that if it is a factor of \(ab\) then it must be a factor of at least one of \(a\) or \(b\). (This is why on your first encounter with abstract algebra, the definition of prime can seem to have nothing to do with the familiar prime numbers.)

Then any prime number must be irreducible, but a number can be irreducible without being prime. For each number to have a unique prime factorization, it must also be the case that every irreducible number is prime.

Moving on

This may have already been more than enough for you. But if not, there are lots of extenstions to play with here.

We could play the game again with the Gaussian integers, and it turns out that again there is a unique factorization into irreducibles. And we could consider variations, such as numbers of the form \(m+n\sqrt{-k}\) where \(k\) is a positive integer. How does the choice of \(k\) affect the behaviour? We already know that when \(k=5\), unique factorization fails. But is this typical, or does it happen for other vaues? Is there a property of \(k\) that we can calculate that tells us?

We could try with polynomials with real coefficients: it turns out that division of polynomials is similar enough to division of integers that we have a division theorem and a Euclidean algorithm, that prime and irreducible coincide, and that any polynomial can be expressed uniquely as a product of primes (which here means linear factors or quadratics with no roots). We could ask how things change if we take different types of coefficients, say complex, or rational, or integer.

These two alone give a large toy box to play with and explore these ideas, but they're only a start. Explore, investigate, make up your own variations. If all else fails, look in a book on abstract algebra or ask the internet.

Calculating gcd redux

So, after all that, how should you actually calculate the gcd of two integers?

The answer is, as it almost always is, "it depends". If the numbers are small enough to be factorized easily, using the prime factorization is probably the quicker of the two. If they are large enough that factorization is a pain, then Euclid's algorithm will do the job with less pain.

And I think it's worth calculating a few both ways just to see how the same result is arrived at by such very different routes.

Tuesday, 20 June 2017

When i grows up...

...\(i\) wants to be a quaternion.

We can think of the complex numbers as two dimensional vectors, with a rule for multiplication. This multiplication rule is actually very well behaved, and for some time mathematicians tried and failed to find a similarly well behaved multiplication rule for three dimensional vectors. In fact the failure was not due to any lack of ability: it's impossible to find such a well-behaved multiplication rule for these vectors. Then William Rowan Hamilton had a flash of inspiration which gave him the insight that although a multiplication rule couldn't be found for three dimensional vectors, it could be done for four dimensional vectors. He was so excited by the realization that he carved the basic algebraic identities into the stonework of the bridge he was crossing at that time. The structure resulting from the multiplication rule he discovered is the quaternions.

Hamilton then devoted an immense amount of time and effort developing the mathematics of these quaternions and their physical applications. Unfortunately, he was far less effective a communicator of mathematics then he was an inventor (or discoverer) and quaternions were not adopted by the mathematical community.

So what is so nice about the complex number multiplication, what are the quaternions, and how well do the nice properties of the complex numbers generalize to them?

Geometry of complex multiplication

Multiplication of complex numbers is very rich geometrically. Probably the simplest observation we can make has to do with multiplication by \(i\). If \(z=a+ib\), so that \(z\) corresponds to the point \((a,b)\) in the plane, then \(iz=-b+ia\), which corresponds to the point \((-b,a)\): so multiplication by \(i\) has rotated this point by a right angle about the origin.

This is pretty, and suggests that a closer look is warranted, so let's take one.

The complex conjugate of \(z\) is \(\overline{z}=a-ib\), and the modulus of \(z\) is \[ |z| = \sqrt{z\overline{z}} = \sqrt{a^2+b^2}. \] Thinking of \(z\) as the point \((a,b)\), \(|z|\) is the distance to the origin, so it tells us the size of \(z\).

We also see that \(1/z=\overline{z}/|z|^2\) so that every complex number apart from zero has a multiplicative inverse. An excellent start. But complex multiplication has a more remarkable property still.

If \(z=a+ib\) and \(w=c+id\), then \[ \begin{split} |zw|^2&=|(ac-bd)+i(ad+bc)|^2\\ &=(ac-bd)^2+(ad+bc)^2\\ &=(ac)^2-2abcd+(bd)^2+(ad)^2+2abcd+(bc)^2\\ &=a^2c^2+b^2d^2+a^2d^2+b^2c^2\\ &=(a^2+b^2)(c^2+d^2)\\ &=|z|^2|w|^2 \end{split} \] so that \(|zw|=|z||w|\), so the size of the product is the product of the sizes.

A special collection of complex numbers are the ones satisfying \(|z|=1\), so they form the unit circle in the complex plane. The point on the circle with polar coordinates \((1,\theta)\) has Cartesian coordinates \((\cos(\theta),\sin(\theta)\)), and expressed as a complex number is \(\cos(\theta)+i\sin(\theta)\).

We can do a little more with these special complex numbers.

If \(z\) and \(w\) both have modulus \(1\), then so do \(1/z=\overline{z}\) and \(zw\), so the complex numbers of unit modulus are closed under multiplication and multiplicative inverse: when a collection of quantities has this properties we can it a group, and this particular group goes by the name \(U(1)\).

We can put this together with in a very pretty way.

If \(z,w \in U(1)\) with \(z=\cos(\theta)+i\sin(\theta)\) and \(w= \cos(\phi)+i\sin(\phi)\), the \(zw=\cos(\theta+\phi)+i\sin(\theta+\phi)\). Depending on how you define things, this either follows from or gives a proof of the standard trigonometry addition formulae.

There's also a useful special case, already mentioned above. If \(|z|=1\), so \(z=\cos(\theta)+i\sin(\theta)\), then \(iz=-\sin(\theta)+i\cos(\theta)\), which associates with each point on the unit circle a unit vector pointing tangent to the circle in the anti-clockwise direction.

Three dimensional vectors

This is all so neat and elegant that a great deal of effort was invested into trying to do the same thing with three component vectors. However, it couldn't be done. There's no way of multiplying three dimensional vectors to give a three dimensional vector with all these nice properties.

All is not actually lost: it is possible to multiply three vectors in a useful way, but you have to be prepared to extend what you mean by multiplication. The result is geometric algebra, and it has many attractions. I'll come back to this later on, but for the moment let's just accept that you just can't define a nice multiplication on three dimensional vectors, and go on to see how you can do it with four dimensional vectors.

In fact we will see that just about everything described above for the complex numbers has an analogue in the quaternions. (Though sometimes in a surprising way.)


So, what is a quaternion?

One way of describing them is that a quaternion, \(q\), is a sum of the form \[ q=a+bi+cj+dk \] where \(i,j\) and \(k\) are a new kind of number, like the usual square root of minus one, but related in a particular way. We have \[ i^2=j^2=k^2 = ijk = -1 \] and then we carry out arithmetic by multiplying out the expressions and using these relationships. It's like complex arithmetic but with more component parts (and a more complicated set of rules for how these different square roots of minus one interact).

Just as with the complex numbers, we can think of these objects as vectors: but now they have four components, a real one and three imaginary ones.

We can also define a quaternionic conjugate, \[ \overline{q} = a-bi-cj-dk \] and then we quickly get \[ q\overline{q}=a^2+b^2+c^2+d^2=|q|^2 \] which gives the modulus, or size, of the quaterion, and \[ q^{-1} = \frac{\overline{q}}{|q|^2} \] is the inverse of \(q\).

There's a lot packed up into that.

The complex numbers are hiding in there

The quaternions contain the complex numbers. This is easy to see, because it's obvious that if you set \(c=0=d\) then what is left is just the arithmetic of complex numbers.

What's slightly less obvious is that the complex numbers sit in there in more than one way. If you set \(b=0=d\), you get exactly the same thing except that the square root of minus one is now \(j\) (which should please the electrical engineers), and if you set \(b=0=c\) it is now \(k\).

In fact, you can think of the quaternions as the result of building complex numbers out of complex numbers: if \(z\) and \(w\) are both complex numbers, than \(z+wj\) is a quaternion, as long as \(i,j\) and \(k\) satisfy the rules up above.

Vector algebra is hiding in there

The rules for multiplying \(i,j\) and \(k\) have some very evocative consequences.

If I multiply \(ijk=-1\) by \(i\) I get \(i^2jk=-i\), so that \(jk=i\), and similarly \(ij=k\), \(ki=j\).

If I square \(ij=k\) I get \(ijij=-1\), so multiplying on the left by \(i\) and on the right by \(j\) I get \(ji=-ij\), and similarly \(ik=-ki\), \(jk=-kj\).

The relations look alot like what happens with the usual unit vectors \(\pmb{i,j,k}\) pointing in the \(x,y,z\) directions in three dimensional vector geometry. The square of each unit vector is like the dot product (except with the wrong sign), and the product of two different ones is just like the cross product.

In fact, this is such an important observation that I'll actually think of \(i,j\) and \(k\) as those unit vectors, and regard the purely imaginary quaterions as vectors in three dimensions, with the rules I gave up above for multiplication.

So, what do I get if I have two purely imaginary quaternions (or two three dimensional vectors) and multiply them together using the quaternion rule? If you haven't seen this before, it really is quite amazing. So let's start with \(q_1=a_1 i+b_1 j+c_1 k\) and \(q_2=a_2 i+b_2 j+c_2 k\). Then \[ \begin{split} q_1 q_2 &= (a_1 i+b_1 j+c_1 k)(a_2 i+b_2 j+c_2 k)\\ &= a_1a_2i^2 + b_1b_2 j^2 + c_1c_2 k^2 + a_1b_2 ij +a_1c_2ik + b_1a_2ji + b_1c_2jk+ c_1a_2 ki + c_1 b_2 kj\\ &= -(a_1b_1+a_2b_2+a_3c_3) +(b_1c_2-c_cb_2)i + (c_1a_2-a_1c_2)j+(a_1b_2-b_1a_2)k \end{split} \] and if we think of \(q_1\) and \(q_2\) as vectors, then \[ q_1 q_2 = -q_1.q_2 + q_1 \times q_2 \]


I haven't mentioned this explicitly, but we do have to be a bit careful. There is a price to pay for extending the multiplication of two dimensional vectors to that of four dimensional vectors, and it is that the resulting multiplication is not commutative, so we have to pay attention to the order that they come in when multiplying: in general, \(q_1q_2 \neq q_2 q_1\).

So, for example, if we have two quaternions \(q\) and \(r\), and we want to solve the equation \[ qs=r \] for the unknown quaternion \(s\), we have to calculate \(q^{-1}r\), not \(r q^{-1}\) which is (usually) different.

But this is OK. It's no worse that the situation with matrices, and we all have to learn to cope with that. In fact, one can represent quaternions as \((4 \times 4)\) matrices with their entries made out of the real and imaginary parts so that the matrix multiplication matches the quaternion multiplication. An interesting challenge is to try to figure out how to do that, and the representation of the complex numbers as \((2 \times 2)\) matrices gives a good starting point.

What have the quaternions ever done for us?

OK, so we can invent a quaternion algebra, and it looks as if it has a fair bit of the usual vector algebra coded into it somehow. But what is it good for?

I think a piece of maths is worthwhile if it can be used to solve problems, or if it gives a new insight into something, or if it's just downright pretty. Quaternions meet all three criteria. In fact Hamilton spent a long time writing a multi-volume treatise on the quaternions and their applications. For good or ill, he didn't succeed in persuading the mainstream community of their utility, though they do have their proponents even now, and are heavily used in a few niche areas.

I am going to take a look at just two aspects of the quaternions. The first is an application, involving using them to represent rotations. The second is a bit of pure geometry, involving the sphere in four dimensions.


The unit complex numbers can be used to describe rotations in the plane, so you might expect, or at least hope, that unit quaternions (i.e quaternions of modulus \(1\)) can be used to describe rotations in space.

They can, and what's more it can be done in a very beautiful way.

You might expect (I certainly did) that it would work like this. Given a point in three dimensional space, with coordinates \((x,y,z)\), I think of this as the purely imaginary quaternion \(X=xi+yj+zk\), then if \(q\) is any quaternion with \(|q|=1\), we see that \(|qX|=|q||X|=|X|\), so this preserves the size of \(X\) and it looks quite promising. But it's quite hard to see just which rotation (if any) corresponds to a given \(q\),and whether this gives all rotations. Fortunately, there's a better way.

First, let's think what determines a rotation: we need an axis of rotation, and an angle of rotation. Let's say that the axis of rotation is given by the vector \(n=n_xi+n_yj+n_zk\) where \(|n|=1\), and the angle of rotation is \(\theta\). Then we build the quaternion \[ q(\theta) = \cos\left(\frac{\theta}{2}\right) + n \sin\left(\frac{\theta}{2}\right) \] Now, here comes the sneaky bit.

Take any point \((x,y,z)\) and represent it by the purely imaginary quaternion \(X=xi+yj+zk\). Then \[ qX\overline{q} \] is the result of rotating \(X\) about \(n\) by the angle \(\theta\).

That is a bold claim. You need a reason to believe it. So write \[ X=\alpha n + \beta m \] where \(m\) is orthogonal to \(n\). (You'll notice that I've completely given up distinguishing between vectors and purely imaginary quaternions.) Then a couple of lines of algebra shows that \[ \begin{split} qX\overline{q} &= \left(\cos\left(\frac{\theta}{2}\right) + n \sin\left(\frac{\theta}{2}\right)\right) (\alpha n + \beta m) \left(\cos\left(\frac{\theta}{2}\right) - n \sin\left(\frac{\theta}{2}\right)\right)\\ &= \alpha n + \beta(m\cos(\theta)+(n \times m)\sin(\theta) \end{split} \] which makes it explicit that the component of \(X\) parallel to \(n\) is unchanged, and the component perpendicular to \(n\) is rotated by the angle \(\theta\) in the plane determined by \(m\) and \(n \times m\), which is the plane perpendicular to \(n\), so \(X\) has been rotated about the axis of rotation \(n\), just as claimed above.

This is unreasonably nice. Not only is there a quaternion corresponding to any rotation, but we have an extremely simple recipe for building it out of the axis and angle of rotation.

There is, of course, more. Not only is this a pretty way to represent rotations, it is a very useful one. If you want to control a robot arm, or model how a three dimensional object moves under the influence of a force, or calculate what the view looks like from a different position then you need an efficient way to represent rotations. Quaternions give us an efficient way to do this, and for this reason are used in robotics, dynamical simulations of systems of molecules and computer graphics.

On top of all this, you can use this quaternion representation to obtain more insight into the structure of \(SO(3)\), the group of three dimensional rotations. I won't develop that idea here, but just leave you with the observation that if \(q\) is a unit quaternion, then \(-q\) gives the same rotation as \(q\), so there's something interesting going on.

I've just given a very brief description of how to get rotations out of the quaternions. For much more, from a rather different perspective, @mathforge gives a development of the quaternions from rotations here, including an implementation.

The 3 sphere

I described how the unit complex numbers give us a way of finding a vector tangent to each point in a circle in the two dimensional plane. We can use the unit quaternions to see a nice bit of geometry in four dimensions. Now, the unit quaternions are defined by the squares of their components adding up to \(1\), so they are the points a distance \(1\) from the origin in four dimensional space. This is an object sometimes called a hypersphere, but which I will call the three-sphere, and denote by \(S^3\).

So if we take a point on \(S^3\), and multiply it by \(i\), what do we get? \[ \begin{split} X&=a+bi+cj+dk\\ iX&=-b+ai-dj+ck \end{split} \] so we see that \(iX.X=-ab+ba-cd+dc=0\). In other words, \(iX\) is orthogonal to \(X\). But that means that \(iX\) is a tangent vector to \(S^3\) at \(X\). (This is just the same as in two and three dimensions: the tangent space to a circle or a sphere at a point is given by all the vectors orthogonal to the position vector at that point.)

This means that by this process we can obtain a unit tangent vector at each point on \(S^3\), which clearly varies continuously over the sphere, since small changes in \(X\) give small changes in \(iX\).

We can now extract a rather indirect way of seeing why we can't have a multiplication on three dimensional vectors which has all the nice properties of complex and quaternionic multiplication. If there were, we could use this to build a vector field on the surface of a sphere in three dimensions: but the hairy ball theorem tells us that this is impossible, so there can't be a multiplication of three vectors which behaves like this.

You can start from here and start asking questions such as: Which surfaces admit a non-vanishing vector field? How many linearly independent vector fields can there be? What kind of things go wrong when there can't be one? This leads into worlds of differential and algebraic topology, which is maybe a step or several too far for this article. Instead, I'll return to a couple of topics raised up above.

But what about...

Geometric algebra

An alternative approach to multiplying vectors is given by geometric algebra. For three-dimensional vectors, this works in the following way. \[ i^2=j^2=k^2=1, \qquad ij=-ji \text{ etc} \] and \(ij,jk,ki,ijk\) are new objects. With this in place, we can multiply vectors together, at the expense of everything taking place in an eight-dimensional space.

On the one hand, this gives a powerful tool for calculating with vectors, as well demonstrated in Jason Merrill's blog post and it is well-behaved in that every vector has a multiplicative inverse. On the other hand, it is not completely well-behaved algebraically. If \(\pmb{n}\) is a unit vector, than \((1+\pmb{n})(1-\pmb{n})=1-1=0\), so non-zero quantities can multiply together to give zero.

Of course, neither geometric algebra nor quaternion algebra is right or wrong. Each gives a useful tool for working with vectors. I find the quaternions more aesthetically satisfying, but if I had to do a calculation I'd use whichever was the most convenient. (Assuming I could tell in advance, of course.)

Higher dimensions

So we've seen how to think of a pair of real numbers as a complex number; and how to think of a pair of complex numbers as a quaternion, though the result is no longer commutative. Can the trick be repeated? Can we take pairs of quaternions and get a multiplication on eight dimensional vectors?

Yes, we can, and the resulting objects are called octonions. This time the price we have to pay is that the multiplication is not associative. This is quite serious, as it means that we can't use matrices to represent these objects. Nevertheless, they give another interesting geometry which has been investigated both for its mathematical structure and in mathematical physics.

To finish with

I have to admit that hand calculation with quaternions can be a fairly tedious business. But you do get quite a lot of power in return, and I think they that we should all have at least a passing acquaintance with them.

There is, of course, a wealth of material available on the internet, and Google can be your friend if you ask it nicely. If you don't already know about it, the Internet Archive is an amazing resource where you can access many out of print texts, including Hamilton's books and many of his research papers. I'll just add one more link, to Pertti Lounesto's Clifford algebras and spinors. This is an exceptionally careful text, which provides a general framework where both quaternionic and geometric algebra live, and also demonstrates many applications to physics.

Tuesday, 13 June 2017

What is this thing called i?

Where did complex numbers come from?

You might think that people invented \(i\) just so that they had a solution to the quadratic equation \(x^2+1=0\). But no: it made perfect sense for a long time just to say that this quadratic (and any other quadratic with a negative discriminant) just doesn't have a solution. There was a much more compelling reason for pretending that there was a number whose square is \(-1\) than just making up solutions to quadratics that don't have them.

A few centuries ago, an algorithm was found to solve cubic equations. But something weird happened: on the way to finding a root of a cubic, sometimes you had to pretend that \(-1\) had a square root, and work with that for a while. It was all right, though, because this non-existent number always cancelled out, leaving behind a root that made sense. The fact that you can find genuine solutions to problems by using these funny objects as book-keeping devices suggested that maybe we should take the book-keeping devices seriously as a new kind of number. And once they were taken seriously, they turned out to be a door into a mathematical paradise of wonderful (and useful) structure.

It is a sad accident of history that this new kind of number, the square root of a negative quantity, has become known as an imaginary number. That gives the impression that they are somehow less real, somehow inferior to the ones that we call real numbers. In fact, they're no more imaginary than negative numbers, non-integer rational numbers, irrational algebraic numbers, or transcendental numbers. All of them are convenient extensions of the natural numbers with interesting and useful properties, and all of them are just as imaginary as the 'imaginary' numbers.

Of course, the imaginary numbers by themselves aren't very interesting or useful. They naturally cropped up in combination with the real numbers in objects of the form \(a+ib\) where \(a\) and \(b\) are real numbers and \(i^2=-1\), which we call complex numbers. \(a\) is called the real part of \(z\) and \(b\) is called the imaginary part. (Yes, the imaginary part is a real number. Deal with it.) If you forget for the moment that there is no such thing as \(i\), you can do algebra with them just by replacing \(i^2\) with \(-1\) whenever it occurs, and that is what was done to solve cubic equations.

This suggests that it's worth taking these complex numbers seriously as objects in their own right.

But what are they?

The simplest answer is basically what I just wrote above. Think of complex numbers as polynomials in \(i\) when you do algebra, and replace \(i^2\) by \(-1\) whenever it occurs. Then you can add and multiply complex numbers without any more difficulty than you have in dealing with polynomials.

If you have two complex numbers \(z=a+ib\) and \(w=c+id\), then \[ \begin{split} z+w&=(a+ib)+(c+id)\\ &=a+c+ib+id\\ &=(a+c)+i(b+d) \end{split} \] and \[ \begin{split} zw&=(a+ib)(c+id)\\ &=ac+ibc+aid+ibid\\ &= ac+i^2bd+bci+adi=(ac-bd)+(bc+ad)i \end{split} \]

And now the first bit of magic happens. Given \(z=a+ib\) we define a new thing called the complex conjugate of \(z\), which is \(\overline{z}=a-ib\). Then multiplying out the brackets gives \[ z\overline{z} = a^2+b^2 \] which is never negative (since \(a\) and \(b\) are both real numbers) and is only zero when \(z=0\). This quantity is so useful it gets a name: \(z\overline{z}=|z|^2\), and we call \(|z|\) the modulus of \(z\). Then we see that \[ z \times \frac{\overline{z}}{|z|^2} = 1 \] which means that we can divide one complex number by another: \[ \frac{w}{z} = w \times \frac{\overline{z}}{|z|^2} \] And, just as with real numbers, this works whenever \(z \neq 0\). Because of the way we've done this, the complex numbers have exactly the same algebraic properties as the real numbers, so all the calculations we could do before, we can still do: but now we can do more.

But what are they really?

So far, this is just a formal game. We pretend that there is a number whose square is \(-1\), and find that we can treat it just like an ordinary number and still do algebra. Amazingly, all seems to work. But it doesn't really tell us what this thing, this complex number, actually is.

One thing we can do is notice that a complex number is built out of two real numbers. We could think of those numbers as the coordinates of a point in the \((x,y)\)-plane, or as the components of the position vector of that point.

Then if we think of \(z\) as \((a,b)\) and \(w\) as \((c,d)\) then we have what we always had, a way of adding vectors: \[ z+w=(a+b,c+d) \] but on top of that, we have a way of multiplying vectors, \[ zw=(ac-bd,ad+bc) \] and the multiplication is algebraically well-behaved.

Looking at it in more detail, we see some more magic.

\(z=(a,b)\) gives us a picture of \(z\) in terms of Cartesian coordinates. But we could just as well use polar coordinates \((r,\theta)\) to represent this point, where \(r\) is the distance from \(z\) to the origin, and \(\theta\) is the angle between the line joining the origin to \(z\) and the \(x\)-axis. \(r\) is just \(|z|\), the modulus of \(z\), and \(\theta\) is called the argument of \(z\).

If \(z\) has modulus \(r\) and argument \(\theta\), while \(w\) has modulus \(\rho\) and argument \(\phi\), it turns out that \(zw\) has modulus \(r\rho\) and argument \(\theta+\phi\).

The multiplication isn't just algebraically well-behaved, it has a nice geometric interpretation.

So we can think of complex numbers as vectors in the plane with a well-behaved multiplication on top of the addition which we can always do for vectors.

But what are they really?

Once we start thinking about vectors, we notice that the real and imaginary parts of \(zw\) look like inner products. With a bit of fiddling about, we finally notice that we can represent \(z\) by a \(2\times 2\) matrix, \[ z=\left[ \begin{array}{cc}a&b\\-b&a \end{array} \right] \] and the matrix addition and multiplication correspond to adding and multplying complex numbers: \[ z+w=\left[ \begin{array}{cc}a&b\\-b&a \end{array} \right] +\left[ \begin{array}{cc}c&d\\-d&c \end{array} \right] =\left[ \begin{array}{cc}a+c&b+d\\-(b+d)&a+c \end{array} \right] \] and \[ zw=\left[ \begin{array}{cc}a&b\\-b&a \end{array} \right] \left[ \begin{array}{cc}c&d\\-d&c \end{array} \right] =\left[ \begin{array}{cc}ac-bd&ac+bd\\-(ac+bd)&ac-bd \end{array} \right] \] Just to check: \[ i^2 = \left[ \begin{array}{cc}0&1\\-1&0 \end{array} \right]^2 = -\left[ \begin{array}{cc}1&0\\0&1 \end{array} \right] = -1 \] So complex numbers are just a particular kind of \(2 \times 2\) matrices, made entirely out of real numbers.

This means that we can get all the algebraic properties of complex numbers without inventing any new, imaginary, numbers, as long as we're willing to think of the real number \(x\) as the matrix \[ \left[ \begin{array}{cc}x&0\\0&x \end{array} \right]. \]

You may feel that the introduction of matrices is a high price to pay to get rid of non-existent numbers. There is another, more algebraic approach.

But what are they, really?

We can also build complex numbers in a familiar kind of way: it's very similar to congruence arithmetic, sometimes called clock arithmetic. All this means is that we choose an integer \(n \gt 1\), and whenever we do sums, we only keep the remainder when we divide by \(n\). A useful way to think of this is to think of each number as a multiple of \(n\) plus a remainder, then treat \(n\) as \(0\). We call this the arithmetic of \(\mathbb{Z}_n\).

For example, if \(n=3\) the possible remainders are \(0,1,2\),and we denote this set by \(\mathbb{Z}_3\). Then in \(\mathbb{Z}_3\), we have \[1+2=3 = 1 \times 3+0 = 0, \] \[ 2+2=4=1\times3+1= 1, \] and \[ 2 \times 2=4 = 1 \times 3 +1 =1. \]

Now we run very far and fast with this ball.

We start with the set of all polynomials in \(X\) with real coefficients, but we only ever keep the remainder when we divide by \(1+X^2\).

This is a little more abstract looking, but given any polynomial \(P(X)\), we can write it as a multiple of \(X^2+1\) and a remainder. Keeping only the remainder has the same effect as treating \(X^2+1\) as \(0\); in other words, as treating \(X^2\) as \(-1\).

Actually, this is almost back to the beginning, where I said you can just think of a complex number as a polynomial in \(i\) where \(i^2=-1\); the only difference is that now we don't have to pretend that anything squared is \(-1\), we can just work with remainders like in congruence arithmetic.

Again, we haven't introduced any new objects, any imaginary numbers with negative square roots. We've taken mathematical objects we're already familiar with, and seen how to get out of them something which behaves in exactly the same way as these so-called complex numbers.

They are all of the above

There isn't really an answer to the question what are they, really?. They're all of the above. Mathematically, they are all exactly the same, in the sense that you can think of each of them as just an alternative way to write the others. But none of those representations is the 'right' one. They all give exactly the same arithmetic of complex numbers.

So we don't have to worry about 'making up imaginary numbers', and worrying that what we do doesn't make sense because they aren't 'real'. They're just as 'real' (in that they make mathematical sense) as a number whose square is \(2\). (There was some resistance to accepting \(\sqrt{2}\) as a number initially.)

And we also get some accidental benefits. We have different ways of thinking about complex numbers, which means that if we want to do something involving them we can use whichever way of thinking about them is most convenient.

If, that is, we want to do something involving them. Since we aren't sixteenth century mathematicians trying to persuade a nobleman to be our patron by being better at solving cubics than the competition, there remains the question of why we should care.

So what was the point of all that, then?

This is where it actually gets more interesting. The different representations make it natural to ask different questions, which means that we have more avenues for exploration than just having one.


Complex numbers were introduced because they cropped up naturally in the solution of cubic equations. What might we need to introduce to solve higher order polynomial equations? What if we let the coefficients of our polynomials be complex numbers themselves?

The astonishing answer is: nothing. Once you've allowed yourself complex numbers it turns out that you don't need anything else for higher order polynomials, or even polynomials with complex coefficients. The fundamental theorem of algebra tells us that any polynomial of degree \(n\) with complex coefficients (and that includes real ones, since a real number is a complex number with an imaginary part of zero) has \(n\) complex roots.

That's a beautiful mathematical result.

If beauty doesn't motivate you sufficiently, then take solace in the fact that the properties of some differential equations which arise in physics and engineering are determined by polynomials you can build out of them, and the systems modelled by these differential equations are stable if the real part of all the roots are negative. Google for control theory to see lots more.


So we can multiply two dimensional vectors together...what about three dimensional vectors? It turns out that there is no way to do this that is well behaved. But you can do it with four dimensional vectors, if you're willing to lose the commutative law. And with eight dimensional ones, but you lose the associative law. Lots of fun mathematics to play with.

But again, if you want something more useful, the complex numbers can be thought of as determining a magnitude and an angle, or a phase. This turns out to be a useful way of thinking about alternating current, and you can analyse AC circuits using complex arithmetic in much the same way as you use real algebra to analyse DC circuits.

More Algebraic

Can I try the same trick with a different polynomial from \(1+X^2\)?

Can you ever. This procedure is very similar to working in congruence arithmetic. when you work with the remainders on dividing by any \(n\), you always get well-behaved addition, subtraction and multiplication. but division only works if you can't factorize \(n\). In just the same way, with polynomials you always get well-behaved addition, subtraction and multiplication, and if the polynomial you divide by cannot be factorized, you also get division.

This turns out to be useful in (for example) the construction of error-correcting codes which have brought us back beautiful pictures from space probes, and are responsible for the robustness of CDs and digital television signals.

So the algebra of complex numbers not only makes sense, it turns out to be useful in a variety of ways.

The miracle

All the above is fascinating stuff, but it misses out a big area of mathematics, namely calculus. Given two complex numbers, \(z\) and \(w\), the distance between them in the plane is just \(|z-w|\): and since we have a notion of what it means for two complex numbers to be close to one another, we can think about what it means for a function to be continuous, or differentiable, and so on.

This leads to some entirely unexpected consequences.

It turns out that if a complex function can be differentiated once, it can be differentiated infinitely often, so we don't get monsters like the real functions which are differentiable just once. It turns out that not only can the function be differentiated infinitely often, it is actually equal to its Taylor series (as long as you don't go too far away from the point where you calculated the derivatives). It turns out that differentiable functions give mappings of the plane that preserve angles, and this can be used to analyse fluid flow and electric fields. It turns out that you can do a kind of integration which lets you compute things that are much harder to do without the complex numbers. It even turns out that they can tell you a lot about the behaviour of the normal integers, which is where the whole thing started.

And all because people in the middle ages had competitions to solve cubic equations.


This whistle stop tour of complex numbers was partly motivated by a Twitter discussion with @SallyBiskupic, @nomad_penguin and @DavidKButlerUoA. Thanks also to Adrian Aichner for pointing out some typos.

Thursday, 8 June 2017

0.999...: is nonstandard analysis the answer?

I previously blogged here about the perennial discussion on \(0.\dot{9}=1\), carefully avoiding nonstandard analysis. I also blogged here about nonstandard analysis, being equally careful not to mention \(0.\dot{9}\). Some consequent conversations with have made it evident that a look at the interaction of these two might be of interest.

\(0.\dot{9}\) meets nonstandard analysis

It is tempting (so tempting) to think that although \(0.\dot{9}=1\) in normal analysis, things will be different in nonstandard analysis. After all, there are infinitesimals there, so presumably that's a place where it makes sense for \(0.\dot{9}\) to be infinitesimally less than \(1\). But of course, just as in the real analysis case, we have to decide what we mean by \(0.\dot{9}\).

Well, whatever it means in nonstandard analysis, it has to be a natural extension of what it means in standard analysis, where \[ 0.\dot{9}= \lim_{N\to \infty} \sum_{n=1}^N \frac{9}{10^n}. \]

There are different ways of trying to transport this into nonstandard analysis, with different consequences. Here are the ones that come to my mind.

\(0.999\ldots\) with is infinitesimally less than \(1\)

First, we might try to interpret that string of digits in a plausible way, forgetting about the 'infinite sum as a limit' semantics.

Then one might try to interpret \(0.\dot{9}\) via the definition of a nonstandard real as a sequence of real numbers, and a fairly natural choice of sequence is \(0.9,0.99,0.999\ldots\), i.e. the sequence of partial sums for the limit. Clearly, since every term in the sequence is less than \(1\), the corresponding nonstandard real is less than \(1\), and since eventually every term in the sequence is bigger than any real number less than \(1\), the corresponding nonstandard real must be less than \(1\) by an infinitesimal quantity, say \(\epsilon\); so the sequence defines a nonstandard real, \(1-\epsilon\). In fact, this \(\epsilon\) is the infinitesimal defined by the sequence \(1/10,1/10^2,1/10^3,\ldots\).

But there's a problem with this. Why choose this particular sequence? We could equally have used \(0.99,0.999,0.999,\ldots\) which also defines a nonstandard real infinitesimally less than \(1\), but this time the result is \(1-\epsilon/10\); or \(0.99,0.9999,0.9999999\ldots\), which gives \(1-\epsilon^2\). There are a great many possible choices, all giving different limits infinitesimally less than \(1\).

This is not good: if \(0.\dot{9}\) is to be infinitesimally less than \(1\), it should be a well-defined nonstandard real number, and the infinitesimal should be unambiguously defined. This is a fail on both criteria. We need to try something else.

\(0.9\ldots9\) with an infinite number of \(9\)'s is infinitesimally less than 1

Let \(N\) be a hyperfinite natural number (i.e. a natural number in the nonstandard reals that is greater than any \(n \in \mathbb{N}\)). Then the transfer principle tells us that just as for finite \(N\), \[ \sum_{n=1}^N \frac{9}{10^n} = 1- \frac{1}{10^N} \] and \(1/10^N\) is, of course, an infinitesimal.

This looks a bit like the previous answer: the different choices of \(N\) correspond to different sequences of terms of the form \(0.9...9\), and give different infinitesimals. But it does have some more information. A bigger \(N\) corresponds to a sequence of natural numbers that grows faster, and so a sequence that converges to \(1\) faster: and this is what it means for the infinitesimal difference to be smaller. Nevertheless, we still don't obtain a well-defined infinitesimal that is the difference between \(0.\dot{9}\) and \(1\).

In fact, it's not just a bit like the previous answer: it is the previous answer, but expressed in a different language. Let's take a closer look.

If \(N\) is a hyperfinite natural number, then it corresponds to a divergent sequence \(n_1,n_2,n_3\ldots\) of natural numbers. This defines a new sequence \(p_1,p_2,p_3,\ldots\) where \(p_i\) is \(0.9\ldots 9\) with \(n_i\) nines. This sequence of real numbers, \(p_i\), determines the nonstandard real \(\sum_{n=1}^N 9/10^n\), and the sequence \(1/10^{n_i}\) determines the infinitesimal difference \(1/10^N\) between this number and \(1\).

So each of these approaches gives us something infinitesimally less than \(1\) but is really satisfactory.

In fact, we get a more satisfactory situation by saying that

\(0.999\ldots = 1\) in nonstandard analysis

Perhaps the most mathematically natural interpretation of \(0.\dot{9}\) is by simply to use the definition above: \[ 0.\dot{9}= \lim_{N\to \infty} \sum_{n=1}^N \frac{9}{10^n} \] but remembering that now \(N \in {}^*\mathbb{N}\), so that \(N\) can be larger than any finite integer.

We also have to remember what this means: given any \(\epsilon \gt 0\), there is some \(K\) with the property that \(\sum_{n=1}^N 9/10^n\) is within \(\epsilon\) of the limit whenever \(N \geq K\). If \(\epsilon\) is infinitesimal, then \(K\) is hyperfinite.

But then the transfer principle guarantees that the standard analysis argument still works, and the limit is \(1\), just as in standard analysis. This is clear and unambiguous, if a little disappointing.

There is another way of thinking about it that I'm going to consider, though, and it's the strangest one, and the hardest to understand (which is partly why I've kept it till last).

\(0.999\ldots\) isn't even a thing in nonstandard analysis.

We might think of \(0.\dot{9}\) in a slightly different way: \[ 0.\dot{9} = \sum_{n \in \mathbb{N}} \frac{9}{10^n} \] forgetting (for the moment) that this is really just an evocative shorthand for \[ \lim_{N \to \infty} \sum_{n=1}^N\frac{9}{10^n} \] which is itself just a notation for 'the thing I can get arbitrarily close to by making \(N\) sufficiently large'.

Let's take the notation seriously, and think of it as representing the sum of a collection of numbers indexed by \(\mathbb{N}\).

There is something very strange here. It seems that summing over a finite number of terms gives an answer less than \(1\), summing over all natural numbers gives exactly \(1\), but then summing a hyperfinite number of terms gives an answer that is less than \(1\). How can adding more terms make the answer smaller again, even by an infinitesimal amount?

This is where the waters become relatively murky.

When we talk about a property of (a set of) real numbers in nonstandard analysis, what we're really doing is talking about a property of the representatives of (the set of) these real numbers in the nonstandard reals. Now, a single real number does have a natural nonstandard real that it corresponds to, and it's the obvious one. This works for any finite set of real numbers. But the representative of an infinite set of real numbers always acquires new elements. So the natural representative of the set of all natural numbers, \(\mathbb{N}\), is \({}^*\mathbb{N}\), which includes all the hyperfinite natural numbers too.

In fact there is no way to specify the set of finite natural numbers in \({}^*\mathbb{N}\) in the language of nonstandard analysis. It simply isn't a subset of \({}^*\mathbb{N}\), just as there is no set of all infinitesimals, or of all hyperfinite natural numbers.

Actually, simply is something of a lie here: Sylvia Wenmackers (@SylviaFysica) called me on this, so here's my attempt to explain what I mean by it.

Again, we have a rather subtle idea to struggle with. There are collections which we can describe in the language of nonstandard analysis, which we call internal sets. These are the meaningful sets, since they are the only ones we can talk about when we are working within nonstandard analysis. The other sets, the ones which require the use of descriptions such as infinite or infinitesimal, are called external sets. We can only talk about them when we talk about nonstandard analysis, looking at it from outside. They don't correspond to anything meaningful inside nonstandard analysis. Depending on the way that you like to do your nonstandard analysis, you can think of internal sets (part of the system) and external sets (not part of the system); or you can think of sets and descriptions which don't describe sets at all.

The situation is a little like that of trying to talk about the collection of all sets in set theory. If you try to make that collection a set, then you get into all kinds of problems, so you are forced into the position that some collections are sets, and have all the properties we require of sets but others, which are called proper classes are not, and do not.

This is hard to come to terms with, but we really don't have any option. It's the price we have to pay for having access to infinitesimal and infinite quantities in our number system, if we want to retain the properties that make things work well, such as every set of natural numbers having a smallest member (there is no smallest hyperfinite natural number), or every set with an upper bound having a least upper bound (there is no least upper bound for the infinitesimals).

To summarize: the only objects which exist meaningfully in nonstandard analysis are the ones which can be described using the language we already have for real analysis: and that doesn't allow us to talk about infinitesimal or infinite quantities. If we work purely inside the nonstandard reals, we can't tell that some are infinitesimal and some are infinite, because we can't pick out the standard reals to compare them with. It's only when we look at the system from outside that we can tell that some nonstandard reals are smaller than all positive reals, and some nonstandard natural numbers are larger than all standard natural numbers.

So the answer to the question up above is likely to feel unsatisfactory: it is that that isn't actually a question, because it refers to things that don't make any sense in the context of nonstandard analysis.

So where does that leave us?

It leaves us with the conclusion that the situation for \(0.999\ldots\) is exactly the same in nonstandard analysis as it is in standard analysis. Either the string of \(9\)'s terminates, in which case the number is less than \(1\); or it does not terminate, in which case it represents a limit, and the corresponding number is exactly \(1\).

What we gain in the nonstandard interpretation is that the string might terminate after a hyperfinite number of terms, in which case the result is less than \(1\) by an infinitesimal. In this case we can think of the hyperfinite number of terms as specifying a particular rate at which the number of \(9\)'s increases, and the infinitesimal as telling at what rate the sequence of numbers of the form \(0.9\ldots9\) approaches \(1\): the hyperfinite and infinitesimal quantities have an interpretation in terms of the standard reals in which they tell us about rates of convergence.

It has been argued that one reason for the widespread conviction that \(0.\dot{9} \lt 1\) is that people are somehow (implicitly, inchoately, even unwittingly) thinking of \(0.\dot{9}\) as a terminating infinite, i.e. hyperfinite, sum, requiring the addition of an infinitesimal \(0.0\ldots01\) term (with a terminating infinite, i.e. hyperfinite, number of zeros) to give \(1\). I'm not entirely convinced by this: it seems more to me that they're taking the 'obvious' inductive limit of the behaviour of a finite terminating decimal expansion, and that this interpretation is more eisegesis than exegesis of the phenomenon. But I have encountered people trying to argue that the difference is a decimal expansion with infinitely many \(0\)'s followed by a \(1\), so the idea can't be entirely discounted.

After all that, is nonstandard analysis the answer?

No. nonstandard analysis isn't the answer. But it does give us a framework for thinking about the question in a fruitful way, and for asking (and answering) some interesting questions. Maybe that's even better.

Sunday, 4 June 2017

A tale of a tangle

At the early stages of mathematics, things are all very satisfying. At first you don't understand something, then you think about it, and solve problems, and think about the problems (rinse and repeat), and after a while you do understand it. But as you progress, you realise that you often go back to these things that you already understood and find that there's a whole new layer of understanding that you didn't previously realize you lacked, and a whole new bunch of thinking and understanding to do.

In a previous blog, I argued that the lack of a clear agree notion of the real numbers is one of the problems in providing a convincing argument that \(0.\dot{9}=1\). This time I want to present some evidence that even after we agree on what the real numbers are, there are still some surprises in store, even (or especially) regarding some quite fundamental properties.

How the rational and irrational numbers fit together is something which gives me the feeling that the harder I think about it, the less I understand it.

Rational and irrational

For many, one of the first big mathematical surprises we meet as students is that there is the same number of rational numbers in the interval \([0,1]\) as there are positive integers (so we call that set countable), but that there are more real numbers (and therefore more irrational numbers) then there are rational numbers (so we call that set uncountable).

Both arguments are very familiar, but each is interesting in itself, so let's recall them briefly.

First, we can explicitly list the rational numbers in \([0,1]\), as follows: for each denominator \(n \geq 1\), list the rational numbers \(0/n,1/n\ldots n/n\), and build a list of each such group, in order of increasing \(n\). Then remove from the list any fraction which reduces to an earlier one in the list. We now have a list which contains all the rational numbers in \([0,1]\), and so a bijection between \(\mathbb{N}\) and this set. So there are the same number of rationals in \([0,1]\) as there are natural numbers.

On the other hand, it is impossible for a list of real numbers to include all of them.

The proof of this relies on Cantor's wonderful diagonalization argument. First, we have to get a slightly awkward detail out of the way: any number whose decimal expansion eventually terminates can also be represented by a decimal which ends in an infinite string of 9's. We agree to use the version which terminates whenever this choice arises, and now there is a unique decimal representation for every real number.

Now suppose we have an infinite list of real numbers in \([0,1]\) (or rather, their decimal representations). We build a real number by looking at the \(n\)th decimal digit of the \(n\)th number in the list: if it is anything but a \(1\), we replace it by a \(1\), and if it is a \(1\), we replace it by a \(0\). This real number cannot be the first number in the list, because the first digits are different; it cannot be the second, because the second digits are different; and so on. Obviously, this real number cannot be any number in the list.

This tells us that whatever infinite list of real numbers we consider, it cannot include all the real numbers in \([0,1]\). So there are more real numbers in \([0,1]\) than there are rational numbers, even though both sets are infinite. This has been known to cause an unpleasant reaction in some people, but there really isn't any avoiding it without performing some fairly radical surgery on the basic ideas of mathematics. (To be fair, that's just the route that some people have gone down, including perfectly respectable mathematicians. You can reject the notions of infinite sets and irrational numbers entirely, and work with what's left. But let's stick with normal mathematics here.)

It doesn't quite get us what I claimed though: I claimed that there were more irrationals than rationals. All I know so far is that by taking the total collection of rationals and irrationals, I get a collection with more numbers than just the rationals. But that last step's not hard to see. If the irrationals were in fact countable, we could just make a list of all the real numbers by interleaving the rational and irrational numbers, and that would mean the set of real numbers was countable after all.

So there's no doubt: the irrational numbers are uncountable, and therefore there are definitely more irrational numbers than rational numbers.

A complicated arrangement

The arguments above, though they're commonplace and almost universally accepted now, caused quite a stir to begin with, and a certain amount of mathematical feuding. But interesting though it is to see that infinite sets aren't necessarily the same size, it only scratches the surface of some stuff that I find pretty mysterious.

Bizarrely, not only are there more irrationals than there are rationals, but the rationals don't take up any room.

That's worth thinking about. We know that no matter how small a subinterval of \([0,1]\) we choose, there are infinitely many rational numbers in there; and likewise, there are infinitely many irrational numbers in there. Both sets of numbers are dense in \([0,1]\). So how can the rational numbers not be taking up any space?

It has to do with the mysterious way that the rational numbers and the irrational numbers are arranged to make the real line.

We can see that the rational numbers take up no space by showing that no matter how small a positive number, \(\epsilon\), we choose, we can fit all the rational numbers into a region of size less than \(\epsilon\). So, back to our list of rational numbers. Around the \(n\)th rational number in the list, we put an interval of width \(\epsilon/2^{n}\). Call this collection of intervals an \(\epsilon\)-covering of the rationals. Every rational number is in at least one of the intervals, and since every interval contains infinitely many rational numbers there will be lots of overlaps. Therefore the total amount of space required to put all the rational numbers into these intervals is less than \[ \epsilon\sum_{n=1}^\infty \frac{1}{2^n} = \epsilon. \]

It follows that the rationals can't actually take up any space at all. Those intervals, of total size less than \(\epsilon\), include all the rational numbers, but miss out at least \(1-\epsilon\) of the stuff in \([0,1]\), and we can choose \(\epsilon\) as small as we like.

So how does that work?

I can't see it. The rationals are dense, so there's a finite interval around every one of a dense set of points. And yet the total amount of space taken up by these can be made smaller than any positive number you choose. What's missed out of the intervals? It's a set of irrational numbers that can't include any intervals at all, because any interval would contain a rational number. And yet the set of irrational numbers that's omitted by an \(\epsilon\)-covering of the rationals takes up at least \(1-\epsilon\) of the space in \([0,1]\).

It is tempting to think that this is because there are so many more irrationals than there are rationals: naturally they take up much more of the space. But this isn't it. There are uncountable subsets of of reals that also take up no space. The Cantor set (yes, same Cantor) is probably the best-known example.

I don't have any kind of handle on how this all fits together. I have no feel for how what's left behind when you remove countably many intervals can be uncountable many irrationals, all separated by what has been removed, but nevertheless occupying almost all the space. I think it's saying something quite deep about the structure of the real line, but I don't know what.