Friday, December 29, 2017

Measuring up to Heisenberg (and since)

'Everybody knows' that the uncertainty principle of quantum mechanics tells us that we can't make a measurement on a system without disturbing it, and that that's why you can't simultaneously measure the position and momentum of a particle. This isn't an entirely unreasonable thing to think: it is, after all, closely related to Heisenberg's original argument involving using light to probe the properties of a particle. But it's wrong. The situation in quantum mechanics is subtler and much more interesting than that.

Statistics rears its ugly head

Suppose we have a large collection of boxes, each containing a particle, and they have been prepared so that the particle in every box is in exactly the same state. (Or if you prefer, one box containing a particle, prepared in the same way a large number of times.)
You may be wondering how we do that. I'm not going to tell you, largely because I have no idea. But I can tell you what quantum mechanics predicts about the measurements we make on the particles inside. You may think this is unfair, but it's no less fair than using classical mechanics to tell you what happens to a ball thrown at 20 meters per second at an upward angle of 30 degrees from a cliff edge 100 meters above sea level: classical mechanics doesn't tell us how to arrange that situation either, only what the consequences are.
Now we measure the \(x\)-coordinate of the position of each particle. It could be that every single box is guaranteed to give the same value for this measurement. In this case we say that the particle is in an eigenstate of \(x\)-position, and the eigenvalue is the \(x\) value that we measure.
But this is not a very likely occurrence. It is much more likely that the measurements result in different values. Then we can compute the average value of all the measurements, and if we have the enthusiasm and stamina, we can compute the standard deviation. Doing this lots of times with differently prepared particles, we find that sometimes the standard deviation is large, and sometimes it is small: so sometimes the particle has a relatively well-defined position (in the sense that the measurements tend to be close together), and sometimes not.
Instead of measuring the \(x\)-coordinate of the position of each particle, we might measure the \(x\)-component of its momentum. A similar situation results. Depending on the initial preparation, we sometimes find a large spread of results, and sometimes a small spread.
There is an obvious question we might ask: what is the relationship between the measurements of position and momentum? The answer is simple to describe, but not quite so easy to explain.
It turns out that we can prepare the particles so that they have a very small standard deviation in their position measurements. We can also prepare them so that they have a very small standard deviation in their momentum measurements. But we can't prepare them in a way where both sets of measurements have a small standard deviation. In fact, the product of the two standard deviations cannot be reduced below a certain value. The more tightly clustered the position measurements are, the more spread out are the momentum measurements; conversely, if the momentum measurements are tightly clustered, the position ones are more spread out.
This is not easy to understand from the point of view that the act of measurement disturbs the system. All the measurement are being carried out on particles in the same state, but some measure position and others measure momentum. So what is going on here?

What is a particle's state?

This is one way in which quantum mechanics is really different from classical mechanics.
In classical mechanics, the state of a particle is described completely by its position \(\pmb{x}\) and momentum \(\pmb{p}\). Given these quantities and the knowledge of the forces acting on the particle, the subsequent values of position and momentum are entirely determined.
The quantum mechanics picture is rather different.


In quantum mechanics, the state of a system is described by a vector, often denoted \(\psi\). So far, this does not sound so different from the classical mechanics situation: we can think of the position and momentum as a vector too. But it's a different kind of vector.
In the quantum mechanics picture of the universe, each quantity that we can observe is associated to a linear operator (in fact, a special kind of linear operator, called Hermitian, but I won't go into that level of detail here) I'll use \(H\) to represent a (Hermitian) linear operator. What this means is that \(H\) is a machine that I can feed a vector and it returns a vector in a well-behaved way: if \(\alpha, \beta\) are any two complex numbers, and \(\psi,\phi\) are any two vectors, then \[ H(\alpha \psi + \beta \phi) = \alpha H(\psi) + \beta H(\phi) \]
Although I slid past it fairly quickly just then, you may have noticed that I said \(\alpha\) and \(\beta\) could be complex numbers: this is one of the ways that quantum mechanics is different from classical mechanics---the mathematics of complex numbers is there at the fundamental level of the kind of vectors that can be used to describe the state of a system.
Now, if we choose our state vector very carefully, we can have the situation \[ H(\psi) = \lambda \psi \] where \(\lambda\) is a real number. When this happens, \(\psi\) is called an eigenvector of \(H\), and \(\lambda\) is the associated eigenvalue. (For general linear operators the eigenvalues can be complex, but for Hermitian operators they must be real.)
Eigenvectors are very important in this story. I'll explain why shortly, but first, we need to know this: If \(H\) is the linear operator corresponding to an observable, then there are eigenvectors \(\psi_i\) of \(H\) such that any state vector \(\psi\) can be expressed in the form \[ \psi = \sum_{i} \alpha_i \psi_i \] in just one way, where \(\sum_i |\alpha_i|^2 = 1\). (We call this set \(\{\psi_i\}\) the basis of eigenvectors associated with \(H\)).


'Just what,' you should be wondering by now, 'does this have to do with measuring a property of a particle?'
Well, given an observable, \(H\), (note that I'm abusing terminology a bit here by referring to the linear operator as the observable: I do that) and its associated basis, \(\{\psi_i\}\), we know that each \[ H (\psi_i) = \lambda_i \psi_i \] for some real number \(\lambda_i\).
Then if we have a particle in the state \[ \psi = \sum_{i} \alpha_i \psi_i \] We can carry out a measurement of the observable corresponding to \(H\).
The result of the measurement must be one of the values \(\lambda_i\), and a result of \(\lambda_i\) occurs with probability \(|\alpha_i|^2\). (So we don't know just what the result will be, just the probabilities of the various possible results.) Furthermore, immediately after the measurement, if the measured value is \(\lambda_K\), then the state of the system is \(\psi_K\). (This is called the collapse of the wave function.)
Quantum mechanics does not tell us which outcome will occur, it only tells us the probabilities of the various outcomes.
So for example, if the state of the system were \(\psi = 0.6 \psi_1 + 0.8 \psi_2\), then the result of measuring \(H\) would be \(\lambda_1\) \(0.36\) of the time (and immediately after these measurements the system would be in state \(\psi_1\)), and it would be \(\lambda_2\) the remaining \(0.64\) of the time (and immediately after those measurements the system would be in state \(\psi_2\)).
On the other hand, if the state of the system were \(\psi=\psi_1\), then the result of measuring \(H\) would always be \(\lambda_1\), and the state of the system would be unchanged by the measurement. (This, of course, contradicts the claim that it is impossible to make a measurement without disturbing the system.)
So we can see that (given this model of particle states and measurements), the average and spread of measurements is determined by the coefficients \(\alpha_i\); if most of the weight is associated with a particular basis vector, then the results have a very small spread, and if many basis vectors make a significant contribution, then the results are more spread out. |


So now we can try to see what this might suggest about measurements of position and momentum. Let's call the operator corresponding to measurement of \(x\)-position \(X\), with corresponding basis \(\{\psi_i\}\) and the operator for the momentum measurement, \(P\), with corresponding basis \(\{\phi_i\}\).
The crux of the situation is that the \(\psi_i\) cannot be eigenvectors of \(P\), and the \(\phi_i\) cannot be eigenvectors of \(X\).
So if a particle is prepared in a state which guarantees a particular value of \(x\), so \(\psi = \psi_i\), then it must be a combination of \(\phi_i\); likewise, if \(\psi=\phi_i\), it must be a combination of \(\psi_i\). So certainty of position means that momentum is uncertain, and vice versa.

What's going on here?

The problem is that the operators \(P\) and \(X\) do not commute: if \(\psi\) is a state vector, then \(PX(\psi) \neq XP(\psi)\). The mathematics of Hermitian operators tells us that it is possible to find a basis of vectors which are eigenvectors for a pair of \(H_1\) and \(H_2\) if and only if the two commute, i.e. \[ H_1H_2 (\psi) = H_2H_1(\psi) \] for all vectors \(psi\).
We can define the operator \([H_1,H_2]=H_1H_2 - H_2H_1\), called the commutator of \(H_1\) and \(H_2\), and then we can equivalently say that \(H_1\) and \(H_2\) commute if \[ [H_1,H_2] = 0 \] (meaning that \([H_1H_2](\psi =0) \) for all \(\psi\).
It then follows that since \(P\) and \(X\) do not commute, one cannot have simultaneously sharply defined values for both the \(x\)-position and the \(x\)-component of momentum for a particle.
On the other hand, if you find two observable which do commute, then it is possible to have sharply defined values for both; indeed, either both will be sharply defined or neither will!

Heisenberg's uncertainty relation

Heisenberg's uncertainty relation (as we now understand it) tells us how the extent to which two operators fail to commute tells the extent to which values of the corresponding observable cannot be simultaneously well-defined. More precisely, given a state \(\psi\), and observables \(A\), and \(B\) the product of the standard deviations of measurements of \(A\) and \(B\) must be at least half the length of the vector \([H_1,H_2](\psi)\).
In the case of \(x\)-coordinate of position and \(x\)-component of momentum, the length of \([X,P](\psi)\) is always \(h/2\pi\), given the well-known uncertainty relation.

Just a minute \(\ldots\)

We've actually shifted ground in a fairly radical way here. We've gone from 'you can't measure a particle's position and momentum simultaneously' which comes from a world-view in which it has a position and momentum, but we can't access the values', to 'a particle's state does not simultaneously specify a sharp position and momentum' which comes from a world-view where the state of a particle just isn't the same information as it is in classical physics.
This, together with the fact that the theory is probabilistic rather than deterministic has been a significant issue in the development of quantum mechanics. The desire to maintain a 'realist', deterministic picture of physics, in which particles do have well-defined position and momentum even if we can't know them survives, and even now has its proponents.
And all this doesn't actually explain the uncertainty principle. It just provides a mathematical/theoretical framework which it is a consequence of. But that's OK, because that's generally how physics proceeds: we have phenomena that are hard to understand, and we develop a structure in which those phenomena are natural. Once we've become thoroughly accustomed to that structure, we tell ourselves we understand what's going on. At least, until the next Big Idea comes along and we start all over again.

Further reading

If this has whetted your appetite for more detail, there are, of course, many places to go to. This is a short list of 'not a standard undergraduate textbook' books which I think are interesting.
  • Susskind and Friedman Quantum mechanics: the theoretical mininum gives a careful presentation that doesn't require a huge amount of mathematical backjgound. The first five chapters fill in most of what I've glossed over above, and then it goes on to look at other interesting stuff such as entanglement. It also explains how the quantum state evolves in time, which I haven't mentioned at all.
  • Penrose's The Emperor's New Mind (and his other more recent tomes) give wonderful insights into quantum mechanics (inter alia), including some speculative ideas about wave function collapse.
  • Bohm and Hiley's The undivided universe is an exposition of pilot wave theory, which provides a deterministic and realist model of quantum mechanics. I haven't invested the time and effort to be able to do more than say: this looks like a good place to start if you want to know more.
  • Feynman's lecture notes are, of course, worth looking at once you have some idea what's going on.
You may have noticed that I haven't listed any resources on the philosophy of quantum mechanics. This is because I know my limitations.

Friday, November 17, 2017

Russian peasant cryptography

What is the relationship between Russian peasant multiplication and modern cryptography? Unlike the multiplication procedures commonly taught, it can easily be adapted to compute powers rather than products; and we need to calculate powers for some cryptosystems.
I should probably admit right away that this is not going to be at all informative about how Russian peasants managed keep their communications secret from their aristocratic overlords.

Russian Peasant Multiplication

If you haven't seen this before, it's a rather neat approach to multiplication which doesn't require the memorization of multiplication tables, but only the ability to double or half integers. It seems to be rather unclear just what it has do do with Russian peasants: it's certainly less than Danish pastries have to do with Denmark. The method was known to the ancient Egyptians, who considerably predate Russian peasants.
The general description is simple, but not very informative. Nevertheless, I will start by presenting it.
If you want to multiply together two integers, \(m\) and \(n\), form two columns, one headed by \(m\) and the other by \(n\). Then repeatedly add a new row obtained by halving \(m\) (discarding any remainder) and doubling \(n\), stopping when you end up with a \(1\) in the \(m\) column. Finally, add up all the entries in the \(n\) column where the \(m\) value is odd.
It's much more informative to see an example. So let's work out \(13 \times 7\). This gives us the table \[ \begin{array}{|c|c|} \hline m & n \\ \hline 13 & 7\\ 6 & 14\\ 3 & 28\\ 1 & 56\\ \hline \end{array} \] after which we add up the right hand entries when the left hand entry is odd, giving \[ 7+28+56 = 91 \] which we can quickly check is \(13 \times 7\).
The interesting question is, of course, why does this mysterious looking procedure work?
If we look at the left hand column, and work our way up, recording a \(1\) when we see an odd number, and a \(0\) when we see an even one, we obtain \(1101\), which is (by no coincidence whatever) the binary representation of \(13\). In fact, this is exactly the standard algorithm for converting numbers to their binary representation.
Then we can see that the repeated doubling of the \(m\) column is calculating the product of \(m\) with the corresponding power of \(2\), so that adding those corresponding to an odd value in the \(n\) column is adding up \(m\) multiplied by those powers of \(2\) which sum to \(m\).
So by using this we can multiply any integers together, and what's more the algorithm is reasonably efficient. I admit that it's not as efficient as the column or grid methods usually taught now, but at least you don't need to know your times tables to do it.
This is rather nice. At least, I like it. But rather remarkably, it can be adapted slightly to work out something apparently much more complicated, and very useful in modern (public key) cryptography.

Russian peasant exponentiation

We make two changes to the above algorithm above, both only involving the \(m\) column.
  • Instead of doubling, we square
  • Instead of adding at the end, we multiply.

Then in each row of the table, instead of having \(m\) multiplied by the corresponding power of \(2\), we have it raised to the corresponding power of \(2\). Multiplying all these together then gives us \(n^m\).
Let's see how it works with the previous \(m\) and \(n\). \[ \begin{array}{|c|c|} \hline m & n \\ \hline 13 & 7\\ 6 & 49\\ 3 & 2401\\ 1 & 5764801\\ \hline \end{array} \] and multiplying the relevant terms together gives \[ 5764801 \times 2401 \times 7 = 96889010407 \] which is, indeed, \(7^{13}\).
But unlike the multiplication, this isn't just reasonably efficient: this is remarkably efficient.
If we want to find \(n\) to the power of \(m\), we could do it by multiplying together \(m\) lots of \(n\). This is like finding \(m \times n\) by adding together \(m\) lots of \(n\): you can do it, but if \(m\) is bigger than about \(3\) you really have better things to do with your time.
But with this algorithm, the number of squarings that you have to do is at most about \(3\) times the number of digits of \(n\) (since the binary representation of a decimal number has about \(3\) times as many bits as the decimal one has digits), followed by that many multiplications.
Of course, we need to be able to multiply to do this, but that's OK: the multiplication algorithm lets us multiply by doubling and adding, and now the adapted version lets us exponentiate by squaring and multiplying. It's still a good deal.

A quick outline of RSA

The point of this is that in the RSA (Rivest-Shamir-Adleman) approach to public key cryptography (and also in some others), we need to compute powers in order to encrypt and decrypt messages.
The way that the procedure works is that you choose a private pair of primes, \(p\) and \(q\), and tell the world the value of \(N=pq\). (You keep \(p\) and \(q\) secret: the algorithm relies on the fact that factorizing \(N\) is hard.)
Then you pick a number \(e\) and find \(d\) with the property that \(ed\) is \(1\) more than a multiple of \((p-1)(q-1)\). You tell the whole world \(e\). (You keep \(d\) secret. The algorithm relies on the fact that finding \(d\) without knowing \(p\) and \(q\) is hard.)
Then you represent your message as an integer \(M \lt N\). The encrypted form of your message, \(E\) is the remainder when you divide \(M^e\) by \(N\).
Then by the magic of number theory (actually, it isn't magic, it's Fermat's Little Theorem) the original message is the remainder when you divide \(E^d\) by \(N\).
In principle, this is all there is to it, and there are many descriptions available, often illustrated using values of \(N\) which are small enough to make all the arithmetic tractable.
But there's an issue.
In practice, the values of \(M\) and at least one of \(e\) or \(d\) is very large. If you had to do repeated multiplication, the universe would die of old age before you were finished. This algorithm reduces the amount of work to manageable proportions. \(e\) (or \(d\)) multiplications are reduced to a number of multiplications that is a small multiple of the number of digits of \(e\) (or \(d\)).
That wasn't the only issue.
The value of \(M^N\) is almost inconceivably large. You couldn't write it down if you used every elementary particle in the known universe to hold one digit. So how does that work?
The answer is more number theory. One of the really nice properties of working out remainders is that it doesn't matter where in a calculation you do it. If you want to know the remainder of some quantity when you divide by \(N\), then you can calculate the whole thing then divide by \(N\); or, rather more sensibly, you can work it out in stages and take the remainder whenever the working value exceeds \(N\). You are guaranteed to obtain the same result in the end.

Russian Peasant RSA

So now we can see how to compute an encrypted message. We can use the adapted Russian peasant multiplication, but now we just add the final detail that whenever the squared number exceeds \(N\), we take the remainder on division by \(N\).
Let's see this work with finding the remainder on dividing \(7^{13}\) by \(59\).
\[ \begin{array}{|c|c|} \hline m & n \\ \hline 13 & 7\\ 6 & 49\\ 3 & 2401 \rightarrow 41\\ 1 & 1681 \rightarrow 29\\ \hline \end{array} \] giving \[ 29 \times 41 \times 7 = 8323 \rightarrow 4 \] which is (phew!) the remainder on dividing \(96889010407\) by \(59\).
But here's the punchline. This algorithm for calculating the exponential is what is usually referred to as exponentiation by (repeated) squaring. And this (or some variation on it) is in fact the algorithm used in implementations of public key cryptography (such as RSA) which rely on computing exponentials.
So there we have it. What is essentially an algorithm for multiplication which can be used by people who don't know their times tables turns into an algorithm for exponentiation which underlies contemporary cryptographic methods.


At the 2017 MathsJam Gathering, Elaine Smith and Lynda Goldenberg gave a talk about techniques for multiplication including the Russian peasant method. That was when the penny dropped for me that the repeated squaring algorithm was an adaptation of the Russian peasant multiplication, and I decide it was worth sharing the realisation. About twenty minutes lates, Oliver Masters gave a talk about the Fibonacci matrix where he mentions the repeated squaring algorithm for matrix exponentiation. Fortunately he didn't relate it to the multiplication algorithm. About five minutes after that, Colin Wright (@ColinTheMathmo) was wrapping up the talks and he mentioned the relationship. But by that time I had the bit between my teeth and had decided to do this anyway, so I have.
If you enjoyed this, then you'll probably enjoy @ColinTheMathmo's series of articles about Perrin numbers.

Thursday, October 26, 2017

A minus times a minus is a plus

The reason why, we really should discuss.

So, why is the result of multiplying two negatives numbers together a positive number? There are various answers to that.
First, let's think a little about why we care.

The rules of the (arithmetic) game

We start off with the non-negative integers \[ \mathbb{N} = \{0,1,2,3,\ldots\} \] which we all learn to manipulate from a very young age. We know how to add and multiply them, and (at least sometimes) to subtract, at least as long as the number we want to subtract is no bigger than the one we want to subtract it from.
Then somebody comes along and says that when you subtract \(2\) from \(1\), the answer is \(-1\): a bit like \(2-1\), but with a minus sign in front of it. This is a new kind of number called a negative number, and they're useful for situations where you have to subtract a bigger number from a smaller one, and in certain unpleasant circumstances to describe the amount of money you have in the bank.
But now we have to ask: when we do arithmetic with them, how do they work?
One solution is just to present some rules, including the rule that the product of two negative numbers is the same as the product of their positive counterparts. In other words:

That's just the rule, isn't it?

Ooh, not at all satisfactory. The rule seems completely arbitrary. But did we really have any choice? Are there any constraints that determine this? What if we'd chosen the rule that the product of two negative numbers was negative?
Let's hold this thought for the moment and think about how \(\mathbb{N}\) behaves.
So, let's take a careful look at the non-negative integers, and what we know about them. In fact, there is a bunch of stuff that we know. I'm too lazy to write in an explicit multiplication symbol, so I'll use \(mn\) to represent the product of \(m\) and \(n\).
In the list below, \(m,n\) and \(p\) are arbitrary elements of \(\mathbb{N}\). Then
  1. \(m+n = n+m\) (addition is commutative)
  2. \(m+(n+p) = (m+n)+p\) (addition is associative)
  3. \(mn = nm\) (multiplication is commutative)
  4. \(m(np)= (mn)p\) (multiplication is associative)
  5. \((m+n)p = mp+np\) (multiplication distributes over addition)
  6. \(0+n = n\) (\(0\) is the additive identity)
  7. \(1n = n\) (\(1\) is the multiplicative identity)
  8. If \(m+n = p+n\) then \(m=p\) (cancellation law for addition)
  9. If \(mn=pn\) and \(n \neq 0\) then \(m=p\) (cancellation law for multiplication)
There are lots of other rules which we could write down, but these are all familiar, and the others follow from them, so we'll take them as a useful working collection.
In particular, there's one very familiar one which is conspicuous by its absence. It is the rule \(0n = 0\). I left it out on purpose, because we can show that it follows from the others, by the following argument:
We know that \(0+0=0\), because \(0\) is the additive identity. But that means that \((0+0)n=0n\), no matter what \(n\) is. By the distributive law, then, \(0n+0n=0n\), so that \(0n+0n=0n+0\), and then by the cancellation law, \(0n=0\).
There is an important lesson hiding in here. It is that our familiar rules aren't independent: once we have agreed on some of them, others are forced upon us. (So it's a good idea to choose a useful collection of rules to start with!)
This should raise an ugly possibility. What if my collection of rules isn't even consistent to begin with? How can I check that?
That's actually quite hard. Let's settle for the moment by accepting that there really is a set of non-negative integers with operations of addition and multiplication which satisfies the above rules. So there are lots of consequences from them that I haven't written above, but at least it all makes sense.
All this doesn't really help yet, but I can make use of it. Whatever the negative integers are, I want arithmetic to keep obeying the same rules as before. I want \(0\) and \(1\) to have their familiar properties, and I want the usual laws of algebra to continue to hold. So for every positive integer \(n\) I introduce a negative integer \(-n\) with the property that \(-n+n=0\). The set of all integers, positive, zero, and negative, is denoted \(\mathbb{Z}\). Now we can investigate what happens if we insist that all the rules above continue to hold.
Then \(\ldots\)

It follows from the algebra

Once we have all the stuff above, it then follows by a fairly short argument that \((-1)(-1)=1\), if we insist that all the above rules continue to hold when we allow \(m,n\) and \(p\) to be negative.
First, we all agree that whatever we mean by \(-1\), it has the property that \(-1+1=0\). Then we know that \[ (-1+1)(-1) = 0(-1) = 0 \] and so that \[ \begin{split} (-1+1)(-1)&=(-1)(-1)+1(-1)\\ &=(-1)(-1)+(-1)\\ &=0 \end{split} \] But now adding \(1\) to each side gives \[ (-1)(-1)+(-1)+1=0+1 \] so that \[ (-1)(-1)+ 0 = (-1)(-1)= 1 \] So if we insist that all the algebraic rules for non-negative numbers continue to hold, we have no option. A minus times a minus is a plus.
Well, now we can see that if we decide that a minus times a minus is minus, then the ordinary rules of algebra must break somewhere. So this isn't really an arbitrary rule, it's forced upon us by the requirement that the algebraic properties of \(\mathbb{Z}\) are the familiar algebraic properties of \(\mathbb{N}\).
We can see various other things, too: one very important one is that \((-1)n = -n\). The argument is fairly similar: \[ \begin{split} (-1)n & =(-1)n + 0\\ &= (-1)n + n + (-n)\\ &= (-1)n+1n + (-n)\\ &=(-1+1)n + (-n)\\ &=0n + (-n) \\ &= -n \end{split} \]
So, this tells us that if we extend the non-negative integers to include negative ones, and insist that algebra works as before, there are certain logical consequences, including the familiar a minus times a minus is a plus. If. But how do we know that these rules describe anything at all? What are these negative numbers that we have introduced, and how do we know that it is even possible to introduce negative numbers while preserving the algebraic rules?
You might be tempted to think I'm about to kick up a lot of dust and then complain about how hard it is to see, but when the dust settles the view is wonderful.
So here's the awkward question \(\ldots\)

What is a negative number anyway?

Negative numbers are there to be the solutions to equations of the form \[ m+x = n \] when \(m>n\). In particular, \(-n\) is the solution to \(n+x=0\), if this makes sense.
So, let's make sense of it.
We're going to do this a bit indirectly, thinking about pairs of integers \((m,n)\). Secretly we want to think of this pair as \(m-n\), but we don't yet have a meaning for this when \(m \lt n\).
So we're going to lump together certain pairs.
We say that the pairs \((m,n)\) and \((p,q)\) are equivalent if \(m+q=n+p\). A collection of pairs which are all equivalent to each other is called an equivalence class.
This is actually quite a familiar idea: it's analogous to thinking of \(3/4\) and \(6/8\) as different ways of representing the same fraction. There are infinitely many pairs of integers \((m,n)\) which we can use to do it, namely all the ones satisfying \(3n=4m\). We don't (often) talk explicitly about a fraction being an equivalence class of pairs of integers, but that's what's really going on.
We're going to do the same trick, but with pairs where it's the difference than matters, not the ratio.
We need to do three things now:
  1. Define addition on these equivalence classes of pairs.
  2. Define multiplication on these equivalence classes of pairs.
  3. Show how we can think of these equivalence classes as the familiar integers.


This is the easy one.
We define addition on pairs by \[ (m,n)+(p,q)=(m+p,n+q) \]
But we're not done yet, because the collection that the answer lies in should only depend on the collections of the original pairs, not on the specific pairs. We can check this.
So suppose that \((m,n)\) and \((M,N)\) are equivalent to each other, as are \((p,q)\) and \((P,Q)\). Then we need to check that \((m+p,n+q)\) is equivalent to \((M+P,N+Q)\).
First, we know that \[ m+N=n+M \] and \[ p+Q=q+P \] so adding these together we have \[ m+N+p+Q=n+M+q+P \] i.e. \[ (m+p)+(N+Q)=(n+q)+(M+P) \] which is exactly what we need.
So this operation of addition works in the sense that the equivalence class of the sum depends only on the equivalence classes of the pairs we add.
Multiplication is a bit tougher.


This is not so obvious, so let's cheat a little. We are thinking of \((m,n)\) and \(m-n\), and \((p,q)\) as \(p-q\). Then the product ought to be \[ \begin{split} (m-n)(p-q) &= mp-mq-np+nq\\ & = (mp+nq)-(mq+np) \end{split} \] So we use the definition \[ (m,n)(p,q)= (mp+nq,mq+np) \] We can quickly check that \[ \begin{split} (p,q)(m,n) &= (pm+qn,qm+pn)\\ &=(mp+nq,mq+np)\\ &= (m,n)(p,q) \end{split} \] so that this is commutative, but now we have to check that the result is also independent of the choice of pairs.
So we consider \((m,n)\), \((p,q)\) and \((P,Q)\) where \((p,q)\) and \((P,Q)\) are equivalent. Then \[ (m,n)(p,q) = (mp+nq,mq+np) \] and \[ (m,n)(P,Q) = (mP+nQ,mQ+nP) \] and we see that \[ \begin{split} mp+nq+mQ+nP &= m(p+Q)+n(q+P)\\ &=m(P+q)+n(Q+p)\\ &=mP+nQ+mq+np \end{split} \] so that the two answers are equivalent.
Nearly there.
In just the same way, if \((m,n)\) is equivalent to \((M,N)\), then \((m,n)(p,q)\) is equivalent to \((M,N)(p,q)\). Putting this together, \((m,n)(p,q)\) is equivalent to \((M,N)(p,q)\), which is equivalent to \((M,N)(P,Q)\), so the equivalence class of the product only depends on the equivalence classes of the pairs we are multiplying.
Now we know that we have a well-defined arithmetic on our (equivalence classes of) pairs. But we still need to see how we can think of this as supplementing the non-negative integers with the negative integers.
We make use of the fact that we can choose any representative we like for each equivalence class. So we use the representative of the form \((n,0)\) for all pairs of the form \((p+n,q)\) where \(n \geq 0\), and the representative of the form \((0,n)\) for the pairs of the form \((p,q+n)\) where \(n \gt 0\).
Using this, we can now see that \((m,0)+(n,0)=(m+n,0)\) and \((m,0)(n,0)=(mn,0)\), so the non-negative integers live inside this strange new object.
We also see that \((n,0)+(0,n)=(n,n)\), which is equivalent to \((0,0)\). If we think of \((n,0)\) representing \(n\), then \((0,n)\) represents \(-n\).
It is then tedious but straightforward to check that all the rules of algebra for the non-negative integers still work. (This is mathematics code for 'I can't be bothered to write it all out, but I don't see any reason why you shouldn't suffer.')
So, now we know that is is, indeed, possible to add negative integers into the mix while preserving the well-known algebraic structure of the non-negative integers, and that the argument up above for why \((-1)(-1)=1\) does actually apply.
Of course, now that we have defined multiplication of our pairs of positive numbers, we can also do this explicitly and see the same answer from an entirely different perspective. Given than \(-1\) is represented by \((0,1)\), we have \[ (0,1)(0,1)=(0+1,0+0)=(1,0) \] and \((1,0)\) represents \(1\) in our model.

What was the point of all that again?

After all this, you might wonder what the point of all that peculiar stuff about equivalence classes was. It just made a complicated picture of the positive and negative numbers, which everybody was happy with anyway. The snag is that you can't always just add a new kind of number into the mix and preserve the algebra that you like, but now we know that it can be done in this particular case of interest.
Maybe the most famous example of this is Hamilton's search for a three-dimensional version of complex numbers. You can start off with the real numbers, then include a new quantity \(i\) which satisfies \(i^2=-1\), and just do algebra with combinations of this \(i\) and the real numbers, and it all works: complex numbers make sense, and we can think of their algebra as a way of adding and multiplying two-dimensional vectors.
Hamilton spent some time trying to add in another quantity like \(i\) so get a three-dimensional version of algebra. In fact, that isn't possible. He eventually realised that it could almost be done in four dimensions, but you had to give up on the multiplication being commutative.
So there are problems with adding the new kind of numbers in and just assuming that the old rules of algebra will continue to hold. But now, after quite a lot of work, we can say with confidence that it is possible to introduce negative integers in a way which is compatible with the algebra of the non-negative ones. This is a Good Thing.

Food for thought

  1. I wrote down a multiplication rule which was motivated by me knowing how the answer should behave. Could I have chosen another rule with difference consequences?
  2. I wrote down a bunch of rules above, which are followed by the non-negative integers. But are there other sets of quantities which satisfy those same rules? What happens if I try this trick on these other sets?

Wednesday, August 2, 2017

Schroedinger's cat: not one thing and the other.

Schroedinger's cat is not alive, is not dead, and is not simultaneously alive and dead. It's more interesting than that.

Schroedinger's Cat

We're all familiar with the Schroedinger's cat (thought) experiment. A cat is placed in a box along with a sample of a deadly poison and a radioactive sample all hooked together in such a way that at a certain time the probability of the sample decaying, and releasing the poison resulting in a dead cat is exactly \(1/2\). The question then is, what is the state of the cat at just that time? The standard understanding of quantum mechanics tells us that if we have a lot of identical boxes of this type and we look inside them all, then half of them will contain a live cat and the other half will contain a dead cat. But what is the state of the cat before the box is opened?

It is quite common to see statements to the effect that the cat is simultaneously dead and alive up until the moment the box is opened and it is observed. But observing the cat forces it into one state or the other, and so we never see a cat simultaneously alive and dead: but if we do the experiment many times, we expect just to see a live cat half of the time and a dead cat the other half.

There are a couple of questions which spring pretty irresistibly to mind. The first is, what on earth does it mean for the cat to be simultaneously dead and alive? And the second is, if it is in fact simultaneously dead and alive, why don't we ever see that?

Decoherence-a digression

Let's address the second question first. One current understanding is that when the interior of the box interacts with the outside world, the state of the interior is forced into a well-defined classical state by a phenonemon known as decoherence. This gives a purely physical explanation for the 'collapse of the state', a central aspect of the Copenhagen interpretation of quantum mechanics, avoiding the earlier speculations about a conscious observer being required, and the consequent problems raised by Wigner's friend.

But this is not an easy question, nor one with a universally accepted answer. The issue has been vigorously debated for many years and will almost certainly continue to be for many to come. I am going to take the coward's way out, and avoid the question entirely. Instead, I will consider the notion of a state and what kind of state describes the cat in the box.

From now on, then, when I talk about 'observation', you can think about 'what I see when I look at the system' or 'the state of the system when decoherence due to interaction with the exterior world results in a classical state', as you prefer.

Quantum states

In the quantum mechanics picture of the world, the state of a system is described by a vector: this contains all the information that there is about the system, and the results of any measurement that can be made. In (fairly) general, the state of a system can be written \[ |\psi\rangle = \sum_i \alpha_i | \psi_i \rangle \] where each of the \(|\psi_i\rangle\) is a vector corresponding to a classical state of the system, such as 'alive' or 'dead' for the cat, or 'decayed' or 'not decayed' for the radioactive sample. The \(|\psi_i\rangle\) have unit length and are orthogonal to each other, and the \(\alpha_i\) are complex numbers whose squared moduli add up to \(1\). Finally, the probability of the system being in the \(i\)th state when observed is \(|\alpha_i|^2\).

The use of the notation \(| \;\; \rangle\) to represent a state-vector is due to Dirac, and is part of a fairly awful pun he was very fond of. If you think of \(|\psi\rangle\) as a column vector, then the corresponding row vector is \(\langle \psi |\). Dirac called the row vector a bra, and the column vector a ket, so that when you take the inner product of two vectors \(|\psi\rangle\) and \(|\phi\rangle\) you get \(\langle \psi | \phi \rangle\), a bra-ket or bracket. I said it was awful.

There's a long story here which I'm not going into. As time passes, the \(\alpha_i\) evolve according to Schroedinger's (differential) equation, which relates the rate of change of the \(\alpha_i\) to the surroundings. The details of how this works are, fortunately, irrelevant here.

In addition, there can be more than one way of expressing the state \(|\psi\rangle\) as a combination of classical states, and the relationship between these expressions is tied to the famous uncertainty principle of Heisenberg. We are doubly fortunate, because we don't need to worry about that here either.

The cat's state

So, what's actually going on inside the box? We have the radioactive sample, which at a certain time is in a quantum state which means that an observation will show decayed or not decayed with equal probability. Such a microscopic system being in a weird quantum state doesn't raise any immediate alarm bells, it's when the state of a macroscopic object (such as a cat, and whether the cat is alive or dead) is weird in this way that we start to worry.

Looking at this again, it seems reasonable that we can identify the two classical states of the radioactive samples as 'decayed' and 'not decayed', and the relevant aspects of the cat's state as 'dead' (if the sample has decayed) and 'alive' (if the sample has not yet decayed). I'll use \(|a\rangle\) to represent the state 'alive' and \(|d\rangle\) to represent 'dead'.

So if the cat is in the state \(|a\rangle\) then the cat will certainly be alive when it is observed; likewise, if it is in the state \(|d\rangle\) then it will certainly be dead when observed. So you can see that it is not true that observation of a quantum system unavoidably changes its state. That depends on the state and the type of observation: the uncertainty principle is much more fundamental than the idea that poking a system to see where it is pushes it about a bit. But that's another story.

But in general the cat is not in either of these states. Instead, it is in some state of the form \[ |\psi\rangle = \alpha |a\rangle + \beta |d\rangle) \] where \(|\alpha|^2+|\beta|^2=1\). In this state, the probability that the observation reveals a live cat is \(|\alpha|^2\), and the probability of a dead cat is \(|\beta|^2\). In the classical thought experiment, we have both of these probabilities equal to \(1/2\), so one possible state would be \[ |\psi\rangle = \frac{1}{\sqrt{2}}(|a \rangle + |d \rangle) \]

So what is the state of the cat here? It is not alive. That would mean it was in the state \(|a\rangle\). It is not dead. That would mean it was in the state \(|d\rangle\). It certainly isn't simultaneously alive and dead, any more than I can be simultaneously in the living room and in the hallway. It also isn't in some kind of intermediate state, such as when I am standing in the doorway with one foot in the living room and one in the hallway. It is in a different kind of state, one without a classical analogue, and this kind of state is called a superposition.

To say that the cat is simultaneously alive and dead is like saying that somebody pointing north-east is simultaneously pointing north and east. They aren't: but the direction they are pointing in has a northern and an eastern component. In the same way, the cat isn't simultaneously alive and dead: but the state it is in has an 'alive' and a 'dead' component.

This actually has enormous consequences. The different components of the state can evolve indepenently, almost as if there are two universes involved, in one of which the cat is alive, and in the other of which it is dead. Developing this idea leads to the many worlds, or Everett-Wheeler model of quantum mechanics, which Sean Caroll describes here.

In any case, if we take quantum mechanics seriously, and the evidence for it it is pretty compelling, then we have to learn to live with and accept (even if we can never be entirely comfortable with) the idea that the state of a system might not be a classically recognized state, but rather a superposition of such states. We even have to accept that the acceptable states may not quite correspond to what we think of as classically acceptable states. They are more general in some respects (we have to allow for superpositions of classical states), but more restricted in others (there is no quantum state for a particle with a simultaneously well-defined position and momentum).

Decoherence again

All this has been about what the quantum state of the (unobserved) cat in the box is. But maybe you aren't any happier with the cat being in a superposition of states than with the cat somehow being in two states at once. There is a possible way of avoiding this, and it is to argue that when the radioactive sample interacts with the rest of the apparatus in the box (including the cat), that already causes decoherence, so at box opening time what we actually have is a box with either a dead cat or a live cat in it: or again, if we did it many times, a collection of boxes of which about half contain live cats and half contain dead one.

It would be easy to give the impression that decoherence is supposed to be the cure for all our problems with why macroscopic systems look classical, rather than quantum. It would also be wrong: I'm not going to go into any detail at all here, but if you've made it this far you can probably get quite a lot from reading (at least substantial chunks of) Schlosshauer's review article.

Sunday, July 9, 2017

The RIGHT way to think about derivatives

At secondary school and then university, we meet a variety of definitions of the derivative, all motivated by the fundamental idea that the derivative is telling us something about the rate of change of some quantity as another varies. The definitions are likely to start with a fairly intuitive one and become more rigorous, incorporating the notion of limit. Unfortunately, as the rigour and degree of precision grow, it is possible for the student to become lost in a thicket of notation, and to lose sight of just what they are studying. However, I claim that if one thinks of the derivative the right way, then one has a fully rigorous concept of derivative which keeps fruitful contact with the intuitive one.

Let's take a look at (some of) these definitions, before settling on the right one.

A menu of definitions

The slope of the tangent to a graph

This is a fairly straightforward idea. You start off with the graph of a function, say (just to be conventional) \(y=f(x)\), and at some chosen value of \(x\) you draw the tangent to the curve. Then the gradient is how fast the height of the graph changes as \(x\) changes, so that sounds good.

At least, it sound good until you start thinking about just what you might mean by 'tangent'. If the curve is a circle, that's pretty unambiguous: you take the line perpendicular to the radius at the point of interest, and we all know that is what is meant by the tangent.

Actually, since I am talking about the tangent to a graph, I've just used the upper half of the circle, given by \(y=\sqrt{1-x^2}\) for \(-1 \leq x \leq 1\).

But what if the graph isn't (part of) a circle?

We could try 'the tangent to \(y=f(x)\) at \((x,f(x))\) is the line that touches the graph at just the point \((x,f(x))\), without crossing it'. That works for circles, but now also includes some other well-known curves, including the parabola and hyperbola.


So we have generalized the notion of tangent usefully, to include some new curves. But what about this?

It looks like what we want, but cuts the graph in more than one place. But somehow that looks all right: the 'bad' intersection is a long way away from the point where we want the tangent. Clearly being a tangent is a local propery, so we can fix this by saying 'the tangent to \(y=f(x)\) at \((x,f(x)\)) is line that touches the graph at just one point in the vicinity of \((x,f(x))\) without crossing it'.

But what about this?

The straight line certainly looks like a tangent, except that it crosses the graph at the point of tangency. There's no obvious way of retaining the notion of 'touching the graph without crossing it' here.

So eventually we accept that this is going to be hard to fix, and that something a little subtler is required.

The chord slope definition

Instead of thinking directly about the tangent to a point on the graph, we think of an approximation to it, and see how that can lead to a notion of tangent.

So although we want the tangent at \((x,f(x))\) we decide to look instead at the straight line joining two points on the graph: \((x,f(x))\) and \((x+h,f(x+h))\), where we insist that \(h \neq 0\).

The gradient is \[ \frac{f(x+h)-f(x)}{(x+h)-x} = \frac{f(x+h)-f(x)}{h} \] and it seems entirely reasonable (because it is) to say that the slope of the tangent is the limiting value of this as \(h\) gets closer and closer to \(0\). So the tangent to \(y=f(x)\) at \((x,f(x))\) is the line through \((x,f(x))\) with this gradient. At first exposure, we might not go into too many details of just what we mean by the limiting value, sticking with examples where no obvious complications arise.

This is a meaningful definition. Using it we can calculate the derivative of a variety of simple functions, such as powers of \(x\) and, with the help of a little geometric ingenuity, the trig functions \(\sin\) and \(\cos\).

It's also a sensible definition, because it really does seem to do the job we want to do. It doesn't make sense to say it's 'correct', because this defines the notion. But it does have the properties we'd hope for, and it's something we can calculate with. It's worth noticing that in the process we have shifted from defining the derivative in terms of the tangent to defining the tangent in terms of the derivative!

I claimt that even so, it isn't the right way to think about the derivative.

The formal limit definition

After spending some time with the chord slope definition, and informally working out some simple limits, it's usually time to give a more rigorous idea of what is going in here.

We then introduce the following incantation: \[ \lim_{x \to a}f(x) = L \] means \[ \forall \epsilon>0, \exists \delta>0 \text{ such that } 0 \lt |x-a| \lt \delta \Rightarrow |f(x)-L| \lt \epsilon \] At first sight, and especially for the novice, this is likely to cause a great deal of fear and trembling. It is a complex statement which makes very precise a simple idea. The idea is that we can make \(f(x)\) as close as we want to \(f(a)\) by making \(x\) as close as we have to to \(a\).

With this in hand, we can tighten up some of what is done with the chord slope idea by writing \[ f'(x)= \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} \] It isn't really anything new, it's just a more precise way of stating the previous idea.

There's also the minor variation that \(f\) has derivative \(f'(x)\) at \(x\) if \[ \lim_{h \to 0} \left| \frac{f(x+h)-f(x)}{h} - f'(x) \right| = 0 \]

Working with these, including showing that they are equivalent, gives some practice in understanding the formal statement of what is meant by a limit, and how that formal definition is used. But I also claim that this is not the right way to think about the derivative.

Best Linear Approximation

Once the derivative has been defined and calculated in some tractable cases, various theorems about derivatives are presented. Here's one that gets used a lot. \[ f(x+h) \approx f(x)+hf'(x) \] or rather, with some foresight, \[ f(x+h)-f(x) \approx f'(x)h \] Now, as it stands, this says less than you might think. As long as the function \(f\) is continuous at \(x\), then for very small \(h\), \(f(x)\) and \(f(x+h)\) are very close together, and \(f'(x)h\) is also very small, so the the approximation gets better and better as \(h\) approaches \(0\), no matter what value we use for \(f'(x)\). But this is just saying that both sides are approaching \(0\), and this obviously isn't all we mean by that approximate formula.

Let's tease it out by looking a little more carefully. Instead of having an approximate equality, we will have an exact equality which includes an error term. \[ f(x+h) - f(x) = f'(x)h+e(h) \] where \(e(h)\) is the error, which in general will depend on the size of \(h\). Now, it turns out that if \(f\) is differentiable at \(x\), with derivative \(f'(x)\), then as \(h\) gets closer and closer to \(0\), not only does \(e(h)\) get arbitrarily close to \(0\), but \(e(h)/h\) also gets arbitrarily small. When this happens I say that \(e(h)\) is suitably small.

In other words, for very small \(h\), the error is a very small fraction of \(h\), and we can make that fractional error as small as we want by choosing \(h\) small enough.

It's this behaviour of the error term that makes it all work. If we choose any other value, say \(K\) instead of \(f'(x)\) and try to use that instead, we have \[ f(x+h) - f(x) = Kh + E(h) \] where \(E(h)=(f'(x)-K)h +e(h)\). But for very small \(h\), we see that \[ \frac{E(h)}{h}=f'(x)-K + \frac{e(h)}{h} \] and we cannot make this small by making \(h\) small. So the special property of \(f'(x)\) is that \(f'(x)h\) is the best linear approximation to \(f(x+h)-f(x)\).

To make contact with the usual limit definition, we note that there is a best linear approximation if and only if the error term is suitably small.

I claim that this is the best way to think about the derivative.

What's so great about it?

The first thing to say is, it doesn't make it any easier to calculate a derivative. That's still just the same as it always was.

What's so great about it is that it gives insight into how derivatives behave. This is not to say that it makes the proofs of behaviour easier: but it does make the results easier to see and understand.

Differentiation rules

We learn the various rules for differentiating combinations of functions in our first calculus course: linearity, the product, quotient and chain rules. Let's see how this point of view does with them.

In each case, we'll think about two functions \(f\) and \(g\), and use \(e_f\) and \(e_g\) for the corresponding errors.


Suppose we have two real numbers \(\alpha\) and \(\beta\). Then \[ \begin{split} (\alpha f+\beta g)(x+h)-(\alpha f+\beta g)(x) &= \alpha f(x+h)+ \beta g(x+h) - \alpha f(x)-\beta g(x)\\ &=\alpha(f(x+h)-f(x))+\beta(g(x+h)-g(x))\\ &= \alpha (f'(x)h+e_f(h))+\beta(g'(x)h+e_g(h)\\ &=(\alpha f'(x)+\beta g'(x))h +\alpha e_f(h)+\beta e_g(h)\\ &=(\alpha f'(x)+\beta g'(x))h + e(h) \end{split} \] where \(e(h)\) is obviously suitable small.

So differentiation is linear.

Product rule

\[ \begin{split} (fg)(x+h)-(fg)(h) &= f(x+h)g(x+h)-f(x)g(x)\\ &= (f(x)+f'(x)h+e_f(x))g(x+g'(x)h+e_g(x))-f(x)g(x)\\ &= (f'(x)g(x)+f(x)g'(x))h +f'(x)he_g(x)+g'(x)he_f(x)\\ &= (f'(x)g(x)+f(x)g'(x))h + e(h) \end{split} \] where \(e(h)\) is still fairly obviously suitably small.

Quotient rule

Actually, I'll cheat slightly. Assuming that \(f(x) \neq 0\), we have \[ \begin{split} \frac{1}{f(x+h)} - \frac{1}{f(x)} &= \frac{1}{f(x)+f'(x)h+e_f(h}-\frac{1}{f(x)}\\ &=\frac{-f'(x)h-e_f(h)}{f(x)(f(x)+f'(x)h+e_f(h))}\\ &= -\frac{f'(x)}{f(x)(f(x)+f'(x)h+e_f(h))} - \frac{e_f(h)}{f(x)(f(x)+f'(x)h+e_f(h))} \end{split} \] which is less obviously, but quite plausibly \[ -\frac{f'(x)}{f(x)^2}h + e(h) \] where \(e(h)\) is suitably small.

Then the quotient rule is just the product rule combined with this one.

Chain rule

And finally, we have \[ \begin{split} f(g(x+h))-f(g(x)) &= f(g(x)+g'(x)h+e_g(h))-f(g(x))\\ &= f(g(x))+f'(g(x))(g'(x)h+e_g(h)) -f(g(x))\\ &= f'(g(x))g'(x)h+e(h)) \end{split} \] where again it is not immediately obvious, but is quite plausible that \(e(h)\) is suitably small.

Insight, not proof

So we can see that in each case, assuming that the behaviour of error terms is not unreasonable, this idea of the derivative as the best linear approximation to the change in the function value gives us insight into why the various rules for differentiation work as they do.

But there's more

This works very well. The mental picture of a best linear approximation is much more meaningful than that of some limiting value, and the arguments involving it are rather more intuitive than those involving the usual limit definition.

But, as is often the way with a good idea, we get more than we seem to have paid for.

Functions of several variables

Suppose that now instead of having a function \(f:\mathbb{R} \to \mathbb{R}\) we have \(f:\mathbb{R}^n \to \mathbb {R}\). The picture is much the same except that \(x\) and \(h\) are now vectors with \(n\) components, and I will adopt the standard convention that they are represented as column vectors. So we can still say that \(f\) is differentiable at \(x\) with derivative \(f'(x)\) if this \(f'(x)\) has the property that \[ f(x+h)-f(x)=f'(x)h +e(h) \] where \(e(h)\) is suitably small.

But now we can think about what sort of thing this \(f'(x)\) actually is. \(h\) is a vector with \(n\) components, and so \(f'(x)\) must be a linear mapping from the space of these vectors to \(\mathbb{R}\), in other words it must be a row vector, or covector, with \(n\) components.

This is where we first see why I wrote \(f'(x)h\) rather than \(hf'(x)\) previously. When \(f\) was a real valued function of a real variable, it made no difference, but now the order is significant, and is determined by the requirement that we end up with a scalar quantity.

The next obvious question is, what are these components of \(f'(x)\)? We can answer this by choosing \(h\) carefully. To pick out components of vectors, I'll use a superscript for a component of a column vector and a subscript for a component of a row vector. Then we have \[ f(x+h)-f(x) = f'(x)h +e(h) = f'(x)_i h^i +e(h) \] which tells us that \[ f'(x)_i = \frac{\partial f}{\partial x^i}(x) \]

So our definition of 'derivative' naturally produces the object we learn to call the gradient of \(f\) in vector calculus (and maybe gives a little more explanation for the name).

In fact, it tells us more. It tells us that partial derivatives are really components of the best linear approximation, i.e. the actual derivative of the function. The partial derivatives are not best thought of as objects which we define as ordinary derivatives 'holding all but one variable constant' but as components of a (row) vector naturally associated with the function.

Again, thinking of the derivative as the best linear approximation gives us a deeper insight into a chunk of familiar calculus. As a bonus, it also gives us our first glimpse of the fact that the derivative of a function of this type is not a vector, but is a covector. This is a distinction that grows in importance as one generalizes from functions on \(\mathbb{R}^n\) to functions on a surface in \(\mathbb{R}^n\), and finally to functions on a manifold which may or may not be equipped with a (semi-)Riemannian metric. It is a gift which keeps on giving.

Tangent (velocity) vector

In the last example, we looked at functions which associate a value to each point of space. This time, we'll do the opposite, and think about functions which associate a point in space to each value of some parameter which we can think of as time. So we have functions of the form \(f:\mathbb{R} \to \mathbb{R}^n\), and use \(t\) for the argument of the function.

We can now play the same game and see where it takes us. \(f\) is differentiable with derivative \(f'(t)\) if \[ f(t+h)-f(t) = f'(t) h + e(h) \] where \(e(h\) is suitably small.

This time, we see that \(h\) is just a number, and so \(f'(t)\) really is a vector with \(n\) components. And by looking at the components of \(f(t)\), we see that \[ f'(t)^i = \frac{d f^i}{dt}(t) \] and so the derivative of \(f\) really is the velocity of a point whose position is given by as a function of \(t\) by \(f(t)\).

It's useful to note that the character of \(f'\), i.e. whether it is a scalar quantity, or a vector, or a covector, is entirely determined by the dimensions of the domain and codomain of \(f\). We don't choose whether to make a vector or a covector out of the derivatives of \(f\); that is decided by the fact that \(f'\) is the best linear approximation.

The whole hog

Let's combine the two ideas up above, and consider \(f:\mathbb{R}^n \to \mathbb{R}^m\), so now \(f(x)\) is a vector with \(m\) components and \(x\) is a vector with \(n\) components. As always, \(f\) is differentiable with derivative \(f'(x)\) if \[ f(x+h)-f(x) = f'(x)h + e(h) \] where \(e(h)\) is suitably small.

Now, \(f'(x)\) is a linear transformation which takes a vector with \(n\) components and produces a vector with \(m\) components. In other words, it is an \(m \times n\) matrix. Using my previous notational device of labelling rows with superscripts and columns with subscripts, we now have \[ f'(x)^i_j = \frac{\partial f^i}{\partial x^j} \] and the hidden agenda behind the placement of the indices now becomes slightly exposed. (But trust me, there is plenty still lurking behind the tapestry.)

So the power of thinking of the derivative as the best linear approximation should now be rather clearer: we can't always use the chord-slope definition, though we can generally pick an aspect of the set-up where it can be used to define, say, partial derivatives. But the best linear approximation makes sense in great generality, and the familiar objects built out of partial or component derivatives can now be seen as the natural objects, and the partial or component derivatives as analogous to the components of vectors.


In general, we can't multiply, divide, or compose functions. But whenever we can, the rules for differentiating combinations work in just the same way. In fact, if you revisit the (plausibility) argument for each rule, you can see that all that matters is that the algebraic operation makes sense. Although they were written with functions of the form \(\mathbb{R} \to \mathbb{R}\) in mind, in fact that is irrelevant.

Probably the most powerful aspect of this comes from the chain rule. It tells us that the derivative of the composition is the product (in general, a matrix product) of the derivatives of the functions. A beautiful consequenc is that if \(f,g:\mathbb{R}^n \to \mathbb{R}^n\) are inverses of each other, so are the derivatives. So the partial derivatives of \(f\) are not the inverses of the partial derivatives of \(g\) (a standard error of students in advanced calculus courses), but the matrices built out of the partial derivatives are in inverse relation to one another.

Algebra for and from calculus

So we are just beginning to see that what we know about linear algebra (vector spaces, dual vector spaces, linear transformations) is relevant to calculus. Since the derivative is the best linear approximation, and derivatives of combinations of functions are linear algebraic combinations of the derivatives, we can use the apparatus of linear algebra to tell us about the (local) behaviour of non-linear functions.

In some ways, this is the entire point of differential calculus: it is a tool for turning the difficult analysis of nonlinear functions into the relatively easy analysis of linear ones. The identification of the derivative as the best linear transformation is what makes this all work.

This all extends to even more general situations. The spaces involved don't actually have to be finite dimensional, as long as there is a way of talking about the size of the vectors.

It leads on to, for example, ways of considering fluid flow, where the velocity field of the fluid can be thought of as the tangent velocity to a curve in an infinite dimensional space of fluid configurations, and of thinking about the Euler-Lagrange equations in the calculus of variations as the derivative of a functional, to name just two.

Simply the best?

This gives some of the reasons I think that the right way to think of the derivative is as the best linear approximation. It both gives insight into what the rules for derivatives of combinations of functions are actually saying, and makes the various generalizations easier to understand. It lets us see how we can use linear algebra to understand aspects of the behaviour of nonlinear functions. It generalizes still further, in ways that I have just hinted at.

But it isn't the right way for everybody. I don't think it's the right way for it to be introduced to novices, or indeed for novices to try to think about it. It really makes sense best when you already have a bunch of different-looking bits of differentiation at hand, which seem to be myusteriously linked together, when it provides a unifying framework to remove the mystery. Then it lets things drop into place together, and you can see how they're all really different aspects of the same thing. But you do have to go through a certain amount of getting to grips with the more traditional approaches in order to have a mental place for this to live.

Wednesday, June 28, 2017

Finding greatest common divisors and all that.

Greatest Common Divisor

The problem is simple: given two positive integers, \(a\) and \(b\), how do we find \(\gcd(a,b)\), the greatest common divisor of \(a\) and \(b\) (i.e. the largest integer that \(a\) and \(b\) are both multiples of)?

Just about everybody learns how to do this at school. You break each of \(a\) and \(b\) down into a product of prime factors (where a prime number is one with exactly two factors, itself and \(1\)) and then multiply together all the common prime factors to get the biggest number that each is a multiple of. But this actually relies on quite a deep result. The procedure works because The Fundamental Theorem of Arithmetic says that any positive integer can be expressed as a product of primes, and it can only be done in one way (except for the order of the factors).

So, there are two issues here.

The first is in plain sight. It is the question of why you should believe the fundamental theorem of arithmetic. This might not seem like an issue. In fact, after a lot of experience in factorizing integers you may well feel that it is pretty obvious. But it isn't, and part of the intention of this post is to persuade you that it isn't.

The other is hidden, but comes into view when the numbers \(a\) and \(b\) get larger. It is the problem that working out the prime factorization of a number is really quite hard when the number is large.

There is, fortunately, an answer for both of these: Euclid's algorithm gives us a different way to compute the greatest common divisor of two integers, and, as a bonus, we can use it to prove that every integer has a unique prime factorization. This algorithm is to be found in Euclid's Elements, and is probably the oldest non-trivial algorithm in significant current use, being a vital ingredient in the RSA public key cryptosystem.

Euclid's Algorithm

The algorithm relies on a familiar fact from basic arithmetic, which goes by the appallingly inappropriate name of the division algorithm. It isn't an algorithm: an algorithm is a set of step-by-step instructions which tell you how to do something. This just tells you that a certain type of question has a particular type of answer. A couple of sources call it the division theorem because of this, but, alas, the terminology is very well established. I would rather be right than popular, though, so I will call it the division theorem.

But enough whining. What does this result say?

The Division Theorem If \(a\) and \(b\) are positive integers, then there is a unique positive integer \(q\), and a unique non-negative integer \(r \lt b\) such that \(a=qb+r\).

In other words, if you divide \(a\) by \(b\) you get a quotient and remainder, and there's only one right answer. This is something so entrenched in our experience that we take it very much for granted, and that's just what I'm going to do. But it has consequences.

If the remainder, \(r\), is zero, then \(a\) is a multiple of \(b\): we also say that \(b\) divides \(a\).

Back to the original question: I have two positive integers, \(a\) and \(b\), and I want to know the biggest number that divides into both of them. Let's suppose that \(a \gt b\). (If not, I can swap them over). Now, by the division theorem, I know that I can write \(a\) as \(qb+r\), where \(0\le r \lt b\). What's more, if \(d\) is any number that divides \(a\) and \(b\), since \(a-qb=r\), \(d\) divides \(r\). And since \(a=qb+r\), if \(d\) divides into \(b\) and \(r\), it also divides \(a\). That means that \(b\) and \(r\) have exactly the same common divisors as \(a\) and \(b\).

But something really useful has happened. I've replaced the problem of finding the greatest common divisor of \(a\) and \(b\) by finding the greatest common divisor of \(b\) and \(r\), and \(r\) is smaller than \(b\), which is smaller than \(a\). I've made the problem easier.

There are two possibilities: if \(r=0\), then \(b\) divides into \(a\), so it is the greatest common divisor of \(a\) and \(b\). If not, I can repeat the trick. Eventually the remainder will be \(0\), and I will have the greatest common divisor of my original \(a\) and \(b\).

This procedure is the essence of Euclid's algorithm. Let's see how it works with an example.

What is \(\gcd(60,25)\)? We calculate as follows: \[ \begin{split} 60 &= 2 \times 25 + 10\\ 25 &= 2 \times 10 + 5\\ 10 &= 2 \times 5 \end{split} \] This tells us that \(\gcd(60,25)=5\). But we get something not at all obvious from it. We can rewind the calculation like this: \[ \begin{split} 5 &= 25 - 2 \times 10\\ &= 25-2 \times (60-2 \times 25)\\ &= 5 \times 25 - 2 \times 60 \end{split} \] This tells us that we can express \(5\) as a combination of \(60\) and \(25\). But there's nothing special about these numbers: I could have rewound any appliation of Euclid's algorithm in just the same way.

So, given any \(a\) and \(b\), we can use Euclid's algorithm to find \(\gcd(a,b)\), and also to find integers \(m,n\) with the property that \(\gcd(a,b)=ma+nb\). This wasn't at all obvious how to do from the prime factorization approach. In fact, with the prime factorization approach it would never have occurred to me that this could even be done.

Thinking about this approach to finding the gcd of two numbers, we can see that \(\gcd(a,b)\) isn't just the biggest number that divides into both \(a\) and \(b\): it is actually a multiple of any other divisor. The reasoning is just what was used above: any divisor of \(a\) and \(b\) is also a divisor of \(r\), the remainder. Repeating this, we eventually find that it must be a divisor of the last non-zero remainder, i.e. it must be a divisor of \(\gcd(a,b)\).

You might wonder who cares? I think there are two immediate reasons to care, namely that this addresses both of the issues I raised earlier. The first reason is that you don't have to take it on trust that prime factorization works as advertised, since you can see just why this works. The second is more practical. It is that for large numbers, this algorithm is very fast: enormously fast compared to the best known algorithms for factorizing large numbers, which is important. It is the fact that factorizing large integers is (as far as we know) expensive while finding the greatest common divisor is cheap that makes RSA public key cryptography work.

There are other, less immediate reasons too. Using this, we can actually prove that prime factorization works as advertised. But before getting there, we need to see something useful that this tells us about prime numbers.

Prime numbers

Suppose that \(p\) is a prime number, and \(a\) is any number which is not a multiple of \(p\). Then we must have \(\gcd(a,p)=1\), which tells us that there are integers \(m,n\) such that \(ma+np=1\). Now comes the clever bit.

Let \(p\) be a prime number, and suppose that \(a\) and \(b\) are integers such that \(ab\) is a multiple of \(p\). Then either \(a\) or \(b\) must be a multiple of \(p\).

"But that's obvious!" you may be tempted to object. But if you think it's obvious, think about why: it's almost certainly because you're so accustomed to how prime factorization works. Unfortunately that's exactly the thing that I want to prove.

So, let's suppose that \(ab\) is a multiple of \(p\). If \(a\) is a multiple of \(p\), we're done. But what if \(a\) is not a multiple of \(p\)? Well, in that case \(\gcd(a,p)=1\), so there are integers \(m,n\) such that \(ma+np=1\). Now multiply this equation by \(b\). We get \[ b=mab+npb. \] But \(ab\) is a multiple of \(p\) and \(npb\) is a multiple of \(p\), so their sum - which is \(b\) - must also be a multiple of \(p\). That's what we needed.

Now we have everything we need to prove the

Fundamental Theorem of Arithmetic

At last, the proof

We can now consider an integer, \(n\). There are two possibilities: either \(n\) is a prime, or \(n\) is the product of two smaller factors. Then we think about each of them in turn. Eventually, we must reach a point where each of our factors is a prime.

But this is only part of the problem, and the easy part at that. How do we know that there is only one way to express \(n\) as a product of primes?

This is where we use that last fact. Suppose we have two prime factorizations of \(n\), say \[ n=p_1p_2 \ldots p_k = q_1q_2 \ldots q_l \] Then \(p_1\) is a factor of \(q_1q_2 \ldots q_l\). If \(p_1\) is not a factor of \(q_1\) (in which case \(p_1=q_1\)), then \(p_1\) is a factor of \(q_2\ldots q_l\), and by repeating this argument we find that \(p_1\) must be one of the \(q_i\). Then divide \(n\) by \(p_1\) and repeat the argument. Eventually we find that each of the prime factors in either factorization must also be a factor in the other factorization, so the two are actually the same (except possibly in a different order).

What is a prime number again?

I rushed past this idea up at the top, with the brief statement that a prime number is one with exactly two factors, itself and \(1\). This isn't quite satisfactory if we allow negative numbers, so let's be a little more general than that, and try to give a definition which will work in more generality.

We say that a number is a unit if it has a multiplicative inverse, and then that a factorization is proper if no factor is a unit. Then a prime is a number with no proper factorization. So this works when we allow negative numbers, since although we can write \(3=1\times 3 = -1 \times -3\), in each case one of the factors is a unit.

But this property wasn't really what we used to prove the fundamental theorem of arithmetic. We used this to show that any integer has a prime factorization, but not to show that it was unique. For that part of the argument, the important fact was that if a product is a multiple of a prime, then at least one of the factors in the product must be a multiple of the prime.

In fact there two ways of characterizing prime numbers.

One characterization is that \(p\) is prime if it has no proper factorization.

The other is that \(p\) is prime if, whenever \(ab\) is a multiple of \(p\) then at least one of \(a\) and \(b\) must be a multiple of \(p\). We saw above that every prime has this property: we just have to make sure that any number with this property is prime. But if \(n\) has a proper factorization into \(ab\) then \(n\) divides into \(ab\), but since \(n\) is larger than both \(a\) and \(b\) neither is a multiple of \(n\), so no number with a proper factorization can satisfy this property.

But it doesn't have to be like that.

Another number system

The integers aren't the only number system we could work with. For example, we might consider the Gaussian integers \(\mathbb{Z}[i]\), i.e. the complex numbers whose real and imaginar parts are both integers. More interestingly, we could, and we will, do something rather more exotic. We consider numbers of the form \(m+n\sqrt{-5}\), where \(m\) and \(n\) are both integers, and call this collection \(\mathbb{Z}[\sqrt{-5}]\). Note that as for the integers, the units here are just \(\pm 1\).

We now get something happening which cannot happen with the integers.

First, notice that \(|m+n\sqrt{-5}| = \sqrt{m^2+5n^2}\), and since these objects are complex numbers, if \(M,N \in \mathbb{Z}[\sqrt{-5}]\), \(|MN|=|M||N|\). Then in this number system, just as in \(\mathbb{Z}\), neither \(2\) nor \(3\) has a proper factorization; and neither do \(1 \pm \sqrt{-5}\). You can check this in each by writing down all the numbers with modulus less than the target value, and checking that you can't get the target by multiplying any collection of them together (unless one is a unit). But \[ 2 \times 3 = (1+\sqrt{-5})(1-\sqrt{-5}) = 6. \] So in this number system we see that \(1+\sqrt{-5}\) is a factor of \(2 \times 3\) but it is not a factor or either \(2\) or \(3\).

So in this case we do not get a unique factorization. It can't really be obvious that there is only one way to factorize a number into a product of terms which can't be factorized any further, because there are number systems where it doesn't happen!

Prime versus irreducible

In this more general context, it is necessary to split up our two notions of prime, because they no longer coincide. The standard way to do this is to call a number with no proper factorization an irreducible number, and to define a prime to be a number which has the property that if it is a factor of \(ab\) then it must be a factor of at least one of \(a\) or \(b\). (This is why on your first encounter with abstract algebra, the definition of prime can seem to have nothing to do with the familiar prime numbers.)

Then any prime number must be irreducible, but a number can be irreducible without being prime. For each number to have a unique prime factorization, it must also be the case that every irreducible number is prime.

Moving on

This may have already been more than enough for you. But if not, there are lots of extenstions to play with here.

We could play the game again with the Gaussian integers, and it turns out that again there is a unique factorization into irreducibles. And we could consider variations, such as numbers of the form \(m+n\sqrt{-k}\) where \(k\) is a positive integer. How does the choice of \(k\) affect the behaviour? We already know that when \(k=5\), unique factorization fails. But is this typical, or does it happen for other vaues? Is there a property of \(k\) that we can calculate that tells us?

We could try with polynomials with real coefficients: it turns out that division of polynomials is similar enough to division of integers that we have a division theorem and a Euclidean algorithm, that prime and irreducible coincide, and that any polynomial can be expressed uniquely as a product of primes (which here means linear factors or quadratics with no roots). We could ask how things change if we take different types of coefficients, say complex, or rational, or integer.

These two alone give a large toy box to play with and explore these ideas, but they're only a start. Explore, investigate, make up your own variations. If all else fails, look in a book on abstract algebra or ask the internet.

Calculating gcd redux

So, after all that, how should you actually calculate the gcd of two integers?

The answer is, as it almost always is, "it depends". If the numbers are small enough to be factorized easily, using the prime factorization is probably the quicker of the two. If they are large enough that factorization is a pain, then Euclid's algorithm will do the job with less pain.

And I think it's worth calculating a few both ways just to see how the same result is arrived at by such very different routes.

Tuesday, June 20, 2017

When i grows up...

...\(i\) wants to be a quaternion.

We can think of the complex numbers as two dimensional vectors, with a rule for multiplication. This multiplication rule is actually very well behaved, and for some time mathematicians tried and failed to find a similarly well behaved multiplication rule for three dimensional vectors. In fact the failure was not due to any lack of ability: it's impossible to find such a well-behaved multiplication rule for these vectors. Then William Rowan Hamilton had a flash of inspiration which gave him the insight that although a multiplication rule couldn't be found for three dimensional vectors, it could be done for four dimensional vectors. He was so excited by the realization that he carved the basic algebraic identities into the stonework of the bridge he was crossing at that time. The structure resulting from the multiplication rule he discovered is the quaternions.

Hamilton then devoted an immense amount of time and effort developing the mathematics of these quaternions and their physical applications. Unfortunately, he was far less effective a communicator of mathematics then he was an inventor (or discoverer) and quaternions were not adopted by the mathematical community.

So what is so nice about the complex number multiplication, what are the quaternions, and how well do the nice properties of the complex numbers generalize to them?

Geometry of complex multiplication

Multiplication of complex numbers is very rich geometrically. Probably the simplest observation we can make has to do with multiplication by \(i\). If \(z=a+ib\), so that \(z\) corresponds to the point \((a,b)\) in the plane, then \(iz=-b+ia\), which corresponds to the point \((-b,a)\): so multiplication by \(i\) has rotated this point by a right angle about the origin.

This is pretty, and suggests that a closer look is warranted, so let's take one.

The complex conjugate of \(z\) is \(\overline{z}=a-ib\), and the modulus of \(z\) is \[ |z| = \sqrt{z\overline{z}} = \sqrt{a^2+b^2}. \] Thinking of \(z\) as the point \((a,b)\), \(|z|\) is the distance to the origin, so it tells us the size of \(z\).

We also see that \(1/z=\overline{z}/|z|^2\) so that every complex number apart from zero has a multiplicative inverse. An excellent start. But complex multiplication has a more remarkable property still.

If \(z=a+ib\) and \(w=c+id\), then \[ \begin{split} |zw|^2&=|(ac-bd)+i(ad+bc)|^2\\ &=(ac-bd)^2+(ad+bc)^2\\ &=(ac)^2-2abcd+(bd)^2+(ad)^2+2abcd+(bc)^2\\ &=a^2c^2+b^2d^2+a^2d^2+b^2c^2\\ &=(a^2+b^2)(c^2+d^2)\\ &=|z|^2|w|^2 \end{split} \] so that \(|zw|=|z||w|\), so the size of the product is the product of the sizes.

A special collection of complex numbers are the ones satisfying \(|z|=1\), so they form the unit circle in the complex plane. The point on the circle with polar coordinates \((1,\theta)\) has Cartesian coordinates \((\cos(\theta),\sin(\theta)\)), and expressed as a complex number is \(\cos(\theta)+i\sin(\theta)\).

We can do a little more with these special complex numbers.

If \(z\) and \(w\) both have modulus \(1\), then so do \(1/z=\overline{z}\) and \(zw\), so the complex numbers of unit modulus are closed under multiplication and multiplicative inverse: when a collection of quantities has this properties we can it a group, and this particular group goes by the name \(U(1)\).

We can put this together with in a very pretty way.

If \(z,w \in U(1)\) with \(z=\cos(\theta)+i\sin(\theta)\) and \(w= \cos(\phi)+i\sin(\phi)\), the \(zw=\cos(\theta+\phi)+i\sin(\theta+\phi)\). Depending on how you define things, this either follows from or gives a proof of the standard trigonometry addition formulae.

There's also a useful special case, already mentioned above. If \(|z|=1\), so \(z=\cos(\theta)+i\sin(\theta)\), then \(iz=-\sin(\theta)+i\cos(\theta)\), which associates with each point on the unit circle a unit vector pointing tangent to the circle in the anti-clockwise direction.

Three dimensional vectors

This is all so neat and elegant that a great deal of effort was invested into trying to do the same thing with three component vectors. However, it couldn't be done. There's no way of multiplying three dimensional vectors to give a three dimensional vector with all these nice properties.

All is not actually lost: it is possible to multiply three vectors in a useful way, but you have to be prepared to extend what you mean by multiplication. The result is geometric algebra, and it has many attractions. I'll come back to this later on, but for the moment let's just accept that you just can't define a nice multiplication on three dimensional vectors, and go on to see how you can do it with four dimensional vectors.

In fact we will see that just about everything described above for the complex numbers has an analogue in the quaternions. (Though sometimes in a surprising way.)


So, what is a quaternion?

One way of describing them is that a quaternion, \(q\), is a sum of the form \[ q=a+bi+cj+dk \] where \(i,j\) and \(k\) are a new kind of number, like the usual square root of minus one, but related in a particular way. We have \[ i^2=j^2=k^2 = ijk = -1 \] and then we carry out arithmetic by multiplying out the expressions and using these relationships. It's like complex arithmetic but with more component parts (and a more complicated set of rules for how these different square roots of minus one interact).

Just as with the complex numbers, we can think of these objects as vectors: but now they have four components, a real one and three imaginary ones.

We can also define a quaternionic conjugate, \[ \overline{q} = a-bi-cj-dk \] and then we quickly get \[ q\overline{q}=a^2+b^2+c^2+d^2=|q|^2 \] which gives the modulus, or size, of the quaterion, and \[ q^{-1} = \frac{\overline{q}}{|q|^2} \] is the inverse of \(q\).

There's a lot packed up into that.

The complex numbers are hiding in there

The quaternions contain the complex numbers. This is easy to see, because it's obvious that if you set \(c=0=d\) then what is left is just the arithmetic of complex numbers.

What's slightly less obvious is that the complex numbers sit in there in more than one way. If you set \(b=0=d\), you get exactly the same thing except that the square root of minus one is now \(j\) (which should please the electrical engineers), and if you set \(b=0=c\) it is now \(k\).

In fact, you can think of the quaternions as the result of building complex numbers out of complex numbers: if \(z\) and \(w\) are both complex numbers, than \(z+wj\) is a quaternion, as long as \(i,j\) and \(k\) satisfy the rules up above.

Vector algebra is hiding in there

The rules for multiplying \(i,j\) and \(k\) have some very evocative consequences.

If I multiply \(ijk=-1\) by \(i\) I get \(i^2jk=-i\), so that \(jk=i\), and similarly \(ij=k\), \(ki=j\).

If I square \(ij=k\) I get \(ijij=-1\), so multiplying on the left by \(i\) and on the right by \(j\) I get \(ji=-ij\), and similarly \(ik=-ki\), \(jk=-kj\).

The relations look alot like what happens with the usual unit vectors \(\pmb{i,j,k}\) pointing in the \(x,y,z\) directions in three dimensional vector geometry. The square of each unit vector is like the dot product (except with the wrong sign), and the product of two different ones is just like the cross product.

In fact, this is such an important observation that I'll actually think of \(i,j\) and \(k\) as those unit vectors, and regard the purely imaginary quaterions as vectors in three dimensions, with the rules I gave up above for multiplication.

So, what do I get if I have two purely imaginary quaternions (or two three dimensional vectors) and multiply them together using the quaternion rule? If you haven't seen this before, it really is quite amazing. So let's start with \(q_1=a_1 i+b_1 j+c_1 k\) and \(q_2=a_2 i+b_2 j+c_2 k\). Then \[ \begin{split} q_1 q_2 &= (a_1 i+b_1 j+c_1 k)(a_2 i+b_2 j+c_2 k)\\ &= a_1a_2i^2 + b_1b_2 j^2 + c_1c_2 k^2 + a_1b_2 ij +a_1c_2ik + b_1a_2ji + b_1c_2jk+ c_1a_2 ki + c_1 b_2 kj\\ &= -(a_1b_1+a_2b_2+a_3c_3) +(b_1c_2-c_cb_2)i + (c_1a_2-a_1c_2)j+(a_1b_2-b_1a_2)k \end{split} \] and if we think of \(q_1\) and \(q_2\) as vectors, then \[ q_1 q_2 = -q_1.q_2 + q_1 \times q_2 \]


I haven't mentioned this explicitly, but we do have to be a bit careful. There is a price to pay for extending the multiplication of two dimensional vectors to that of four dimensional vectors, and it is that the resulting multiplication is not commutative, so we have to pay attention to the order that they come in when multiplying: in general, \(q_1q_2 \neq q_2 q_1\).

So, for example, if we have two quaternions \(q\) and \(r\), and we want to solve the equation \[ qs=r \] for the unknown quaternion \(s\), we have to calculate \(q^{-1}r\), not \(r q^{-1}\) which is (usually) different.

But this is OK. It's no worse that the situation with matrices, and we all have to learn to cope with that. In fact, one can represent quaternions as \((4 \times 4)\) matrices with their entries made out of the real and imaginary parts so that the matrix multiplication matches the quaternion multiplication. An interesting challenge is to try to figure out how to do that, and the representation of the complex numbers as \((2 \times 2)\) matrices gives a good starting point.

What have the quaternions ever done for us?

OK, so we can invent a quaternion algebra, and it looks as if it has a fair bit of the usual vector algebra coded into it somehow. But what is it good for?

I think a piece of maths is worthwhile if it can be used to solve problems, or if it gives a new insight into something, or if it's just downright pretty. Quaternions meet all three criteria. In fact Hamilton spent a long time writing a multi-volume treatise on the quaternions and their applications. For good or ill, he didn't succeed in persuading the mainstream community of their utility, though they do have their proponents even now, and are heavily used in a few niche areas.

I am going to take a look at just two aspects of the quaternions. The first is an application, involving using them to represent rotations. The second is a bit of pure geometry, involving the sphere in four dimensions.


The unit complex numbers can be used to describe rotations in the plane, so you might expect, or at least hope, that unit quaternions (i.e quaternions of modulus \(1\)) can be used to describe rotations in space.

They can, and what's more it can be done in a very beautiful way.

You might expect (I certainly did) that it would work like this. Given a point in three dimensional space, with coordinates \((x,y,z)\), I think of this as the purely imaginary quaternion \(X=xi+yj+zk\), then if \(q\) is any quaternion with \(|q|=1\), we see that \(|qX|=|q||X|=|X|\), so this preserves the size of \(X\) and it looks quite promising. But it's quite hard to see just which rotation (if any) corresponds to a given \(q\),and whether this gives all rotations. Fortunately, there's a better way.

First, let's think what determines a rotation: we need an axis of rotation, and an angle of rotation. Let's say that the axis of rotation is given by the vector \(n=n_xi+n_yj+n_zk\) where \(|n|=1\), and the angle of rotation is \(\theta\). Then we build the quaternion \[ q(\theta) = \cos\left(\frac{\theta}{2}\right) + n \sin\left(\frac{\theta}{2}\right) \] Now, here comes the sneaky bit.

Take any point \((x,y,z)\) and represent it by the purely imaginary quaternion \(X=xi+yj+zk\). Then \[ qX\overline{q} \] is the result of rotating \(X\) about \(n\) by the angle \(\theta\).

That is a bold claim. You need a reason to believe it. So write \[ X=\alpha n + \beta m \] where \(m\) is orthogonal to \(n\). (You'll notice that I've completely given up distinguishing between vectors and purely imaginary quaternions.) Then a couple of lines of algebra shows that \[ \begin{split} qX\overline{q} &= \left(\cos\left(\frac{\theta}{2}\right) + n \sin\left(\frac{\theta}{2}\right)\right) (\alpha n + \beta m) \left(\cos\left(\frac{\theta}{2}\right) - n \sin\left(\frac{\theta}{2}\right)\right)\\ &= \alpha n + \beta(m\cos(\theta)+(n \times m)\sin(\theta) \end{split} \] which makes it explicit that the component of \(X\) parallel to \(n\) is unchanged, and the component perpendicular to \(n\) is rotated by the angle \(\theta\) in the plane determined by \(m\) and \(n \times m\), which is the plane perpendicular to \(n\), so \(X\) has been rotated about the axis of rotation \(n\), just as claimed above.

This is unreasonably nice. Not only is there a quaternion corresponding to any rotation, but we have an extremely simple recipe for building it out of the axis and angle of rotation.

There is, of course, more. Not only is this a pretty way to represent rotations, it is a very useful one. If you want to control a robot arm, or model how a three dimensional object moves under the influence of a force, or calculate what the view looks like from a different position then you need an efficient way to represent rotations. Quaternions give us an efficient way to do this, and for this reason are used in robotics, dynamical simulations of systems of molecules and computer graphics.

On top of all this, you can use this quaternion representation to obtain more insight into the structure of \(SO(3)\), the group of three dimensional rotations. I won't develop that idea here, but just leave you with the observation that if \(q\) is a unit quaternion, then \(-q\) gives the same rotation as \(q\), so there's something interesting going on.

I've just given a very brief description of how to get rotations out of the quaternions. For much more, from a rather different perspective, @mathforge gives a development of the quaternions from rotations here, including an implementation.

The 3 sphere

I described how the unit complex numbers give us a way of finding a vector tangent to each point in a circle in the two dimensional plane. We can use the unit quaternions to see a nice bit of geometry in four dimensions. Now, the unit quaternions are defined by the squares of their components adding up to \(1\), so they are the points a distance \(1\) from the origin in four dimensional space. This is an object sometimes called a hypersphere, but which I will call the three-sphere, and denote by \(S^3\).

So if we take a point on \(S^3\), and multiply it by \(i\), what do we get? \[ \begin{split} X&=a+bi+cj+dk\\ iX&=-b+ai-dj+ck \end{split} \] so we see that \(iX.X=-ab+ba-cd+dc=0\). In other words, \(iX\) is orthogonal to \(X\). But that means that \(iX\) is a tangent vector to \(S^3\) at \(X\). (This is just the same as in two and three dimensions: the tangent space to a circle or a sphere at a point is given by all the vectors orthogonal to the position vector at that point.)

This means that by this process we can obtain a unit tangent vector at each point on \(S^3\), which clearly varies continuously over the sphere, since small changes in \(X\) give small changes in \(iX\).

We can now extract a rather indirect way of seeing why we can't have a multiplication on three dimensional vectors which has all the nice properties of complex and quaternionic multiplication. If there were, we could use this to build a vector field on the surface of a sphere in three dimensions: but the hairy ball theorem tells us that this is impossible, so there can't be a multiplication of three vectors which behaves like this.

You can start from here and start asking questions such as: Which surfaces admit a non-vanishing vector field? How many linearly independent vector fields can there be? What kind of things go wrong when there can't be one? This leads into worlds of differential and algebraic topology, which is maybe a step or several too far for this article. Instead, I'll return to a couple of topics raised up above.

But what about...

Geometric algebra

An alternative approach to multiplying vectors is given by geometric algebra. For three-dimensional vectors, this works in the following way. \[ i^2=j^2=k^2=1, \qquad ij=-ji \text{ etc} \] and \(ij,jk,ki,ijk\) are new objects. With this in place, we can multiply vectors together, at the expense of everything taking place in an eight-dimensional space.

On the one hand, this gives a powerful tool for calculating with vectors, as well demonstrated in Jason Merrill's blog post and it is well-behaved in that every vector has a multiplicative inverse. On the other hand, it is not completely well-behaved algebraically. If \(\pmb{n}\) is a unit vector, than \((1+\pmb{n})(1-\pmb{n})=1-1=0\), so non-zero quantities can multiply together to give zero.

Of course, neither geometric algebra nor quaternion algebra is right or wrong. Each gives a useful tool for working with vectors. I find the quaternions more aesthetically satisfying, but if I had to do a calculation I'd use whichever was the most convenient. (Assuming I could tell in advance, of course.)

Higher dimensions

So we've seen how to think of a pair of real numbers as a complex number; and how to think of a pair of complex numbers as a quaternion, though the result is no longer commutative. Can the trick be repeated? Can we take pairs of quaternions and get a multiplication on eight dimensional vectors?

Yes, we can, and the resulting objects are called octonions. This time the price we have to pay is that the multiplication is not associative. This is quite serious, as it means that we can't use matrices to represent these objects. Nevertheless, they give another interesting geometry which has been investigated both for its mathematical structure and in mathematical physics.

To finish with

I have to admit that hand calculation with quaternions can be a fairly tedious business. But you do get quite a lot of power in return, and I think they that we should all have at least a passing acquaintance with them.

There is, of course, a wealth of material available on the internet, and Google can be your friend if you ask it nicely. If you don't already know about it, the Internet Archive is an amazing resource where you can access many out of print texts, including Hamilton's books and many of his research papers. I'll just add one more link, to Pertti Lounesto's Clifford algebras and spinors. This is an exceptionally careful text, which provides a general framework where both quaternionic and geometric algebra live, and also demonstrates many applications to physics.