Thursday 25 May 2017

Ghosts of departed quantities

The problem with infinitesimals

In the early development of the calculus, both Newton and Leibniz made fruitful use of infinitesimal quantities, without really being able to give a satisfactory account of just what they were. Bishop Berkeley famously pointed out the incoherence of the notion, referring to them with the excellent barb "the ghosts of departed quantities". Over the next century or so, calculus developed apace, but it wasn't until much later that the likes of Weierstrass managed to sort out the idea of a limit properly, and laid the foundations for the well-known "\(\epsilon\text{-}\delta\)" approach to analysis that generations of mathematics undergraduates have since had to come to grips with. Once a detailed axiomatic presentation of the reals was developed, it was established to everybody's satisfaction that there really was no space for infinitesimal quantities among the real numbers, but that the \(\epsilon\text{-}\delta\) approach gave the appropriate language for understanding everything. That is, until Abraham Robinson found a way to resurrect the approach in the early 1960s, inventing nonstandard analysis, and providing a framework in which infinitesimal and infinite quantities could be consistently worked with.

What I want to do here is to give at least an idea of how it works, and how it finesses the fact that the real numbers are, indeed, uniquely determined by being a complete, ordered field. I'll try not to say anything outright untrue, though I'll gloss over a huge amount of detail.

So first, let's remember how the real numbers arise when we fill in the gaps between the rational numbers.

We say that a sequence of rational numbers \(q_n\) is a Cauchy sequence if the elements get closer together in a well-controlled manner: in particular, given any tolerance \(\epsilon \gt 0\), there is some \(N\) such that all elements of the sequence \(q_n\) for \(n>N\) are closer together than \(\epsilon\). For some such sequences, there will be a rational limit, but for others there won't be. So we decide that if a sequence looks as if it ought to converge but has no rational limit, it does converge after all, but to something which lives in the gaps between the rational numbers: an irrational number. If we also decide that two rational sequences \(q_n\) and \(Q_n\) converge to the same limit iff \(q_n-Q_n\) converges to \(0\), then we can think of \(q_n\) and \(Q_n\) as different representations of the same irrational number. This lets us fill in the gaps in a way which fits in nicely with the arithmetic of the rational numbers.
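To make the idea of a rational Cauchy sequence with no rational limit concrete, here is a minimal sketch in Python. The choice of sequence (Newton's iteration for \(\sqrt{2}\), starting from \(1\)) and the function name are my own illustrative assumptions, and of course a finite list of terms can only suggest the behaviour, not prove it.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Return the first `steps` terms of a rational sequence approximating sqrt(2)."""
    q = Fraction(1)
    terms = []
    for _ in range(steps):
        terms.append(q)
        q = (q + 2 / q) / 2  # Newton step; the result is still a rational number
    return terms

terms = newton_sqrt2(6)
# Gaps between successive terms: these shrink rapidly.
gaps = [abs(terms[i + 1] - terms[i]) for i in range(len(terms) - 1)]
# How far each q_n^2 is from 2: never zero, because every q_n is rational.
square_errors = [abs(q * q - 2) for q in terms]

for q, err in zip(terms, square_errors):
    print(q, float(err))
```

The terms crowd together, but no term squares to exactly \(2\): the sequence looks as if it ought to converge, and the thing it converges to lives in a gap between the rationals.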

The trick is to use sequences of the numbers we already have in order to define some new ones. A good trick can be used again, and this is no exception. We will make our extended real numbers, the nonstandard reals, from sequences of real numbers.

We can't use quite the same trick, though. Part of what the completion of the rational numbers to the real numbers does is to give us a number system where trying to apply exactly the same trick again doesn't give anything new: any Cauchy sequence of real numbers already converges to a real number. So we have to adapt the idea in a new way.

Constructing the nonstandard reals

The idea is to think of a sequence that tends to \(0\) as somehow representing an infinitesimal number, and a sequence that grows without bound as somehow representing an infinite number. The real trick is in the criterion that tells us when two sequences have the same limit, and so can be thought of as representing the same quantity.

The missing ingredient is a way of assigning a measure to any set of positive integers. We call it \(m\), and require it to have the following properties:
  1. \(m(\mathbb{N})=1\).
  2. If \(K \subset \mathbb{N}\) is finite, then \(m(K)=0\).
  3. If \(K \subseteq \mathbb{N}\) then \(m(K)=0\) or \(m(K)=1\).
  4. If \(K,L \subset \mathbb{N}\) such that \(K\cap L = \emptyset\) then \(m(K \cup L)=m(K)+m(L)\).
Then we say that any statement which holds for a set of integers of measure \(1\) holds almost everywhere. Using this, we can treat any sequence of real numbers \(a_n\) as having (more exactly, defining) a limit. If \(a_n\) has limit \(\alpha\) and \(b_n\) has limit \(\beta\), then
  • \(\alpha = \beta\) if \(a_n=b_n\) almost everywhere.
  • \(\alpha \lt \beta\) if \(a_n \lt b_n\) almost everywhere.
  • \(\alpha \gt \beta\) if \(a_n \gt b_n\) almost everywhere.
It follows from the above requirements on \(m\) that for any two sequences \(a_n\) and \(b_n\), exactly one of these is the case.

Arithmetic is carried out in the obvious way: in terms of the above sequences, \(\alpha+\beta\) is the limit of \(a_n+b_n\), \(\alpha \beta\) is the limit of \(a_n b_n\), and so on. This all works, in the sense that it doesn't matter which representative sequences you choose, you get the same result. We call this set of numbers the nonstandard reals, denoted \({}^*\mathbb{R}\).

With this in place, we say that an element of \({}^*\mathbb{R}\) is infinitesimal if it lies between \(-a\) and \(a\) for all positive real \(a\), is finite if it lies between \(-a\) and \(a\) for some positive real \(a\), and is infinite if it lies between \(-a\) and \(a\) for no positive real \(a\). Clearly, \(1/n\) gives an infinitesimal number, and \(n\) gives an infinite one. An infinite number arising from a sequence of integers is called an infinite integer, so in this second case we actually have an infinite integer.
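The reason \(1/n\) is "clearly" infinitesimal is that, for any positive real \(a\), the inequality \(1/n \lt a\) fails only on a finite set of indices; a finite set has measure \(0\), so by additivity its complement has measure \(1\). The following Python sketch lists that finite exceptional set for a rational \(a\). The function name is hypothetical, and restricting to rational \(a\) is an assumption made only to keep the arithmetic exact.

```python
import math
from fractions import Fraction

def exceptional_indices(a):
    """Positive integers n for which 1/n < a fails, for a positive rational a."""
    # 1/n >= a can only happen when n <= 1/a, so the search is finite.
    bound = math.floor(1 / a)
    return [n for n in range(1, bound + 1) if Fraction(1, n) >= a]

print(exceptional_indices(Fraction(3, 10)))
print(len(exceptional_indices(Fraction(1, 100))))
```

However small \(a\) is, the list is finite, so \(1/n \lt a\) holds almost everywhere in the sense used above.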

Then we get the following consequences: all look reasonable, though some take a little more proving than others.
  1. \({}^*\mathbb{R}\) is an ordered field.
  2. The normal real numbers (which we can now call standard real numbers) \(\mathbb{R}\) live inside \({}^*\mathbb{R}\). If \(a \in \mathbb{R}\), then the constant sequence \(a,a,a,\ldots\) represents \(a\) in \({}^*\mathbb{R}\).
  3. Any finite nonstandard real \(x\) is the sum of a standard real number, denoted \(\mbox{st}(x)\) plus an infinitesimal. We call \(\mbox{st}(x)\) the standard part of the nonstandard real number.
  4. If \(\epsilon\) is infinitesimal, then \(1/\epsilon\) is infinite, and vice-versa.
  5. The product of a (non-zero) finite number and an infinitesimal is infinitesimal.
  6. A function \(f:\mathbb{R}\to\mathbb{R}\) naturally determines \({}^*f:{}^*\mathbb{R} \to {}^*\mathbb{R}\): if \(\alpha \in {}^*\mathbb{R}\) corresponds to the sequence \(a_n\), then \({}^*f(\alpha)\) corresponds to \(f(a_n)\).
But in addition to all this, we have a much less obvious, but very powerful result.

The transfer principle: Any statement which does not involve the notion of standard part is equally true in \(\mathbb{R}\) and in \({}^*\mathbb{R}\).

In other words, every theorem about the standard real numbers has an analogue in the nonstandard real numbers. This is subtler than it might seem, and I'll return to it later.

So, we have a way of extending the standard real number system to include infinitesimal (smaller than any positive real) and infinite (larger than any positive real) quantities. Why would we bother with this? In other words...

What do we get for our money?

We can (and should) think of working with infinitesimals as a way of working with sequences that tend to zero, and working with infinite numbers as a way of working with sequences that diverge to infinity. The point of all this is to give a way of dealing efficiently with these sequences, so that we don't have to work with them explicitly.

Here is a sampler of how nonstandard analysis can be used to give an alternative, and perhaps more intuitive, picture of some aspects of standard real analysis.

It's useful to have a notation for when two numbers differ by an infinitesimal quantity: we write \(a \approx b\) if \(a-b\) is infinitesimal.

Continuity

The usual way of saying that \(f\) is continuous at \(a\) is to say that we can make \(f(x)\) as close as we want to \(f(a)\) by making \(x\) sufficiently close to \(a\), or, equivalently, that if \(x_n\) is any sequence tending to \(a\), then \(f(x_n)\) tends to \(f(a)\).

But then we can see that this latter is coded in the language of nonstandard analysis as saying that if \(\epsilon\) is any infinitesimal, then \({}^*f(x+\epsilon) \approx f(x)\). This gives a precise sense to the notion that changing the input to a continuous function by an infinitesimal amount changes the value by an infinitesimal amount.

Example

Consider \(f(x)=x^2\). Then \({}^*f(x+\epsilon)=x^2+2\epsilon x + \epsilon^2\), which differs from \(x^2\) by an infinitesimal amount, so \(f\) is continuous.

Differentiation

\(f\) is differentiable at \(x\) with derivative \(L\) if, whenever \(\epsilon\) is infinitesimal, then \[\frac{{}^*f(x+\epsilon)-{}^*f(x)}{\epsilon} \approx L. \]

Important note: this is not saying that the quotient is the derivative, but that it differs from it by an infinitesimal.

Example

Again, we consider \(f(x)=x^2\). Then \(({}^*f(x+\epsilon)-{}^*f(x))/\epsilon = 2x+\epsilon\), so \(f'(x)=2x\).
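Remembering that an infinitesimal stands in for a sequence tending to zero, we can mimic this calculation with ordinary rational numbers. The sketch below (in Python, with names of my own choosing) evaluates the difference quotient of \(x^2\) for a sequence of ever smaller steps and checks that it differs from \(2x\) by exactly the step. It is an illustration with finite steps, not a computation in \({}^*\mathbb{R}\).

```python
from fractions import Fraction

def difference_quotient(f, x, eps):
    """(f(x + eps) - f(x)) / eps, computed exactly when the inputs are Fractions."""
    return (f(x + eps) - f(x)) / eps

x = Fraction(3)
steps = [Fraction(1, 10**k) for k in range(1, 6)]  # 1/10, 1/100, ...
errors = [difference_quotient(lambda t: t * t, x, eps) - 2 * x for eps in steps]

for eps, err in zip(steps, errors):
    print(eps, err)
```

Each error is equal to the corresponding step, matching the \(2x+\epsilon\) obtained above.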

Integration

We want to calculate \(\int_a^b f(x) dx\). Then we choose an infinite integer \(N\), and split up the interval \([a,b]\) into \(N\) equal strips of width \(\epsilon = (b-a)/N\), and calculate the sum \[ S = \sum_{i=1}^N \epsilon f(a+i\epsilon) \] If \(S \approx I\) for some real \(I\), then \(I=\int_a^b f(x) dx\).

Example

To calculate \(\int_0^1 x dx\) we let \(N\) be infinite, so in this case we have the associated infinitesimal \(\epsilon = 1/N\). So \(a=0\), \(b=1\) and \(f(x)=x\), so \[ \sum_{i=1}^N \epsilon f(a+i\epsilon) = \sum_{i=1}^N \epsilon \times (i \epsilon) = \epsilon^2 \sum_{i=1}^N i = \epsilon^2 \frac{1}{2}N(N+1) = \frac{1}{2}+ \frac{\epsilon}{2} \approx \frac{1}{2} \] so that \[ \int_0^1 x dx = \frac{1}{2}. \] Again, the integral isn't the sum: they differ by an infinitesimal quantity.
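The same calculation can be checked with a finite number of strips, which is what the infinite integer \(N\) is standing in for. Here is a minimal Python sketch (the function name is an assumption of mine) that computes the sum exactly and compares it with \(\frac{1}{2}+\frac{\epsilon}{2}\).

```python
from fractions import Fraction

def riemann_sum(f, a, b, n):
    """Right-endpoint sum over n equal strips of [a, b], computed exactly."""
    eps = Fraction(b - a, n)
    return sum(eps * f(a + i * eps) for i in range(1, n + 1))

for n in (1, 10, 1000):
    s = riemann_sum(lambda x: x, 0, 1, n)
    eps = Fraction(1, n)
    print(n, s, s == Fraction(1, 2) + eps / 2)
```

For every finite \(n\) the sum exceeds \(\frac{1}{2}\) by exactly \(\epsilon/2\), which shrinks as \(n\) grows.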

Et cetera

One can then use nonstandard arguments to prove the usual theorems of introductory analysis such as the intermediate value theorem, define partial derivatives and multiple integrals, solve differential equations using an analogue to a finite difference method but now with an infinitesimal step length, and so on.

The point is that these and many other operations we do in calculus and analysis can be replaced by more intuitive notions that look like ordinary algebra. There is no free lunch, of course: one has to establish properly that the nonstandard reals really do behave themselves, and that the intuitive notions of a derivative as a quotient, and of an integral as a sum, really do match up to the standard notions. Which leads to the next question.

Is it worth it?

As long as we understand that what we get for our money is not access to theorems which could not be proven by standard means, but an alternative approach to proving these theorems, then it does seem to be worth it. There's a small but active community of people who use nonstandard analysis to investigate problems in pure and applied mathematics, and who have gained valuable insights from it. It is, admittedly, a matter of efficiency rather than possibility; but the process of setting up the nonstandard framework does a lot of heavy digging once and for all, so that it doesn't have to be repeated on each occasion when it is needed. This can lead to a sufficiently streamlined approach to a problem that a previously intractable problem becomes practically attainable via nonstandard means. The standard interpretation in terms of sequences can then be recovered, and (if one finds it necessary) a standard proof reverse-engineered.

Devilish details

There are, of course, many details which I've glossed over here, and many devils reside in them. The subject has its subtleties, and I'll try to indicate a couple of them here.

The first subtlety goes right back to the way of measuring sets of integers which was used to decide when two sequences converge to the same nonstandard number. How do we know it's possible to do this?

The obvious solution would be to simply exhibit an example explicitly. Unfortunately, there is no way of doing this. One can prove the existence of such a measure, but there is no constructive proof. (The standard argument uses Zorn's lemma.) So in practice, we can't actually work with a measure of the required type, only deduce the consequences of having one.

Another subtlety is that the real numbers are well known to be the unique complete ordered field. But the nonstandard reals include the reals, and I said above that theorems about the reals transfer to theorems about the nonstandard reals. So what's going on here?

In fact, the situation is similar to that of the integers in Peano arithmetic: the inductive axiom in its usual form makes the integers unique, but when one restricts to sets that can be finitely described, nonstandard models of first order Peano arithmetic exist. This time, we note that the transfer principle only allows us to talk about sets which do not make use of the notion of standard, and again this permits the existence of nonstandard models. The sets which we can talk about are called internal, and the others are external.

So for example, we can't ask for the least upper bound of the set of all infinitesimals, since the set of all infinitesimals requires us to use the notion of standard to define it, and so is an external set, not an internal one. Thus the nonstandard reals are not complete, in the sense that there are bounded sets without a least upper bound. However, any bounded internal set has a least upper bound.

Something I didn't mention up in the section on integration is that infinite integers are really, really big. If \(N\) is an infinite integer, then there are uncountably many nonstandard integers less than \(N\). The reason is that if \(\alpha\) is any real number in \((0,1)\) then there is a nonstandard integer \(M \lt N\) such that \(M/N \approx \alpha\); but then \((M+n)/N \approx \alpha\) for any \(n \in \mathbb{N}\), so there are infinitely many nonstandard integers less than \(N\) for every real number in \((0,1)\). It's not easy to see what is going on here.

Further reading

If you want to fill in some of these details (or rather, see them filled in for you) Keisler's Foundations of Infinitesimal Calculus provides at least as much detail as you might want, and also shows some of what can be done with it. It was written to accompany the same author's Elementary Calculus: an infinitesimal approach, which presents an undergraduate course on calculus and analysis based on nonstandard analysis.

As you may have guessed from the fact that I went to the effort of writing this, I think nonstandard analysis is worth knowing about. Not everybody agrees. For a look at the criticisms and responses, you can start with the wiki page, which will also point you to other approaches to infinitesimals.

Thursday 18 May 2017

It's geometry, Jim, but not as Euclid knew it.

The GCSE syllabus contains a chunk of plane geometry. Plane geometry is a Good Thing. It's a beautiful intellectual structure, an example of the power of abstract reasoning. And those are good reasons to study plane geometry. Unfortunately, the framework of the GCSE seems almost designed to hide that from the unlucky pupils who are ploughing their way through the syllabus, and to make life difficult for the teacher.

One problem is that the bits of geometry included in the syllabus (a collection of what I assume somebody decided were the more interesting theorems about lines, triangles, polygons and circles) are not presented as part of an edifice, but as a bag of results, all too often just used to solve some fairly contrived problems. The individual theorems may indeed be proven, in the sense that the pupil is presented with an argument that it is hard to disagree with. But there's no real sense of just where it all starts from, and how it is all built from a remarkably small collection of givens (the definitions, postulates and common notions).

Another is when a pupil asks the perfectly reasonable question "What's the point of this?". Brave attempts to find practical everyday uses such as this tend not to be convincing: nobody really believes that painters routinely use Pythagoras' theorem to find out what length of ladder they need. The actual practical application of plane geometry is vanishingly small for almost everybody, and the pupils are quite capable of spotting this. The obvious consequence is that they become less motivated, rather than more. The issue was raised a while ago by David Wees, where he took Pythagoras' theorem in particular as a starting point for a 'Why should we teach this topic?' discussion. Much of what I write below could be read as my answer to that question.

The real reasons for doing plane geometry are quite different from its practical application. But, as I noted above, the context of syllabus and assessment that they crop up in can be a potent distraction from the value of the material. I want to argue for the material, but also for the importance of having it placed in a context (i.e. in the specification and in the corresponding textbooks) that makes the value explicit. The hope is to give the students a different, and I hope a better, reason for working on the material: and maybe at least as important, to give a response to those adults who also have the attitude that the point of school mathematics is to teach material which is of everyday practical use, as exemplified in this blog, by a maths teacher. (That blog, and some of the responses to it, are part of the reason I don't think I'm just preaching to the choir.)

So, what do I claim are the reasons that pupils should study plane geometry?

One is, of course, that there is strong pressure on schools and pupils for everybody to get a passing grade on GCSE maths, and being able to do the geometry makes a useful contribution to this. This is, also of course, a terrible reason, to which the easiest response would be to take it out of the syllabus entirely, so I'll say no more on it.

I'm also not going to argue that a study of Euclidean geometry teaches logical thought. It would be nice if it were true, but I don't think it's any truer of geometry than it is of Latin. Tempting though it is to believe that understanding how to prove a geometric proposition or analyse the grammatical structure of a long Latin sentence will transfer to skills of logical deduction and analysis in other contexts, the evidence is just not there. If you want that, you have to teach that.

So what are the good reasons?

One good reason for it is that it is an opportunity to understand an elegant mathematical structure. But this requires a quite different approach, one of seeing how the results start off with very basic ideas and build up to more advanced ones. Jumping straight to the 'interesting' bits without seeing the route to them doesn't give an appreciation of the intellectual journey. Is the alternative demanding? Yes, it is. I don't see that it's more demanding than reading Shakespeare though, and we expect that of our school pupils.

I'm not advocating a return to some kind of nineteenth century public school syllabus of working through and memorizing the entirety of book one of Euclid's Elements, culminating in the proof of Pythagoras' theorem. (That said, I wish that schooling afforded the time to work through and understand the material.)

Another good reason is that it lets the pupils in on some of the few eternal truths that a school education can impart. It doesn't matter what culture you are from, or when you study it. From the basic assumptions of Euclidean geometry will always and inevitably flow the conclusions. They aren't a matter of opinion, or preference, or cultural heritage.

And all that isn't to say that there is nothing interesting about the way the different cultures have developed their understanding of plane geometry, and the relationship between those developments and the particular synthesis offered by Euclid. But this is probably to go considerably beyond the scope of what might be possible in the curriculum up to the age of 16, even in my rather Utopian ideal school.

So, bearing in mind the unpleasant fact that there are limits to the amount of time which will be devoted to mathematics in the school curriculum, what might be done to improve the situation? Of course, I do have an opinion about this, which comes in two parts.
  1. The first part of the solution would be to be more honest about the reasons for studying the material. We don't do it for reasons of everyday practicality, but because it has an intrinsic beauty and elegance, and demonstrates the power of careful logical thought, and in particular how so much can come from a relatively small basis. How the results of this show up in the real world, and why some of them have been important historically then demonstrates that there are real-world connections and consequences; but they aren't the reason for studying geometry, any more than holding up a layer of paint is the reason for the Eiffel Tower.
  2. The second part would be to present the material in a way which makes the logical structure clearer. Obviously (I hope) I'm not suggesting a complete presentation of the geometric books of the Elements, but rather a whistle-stop tour starting with the rules of the game, and outlining what goes into getting to the particular results of interest, emphasising the logical dependencies. It would also be useful to discuss at least briefly why these rules, and how you can do more if you extend the collection of tools (which was, of course, extremely well-known by the time of Euclid).
The skeptic will wonder just how successful such an approach might be, pointing out that in spite of our attempts to persuade schoolchildren of the merits of Shakespeare (or Bach or Michelangelo or...) the success rate is distressingly low. Most of them finish their study with a sigh of relief, and think 'Good, I'll never have to read/listen to/look at that again.' But even if that's true in the majority of cases, we don't respond by just not bothering with Shakespeare: we try to plant the seed, and rejoice in the successes. We're not doing anybody a service by not even trying to plant the seed of an appreciation of mathematics, and may be doing a disservice to those who might respond to a more meaningful exposition.

Thursday 11 May 2017

0.999... it just keeps on going.

If you follow any kind of mathematical discussions on the internet, you'll probably have noticed that there's one particular topic that refuses to die: \[ 0.\dot{9} = 1 \] In the way of perpetual internet conversations, although the participants change, the actual positions are pretty constant.

One side of the conversation insists that \(0.\dot{9}\) is not equal to \(1\), but is a little (even infinitesimally) less than \(1\).

The other side correctly says (but somehow unpersuasively argues) that, on the contrary, \(0.\dot{9}\) is exactly equal to \(1\), and provides various reasons for that.

Here are what my extremely scientific survey suggests are two of the commonest arguments presented in support of that true statement.

The first goes something like:
We all know that \[ \frac{1}{3} = 0.\dot{3} \] so, multiplying both sides by \(3\) we obviously have \[ 1 = 3 \times \frac{1}{3} = 3 \times 0.\dot{3} = 0.\dot{9} \]
The second is some version of:
Let \[ 0.\dot{9} = S. \] We'll find an equation for \(S\) that tells us what it is.

Clearly, \[ 10S = 9.\dot{9} \] and so \[ 10S-S = 9.\dot{9} - 0.\dot{9} \] i.e. \[ 9S = 9 \] and therefore \(S=1\).
And yet somehow the intransigent "it's a bit less than \(1\)" brigade aren't quite convinced. But why not?

Let's take a closer look at those two arguments.

Both arguments rely on manipulating an infinite decimal expansion in a way that looks entirely plausible. It should, because in each case the manipulation is in fact correct. But in each case the truth is not actually obvious. Why should we believe that multiplying an infinite decimal expansion by \(3\) is the same as multiplying each term by \(3\), or believe that multiplying an infinite decimal expansion by \(10\) is the same as shunting the decimal point along one place to the right? The second argument also requires the subtraction of one infinite decimal expansion from another. The algorithms we have for doing such arithmetic start with the rightmost non-zero digit, and in these cases there is no starting position.

In fact none of these calculations is trivial. The distributive law obviously holds for any multiple of any finite sum, but a non-terminating decimal is not a finite sum. It might be tempting to say that the multiplication and subtraction work no matter how many terms in the expansion we use, so they also work when the number of terms is infinite. Unfortunately for that argument, it doesn't matter after how many digits of \(0.\dot{9}\) we truncate, the result is less than \(1\), so why isn't that also true when the number of digits is infinite?

We have, then, two serious issues.
  1. Most people don't actually have a clear understanding of infinite decimal expansions. They're introduced early enough in the school curriculum that everybody becomes familiar with them, but that isn't the same thing. So arguments involving them don't carry the psychological weight that they might.
  2. The inductive argument makes it so 'obvious' that \(0.\dot{9} < 1\) that the other approaches, persuasive as they might seem, can't displace this conviction.
I suspect that some combination of these issues (though doubtless not clearly formulated) is what makes it so hard for people to see that \(0.\dot{9}\) is actually equal to \(1\). In the end, when faced with two incompatible statements, people will prefer the one that causes less mental discomfort.

Now, we have to take seriously the fact that it really does take a considerable amount of work to give a genuine proof that \(0.\dot{9} = 1\). You have to explain what is meant by a non-terminating decimal expansion, which means that you have to explain the meaning of the limit of a sequence, and in particular the meaning of an infinite series as the limit of a sequence of partial sums. This is a serious undertaking.
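To give a flavour of what the limit of the partial sums means here, the following Python sketch computes the truncations of \(0.\dot{9}\) exactly and, for a given tolerance, finds how many digits are needed before the truncation is within that tolerance of \(1\). The function names are my own, and this is only an illustration of the definition, not a substitute for the proof.

```python
from fractions import Fraction

def partial_sum(n):
    """0.99...9 with n nines, as an exact fraction."""
    return sum(Fraction(9, 10**k) for k in range(1, n + 1))

def digits_needed(tolerance):
    """Smallest n for which the n-digit truncation is within `tolerance` of 1."""
    n = 1
    while 1 - partial_sum(n) >= tolerance:
        n += 1
    return n

print(1 - partial_sum(5))
print(digits_needed(Fraction(1, 1000)))
```

Every truncation is less than \(1\), but the shortfall after \(n\) digits is exactly \(10^{-n}\), so whatever positive tolerance is demanded, some number of digits meets it. That, and nothing more mysterious, is what it means for the limit to be \(1\).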

Once you've done that, you have various strategies available. Amongst them are proving that the multiplications above are legitimate, and so that with this additional detail the result is established. And by the time you've done all the heavy digging about infinite series, and what a limit is, and what the real numbers are, the recipient of your wisdom will find it much harder to retain their conviction that the sum is somehow "infinitesimally" less than \(1\). (I hesitate to claim that they will find it impossible.)

It's worth noting that some of those arguing that \(0.\dot{9} = 1\) may not have a much better justification for their position than those who argue that \(0.\dot{9} <1\). You can believe the right thing for the wrong reason.

So don't be dejected if you can't persuade somebody whose mathematical preparation doesn't extend to an understanding of an infinite decimal expansion as the limit of an infinite sequence of partial sums. For such people, it is (almost) inconceivable that a sequence of approximations can all be less than \(1\) and the limit be exactly \(1\). In fact, if you have persuaded such a person by means of one of the arguments up above, you probably ought to feel a little shabby about it: although the conclusion is true, the argument as presented is no (or at least not much) more rigorous than the (mistaken) intuition that each finite expansion of \(9\)'s is less than \(1\), so the infinite expansion is still less than \(1\).

Alas, you probably ought to be dejected if one of the arguments presented above was what convinced you that \(0.\dot{9}=1\). I'm afraid you were tricked, and it really is a bit more complicated than that. On the bright side, you have a fascinating journey ahead of you if you decide to fill in the gaps.

Tuesday 9 May 2017

It isn't trig, or geometry, or analysis: it's maths.

We all learn the standard trigonometry function addition formulae at school: \[ \begin{split} \sin(\theta+\phi) &= \sin(\theta)\cos(\phi)+\cos(\theta)\sin(\phi)\\ \cos(\theta+\phi) &= \cos(\theta)\cos(\phi)-\sin(\theta)\sin(\phi). \end{split} \] There are many ways to see the truth of these. Let's take a look at two of them, and the relationship between them.

Geometry

Since a rotation by the angle \(\theta\) about the origin in the Euclidean plane is described by the matrix \[ R(\theta) = \left[ \begin{array}{cc} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{array} \right], \] it follows that a rotation by \(\theta\) followed by a rotation by \(\phi\) results in a rotation by \(\theta+\phi\). In terms of the matrices, \[ \begin{split} R(\theta+\phi) &= R(\theta)R(\phi) \\ &= \left[ \begin{array}{cc} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{array} \right] \left[ \begin{array}{cc} \cos(\phi) & -\sin(\phi) \\ \sin(\phi) & \cos(\phi) \end{array} \right]\\ &= \left[ \begin{array}{cc} \cos(\theta)\cos(\phi)-\sin(\theta)\sin(\phi) & -\sin(\theta)\cos(\phi)-\cos(\theta)\sin(\phi) \\ \sin(\theta)\cos(\phi)+\cos(\theta)\sin(\phi) & \cos(\theta)\cos(\phi)-\sin(\theta)\sin(\phi) \end{array} \right]\\ &= \left[ \begin{array}{cc} \cos(\theta+\phi) & -\sin(\theta+\phi) \\ \sin(\theta+\phi) & \cos(\theta+\phi) \end{array} \right], \end{split} \] so comparing the entries of the matrices, we obtain the addition formulae.

There's actually quite a bit going on here.
  1. First, this works no matter the size (or sign) of \(\theta\) and \(\phi\). If we can just remember that \(\sin(-\theta)=-\sin(\theta)\) and \(\cos(-\theta)=\cos(\theta)\) then formulae for \(\cos(\theta-\phi)\) and \(\sin(\theta-\phi)\) come along for the ride.
  2. Second, although matrix multiplication is not generally commutative, in this case it is: \[ R(\theta+\phi) = R(\theta)R(\phi) = R(\phi)R(\theta) = R(\phi+\theta). \]
So, thinking in terms of angles of rotation rather than just angles at a corner of a triangle, we get a nice intuitive derivation of the addition formulae.
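The matrix identity is easy to check numerically. Here is a small sketch in plain Python; the helper names and the particular angles are arbitrary choices of mine, and the comparison uses a tolerance because of floating-point rounding.

```python
import math

def rotation(theta):
    """2x2 matrix for a rotation by theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices up to rounding error."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

theta, phi = 0.7, -1.9  # one positive and one negative angle
print(close(matmul(rotation(theta), rotation(phi)), rotation(theta + phi)))
print(close(matmul(rotation(theta), rotation(phi)),
            matmul(rotation(phi), rotation(theta))))
```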

Analysis

On the other hand, we have some intriguing formulae. The power series for the exponential function is very familiar: \[ \exp(x) = 1+x+\frac{x^2}{2} + \ldots + \frac{x^n}{n!} + \ldots \] But now if we replace the real value \(x\) by the purely imaginary one \(i\theta\) we have \[ \begin{split} \exp(i\theta) &= 1 + i\theta - \frac{\theta^2}{2} -i\frac{\theta^3}{3!} + \ldots \\ &= 1-\frac{\theta^2}{2}+\frac{\theta^4}{4!} + \ldots + i(\theta-\frac{\theta^3}{3!} + \ldots)\\ &= \cos(\theta) + i \sin(\theta). \end{split} \] Putting this together with \[ \exp(x+y) = \exp(x)\exp(y) \] we have \[ \begin{split} &\cos(\theta+\phi)+i\sin(\theta+\phi)\\ &= \exp(i(\theta+\phi))\\ &= \exp(i\theta)\exp(i\phi)\\ &= (\cos(\theta)+i\sin(\theta))(\cos(\phi)+i\sin(\phi))\\ &= (\cos(\theta)\cos(\phi)-\sin(\theta)\sin(\phi)) + i(\cos(\theta)\sin(\phi)+\sin(\theta)\cos(\phi)). \end{split} \] So again, this time by matching up the real and imaginary parts, we get the addition formulae.
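The analytic version can be checked numerically as well, using the complex exponential from Python's standard `cmath` module. The helper name and the angles are illustrative choices of mine, and the comparisons allow for floating-point rounding.

```python
import cmath
import math

def euler(theta):
    """cos(theta) + i sin(theta) as a Python complex number."""
    return complex(math.cos(theta), math.sin(theta))

theta, phi = 0.7, -1.9
lhs = cmath.exp(1j * (theta + phi))  # exp(i(theta + phi))
rhs = euler(theta) * euler(phi)      # product of the two bracketed factors

print(abs(lhs - rhs) < 1e-12)
print(abs(rhs.real - math.cos(theta + phi)) < 1e-12)
print(abs(rhs.imag - math.sin(theta + phi)) < 1e-12)
```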

Maths

These calculations are weirdly similar. There's obviously something going on here. So, what's the connection between the matrices and the complex exponentials?

I claim that they're really just different ways of representing the same thing, and we can see this in terms of complex numbers, where we use the fact that the complex number \(z=x+iy\) can be thought of as the point \((x,y)\) with position vector \(\left[ \begin{array}{c} x\\y \end{array} \right]\).

To see this, let's take a complex number \(z=x+iy\) and multiply it by \(\exp(i\theta)\): we have \[ \begin{split} \exp(i\theta)z &= (\cos(\theta)+i\sin(\theta))(x+iy)\\ &= \cos(\theta)x-\sin(\theta)y +i(\sin(\theta)x+\cos(\theta)y). \end{split} \] Alternatively, we could take the vector \(\vec{z}=\left[\begin{array}{c} x \\ y\end{array} \right]\) and multiply it by \(R(\theta)\). That gives \[ \begin{split} R(\theta) \vec{z} &= \left[ \begin{array}{cc} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{array} \right] \left[\begin{array}{c} x \\ y\end{array} \right]\\ &= \left[\begin{array}{c} \cos(\theta)x-\sin(\theta)y \\ \sin(\theta)x+\cos(\theta)y \end{array} \right] \end{split} \] We can see that there is a common structure here. By converting between the complex number \(z\) and the vector \(\vec{z}\) and simultaneously between the exponential \(\exp(i\theta)\) and the matrix \(R(\theta)\), we convert between the analytic picture and the geometric one.
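The correspondence can be made concrete by rotating the same point both ways and comparing. In this Python sketch the two function names are hypothetical; one multiplies by \(\exp(i\theta)\), the other writes out the matrix product by hand.

```python
import cmath
import math

def rotate_complex(theta, x, y):
    """Rotate (x, y) by multiplying x + iy by exp(i theta)."""
    w = cmath.exp(1j * theta) * complex(x, y)
    return (w.real, w.imag)

def rotate_matrix(theta, x, y):
    """Rotate (x, y) by applying the matrix R(theta) to the column vector."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

p = rotate_complex(0.7, 2.0, -1.0)
q = rotate_matrix(0.7, 2.0, -1.0)
print(p, q)
print(all(abs(u - v) < 1e-12 for u, v in zip(p, q)))
```

Up to rounding, the two routes give the same point, which is the common structure in miniature.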

So, why did I call this last section maths? Weren't the geometry and analysis maths? Well, yes, of course they were, and (at least to my taste) both are good fun. But seeing that two apparently different structures are really one and the same thing in two different guises is somehow a particularly satisfying, and even mathsier, part of mathematics.

Monday 1 May 2017

Yes, but why? (book review)

Ed Southall of the University of Huddersfield (@solvemymaths on Twitter) has written a book called "Yes, but why? Teaching for understanding in mathematics."

I have to start off by saying that the publication of this book makes me very angry.

I admit that there is some risk of that giving the wrong impression, so I should explain why.

The book itself is not the problem: it has received many enthusiastic endorsements, and this review is in fact one of them. What makes me angry is that there is a need (and there really is a need) for anybody to point out that mathematics teaching should involve the students being led to an understanding of the material. Unfortunately, far too much school mathematics is taught as a bag of tricks which have to be memorized and applied in well-drilled situations. I also admit that I don't teach in school, so I don't have direct evidence of this: however, I do teach at the undergraduate level, and the indirect evidence is overwhelming. Of course, it isn't all students, but it is far too many.

So I repeat: I'm angry that the book serves a purpose. Ideally, my review would have consisted of
This is an entirely superfluous work. How could anybody even think of teaching mathematics in any way other than for understanding?
Alas that this is not the case, and the book is addressing an extremely serious problem. As a consequence of the 'bag of well-drilled tricks' approach to teaching, far too many students come to the conclusion that mathematics is a collection of more-or-less arbitrary rules and methods which they have to memorize, but which have no meaning or value other than to enable them to get at least a grade C (or its equivalent in the new version) in GCSE and preserve the school's precious place in the league tables. It is onerous, boring and pointless. Except for a relatively fortunate minority, for whom the quality of teaching is probably largely superfluous, it does nothing to inspire a wish to take the subject further.

So, what should be done about it?

Ed Southall has provided a sourcebook for teachers and trainee teachers with a wealth of useful material, all aimed at helping the teacher to help the student to understand what is going on.

It would have been all too easy to produce a book of careful proofs of all the results used in GCSE mathematics, together with the instruction to teach these to the students. If I'd been tasked with trying to solve the problem, I might well have fallen into that trap. Fortunately, the book was written by somebody with a much more sensible plan.

The book begins with basic arithmetic and works through the algebra, geometry and statistics required for GCSE. But rather than stating and proving theorems, the results are explained by means of examples accompanied whenever possible by explanatory diagrams. By means of these illustrations, the student can come to an understanding of the mathematics, which can subsequently be developed into formal proof as and when that is appropriate.

The great advantages of this approach are: when something is actually understood, it is much easier to remember (in particular it doesn't have to be memorized); and it is then easier to see what piece(s) of mathematics can be applied to which problem.

Of course, good intentions are one thing, implementing them is another. Fortunately, the implementation matches the intention. The examples are well-chosen, the illustrations clear and relevant, and the combination should furnish any trainee teacher (and many experienced ones!) with a wide variety of material to help them teach the material so that the student understands it. The text is enriched by inclusion of 'Teacher tip' boxes providing explicit advice, 'break out sections' with additional material for investigation, and explanations of the historical development of some of the mathematics.

But what if your teaching is at a higher level than GCSE, say A level, or undergraduate? I'd still recommend the book, for two reasons.
  1. It is often the case that a student who appears to be struggling with more advanced maths is actually being held back by problems with GCSE level material. This book will be valuable for those of us unaccustomed to teaching the material, as it provides strategies for explaining concepts we tend to take for granted.
  2. The examples remind us that students at whatever level we teach are liable to learn better from an approach in which well-chosen examples provide a framework on which understanding can develop. (This is a salutary reminder for those of us who perhaps coped well with the definition-theorem-proof exposition rather than just surviving it, and who may be inclined to assume that our students will find it equally congenial.)
So I agree with all the previous responses I've seen to this new book: it's a valuable addition to the reading list of anybody who teaches mathematics from early secondary school onwards.

Of course, I have a few (very few) quibbles. I list these mostly in the hope that they might make a second edition of the book even better than the current one.
  • p39 It might be helpful if the discussion of \(20-3-4-5-6\) were to introduce the convention of left-associativity explicitly rather than just using it without comment.
  • p53 onwards, mathematical arguments are sometimes presented as sequences of equations without any explicit statement that each equation is in fact equivalent to the next. I think it would greatly help the students to realize that they really have to present an explanation of what is happening, and this includes the logical connection between each line and the subsequent one. Being loose about this at the early stages can lead to considerable problems later in the student's mathematical development, where (for example) solutions may be lost or gained in the attempt to solve equations if the logical connections are not dealt with carefully.
  • p95 It is common to state that the angle at the circumference is half the angle at the centre: this might be a good place to consider what happens if the point on the circumference is not on the same side of the chord as the centre, as the theorem statement doesn't usually specify.
  • p98 Two triangles are argued to be similar on the basis that they have a common hypotenuse and the same length of base: but this only gives a match of two sides and the non-included angle, which does not imply (in general) that the triangles are similar.
  • pp149f It might be helpful to explain that the top right hand box in the diagram at the top of p150 contains a 75 because 25% off gives 75% of the original price.
  • p237 Anscombe's quartet is much stronger than just having the same mean; the means and variances of the \(x\) values are all the same, as are those of the \(y\) values, and all four data sets have the same line of best fit. It's an important example of how totally different types of data may be summarized by the same descriptive statistics, and deserves at least a little more than it gets here.
  • p306 In the 'Teacher tip' box, the degrees symbol is missing from the angle.