Saturday, 21 July 2018

What is school mathematics for?

What is the purpose of school mathematics? I have opinions… Here I shall mostly be ranting about what I think school mathematics ought to do.

Note that when I talk about what schools actually do, I'm referring to the English educational system: YMMV.

So, on with the show.

To a first approximation, I think that school mathematics should provide three things.

Basic skills

First off, everybody should leave school numerate. They should be arithmetically competent, and be able to cope with percentages, ratios, proportions and the like. They should also have a reasonably well-developed number sense, and be able to make plausible estimates and approximations.

For example, the total cost (say) of a shopping basket shouldn't come as a complete surprise, nor should the cost per head of population of something that costs the country a billion pounds, nor the fraction of one's own disposable income that that amounts to. And how to work out whether a 500g box of cereal or a 750g box is the better deal should not be a mystery (or rely on the shelf label helpfully giving the price per 100g).

This is the kind of stuff that the mythical general public probably think of as mathematics. But it's not the only stuff that you comes in handy in 'real life'.

We also all need some basic statistical understanding, at least we do if we aren't going to have the wool pulled over our eyes all the time. By this I mean understanding data and its presentation, what can be reliably inferred from it, and the nature of uncertainty in what is reported (or predicted!).

This is, I think, a bare minimum of the mathematics that you need to cope with everyday life (without being taken advantage of) and take meaningful part in a democracy (so that your decisions have a fighting chance of being well-informed).

I leave you to decide for yourself whether this utilitarian need is met. (And if it's as much of a Bad Thing as I think it is for it not to be met.)

Intellectual development

Outside that, mathematics is there to provide you with intellectual development.

The purpose of this material is not to provide every pupil with a bag of mathematical techniques which they will employ in day-to-day life or in their jobs. Attempt to persuade them of a utilitarian value are doomed to failure, because they all know adults. The purpose of this material is the same as exposing them to great literature or art; it's to expand their mental horizons.

This shouldn't consist (only) of methods and algorithms to be learned, but try to provide an appreciation of the structure and process of mathematics. Mathematics is, after all, one of the subjects where you don't just have to take somebody else's word for it. You can give a satisfying argument about why something is the case. There's a huge amount here that can be explored with no more than numbers and basic algebra.

For example, nobody should leave school knowing that 'a minus times a minus is a plus' without also knowing that this isn't just an arbitrary rule, that it follows from some more fundamental choices, and that those choices themselves are made for a good reason.

Currently, there is a lot of material in the compulsory maths curriculum for school pupils up to the age of 16. There's a lot of learning to do, and a lot of skill to be acquired. And there's a lot of teachers working extremely hard to help their pupils acquire the knowledge and develop the skills.

I think that the curriculum is drastically overloaded, and encourages the wrong (I did say I have opinions) kind of learning: a procedural, or instrumental, skill based learning which is quicker to achieve, but doesn't provide the deeper understanding that an appreciation of the mathematics requires.

I'd much rather see a much smaller curriculum, with time for a deeper understanding of fewer topics to be developed.

You can even show how if gives an enriched understanding of the world around us.

"How much shorter is a short cut corner to corner compared to going along the pavement round the edge of a rectangular lawn?" is an interesting question, and nobody is pretending that a builder or a plumber needs to use Pythaogoras in everyday life to work out how long a pipe or a rafter has to be.

Note that this isn't going to rob anybody of job opportunities. The requirement to have some arbitrary number of GCSE's (including English and mathematics) at some arbitrary level of achievement isn't selecting the people with the required level of mathematics to be able to do a job—though there may be some hope that it selects out those who can't read, write or count.

Preparation for further study

And of course, we also have to prepare those who might go on to study maths at higher level. What we should be providing here is a basis on which undergraduate study can be built. A good foundation for the more advanced and abstract material covered in a degree course.

The current preparation expected (and required) by Universities is A-level, which includes a large chunk of calculus. I feel in a better position to comment on this, because every year I see a lot of students who have been moderately to very successful at A-level mathematics. As the years have passed, I've become more convinced that this does not provide a particularly good foundation for what we do at university.

Again, the problem seems to be one of an overstuffed curriculum. I see a lot of students who can sort of do a quite a lot of stuff. But a large proportion of them seem to have, again, an entirely instrumental learning: when I see this, I do that.

In fact, some of them by this stage have a fairly well-developed conviction that that is what mathematics is: it's a bag of procedures and algorithms which you learn to apply to the problems that you're been taught how to solve using them. These get downright panicky, and in some cases even resentful, at the idea of having to explain why some mathematical fact is true, or why some procedure works.

Unfortunately for them, there tends to be much more emphasis on why things are true or work in undergraduate maths than they are used to.

So, back to my previous point. They should be arriving with a basis on which further undergraduate study can be developed.

I don't think this is well-served by having a lot of stuff which so many students have learned to do without knowing why it works.

I think it would be better served by having a reduced curriculum, but with a deeper understanding of the material. I'd much rather see students arrive with a strong understanding of algebra, and a decent idea of what constitutes a proof, than see them arrive with an extensive bag of tricks which they don't really understand. OK, it would be even better if they had the deeper understanding of all the stuff they currently meet. But it takes time and effort to develop that understanding. You can't have both.

Yes, that would mean that we have to start calculus from scratch at university. But it's pretty standard to do that anyway (if quite fast) to try to give the deeper understanding that we want; I'm not sure that it would make a big difference to what we do in the first year.

And even if it did, that might not be a terribly Bad Thing.

There's a whole other discussion to be had about what the content of a mathematics degree ought to be, but one point on which we could probably all agree is that very little of the content of a mathematics degree is of great relevance to one's life after graduation. For those who do use mathematics subsequently, whether commercially or academically, the most important thing is likely to be the ability to learn new mathematics efficiently, and to understand what's going on.

And I'd be happy to maintain that this is better done by developing understanding, if necessary at the expense of some extent of coverage.

Afterthoughts

I think I'm pretty safe in these opinions. I might be entirely wrong, but nobody will ever do the experiment (i.e. completely overhaul mathematics education) to find out.
For a nice summary of what I called instrumental and deeper learning, there is Richard Skemp's article
Relational Understanding and Instrumental Understanding, Mathematics Teaching 77 20–26, (1976).
available here, courtesy of @republicofmath, and for a much more extensive discussion, his book The psychology of learning mathematics

Wednesday, 27 June 2018

Why is subtraction so hard?

Subtraction presents various problems to learners of mathematics, not least with the mechanics of hand calculating the result of subtracting one multi-digit number from another. But that's not the kind of difficulty I'm interested in: I'm more interested here in the difficulties that arise when computations involving several additions and subtractions of integers are involved, or at the slightly more advanced level where algebraic computations involving several additions and subtractions are required. I'm going to take a look at what I think lies underneath the difficulty.

The procedure required to deal with subtraction starts off deceptively simple.

Addition and subtraction are of the same priority, and so a calculation involving a string of additions and subtractions is calculated from right to left: \[ 1-2+3-4-5=-1+3-4-5=2-4-5=-2-5=-7 \] Slightly more complicated, we may have things inside parentheses which consequently have to be evaluated before the result is combined with the rest: \[ 2-(3-4) = 2-(-1) = 3 \] But already we find lots of mistakes being made. It isn't uncommon for the first calculation to turn into some variation on \[ 1-2+3-4-5=1-2+3-(-1)=-1+2=1 \] or for the second to go down the route of \[ 2-(3-4)=2-3-4=-1-4=-5 \]

The root of the problem

In one sense, the root of the problem is combinatorial. There are lots of different ways of combining the various numbers in the calculations, but they don't all give the same answer. There are more ways of getting the wrong answer than the right one.

But that doesn't really get to the bottom of it. Why do learners combine things in the wrong way when they've been given careful explanation and instruction of the right way? (And it's not just novices: errors along these lines persist with undergraduates!)

I think that the real problem is that subtraction is a horrible operation, in many ways.

The horrors of subtraction

First, we learn to add together positive integers, or natural numbers. The result of adding together two natural numbers is always a natural number: addition is a well defined binary operation on natural numbers.
But subtraction isn't.
When we subtract one natural number from another, the result can be negative. Suddenly a whole new can of worms is opened up, because we need to deal with the arithmetic of negative numbers, which, as we all know, is a paradise of opportunity for sign errors.
Next, subtraction isn't commutative. This doesn't seem to be subtle, but let's take a quick look: \[ 2-1=1 \qquad \text{ but } \qquad 1-2 =-1 \] The answers are different, but only by being of opposite signs; and to rub salt in the wound, the second may well be done by subtracting \(1\) from \(2\) and changing the sign. Losing a minus sign is easy for all of us.
Even more horrible, subtraction is not associative: \[ (1-2)-3 \neq 1-(2-3) \]
But we get very used to using associativity and commutativity without even noticing it when adding. It's much harder to remember not to do it when it's inappropriate, once you've become thoroughly accustomed to doing it.
And lastly, to cope with the problem we end up with a plethora of rules about combinations of signs. More to remember, often just regarded as more rules to learn. I cannot express in words the horror I feel when I hear students muttering (or even writing out) "FOIL" when doing algebraic calculations; I haven't yet heard "KFC" from my students when working with fractions, but I've seen it on the interwebs often enough to know it's only a matter of time.

Now, there are many rules in algebra and arithmetic, but a relatively small number of principles (axioms) from which they all follow.

Unfortunately, you can't really teach arithmetic to five-year-olds by starting off with the axioms of a field.

But maybe there comes a time to revisit the collection of rules and see where they come from. At least, I always find it illuminating to go back to elementary stuff and see what new insight I might have based on further study.

Avoiding the issue

So here's my approach to a better (I have many opinions, and very few of them are humble) view of the arithmetic and algebra of subtraction.

Uninvent subtraction.

OK, so I haven't gone back in time and relived my education without the notion. What I mean is that I look back and say that after all this time I now see that there's a better way to think of subtraction.

First, any number has \(a\) an additive inverse \(-a\): a number we add to it to get \(0\). Then if \(a\) and \(b\) are two numbers, I say that \[ a-b \] is a (somewhat undesirable, but unavoidable) notation for \[ a+(-b) \] which is the sum of \(a\) and \(-b\), the additive inverse of \(b\).

And at this point I really, really wish that there were a conventional typographical distinction between \(-\), the binary subtraction operator, and \(-\), the unary additive inverse operator. Oh well.

This has the immediate consequence that we can write our sums slightly differently: \[ 1-2+3-4-5 = 1+ (-2) + 3 +(-4) + (-5) \] and these quantities can now be rearranged at will, since addition is associative and commutative.

In the second example, we have \[ 2-(3-4) = 2 + (-(3-4)) \] And what do we add to \(3-4\) to get \(0\)? we add \(4-3=1\), so \(-(3-4)=1\) and \[ 2-(3-4) = 2 + 1 = 3 \]

Of course, it's still necessary to be able to add together combinations of positive and negative numbers: there is no free lunch here. But it's a way of thinking about the computation that reduces the temptation/opportunity to make mistakes, so maybe it's a slightly cheaper lunch.

One consequence is that if I see \[ x+5=7 \] I don't think 'subtract \(5\) from each side', I think 'add \(-5\) to each side'.

I find it a useful way to think about what's going on.

I try to stress this with my students, but with mixed success. And just about everybody who's teaching early undergraduate material will surely be doing something like this in the students' first encounter with abstract algebra and axiomatic systems.

My general impression is that as long as they're doing a set problem of the 'show from the axioms of a field' type, most students can be persuaded to work this way.

But I find that as soon as that context is gone and they're back in the old familiar territory of doing algebra or arithmetic, for most it's also back to the old familiar way of proceeding. And for quite a few this has just too many opportunities for the old familiar way of going wrong.

But also

I have very similar thoughts about division, the reconstruction of which I leave as an exercise for the sufficiently motivated reader.

Tuesday, 29 May 2018

What does it mean to say that the universe is expanding?

Q: The universe is expanding, so what is it expanding into?

A: The question is wrong.

OK, that answer could use some expansion of its own.

Newtonian Cosmology

Here's one version of the story.

Careful measurements (and careful interpretation of the measurements) tell us that the stuff in the universe is getting further apart. What's more, if we look around us, then to a good degree of approximation, everything is travelling away from us, and the farther away it is, the faster it's getting away from us. In fact, the speed at which distant galaxies are receding from us is proportional to how far away they are. We know this because of red shift in the spectra of these galaxies, from which we can calculate how fast they are travelling in order for a Doppler shift to cause that much red shift.

This seems to suggest that we are (surprisingly or not will depend on your religious and philosophical framework) at the centre of the universe; it is perhaps less flattering that everything is trying to get away from us.

Let's start off by thinking about how special our position is. We think of the universe as being full of a homogeneous dust (for scale, the dust particles are clusters of galaxies). In the simplest model, we are at the origin, and at time \(t\) the the dust particle at position \(\pmb{x}\) has velocity \(\pmb{v}=H(t) \pmb{x}\). Then \(H(t)\) is the scale factor which relates distance to velocity of recession, and we call it \(H\) the Hubble parameter, to honour Hubble who made the initial observations.

But what about somebody who isn't at the origin? What about somebody living in a different dust particle, far, far away, say at position \(\pmb{x}'\)? What would they see?

Let's use \(\pmb{v}'\) to denote the velocity as seen by this observer at position \(\pmb{x}'\). We have to subtract their velocity from the velocity we see at each point to get the velocity relative to them at each point. Then we get \[ \pmb{v}' = H(t) \pmb{x} - H(t) \pmb{x}' = H(t)(\pmb{x}-\pmb{x}') \] But \(\pmb{x}-\pmb{x}'\) is just the position as seen by the observer at \(\pmb{x}'\); this other observer also sees everything in the universe recede at a rate proportional to distance, and it's even the same constant of proportionality. Rather suprprisingly, this suggests that we aren't anywhere special, and that the cosmological principle (or principle of mediocrity) that we are nowhere special is compatible with the observations.

The first thing to notice is that the universe is infinite - it's the whole of three dimensional Euclidean space - forever. And it's always full of this cosmological dust: but if we go back in time, we see that the dust particles are getting closer together, so the dust must be getting denser, and if we go forward in time, it is getting more rarefied. But there's no 'edge' to the material, no expanding of anything, even though the dust particles are getting further apart. This is one of the joys of having infinite space: it can be full of stuff, but there's nothing to stop the contents spreading out.

There's a slight problem when we try to push back to \(t=0\), when the density would have to be infinite: but I avoid that by saying that I have a model for a universe that seems compatible with Hubble's observations for all \(t \gt 0\), and I've no idea what kind of Big Bang happened at \(t=0\); that's not part of my model. There's certainly no sense in which the universe started off as a single 'infinitely dense point' and exploded outward.

So, great news! As long as the universe just happens to be full of stuff exploding outwards, we have a rather neat mathematical model compatible with the observations, and all we need is some physics compatible with this model to make it plausible.

Physics.

Aye, there's the rub.

Problem One

The first problem is a kinematic one. No matter how small \(H(t)\) is, objects sufficiently far away have to be moving faster than light. This is definitely hard to reconcile with special relativity, and special relativity is very well confirmed these days, so don't just want to abandon it and go back to Newtonian mechanics. This cosmological model really is only good for Newtonian mechanics.

We could try to fix it by slowing down the expansion sufficiently far away, so that Hubble's law is only valid over a certain scale: but that gets hard to reconcile with the cosmological principle.

Or we might try to fiddle with special relativity by arguing that it's great at small scales, but needs correction at larger ones.

Problem Two

But even if we only look for a Newtonian model, there is still a serious problem.

We are at the origin of the universe, and we see everything receding from us. But as thing get farther away, they speed up: everything is not only getting farther apart, but is acceleration. So our distant observer, living in a dust particle far, far away, is accelerating. That's an issue because it means that we are, after all, in a special place. Though all observers see a universe expanding in the same way, only we don't feel a force pushing on us to make us accelerate.

This is actually two problems.

There's the actual physical issue of just what is supposed to be doing the pushing. We need a pretty weird model of gravity to be compatible with gravity being attractive (so as to hold solar systems and galaxies together) at one scale, but repulsive (so as to make stuff accelerate apart at the bigger scale).

And there's the philosophical issue of what makes us so special: in this model we are, once again, in the actual centre of the universe.

So, can we find another way of modelling the universe which retains the good property of satisfying Hubble's law, but doesn't have either of the problems of this repulsive force which only becomes significant at large distances, or puts us in a privileged position?

Relativistic cosmology

Well of course we can. As usual, that's the sort of question that only gets asked if there is a good answer available.

But it involves a radically new way of understanding how the universe fits together.

Newtonian space-time

In the Newtonian picture of space-time, with the origin at the centre of expansion, we have spatial position, represented by a position vector \(\pmb{x}\) and time, represented by \(t\). If we have two events \(A\) and \(B\), which happen respective at position \(\pmb{x}_A\) and time \(t_A\), and at position \(\pmb{x}_B\) and time \(t_B\) then the distance between them is the usual Euclidean distance \(\|\pmb{x}_A -\pmb{x}_B\|\), and the time between them is \(|t_A-t_B|\).
This is all very persuasive, and seems to match well to experience.

But it isn't quite right, in ways which don't show up until things are moving fairly fast, at which point we have to worry about special relativity.

Fortunately, there's another way to think about distance and time, which is both compatible with special relativity (in the sense that it is very well approximated by special relativity over small regions) and with the observations of red shift (though for a somewhat subtler reason than you might expect).

Space-time and the metric

In this picture, we don't have separate notions of spatial and temporal distance, but a unified notion. Just as before, every point has a time coordinate \(t\) and space coordinates \(\pmb{x}\). But now we write \[ ds^2 = c^2 dt^2 - a(t)^2 dx^2 \] which is a way of packaging up a combination of difference between the (time and space) coordinates of two nearby events: \(dx^2\) is a shorthand for the sum of the square of the spatial coordinate differences.

So what does this new thing tell us?

If two events happen in the same place, then the time between them is \(ds/c\). If they happen at the same time, then the physical distance between them is \(a(t)\) times the coordinate distance, which is the square root of the sum of the squares of the coordinate differences (i.e. the usual Pythagoras' theorem thing). This really is the physical distance, as in it counts how many meter sticks (or yard sticks if you're old) you'd have to use to join them up.

It's worth noting that over a small enough region of space-time, this is indistinguishable from special relativity: so we really are doing what was hinted at above, by saying that special relativity (great though it is over small regions) might need to be adjusted to deal with large regions.

\(a(t)\) is called the scale factor, and it relates the coordinate distance between two simultaneous events to the spatial distance. It's a function of time, so if we have two objects, both of whose positions are fixed (i.e. their spatial coordinates are unchanging), the distance between them is nevertheless changing as time passes. This isn't because they're moving apart (or together, for that matter), but because the notion of distance is now something that depends on time.

Let's take a look at this: suppose we have two objects, at fixed spatial locations, so the coordinate distance between them, \(d_C\) is unchanging. The physical distance between then is \(d_P = a(t) d_C\). Now we can think about how fast \(d_P\) is changing. We have \[ \dot{d}_P = \dot{a}(t) \times d_C = \frac{\dot{a}(t)}{a(t)} \times a(t) d_C = \frac{\dot{a}(t)}{a(t)} \times d_P. \] So the rate of change of the distance between these fixed objects is proportional to the distance between them, and the Hubble parameter is \(\dot{a}/a\).
When \(a(t)\) is increasing, we loosely (and maybe misleadingly) call this expansion of space, or even worse, expansion of the universe.

OK, we have Hubble's law again, in the sense that the dust particles are all moving apart with a rate proportional to separation. And again, the universe is always full of these dust particles. But now the particles are stationary (meaning that their spatial coordinates don't change) and geometry is evolving so that the distance between them is changing.

But now we get more.

We also find that particles at fixed coordinates are not accelerating, so no force is required to get this effect of everybody seeing all the material in the universe obeying Hubble's law.

Red shift and recession

But there's something fishy about all this.

The original observation that gives rise to Hubble's law is obtained from measurements of red shift, interpreted as relative velocity. What does all that mean if everything is 'staying still' while 'geometry evolves'?

At this point we have to take into account that when we look at things a long way away, we are seeing them as they were a long time ago. In fact, we're seeing them when the scale factor was a bit different.

Amazingly, there's a geometric effect which exactly (well, almost exactly) fits the required bill.

It turns out that if a light signal is emitted at time \(t_e\) and received at time \(t_r\), then the signal is red-shifted; the ratio of the wavelength of the emitted signal to that of the received one is the ratio of the scale factor at these two times. This is called cosmological red shift and is an entirely geometric effect.

How does this fit in with Hubble's observation, though?

If you consider what an observer sees when looking out at a distance object, the cosmological redshift matches (to a very high degree of approximation) the redshift that you would get if there were no cosmological redshift, and the redshift was due to a Doppler effect from a velocity that's just the same as the rate of change of physical distance.

This is even more amazing than the fact that Hubble's law pops out.

Actually, this isn't exactly true. It is, though, true to a first degree of approximation, and it's very hard to measure distances and redshifts accurately enough to see any deviation. More importantly, it doesn't affect the result that the dependence of rate of recession (really redshift) on distance is the same no matter where in the universe you live.

So how does the scale factor evolve?

Up at the top, I was unhappy with the Newtonian approach because it had some mysterious field of force pushing stuff apart. I seem to have just replaced that with a mysterious scale factor that does the same job.

Not so. In the same way as Newtonian gravity tells us how massive bodies pull on each other, Einsteinian gravity has an equation of motion that tell us how \(a(t)\) behaves.

In Einsteinian gravity, there is the metric (and we've restricted attention to a particular class of metric already) and there is a field equation (the Einstein Field Equation, or EFE) which relates how the metric varies in space and time to the matter filling the universe. In order to work out how \(a(t)\) evolves, we need to have a model for the material that fills the universe.

This model comes in two parts.

First, there's the general model. The simplest thing that has a chance of working is a perfect fluid: so we're considering galactic clusters as parcels of a compressible fluid which has negligible viscosity.

Then, there's the specific model. A fluid has a density and a pressure, and a functional relation between the two is called an equation of state. There are standard equations of state for describing a universe full of easily compressible dust, or full of electromagnetic radiation.

Once you've chosen an equation of state, you can work out what the EFE tells us: and this is a differential equation for \(a(t)\), which can then be solved. One part of constructing a good cosmological model is deciding just what equation of state to use for your model.
Let's not go there.

The point is that \(a(t)\) isn't arbitrary; it comes out of the physical model.

In fact, all this theory was worked out by Friedmann decades before Hubble's observational work.

It's rare in science for somebody to manage to predict something before it actually happens, so stop for a moment to be impressed.

What have we got for our effort?

By investing some effort in having a model of space-time in which the geometry itself is dynamic, we get a universe in which Hubble's law is satisfied, but most of the ingredients are now interpreted in a new way. In particular, the recession of distant galaxies is no longer due to them moving, but now due to the fact that the definition of distance itself depends on time in such a way that the distance between stationary objects can be changing in time.

Have I missed anything out?

Oh boy, have I ever.

I've missed out

all the controversy in the interpretation of distance/velocity measurements.
just about all the calculations that justify the claims I make. Nothing new there.
what the Einstein Field Equations say. That's a big topic just in its own right, and it's a bit technical.
what the matter model of a perfect fluid actually looks like.
any discussion of the curvature of space. (I've only considered the simplest case.)
and much, much more.

Fortunately, there's lots of excellent material out there. If all I've done is whet your appetite for more, I'm happy.

Friday, 18 May 2018

It's all the same to me.

Mathematics is the subject, probably more than any other, that we associate with precision. And yet a particularly powerful tool in mathematics consists of a fruitful failure to distinguish between objects.
The basic idea is that we have a set of objects, but rather than thinking of them all as separate, we partition the set up into a collection of subsets which, between them, cover the entire set, and no two of which have any element in common. These subsets are called equivalence classes, and an element of an equivalence class is a representative. The set of equivalence classes itself is called a quotient space.
And most of the time, it's the quotient space that's the interesting thing, not the equivalence classes themselves. It can be all too easy to lose sight of this if one is buried in the detail of showing that a given relation between elements of a set does this job of partitioning into equivalence classes, and so is an equivalence relation.

Some Examples

A simple example is to partition up positive integers, as normally expressed in base ten, into classes of length; 1 digit numbers, 2 digit number, 3 digit numbers, and so on.
But this isn't very interesting. There's not much useful structure in this quotient space for us to play with. For example, if we add together two 1 digit numbers, sometimes the answer is 1 digit long, and sometimes it is 2 digits long.
On the other hand there are others, just as familiar, but more interesting.

Parity

Partition up integers (all of them this time, not just positive ones) into odd and even. This time things are very different. The quotient space consists of just two classes, which we can call ODD and EVEN. If we add together two odd numbers, the answer is even, whatever odd numbers we start with. Similarly if we add an odd and an even, or multiply.
The key here is that we do the arithmetic on two representatives, but no matter which representatives we pick, the answer is in the same equivalence class. This means that the quotient space \(\{\textrm{ODD},\textrm{EVEN}\}\) inherits operations of addition and multiplication from the set of all integers, \(\mathbb{Z}\).
There's another example, just as familiar, but the source of well-known problems.

Fractions

Two pairs of integers, \((m,n)\) and \((M,N)\) (where \(n\) and \(N\) are non-zero) are equivalent if \(mN=Mn\). This partitions the set of all pairs of integers (where the second integer is not zero) in the same way as being odd or even partitions the set of all integers. Secretly I'm thinking of the first pair as the fraction \(\frac{m}{n}\) and the second pair as the fraction\(\frac{M}{N}\).
This relationship - this equivalence relation - captures exactly the notion that the ratio \(m:n\) is the same as the ratio \(M:N\), in other words that that the two fractions represent the same rational number. We can think of each equivalence class as a rational number, and a pair of integers, i.e. a fraction, in the equivalence class, as a representative.
Then the usual arithmetic of fractions gives a way of adding and multiplying these equivalence classes, so that the equivalence class of the result doesn't depend on the choice of representatives of the classes being added or multiplied. In other words the rules for doing arithmetic with (and simplifying) fractions gives the arithmetic of rational numbers.
I can't shake the suspicion that the core of why schoolchildren have so much difficulty getting to grips with fractions is this idea of different representatives for the same rational number. It's a subtle idea, and it takes quite a lot of getting used to.

Going further

These ideas can be extended in various ways, some more familiar than others.

Congruence arithmetic

In the case of odd and even numbers, although it isn't the way we tend to think about it, we're regarding two numbers as equivalent if their difference is a multiple of \(2\).
Of course, there's nothing special about \(2\). We could pick any number to form the equivalence classes and it would work in just the same way. If we used \(3\), then sensible representatives are \(0,1,2\); if we used \(4\) then \(0,1,2,3\), and so on. If we are being careful, we use a notation such as \([0]\) to denot the equivalence class that \(0\) lives in. To save wear and tear on our fingers and keyboards, we can rather naughtily just use these representative numbers to denote the equivalence classes.
If we obtain our equivalence classes by using \(n\), then we call the resulting quotient space \(\mathbb{Z}_n\).
Actually, there is something a bit special about \(2\). If we multiply together two elements of \(\mathbb{Z}_2\), then the result is only \(0\) if one of the elements is. This is also the case with \(\mathbb{Z}_3\).
But with \(\mathbb{Z}_4\), we see that \(2 \times 2 = 0\).
OK, so from what I've written so far, it might be \(4\) that's special.
Really, what's special - or at least, well-behaved - is \(\mathbb{Z}_p\), where \(p\) is a prime. In this case, a product can only be zero if one of the multiplicands is zero. In fact, if we work with a prime, then we get a structure that is actually better behaved than \(\mathbb{Z}\), the set of all integers. We can divide by any non-zero element, in the sense that if \(a\) is not \(0\), then there is a unique solution to \(ax=1\); and multiplying by this number does just the job of dividing by \(a\). In the jargon, \(\mathbb{Z}_p\) is a field, which basically means that has a well-behaved addition, subtraction, multiplication and division.
This results in both fun and profit: the resulting number systems are fun to play with, and form the mathematical basis for some of the contemporary forms of cryptography.

Complex Numbers

We can use the same ideas to construct the complex numbers without inventing a new kind of 'imaginary' number whose square is negative.
The starting point is to consider the set of all real polynomials, and then to consider two polynomials as equivalent if their difference is a multiple of \(x^2+1\).
Then one (and the useful) representative for any polynomial is the remainder when it is divided by \(x^2+1\), so the representatives are of the form \(a+bx\) where \(a,b\) are real numbers. But then \(x^2 = 1\times(x^2+1)-1=-1\), where I misuse '\(=\)' to mean 'is equivalent to'. Then the arithmetic of these objects looks just like normal complex arithmetic with \(i^2=-1\), except that I have an \(x\) where you might expect to see an \(i\).
To phrase that slightly differently, the complex numbers, \(\mathbb{C}\), are the quotient space obtained by regarding two polynomials as equivalent if they differ by a multiple of \(x^2+1\).
This doesn't give us any algebra we didn't have before, but it does show that there is a perfectly respectable way of constructing the complex numbers out of non-problematic objects. We don't have to postulate some new kind of number whose square is negative, and then worry about whether we have accidentally made mathematics inconsistent. Well, at least we can be sure that complex arithmetic is no more inconsistent than real arithmetic.
And just as there is nothing special about \(2\), there is nothing special about \(1+x^2\); we could use any polynomial here.
In fact, the similarity to \(\mathbb{Z}_n\) is profound.
Just as with the integers, if the polynomial we work with cannot be expressed as a product of lower degree polynomials, we can add, subtract and multiply any pair of polynomials, and also divide by any non-zero one.
This idea lets us construct a whole new collection of algebraic structures, including in particular the finite fields, which are important in aspects of statistical experiment design and error correcting code.

The wild blue yonder

It doesn't stop here. Examples of naturally occurring partitions and quotient spaces abound thoughout mathematics, in geometry, algebra and analysis.

Linear algebra

If \(V\) is a vector space, and \(S\) is a subspace of \(V\), then we can think of two vectors \(u\) and \(v\) as equivalent if \(u-v \in S\). This relationship partitions \(V\) up into subsets that look like copies of \(S\) obtained by translation away from the origin, called cosets. The quotient space is again a vector space.
This turns out to be an important construction in analysis, where \(V\) and \(S\) are spaces of functions.

Group theory

If \(G\) is a group and \(H\) is a subgroup of \(G\), the we can say that two elements \(g_1,g_2 \in G\) are equivalent if \(g_1g_2^{-1} \in H\), and again this gives us a partition of \(G\) into cosets. This time it isn't automatic that the quotient space gets a well-defined multiplication from \(G\); \(H\) has to satisfy a certain condition, called being a normal subgroup.
In the case of Abelian groups, this lets us break a big group down into a kind of tower of other groups, in a way analogous to the prime decomposition of an integer.
It is also important in Galois theory, which relates the structure of certain groups to the solubility of polynomials.

Geometry

We start off with the \(x-y\) plane, and consider two points to be equivalent if their \(x\) and \(y\) coordinates differ by integer amounts. This time the quotient space is a new surface, which we can visualise as a unit square with opposite edges glued together: a familiar structure in certain video games, where leaving the screen to the left, or top, brings you back in at the right, or bottom. In fact, this quotient space is a torus.
Constructing surfaces as quotient spaces is a powerful tool in the attempt to classify all the possible surfaces. And it also suggest the question of when two surfaces should be regarded as 'really the same': another partitioning of all surfaces described in different ways into equivalence classes of 'the same surface, but a different description'.

And so on

This barely scratches the surface. The more mathematics, and the more mathematical structures, you meet, then more you come across interesting quotient spaces.
Googling for 'quotient' plus the thing you are interested in (group, surface, vector space, and many more) will probably return something cool, so don't waste any more time here, go and look.

Wednesday, 2 May 2018

Base n to mod n in one easy step.

We learn how to write numbers (well, integers) in bases other than ten fairly early in our mathematical education. A bit later, we learn how about congruence arithmetic and doing arithmetic modulo any integer \(n \gt 1\). I remember that there was a certain degree of initial confusion between the two, though that was really just about remembering which label went with what: after all, the two are entirely different, aren't they?

As it turns out, not so much.

Let's have a brief reminder of congruence arithmetic, and then see what representing numbers in different bases has to do with it.

Congruence arithmetic modulo \(n\)

The simplest and most familiar form of congruence arithmetic is something we're all familiar with long before we meet congruence arithmetic as a thing.

We all know right from a very early stage that an even number plus an even number is even, an even plus an odd is odd, and so on. But an even number is one which leaves a remainder of zero when it's divided by two, and an odd number is one which leaves a remainder of one. So these odd and even rules are telling us that when we do sums with integers, the remainder on division by two is well behaved with respect to addition and multiplication. Or in the hieratic jargon, we can do arithmetic modulo two.

It's not so obvious that this works with numbers other than two, but it's just as true.

Choose any integer \(n>1\). We say that two integers, \(a\) and \(b\) are congruent modulo \(n\) if their difference is a multiple of \(n\), or, equivalently, if they have the same remainder on division by \(n\).

Then with a bit of work, we can show that arithmetic modulo \(n\) makes sense: it doesn't matter what numbers we start with, the remainder after doing an arithmetic operation only depends on the remainders of the numbers we combine.

This takes a little bit of algebra. On the other hand...

Arithmetic modulo ten and modulo two

If we carry out the usual addition and multiplication operations of integer arithmetic, writing all numbers in base ten, then it is obvious (as long as we believe the standard algorithms!) that the units digit of a sum or product depends only on the units digits of the numbers multiplied or added.

But the units digit of a number (written in base ten) is just the remainder when you divide by ten! Looking only at the units digits is doing arithmetic modulo ten.

In just the same way, if we write our numbers in binary, then the units digit of a sum or product depends only on the units digits of the numbers combined. And the units digits is \(1\) for an odd number and \(0\) for an even one.

Looking only at the units digits is now doing arithmetic modulo two.

Arithmetic modulo \(n\) again

But there's nothing special about two or ten (at least, not in this regard).

If we choose any integer \(n \gt 1\), and write integers in base \(n\), then the units digit of a sum or product depends only on the units digits of the numbers combined.

And now, looking at the units digits is working modulo \(n\).

So although base \(n\) and modulo \(n\) look like entirely different ideas, they turn out to be rather closely related.

What's the point?

I'm not entirely sure. Thinking about last digits in base \(n\) might be an interesting way to approach congruence arithmetic, as it comes from a more concrete direction. It also suggests various investigative projects, such as when the last digit of a product can be zero, or how negative numbers fit into this picture. For me, the point was really that there is actually a real relationship between these two apparently different bits of maths, and it's always fun to see connections like that.

Monday, 9 April 2018

The two child puzzle

How careful do you have to be with probability problems? Very careful. Very, very careful. Lack of enough care leads to (apparent) paradoxes.

The puzzle

So, here's the puzzle. It's something of a classic in probability.

I tell you that John Smith has two children. I also tell you that one of them is a girl. I then ask you for the probability that the other is a girl.

Obviously, we need some ground rules here. We'll work with the slight simplification that boys and girls are equally likely, and that the probabilities of boy and girl are independent.

A plausible answer, and maybe the one I'm hoping for, is that since boys and girls are equally likely, the probability that the other is a girl is \(1/2\).

Let's pretend that was your answer, because now I get the chance to show you something interesting.

Ah-hah! I pounce.

No, you aren't thinking it through carefully enough. If you know that one child is a girl, then all you know about the children is that they aren't both boys. There are four equally likely sets of children: boy-boy, boy-girl, girl-boy and girl-girl. Since we've exclude boy-boy, then we are equally likely to have any of the remaining possibilities. In one third of them is the other child a girl, so the probability is actually \(1/3\).

I can dress this up more formally. \(A\) is the event John Smith has a daughter and \(B\) is the event John Smith has two daughters. I want \(P(B|A)\), the probability of \(B\) given \(A\). Fortunately, there's a formula for that. Since \[ P(A \cap B) = P(B|A)P(A) \] and \(P(A \cap B)=P(B)\) (since \(A \cap B = B\)), I see that \[ P(B|A) = \frac{P(B)}{P(A)} \] I know that \(P(B)=1/4\) and \(P(A)=3/4\), so (just as I said above), \(P(B|A)=1/3\).

This seems strange, and counter-intuitive. But I'm pretty sure I didn't get it wrong, because I wrote a little bit of code and simulated it: the following code randomly creates \(n\) two-child families, and finds the fraction of the time that a family with at least one daughter actually has two.

import random

def sim(n):
    hasGirl=0
    hasTwoGirls=0
    for ii in range(n):   
        kids=[random.randrange(2),random.randrange(2)]
        if sum(kids)>0:
            hasGirl += 1
        if sum(kids)==2:
            hasTwoGirls+=1
    print(1.0*hasTwoGirls/(1.0*hasGirl))

Running this for moderately large \(n\) certainly suggests that the probability is \(1/3\).

How much more convincing can you get?

But then I read this book, Standard Deviations by Gary Smith. It's an excellent exposition of how statistics and data can be misused to mislead. I was thoroughly enjoying it until...

He considers this problem of the two children and demolishes the above argument.

The puzzle redux

John Smith has two children, and takes one of them for a walk. If the child that he took for a walk was a girl, what is the probability that the other child is also a girl?

"Ooh, I know this one," I thought. "It's a third."

And then I read on. Smith carefully and lucidly points one that this is wrong, and that the probability actually is \(1/2\).

Naturally I took personal offence at this, so I read it carefully a few times, trying to find his mistake.

I didn't find one.

So I did the obvious thing. I worked out the probability carefully.

There are two things I'm interested in here. \(A\) is the event The child John Smith took a walk with is a daughter and \(B\) is the event Both of John Smith's children are girls.

I want to know \(P(B|A)\). Just as before, I have \(A \cap B = B\), so that \[ P(B|A) = \frac{P(A \cap B)}{P(B)}=\frac{P(B)}{P(A)} \] But \(P(B) = 1/4\) and \(P(A)=1/2\), so \(P(B|A)=1/2\).

By now I was feeling decidedly uncomfortable, so I did the next obvious thing. I wrote a little bit of code and simulated the situation where a randomly chose child from a randomly chosen family is taken for a walk \(n\) times, and finds what proportion of the time the child taken for a walk is a girl, she has a sister at home:

import random

def walksim(n):
    girlWalk=0
    hasTwoGirls=0
    for ii in range(n):
        kids=[random.randrange(2),random.randrange(2)]
        if kids[random.randrange(2)]==1:
            girlWalk += 1
            if sum(kids)==2:
                hasTwoGirls+=1
    print((1.0*hasTwoGirls)/(1.0*girlWalk))

Running this was moderately large values of \(n\) convinced me that yes, indeed, the probability was \(1/2\).

By now I was thoroughly confused.

So What's Going On?

According to a quotation popularly attributed to Einstein, it's a mark of lunacy to keep asking the same question, hoping for a different answer. So naturally, I was a little concerned about just what it might mean if you managed to actually get a different answer.

So I thought about it a bit harder, and came to finally came to a (in retrospect, obvious) conclusion: they aren't actually the same puzzle.

So what's the difference? In version one, you're told that one of John Smith's children is a daughter. In version two, you're told that John Smith has taken one of his children for a walk, and that child is a daughter. Either way, we've been told that one of John Smith's children is a daughter.

But we haven't actually been given quite the same information.

Version 1 simply guarantees that at least one child is a daughter; version 2 tells us that the particular child that John Smith took for a walk is a daughter. The possibility that the child left at home is a daughter is excluded from the calculation in version 2, but included in version 1, which is why the probability that both are daughters is lower in version 1 than in version 2.

So the whole confusion boils down to what you think it means to be told that One of John Smith's children is a daughter. If you interpret it as saying no more than John Smith has at least one daughter, then the answer is \(1/3\). But if you interpret as saying that one particular child is a daughter (the older one, or the one standing on the left, or the only one who has ever been locked alone in the cupboard under the stairs for misbehaving, or, indeed, the one that he took for a walk that day) then the answer is \(1/2\).

To the extend that there's a disagreement here, it isn't an internal disagreement in probability theory (which would be a Very Bad Thing Indeed), it's a disagreement of interpretation of what One of John Smith's children is a daughter means.

The moral

As is often the case, the problem isn't the mathematical analysis: in each version that was straightforward. The problem is setting up the right mathematical model for the situation, which can be very tricky if there's any linguistic ambiguity in the description.

Sunday, 25 March 2018

Cantor and infinity in a countable universe.

Cantor's theorem states that the cardinality of a set is always strictly less than the cardinality of its power set. But there is a universe for set theory which is countable. What's going on here?

There's a really nice PBS Infinite Series video on Youtube How Big are all Infinities Combined, exploring the sizes of infinite sets. (If you haven't watched it, this would be a great time. I'll wait.) In this video they make heavy use of Cantor's theorem, which says

Given any set \(S\), its power set \(\mathcal{P}(S)\) is bigger than \(S\)

where \(\mathcal{P}(S)\) is the set of all subsets of \(S\).

The amazing thing about this is that it is true even when \(S\) is infinite, though we have to work a little to say what we mean by bigger than here: what we mean is that there is no injective function \(\mathcal{P}(S) \to S\), though there is an obvious injective function \(i:S \to \mathcal{P}(S)\), given by \(i(x)=\{x\}\) for each \(x \in S\).

It's a non-trivial fact (the Cantor-Schröder-Bernstein theorem) that given two sets \(A\) and \(B\), if there is an injection from \(A\) to \(B\) and an injection from \(B\) to \(A\), then there is a bijection from \(A\) to \(B\); so it makes sense to say that \(A\) is smaller than (or equal to) \(B\) in size if there is an injection from \(A\) to \(B\), but none from \(B\) to \(A\).

This immediately tells us that there is a whole tower of infinite sets that we can make; starting with the natural numbers \(\mathbb{N}\), then we can think about \(\mathcal{P}(\mathbb{N})\), \(\mathcal{P}(\mathcal{P}(\mathbb{N})\), and so on, each of them strictly larger than its predecessor.

It's not very hard to prove that the second of these, \(\mathcal{P}(\mathbb{N})\), is the same size as \(\mathbb{R}\). It is an interesting question whether there is any infinite size between that of \(\mathbb{N}\) and \(\mathbb{R}\), but one which I won't consider here.

In the video (you have watched it now, right?) the question is how big all the infinities combined is.

But there's even an issue with what it means for one infinite set to be bigger than another, and that's what I want to explore here.

The problem is, to a large extent, knowing just what sets and functions are.

It's easy enough to decide what a function is if you know what a set is.

If \(A\) and \(B\) are sets, we think informally as a function \(A \to B\) as a rule that assigns to each element of \(A\) an element of \(B\). We can formalise this by saying that a function \(A \to B\) is a subset of \(A \times B\) (the set of all ordered pairs \((a,b)\) where \(a \in A\) and \(b \in B\)) which has the property that each element of \(A\) appears in exactly one of the ordered pairs.

But what's a set?

It would be really nice if we could just say that a set was a (somehow) well-defined collection of objects. But trying to do without some kind of restriction on how we define the sets tends to lead to real problems, the most famous of which is Russell's paradox:

Let \(X\) be the set of all sets which are not elements of themselves. Then if \(X \in X\), by the definition of \(X\), \(X \not \in X\). But conversely, if \(X \not \in X\) then again by the definition of \(X\), \(X \in X\).

This is a genuine paradox in the world of naive set theory, so we need to be a bit more careful about what counts as a description of the elements comprising a set if we want to avoid problems like this.

The (ok, a) solution is to develop axiomatic set theory, and one particularly popular version of that is ZFC (which stands for Zermelo-Fraenkel plus the Axiom of Choice). I won't explain just how ZFC works here, but in essence it is a set of rules that tell us how we can build new sets out of existing ones, and a couple of particular sets that we insist on (so that we can build others out of them). One of the sets we insist on is the set of von Neumann integers, so \(\mathbb{N}\) is definitely in in there.

There's always the question of whether our axioms are consistent. Most mathematicians are happy that the ZFC axioms are in fact consistent, partly because we can describe a collection of objects (which we call sets) that seem to satisfy all the axioms. The study of particular collections of objects and operations on them which satisfy particular sets of axioms is called model theory, and it throws up some surprising results.

We'll call a collection of sets that satisfy the axioms of ZFC a universe.

There is a fundamental theorem of model theory, the downward Löwenheim-Skolem theorem that tells us that there is a countable universe (yes, the entire universe is countable) that satisfies the axioms of ZFC.

But there's clearly something fishy about this. Cantor's theorem also follows from these axioms of set theory, and that tells us that the power set of \(\mathbb{N}\) is uncountable.

How does an uncountable set fit into a countable universe? This is Skolem's paradox.

And this is where the issue of just what is a set rears up in all its glory and horror.

Let's review the situation.

The entire universe is countable, so in this model of set theory there are only countably many subsets of \(\mathbb{N}\). (This model clearly fails to include most of the subsets that we might think ought to be there.) Since the universe is countable and \(\mathcal{P}(\mathbb{N})\) is countable, there is a bijection between \(\mathbb{N}\) and \(\mathcal{P}(\mathbb{N})\).

But Cantor has proved that there can be no such bijection.

But remember what a function \(A \to B\) is: it is a subset of \(A \times B\) satisfying a certain constraint, namely that every element of \(A\) occurs in exactly on of the ordered pairs.

We know that the \(\mathcal{P}(\mathbb{N})\) in this universe is countable, so there is a collection of ordered pairs \((n,X)\), one for each natural number \(n\) and such that every \(X \subseteq \mathbb{N}\) in this universe appears in the collection.

The escape clause is the one that tells us that this collection is not—cannot be—one of the sets in the universe.

Looking at the universe from outside, we see a function—what we might call an external function—giving a bijection between \(\mathbb{N}\) and this universe's model of \(\mathcal{P}(\mathbb{N})\).

But if we live inside the universe, we can't see this: it doesn't exist in the universe. There is no internal function which matches the natural numbers with the sets of natural numbers.

You may well have an objection to raise at this point: the real numbers are a complete ordered field, and it's also a theorem that this defines them up to isomorphism. And we know that the real numbers (at least in the mathematical universe we're accustomed to) really are uncountable. So how do they fit into this countable universe?

The answer to this is unavoidably a little technical.

The completeness axiom for the real numbers says something about all subsets of \(\mathbb{R}\); this is a second order axiom. The axioms of ZFC aren't powerful enough for this type of operation. They are first order, which (roughly speaking) means that they let you say something about all elements of a set, but not all subsets of it.

If you use the first order version of the axioms for a complete ordered field, there is more than one structure satisfying the axioms. Some are countable (as must be the case in the countable universe of ZFC guaranteed by the Löwenheim-Skolem Theorem), and some are larger (indeed there are models of all infinite cardinalities).

So all the above is true if we use the power of first order axioms to specify what a set is. If we allow second order axioms, so that we can use the notion of 'all subsets', then the game changes. Then why don't we use some kind of second order theory, and avoid this kind of nonsense?

It's very tempting to do this. Unfortunately, there are problems with second order theories too. Though it solves some of the problems, for example tying down the real numbers much more specifically, the notion of proof in a second order theory is not as well behaved as in first order theory. Consequently, logicians who work on the foundations of mathematics explore the strengths and weaknesses of first and second order theories.

And what about the normal 'working mathematicians'? Most aren't particularly concerned with these foundational issues: and it's quite possible to go through a degree, a PhD, and a research career in mathematics without meeting or having any reason to care about them. In any given context, most mathematicians simply use first or second order axioms according to which are more convenient.

Acknowledgement

This post was sparked off by the Infinite Series video How Big are all Infinities Combined. Gabe Perez-Giz (@fizziksgabe) also bears some responsibility, since when I made a brief comment on twitter about this he asked if I had a blog post about it. Well, now I do. Thanks also to James Propp (@JimPropp) for some helpful suggestions on presentation.