Monday, September 17, 2018

What's a number? 3 - in an orderly manner

Why is infinity plus 1 bigger than 1 plus infinity? What about infinity times 2 and 2 times infinity? It all depends, on course, on what you mean by plus, times, and infinity.

Back to the beginning

In a previous blog post I discussed the natural numbers (by which I mean the non-negative integers, though some people exclude \(0\)). To recap briefly, we can describe by these means of how they behave, using the Peano axioms, and also give a concrete realization of them, namely the von Neumann numbers.
These von Neumann numbers are given by starting with the empty set \(\emptyset\), which is \(0\), then definining a successor function by saying that the successor of any number \(n\) is the set \(s(n)=n ; \{n\}\), and then collecting together everything you can make that way. Then \(1=s(0)=\{0\}, 2=s(1)=\{0,1\}\) and so on.
With this notion of a number, we can define the usual addition and multiplication; we also have an order, so \(m \leq n\) if there is some \(p\) such that \(n=m+p\).
I already looked at how to build on these by extending them in various ways to include negative numbers, rational numbers, and finally complete them with the irrational numbers to form \(\mathbb{R}\), the set of real numbers.
This time I want to go back to these natural numbers, and build on them in a different way: this time by taking the ordering on them, and seeing how we can enlarge our set of numbers in an interesting way using this idea.
But first, there's going to be a bit of a digression, in which we'll see some general ideas involving ordering.

(Well) ordered sets

We say that a set, \(S\) is ordered if there's a relationship \(\leq\) on it which has the properties
  1. If \(a,b,c \in S\) then \(a \leq b\) and \(b \leq c\) implies \(a \leq c\)
  2. If \(a,b \in S\) and \(a \leq b\) and \(b \leq a\) then \(a=b\)
  3. If \(a,b\in S\) then \(a \leq b\) or \(b \leq a\) (or both)
So clearly \(\mathbb{N}\), the set of natural numbers, is ordered, with \(\leq\) being the usual less than or equal to.
The same is true for \(\mathbb{Q}\) and \(\mathbb{R}\), the sets of rational and real numbers.
It's natural (and sensible) to say that two sets \(A,B\) are the same size if they can be matched up so that each element of one set is matched to a different one of the other: such a matching is a bijective function \(f:A \to B\).
But with ordered sets we have another ingredient: we say that two ordered sets \(A\) and \(B\) are similar, written \(A \sim B\), if there is a bijective function \(f:A \to B\) which preserves the ordering, so for any \(a_1,a_2 \in A\), \(f(a_1) \leq f(a_2)\) is equivalent to \(a_1 \leq a_2\).
It's not hard to show that there's bijection between the natural numbers and the non-negative rationals. It's a bit harder to see that the two sets aren't similar, but if you notice that if you choose any positive rational number, there are infinitely many smaller ones, but there are only finitely many natural numbers smaller than any given natural number, then you can see it's pretty hopeless to try to find a similarity.
In fact, the natural numbers have another important property. In particular, they have the property that any set of natural numbers contains a smallest element. An ordered set with this property is called a well-ordered set
Neither the set of rational numbers nor the set of real numbers is well-ordered (with the usual ordering): the set of all rational or reals greater than \(0\) has no least element. So this is a bit special.
Well-ordered sets are pretty special: they behave a lot like numbers.
But we have to do a little bit or work here.
First, if \(A\) is a well-ordered set, and \(a \in A\), then \(i(a)\) is the set of all elements of \(A\) less than \((a)\), and is called the initial segment of \(a\). (It's common to use \(s(a)\) for the initial segment of \(a\), but I'm already using that for the successor.)
For example, in \(\mathbb{N}\), the intial segment of \(3\) is \(\{0,1,2\}\).
This is actually a bit of a mind-expander. In the von Neumann numbers, \(3\) is, by definition, \(\{0,1,2\}\), so \(i(3)=3\), and in general, for any \(n \in \mathbb{N}\), we have \(i(n)=n\). You might want to take a break and walk around the block, or do some pressups, or something, until that settles in. I know it took me a while to get used to the idea.
We think of two similar well-ordered sets as being ''really the same''; they are representations of the same ordinal number. (This is what we mean by ordinal number, but I may sometimes forget myself and talk about a particular ordered set as an ordinal number, rather than as a representation of on.)
For example, the sets of integers \(3=\{0,1,2\}\), \(\{1,2,3\}\) and \(\{2,4,7\}\) are all similar, and we can say that each represents the ordinal \(\pmb{3}\).
Now comes the miracle.
If \(A\) and \(B\) are well-ordered sets, then exactly one of the following is true:
  1. \(A\) is similar to an initial segment of \(B\)
  2. \(B\) is similar to an initial segment of \(A\)
  3. \(A\) is similar to \(B\).
It then turns out that this notion is similar to all of or to an initial segment of can be thought of as \(\le\), and ordinal numbers are themselves well-ordered. (Though you have to be a little bit careful about this for rather technical reasons.)
Just as with the usual numbers, we say \(A \lt B\) if \(A \leq B\), and \(A \neq B\), which is saying that \(A\) is an initial segment of \(B\).
And now we can start to do something with this machinery. We'll think a bit about how these ordinals extend the von Neumann numbers.

The first infinite number, \(\omega\)

So first of all, every von Neumann number is a well-ordered set, so represents an ordinal. But (and this is the point) there are more. In particular, the set of all natural numbers is also well-ordered, and so represents an ordinal. We call this ordinal \(\omega\).
What can we say about \(\omega\)?
The first thing is, every von Neumann number is less than it, because every von Neumann number is an initial segment.
Second, it's the smallest ordinal which is bigger than all the von Neumann numbers. For suppose that \(\alpha\) is also an ordinal which is bigger than all the von Neumann numbers, and \(\alpha \leq \omega\). Then \(\alpha\) must be an initial segment of \(\omega\); but the initial segments of \(\omega\) are just the von Neumann numbers themselves and \(\omega\). If \(\alpha\) is a von Neumann number, then it is not bigger than all the von Neumann numbers. The only alternative is that \(\alpha=\omega\).
So we can see that \(\omega\) is the smallest infinite ordinal.
There are lots of different representations of \(\omega\): here are a few: \[ \begin{split} &\{1,2,3,\ldots\}\\ &\{1,2,3,4,\ldots\}\\ &\{1,3,5,7,\ldots\}\\ &\{0,2,4,6,\ldots\} \end{split} \] Each of these is similar to \(\mathbb{N}\), and so is a representative of \(\omega\).
Although \(\omega\) is not itself an von Neumann number, it is very like one. In fact, ordinals are sufficiently like von Neumann numbers that we can do a lot of the same stuff as we can with the natural numbers, and in particular we can do arithmetic.
Before going on to look at addition and multiplication in general, let's just think about the successor, i.e what we get when we add \(1\) to a number.
With the finite (von Neumann) numbers, we define \[ n+1 = s(n) = \{0,1,2,\ldots, n\} \] and there's nothing to stop is doing the same with \(\omega\): \[ s(\omega) = \omega+1 = \{0,1,2,3,\ldots,\omega\} \] and we have to note that in this ordered set, \(\omega\) has infinitely many predecessors, but no immediate predecessor.
And of course, we can do this as often as we like: \[ \omega+2 = \{0,1,2,3,\ldots,\omega, \omega+1\} \] We can even build up the entire collection \[ \{0,1,2,\ldots,\omega,\omega+1,\omega+2,\ldots\} \] which it seems sensible to think of as \(\omega+\omega\), which is the result of doubling \(\omega\).
This all suggests that we might be able to add and multiply ordinals, if we paid enough attention to how to generalise the procedure for for numbers. And we can (or I probably wouldn't have raised the issue). But we have to be careful: there are a few surprised in store.

Arithmetic with ordinals

Let's start off with


It would be nice to use the same definition as with the von Neumann numbers, but that (as it stands) relies on each number other than \(0\) have a predecessor, and we don't have that any more.
We have to generalise the idea somehow. There is more than one way to do this, and I'm just going to look at the one that I find easiest to understand.
So, if \(\alpha\) and \(\beta\) are ordinals, maybe finite, maybe note, then here's one way we can define their sum.
We take well-ordered sets \(A\) and \(B\) corresponding to \(\alpha\) and \(\beta\), and define their ordered union to be the ordered set \(A ; B\) which contains all the elements of \(A\) followed by all the elements of \(B\). This ordered set is (represents) \(\alpha+\beta\).
It's important to note here than when I write these out, it's the ordering given by the listing of the elements that matters, so \[ 3+4=\{0,1,2\} ; \{0,1,2,3\} = \{0,1,2,0,1,2,3\} \sim \{0,1,2,3,4,5,6\} = 7 \]
It's not hard to convince yourself that this gives the right answer when \(\alpha\) and \(\beta\) are ordinary finite numbers, so this does generalise the usual addition. There's the slight technicality that we should check that this ordered union produces a well-ordered set: but it does.
You can also see straight from the definition that if \(\alpha,\beta\) and \(\gamma\) are ordinals, then \[ (\alpha+\beta)+\gamma = \alpha+(\beta+\gamma) \] so addition is associative.
Now comes the first surprise: addition of ordinals is not in general commutative. We can see this in the simplest possible case, comparing \(1+\omega\) with \(\omega+1\).
In the first case, since \(1=\{0\}\), we have \[ 1+\omega = \{0,0,1,\ldots\} \sim \{0,1,\dots\} = \omega \] Where the first \(0\) in there comes from \(1\), and precedes the second one, which comes from \(\omega\) (and I'm being a little sloppy by thinking of a particular well-ordered set as the actual ordinal, rather than a representative for it).
On the other hand, \[ \omega+1 = \{0,1,2,\ldots,0\} \sim \{0,1,2,\ldots,\omega\} \] which is definitely not similar to \(\omega\). In fact, since \(\omega\) is an initial segment of this, we know that \(\omega \lt \omega+1\).
And from this definition, we see straight from the definition that \[ \omega+\omega = \{\omega,\omega\}=\{0,1,2,\ldots,0,1,2,\ldots\} \] in agreement with the guess up above.
So, we have a definition for the sum of two ordinals which agrees with what we know for finite ordinals, and still works for infinite ones. But, as often happens when we generalise a good notion, some of the properties are lost - in this case, commutativity.
So, what about multiplication? This can also be done.


As before, we have ordinals \(\alpha\) and \(\beta\), with corresponding well-ordered sets \(A\) and \(B\). This time, instead of an ordered union, we define an ordered Cartesian product.
We have \[ A \times B = \{(a,b) : a \in A, b \in B\} \] and we say that \((a_1,b_1) \lt (a_2,b_2)\) if
  1. \(b_1 \lt b_2\), or
  2. \(b_1=b_2\) and \(a_1 \lt a_2\)
This gives us a copy of \(A\) for each element of \(B\), and the copies are ordered by the ordering of the elements of \(B\).
As you might hope, multiplying by \(1\) leaves ordinals unchanged: if \(\alpha\) is any ordinal, then \[ 1 \times \alpha = \alpha \times 1 = \alpha. \]
As with addition, the multiplication of ordinals is associative, though it's a little harder to see than with addition.
But, also as with addition, commutativity is lost. And as before, it is the simplest example imaginable (well, imaginable to me) which shows us this.
First, we have \[ \begin{split} 2 \times \omega &= \{0,1\} \times \{0,1,2\ldots\} \\ &= \{(0,0),(1,0),(0,1),(1,1), \ldots\}\\ & \sim \{0,1,2,\ldots\}\\ &=\omega \end{split} \] But then \[ \begin{split} \omega \times 2 &= \{0,1,2,\ldots\} \times \{0,1\}\\ &= \{(0,0),(0,1),\ldots, (1,0),(1,1),\ldots\}\\ &\sim \{\omega,\omega\}\\ &=\omega+\omega \end{split} \] So again, we have lost commutativity.
The other property we might hope for is distributivity. Is this preserved? Well, yes and no.
If \(\alpha,\beta\) and \(\gamma\) are ordinals, then (pretty much straight from the definition) \[ \alpha \times (\beta+\gamma) = \alpha \times \beta + \alpha \times \gamma. \]
But on the other hand, \[ \omega = 2 \times \omega = (1+1) \times \omega = 1 \times \omega + 1 \times \omega \neq \omega \] so in general \[ (\alpha+\beta) \times \gamma \neq \alpha \times \gamma + \alpha \times \beta \] So the ordinals have an arithmetic that extends the arithmetic of finite numbers, but we do have to be a bit more careful.

Is there more?

There's more.
For example, you can define a notion of exponentiation, and you can build immense towers of ordinal numbers, each larger than everything that comes before it. There is an inductive principle, called the principle of transfinite induction, which gives a powerful tool for proving theorems involving ordinal numbers.
The behaviour of these objects is rich and fascination. For example, one of my favourite results is that if \(\alpha\) is any ordinal, then any sequence of (strictly) decreasing ordinals starting at \(\alpha\) must reach \(0\) after finitely many terms.
But this has already been more than long enough.
A fairly technical presentation is given by the wiki page. If that's not to your taste, then Rudy Rucker (@rudytheelder) has a novel, White Light, exploring aspects of infinity.

Saturday, July 21, 2018

What is school mathematics for?

What is the purpose of school mathematics? I have opinions… Here I shall mostly be ranting about what I think school mathematics ought to do.

Note that when I talk about what schools actually do, I'm referring to the English educational system: YMMV.

So, on with the show.

To a first approximation, I think that school mathematics should provide three things.

Basic skills

First off, everybody should leave school numerate. They should be arithmetically competent, and be able to cope with percentages, ratios, proportions and the like. They should also have a reasonably well-developed number sense, and be able to make plausible estimates and approximations.

For example, the total cost (say) of a shopping basket shouldn't come as a complete surprise, nor should the cost per head of population of something that costs the country a billion pounds, nor the fraction of one's own disposable income that that amounts to. And how to work out whether a 500g box of cereal or a 750g box is the better deal should not be a mystery (or rely on the shelf label helpfully giving the price per 100g).

This is the kind of stuff that the mythical general public probably think of as mathematics. But it's not the only stuff that you comes in handy in 'real life'.

We also all need some basic statistical understanding, at least we do if we aren't going to have the wool pulled over our eyes all the time. By this I mean understanding data and its presentation, what can be reliably inferred from it, and the nature of uncertainty in what is reported (or predicted!).

This is, I think, a bare minimum of the mathematics that you need to cope with everyday life (without being taken advantage of) and take meaningful part in a democracy (so that your decisions have a fighting chance of being well-informed).

I leave you to decide for yourself whether this utilitarian need is met. (And if it's as much of a Bad Thing as I think it is for it not to be met.)

Intellectual development

Outside that, mathematics is there to provide you with intellectual development.

The purpose of this material is not to provide every pupil with a bag of mathematical techniques which they will employ in day-to-day life or in their jobs. Attempt to persuade them of a utilitarian value are doomed to failure, because they all know adults. The purpose of this material is the same as exposing them to great literature or art; it's to expand their mental horizons.

This shouldn't consist (only) of methods and algorithms to be learned, but try to provide an appreciation of the structure and process of mathematics. Mathematics is, after all, one of the subjects where you don't just have to take somebody else's word for it. You can give a satisfying argument about why something is the case. There's a huge amount here that can be explored with no more than numbers and basic algebra.

For example, nobody should leave school knowing that 'a minus times a minus is a plus' without also knowing that this isn't just an arbitrary rule, that it follows from some more fundamental choices, and that those choices themselves are made for a good reason.

Currently, there is a lot of material in the compulsory maths curriculum for school pupils up to the age of 16. There's a lot of learning to do, and a lot of skill to be acquired. And there's a lot of teachers working extremely hard to help their pupils acquire the knowledge and develop the skills.

I think that the curriculum is drastically overloaded, and encourages the wrong (I did say I have opinions) kind of learning: a procedural, or instrumental, skill based learning which is quicker to achieve, but doesn't provide the deeper understanding that an appreciation of the mathematics requires.

I'd much rather see a much smaller curriculum, with time for a deeper understanding of fewer topics to be developed.

You can even show how if gives an enriched understanding of the world around us.

"How much shorter is a short cut corner to corner compared to going along the pavement round the edge of a rectangular lawn?" is an interesting question, and nobody is pretending that a builder or a plumber needs to use Pythaogoras in everyday life to work out how long a pipe or a rafter has to be.

Note that this isn't going to rob anybody of job opportunities. The requirement to have some arbitrary number of GCSE's (including English and mathematics) at some arbitrary level of achievement isn't selecting the people with the required level of mathematics to be able to do a job—though there may be some hope that it selects out those who can't read, write or count.

Preparation for further study

And of course, we also have to prepare those who might go on to study maths at higher level. What we should be providing here is a basis on which undergraduate study can be built. A good foundation for the more advanced and abstract material covered in a degree course.

The current preparation expected (and required) by Universities is A-level, which includes a large chunk of calculus. I feel in a better position to comment on this, because every year I see a lot of students who have been moderately to very successful at A-level mathematics. As the years have passed, I've become more convinced that this does not provide a particularly good foundation for what we do at university.

Again, the problem seems to be one of an overstuffed curriculum. I see a lot of students who can sort of do a quite a lot of stuff. But a large proportion of them seem to have, again, an entirely instrumental learning: when I see this, I do that.

In fact, some of them by this stage have a fairly well-developed conviction that that is what mathematics is: it's a bag of procedures and algorithms which you learn to apply to the problems that you're been taught how to solve using them. These get downright panicky, and in some cases even resentful, at the idea of having to explain why some mathematical fact is true, or why some procedure works.

Unfortunately for them, there tends to be much more emphasis on why things are true or work in undergraduate maths than they are used to.

So, back to my previous point. They should be arriving with a basis on which further undergraduate study can be developed.

I don't think this is well-served by having a lot of stuff which so many students have learned to do without knowing why it works.

I think it would be better served by having a reduced curriculum, but with a deeper understanding of the material. I'd much rather see students arrive with a strong understanding of algebra, and a decent idea of what constitutes a proof, than see them arrive with an extensive bag of tricks which they don't really understand. OK, it would be even better if they had the deeper understanding of all the stuff they currently meet. But it takes time and effort to develop that understanding. You can't have both.

Yes, that would mean that we have to start calculus from scratch at university. But it's pretty standard to do that anyway (if quite fast) to try to give the deeper understanding that we want; I'm not sure that it would make a big difference to what we do in the first year.

And even if it did, that might not be a terribly Bad Thing.

There's a whole other discussion to be had about what the content of a mathematics degree ought to be, but one point on which we could probably all agree is that very little of the content of a mathematics degree is of great relevance to one's life after graduation. For those who do use mathematics subsequently, whether commercially or academically, the most important thing is likely to be the ability to learn new mathematics efficiently, and to understand what's going on.

And I'd be happy to maintain that this is better done by developing understanding, if necessary at the expense of some extent of coverage.


  • I think I'm pretty safe in these opinions. I might be entirely wrong, but nobody will ever do the experiment (i.e. completely overhaul mathematics education) to find out.
  • For a nice summary of what I called instrumental and deeper learning, there is Richard Skemp's article

    Relational Understanding and Instrumental Understanding, Mathematics Teaching 77 20–26, (1976).

    available here, courtesy of @republicofmath, and for a much more extensive discussion, his book The psychology of learning mathematics

Wednesday, June 27, 2018

Why is subtraction so hard?

Subtraction presents various problems to learners of mathematics, not least with the mechanics of hand calculating the result of subtracting one multi-digit number from another. But that's not the kind of difficulty I'm interested in: I'm more interested here in the difficulties that arise when computations involving several additions and subtractions of integers are involved, or at the slightly more advanced level where algebraic computations involving several additions and subtractions are required. I'm going to take a look at what I think lies underneath the difficulty.

The procedure required to deal with subtraction starts off deceptively simple.

Addition and subtraction are of the same priority, and so a calculation involving a string of additions and subtractions is calculated from right to left: \[ 1-2+3-4-5=-1+3-4-5=2-4-5=-2-5=-7 \] Slightly more complicated, we may have things inside parentheses which consequently have to be evaluated before the result is combined with the rest: \[ 2-(3-4) = 2-(-1) = 3 \] But already we find lots of mistakes being made. It isn't uncommon for the first calculation to turn into some variation on \[ 1-2+3-4-5=1-2+3-(-1)=-1+2=1 \] or for the second to go down the route of \[ 2-(3-4)=2-3-4=-1-4=-5 \]

The root of the problem

In one sense, the root of the problem is combinatorial. There are lots of different ways of combining the various numbers in the calculations, but they don't all give the same answer. There are more ways of getting the wrong answer than the right one.

But that doesn't really get to the bottom of it. Why do learners combine things in the wrong way when they've been given careful explanation and instruction of the right way? (And it's not just novices: errors along these lines persist with undergraduates!)

I think that the real problem is that subtraction is a horrible operation, in many ways.

The horrors of subtraction

  • First, we learn to add together positive integers, or natural numbers. The result of adding together two natural numbers is always a natural number: addition is a well defined binary operation on natural numbers.

    But subtraction isn't.

    When we subtract one natural number from another, the result can be negative. Suddenly a whole new can of worms is opened up, because we need to deal with the arithmetic of negative numbers, which, as we all know, is a paradise of opportunity for sign errors.

  • Next, subtraction isn't commutative. This doesn't seem to be subtle, but let's take a quick look: \[ 2-1=1 \qquad \text{ but } \qquad 1-2 =-1 \] The answers are different, but only by being of opposite signs; and to rub salt in the wound, the second may well be done by subtracting \(1\) from \(2\) and changing the sign. Losing a minus sign is easy for all of us.
  • Even more horrible, subtraction is not associative: \[ (1-2)-3 \neq 1-(2-3) \]

    But we get very used to using associativity and commutativity without even noticing it when adding. It's much harder to remember not to do it when it's inappropriate, once you've become thoroughly accustomed to doing it.

  • And lastly, to cope with the problem we end up with a plethora of rules about combinations of signs. More to remember, often just regarded as more rules to learn. I cannot express in words the horror I feel when I hear students muttering (or even writing out) "FOIL" when doing algebraic calculations; I haven't yet heard "KFC" from my students when working with fractions, but I've seen it on the interwebs often enough to know it's only a matter of time.
Now, there are many rules in algebra and arithmetic, but a relatively small number of principles (axioms) from which they all follow.

Unfortunately, you can't really teach arithmetic to five-year-olds by starting off with the axioms of a field.

But maybe there comes a time to revisit the collection of rules and see where they come from. At least, I always find it illuminating to go back to elementary stuff and see what new insight I might have based on further study.

Avoiding the issue

So here's my approach to a better (I have many opinions, and very few of them are humble) view of the arithmetic and algebra of subtraction.

Uninvent subtraction.

OK, so I haven't gone back in time and relived my education without the notion. What I mean is that I look back and say that after all this time I now see that there's a better way to think of subtraction.

First, any number has \(a\) an additive inverse \(-a\): a number we add to it to get \(0\). Then if \(a\) and \(b\) are two numbers, I say that \[ a-b \] is a (somewhat undesirable, but unavoidable) notation for \[ a+(-b) \] which is the sum of \(a\) and \(-b\), the additive inverse of \(b\).

And at this point I really, really wish that there were a conventional typographical distinction between \(-\), the binary subtraction operator, and \(-\), the unary additive inverse operator. Oh well.

This has the immediate consequence that we can write our sums slightly differently: \[ 1-2+3-4-5 = 1+ (-2) + 3 +(-4) + (-5) \] and these quantities can now be rearranged at will, since addition is associative and commutative.

In the second example, we have \[ 2-(3-4) = 2 + (-(3-4)) \] And what do we add to \(3-4\) to get \(0\)? we add \(4-3=1\), so \(-(3-4)=1\) and \[ 2-(3-4) = 2 + 1 = 3 \]

Of course, it's still necessary to be able to add together combinations of positive and negative numbers: there is no free lunch here. But it's a way of thinking about the computation that reduces the temptation/opportunity to make mistakes, so maybe it's a slightly cheaper lunch.

One consequence is that if I see \[ x+5=7 \] I don't think 'subtract \(5\) from each side', I think 'add \(-5\) to each side'.

I find it a useful way to think about what's going on.

I try to stress this with my students, but with mixed success. And just about everybody who's teaching early undergraduate material will surely be doing something like this in the students' first encounter with abstract algebra and axiomatic systems.

My general impression is that as long as they're doing a set problem of the 'show from the axioms of a field' type, most students can be persuaded to work this way.

But I find that as soon as that context is gone and they're back in the old familiar territory of doing algebra or arithmetic, for most it's also back to the old familiar way of proceeding. And for quite a few this has just too many opportunities for the old familiar way of going wrong.

But also

I have very similar thoughts about division, the reconstruction of which I leave as an exercise for the sufficiently motivated reader.

Tuesday, May 29, 2018

What does it mean to say that the universe is expanding?

Q: The universe is expanding, so what is it expanding into?
A: The question is wrong.
OK, that answer could use some expansion of its own.

Newtonian Cosmology

Here's one version of the story.
Careful measurements (and careful interpretation of the measurements) tell us that the stuff in the universe is getting further apart. What's more, if we look around us, then to a good degree of approximation, everything is travelling away from us, and the farther away it is, the faster it's getting away from us. In fact, the speed at which distant galaxies are receding from us is proportional to how far away they are. We know this because of red shift in the spectra of these galaxies, from which we can calculate how fast they are travelling in order for a Doppler shift to cause that much red shift.
This seems to suggest that we are (surprisingly or not will depend on your religious and philosophical framework) at the centre of the universe; it is perhaps less flattering that everything is trying to get away from us.
Let's start off by thinking about how special our position is. We think of the universe as being full of a homogeneous dust (for scale, the dust particles are clusters of galaxies). In the simplest model, we are at the origin, and at time \(t\) the the dust particle at position \(\pmb{x}\) has velocity \(\pmb{v}=H(t) \pmb{x}\). Then \(H(t)\) is the scale factor which relates distance to velocity of recession, and we call it \(H\) the Hubble parameter, to honour Hubble who made the initial observations.
But what about somebody who isn't at the origin? What about somebody living in a different dust particle, far, far away, say at position \(\pmb{x}'\)? What would they see?
Let's use \(\pmb{v}'\) to denote the velocity as seen by this observer at position \(\pmb{x}'\). We have to subtract their velocity from the velocity we see at each point to get the velocity relative to them at each point. Then we get \[ \pmb{v}' = H(t) \pmb{x} - H(t) \pmb{x}' = H(t)(\pmb{x}-\pmb{x}') \] But \(\pmb{x}-\pmb{x}'\) is just the position as seen by the observer at \(\pmb{x}'\); this other observer also sees everything in the universe recede at a rate proportional to distance, and it's even the same constant of proportionality. Rather suprprisingly, this suggests that we aren't anywhere special, and that the cosmological principle (or principle of mediocrity) that we are nowhere special is compatible with the observations.
A concrete approach to this that may be useful is the currant cake, or raisin bread analogy. If you think of the dust particles as the items of fruit in the cake, then as it cooks and the cake rises, the currants (or raisins) get farther apart, in just his way -at least, if we assume that the rises process amounts to the whole cake scaling up in a uniform way.
But this model is a bit misleading, as there is an edge to the cake, and the cake is expanding into the surrounding oven. This isn't so for the dust-filled universe.
It's important to notice is that the universe is infinite - it's the whole of three dimensional Euclidean space - forever. And it's always full of this cosmological dust: but if we go back in time, we see that the dust particles are getting closer together, so the dust must be getting denser, and if we go forward in time, it is getting more rarefied. But there's no 'edge' to the material, no expanding of anything, even though the dust particles are getting further apart. This is one of the joys of having infinite space: it can be full of stuff, but there's nothing to stop the contents spreading out.
There's a slight problem when we try to push back to \(t=0\), when the density would have to be infinite: but I avoid that by saying that I have a model for a universe that seems compatible with Hubble's observations for all \(t \gt 0\), and I've no idea what kind of Big Bang happened at \(t=0\); that's not part of my model. There's certainly no sense in which the universe started off as a single 'infinitely dense point' and exploded outward.
So, great news! As long as the universe just happens to be full of stuff exploding outwards, we have a rather neat mathematical model compatible with the observations, and all we need is some physics compatible with this model to make it plausible.
Aye, there's the rub.

Problem One

The first problem is a kinematic one. No matter how small \(H(t)\) is, objects sufficiently far away have to be moving faster than light. This is definitely hard to reconcile with special relativity, and special relativity is very well confirmed these days, so don't just want to abandon it and go back to Newtonian mechanics. This cosmological model really is only good for Newtonian mechanics.
We could try to fix it by slowing down the expansion sufficiently far away, so that Hubble's law is only valid over a certain scale: but that gets hard to reconcile with the cosmological principle.
Or we might try to fiddle with special relativity by arguing that it's great at small scales, but needs correction at larger ones.

Problem Two

But even if we only look for a Newtonian model, there is still a serious problem.
We are at the origin of the universe, and we see everything receding from us. But as thing get farther away, they speed up: everything is not only getting farther apart, but is acceleration. So our distant observer, living in a dust particle far, far away, is accelerating. That's an issue because it means that we are, after all, in a special place. Though all observers see a universe expanding in the same way, only we don't feel a force pushing on us to make us accelerate.
This is actually two problems.
There's the actual physical issue of just what is supposed to be doing the pushing. We need a pretty weird model of gravity to be compatible with gravity being attractive (so as to hold solar systems and galaxies together) at one scale, but repulsive (so as to make stuff accelerate apart at the bigger scale).
And there's the philosophical issue of what makes us so special: in this model we are, once again, in the actual centre of the universe.
So, can we find another way of modelling the universe which retains the good property of satisfying Hubble's law, but doesn't have either of the problems of this repulsive force which only becomes significant at large distances, or puts us in a privileged position?

Relativistic cosmology

Well of course we can. As usual, that's the sort of question that only gets asked if there is a good answer available.
But it involves a radically new way of understanding how the universe fits together.

Newtonian space-time

In the Newtonian picture of space-time, with the origin at the centre of expansion, we have spatial position, represented by a position vector \(\pmb{x}\) and time, represented by \(t\). If we have two events \(A\) and \(B\), which happen respective at position \(\pmb{x}_A\) and time \(t_A\), and at position \(\pmb{x}_B\) and time \(t_B\) then the distance between them is the usual Euclidean distance \(\|\pmb{x}_A -\pmb{x}_B\|\), and the time between them is \(|t_A-t_B|\).
This is all very persuasive, and seems to match well to experience.
But it isn't quite right, in ways which don't show up until things are moving fairly fast, at which point we have to worry about special relativity.
Fortunately, there's another way to think about distance and time, which is both compatible with special relativity (in the sense that it is very well approximated by special relativity over small regions) and with the observations of red shift (though for a somewhat subtler reason than you might expect).

Space-time and the metric

In this picture, we don't have separate notions of spatial and temporal distance, but a unified notion. Just as before, every point has a time coordinate \(t\) and space coordinates \(\pmb{x}\). But now we write \[ ds^2 = c^2 dt^2 - a(t)^2 dx^2 \] which is a way of packaging up a combination of difference between the (time and space) coordinates of two nearby events: \(dx^2\) is a shorthand for the sum of the square of the spatial coordinate differences.
So what does this new thing tell us?
If two events happen in the same place, then the time between them is \(ds/c\). If they happen at the same time, then the physical distance between them is \(a(t)\) times the coordinate distance, which is the square root of the sum of the squares of the coordinate differences (i.e. the usual Pythagoras' theorem thing). This really is the physical distance, as in it counts how many meter sticks (or yard sticks if you're old) you'd have to use to join them up.
It's worth noting that over a small enough region of space-time, this is indistinguishable from special relativity: so we really are doing what was hinted at above, by saying that special relativity (great though it is over small regions) might need to be adjusted to deal with large regions.
\(a(t)\) is called the scale factor, and it relates the coordinate distance between two simultaneous events to the spatial distance. It's a function of time, so if we have two objects, both of whose positions are fixed (i.e. their spatial coordinates are unchanging), the distance between them is nevertheless changing as time passes. This isn't because they're moving apart (or together, for that matter), but because the notion of distance is now something that depends on time.
Let's take a look at this: suppose we have two objects, at fixed spatial locations, so the coordinate distance between them, \(d_C\) is unchanging. The physical distance between then is \(d_P = a(t) d_C\). Now we can think about how fast \(d_P\) is changing. We have \[ \dot{d}_P = \dot{a}(t) \times d_C = \frac{\dot{a}(t)}{a(t)} \times a(t) d_C = \frac{\dot{a}(t)}{a(t)} \times d_P. \] So the rate of change of the distance between these fixed objects is proportional to the distance between them, and the Hubble parameter is \(\dot{a}/a\).
When \(a(t)\) is increasing, we loosely (and maybe misleadingly) call this expansion of space, or even worse, expansion of the universe.
OK, we have Hubble's law again, in the sense that the dust particles are all moving apart with a rate proportional to separation. And again, the universe is always full of these dust particles. But now the particles are stationary (meaning that their spatial coordinates don't change) and geometry is evolving so that the distance between them is changing.
But now we get more.
We also find that particles at fixed coordinates are not accelerating, so no force is required to get this effect of everybody seeing all the material in the universe obeying Hubble's law.

Red shift and recession

But there's something fishy about all this.
The original observation that gives rise to Hubble's law is obtained from measurements of red shift, interpreted as relative velocity. What does all that mean if everything is 'staying still' while 'geometry evolves'?
At this point we have to take into account that when we look at things a long way away, we are seeing them as they were a long time ago. In fact, we're seeing them when the scale factor was a bit different.
Amazingly, there's a geometric effect which exactly (well, almost exactly) fits the required bill.
It turns out that if a light signal is emitted at time \(t_e\) and received at time \(t_r\), then the signal is red-shifted; the ratio of the wavelength of the emitted signal to that of the received one is the ratio of the scale factor at these two times. This is called cosmological red shift and is an entirely geometric effect.
How does this fit in with Hubble's observation, though?
If you consider what an observer sees when looking out at a distance object, the cosmological redshift matches (to a very high degree of approximation) the redshift that you would get if there were no cosmological redshift, and the redshift was due to a Doppler effect from a velocity that's just the same as the rate of change of physical distance.
This is even more amazing than the fact that Hubble's law pops out.
Actually, this isn't exactly true. It is, though, true to a first degree of approximation, and it's very hard to measure distances and redshifts accurately enough to see any deviation. More importantly, it doesn't affect the result that the dependence of rate of recession (really redshift) on distance is the same no matter where in the universe you live.

So how does the scale factor evolve?

Up at the top, I was unhappy with the Newtonian approach because it had some mysterious field of force pushing stuff apart. I seem to have just replaced that with a mysterious scale factor that does the same job.
Not so. In the same way as Newtonian gravity tells us how massive bodies pull on each other, Einsteinian gravity has an equation of motion that tell us how \(a(t)\) behaves.
In Einsteinian gravity, there is the metric (and we've restricted attention to a particular class of metric already) and there is a field equation (the Einstein Field Equation, or EFE) which relates how the metric varies in space and time to the matter filling the universe. In order to work out how \(a(t)\) evolves, we need to have a model for the material that fills the universe.
This model comes in two parts.
First, there's the general model. The simplest thing that has a chance of working is a perfect fluid: so we're considering galactic clusters as parcels of a compressible fluid which has negligible viscosity.
Then, there's the specific model. A fluid has a density and a pressure, and a functional relation between the two is called an equation of state. There are standard equations of state for describing a universe full of easily compressible dust, or full of electromagnetic radiation.
Once you've chosen an equation of state, you can work out what the EFE tells us: and this is a differential equation for \(a(t)\), which can then be solved. One part of constructing a good cosmological model is deciding just what equation of state to use for your model.
Let's not go there.
The point is that \(a(t)\) isn't arbitrary; it comes out of the physical model.
In fact, all this theory was worked out by Friedmann decades before Hubble's observational work.
It's rare in science for somebody to manage to predict something before it actually happens, so stop for a moment to be impressed.

What have we got for our effort?

By investing some effort in having a model of space-time in which the geometry itself is dynamic, we get a universe in which Hubble's law is satisfied, but most of the ingredients are now interpreted in a new way. In particular, the recession of distant galaxies is no longer due to them moving, but now due to the fact that the definition of distance itself depends on time in such a way that the distance between stationary objects can be changing in time.

Have I missed anything out?

Oh boy, have I ever.
I've missed out
  • all the controversy in the interpretation of distance/velocity measurements.
  • just about all the calculations that justify the claims I make. Nothing new there.
  • what the Einstein Field Equations say. That's a big topic just in its own right, and it's a bit technical.
  • what the matter model of a perfect fluid actually looks like.
  • any discussion of the curvature of space. (I've only considered the simplest case.)
  • and much, much more.
Fortunately, there's lots of excellent material out there. If all I've done is whet your appetite for more, I'm happy.

Friday, May 18, 2018

It's all the same to me.

Mathematics is the subject, probably more than any other, that we associate with precision. And yet a particularly powerful tool in mathematics consists of a fruitful failure to distinguish between objects.
The basic idea is that we have a set of objects, but rather than thinking of them all as separate, we partition the set up into a collection of subsets which, between them, cover the entire set, and no two of which have any element in common. These subsets are called equivalence classes, and an element of an equivalence class is a representative. The set of equivalence classes itself is called a quotient space.
And most of the time, it's the quotient space that's the interesting thing, not the equivalence classes themselves. It can be all too easy to lose sight of this if one is buried in the detail of showing that a given relation between elements of a set does this job of partitioning into equivalence classes, and so is an equivalence relation.

Some Examples

A simple example is to partition up positive integers, as normally expressed in base ten, into classes of length; 1 digit numbers, 2 digit number, 3 digit numbers, and so on.
But this isn't very interesting. There's not much useful structure in this quotient space for us to play with. For example, if we add together two 1 digit numbers, sometimes the answer is 1 digit long, and sometimes it is 2 digits long.
On the other hand there are others, just as familiar, but more interesting.


Partition up integers (all of them this time, not just positive ones) into odd and even. This time things are very different. The quotient space consists of just two classes, which we can call ODD and EVEN. If we add together two odd numbers, the answer is even, whatever odd numbers we start with. Similarly if we add an odd and an even, or multiply.
The key here is that we do the arithmetic on two representatives, but no matter which representatives we pick, the answer is in the same equivalence class. This means that the quotient space \(\{\textrm{ODD},\textrm{EVEN}\}\) inherits operations of addition and multiplication from the set of all integers, \(\mathbb{Z}\).
There's another example, just as familiar, but the source of well-known problems.


Two pairs of integers, \((m,n)\) and \((M,N)\) (where \(n\) and \(N\) are non-zero) are equivalent if \(mN=Mn\). This partitions the set of all pairs of integers (where the second integer is not zero) in the same way as being odd or even partitions the set of all integers. Secretly I'm thinking of the first pair as the fraction \(\frac{m}{n}\) and the second pair as the fraction\(\frac{M}{N}\).
This relationship - this equivalence relation - captures exactly the notion that the ratio \(m:n\) is the same as the ratio \(M:N\), in other words that that the two fractions represent the same rational number. We can think of each equivalence class as a rational number, and a pair of integers, i.e. a fraction, in the equivalence class, as a representative.
Then the usual arithmetic of fractions gives a way of adding and multiplying these equivalence classes, so that the equivalence class of the result doesn't depend on the choice of representatives of the classes being added or multiplied. In other words the rules for doing arithmetic with (and simplifying) fractions gives the arithmetic of rational numbers.
I can't shake the suspicion that the core of why schoolchildren have so much difficulty getting to grips with fractions is this idea of different representatives for the same rational number. It's a subtle idea, and it takes quite a lot of getting used to.

Going further

These ideas can be extended in various ways, some more familiar than others.

Congruence arithmetic

In the case of odd and even numbers, although it isn't the way we tend to think about it, we're regarding two numbers as equivalent if their difference is a multiple of \(2\).
Of course, there's nothing special about \(2\). We could pick any number to form the equivalence classes and it would work in just the same way. If we used \(3\), then sensible representatives are \(0,1,2\); if we used \(4\) then \(0,1,2,3\), and so on. If we are being careful, we use a notation such as \([0]\) to denot the equivalence class that \(0\) lives in. To save wear and tear on our fingers and keyboards, we can rather naughtily just use these representative numbers to denote the equivalence classes.
If we obtain our equivalence classes by using \(n\), then we call the resulting quotient space \(\mathbb{Z}_n\).
Actually, there is something a bit special about \(2\). If we multiply together two elements of \(\mathbb{Z}_2\), then the result is only \(0\) if one of the elements is. This is also the case with \(\mathbb{Z}_3\).
But with \(\mathbb{Z}_4\), we see that \(2 \times 2 = 0\).
OK, so from what I've written so far, it might be \(4\) that's special.
Really, what's special - or at least, well-behaved - is \(\mathbb{Z}_p\), where \(p\) is a prime. In this case, a product can only be zero if one of the multiplicands is zero. In fact, if we work with a prime, then we get a structure that is actually better behaved than \(\mathbb{Z}\), the set of all integers. We can divide by any non-zero element, in the sense that if \(a\) is not \(0\), then there is a unique solution to \(ax=1\); and multiplying by this number does just the job of dividing by \(a\). In the jargon, \(\mathbb{Z}_p\) is a field, which basically means that has a well-behaved addition, subtraction, multiplication and division.
This results in both fun and profit: the resulting number systems are fun to play with, and form the mathematical basis for some of the contemporary forms of cryptography.

Complex Numbers

We can use the same ideas to construct the complex numbers without inventing a new kind of 'imaginary' number whose square is negative.
The starting point is to consider the set of all real polynomials, and then to consider two polynomials as equivalent if their difference is a multiple of \(x^2+1\).
Then one (and the useful) representative for any polynomial is the remainder when it is divided by \(x^2+1\), so the representatives are of the form \(a+bx\) where \(a,b\) are real numbers. But then \(x^2 = 1\times(x^2+1)-1=-1\), where I misuse '\(=\)' to mean 'is equivalent to'. Then the arithmetic of these objects looks just like normal complex arithmetic with \(i^2=-1\), except that I have an \(x\) where you might expect to see an \(i\).
To phrase that slightly differently, the complex numbers, \(\mathbb{C}\), are the quotient space obtained by regarding two polynomials as equivalent if they differ by a multiple of \(x^2+1\).
This doesn't give us any algebra we didn't have before, but it does show that there is a perfectly respectable way of constructing the complex numbers out of non-problematic objects. We don't have to postulate some new kind of number whose square is negative, and then worry about whether we have accidentally made mathematics inconsistent. Well, at least we can be sure that complex arithmetic is no more inconsistent than real arithmetic.
And just as there is nothing special about \(2\), there is nothing special about \(1+x^2\); we could use any polynomial here.
In fact, the similarity to \(\mathbb{Z}_n\) is profound.
Just as with the integers, if the polynomial we work with cannot be expressed as a product of lower degree polynomials, we can add, subtract and multiply any pair of polynomials, and also divide by any non-zero one.
This idea lets us construct a whole new collection of algebraic structures, including in particular the finite fields, which are important in aspects of statistical experiment design and error correcting code.

The wild blue yonder

It doesn't stop here. Examples of naturally occurring partitions and quotient spaces abound thoughout mathematics, in geometry, algebra and analysis.

Linear algebra

If \(V\) is a vector space, and \(S\) is a subspace of \(V\), then we can think of two vectors \(u\) and \(v\) as equivalent if \(u-v \in S\). This relationship partitions \(V\) up into subsets that look like copies of \(S\) obtained by translation away from the origin, called cosets. The quotient space is again a vector space.
This turns out to be an important construction in analysis, where \(V\) and \(S\) are spaces of functions.

Group theory

If \(G\) is a group and \(H\) is a subgroup of \(G\), the we can say that two elements \(g_1,g_2 \in G\) are equivalent if \(g_1g_2^{-1} \in H\), and again this gives us a partition of \(G\) into cosets. This time it isn't automatic that the quotient space gets a well-defined multiplication from \(G\); \(H\) has to satisfy a certain condition, called being a normal subgroup.
In the case of Abelian groups, this lets us break a big group down into a kind of tower of other groups, in a way analogous to the prime decomposition of an integer.
It is also important in Galois theory, which relates the structure of certain groups to the solubility of polynomials.


We start off with the \(x-y\) plane, and consider two points to be equivalent if their \(x\) and \(y\) coordinates differ by integer amounts. This time the quotient space is a new surface, which we can visualise as a unit square with opposite edges glued together: a familiar structure in certain video games, where leaving the screen to the left, or top, brings you back in at the right, or bottom. In fact, this quotient space is a torus.
Constructing surfaces as quotient spaces is a powerful tool in the attempt to classify all the possible surfaces. And it also suggest the question of when two surfaces should be regarded as 'really the same': another partitioning of all surfaces described in different ways into equivalence classes of 'the same surface, but a different description'.

And so on

This barely scratches the surface. The more mathematics, and the more mathematical structures, you meet, then more you come across interesting quotient spaces.
Googling for 'quotient' plus the thing you are interested in (group, surface, vector space, and many more) will probably return something cool, so don't waste any more time here, go and look.

Wednesday, May 2, 2018

Base n to mod n in one easy step.

We learn how to write numbers (well, integers) in bases other than ten fairly early in our mathematical education. A bit later, we learn how about congruence arithmetic and doing arithmetic modulo any integer \(n \gt 1\). I remember that there was a certain degree of initial confusion between the two, though that was really just about remembering which label went with what: after all, the two are entirely different, aren't they?

As it turns out, not so much.

Let's have a brief reminder of congruence arithmetic, and then see what representing numbers in different bases has to do with it.

Congruence arithmetic modulo \(n\)

The simplest and most familiar form of congruence arithmetic is something we're all familiar with long before we meet congruence arithmetic as a thing.

We all know right from a very early stage that an even number plus an even number is even, an even plus an odd is odd, and so on. But an even number is one which leaves a remainder of zero when it's divided by two, and an odd number is one which leaves a remainder of one. So these odd and even rules are telling us that when we do sums with integers, the remainder on division by two is well behaved with respect to addition and multiplication. Or in the hieratic jargon, we can do arithmetic modulo two.

It's not so obvious that this works with numbers other than two, but it's just as true.

Choose any integer \(n>1\). We say that two integers, \(a\) and \(b\) are congruent modulo \(n\) if their difference is a multiple of \(n\), or, equivalently, if they have the same remainder on division by \(n\).

Then with a bit of work, we can show that arithmetic modulo \(n\) makes sense: it doesn't matter what numbers we start with, the remainder after doing an arithmetic operation only depends on the remainders of the numbers we combine.

This takes a little bit of algebra. On the other hand...

Arithmetic modulo ten and modulo two

If we carry out the usual addition and multiplication operations of integer arithmetic, writing all numbers in base ten, then it is obvious (as long as we believe the standard algorithms!) that the units digit of a sum or product depends only on the units digits of the numbers multiplied or added.

But the units digit of a number (written in base ten) is just the remainder when you divide by ten! Looking only at the units digits is doing arithmetic modulo ten.

In just the same way, if we write our numbers in binary, then the units digit of a sum or product depends only on the units digits of the numbers combined. And the units digits is \(1\) for an odd number and \(0\) for an even one.

Looking only at the units digits is now doing arithmetic modulo two.

Arithmetic modulo \(n\) again

But there's nothing special about two or ten (at least, not in this regard).

If we choose any integer \(n \gt 1\), and write integers in base \(n\), then the units digit of a sum or product depends only on the units digits of the numbers combined.

And now, looking at the units digits is working modulo \(n\).

So although base \(n\) and modulo \(n\) look like entirely different ideas, they turn out to be rather closely related.

What's the point?

I'm not entirely sure. Thinking about last digits in base \(n\) might be an interesting way to approach congruence arithmetic, as it comes from a more concrete direction. It also suggests various investigative projects, such as when the last digit of a product can be zero, or how negative numbers fit into this picture. For me, the point was really that there is actually a real relationship between these two apparently different bits of maths, and it's always fun to see connections like that.

Monday, April 9, 2018

The two child puzzle

How careful do you have to be with probability problems? Very careful. Very, very careful. Lack of enough care leads to (apparent) paradoxes.

The puzzle

So, here's the puzzle. It's something of a classic in probability.

I tell you that John Smith has two children. I also tell you that one of them is a girl. I then ask you for the probability that the other is a girl.

Obviously, we need some ground rules here. We'll work with the slight simplification that boys and girls are equally likely, and that the probabilities of boy and girl are independent.

A plausible answer, and maybe the one I'm hoping for, is that since boys and girls are equally likely, the probability that the other is a girl is \(1/2\).

Let's pretend that was your answer, because now I get the chance to show you something interesting.

Ah-hah! I pounce.

No, you aren't thinking it through carefully enough. If you know that one child is a girl, then all you know about the children is that they aren't both boys. There are four equally likely sets of children: boy-boy, boy-girl, girl-boy and girl-girl. Since we've exclude boy-boy, then we are equally likely to have any of the remaining possibilities. In one third of them is the other child a girl, so the probability is actually \(1/3\).

I can dress this up more formally. \(A\) is the event John Smith has a daughter and \(B\) is the event John Smith has two daughters. I want \(P(B|A)\), the probability of \(B\) given \(A\). Fortunately, there's a formula for that. Since \[ P(A \cap B) = P(B|A)P(A) \] and \(P(A \cap B)=P(B)\) (since \(A \cap B = B\)), I see that \[ P(B|A) = \frac{P(B)}{P(A)} \] I know that \(P(B)=1/4\) and \(P(A)=3/4\), so (just as I said above), \(P(B|A)=1/3\).

This seems strange, and counter-intuitive. But I'm pretty sure I didn't get it wrong, because I wrote a little bit of code and simulated it: the following code randomly creates \(n\) two-child families, and finds the fraction of the time that a family with at least one daughter actually has two.

import random

def sim(n):
    for ii in range(n):   
        if sum(kids)>0:
            hasGirl += 1
        if sum(kids)==2:
Running this for moderately large \(n\) certainly suggests that the probability is \(1/3\).

How much more convincing can you get?

But then I read this book, Standard Deviations by Gary Smith. It's an excellent exposition of how statistics and data can be misused to mislead. I was thoroughly enjoying it until...

He considers this problem of the two children and demolishes the above argument.

The puzzle redux

John Smith has two children, and takes one of them for a walk. If the child that he took for a walk was a girl, what is the probability that the other child is also a girl?

"Ooh, I know this one," I thought. "It's a third."

And then I read on. Smith carefully and lucidly points one that this is wrong, and that the probability actually is \(1/2\).

Naturally I took personal offence at this, so I read it carefully a few times, trying to find his mistake.

I didn't find one.

So I did the obvious thing. I worked out the probability carefully.

There are two things I'm interested in here. \(A\) is the event The child John Smith took a walk with is a daughter and \(B\) is the event Both of John Smith's children are girls.

I want to know \(P(B|A)\). Just as before, I have \(A \cap B = B\), so that \[ P(B|A) = \frac{P(A \cap B)}{P(B)}=\frac{P(B)}{P(A)} \] But \(P(B) = 1/4\) and \(P(A)=1/2\), so \(P(B|A)=1/2\).

By now I was feeling decidedly uncomfortable, so I did the next obvious thing. I wrote a little bit of code and simulated the situation where a randomly chose child from a randomly chosen family is taken for a walk \(n\) times, and finds what proportion of the time the child taken for a walk is a girl, she has a sister at home:

import random

def walksim(n):
    for ii in range(n):
        if kids[random.randrange(2)]==1:
            girlWalk += 1
            if sum(kids)==2:
Running this was moderately large values of \(n\) convinced me that yes, indeed, the probability was \(1/2\).

By now I was thoroughly confused.

So What's Going On?

According to a quotation popularly attributed to Einstein, it's a mark of lunacy to keep asking the same question, hoping for a different answer. So naturally, I was a little concerned about just what it might mean if you managed to actually get a different answer.

So I thought about it a bit harder, and came to finally came to a (in retrospect, obvious) conclusion: they aren't actually the same puzzle.

So what's the difference? In version one, you're told that one of John Smith's children is a daughter. In version two, you're told that John Smith has taken one of his children for a walk, and that child is a daughter. Either way, we've been told that one of John Smith's children is a daughter.

But we haven't actually been given quite the same information.

Version 1 simply guarantees that at least one child is a daughter; version 2 tells us that the particular child that John Smith took for a walk is a daughter. The possibility that the child left at home is a daughter is excluded from the calculation in version 2, but included in version 1, which is why the probability that both are daughters is lower in version 1 than in version 2.

So the whole confusion boils down to what you think it means to be told that One of John Smith's children is a daughter. If you interpret it as saying no more than John Smith has at least one daughter, then the answer is \(1/3\). But if you interpret as saying that one particular child is a daughter (the older one, or the one standing on the left, or the only one who has ever been locked alone in the cupboard under the stairs for misbehaving, or, indeed, the one that he took for a walk that day) then the answer is \(1/2\).

To the extend that there's a disagreement here, it isn't an internal disagreement in probability theory (which would be a Very Bad Thing Indeed), it's a disagreement of interpretation of what One of John Smith's children is a daughter means.

The moral

As is often the case, the problem isn't the mathematical analysis: in each version that was straightforward. The problem is setting up the right mathematical model for the situation, which can be very tricky if there's any linguistic ambiguity in the description.