Tuesday, 23 April 2019

Trig functions with (almost) no triangles

In this post I'm going to look at what we can say about the trigonometric functions \(\sin\) and \(\cos\) without actually doing any classical trigonometry (in other words, without using triangles or ratios of sides), provided we know enough other maths. Note that this is not a "we should teach it this way" article; it's an "isn't it great how mathematics all hangs together" article.


We start off with a little Euclidean geometry, as described in Cartesian coordinates.

Consider the unit circle in the Euclidean plane, centred on the origin, so given by \[ x^2+y^2=1 \]

and define \(\cos(t)\) and \(\sin(t)\) as the \(x\) and \(y\) coordinates of the point that you get to on this circle if you start at the point \((1,0)\) and travel a distance \(t\) anti-clockwise around the circle.

This is how I define the functions \(\cos\) and \(\sin\); at some point I should make contact with the usual definitions in terms of ratios of side lengths in right-angled triangles. But not yet.

But we can immediately notice that since I've defined these functions as the \(x\) and \(y\) coordinates of points on this circle, we know that \[ \sin^2(t)+\cos^2(t)=1 \] and since the circumference of the circle is \(2\pi\), both functions are periodic with period \(2\pi\).

I can also extend my definition to negative values of \(t\), by thinking of this as distance in the clockwise direction from the same starting point.

Then we also see immediately from the symmetry of the circle that \[ \sin(-t)=-\sin(t), \qquad \cos(-t)=\cos(t) \]


And now a tiny nod towards differential geometry.

Let's think about the circle, where I now parameterise it by the \(t\) I defined above: so we can associate with any value of \(t\), the point \[ \pmb{r}(t) = (\cos(t),\sin(t)) \] (and I commit the usual sin of identifying a point in the plane with its position vector). If I need to, I can think of

So, what can I say about \(\dot{\pmb{r}}\) (where I use the overdot to denote differentiation with respect to \(t\), in the finest Newtonian tradition)?

Again, since each point lies on the unit circle, we have \[ \pmb{r}(t).\pmb{r}(t)=1 \] so, differentiating, we have \[ 2\pmb{r}(t).\dot{\pmb{r}}(t)=0 \] and the tangent to the circle at any point is (as we already know from Euclidean geometry) orthogonal to the radius to that point.

From this, we know that \[ \dot{\pmb{r}}(t) = k(-\sin(t),\cos(t)) \] for some \(k\).

We can say more, though. Since \(t\) is distance along the circle, we have for all \(t\) \[ t = \int_{0}^t \|\dot{\pmb{r}}(u) \| du, \] and differentiating both sides with respect to \(t\) gives \[ \| \dot{\pmb{r}}(t) \| = 1. \]

Finally, we note that when \(t=0\), the tangent vector points vertically upwards, so \(k=1\) there (and, by continuity, for every \(t\)). This leaves us with \[ \dot{\pmb{r}}(t)=(-\sin(t),\cos(t)) \] or, in other words, \[ \frac{d}{dt}\sin(t) = \cos(t), \qquad \frac{d}{dt}\cos(t)=-\sin(t). \]


Now we take a little detour through some real analysis.

For no apparent reason, I introduce two functions defined by power series, \[ \text{SIN}(t) = \sum_{n=0}^\infty (-1)^{n}\frac{t^{2n+1}}{(2n+1)!}, \qquad \text{COS}(t) = \sum_{n=0}^\infty (-1)^{n} \frac{t^{2n}}{(2n)!}. \]

We can immediately see that \(\text{SIN}(0)=0\) and \(\text{COS}(0)=1\).

But since the two functions are defined by power series, we can differentiate both term by term and find that \[ \frac{d}{dt}\text{SIN}(t) = \text{COS}(t), \qquad \frac{d}{dt}\text{COS}(t)=-\text{SIN}(t). \]

Differentiating again, we find that each of \(\sin\), \(\cos\), \(\text{SIN}\) and \(\text{COS}\) satisfies the ordinary differential equation \[ \ddot{X} = -X. \] Furthermore, both \(\sin\) and \(\text{SIN}\) satisfy the same initial conditions at \(t=0\), namely that the value of the function is \(0\), and its derivative is \(1\); similarly for \(\cos\) and \(\text{COS}\). We therefore deduce that \[ \sin(t)=\text{SIN}(t), \qquad \cos(t)=\text{COS}(t). \]

This gives us two things:

  • An effective way of calculating the functions (at least for small values of \(t\)).
  • Another handle on the functions, which we are just about to use.
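As a quick illustration of the first point, here is a minimal Python sketch that sums the first few terms of each series and compares the result with the library functions. The names `series_sin` and `series_cos` and the default of 20 terms are choices made for this example only.

```python
import math

def series_sin(t, terms=20):
    """Partial sum of SIN(t) = sum over n of (-1)^n t^(2n+1) / (2n+1)!."""
    total, term = 0.0, t              # term is the n-th summand, starting at n = 0
    for n in range(terms):
        total += term
        # ratio of consecutive summands is -t^2 / ((2n+2)(2n+3))
        term *= -t * t / ((2 * n + 2) * (2 * n + 3))
    return total

def series_cos(t, terms=20):
    """Partial sum of COS(t) = sum over n of (-1)^n t^(2n) / (2n)!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        # ratio of consecutive summands is -t^2 / ((2n+1)(2n+2))
        term *= -t * t / ((2 * n + 1) * (2 * n + 2))
    return total

# Differences from the library values; these should be tiny for small t.
for t in (0.1, 1.0, 3.0):
    print(t, series_sin(t) - math.sin(t), series_cos(t) - math.cos(t))
```

For larger \(t\) more terms are needed before the partial sums settle down, which is why it helps to use the periodicity to bring \(t\) into a small range first.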

Addition Formulae

Finally we make use of just a little complex analysis.

Looking at the power series we now have for \(\sin\) and \(\cos\), and remembering the power series for the exponential function, we have \[ \exp(it)=\cos(t)+i\sin(t). \]

But now we have \[ \begin{split} \cos(s+t)+i\sin(s+t) &= \exp(i(s+t))\\ & = \exp(is)\exp(it)\\ & = \cos(s)\cos(t)-\sin(s)\sin(t)\\ & +i(\sin(s)\cos(t)+\cos(s)\sin(t)) \end{split} \] from which we extract \[ \cos(s+t)=\cos(s)\cos(t)-\sin(s)\sin(t), \qquad \sin(s+t)=\sin(s)\cos(t)+\cos(s)\sin(t). \]
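The addition formulae are easy to check numerically. The following is only a sanity check, not a proof; it uses Python's standard `cmath` and `math` modules, and the function name `addition_formula_gap` is made up for this sketch.

```python
import cmath
import math

def addition_formula_gap(s, t):
    """Return how far exp(i(s+t)) is from exp(is)exp(it), and from the
    complex number built out of the two addition formulae."""
    lhs = cmath.exp(1j * (s + t))
    rhs = cmath.exp(1j * s) * cmath.exp(1j * t)
    cos_sum = math.cos(s) * math.cos(t) - math.sin(s) * math.sin(t)
    sin_sum = math.sin(s) * math.cos(t) + math.cos(s) * math.sin(t)
    return abs(lhs - rhs), abs(lhs - complex(cos_sum, sin_sum))

# Both gaps should be at the level of floating-point rounding error.
print(addition_formula_gap(0.7, 1.9))
```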

And finally, some triangles

It's time to make contact with the triangle notion of these trigonometric functions, so consider a right angled triangle, and let \(\theta\) be the radian measure of the angle at one of its vertices, \(V\), (not the right angle). Place this triangle in the first quadrant, with the vertex \(V\) at the origin, and the adjacent side along the \(x\)-axis. (This may require a reflection.) Call the vertex on the \(x\) axis \(U\) and the remaining vertex \(W\).

But we can scale the triangle by dividing each side length by the length of the hypotenuse, which doesn't affect any of the ratios of side lengths.

The traditional definition of the sine of angle \(\theta\) is that it is the length of the side opposite \(V\) divided by the length of the hypotenuse; similarly \(\cos(\theta)\) is the length of the side adjacent to \(V\) divided by the length of the hypotenuse. Then in our rescaled triangle, the vertex \(W'\) corresponding to \(W\) lies on the unit circle, and (by the definition of radian measure of angles) the angle \(\theta\) is the distance around the circle from the point \((1,0)\) to \(W'\).

But now, the length of the hypotenuse is \(1\), and so the ratio for the sine of \(\theta\) is just our \(\sin(\theta)\), and similarly for \(\cos(\theta)\).

So, finally, we see that \(\sin\) and \(\cos\) have all the old familiar properties.

I love it when a bunch of different bits of maths come together.

Thursday, 21 March 2019

Solar garbage disposal.

Why can't we drop our dangerous waste into the sun?

It's tempting to consider getting rid of dangerous (or undesirable) waste by shooting it into space and letting it fall into the sun; after all, it seems pretty unlikely that we could drop enough of anything into the sun to have an adverse effect on it.

So, just how hard is this?

Let's just start off by thinking about how hard it is to start off with something in the same orbit as the earth, and knock it out of that orbit into one whose closest approach to the sun actually takes it inside the sun. We can worry later about getting it into this initial orbit.

Now, an orbit consists of an ellipse with one focus at the sun. Let's call the distance from the (centre of the) sun at perihelion (the closest point) \(r_1\), and the distance at aphelion (the farthest point) \(r_2\), and the orbital speeds at these positions \(v_1\) and \(v_2\) respectively. Then we have the (badly drawn) situation below:

Perihelion and aphelion are convenient points to think about, because the orbital velocities at these points are perpendicular to the lines joining them to the focus. This makes it particularly easy to calculate the angular momentum there.

It's really quite hard to describe orbits properly: finding the position of an orbiting body as a function of time cannot be done in terms of the usual standard functions. However, we can avoid a lot of unpleasantness by being clever with two conserved quantities, the energy and angular momentum of the orbiting body. Since these are both conserved, they always have the value they have at perihelion.

At perihelion, then, the body has angular momentum and energy given by \[ \begin{split} L &= mv_1 r_1\\ E &= \frac{1}{2}mv_1^2 - \frac{GMm}{r_1} \end{split} \] where \(G \approx 6.67 \times 10^{-11}\text{m}^{3}\text{kg}^{-1}\text{s}^{-2}\) is Newton's gravitational constant, and \(M\) is the mass of the gravitating body (which for us will be the sun).

To reduce the algebra a little, we'll take advantage of the fact that everything is a multiple of \(m\), and think of the angular momentum and energy per unit of mass (i.e. per kilogram). That means we think about \[ \begin{split} l &= v_1 r_1\\ e &= \frac{1}{2}v_1^2 - \frac{GM}{r_1} \end{split} \]

A useful observation is that for a closed orbit, we must have \(e<0\); we'll save this fact up for later.

And now we'll do the sneaky thing of finding what the perihelion distance is for an orbit of a given angular momentum and energy.

The idea is simple enough: use the first equation to express \(v_1\) in terms of \(l\), then substitute that into the second equation to get a relationship between \(l\), \(e\) and \(r_1\).

This gives \[ e= \frac{l^2}{2r_1^2} - \frac{GM}{r_1} \] so that \[ r_1^2 +\frac{GMr_1}{e}- \frac{l^2}{2e} = 0 \]

So the solutions are given by \[ \begin{split} r_1 &= \frac{1}{2}\left( -\frac{GM}{e} \pm \sqrt{\frac{G^2M^2}{e^2} +2\frac{l^2}{e} }\right)\\ &= -\frac{GM}{2e} \left(1 \pm \sqrt{1+\frac{2l^2e}{G^2M^2}} \right) \end{split} \]

There are, of course, two roots to this quadratic: they are \(r_1\) and \(r_2\), the perihelion and aphelion values. (We could have gone through just the same process expressing \(l\) and \(e\) in terms of \(r_2\), and obtained the same quadratic.)

We should pay attention to signs here: we are taking a factor of \(|GM/e|\) out of the square root, but since \(e<0\), as mentioned above, this is \(-GM/e\).

Sanity check: for a circular orbit, the radius \(r\) is constant, and from Newton's law we obtain \(v^2/r = GM/r^2\). From this we find \[ e=-\frac{GM}{2r} \] and \[ l^2=v^2r^2 = GMr \] so that \[ \frac{2l^2e}{G^2M^2}=-1 \] and \(r_1=r_2\), i.e. the orbit is indeed circular.

Since \(e<0\), the square root term is smaller than \(1\), and we want the perihelion value, so we have \[ r_1 = -\frac{GM}{2e} \left(1 - \sqrt{1+\frac{2l^2e}{G^2M^2}} \right) \]
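The perihelion formula translates directly into code. Here is a minimal Python sketch, with the circular-orbit sanity check repeated numerically. The function name `perihelion` is illustrative, the value of \(G\) is the one quoted above, and the solar mass and orbital radius are the figures used further on in this post.

```python
import math

G = 6.67e-11       # Newton's gravitational constant, as quoted in the post
M_SUN = 1.99e30    # mass of the sun in kg, the figure used later in the post

def perihelion(e, l, gm=G * M_SUN):
    """Smaller root of the quadratic: perihelion distance for energy e (< 0)
    and angular momentum l, both per unit mass."""
    disc = 1.0 + 2.0 * l * l * e / (gm * gm)
    disc = max(disc, 0.0)      # guard against a tiny negative value from rounding
    return -gm / (2.0 * e) * (1.0 - math.sqrt(disc))

# Sanity check: for a circular orbit of radius r the perihelion should be r.
r = 1.5e11
v = math.sqrt(G * M_SUN / r)          # circular speed from v^2 / r = GM / r^2
e = 0.5 * v * v - G * M_SUN / r
l = v * r
print(perihelion(e, l) / r)           # should be very close to 1
```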

Now, let's see what the situation is for starting off with a body in the same orbit as the earth, and trying to adjust it so that it hits the sun. The earth's orbit is very nearly circular, so we'll just assume it is exactly circular, with constant radius and orbital speed.

By the magic of Google (or looking in a book), we find that the mass of the sun is \(M=1.99\times 10^{30}\text{kg}\), the radius of the sun is \(6.96 \times 10^{8}\text{m}\), the orbital radius of the earth is \(1.5 \times 10^{11} \text{m}\) and the orbital speed is \(2.98 \times 10^{4}\text{ms}^{-1}\).

In the diagram below, if the earth's orbit is the black one, we want to divert it to the red orbit by reducing the orbital speed from \(v\) to a smaller value \(u\), so that perihelion would occur inside the sun.

Now, we can set \(r_1 = 6.96 \times 10^{8}\text{m}\), and, noting that \(e\) and \(l\) are functions of the orbital speed, solve numerically for the orbital speeds which give a perihelion distance of less than \(6.96 \times 10^{8}\text{m}\). I asked a computer, and it told me that \(u\) must lie in \((-2858,2858)\), so we have to slow the orbit down by a change of speed \(\Delta v\) of at least \(29800-2858\approx 27000 \text{ms}^{-1}\).
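For anyone who wants to repeat the calculation, here is a minimal sketch of one way to do it in Python, using bisection on the perihelion formula. This is an illustration under the stated figures, not necessarily how the number above was obtained, and the function names are invented for the example.

```python
import math

GM = 6.67e-11 * 1.99e30    # G times the solar mass, figures from the post
R_SUN = 6.96e8             # radius of the sun in metres
R_ORBIT = 1.5e11           # radius of the earth's orbit in metres
V_EARTH = 2.98e4           # earth's orbital speed in metres per second

def perihelion_for_speed(u, r=R_ORBIT):
    """Perihelion distance of the orbit of a body at distance r moving at
    speed u perpendicular to the line joining it to the sun."""
    e = 0.5 * u * u - GM / r
    l = u * r
    disc = max(1.0 + 2.0 * l * l * e / (GM * GM), 0.0)
    return -GM / (2.0 * e) * (1.0 - math.sqrt(disc))

def critical_speed(target=R_SUN, lo=0.0, hi=V_EARTH, steps=60):
    """Bisect for the speed at which the perihelion distance equals target."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if perihelion_for_speed(mid) < target:
            lo = mid           # still hits the sun, so we can afford more speed
        else:
            hi = mid
    return 0.5 * (lo + hi)

u = critical_speed()
print(round(u), round(V_EARTH - u))   # critical speed and the required change
```

With these inputs the result should land within a few metres per second of the figure quoted above; small differences come from rounding in the constants.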

For a kilogram of mass, that means we need to use \((\Delta v)^2/2 \approx 360\) megajoules to change its speed by enough. This is quite a lot of energy.

There's also the problem of getting the mass out of the earth's gravitational field: this takes about another \(62\) megajoules.

So in total, we need to use about \(420\) megajoules of energy to drop each kilogram of stuff into the sun.

For comparison, it requires about \(15\) megajoules to transport a kilogram of stuff to the ISS, in low earth orbit\(1\). An example from closer to home is that a megajoule is the kinetic energy of a vehicle weighing one tonne, travelling at about \(161\text{km}\text{h}^{-1}\).
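The arithmetic behind these energy figures can be reproduced in a few lines of Python. In this sketch the mass and radius of the earth are standard textbook values that do not appear in the post, so treat them as assumptions; the variable names are just for illustration.

```python
import math

V_EARTH = 2.98e4       # earth's orbital speed in m/s (from the post)
U_CRIT = 2858.0        # largest speed that still hits the sun, in m/s (from the post)

delta_v = V_EARTH - U_CRIT
slowdown_mj = 0.5 * delta_v ** 2 / 1e6        # megajoules per kilogram

# Assumed values, not taken from the post: standard figures for the earth.
G = 6.67e-11
M_EARTH = 5.97e24      # kg
R_EARTH = 6.37e6       # m
escape_mj = G * M_EARTH / R_EARTH / 1e6       # energy to escape from the surface

# Speed at which a one-tonne vehicle has a kinetic energy of one megajoule.
car_speed_kmh = math.sqrt(2 * 1e6 / 1000.0) * 3.6

print(round(slowdown_mj), round(escape_mj), round(car_speed_kmh))
```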

So there we have it: we won't be just dropping our waste into the sun any time soon because apart from anything else, it's just so much work.


  1. You might wonder if it's possible to do better by pushing the object at an angle to its original orbit, rather than just by slowing it down. Unfortunately, the option discussed above is the one requiring the least energy.
  2. I've only considered the situation of giving the object an orbit which falls into the sun. Could one do better by arranging one which flies close past another planet and takes advantage of a slingshot orbit? Yes, and that was used to help send probes to Mercury. This can save energy, but it is a delicate manoeuvre; I'll leave you to try to work out if it might make this form of waste disposal more feasible.


Thanks to Colins @icecolbeveridge and @ColinTheMathmo for comments and feedback.

Monday, 11 March 2019

A matter of weight

What do you weigh? More to the point, what does that mean?

What is weight?

What it isn't: mass

Like many people, I find out what I weigh by standing on the bathroom scale. Usually, the answer is approximately 80 kilograms. Or so I try to convince myself.

Well, there's a problem with that. The kilogram is a unit of mass, not weight. But what's mass?

To answer this, we rely on Newton's laws. Mass is resistance to force. If you act on a mass, \(m\), with a force, \(\pmb{F}\), then it has an acceleration, \(\pmb{a}\), and the three are related by \[ \pmb{F} = m \pmb{a}. \]

Actually, things are a bit more complicated than that, but as long as the mass isn't changing, it's OK.

Then if I have a reliable way of exerting a known force, I can compare the masses of two different objects by comparing the accelerations imparted to both. And if I have a reference mass (and there is one for the kilogram) then I can work out the mass of anything else I act on with this force in terms of the reference mass.

OK, so that's mass (well, the quick version) sorted out. But what is weight, and what is my bathroom scale really telling me?

What weight is absolutely

The first definition that you are likely to meet in a physics text is that it is the force exerted on a body by gravity: sometimes the actual force (as a vector), but more usually the magnitude of the force.

It is a remarkable fact that the force exerted by gravity on an object is exactly proportional to its mass, so that all objects, whatever their mass, fall with the same acceleration (in the absence of confounding factors like air resistance).

In SI units (i.e. those used by the civilized world), the unit of force is the newton, and the unit of mass is the kilogram. At the surface of the earth, the force of gravity is approximately \(9.8 \text{N}/\text{kg}\); a mass of 1 kilogram weighs about \(9.8\) newtons.

We can work with this. If I have access to materials which can be compressed or extended by acting on them with a force, then I can build a machine which measures the (magnitude of the) force I exert on it. And if I know that what people really want to know is their mass, I can calibrate it so that if the force acting on it is the force that gravity exerts on their body, then the scale shows the mass that gravity must be acting on, rather than the actual force.

This is typically how a bathroom scale works.

And this is fine and excellent, as long as I know what the force of gravity is where I use the bathroom scale. There is a minor issue caused by the fact that the force of gravity isn't quite the same all over the surface of the earth; but the variation is fairly small (from \(9.78 \text{N}/\text{kg}\) to \(9.83 \text{N}/\text{kg}\)), and for everyday purposes we can pretty much ignore this variation.

But now we bump up against a bit of a problem. For the scale to work properly, we have to stand still. You'll have noticed that if you jump up and down on it, the displayed weight oscillates along with you. It's less likely that you've taken it into a lift, but if you have you'll have noticed that when the lift has an upward acceleration you show a higher weight, and if it has downward acceleration you show a lower weight.

The snag is, of course, that to find your weight, the scale has to hold you stationary in the earth's gravitational field, and measure the force it takes to do that.

This raises a slightly awkward question: how do you know when the scale is holding you stationary?

This could well seem like a rather stupid question. All you have to do is look around.

But what if you are in an enclosed room? How could you tell the difference between being (say) on the moon, with a weight of approximately one sixth of your weight on earth, and being in a lift which has a downward acceleration of about five sixths the acceleration due to earth's gravity, or indeed being in a rocket a long way away from any body of mass, accelerating at about one sixth of the acceleration due to earth's gravity?

It turns out that's really hard to do. You can't do it by making local measurements, which is a heavily coded way of saying that no matter how good your measuring apparatus is, if you work in a sufficiently small laboratory, over a sufficiently small time span, you can't detect the difference. (If you allow non-local measurements, which effectively means that you work in a region big enough that the force of gravity varies appreciably, then you can measure the resulting tidal effects: here is a nice look at tidal effects from the point of view of an orbiting astronaut, from @ColinTheMathmo.)
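To see why the three enclosed-room scenarios are indistinguishable to the scale, here is a minimal Newtonian sketch in Python. It assumes the scale reads the force it has to exert, \(m(g+a)\), where \(a\) is the upward acceleration of the room; the function name `scale_reading` and the 80 kg mass are just illustrative.

```python
G_EARTH = 9.8    # strength of gravity at the earth's surface in N/kg (from the post)

def scale_reading(mass, gravity, upward_acceleration=0.0):
    """Force in newtons the scale must exert on the mass so that it stays
    at rest relative to the scale: N = m (g + a)."""
    return mass * (gravity + upward_acceleration)

m = 80.0
moon = scale_reading(m, G_EARTH / 6)                                    # at rest on the moon
lift = scale_reading(m, G_EARTH, upward_acceleration=-5 * G_EARTH / 6)  # lift accelerating downwards
rocket = scale_reading(m, 0.0, upward_acceleration=G_EARTH / 6)         # rocket far from any mass

# All three readings agree (up to rounding), at one sixth of the reading on earth.
print(moon, lift, rocket)
```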

This is a bit of a blow. We have what looks like a perfectly sensible definition, but it turns out to be hard to use. So we do that standard trick. We replace it by a somewhat different definition, which is easier to use, but doesn't look quite so sensible.

At least, it doesn't look quite so sensible to start with. Once you get used to it, you can convince yourself that it is in fact the sensible definition, and it's the original definition that's misleading.

What weight is relatively

So what do we do instead?

We accept that since what we can measure (reasonably) depends on the state of motion of the equipment and body, we build that into the definition.

We now define the weight of an object in a given frame of reference to be the (magnitude of the) force required to hold it stationary in that frame of reference.

So, if your scale says you weigh a particular amount, that's what you weigh: you don't have to know whether you're on the moon, or whether you have to compensate because the scale is accelerating.

This does have its advantages. Principally, it means that you weigh what you feel as if you weigh. If you're in a rocket taking off, or visiting a heavy-gravity planet (we can dream), you weigh more than if you're standing still on the earth. If you're in a lift plummeting you towards an untimely end, or on the moon, you weigh less than if you're standing still on the earth. And that's all there is to it.

So, to the question I've been avoiding up to this point: is an astronaut in orbit, say on a space station, weightless?

There are two answers to this.

  • No. The force of gravity on him is slightly less than it is on the surface of the earth, so he weighs a little less than he would at home.
  • Yes. If he tries to stand on a scale, it reads zero; it takes no force to keep him stationary in his space-station, so in the frame of reference of the space-station, he is weightless.

And both answers are correct. Whether the astronaut is weightless or not depends on the definition of weight that you prefer.

But in either case, the astronaut is definitely not outside the gravitational influence of the earth; gravity is still pulling on him with a very significant force.

I prefer the second definition, because it's the one that doesn't require you somehow to know the situation outside your laboratory; it only requires you to make local measurements. But that doesn't make it right, only preferable from some points of view.

At least, that's the situation if you live in a universe where gravitation acts according to Newtonian theory. But in fact we don't really, and there's a better way of trying to understand gravity, and then we have a better reason for preferring the second definition.

What weight is relativistically

There's a great mystery in Newtonian mechanics: why is it that the mass which interacts via gravity is exactly the same as (or at least, exactly proportional to) the mass which resists acceleration? Very sensitive experiments have been carried out to test this, and they've always come out unable to detect a difference. They agree to no worse than one part in one hundred billion.

So, why should this be?

The current answer is that there is a better way to think about gravitational influence than as a force. In general relativity, we have a way of relating the distribution of matter to a distortion in space and time, which changes the notion of a straight line: we generalize this to a kind of curve called a geodesic. So a particle moving in a gravitational field is one which travels along a geodesic: there is no force acting on it, it's just (just!) that the nature of space and time mean that it doesn't travel along the same trajectory as it would in the absence of gravity.

Then any particle (with no external forces acting on it) travels along a geodesic which is determined purely by a starting point and a starting velocity; mass does not have a part in the trajectory of a particle in a gravitational field. Of course, part of the joy of this is that it does give very nearly the same answer as Newtonian gravity.

Very nearly, but not quite the same as Newtonian gravity. And with careful observation we find that the general relativity version of gravity matches measurements better than the Newtonian one.

But now there is no force of gravity.

If I'm standing on the surface of the earth, the only force acting on me is the force of the earth pushing me off the geodesic that the curvature of space and time wants me to travel along. If I'm on the space station, there is no force acting on me at all.

So in this framework, the only definition of my weight that makes any sense is that it is the magnitude of the force required in some frame of reference to keep me stationary in that frame of reference.

This looks superficially the same as the second definition in Newtonian gravity, but it's really quite different in principle. In the Newtonian case, it's avoiding the issue that there's no easy way to access the actual force of gravity, and settling for what is easy to measure. In the general relativistic case, it's saying that weight is an illusion caused by being in an unnatural frame of reference.

I try to take some consolation from that as my weight seems to gradually, but inexorably, increase.

Monday, 17 September 2018

What's a number? 3 - in an orderly manner

Why is infinity plus 1 bigger than 1 plus infinity? What about infinity times 2 and 2 times infinity? It all depends, of course, on what you mean by plus, times, and infinity.

Back to the beginning

In a previous blog post I discussed the natural numbers (by which I mean the non-negative integers, though some people exclude \(0\)). To recap briefly, we can describe these by means of how they behave, using the Peano axioms, and also give a concrete realization of them, namely the von Neumann numbers.
These von Neumann numbers are given by starting with the empty set \(\emptyset\), which is \(0\), then defining a successor function by saying that the successor of any number \(n\) is the set \(s(n)=n \cup \{n\}\), and then collecting together everything you can make that way. Then \(1=s(0)=\{0\}, 2=s(1)=\{0,1\}\) and so on.
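The construction can be played with directly in Python, whose `frozenset` type can hold other frozensets. This is a minimal sketch: the successor of \(n\) is built as the set \(n\) with \(n\) itself added as one extra element, and the names `successor`, `zero`, `one`, `two` and `three` are just for this example.

```python
def successor(n):
    """Return the set n together with n itself as one extra element."""
    return n | frozenset([n])

zero = frozenset()          # the empty set plays the part of 0
one = successor(zero)       # {0}
two = successor(one)        # {0, 1}
three = successor(two)      # {0, 1, 2}

# 3 has three elements, and they are exactly 0, 1 and 2.
print(len(three), three == frozenset([zero, one, two]))
```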
With this notion of a number, we can define the usual addition and multiplication; we also have an order, so \(m \leq n\) if there is some \(p\) such that \(n=m+p\).
I already looked at how to build on these by extending them in various ways to include negative numbers, rational numbers, and finally complete them with the irrational numbers to form \(\mathbb{R}\), the set of real numbers.
This time I want to go back to these natural numbers, and build on them in a different way: this time by taking the ordering on them, and seeing how we can enlarge our set of numbers in an interesting way using this idea.
But first, there's going to be a bit of a digression, in which we'll see some general ideas involving ordering.

(Well) ordered sets

We say that a set \(S\) is ordered if there's a relation \(\leq\) on it which has the properties
  1. If \(a,b,c \in S\) then \(a \leq b\) and \(b \leq c\) implies \(a \leq c\)
  2. If \(a,b \in S\) and \(a \leq b\) and \(b \leq a\) then \(a=b\)
  3. If \(a,b\in S\) then \(a \leq b\) or \(b \leq a\) (or both)
So clearly \(\mathbb{N}\), the set of natural numbers, is ordered, with \(\leq\) being the usual less than or equal to.
The same is true for \(\mathbb{Q}\) and \(\mathbb{R}\), the sets of rational and real numbers.
It's natural (and sensible) to say that two sets \(A,B\) are the same size if they can be matched up so that each element of one set is matched to a different one of the other: such a matching is a bijective function \(f:A \to B\).
But with ordered sets we have another ingredient: we say that two ordered sets \(A\) and \(B\) are similar, written \(A \sim B\), if there is a bijective function \(f:A \to B\) which preserves the ordering, so for any \(a_1,a_2 \in A\), \(f(a_1) \leq f(a_2)\) is equivalent to \(a_1 \leq a_2\).
It's not hard to show that there's a bijection between the natural numbers and the non-negative rationals. It's a bit harder to see that the two sets aren't similar, but if you notice that if you choose any positive rational number, there are infinitely many smaller ones, but there are only finitely many natural numbers smaller than any given natural number, then you can see it's pretty hopeless to try to find a similarity.
In fact, the natural numbers have another important property. In particular, they have the property that any non-empty set of natural numbers contains a smallest element. An ordered set with this property is called a well-ordered set.
Neither the set of rational numbers nor the set of real numbers is well-ordered (with the usual ordering): the set of all rationals or reals greater than \(0\) has no least element. So this is a bit special.
Well-ordered sets are pretty special: they behave a lot like numbers.
But we have to do a little bit of work here.
First, if \(A\) is a well-ordered set, and \(a \in A\), then \(i(a)\) is the set of all elements of \(A\) less than \(a\), and is called the initial segment of \(a\). (It's common to use \(s(a)\) for the initial segment of \(a\), but I'm already using that for the successor.)
For example, in \(\mathbb{N}\), the initial segment of \(3\) is \(\{0,1,2\}\).
This is actually a bit of a mind-expander. In the von Neumann numbers, \(3\) is, by definition, \(\{0,1,2\}\), so \(i(3)=3\), and in general, for any \(n \in \mathbb{N}\), we have \(i(n)=n\). You might want to take a break and walk around the block, or do some pressups, or something, until that settles in. I know it took me a while to get used to the idea.
We think of two similar well-ordered sets as being ''really the same''; they are representations of the same ordinal number. (This is what we mean by ordinal number, but I may sometimes forget myself and talk about a particular ordered set as an ordinal number, rather than as a representation of one.)
For example, the sets of integers \(3=\{0,1,2\}\), \(\{1,2,3\}\) and \(\{2,4,7\}\) are all similar, and we can say that each represents the ordinal \(\pmb{3}\).
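For finite sets of integers the similarity is easy to construct: sort both sets and pair them off in order. Here is a minimal Python sketch, with illustrative function names, that builds the map and checks that it preserves the ordering in both directions.

```python
def similarity(a, b):
    """Return the order-preserving bijection from a to b as a dict,
    or None if the two finite sets have different sizes."""
    if len(a) != len(b):
        return None
    return dict(zip(sorted(a), sorted(b)))

def preserves_order(f):
    """Check that x <= y exactly when f(x) <= f(y), for all x, y in the domain."""
    keys = list(f)
    return all((x <= y) == (f[x] <= f[y]) for x in keys for y in keys)

f = similarity({0, 1, 2}, {2, 4, 7})
print(f, preserves_order(f))
```

For infinite well-ordered sets there is no such shortcut, which is where the real interest lies.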
Now comes the miracle.
If \(A\) and \(B\) are well-ordered sets, then exactly one of the following is true:
  1. \(A\) is similar to an initial segment of \(B\)
  2. \(B\) is similar to an initial segment of \(A\)
  3. \(A\) is similar to \(B\).
It then turns out that this notion of "is similar to all of, or to an initial segment of" can be thought of as \(\le\), and ordinal numbers are themselves well-ordered. (Though you have to be a little bit careful about this for rather technical reasons.)
Just as with the usual numbers, we say \(A \lt B\) if \(A \leq B\), and \(A \neq B\), which is saying that \(A\) is an initial segment of \(B\).
And now we can start to do something with this machinery. We'll think a bit about how these ordinals extend the von Neumann numbers.

The first infinite number, \(\omega\)

So first of all, every von Neumann number is a well-ordered set, so represents an ordinal. But (and this is the point) there are more. In particular, the set of all natural numbers is also well-ordered, and so represents an ordinal. We call this ordinal \(\omega\).
What can we say about \(\omega\)?
The first thing is, every von Neumann number is less than it, because every von Neumann number is an initial segment.
Second, it's the smallest ordinal which is bigger than all the von Neumann numbers. For suppose that \(\alpha\) is also an ordinal which is bigger than all the von Neumann numbers, and \(\alpha \leq \omega\). Then \(\alpha\) must be an initial segment of \(\omega\); but the initial segments of \(\omega\) are just the von Neumann numbers themselves and \(\omega\). If \(\alpha\) is a von Neumann number, then it is not bigger than all the von Neumann numbers. The only alternative is that \(\alpha=\omega\).
So we can see that \(\omega\) is the smallest infinite ordinal.
There are lots of different representations of \(\omega\): here are a few: \[ \begin{split} &\{1,2,3,\ldots\}\\ &\{1,2,3,4,\ldots\}\\ &\{1,3,5,7,\ldots\}\\ &\{0,2,4,6,\ldots\} \end{split} \] Each of these is similar to \(\mathbb{N}\), and so is a representative of \(\omega\).
Although \(\omega\) is not itself a von Neumann number, it is very like one. In fact, ordinals are sufficiently like von Neumann numbers that we can do a lot of the same stuff as we can with the natural numbers, and in particular we can do arithmetic.
Before going on to look at addition and multiplication in general, let's just think about the successor, i.e. what we get when we add \(1\) to a number.
With the finite (von Neumann) numbers, we define \[ n+1 = s(n) = \{0,1,2,\ldots, n\} \] and there's nothing to stop us doing the same with \(\omega\): \[ s(\omega) = \omega+1 = \{0,1,2,3,\ldots,\omega\} \] and we have to note that in this ordered set, \(\omega\) has infinitely many predecessors, but no immediate predecessor.
And of course, we can do this as often as we like: \[ \omega+2 = \{0,1,2,3,\ldots,\omega, \omega+1\} \] We can even build up the entire collection \[ \{0,1,2,\ldots,\omega,\omega+1,\omega+2,\ldots\} \] which it seems sensible to think of as \(\omega+\omega\), which is the result of doubling \(\omega\).
This all suggests that we might be able to add and multiply ordinals, if we paid enough attention to how to generalise the procedure for numbers. And we can (or I probably wouldn't have raised the issue). But we have to be careful: there are a few surprises in store.

Arithmetic with ordinals

Let's start off with addition.


It would be nice to use the same definition as with the von Neumann numbers, but that (as it stands) relies on each number other than \(0\) having a predecessor, and we don't have that any more.
We have to generalise the idea somehow. There is more than one way to do this, and I'm just going to look at the one that I find easiest to understand.
So, if \(\alpha\) and \(\beta\) are ordinals, maybe finite, maybe not, then here's one way we can define their sum.
We take well-ordered sets \(A\) and \(B\) corresponding to \(\alpha\) and \(\beta\), and define their ordered union to be the ordered set \(A ; B\) which contains all the elements of \(A\) followed by all the elements of \(B\). This ordered set is (represents) \(\alpha+\beta\).
It's important to note here that when I write these out, it's the ordering given by the listing of the elements that matters, so \[ 3+4=\{0,1,2\} ; \{0,1,2,3\} = \{0,1,2,0,1,2,3\} \sim \{0,1,2,3,4,5,6\} = 7 \]
It's not hard to convince yourself that this gives the right answer when \(\alpha\) and \(\beta\) are ordinary finite numbers, so this does generalise the usual addition. There's the slight technicality that we should check that this ordered union produces a well-ordered set: but it does.
You can also see straight from the definition that if \(\alpha,\beta\) and \(\gamma\) are ordinals, then \[ (\alpha+\beta)+\gamma = \alpha+(\beta+\gamma) \] so addition is associative.
Now comes the first surprise: addition of ordinals is not in general commutative. We can see this in the simplest possible case, comparing \(1+\omega\) with \(\omega+1\).
In the first case, since \(1=\{0\}\), we have \[ 1+\omega = \{0,0,1,\ldots\} \sim \{0,1,\ldots\} = \omega \] where the first \(0\) in there comes from \(1\), and precedes the second one, which comes from \(\omega\) (and I'm being a little sloppy by thinking of a particular well-ordered set as the actual ordinal, rather than a representative for it).
On the other hand, \[ \omega+1 = \{0,1,2,\ldots,0\} \sim \{0,1,2,\ldots,\omega\} \] which is definitely not similar to \(\omega\). In fact, since \(\omega\) is an initial segment of this, we know that \(\omega \lt \omega+1\).
And we see straight from the definition that \[ \omega+\omega = \omega ; \omega = \{0,1,2,\ldots,0,1,2,\ldots\} \] in agreement with the guess up above.
So, we have a definition for the sum of two ordinals which agrees with what we know for finite ordinals, and still works for infinite ones. But, as often happens when we generalise a good notion, some of the properties are lost - in this case, commutativity.
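We can't list infinitely many elements on a computer, but the ordinals that have turned up so far all look like some number of copies of \(\omega\) followed by finitely many extra elements. Here is a small sketch of addition restricted to ordinals of that shape. The `Ordinal` class and the `add` function are my own inventions for this illustration, and the rule in `add` encodes the observation above that a finite tail is swallowed by a copy of \(\omega\) placed after it.

```python
from typing import NamedTuple


class Ordinal(NamedTuple):
    """`copies` copies of omega, followed by `extra` further elements."""
    copies: int
    extra: int


def add(x: Ordinal, y: Ordinal) -> Ordinal:
    """Ordered union of x followed by y, for ordinals of this restricted shape."""
    if y.copies == 0:
        # y is finite: its elements just extend the finite tail of x.
        return Ordinal(x.copies, x.extra + y.extra)
    # y starts with a copy of omega, which absorbs the finite tail of x.
    return Ordinal(x.copies + y.copies, y.extra)


ONE = Ordinal(0, 1)
OMEGA = Ordinal(1, 0)

print(add(ONE, OMEGA) == OMEGA)  # True: 1 + omega is just omega
print(add(OMEGA, ONE))           # Ordinal(copies=1, extra=1)
print(add(OMEGA, OMEGA))         # Ordinal(copies=2, extra=0)
```

This is only a model of a small corner of the ordinals, but it is enough to see associativity surviving and commutativity failing.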
So, what about multiplication? This can also be done.


As before, we have ordinals \(\alpha\) and \(\beta\), with corresponding well-ordered sets \(A\) and \(B\). This time, instead of an ordered union, we define an ordered Cartesian product.
We have \[ A \times B = \{(a,b) : a \in A, b \in B\} \] and we say that \((a_1,b_1) \lt (a_2,b_2)\) if
  1. \(b_1 \lt b_2\), or
  2. \(b_1=b_2\) and \(a_1 \lt a_2\)
This gives us a copy of \(A\) for each element of \(B\), and the copies are ordered by the ordering of the elements of \(B\).
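For finite sets we can get a computer to list the product in this order. The following is a minimal sketch, assuming the elements of each set are ordinary integers so that Python's built-in comparison agrees with the well-ordering; `ordered_product` is a name I've made up for the purpose.

```python
from itertools import product


def ordered_product(A, B):
    """List A x B, comparing second components first and then first components."""
    return sorted(product(A, B), key=lambda pair: (pair[1], pair[0]))


print(ordered_product([0, 1], [0, 1, 2]))
# [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

You can see the three copies of \(\{0,1\}\), one for each element of \(\{0,1,2\}\), laid out one after another.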
As you might hope, multiplying by \(1\) leaves ordinals unchanged: if \(\alpha\) is any ordinal, then \[ 1 \times \alpha = \alpha \times 1 = \alpha. \]
As with addition, the multiplication of ordinals is associative, though it's a little harder to see than with addition.
But, also as with addition, commutativity is lost. And as before, it is the simplest example imaginable (well, imaginable to me) which shows us this.
First, we have \[ \begin{split} 2 \times \omega &= \{0,1\} \times \{0,1,2,\ldots\} \\ &= \{(0,0),(1,0),(0,1),(1,1), \ldots\}\\ & \sim \{0,1,2,\ldots\}\\ &=\omega \end{split} \] But then \[ \begin{split} \omega \times 2 &= \{0,1,2,\ldots\} \times \{0,1\}\\ &= \{(0,0),(1,0),\ldots, (0,1),(1,1),\ldots\}\\ &\sim \omega ; \omega\\ &=\omega+\omega \end{split} \] So again, we have lost commutativity.
The other property we might hope for is distributivity. Is this preserved? Well, yes and no.
If \(\alpha,\beta\) and \(\gamma\) are ordinals, then (pretty much straight from the definition) \[ \alpha \times (\beta+\gamma) = \alpha \times \beta + \alpha \times \gamma. \]
But on the other hand, \[ (1+1) \times \omega = 2 \times \omega = \omega \neq \omega+\omega = 1 \times \omega + 1 \times \omega \] so in general \[ (\alpha+\beta) \times \gamma \neq \alpha \times \gamma + \beta \times \gamma \] So the ordinals have an arithmetic that extends the arithmetic of finite numbers, but we do have to be a bit more careful.

Is there more?

There's more.
For example, you can define a notion of exponentiation, and you can build immense towers of ordinal numbers, each larger than everything that comes before it. There is an inductive principle, called the principle of transfinite induction, which gives a powerful tool for proving theorems involving ordinal numbers.
The behaviour of these objects is rich and fascinating. For example, one of my favourite results is that if \(\alpha\) is any ordinal, then any (strictly) decreasing sequence of ordinals starting at \(\alpha\) must stop after finitely many terms.
But this has already been more than long enough.
A fairly technical presentation is given by the wiki page. If that's not to your taste, then Rudy Rucker (@rudytheelder) has a novel, White Light, exploring aspects of infinity.

Saturday, 21 July 2018

What is school mathematics for?

What is the purpose of school mathematics? I have opinions… Here I shall mostly be ranting about what I think school mathematics ought to do.

Note that when I talk about what schools actually do, I'm referring to the English educational system: YMMV.

So, on with the show.

To a first approximation, I think that school mathematics should provide three things.

Basic skills

First off, everybody should leave school numerate. They should be arithmetically competent, and be able to cope with percentages, ratios, proportions and the like. They should also have a reasonably well-developed number sense, and be able to make plausible estimates and approximations.

For example, the total cost (say) of a shopping basket shouldn't come as a complete surprise, nor should the cost per head of population of something that costs the country a billion pounds, nor the fraction of one's own disposable income that that amounts to. And how to work out whether a 500g box of cereal or a 750g box is the better deal should not be a mystery (or rely on the shelf label helpfully giving the price per 100g).
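To make the cereal comparison concrete, here is a tiny sketch. The prices are made up purely for illustration (they are my assumption, not real shelf prices); the only idea is to reduce both boxes to a price per 100g.

```python
def price_per_100g(price_pounds: float, grams: float) -> float:
    """Price in pounds for 100g of the product."""
    return price_pounds / grams * 100


small = price_per_100g(2.10, 500)  # hypothetical price of the 500g box
large = price_per_100g(3.00, 750)  # hypothetical price of the 750g box

print(f"500g box: {small:.2f} per 100g, 750g box: {large:.2f} per 100g")
print("750g box is the better deal" if large < small else "500g box is at least as good")
```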

This is the kind of stuff that the mythical general public probably think of as mathematics. But it's not the only stuff that comes in handy in 'real life'.

We also all need some basic statistical understanding, at least we do if we aren't going to have the wool pulled over our eyes all the time. By this I mean understanding data and its presentation, what can be reliably inferred from it, and the nature of uncertainty in what is reported (or predicted!).

This is, I think, a bare minimum of the mathematics that you need to cope with everyday life (without being taken advantage of) and take meaningful part in a democracy (so that your decisions have a fighting chance of being well-informed).

I leave you to decide for yourself whether this utilitarian need is met. (And if it's as much of a Bad Thing as I think it is for it not to be met.)

Intellectual development

Outside that, mathematics is there to provide you with intellectual development.

The purpose of this material is not to provide every pupil with a bag of mathematical techniques which they will employ in day-to-day life or in their jobs. Attempts to persuade them of a utilitarian value are doomed to failure, because they all know adults. The purpose of this material is the same as exposing them to great literature or art; it's to expand their mental horizons.

This shouldn't consist (only) of methods and algorithms to be learned, but try to provide an appreciation of the structure and process of mathematics. Mathematics is, after all, one of the subjects where you don't just have to take somebody else's word for it. You can give a satisfying argument about why something is the case. There's a huge amount here that can be explored with no more than numbers and basic algebra.

For example, nobody should leave school knowing that 'a minus times a minus is a plus' without also knowing that this isn't just an arbitrary rule, that it follows from some more fundamental choices, and that those choices themselves are made for a good reason.

Currently, there is a lot of material in the compulsory maths curriculum for school pupils up to the age of 16. There's a lot of learning to do, and a lot of skill to be acquired. And there's a lot of teachers working extremely hard to help their pupils acquire the knowledge and develop the skills.

I think that the curriculum is drastically overloaded, and encourages the wrong (I did say I have opinions) kind of learning: a procedural, or instrumental, skill based learning which is quicker to achieve, but doesn't provide the deeper understanding that an appreciation of the mathematics requires.

I'd much rather see a much smaller curriculum, with time for a deeper understanding of fewer topics to be developed.

You can even show how it gives an enriched understanding of the world around us.

"How much shorter is a short cut corner to corner compared to going along the pavement round the edge of a rectangular lawn?" is an interesting question, and nobody is pretending that a builder or a plumber needs to use Pythagoras in everyday life to work out how long a pipe or a rafter has to be.

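For what it's worth, the lawn question takes only a couple of lines to explore. This is just a sketch: the lawn dimensions are hypothetical, and `shortcut_saving` is my own name for the function.

```python
import math


def shortcut_saving(width: float, length: float) -> float:
    """Distance saved by cutting corner to corner rather than walking two sides."""
    return (width + length) - math.hypot(width, length)


# A hypothetical 3m by 4m lawn: 7m round the edge, 5m across the diagonal.
print(shortcut_saving(3, 4))                 # 2.0
# For a square lawn the saving is a fixed fraction of the walk round the edge.
print(f"{shortcut_saving(1, 1) / 2:.1%}")    # 29.3%
```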
Note that this isn't going to rob anybody of job opportunities. The requirement to have some arbitrary number of GCSEs (including English and mathematics) at some arbitrary level of achievement isn't selecting the people with the required level of mathematics to be able to do a job - though there may be some hope that it selects out those who can't read, write or count.

Preparation for further study

And of course, we also have to prepare those who might go on to study maths at a higher level. What we should be providing here is a basis on which undergraduate study can be built. A good foundation for the more advanced and abstract material covered in a degree course.

The current preparation expected (and required) by Universities is A-level, which includes a large chunk of calculus. I feel in a better position to comment on this, because every year I see a lot of students who have been moderately to very successful at A-level mathematics. As the years have passed, I've become more convinced that this does not provide a particularly good foundation for what we do at university.

Again, the problem seems to be one of an overstuffed curriculum. I see a lot of students who can sort of do quite a lot of stuff. But a large proportion of them seem to have, again, an entirely instrumental learning: when I see this, I do that.

In fact, some of them by this stage have a fairly well-developed conviction that that is what mathematics is: it's a bag of procedures and algorithms which you learn to apply to the problems that you've been taught how to solve using them. These get downright panicky, and in some cases even resentful, at the idea of having to explain why some mathematical fact is true, or why some procedure works.

Unfortunately for them, there tends to be much more emphasis on why things are true or work in undergraduate maths than they are used to.

So, back to my previous point. They should be arriving with a basis on which further undergraduate study can be developed.

I don't think this is well-served by having a lot of stuff which so many students have learned to do without knowing why it works.

I think it would be better served by having a reduced curriculum, but with a deeper understanding of the material. I'd much rather see students arrive with a strong understanding of algebra, and a decent idea of what constitutes a proof, than see them arrive with an extensive bag of tricks which they don't really understand. OK, it would be even better if they had the deeper understanding of all the stuff they currently meet. But it takes time and effort to develop that understanding. You can't have both.

Yes, that would mean that we have to start calculus from scratch at university. But it's pretty standard to do that anyway (if quite fast) to try to give the deeper understanding that we want; I'm not sure that it would make a big difference to what we do in the first year.

And even if it did, that might not be a terribly Bad Thing.

There's a whole other discussion to be had about what the content of a mathematics degree ought to be, but one point on which we could probably all agree is that very little of the content of a mathematics degree is of great relevance to one's life after graduation. For those who do use mathematics subsequently, whether commercially or academically, the most important thing is likely to be the ability to learn new mathematics efficiently, and to understand what's going on.

And I'd be happy to maintain that this is better done by developing understanding, if necessary at the expense of some extent of coverage.


  • I think I'm pretty safe in these opinions. I might be entirely wrong, but nobody will ever do the experiment (i.e. completely overhaul mathematics education) to find out.
  • For a nice summary of what I called instrumental and deeper learning, there is Richard Skemp's article

    Relational Understanding and Instrumental Understanding, Mathematics Teaching 77 20–26, (1976).

    available here, courtesy of @republicofmath, and for a much more extensive discussion, his book The psychology of learning mathematics

Wednesday, 27 June 2018

Why is subtraction so hard?

Subtraction presents various problems to learners of mathematics, not least with the mechanics of hand calculating the result of subtracting one multi-digit number from another. But that's not the kind of difficulty I'm interested in: I'm more interested here in the difficulties that arise when computations involving several additions and subtractions of integers are involved, or at the slightly more advanced level where algebraic computations involving several additions and subtractions are required. I'm going to take a look at what I think lies underneath the difficulty.

The procedure required to deal with subtraction starts off deceptively simple.

Addition and subtraction are of the same priority, and so a calculation involving a string of additions and subtractions is calculated from left to right: \[ 1-2+3-4-5=-1+3-4-5=2-4-5=-2-5=-7 \] Slightly more complicated, we may have things inside parentheses which consequently have to be evaluated before the result is combined with the rest: \[ 2-(3-4) = 2-(-1) = 3 \] But already we find lots of mistakes being made. It isn't uncommon for the first calculation to turn into some variation on \[ 1-2+3-4-5=1-2+3-(-1)=-1+2=1 \] or for the second to go down the route of \[ 2-(3-4)=2-3-4=-1-4=-5 \]
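If you want to watch the step-by-step procedure happen, here is a small sketch that works through the operations one at a time in the order they are written, just as in the first worked line. The helper `left_to_right` is my own name, not anything standard.

```python
from functools import reduce
import operator


def left_to_right(first, steps):
    """Apply each (operation, number) step in turn, starting from `first`."""
    return reduce(lambda acc, step: step[0](acc, step[1]), steps, first)


sub, add = operator.sub, operator.add

# 1 - 2 + 3 - 4 - 5, taken one operation at a time
print(left_to_right(1, [(sub, 2), (add, 3), (sub, 4), (sub, 5)]))  # -7

# Parentheses first, compared with the common slip of dropping them
print(2 - (3 - 4))  # 3
print(2 - 3 - 4)    # -5
```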

The root of the problem

In one sense, the root of the problem is combinatorial. There are lots of different ways of combining the various numbers in the calculations, but they don't all give the same answer. There are more ways of getting the wrong answer than the right one.

But that doesn't really get to the bottom of it. Why do learners combine things in the wrong way when they've been given careful explanation and instruction of the right way? (And it's not just novices: errors along these lines persist with undergraduates!)

I think that the real problem is that subtraction is a horrible operation, in many ways.

The horrors of subtraction

  • First, we learn to add together positive integers, or natural numbers. The result of adding together two natural numbers is always a natural number: addition is a well defined binary operation on natural numbers.

    But subtraction isn't.

    When we subtract one natural number from another, the result can be negative. Suddenly a whole new can of worms is opened up, because we need to deal with the arithmetic of negative numbers, which, as we all know, is a paradise of opportunity for sign errors.

  • Next, subtraction isn't commutative. This doesn't seem to be subtle, but let's take a quick look: \[ 2-1=1 \qquad \text{ but } \qquad 1-2 =-1 \] The answers are different, but only by being of opposite signs; and to rub salt in the wound, the second may well be done by subtracting \(1\) from \(2\) and changing the sign. Losing a minus sign is easy for all of us.
  • Even more horrible, subtraction is not associative: \[ (1-2)-3 \neq 1-(2-3) \]

    But we get very used to using associativity and commutativity without even noticing it when adding. It's much harder to remember not to do it when it's inappropriate, once you've become thoroughly accustomed to doing it.

  • And lastly, to cope with the problem we end up with a plethora of rules about combinations of signs. More to remember, often just regarded as more rules to learn. I cannot express in words the horror I feel when I hear students muttering (or even writing out) "FOIL" when doing algebraic calculations; I haven't yet heard "KFC" from my students when working with fractions, but I've seen it on the interwebs often enough to know it's only a matter of time.
Now, there are many rules in algebra and arithmetic, but a relatively small number of principles (axioms) from which they all follow.

Unfortunately, you can't really teach arithmetic to five-year-olds by starting off with the axioms of a field.

But maybe there comes a time to revisit the collection of rules and see where they come from. At least, I always find it illuminating to go back to elementary stuff and see what new insight I might have based on further study.

Avoiding the issue

So here's my approach to a better (I have many opinions, and very few of them are humble) view of the arithmetic and algebra of subtraction.

Uninvent subtraction.

OK, so I haven't gone back in time and relived my education without the notion. What I mean is that I look back and say that after all this time I now see that there's a better way to think of subtraction.

First, any number \(a\) has an additive inverse \(-a\): a number we add to it to get \(0\). Then if \(a\) and \(b\) are two numbers, I say that \[ a-b \] is a (somewhat undesirable, but unavoidable) notation for \[ a+(-b) \] which is the sum of \(a\) and \(-b\), the additive inverse of \(b\).

And at this point I really, really wish that there were a conventional typographical distinction between \(-\), the binary subtraction operator, and \(-\), the unary additive inverse operator. Oh well.

This has the immediate consequence that we can write our sums slightly differently: \[ 1-2+3-4-5 = 1+ (-2) + 3 +(-4) + (-5) \] and these quantities can now be rearranged at will, since addition is associative and commutative.

In the second example, we have \[ 2-(3-4) = 2 + (-(3-4)) \] And what do we add to \(3-4\) to get \(0\)? We add \(4-3=1\), so \(-(3-4)=1\) and \[ 2-(3-4) = 2 + 1 = 3 \]
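A quick sketch makes the difference between the two points of view rather vivid. Once everything is a sum of signed terms, every rearrangement gives the same answer; with a chain of subtractions, the order matters. The helper `subtract_chain` is a hypothetical name of my own.

```python
from itertools import permutations

# 1 - 2 + 3 - 4 - 5 rewritten as a sum of signed terms
terms = [1, -2, 3, -4, -5]

# Addition is associative and commutative, so all 120 orderings agree.
print({sum(p) for p in permutations(terms)})  # {-7}


def subtract_chain(numbers):
    """Compute numbers[0] - numbers[1] - numbers[2] - ... from left to right."""
    result = numbers[0]
    for n in numbers[1:]:
        result -= n
    return result


# Subtraction offers no such freedom to rearrange.
print(subtract_chain([1, 2, 3]), subtract_chain([3, 2, 1]))  # -4 0

# The additive inverse of 3 - 4 is 4 - 3.
print((3 - 4) + (4 - 3))  # 0
```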

Of course, it's still necessary to be able to add together combinations of positive and negative numbers: there is no free lunch here. But it's a way of thinking about the computation that reduces the temptation/opportunity to make mistakes, so maybe it's a slightly cheaper lunch.

One consequence is that if I see \[ x+5=7 \] I don't think 'subtract \(5\) from each side', I think 'add \(-5\) to each side'.

I find it a useful way to think about what's going on.

I try to stress this with my students, but with mixed success. And just about everybody who's teaching early undergraduate material will surely be doing something like this in the students' first encounter with abstract algebra and axiomatic systems.

My general impression is that as long as they're doing a set problem of the 'show from the axioms of a field' type, most students can be persuaded to work this way.

But I find that as soon as that context is gone and they're back in the old familiar territory of doing algebra or arithmetic, for most it's also back to the old familiar way of proceeding. And for quite a few this has just too many opportunities for the old familiar way of going wrong.

But also

I have very similar thoughts about division, the reconstruction of which I leave as an exercise for the sufficiently motivated reader.

Tuesday, 29 May 2018

What does it mean to say that the universe is expanding?

Q: The universe is expanding, so what is it expanding into?
A: The question is wrong.
OK, that answer could use some expansion of its own.

Newtonian Cosmology

Here's one version of the story.
Careful measurements (and careful interpretation of the measurements) tell us that the stuff in the universe is getting further apart. What's more, if we look around us, then to a good degree of approximation, everything is travelling away from us, and the farther away it is, the faster it's getting away from us. In fact, the speed at which distant galaxies are receding from us is proportional to how far away they are. We know this because of red shift in the spectra of these galaxies, from which we can calculate how fast they are travelling in order for a Doppler shift to cause that much red shift.
This seems to suggest that we are (surprisingly or not will depend on your religious and philosophical framework) at the centre of the universe; it is perhaps less flattering that everything is trying to get away from us.
Let's start off by thinking about how special our position is. We think of the universe as being full of a homogeneous dust (for scale, the dust particles are clusters of galaxies). In the simplest model, we are at the origin, and at time \(t\) the dust particle at position \(\pmb{x}\) has velocity \(\pmb{v}=H(t) \pmb{x}\). Then \(H(t)\) is the factor which relates distance to velocity of recession, and we call \(H\) the Hubble parameter, to honour Hubble, who made the initial observations.
But what about somebody who isn't at the origin? What about somebody living in a different dust particle, far, far away, say at position \(\pmb{x}'\)? What would they see?
Let's use \(\pmb{v}'\) to denote the velocity as seen by this observer at position \(\pmb{x}'\). We have to subtract their velocity from the velocity we see at each point to get the velocity relative to them at each point. Then we get \[ \pmb{v}' = H(t) \pmb{x} - H(t) \pmb{x}' = H(t)(\pmb{x}-\pmb{x}') \] But \(\pmb{x}-\pmb{x}'\) is just the position as seen by the observer at \(\pmb{x}'\); this other observer also sees everything in the universe recede at a rate proportional to distance, and it's even the same constant of proportionality. Rather surprisingly, this suggests that we aren't anywhere special, and that the cosmological principle (or principle of mediocrity) that we are nowhere special is compatible with the observations.
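Here is a quick numerical check of that calculation, as a sketch only. The value of \(H\) and the two positions are made-up numbers in arbitrary units, and the helper names are mine.

```python
def hubble_velocity(H, x):
    """Velocity H * x of the dust particle at position x."""
    return tuple(H * xi for xi in x)


def minus(u, v):
    """Componentwise difference u - v of two vectors."""
    return tuple(ui - vi for ui, vi in zip(u, v))


H = 0.5                      # hypothetical value of H(t) at one instant
observer = (4.0, -2.0, 1.0)  # x', the distant observer (hypothetical)
galaxy = (10.0, 6.0, -3.0)   # x, some other dust particle (hypothetical)

# Velocity of the galaxy relative to the distant observer...
relative_velocity = minus(hubble_velocity(H, galaxy), hubble_velocity(H, observer))
# ...and what Hubble's law predicts from the position as seen by that observer.
predicted = hubble_velocity(H, minus(galaxy, observer))

print(relative_velocity)  # (3.0, 4.0, -2.0)
print(predicted)          # (3.0, 4.0, -2.0)
```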
A concrete approach to this that may be useful is the currant cake, or raisin bread analogy. If you think of the dust particles as the items of fruit in the cake, then as it cooks and the cake rises, the currants (or raisins) get farther apart, in just this way - at least, if we assume that the rising process amounts to the whole cake scaling up in a uniform way.
But this model is a bit misleading, as there is an edge to the cake, and the cake is expanding into the surrounding oven. This isn't so for the dust-filled universe.
It's important to notice that the universe is infinite - it's the whole of three dimensional Euclidean space - forever. And it's always full of this cosmological dust: but if we go back in time, we see that the dust particles are getting closer together, so the dust must be getting denser, and if we go forward in time, it is getting more rarefied. But there's no 'edge' to the material, no expanding of anything, even though the dust particles are getting further apart. This is one of the joys of having infinite space: it can be full of stuff, but there's nothing to stop the contents spreading out.
There's a slight problem when we try to push back to \(t=0\), when the density would have to be infinite: but I avoid that by saying that I have a model for a universe that seems compatible with Hubble's observations for all \(t \gt 0\), and I've no idea what kind of Big Bang happened at \(t=0\); that's not part of my model. There's certainly no sense in which the universe started off as a single 'infinitely dense point' and exploded outward.
So, great news! As long as the universe just happens to be full of stuff exploding outwards, we have a rather neat mathematical model compatible with the observations, and all we need is some physics compatible with this model to make it plausible.
Aye, there's the rub.

Problem One

The first problem is a kinematic one. No matter how small \(H(t)\) is, objects sufficiently far away have to be moving faster than light. This is definitely hard to reconcile with special relativity, and special relativity is very well confirmed these days, so we don't just want to abandon it and go back to Newtonian mechanics. This cosmological model really is only good for Newtonian mechanics.
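To get a feel for what "sufficiently far away" means, we can ask where \(H \times \text{distance}\) reaches the speed of light. The sketch below assumes a present-day Hubble parameter of about 70 km/s per megaparsec; that figure is a rough assumption of mine for illustration, not something derived in this post.

```python
C_KM_PER_S = 299_792.458  # speed of light in km/s (exact by definition)
H0 = 70.0                 # km/s per Mpc: a rough, assumed present-day value

# In this model the recession speed is H0 * distance, so it reaches c at c / H0.
critical_distance_mpc = C_KM_PER_S / H0
print(round(critical_distance_mpc))  # 4283 (megaparsecs)
```

Anything farther away than that would, in this Newtonian picture, be receding faster than light.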
We could try to fix it by slowing down the expansion sufficiently far away, so that Hubble's law is only valid over a certain scale: but that gets hard to reconcile with the cosmological principle.
Or we might try to fiddle with special relativity by arguing that it's great at small scales, but needs correction at larger ones.

Problem Two

But even if we only look for a Newtonian model, there is still a serious problem.
We are at the origin of the universe, and we see everything receding from us. But as things get farther away, they speed up: everything is not only getting farther apart, but is accelerating. So our distant observer, living in a dust particle far, far away, is accelerating. That's an issue because it means that we are, after all, in a special place. Though all observers see a universe expanding in the same way, only we don't feel a force pushing on us to make us accelerate.
This is actually two problems.
There's the actual physical issue of just what is supposed to be doing the pushing. We need a pretty weird model of gravity to be compatible with gravity being attractive (so as to hold solar systems and galaxies together) at one scale, but repulsive (so as to make stuff accelerate apart at the bigger scale).
And there's the philosophical issue of what makes us so special: in this model we are, once again, in the actual centre of the universe.
So, can we find another way of modelling the universe which retains the good property of satisfying Hubble's law, but has neither the problem of a repulsive force which only becomes significant at large distances, nor that of putting us in a privileged position?

Relativistic cosmology

Well of course we can. As usual, that's the sort of question that only gets asked if there is a good answer available.
But it involves a radically new way of understanding how the universe fits together.

Newtonian space-time

In the Newtonian picture of space-time, with the origin at the centre of expansion, we have spatial position, represented by a position vector \(\pmb{x}\) and time, represented by \(t\). If we have two events \(A\) and \(B\), which happen respectively at position \(\pmb{x}_A\) and time \(t_A\), and at position \(\pmb{x}_B\) and time \(t_B\) then the distance between them is the usual Euclidean distance \(\|\pmb{x}_A -\pmb{x}_B\|\), and the time between them is \(|t_A-t_B|\).
This is all very persuasive, and seems to match well to experience.
But it isn't quite right, in ways which don't show up until things are moving fairly fast, at which point we have to worry about special relativity.
Fortunately, there's another way to think about distance and time, which is both compatible with special relativity (in the sense that it is very well approximated by special relativity over small regions) and with the observations of red shift (though for a somewhat subtler reason than you might expect).

Space-time and the metric

In this picture, we don't have separate notions of spatial and temporal distance, but a unified notion. Just as before, every point has a time coordinate \(t\) and space coordinates \(\pmb{x}\). But now we write \[ ds^2 = c^2 dt^2 - a(t)^2 dx^2 \] which is a way of packaging up a combination of differences between the (time and space) coordinates of two nearby events: \(dx^2\) is a shorthand for the sum of the squares of the spatial coordinate differences.
So what does this new thing tell us?
If two events happen in the same place, then the time between them is \(ds/c\). If they happen at the same time, then the physical distance between them is \(a(t)\) times the coordinate distance, which is the square root of the sum of the squares of the coordinate differences (i.e. the usual Pythagoras' theorem thing). This really is the physical distance, as in it counts how many meter sticks (or yard sticks if you're old) you'd have to use to join them up.
It's worth noting that over a small enough region of space-time, this is indistinguishable from special relativity: so we really are doing what was hinted at above, by saying that special relativity (great though it is over small regions) might need to be adjusted to deal with large regions.
\(a(t)\) is called the scale factor, and it relates the coordinate distance between two simultaneous events to the spatial distance. It's a function of time, so if we have two objects, both of whose positions are fixed (i.e. their spatial coordinates are unchanging), the distance between them is nevertheless changing as time passes. This isn't because they're moving apart (or together, for that matter), but because the notion of distance is now something that depends on time.
Let's take a look at this: suppose we have two objects, at fixed spatial locations, so the coordinate distance between them, \(d_C\) is unchanging. The physical distance between them is \(d_P = a(t) d_C\). Now we can think about how fast \(d_P\) is changing. We have \[ \dot{d}_P = \dot{a}(t) \times d_C = \frac{\dot{a}(t)}{a(t)} \times a(t) d_C = \frac{\dot{a}(t)}{a(t)} \times d_P. \] So the rate of change of the distance between these fixed objects is proportional to the distance between them, and the Hubble parameter is \(\dot{a}/a\).
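This is easy to check numerically. The sketch below uses \(a(t)=t^{2/3}\), which is a choice made purely for illustration (nothing so far tells us what \(a(t)\) actually is), together with an arbitrary coordinate distance; the derivative is estimated by a central finite difference.

```python
def a(t):
    """A hypothetical scale factor, chosen only for illustration."""
    return t ** (2 / 3)


def derivative(f, t, h=1e-6):
    """Central finite-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


d_C = 5.0  # fixed coordinate distance, arbitrary units


def d_P(t):
    """Physical distance between the two objects at time t."""
    return a(t) * d_C


t = 2.0
hubble = derivative(a, t) / a(t)  # the Hubble parameter a'(t) / a(t)

# The rate of change of physical distance matches (Hubble parameter) x (distance).
print(abs(derivative(d_P, t) - hubble * d_P(t)) < 1e-6)  # True
# For this particular choice of a(t), the Hubble parameter is 2 / (3 t).
print(abs(hubble - 2 / (3 * t)) < 1e-6)                   # True
```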
When \(a(t)\) is increasing, we loosely (and maybe misleadingly) call this expansion of space, or even worse, expansion of the universe.
OK, we have Hubble's law again, in the sense that the dust particles are all moving apart with a rate proportional to separation. And again, the universe is always full of these dust particles. But now the particles are stationary (meaning that their spatial coordinates don't change) and geometry is evolving so that the distance between them is changing.
But now we get more.
We also find that particles at fixed coordinates are not accelerating, so no force is required to get this effect of everybody seeing all the material in the universe obeying Hubble's law.

Red shift and recession

But there's something fishy about all this.
The original observation that gives rise to Hubble's law is obtained from measurements of red shift, interpreted as relative velocity. What does all that mean if everything is 'staying still' while 'geometry evolves'?
At this point we have to take into account that when we look at things a long way away, we are seeing them as they were a long time ago. In fact, we're seeing them when the scale factor was a bit different.
Amazingly, there's a geometric effect which exactly (well, almost exactly) fits the required bill.
It turns out that if a light signal is emitted at time \(t_e\) and received at time \(t_r\), then the signal is red-shifted; the ratio of the wavelength of the emitted signal to that of the received one is the ratio of the scale factor at these two times. This is called cosmological red shift and is an entirely geometric effect.
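In symbols (the names \(\lambda_e\) and \(\lambda_r\) for the emitted and received wavelengths are my own labels, not anything introduced above), this says \[ \frac{\lambda_e}{\lambda_r} = \frac{a(t_e)}{a(t_r)} \] so if, for example, the scale factor has doubled between emission and reception, the received wavelength is twice the emitted one.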
How does this fit in with Hubble's observation, though?
If you consider what an observer sees when looking out at a distant object, the cosmological redshift matches (to a very high degree of approximation) the redshift that you would get if there were no cosmological redshift, and the redshift was due to a Doppler effect from a velocity that's just the same as the rate of change of physical distance.
This is even more amazing than the fact that Hubble's law pops out.
Actually, this isn't exactly true. It is, though, true to a first degree of approximation, and it's very hard to measure distances and redshifts accurately enough to see any deviation. More importantly, it doesn't affect the result that the dependence of rate of recession (really redshift) on distance is the same no matter where in the universe you live.

So how does the scale factor evolve?

Up at the top, I was unhappy with the Newtonian approach because it had some mysterious field of force pushing stuff apart. I seem to have just replaced that with a mysterious scale factor that does the same job.
Not so. In the same way as Newtonian gravity tells us how massive bodies pull on each other, Einsteinian gravity has an equation of motion that tells us how \(a(t)\) behaves.
In Einsteinian gravity, there is the metric (and we've restricted attention to a particular class of metric already) and there is a field equation (the Einstein Field Equation, or EFE) which relates how the metric varies in space and time to the matter filling the universe. In order to work out how \(a(t)\) evolves, we need to have a model for the material that fills the universe.
This model comes in two parts.
First, there's the general model. The simplest thing that has a chance of working is a perfect fluid: so we're considering galactic clusters as parcels of a compressible fluid which has negligible viscosity.
Then, there's the specific model. A fluid has a density and a pressure, and a functional relation between the two is called an equation of state. There are standard equations of state for describing a universe full of easily compressible dust, or full of electromagnetic radiation.
Once you've chosen an equation of state, you can work out what the EFE tells us: and this is a differential equation for \(a(t)\), which can then be solved. One part of constructing a good cosmological model is deciding just what equation of state to use for your model.
Let's not go there.
The point is that \(a(t)\) isn't arbitrary; it comes out of the physical model.
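Just to give a flavour of what comes out (these are the standard textbook results, quoted here rather than derived, and only for the simplest, spatially flat case): \[ a(t) \propto t^{2/3} \quad \text{(dust)}, \qquad a(t) \propto t^{1/2} \quad \text{(radiation)} \] In both cases the scale factor grows with time, but how it grows is fixed by the choice of equation of state.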
In fact, all this theory was worked out by Friedmann in the early 1920s, some years before Hubble's observational work.
It's rare in science for somebody to manage to predict something before it is actually observed, so stop for a moment to be impressed.

What have we got for our effort?

By investing some effort in having a model of space-time in which the geometry itself is dynamic, we get a universe in which Hubble's law is satisfied, but most of the ingredients are now interpreted in a new way. In particular, the recession of distant galaxies is no longer due to them moving, but now due to the fact that the definition of distance itself depends on time in such a way that the distance between stationary objects can be changing in time.

Have I missed anything out?

Oh boy, have I ever.
I've missed out
  • all the controversy in the interpretation of distance/velocity measurements.
  • just about all the calculations that justify the claims I make. Nothing new there.
  • what the Einstein Field Equations say. That's a big topic just in its own right, and it's a bit technical.
  • what the matter model of a perfect fluid actually looks like.
  • any discussion of the curvature of space. (I've only considered the simplest case.)
  • and much, much more.
Fortunately, there's lots of excellent material out there. If all I've done is whet your appetite for more, I'm happy.