Tuesday, May 29, 2018

What does it mean to say that the universe is expanding?

Q: The universe is expanding, so what is it expanding into?
A: The question is wrong.
OK, that answer could use some expansion of its own.

Newtonian Cosmology

Here's one version of the story.
Careful measurements (and careful interpretation of the measurements) tell us that the stuff in the universe is getting further apart. What's more, if we look around us, then to a good degree of approximation, everything is travelling away from us, and the farther away it is, the faster it's getting away from us. In fact, the speed at which distant galaxies are receding from us is proportional to how far away they are. We know this because of red shift in the spectra of these galaxies, from which we can calculate how fast they are travelling in order for a Doppler shift to cause that much red shift.
This seems to suggest that we are at the centre of the universe (whether that is surprising will depend on your religious and philosophical framework); it is perhaps less flattering that everything is trying to get away from us.
Let's start off by thinking about how special our position is. We think of the universe as being full of a homogeneous dust (for scale, the dust particles are clusters of galaxies). In the simplest model, we are at the origin, and at time \(t\) the dust particle at position \(\pmb{x}\) has velocity \(\pmb{v}=H(t) \pmb{x}\). Then \(H(t)\) is the constant of proportionality which relates distance to velocity of recession, and we call \(H\) the Hubble parameter, to honour Hubble, who made the initial observations.
But what about somebody who isn't at the origin? What about somebody living in a different dust particle, far, far away, say at position \(\pmb{x}'\)? What would they see?
Let's use \(\pmb{v}'\) to denote the velocity as seen by this observer at position \(\pmb{x}'\). We have to subtract their velocity from the velocity we see at each point to get the velocity relative to them at each point. Then we get \[ \pmb{v}' = H(t) \pmb{x} - H(t) \pmb{x}' = H(t)(\pmb{x}-\pmb{x}') \] But \(\pmb{x}-\pmb{x}'\) is just the position as seen by the observer at \(\pmb{x}'\). This other observer also sees everything in the universe recede at a rate proportional to distance, with the same constant of proportionality. Rather surprisingly, this suggests that we aren't anywhere special. The cosmological principle (or principle of mediocrity), that we are nowhere special, is compatible with the observations.
A concrete approach to this that may be useful is the currant cake, or raisin bread, analogy. Think of the dust particles as the pieces of fruit in the cake. As the cake bakes and rises, the currants (or raisins) get farther apart in just this way. At least, they do if we assume that the rising process amounts to the whole cake scaling up uniformly.
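The subtraction above can be checked with concrete numbers. This is a minimal Python sketch. The value of \(H\) and the positions are made-up illustrative numbers, and the helpers `scale` and `sub` are hypothetical names for scalar multiplication and vector subtraction.

```python
def scale(k, vec):
    """Multiply a vector (tuple) by a scalar."""
    return tuple(k * c for c in vec)

def sub(u, v):
    """Componentwise difference u - v of two vectors (tuples)."""
    return tuple(a - b for a, b in zip(u, v))

H = 0.5                        # illustrative value of H(t) at one fixed time
x_obs = (10.0, -4.0, 3.0)      # the distant observer's dust particle
x = (13.0, 2.0, -1.0)          # some other dust particle

v_rel = sub(scale(H, x), scale(H, x_obs))   # velocity relative to the observer
x_rel = sub(x, x_obs)                       # position as seen by the observer
print(v_rel)                  # (1.5, 3.0, -2.0)
print(scale(H, x_rel))        # (1.5, 3.0, -2.0): Hubble's law again, with the same H
```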
But this model is a bit misleading, as there is an edge to the cake, and the cake is expanding into the surrounding oven. This isn't so for the dust-filled universe.
It's important to notice that the universe is infinite: it's the whole of three-dimensional Euclidean space, forever. It's also always full of this cosmological dust. If we go back in time, the dust particles get closer together, so the dust must be getting denser. If we go forward in time, it gets more rarefied. But there's no 'edge' to the material, and no expanding into anything, even though the dust particles are getting further apart. This is one of the joys of having infinite space: it can be full of stuff, but there's nothing to stop the contents spreading out.
There's a slight problem when we try to push back to \(t=0\), when the density would have to be infinite: but I avoid that by saying that I have a model for a universe that seems compatible with Hubble's observations for all \(t \gt 0\), and I've no idea what kind of Big Bang happened at \(t=0\); that's not part of my model. There's certainly no sense in which the universe started off as a single 'infinitely dense point' and exploded outward.
So, great news! As long as the universe just happens to be full of stuff exploding outwards, we have a rather neat mathematical model compatible with the observations, and all we need is some physics compatible with this model to make it plausible.
Aye, there's the rub.

Problem One

The first problem is a kinematic one. No matter how small \(H(t)\) is, objects sufficiently far away have to be moving faster than light. This is definitely hard to reconcile with special relativity. Special relativity is very well confirmed these days, so we don't want to just abandon it and go back to Newtonian mechanics. This cosmological model really is only good for Newtonian mechanics.
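To put a number on it: in this model the recession speed \(H d\) passes \(c\) at the distance \(d = c/H\). The sketch below assumes an illustrative value of \(H\) of 70 km/s per megaparsec. That is roughly the size of present-day estimates, but treat it as an assumption here. The function name is hypothetical.

```python
C_KM_S = 299_792.458      # speed of light in km/s

def superluminal_distance(H):
    """Distance in Mpc beyond which the recession speed H*d exceeds c (H in km/s/Mpc)."""
    return C_KM_S / H

H0 = 70.0                  # assumed, illustrative Hubble parameter in km/s/Mpc
d_c = superluminal_distance(H0)
print(round(d_c))          # 4283 (Mpc)

# Halving H only doubles the distance: some finite distance always works.
print(round(superluminal_distance(H0 / 2)))   # 8565
```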
We could try to fix it by slowing down the expansion sufficiently far away, so that Hubble's law is only valid over a certain scale: but that gets hard to reconcile with the cosmological principle.
Or we might try to fiddle with special relativity by arguing that it's great at small scales, but needs correction at larger ones.

Problem Two

But even if we only look for a Newtonian model, there is still a serious problem.
We are at the origin of the universe, and we see everything receding from us. But as things get farther away, they speed up: everything is not only getting farther apart, but is accelerating. So our distant observer, living in a dust particle far, far away, is accelerating. That's an issue because it means that we are, after all, in a special place. Although all observers see a universe expanding in the same way, only we don't feel a force pushing on us to make us accelerate.
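A short calculation makes the acceleration explicit. Follow one dust particle along its path \(\pmb{x}(t)\), so that \(\dot{\pmb{x}} = H(t)\pmb{x}\). Differentiating once more gives
\[ \ddot{\pmb{x}} = \dot{H}(t)\,\pmb{x} + H(t)\,\dot{\pmb{x}} = \left(\dot{H}(t) + H(t)^2\right)\pmb{x}. \]
For constant \(H\) this is \(H^2\pmb{x}\), pointing directly away from us and growing with distance. In general it vanishes at the origin, but unless \(\dot{H} = -H^2\) it vanishes nowhere else.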
This is actually two problems.
There's the actual physical issue of just what is supposed to be doing the pushing. We need a pretty weird model of gravity, one in which gravity is attractive at one scale (so as to hold solar systems and galaxies together) but repulsive at a bigger scale (so as to make stuff accelerate apart).
And there's the philosophical issue of what makes us so special: in this model we are, once again, in the actual centre of the universe.
So, can we find another way of modelling the universe that keeps the good property of satisfying Hubble's law? It should need no repulsive force that only becomes significant at large distances, and it should not put us in a privileged position.

Relativistic cosmology

Well of course we can. As usual, that's the sort of question that only gets asked if there is a good answer available.
But it involves a radically new way of understanding how the universe fits together.

Newtonian space-time

In the Newtonian picture of space-time, with the origin at the centre of expansion, we have spatial position, represented by a position vector \(\pmb{x}\), and time, represented by \(t\). Suppose we have two events \(A\) and \(B\), which happen respectively at position \(\pmb{x}_A\) and time \(t_A\), and at position \(\pmb{x}_B\) and time \(t_B\). Then the distance between them is the usual Euclidean distance \(\|\pmb{x}_A -\pmb{x}_B\|\), and the time between them is \(|t_A-t_B|\).
This is all very persuasive, and seems to match well to experience.
But it isn't quite right, in ways which don't show up until things are moving fairly fast, at which point we have to worry about special relativity.
Fortunately, there's another way to think about distance and time, which is both compatible with special relativity (in the sense that it is very well approximated by special relativity over small regions) and with the observations of red shift (though for a somewhat subtler reason than you might expect).

Space-time and the metric

In this picture, we don't have separate notions of spatial and temporal distance, but a unified notion. Just as before, every point has a time coordinate \(t\) and space coordinates \(\pmb{x}\). But now we write \[ ds^2 = c^2 dt^2 - a(t)^2 dx^2 \] This packages up a combination of the differences between the (time and space) coordinates of two nearby events. Here \(dx^2\) is shorthand for the sum of the squares of the spatial coordinate differences.
So what does this new thing tell us?
If two events happen in the same place, then the time between them is \(ds/c\). If they happen at the same time, then the physical distance between them is \(a(t)\) times the coordinate distance, which is the square root of the sum of the squares of the coordinate differences (i.e. the usual Pythagoras' theorem thing). This really is the physical distance, as in it counts how many meter sticks (or yard sticks if you're old) you'd have to use to join them up.
It's worth noting that over a small enough region of space-time, this is indistinguishable from special relativity: so we really are doing what was hinted at above, by saying that special relativity (great though it is over small regions) might need to be adjusted to deal with large regions.
\(a(t)\) is called the scale factor, and it relates the coordinate distance between two simultaneous events to the spatial distance. It's a function of time, so if we have two objects, both of whose positions are fixed (i.e. their spatial coordinates are unchanging), the distance between them is nevertheless changing as time passes. This isn't because they're moving apart (or together, for that matter), but because the notion of distance is now something that depends on time.
Let's take a look at this. Suppose we have two objects at fixed spatial locations, so the coordinate distance between them, \(d_C\), is unchanging. The physical distance between them is \(d_P = a(t) d_C\). Now we can think about how fast \(d_P\) is changing. We have \[ \dot{d}_P = \dot{a}(t) \times d_C = \frac{\dot{a}(t)}{a(t)} \times a(t) d_C = \frac{\dot{a}(t)}{a(t)} \times d_P. \] So the rate of change of the distance between these fixed objects is proportional to the distance between them, and the Hubble parameter is \(\dot{a}/a\).
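Here is a quick numerical check of this relation, as a sketch. The scale factor \(a(t)=t^{2/3}\) is an assumption chosen only for illustration, since nothing above fixes \(a(t)\). The derivative of \(d_P\) is estimated with a central finite difference and compared with \((\dot{a}/a)\,d_P\).

```python
def a(t):
    """Illustrative scale factor a(t) = t**(2/3); an assumption for this sketch only."""
    return t ** (2.0 / 3.0)

def a_dot(t):
    """Exact derivative of the illustrative scale factor."""
    return (2.0 / 3.0) * t ** (-1.0 / 3.0)

def physical_distance(t, d_C):
    """Physical distance between two objects at fixed coordinate separation d_C."""
    return a(t) * d_C

d_C = 5.0      # fixed coordinate separation (arbitrary units)
t = 2.0        # time at which we look
h = 1e-6       # step for the central finite difference

rate = (physical_distance(t + h, d_C) - physical_distance(t - h, d_C)) / (2 * h)
hubble = a_dot(t) / a(t)           # a'/a, which is 2/(3t) for this a(t)
print(rate, hubble * physical_distance(t, d_C))   # the two agree closely
```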
When \(a(t)\) is increasing, we loosely (and maybe misleadingly) call this expansion of space, or even worse, expansion of the universe.
OK, we have Hubble's law again, in the sense that the dust particles are all moving apart with a rate proportional to separation. And again, the universe is always full of these dust particles. But now the particles are stationary (meaning that their spatial coordinates don't change) and geometry is evolving so that the distance between them is changing.
But now we get more.
We also find that particles at fixed coordinates are not accelerating, so no force is required to get this effect of everybody seeing all the material in the universe obeying Hubble's law.

Red shift and recession

But there's something fishy about all this.
The original observation that gives rise to Hubble's law is obtained from measurements of red shift, interpreted as relative velocity. What does all that mean if everything is 'staying still' while 'geometry evolves'?
At this point we have to take into account that when we look at things a long way away, we are seeing them as they were a long time ago. In fact, we're seeing them when the scale factor was a bit different.
Amazingly, there's a geometric effect which exactly (well, almost exactly) fits the required bill.
It turns out that if a light signal is emitted at time \(t_e\) and received at time \(t_r\), then the signal is red-shifted. The ratio of the received wavelength to the emitted one is the ratio of the scale factors at these two times, \(\lambda_r/\lambda_e = a(t_r)/a(t_e)\). This is called cosmological red shift and is an entirely geometric effect.
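In code the cosmological red shift is a one-line ratio. This small Python sketch writes it as the redshift \(z\), defined by \(1 + z = a(t_r)/a(t_e)\). The 656.3 nm figure is just an example wavelength (close to the hydrogen-alpha line), and the function names are hypothetical.

```python
def redshift(a_emit, a_receive):
    """Cosmological redshift z, defined by 1 + z = a(t_r) / a(t_e)."""
    return a_receive / a_emit - 1.0

def received_wavelength(lambda_emit, a_emit, a_receive):
    """Wavelength on arrival: the emitted wavelength stretched by a(t_r) / a(t_e)."""
    return lambda_emit * a_receive / a_emit

# Light emitted when the scale factor was half its value at reception.
print(redshift(0.5, 1.0))                      # 1.0
print(received_wavelength(656.3, 0.5, 1.0))    # 1312.6
```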
How does this fit in with Hubble's observation, though?
Consider what an observer sees when looking out at a distant object. The cosmological redshift matches, to a very high degree of approximation, a purely Doppler redshift. That is the redshift you would get if there were no cosmological redshift and the whole effect came from a velocity equal to the rate of change of physical distance.
This is even more amazing than the fact that Hubble's law pops out.
Actually, this isn't exactly true. It is, though, true to a first degree of approximation, and it's very hard to measure distances and redshifts accurately enough to see any deviation. More importantly, it doesn't affect the result that the dependence of rate of recession (really redshift) on distance is the same no matter where in the universe you live.
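The first-order agreement can be seen numerically, as a sketch under stated assumptions. The sketch assumes, purely for illustration, the scale factor \(a(t)=t^{2/3}\) and works in units where \(c=1\). It takes the distance to the source to be \(c\,\Delta t\) for a light travel time \(\Delta t\), which is itself only right to first order. It then compares the exact cosmological redshift with the Doppler-style value \(v/c = Hd/c\).

```python
def a(t):
    """Illustrative scale factor a(t) = t**(2/3); an assumption for this sketch only."""
    return t ** (2.0 / 3.0)

t_r = 1.0                   # time of reception
H_r = (2.0 / 3.0) / t_r     # Hubble parameter a'/a at t_r for this a(t)

ratios = []
for dt in (0.1, 0.01, 0.001):             # light travel time, in units with c = 1
    z_exact = a(t_r) / a(t_r - dt) - 1.0  # cosmological redshift from the scale factors
    z_doppler = H_r * dt                  # v/c with v = H d and d ~ c * dt (first order)
    ratios.append(z_exact / z_doppler)
    print(f"dt={dt}: exact z={z_exact:.6f}, Doppler-style z={z_doppler:.6f}")

print(ratios)   # the ratios approach 1 as the source gets closer
```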

So how does the scale factor evolve?

Up at the top, I was unhappy with the Newtonian approach because it had some mysterious field of force pushing stuff apart. I seem to have just replaced that with a mysterious scale factor that does the same job.
Not so. In the same way as Newtonian gravity tells us how massive bodies pull on each other, Einsteinian gravity has an equation of motion that tells us how \(a(t)\) behaves.
In Einsteinian gravity, there is the metric (and we've restricted attention to a particular class of metric already) and there is a field equation (the Einstein Field Equation, or EFE) which relates how the metric varies in space and time to the matter filling the universe. In order to work out how \(a(t)\) evolves, we need to have a model for the material that fills the universe.
This model comes in two parts.
First, there's the general model. The simplest thing that has a chance of working is a perfect fluid: so we're considering galactic clusters as parcels of a compressible fluid which has negligible viscosity.
Then, there's the specific model. A fluid has a density and a pressure, and a functional relation between the two is called an equation of state. There are standard equations of state for describing a universe full of easily compressible dust, or full of electromagnetic radiation.
Once you've chosen an equation of state, you can work out what the EFE tells us: and this is a differential equation for \(a(t)\), which can then be solved. One part of constructing a good cosmological model is deciding just what equation of state to use for your model.
Let's not go there.
The point is that \(a(t)\) isn't arbitrary; it comes out of the physical model.
In fact, all this theory was worked out by Friedmann several years before Hubble's observational work.
It's rare in science for somebody to manage to predict something before it actually happens, so stop for a moment to be impressed.

What have we got for our effort?

By investing some effort in having a model of space-time in which the geometry itself is dynamic, we get a universe in which Hubble's law is satisfied, but most of the ingredients are now interpreted in a new way. In particular, the recession of distant galaxies is no longer due to them moving, but now due to the fact that the definition of distance itself depends on time in such a way that the distance between stationary objects can be changing in time.

Have I missed anything out?

Oh boy, have I ever.
I've missed out
  • all the controversy in the interpretation of distance/velocity measurements.
  • just about all the calculations that justify the claims I make. Nothing new there.
  • what the Einstein Field Equations say. That's a big topic just in its own right, and it's a bit technical.
  • what the matter model of a perfect fluid actually looks like.
  • any discussion of the curvature of space. (I've only considered the simplest case.)
  • and much, much more.
Fortunately, there's lots of excellent material out there. If all I've done is whet your appetite for more, I'm happy.

Friday, May 18, 2018

It's all the same to me.

Mathematics is the subject, probably more than any other, that we associate with precision. And yet a particularly powerful tool in mathematics consists of a fruitful failure to distinguish between objects.
The basic idea is that we have a set of objects, but rather than thinking of them all as separate, we partition the set up into a collection of subsets which, between them, cover the entire set, and no two of which have any element in common. These subsets are called equivalence classes, and an element of an equivalence class is a representative. The set of equivalence classes itself is called a quotient space.
And most of the time, it's the quotient space that's the interesting thing, not the equivalence classes themselves. It can be all too easy to lose sight of this if one is buried in the detail of showing that a given relation between elements of a set does this job of partitioning into equivalence classes, and so is an equivalence relation.

Some Examples

A simple example is to partition the positive integers, as normally written in base ten, into classes by length: 1 digit numbers, 2 digit numbers, 3 digit numbers, and so on.
But this isn't very interesting. There's not much useful structure in this quotient space for us to play with. For example, if we add together two 1 digit numbers, sometimes the answer is 1 digit long, and sometimes it is 2 digits long.
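Here is a tiny Python sketch of that failure. The function name `length_class` is just an illustrative choice. Two sums of 1 digit numbers land in different classes, so "1 digit plus 1 digit" has no single answer in the quotient space.

```python
def length_class(n):
    """Equivalence class of a positive integer: its number of base-ten digits."""
    return len(str(n))

# Representatives of the same pair of classes (1 digit, 1 digit)...
print(length_class(1), length_class(2), length_class(1 + 2))   # 1 1 1
print(length_class(9), length_class(9), length_class(9 + 9))   # 1 1 2
```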
On the other hand there are others, just as familiar, but more interesting.


Partition up integers (all of them this time, not just positive ones) into odd and even. This time things are very different. The quotient space consists of just two classes, which we can call ODD and EVEN. If we add together two odd numbers, the answer is even, whatever odd numbers we start with. Similarly if we add an odd and an even, or multiply.
The key here is that we do the arithmetic on two representatives, but no matter which representatives we pick, the answer is in the same equivalence class. This means that the quotient space \(\{\textrm{ODD},\textrm{EVEN}\}\) inherits operations of addition and multiplication from the set of all integers, \(\mathbb{Z}\).
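The key check can be sketched in Python. The sketch builds the induced table on classes from every pair of representatives in a sample range, and complains if two choices of representatives disagree. The helper names `parity` and `induced_table` are illustrative. A finite sample can only illustrate well-definedness, not prove it.

```python
def parity(n):
    """Equivalence class of an integer: 'EVEN' or 'ODD' (works for negatives too)."""
    return "EVEN" if n % 2 == 0 else "ODD"

def induced_table(op, cls, reps):
    """Tabulate (cls(a), cls(b)) -> cls(op(a, b)) over all pairs of representatives.

    Raises ValueError if two choices of representatives from the same classes
    give answers in different classes, i.e. the operation is not well defined.
    """
    table = {}
    for a in reps:
        for b in reps:
            key = (cls(a), cls(b))
            result = cls(op(a, b))
            if table.setdefault(key, result) != result:
                raise ValueError(f"not well defined on classes {key}")
    return table

sample = range(-20, 21)
add_table = induced_table(lambda a, b: a + b, parity, sample)
mul_table = induced_table(lambda a, b: a * b, parity, sample)
print(add_table[("ODD", "ODD")], mul_table[("ODD", "ODD")])   # EVEN ODD
```

Swapping `parity` for a digit-length class such as `lambda n: len(str(abs(n)))` makes `induced_table` raise `ValueError`, because 1 + 2 and 9 + 9 land in different classes.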
There's another example, just as familiar, but the source of well-known problems.


Two pairs of integers, \((m,n)\) and \((M,N)\) (where \(n\) and \(N\) are non-zero), are equivalent if \(mN=Mn\). This partitions the set of all pairs of integers (where the second integer is not zero) in the same way as being odd or even partitions the set of all integers. Secretly I'm thinking of the first pair as the fraction \(\frac{m}{n}\) and the second pair as the fraction \(\frac{M}{N}\).
This relationship - this equivalence relation - captures exactly the notion that the ratio \(m:n\) is the same as the ratio \(M:N\), in other words that the two fractions represent the same rational number. We can think of each equivalence class as a rational number, and a pair of integers, i.e. a fraction, in the equivalence class, as a representative.
Then the usual arithmetic of fractions gives a way of adding and multiplying these equivalence classes, so that the equivalence class of the result doesn't depend on the choice of representatives of the classes being added or multiplied. In other words, the rules for doing arithmetic with (and simplifying) fractions give the arithmetic of rational numbers.
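The well-definedness claim can be spot-checked in Python, treating fractions as plain integer pairs. The names `equivalent`, `add` and `mul` are illustrative. The standard library's `fractions.Fraction`, which always stores a reduced representative, is used only as an independent cross-check.

```python
from fractions import Fraction

def equivalent(p, q):
    """(m, n) ~ (M, N) exactly when m*N == M*n; second entries must be non-zero."""
    (m, n), (M, N) = p, q
    return m * N == M * n

def add(p, q):
    """Schoolbook rule m/n + M/N = (m*N + M*n)/(n*N), acting on pairs."""
    (m, n), (M, N) = p, q
    return (m * N + M * n, n * N)

def mul(p, q):
    """Schoolbook rule (m/n)*(M/N) = (m*M)/(n*N), acting on pairs."""
    (m, n), (M, N) = p, q
    return (m * M, n * N)

# Two different representatives of one half, and two of one third.
half_a, half_b = (1, 2), (3, 6)
third_a, third_b = (1, 3), (-2, -6)

s_a, s_b = add(half_a, third_a), add(half_b, third_b)
print(s_a, s_b, equivalent(s_a, s_b))        # (5, 6) (-30, -36) True

p_a, p_b = mul(half_a, third_a), mul(half_b, third_b)
print(p_a, p_b, equivalent(p_a, p_b))        # (1, 6) (-6, -36) True

print(Fraction(*s_b), Fraction(*p_b))        # 5/6 1/6
```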
I can't shake the suspicion that the core of why schoolchildren have so much difficulty getting to grips with fractions is this idea of different representatives for the same rational number. It's a subtle idea, and it takes quite a lot of getting used to.

Going further

These ideas can be extended in various ways, some more familiar than others.

Congruence arithmetic

In the case of odd and even numbers, although it isn't the way we tend to think about it, we're regarding two numbers as equivalent if their difference is a multiple of \(2\).
Of course, there's nothing special about \(2\). We could pick any integer \(n \gt 1\) to form the equivalence classes and it would work in just the same way. If we used \(3\), then sensible representatives are \(0,1,2\); if we used \(4\) then \(0,1,2,3\), and so on. If we are being careful, we use a notation such as \([0]\) to denote the equivalence class that \(0\) lives in. To save wear and tear on our fingers and keyboards, we can rather naughtily just use these representative numbers to denote the equivalence classes.
If we obtain our equivalence classes by using \(n\), then we call the resulting quotient space \(\mathbb{Z}_n\).
Actually, there is something a bit special about \(2\). If we multiply together two elements of \(\mathbb{Z}_2\), then the result is only \(0\) if one of the elements is. This is also the case with \(\mathbb{Z}_3\).
But with \(\mathbb{Z}_4\), we see that \(2 \times 2 = 0\).
OK, so from what I've written so far, it might be \(4\) that's special.
Really, what's special - or at least, well-behaved - is \(\mathbb{Z}_p\), where \(p\) is a prime. In this case, a product can only be zero if one of the multiplicands is zero. In fact, if we work with a prime, then we get a structure that is actually better behaved than \(\mathbb{Z}\), the set of all integers. We can divide by any non-zero element, in the sense that if \(a\) is not \(0\), then there is a unique solution to \(ax=1\); multiplying by this number does just the job of dividing by \(a\). In the jargon, \(\mathbb{Z}_p\) is a field, which basically means that it has a well-behaved addition, subtraction, multiplication and division.
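These claims are easy to check by brute force over all classes. The sketch below (function names are illustrative) does two things. It lists the pairs of non-zero classes whose product is zero. It also finds, for each non-zero class \(a\), the solutions of \(ax=1\).

```python
def zero_divisors(n):
    """Pairs (a, b) of non-zero classes in Z_n with a*b equal to the zero class."""
    return [(a, b) for a in range(1, n) for b in range(1, n) if (a * b) % n == 0]

def inverses(n):
    """For each non-zero class a in Z_n, every x in Z_n with a*x = 1."""
    return {a: [x for x in range(n) if (a * x) % n == 1] for a in range(1, n)}

print(zero_divisors(3))   # []
print(zero_divisors(4))   # [(2, 2)]
print(inverses(7))        # {1: [1], 2: [4], 3: [5], 4: [2], 5: [3], 6: [6]}
print(inverses(4))        # {1: [1], 2: [], 3: [3]}
```

For larger primes, Python 3.8 and later can compute the inverse directly with `pow(a, -1, p)`.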
This results in both fun and profit: the resulting number systems are fun to play with, and form the mathematical basis for some of the contemporary forms of cryptography.

Complex Numbers

We can use the same ideas to construct the complex numbers without inventing a new kind of 'imaginary' number whose square is negative.
The starting point is to consider the set of all real polynomials, and then to consider two polynomials as equivalent if their difference is a multiple of \(x^2+1\).
Then one (and the most useful) representative for any polynomial is the remainder when it is divided by \(x^2+1\), so the representatives are of the form \(a+bx\) where \(a,b\) are real numbers. But then \(x^2 = 1\times(x^2+1)-1=-1\), where I misuse '\(=\)' to mean 'is equivalent to'. Then the arithmetic of these objects looks just like normal complex arithmetic with \(i^2=-1\), except that I have an \(x\) where you might expect to see an \(i\).
To phrase that slightly differently, the complex numbers, \(\mathbb{C}\), are the quotient space obtained by regarding two polynomials as equivalent if they differ by a multiple of \(x^2+1\).
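Here is a small Python sketch of that construction. Polynomials are coefficient lists, lowest degree first, and the representative \(a+bx\) is stored as the pair \((a, b)\). The helper names are illustrative, and Python's built-in `complex` type is used only to cross-check the answer.

```python
def poly_mul(p, q):
    """Ordinary product of two real polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def mod_x2_plus_1(coeffs):
    """Remainder on division by x**2 + 1, returned as the pair (a, b) for a + b*x.

    In the quotient x**2 is equivalent to -1, so x**k reduces to
    (-1)**(k // 2) * x**(k % 2).
    """
    a = b = 0
    for k, c in enumerate(coeffs):
        sign = -1 if (k // 2) % 2 else 1
        if k % 2 == 0:
            a += sign * c
        else:
            b += sign * c
    return (a, b)

print(mod_x2_plus_1([0, 0, 1]))                   # (-1, 0): x*x behaves like -1
print(mod_x2_plus_1(poly_mul([2, 3], [4, -5])))   # (23, 2)
print((2 + 3j) * (4 - 5j))                        # (23+2j)

# Adding a multiple of x**2 + 1 does not change the class:
# (2 + 3x) + 7x(x**2 + 1) = 2 + 10x + 7x**3
print(mod_x2_plus_1([2, 10, 0, 7]))               # (2, 3)
```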
This doesn't give us any algebra we didn't have before, but it does show that there is a perfectly respectable way of constructing the complex numbers out of non-problematic objects. We don't have to postulate some new kind of number whose square is negative, and then worry about whether we have accidentally made mathematics inconsistent. Well, at least we can be sure that complex arithmetic is no more inconsistent than real arithmetic.
And just as there is nothing special about \(2\), there is nothing special about \(1+x^2\); we could use any polynomial here.
In fact, the similarity to \(\mathbb{Z}_n\) is profound.
Just as with the integers modulo a prime, if the polynomial we work with cannot be expressed as a product of lower degree polynomials, we can add, subtract and multiply any pair of polynomials, and also divide by any one that is not equivalent to zero.
This idea lets us construct a whole new collection of algebraic structures, including in particular the finite fields, which are important in aspects of statistical experiment design and error-correcting codes.
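A brute-force sketch of this in Python works with coefficients taken modulo a prime \(p\) and polynomials as coefficient lists, lowest degree first. It checks whether every non-zero class has an inverse. The modulus is assumed monic, and the function names are illustrative. The examples are standard: over \(\mathbb{Z}_2\), \(x^2+x+1\) has no factors, while \(x^2+1=(x+1)^2\).

```python
from itertools import product

def poly_mul(a, b, p):
    """Product of two polynomials over Z_p (coefficient lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def poly_mod(coeffs, modulus, p):
    """Remainder over Z_p on division by a monic modulus, as a tuple of length deg(modulus)."""
    r = [c % p for c in coeffs]
    m = len(modulus) - 1                      # degree of the modulus
    for k in range(len(r) - 1, m - 1, -1):    # clear terms of degree >= m, top down
        c = r[k]
        if c:
            for i, mc in enumerate(modulus):  # subtract c * x**(k - m) * modulus
                r[k - m + i] = (r[k - m + i] - c * mc) % p
    return tuple(r[:m] + [0] * (m - len(r)))

def every_nonzero_invertible(modulus, p):
    """Brute force: does every non-zero class have a multiplicative inverse?"""
    m = len(modulus) - 1
    elements = list(product(range(p), repeat=m))   # all remainders of degree < m
    zero, one = (0,) * m, (1,) + (0,) * (m - 1)
    return all(
        any(poly_mod(poly_mul(list(u), list(v), p), modulus, p) == one for v in elements)
        for u in elements
        if u != zero
    )

print(every_nonzero_invertible([1, 1, 1], 2))   # True: x^2 + x + 1 over Z_2, 4 elements
print(every_nonzero_invertible([1, 0, 1], 2))   # False: x^2 + 1 = (x + 1)^2 over Z_2
print(every_nonzero_invertible([1, 0, 1], 3))   # True: x^2 + 1 has no roots in Z_3
```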

The wild blue yonder

It doesn't stop here. Examples of naturally occurring partitions and quotient spaces abound throughout mathematics, in geometry, algebra and analysis.

Linear algebra

If \(V\) is a vector space, and \(S\) is a subspace of \(V\), then we can think of two vectors \(u\) and \(v\) as equivalent if \(u-v \in S\). This relationship partitions \(V\) up into subsets that look like copies of \(S\) obtained by translation away from the origin, called cosets. The quotient space is again a vector space.
This turns out to be an important construction in analysis, where \(V\) and \(S\) are spaces of functions.
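Here is a toy instance, as a Python sketch. Take \(V=\mathbb{R}^2\) and, as an assumed example, let \(S\) be the line spanned by \((1,1)\). Each coset \(v+S\) meets the \(x\)-axis in exactly one point, which makes a convenient representative. Adding cosets does not depend on which representatives we add. Integer vectors are used so that the equality tests are exact, and the names are illustrative.

```python
def equivalent(u, v):
    """u ~ v exactly when u - v lies in S = span{(1, 1)}, i.e. has equal coordinates."""
    return (u[0] - v[0]) == (u[1] - v[1])

def representative(v):
    """Where the coset v + S = {(v0 + t, v1 + t)} meets the x-axis (at t = -v1)."""
    return (v[0] - v[1], 0)

def add(u, v):
    """Ordinary vector addition in R^2."""
    return (u[0] + v[0], u[1] + v[1])

u, u2 = (3, 1), (5, 3)      # equivalent: they differ by (2, 2)
v, v2 = (0, 4), (1, 5)      # equivalent: they differ by (1, 1)

# Different representatives, same coset for the sum:
print(representative(add(u, v)), representative(add(u2, v2)))   # (-2, 0) (-2, 0)
```

Each class is pinned down by the single number \(v_0 - v_1\), which is one way to see that this quotient is one-dimensional.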

Group theory

If \(G\) is a group and \(H\) is a subgroup of \(G\), then we can say that two elements \(g_1,g_2 \in G\) are equivalent if \(g_1g_2^{-1} \in H\), and again this gives us a partition of \(G\) into cosets. This time it isn't automatic that the quotient space gets a well-defined multiplication from \(G\); \(H\) has to satisfy a certain condition, called being a normal subgroup.
In the case of Abelian groups, this lets us break a big group down into a kind of tower of other groups, in a way analogous to the prime decomposition of an integer.
It is also important in Galois theory, which relates the structure of certain groups to the solubility of polynomial equations by radicals.
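The normal-subgroup condition can be seen in the smallest non-Abelian group, the permutations of three objects. The sketch below uses the equivalence \(g_1g_2^{-1}\in H\) from above; the helper names are illustrative. By brute force over all choices of representatives, it tests whether multiplying classes is well defined. It fails for the subgroup generated by a single swap and works for the subgroup of rotations.

```python
from itertools import permutations, product

def compose(p, q):
    """Product p*q of permutations stored as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse permutation: if p sends i to p[i], the inverse sends p[i] back to i."""
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

G = list(permutations(range(3)))   # all six permutations of {0, 1, 2}

def equivalent(g1, g2, H):
    """g1 ~ g2 exactly when g1 * g2^(-1) lies in H."""
    return compose(g1, inverse(g2)) in H

def multiplication_well_defined(H):
    """True if the class of g1*g2 never depends on the chosen representatives."""
    for g1, g1b, g2, g2b in product(G, repeat=4):
        if equivalent(g1, g1b, H) and equivalent(g2, g2b, H):
            if not equivalent(compose(g1, g2), compose(g1b, g2b), H):
                return False
    return True

identity = (0, 1, 2)
swap = (1, 0, 2)                               # exchanges 0 and 1
cycle = (1, 2, 0)                              # sends 0 -> 1 -> 2 -> 0
H_swap = {identity, swap}                      # a subgroup, but not a normal one
A3 = {identity, cycle, compose(cycle, cycle)}  # the rotations: a normal subgroup

print(multiplication_well_defined(H_swap))     # False
print(multiplication_well_defined(A3))         # True
```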


We start off with the \(x-y\) plane, and consider two points to be equivalent if their \(x\) and \(y\) coordinates differ by integer amounts. This time the quotient space is a new surface, which we can visualise as a unit square with opposite edges glued together: a familiar structure in certain video games, where leaving the screen to the left, or top, brings you back in at the right, or bottom. In fact, this quotient space is a torus.
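Choosing a representative in the unit square is a one-liner, sketched here in Python (the function name is illustrative). Subtracting the floor of each coordinate picks the representative in \([0,1)\times[0,1)\). The last example is the video-game wrap-around.

```python
import math

def torus_representative(x, y):
    """Representative of the class of (x, y) in [0, 1) x [0, 1).

    Points are equivalent when both coordinates differ by integers, so
    subtracting the floor of each coordinate picks out the representative.
    """
    return (x - math.floor(x), y - math.floor(y))

print(torus_representative(2.25, -0.5))          # (0.25, 0.5)
print(torus_representative(-0.75, 3.5))          # (0.25, 0.5): an equivalent point

# Stepping left off the edge of the square re-enters on the right.
print(torus_representative(0.125 - 0.25, 0.5))  # (0.875, 0.5)
```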
Constructing surfaces as quotient spaces is a powerful tool in the attempt to classify all the possible surfaces. It also raises the question of when two surfaces should be regarded as 'really the same'. That is another partitioning, this time of all surfaces described in different ways, into equivalence classes of 'the same surface, but a different description'.

And so on

This barely scratches the surface. The more mathematics, and the more mathematical structures, you meet, the more you come across interesting quotient spaces.
Googling for 'quotient' plus the thing you are interested in (group, surface, vector space, and many more) will probably return something cool, so don't waste any more time here, go and look.

Wednesday, May 2, 2018

Base n to mod n in one easy step.

We learn how to write numbers (well, integers) in bases other than ten fairly early in our mathematical education. A bit later, we learn about congruence arithmetic and doing arithmetic modulo any integer \(n \gt 1\). I remember that there was a certain degree of initial confusion between the two, though that was really just about remembering which label went with what: after all, the two are entirely different, aren't they?

As it turns out, not so much.

Let's have a brief reminder of congruence arithmetic, and then see what representing numbers in different bases has to do with it.

Congruence arithmetic modulo \(n\)

The simplest form of congruence arithmetic is something we're all familiar with long before we meet congruence arithmetic as a thing.

We all know right from a very early stage that an even number plus an even number is even, an even plus an odd is odd, and so on. But an even number is one which leaves a remainder of zero when it's divided by two, and an odd number is one which leaves a remainder of one. So these odd and even rules are telling us that when we do sums with integers, the remainder on division by two is well behaved with respect to addition and multiplication. Or in the hieratic jargon, we can do arithmetic modulo two.

It's not so obvious that this works with numbers other than two, but it's just as true.

Choose any integer \(n>1\). We say that two integers, \(a\) and \(b\), are congruent modulo \(n\) if their difference is a multiple of \(n\), or, equivalently, if they have the same remainder on division by \(n\).

Then with a bit of work, we can show that arithmetic modulo \(n\) makes sense: it doesn't matter what numbers we start with, the remainder after doing an arithmetic operation only depends on the remainders of the numbers we combine.

This takes a little bit of algebra. On the other hand...

Arithmetic modulo ten and modulo two

If we carry out the usual addition and multiplication operations of integer arithmetic, writing all numbers in base ten, then it is obvious (as long as we believe the standard algorithms!) that the units digit of a sum or product depends only on the units digits of the numbers multiplied or added.

But the units digit of a number (written in base ten) is just the remainder when you divide by ten! Looking only at the units digits is doing arithmetic modulo ten.

In just the same way, if we write our numbers in binary, then the units digit of a sum or product depends only on the units digits of the numbers combined. And the units digit is \(1\) for an odd number and \(0\) for an even one.

Looking only at the units digits is now doing arithmetic modulo two.

Arithmetic modulo \(n\) again

But there's nothing special about two or ten (at least, not in this regard).

If we choose any integer \(n \gt 1\), and write integers in base \(n\), then the units digit of a sum or product depends only on the units digits of the numbers combined.

And now, looking at the units digits is working modulo \(n\).
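A brute-force check of both claims can be sketched in Python. First, the units digit in base \(n\) is the remainder on division by \(n\). Second, the units digit of a sum or product depends only on the units digits of the numbers combined. The helpers `to_base` and `units_digit` are illustrative names, and they handle non-negative integers only, leaving negative numbers aside.

```python
def to_base(x, n):
    """Digits of a non-negative integer x in base n, most significant first."""
    if x == 0:
        return [0]
    digits = []
    while x > 0:
        x, d = divmod(x, n)
        digits.append(d)
    return digits[::-1]

def units_digit(x, n):
    """Last digit of x written in base n."""
    return to_base(x, n)[-1]

BASES = (2, 3, 7, 10)

# The units digit in base n is exactly the remainder on division by n...
digit_is_remainder = all(units_digit(x, n) == x % n for n in BASES for x in range(500))

# ...so the units digit of a sum or product depends only on the units digits.
def last_digit_rule_holds(n, limit=60):
    for a in range(limit):
        for b in range(limit):
            ua, ub = units_digit(a, n), units_digit(b, n)
            if units_digit(a + b, n) != units_digit(ua + ub, n):
                return False
            if units_digit(a * b, n) != units_digit(ua * ub, n):
                return False
    return True

print(digit_is_remainder, all(last_digit_rule_holds(n) for n in BASES))   # True True
print(to_base(2018, 7), 2018 % 7)                                        # [5, 6, 1, 2] 2
```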

So although base \(n\) and modulo \(n\) look like entirely different ideas, they turn out to be rather closely related.

What's the point?

I'm not entirely sure. Thinking about last digits in base \(n\) might be an interesting way to approach congruence arithmetic, as it comes from a more concrete direction. It also suggests various investigative projects, such as when the last digit of a product can be zero, or how negative numbers fit into this picture. For me, the point was really that there is actually a real relationship between these two apparently different bits of maths, and it's always fun to see connections like that.