Wednesday, June 27, 2018

Why is subtraction so hard?

Subtraction presents various problems to learners of mathematics, not least with the mechanics of hand calculating the result of subtracting one multi-digit number from another. But that's not the kind of difficulty I'm interested in: I'm more interested here in the difficulties that arise in computations involving several additions and subtractions of integers, or, at the slightly more advanced level, in algebraic computations involving several additions and subtractions. I'm going to take a look at what I think lies underneath the difficulty.

The procedure required to deal with subtraction starts off deceptively simple.

Addition and subtraction are of the same priority, and so a calculation involving a string of additions and subtractions is calculated from left to right: \[ 1-2+3-4-5=-1+3-4-5=2-4-5=-2-5=-7 \] Slightly more complicated, we may have things inside parentheses which consequently have to be evaluated before the result is combined with the rest: \[ 2-(3-4) = 2-(-1) = 3 \] But already we find lots of mistakes being made. It isn't uncommon for the first calculation to turn into some variation on \[ 1-2+3-4-5=1-2+3-(-1)=-1+2=1 \] or for the second to go down the route of \[ 2-(3-4)=2-3-4=-1-4=-5 \]

The root of the problem

In one sense, the root of the problem is combinatorial. There are lots of different ways of combining the various numbers in the calculations, but they don't all give the same answer. There are more ways of getting the wrong answer than the right one.

But that doesn't really get to the bottom of it. Why do learners combine things in the wrong way when they've been given careful explanation and instruction of the right way? (And it's not just novices: errors along these lines persist with undergraduates!)

I think that the real problem is that subtraction is a horrible operation, in many ways.

The horrors of subtraction

  • First, we learn to add together positive integers, or natural numbers. The result of adding together two natural numbers is always a natural number: addition is a well defined binary operation on natural numbers.

    But subtraction isn't.

    When we subtract one natural number from another, the result can be negative. Suddenly a whole new can of worms is opened up, because we need to deal with the arithmetic of negative numbers, which, as we all know, is a paradise of opportunity for sign errors.

  • Next, subtraction isn't commutative. This doesn't seem to be subtle, but let's take a quick look: \[ 2-1=1 \qquad \text{ but } \qquad 1-2 =-1 \] The answers are different, but only by being of opposite signs; and to rub salt in the wound, the second may well be done by subtracting \(1\) from \(2\) and changing the sign. Losing a minus sign is easy for all of us.
  • Even more horrible, subtraction is not associative: \[ (1-2)-3 \neq 1-(2-3) \]

    But we get very used to using associativity and commutativity without even noticing it when adding. It's much harder to remember not to do it when it's inappropriate, once you've become thoroughly accustomed to doing it.

  • And lastly, to cope with the problem we end up with a plethora of rules about combinations of signs. More to remember, often just regarded as more rules to learn. I cannot express in words the horror I feel when I hear students muttering (or even writing out) "FOIL" when doing algebraic calculations; I haven't yet heard "KFC" from my students when working with fractions, but I've seen it on the interwebs often enough to know it's only a matter of time.

Now, there are many rules in algebra and arithmetic, but a relatively small number of principles (axioms) from which they all follow.

Unfortunately, you can't really teach arithmetic to five-year-olds by starting off with the axioms of a field.

But maybe there comes a time to revisit the collection of rules and see where they come from. At least, I always find it illuminating to go back to elementary stuff and see what new insight I might have based on further study.

Avoiding the issue

So here's my approach to a better (I have many opinions, and very few of them are humble) view of the arithmetic and algebra of subtraction.

Uninvent subtraction.

OK, so I haven't gone back in time and relived my education without the notion. What I mean is that I look back and say that after all this time I now see that there's a better way to think of subtraction.

First, any number \(a\) has an additive inverse \(-a\): a number we add to it to get \(0\). Then if \(a\) and \(b\) are two numbers, I say that \[ a-b \] is a (somewhat undesirable, but unavoidable) notation for \[ a+(-b) \] which is the sum of \(a\) and \(-b\), the additive inverse of \(b\).

And at this point I really, really wish that there were a conventional typographical distinction between \(-\), the binary subtraction operator, and \(-\), the unary additive inverse operator. Oh well.

This has the immediate consequence that we can write our sums slightly differently: \[ 1-2+3-4-5 = 1+ (-2) + 3 +(-4) + (-5) \] and these quantities can now be rearranged at will, since addition is associative and commutative.
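This is easy to check numerically. Here's a minimal Python sketch (using just the numbers from the example above): once the calculation is rewritten as a sum of signed terms, every reordering of the terms gives the same answer.

from itertools import permutations

# 1 - 2 + 3 - 4 - 5, rewritten as a sum of signed terms
terms = [1, -2, 3, -4, -5]

assert 1 - 2 + 3 - 4 - 5 == sum(terms) == -7
# every ordering of the terms gives the same total
assert all(sum(p) == -7 for p in permutations(terms))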

In the second example, we have \[ 2-(3-4) = 2 + (-(3-4)) \] And what do we add to \(3-4\) to get \(0\)? We add \(4-3=1\), so \(-(3-4)=1\) and \[ 2-(3-4) = 2 + 1 = 3 \]

Of course, it's still necessary to be able to add together combinations of positive and negative numbers: there is no free lunch here. But it's a way of thinking about the computation that reduces the temptation/opportunity to make mistakes, so maybe it's a slightly cheaper lunch.

One consequence is that if I see \[ x+5=7 \] I don't think 'subtract \(5\) from each side', I think 'add \(-5\) to each side'.

I find it a useful way to think about what's going on.

I try to stress this with my students, but with mixed success. And just about everybody who's teaching early undergraduate material will surely be doing something like this in the students' first encounter with abstract algebra and axiomatic systems.

My general impression is that as long as they're doing a set problem of the 'show from the axioms of a field' type, most students can be persuaded to work this way.

But I find that as soon as that context is gone and they're back in the old familiar territory of doing algebra or arithmetic, for most it's also back to the old familiar way of proceeding. And for quite a few this has just too many opportunities for the old familiar way of going wrong.

But also

I have very similar thoughts about division, the reconstruction of which I leave as an exercise for the sufficiently motivated reader.

Tuesday, May 29, 2018

What does it mean to say that the universe is expanding?

Q: The universe is expanding, so what is it expanding into?
A: The question is wrong.
OK, that answer could use some expansion of its own.

Newtonian Cosmology

Here's one version of the story.
Careful measurements (and careful interpretation of the measurements) tell us that the stuff in the universe is getting further apart. What's more, if we look around us, then to a good degree of approximation, everything is travelling away from us, and the farther away it is, the faster it's getting away from us. In fact, the speed at which distant galaxies are receding from us is proportional to how far away they are. We know this because of red shift in the spectra of these galaxies, from which we can calculate how fast they are travelling in order for a Doppler shift to cause that much red shift.
This seems to suggest that we are (surprisingly or not will depend on your religious and philosophical framework) at the centre of the universe; it is perhaps less flattering that everything is trying to get away from us.
Let's start off by thinking about how special our position is. We think of the universe as being full of a homogeneous dust (for scale, the dust particles are clusters of galaxies). In the simplest model, we are at the origin, and at time \(t\) the dust particle at position \(\pmb{x}\) has velocity \(\pmb{v}=H(t) \pmb{x}\). Then \(H(t)\) is the factor which relates distance to velocity of recession, and we call \(H\) the Hubble parameter, to honour Hubble, who made the initial observations.
But what about somebody who isn't at the origin? What about somebody living in a different dust particle, far, far away, say at position \(\pmb{x}'\)? What would they see?
Let's use \(\pmb{v}'\) to denote the velocity as seen by this observer at position \(\pmb{x}'\). We have to subtract their velocity from the velocity we see at each point to get the velocity relative to them at each point. Then we get \[ \pmb{v}' = H(t) \pmb{x} - H(t) \pmb{x}' = H(t)(\pmb{x}-\pmb{x}') \] But \(\pmb{x}-\pmb{x}'\) is just the position as seen by the observer at \(\pmb{x}'\); this other observer also sees everything in the universe recede at a rate proportional to distance, and it's even the same constant of proportionality. Rather surprisingly, this suggests that we aren't anywhere special, and that the cosmological principle (or principle of mediocrity) that we are nowhere special is compatible with the observations.
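Here's a small numerical sketch of that calculation (assuming numpy is available; the value of \(H\) and the positions are arbitrary illustrations): give every dust particle the Hubble velocity as seen from the origin, then check that the velocities relative to any other particle obey exactly the same law.

import numpy as np

H = 0.07                                     # an arbitrary Hubble parameter, for illustration
rng = np.random.default_rng(0)
x = rng.uniform(-100, 100, size=(10, 3))     # positions of some dust particles
v = H * x                                    # Hubble-law velocities as seen from the origin

observer = 3                                 # pick some other dust particle
x_rel = x - x[observer]                      # positions relative to that observer
v_rel = v - v[observer]                      # velocities relative to that observer

# the relative velocities satisfy Hubble's law with the same H
assert np.allclose(v_rel, H * x_rel)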
A concrete approach to this that may be useful is the currant cake, or raisin bread analogy. If you think of the dust particles as the items of fruit in the cake, then as it cooks and the cake rises, the currants (or raisins) get farther apart, in just this way - at least, if we assume that the rising process amounts to the whole cake scaling up in a uniform way.
But this model is a bit misleading, as there is an edge to the cake, and the cake is expanding into the surrounding oven. This isn't so for the dust-filled universe.
It's important to notice that the universe is infinite - it's the whole of three dimensional Euclidean space - forever. And it's always full of this cosmological dust: but if we go back in time, we see that the dust particles are getting closer together, so the dust must be getting denser, and if we go forward in time, it is getting more rarefied. But there's no 'edge' to the material, no expanding of anything, even though the dust particles are getting further apart. This is one of the joys of having infinite space: it can be full of stuff, but there's nothing to stop the contents spreading out.
There's a slight problem when we try to push back to \(t=0\), when the density would have to be infinite: but I avoid that by saying that I have a model for a universe that seems compatible with Hubble's observations for all \(t \gt 0\), and I've no idea what kind of Big Bang happened at \(t=0\); that's not part of my model. There's certainly no sense in which the universe started off as a single 'infinitely dense point' and exploded outward.
So, great news! As long as the universe just happens to be full of stuff exploding outwards, we have a rather neat mathematical model compatible with the observations, and all we need is some physics compatible with this model to make it plausible.
Physics.
Aye, there's the rub.

Problem One

The first problem is a kinematic one. No matter how small \(H(t)\) is, objects sufficiently far away have to be moving faster than light. This is definitely hard to reconcile with special relativity, and special relativity is very well confirmed these days, so we don't just want to abandon it and go back to Newtonian mechanics. This cosmological model really is only good for Newtonian mechanics.
We could try to fix it by slowing down the expansion sufficiently far away, so that Hubble's law is only valid over a certain scale: but that gets hard to reconcile with the cosmological principle.
Or we might try to fiddle with special relativity by arguing that it's great at small scales, but needs correction at larger ones.

Problem Two

But even if we only look for a Newtonian model, there is still a serious problem.
We are at the origin of the universe, and we see everything receding from us. But as things get farther away, they speed up: everything is not only getting farther apart, but is accelerating. So our distant observer, living in a dust particle far, far away, is accelerating. That's an issue because it means that we are, after all, in a special place. Though all observers see a universe expanding in the same way, we are the only ones who don't feel a force pushing on us to make us accelerate.
This is actually two problems.
There's the actual physical issue of just what is supposed to be doing the pushing. We need a pretty weird model of gravity to be compatible with gravity being attractive (so as to hold solar systems and galaxies together) at one scale, but repulsive (so as to make stuff accelerate apart at the bigger scale).
And there's the philosophical issue of what makes us so special: in this model we are, once again, in the actual centre of the universe.
So, can we find another way of modelling the universe which retains the good property of satisfying Hubble's law, but doesn't have either of these problems: a repulsive force which only becomes significant at large distances, or a privileged position for us?

Relativistic cosmology

Well of course we can. As usual, that's the sort of question that only gets asked if there is a good answer available.
But it involves a radically new way of understanding how the universe fits together.

Newtonian space-time

In the Newtonian picture of space-time, with the origin at the centre of expansion, we have spatial position, represented by a position vector \(\pmb{x}\) and time, represented by \(t\). If we have two events \(A\) and \(B\), which happen respectively at position \(\pmb{x}_A\) and time \(t_A\), and at position \(\pmb{x}_B\) and time \(t_B\), then the distance between them is the usual Euclidean distance \(\|\pmb{x}_A -\pmb{x}_B\|\), and the time between them is \(|t_A-t_B|\).
This is all very persuasive, and seems to match well to experience.
But it isn't quite right, in ways which don't show up until things are moving fairly fast, at which point we have to worry about special relativity.
Fortunately, there's another way to think about distance and time, which is both compatible with special relativity (in the sense that it is very well approximated by special relativity over small regions) and with the observations of red shift (though for a somewhat subtler reason than you might expect).

Space-time and the metric

In this picture, we don't have separate notions of spatial and temporal distance, but a unified notion. Just as before, every point has a time coordinate \(t\) and space coordinates \(\pmb{x}\). But now we write \[ ds^2 = c^2 dt^2 - a(t)^2 dx^2 \] which is a way of packaging up a combination of difference between the (time and space) coordinates of two nearby events: \(dx^2\) is a shorthand for the sum of the square of the spatial coordinate differences.
So what does this new thing tell us?
If two events happen in the same place, then the time between them is \(ds/c\). If they happen at the same time, then the physical distance between them is \(a(t)\) times the coordinate distance, which is the square root of the sum of the squares of the coordinate differences (i.e. the usual Pythagoras' theorem thing). This really is the physical distance, as in it counts how many meter sticks (or yard sticks if you're old) you'd have to use to join them up.
It's worth noting that over a small enough region of space-time, this is indistinguishable from special relativity: so we really are doing what was hinted at above, by saying that special relativity (great though it is over small regions) might need to be adjusted to deal with large regions.
\(a(t)\) is called the scale factor, and it relates the coordinate distance between two simultaneous events to the spatial distance. It's a function of time, so if we have two objects, both of whose positions are fixed (i.e. their spatial coordinates are unchanging), the distance between them is nevertheless changing as time passes. This isn't because they're moving apart (or together, for that matter), but because the notion of distance is now something that depends on time.
Let's take a look at this: suppose we have two objects, at fixed spatial locations, so the coordinate distance between them, \(d_C\), is unchanging. The physical distance between them is \(d_P = a(t) d_C\). Now we can think about how fast \(d_P\) is changing. We have \[ \dot{d}_P = \dot{a}(t) \times d_C = \frac{\dot{a}(t)}{a(t)} \times a(t) d_C = \frac{\dot{a}(t)}{a(t)} \times d_P. \] So the rate of change of the distance between these fixed objects is proportional to the distance between them, and the Hubble parameter is \(\dot{a}/a\).
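Here's a quick symbolic check of that computation (a sketch assuming sympy; the particular scale factor \(a(t)=t^{2/3}\) is just an illustrative choice, not a claim about the real universe):

import sympy as sp

t, d_C = sp.symbols('t d_C', positive=True)
a = t**sp.Rational(2, 3)           # an illustrative scale factor
d_P = a * d_C                      # physical distance between two fixed dust particles
H = sp.diff(a, t) / a              # the Hubble parameter adot/a

# the rate of change of the physical distance is H times that distance
assert sp.simplify(sp.diff(d_P, t) - H * d_P) == 0
print(H)                           # 2/(3*t) for this choice of a(t)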
When \(a(t)\) is increasing, we loosely (and maybe misleadingly) call this expansion of space, or even worse, expansion of the universe.
OK, we have Hubble's law again, in the sense that the dust particles are all moving apart with a rate proportional to separation. And again, the universe is always full of these dust particles. But now the particles are stationary (meaning that their spatial coordinates don't change) and geometry is evolving so that the distance between them is changing.
But now we get more.
We also find that particles at fixed coordinates are not accelerating, so no force is required to get this effect of everybody seeing all the material in the universe obeying Hubble's law.

Red shift and recession

But there's something fishy about all this.
The original observation that gives rise to Hubble's law is obtained from measurements of red shift, interpreted as relative velocity. What does all that mean if everything is 'staying still' while 'geometry evolves'?
At this point we have to take into account that when we look at things a long way away, we are seeing them as they were a long time ago. In fact, we're seeing them when the scale factor was a bit different.
Amazingly, there's a geometric effect which exactly (well, almost exactly) fits the required bill.
It turns out that if a light signal is emitted at time \(t_e\) and received at time \(t_r\), then the signal is red-shifted; the ratio of the wavelength of the emitted signal to that of the received one is the ratio of the scale factor at these two times. This is called cosmological red shift and is an entirely geometric effect.
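For reference, this relationship is usually packaged up using the redshift \(z\) (a notation not needed elsewhere in this post): \[ 1+z = \frac{\lambda_{\text{received}}}{\lambda_{\text{emitted}}} = \frac{a(t_r)}{a(t_e)}. \]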
How does this fit in with Hubble's observation, though?
If you consider what an observer sees when looking out at a distant object, the cosmological redshift matches (to a very high degree of approximation) the redshift that you would get if there were no cosmological redshift, and the redshift was due to a Doppler effect from a velocity that's just the same as the rate of change of physical distance.
This is even more amazing than the fact that Hubble's law pops out.
Actually, this isn't exactly true. It is, though, true to a first degree of approximation, and it's very hard to measure distances and redshifts accurately enough to see any deviation. More importantly, it doesn't affect the result that the dependence of rate of recession (really redshift) on distance is the same no matter where in the universe you live.

So how does the scale factor evolve?

Up at the top, I was unhappy with the Newtonian approach because it had some mysterious field of force pushing stuff apart. I seem to have just replaced that with a mysterious scale factor that does the same job.
Not so. In the same way as Newtonian gravity tells us how massive bodies pull on each other, Einsteinian gravity has an equation of motion that tells us how \(a(t)\) behaves.
In Einsteinian gravity, there is the metric (and we've restricted attention to a particular class of metric already) and there is a field equation (the Einstein Field Equation, or EFE) which relates how the metric varies in space and time to the matter filling the universe. In order to work out how \(a(t)\) evolves, we need to have a model for the material that fills the universe.
This model comes in two parts.
First, there's the general model. The simplest thing that has a chance of working is a perfect fluid: so we're considering galactic clusters as parcels of a compressible fluid which has negligible viscosity.
Then, there's the specific model. A fluid has a density and a pressure, and a functional relation between the two is called an equation of state. There are standard equations of state for describing a universe full of easily compressible dust, or full of electromagnetic radiation.
Once you've chosen an equation of state, you can work out what the EFE tells us: and this is a differential equation for \(a(t)\), which can then be solved. One part of constructing a good cosmological model is deciding just what equation of state to use for your model.
Let's not go there.
The point is that \(a(t)\) isn't arbitrary; it comes out of the physical model.
In fact, all this theory was worked out by Friedmann several years before Hubble's observational work.
It's rare in science for somebody to manage to predict something before it actually happens, so stop for a moment to be impressed.

What have we got for our effort?

By investing some effort in having a model of space-time in which the geometry itself is dynamic, we get a universe in which Hubble's law is satisfied, but most of the ingredients are now interpreted in a new way. In particular, the recession of distant galaxies is no longer due to them moving, but now due to the fact that the definition of distance itself depends on time in such a way that the distance between stationary objects can be changing in time.

Have I missed anything out?

Oh boy, have I ever.
I've missed out
  • all the controversy in the interpretation of distance/velocity measurements.
  • just about all the calculations that justify the claims I make. Nothing new there.
  • what the Einstein Field Equations say. That's a big topic just in its own right, and it's a bit technical.
  • what the matter model of a perfect fluid actually looks like.
  • any discussion of the curvature of space. (I've only considered the simplest case.)
  • and much, much more.
Fortunately, there's lots of excellent material out there. If all I've done is whet your appetite for more, I'm happy.

Friday, May 18, 2018

It's all the same to me.

Mathematics is the subject, probably more than any other, that we associate with precision. And yet a particularly powerful tool in mathematics consists of a fruitful failure to distinguish between objects.
The basic idea is that we have a set of objects, but rather than thinking of them all as separate, we partition the set up into a collection of subsets which, between them, cover the entire set, and no two of which have any element in common. These subsets are called equivalence classes, and an element of an equivalence class is a representative. The set of equivalence classes itself is called a quotient space.
And most of the time, it's the quotient space that's the interesting thing, not the equivalence classes themselves. It can be all too easy to lose sight of this if one is buried in the detail of showing that a given relation between elements of a set does this job of partitioning into equivalence classes, and so is an equivalence relation.

Some Examples

A simple example is to partition up positive integers, as normally expressed in base ten, into classes of length: 1 digit numbers, 2 digit numbers, 3 digit numbers, and so on.
But this isn't very interesting. There's not much useful structure in this quotient space for us to play with. For example, if we add together two 1 digit numbers, sometimes the answer is 1 digit long, and sometimes it is 2 digits long.
On the other hand there are others, just as familiar, but more interesting.

Parity

Partition up integers (all of them this time, not just positive ones) into odd and even. This time things are very different. The quotient space consists of just two classes, which we can call ODD and EVEN. If we add together two odd numbers, the answer is even, whatever odd numbers we start with. Similarly if we add an odd and an even, or multiply.
The key here is that we do the arithmetic on two representatives, but no matter which representatives we pick, the answer is in the same equivalence class. This means that the quotient space \(\{\textrm{ODD},\textrm{EVEN}\}\) inherits operations of addition and multiplication from the set of all integers, \(\mathbb{Z}\).
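Here's a small Python sketch of that well-definedness (the particular ranges of representatives are arbitrary): pick a class for each number, pick two different representatives of each class, and check that the class of the sum and of the product doesn't depend on the choice.

import random

def parity(n):
    return n % 2          # 0 for EVEN, 1 for ODD

for trial in range(1000):
    a_class, b_class = random.randrange(2), random.randrange(2)
    # two different representatives of each class
    a1, a2 = a_class + 2 * random.randrange(100), a_class + 2 * random.randrange(100)
    b1, b2 = b_class + 2 * random.randrange(100), b_class + 2 * random.randrange(100)
    assert parity(a1 + b1) == parity(a2 + b2)   # the sum lands in the same class
    assert parity(a1 * b1) == parity(a2 * b2)   # and so does the product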
There's another example, just as familiar, but the source of well-known problems.

Fractions

Two pairs of integers, \((m,n)\) and \((M,N)\) (where \(n\) and \(N\) are non-zero) are equivalent if \(mN=Mn\). This partitions the set of all pairs of integers (where the second integer is not zero) in the same way as being odd or even partitions the set of all integers. Secretly I'm thinking of the first pair as the fraction \(\frac{m}{n}\) and the second pair as the fraction \(\frac{M}{N}\).
This relationship - this equivalence relation - captures exactly the notion that the ratio \(m:n\) is the same as the ratio \(M:N\), in other words that the two fractions represent the same rational number. We can think of each equivalence class as a rational number, and a pair of integers, i.e. a fraction, in the equivalence class, as a representative.
Then the usual arithmetic of fractions gives a way of adding and multiplying these equivalence classes, so that the equivalence class of the result doesn't depend on the choice of representatives of the classes being added or multiplied. In other words the rules for doing arithmetic with (and simplifying) fractions gives the arithmetic of rational numbers.
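Here's a minimal Python sketch of that (the particular representatives are just illustrations): the usual rules, applied to any representatives of two classes, give answers in the same class.

def equivalent(a, b):
    # (m, n) ~ (M, N) exactly when m*N == M*n
    (m, n), (M, N) = a, b
    return m * N == M * n

def add(a, b):
    # the usual rule for adding fractions, applied to representatives
    (m, n), (M, N) = a, b
    return (m * N + M * n, n * N)

def multiply(a, b):
    (m, n), (M, N) = a, b
    return (m * M, n * N)

# two different representatives of 1/2, and two of 2/3
half_1, half_2 = (1, 2), (3, 6)
twothirds_1, twothirds_2 = (2, 3), (-4, -6)

# either choice of representatives gives a sum (and a product) in the same class
assert equivalent(add(half_1, twothirds_1), add(half_2, twothirds_2))
assert equivalent(multiply(half_1, twothirds_1), multiply(half_2, twothirds_2))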
I can't shake the suspicion that the core of why schoolchildren have so much difficulty getting to grips with fractions is this idea of different representatives for the same rational number. It's a subtle idea, and it takes quite a lot of getting used to.

Going further

These ideas can be extended in various ways, some more familiar than others.

Congruence arithmetic

In the case of odd and even numbers, although it isn't the way we tend to think about it, we're regarding two numbers as equivalent if their difference is a multiple of \(2\).
Of course, there's nothing special about \(2\). We could pick any number to form the equivalence classes and it would work in just the same way. If we used \(3\), then sensible representatives are \(0,1,2\); if we used \(4\) then \(0,1,2,3\), and so on. If we are being careful, we use a notation such as \([0]\) to denote the equivalence class that \(0\) lives in. To save wear and tear on our fingers and keyboards, we can rather naughtily just use these representative numbers to denote the equivalence classes.
If we obtain our equivalence classes by using \(n\), then we call the resulting quotient space \(\mathbb{Z}_n\).
Actually, there is something a bit special about \(2\). If we multiply together two elements of \(\mathbb{Z}_2\), then the result is only \(0\) if one of the elements is. This is also the case with \(\mathbb{Z}_3\).
But with \(\mathbb{Z}_4\), we see that \(2 \times 2 = 0\).
OK, so from what I've written so far, it might be \(4\) that's special.
Really, what's special - or at least, well-behaved - is \(\mathbb{Z}_p\), where \(p\) is a prime. In this case, a product can only be zero if one of the multiplicands is zero. In fact, if we work with a prime, then we get a structure that is actually better behaved than \(\mathbb{Z}\), the set of all integers. We can divide by any non-zero element, in the sense that if \(a\) is not \(0\), then there is a unique solution to \(ax=1\); and multiplying by this number does just the job of dividing by \(a\). In the jargon, \(\mathbb{Z}_p\) is a field, which basically means that it has a well-behaved addition, subtraction, multiplication and division.
This results in both fun and profit: the resulting number systems are fun to play with, and form the mathematical basis for some of the contemporary forms of cryptography.
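Here's a tiny Python illustration of the claim that every non-zero element of \(\mathbb{Z}_p\) has a multiplicative inverse (it leans on the fact that Python's built-in pow computes modular inverses from version 3.8 on):

p = 7

for a in range(1, p):
    a_inv = pow(a, -1, p)          # the unique x with a*x = 1 (mod p)
    assert (a * a_inv) % p == 1
    print(f"in Z_{p}: the inverse of {a} is {a_inv}")

# by contrast, 2 has no inverse in Z_4, and indeed 2 * 2 = 0 there
assert all((2 * x) % 4 != 1 for x in range(4))
assert (2 * 2) % 4 == 0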

Complex Numbers

We can use the same ideas to construct the complex numbers without inventing a new kind of 'imaginary' number whose square is negative.
The starting point is to consider the set of all real polynomials, and then to consider two polynomials as equivalent if their difference is a multiple of \(x^2+1\).
Then one representative (and the useful one) for any polynomial is the remainder when it is divided by \(x^2+1\), so the representatives are of the form \(a+bx\) where \(a,b\) are real numbers. But then \(x^2 = 1\times(x^2+1)-1=-1\), where I misuse '\(=\)' to mean 'is equivalent to'. Then the arithmetic of these objects looks just like normal complex arithmetic with \(i^2=-1\), except that I have an \(x\) where you might expect to see an \(i\).
To phrase that slightly differently, the complex numbers, \(\mathbb{C}\), are the quotient space obtained by regarding two polynomials as equivalent if they differ by a multiple of \(x^2+1\).
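As a small Python sketch of this (representing the class of \(a+bx\) by the pair \((a,b)\)), the multiplication rule really is the familiar complex one:

def multiply(p, q):
    # (a + b x)(c + d x) = ac + (ad + bc)x + bd x^2, and x^2 is equivalent to -1
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c)

print(multiply((0, 1), (0, 1)))        # (-1, 0): 'x squared' is equivalent to -1
print(multiply((1, 2), (3, -1)))       # (5, 5), matching (1 + 2i)(3 - i) = 5 + 5i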
This doesn't give us any algebra we didn't have before, but it does show that there is a perfectly respectable way of constructing the complex numbers out of non-problematic objects. We don't have to postulate some new kind of number whose square is negative, and then worry about whether we have accidentally made mathematics inconsistent. Well, at least we can be sure that complex arithmetic is no more inconsistent than real arithmetic.
And just as there is nothing special about \(2\), there is nothing special about \(1+x^2\); we could use any polynomial here.
In fact, the similarity to \(\mathbb{Z}_n\) is profound.
Just as with the integers, if the polynomial we work with cannot be expressed as a product of lower degree polynomials, we can add, subtract and multiply any pair of polynomials, and also divide by any non-zero one.
This idea lets us construct a whole new collection of algebraic structures, including in particular the finite fields, which are important in aspects of statistical experiment design and error correcting codes.

The wild blue yonder

It doesn't stop here. Examples of naturally occurring partitions and quotient spaces abound throughout mathematics, in geometry, algebra and analysis.

Linear algebra

If \(V\) is a vector space, and \(S\) is a subspace of \(V\), then we can think of two vectors \(u\) and \(v\) as equivalent if \(u-v \in S\). This relationship partitions \(V\) up into subsets that look like copies of \(S\) obtained by translation away from the origin, called cosets. The quotient space is again a vector space.
This turns out to be an important construction in analysis, where \(V\) and \(S\) are spaces of functions.

Group theory

If \(G\) is a group and \(H\) is a subgroup of \(G\), then we can say that two elements \(g_1,g_2 \in G\) are equivalent if \(g_1g_2^{-1} \in H\), and again this gives us a partition of \(G\) into cosets. This time it isn't automatic that the quotient space gets a well-defined multiplication from \(G\); \(H\) has to satisfy a certain condition, called being a normal subgroup.
In the case of Abelian groups, this lets us break a big group down into a kind of tower of other groups, in a way analogous to the prime decomposition of an integer.
It is also important in Galois theory, which relates the structure of certain groups to the solubility of polynomials.

Geometry

We start off with the \(x-y\) plane, and consider two points to be equivalent if their \(x\) and \(y\) coordinates differ by integer amounts. This time the quotient space is a new surface, which we can visualise as a unit square with opposite edges glued together: a familiar structure in certain video games, where leaving the screen to the left, or top, brings you back in at the right, or bottom. In fact, this quotient space is a torus.
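Here's a tiny Python sketch of that identification: reducing each coordinate modulo \(1\) picks out a canonical representative in the unit square, so points of the plane that differ by integer amounts get the same representative.

def representative(x, y):
    # the canonical representative of (x, y) in the unit square [0, 1) x [0, 1)
    return (x % 1.0, y % 1.0)

# different points of the plane, but the same point of the torus
print(representative(1.25, 2.75))     # (0.25, 0.75)
print(representative(3.25, -0.25))    # (0.25, 0.75)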
Constructing surfaces as quotient spaces is a powerful tool in the attempt to classify all the possible surfaces. And it also suggests the question of when two surfaces should be regarded as 'really the same': another partitioning of all surfaces described in different ways into equivalence classes of 'the same surface, but a different description'.

And so on

This barely scratches the surface. The more mathematics, and the more mathematical structures, you meet, the more you come across interesting quotient spaces.
Googling for 'quotient' plus the thing you are interested in (group, surface, vector space, and many more) will probably return something cool, so don't waste any more time here, go and look.

Wednesday, May 2, 2018

Base n to mod n in one easy step.

We learn how to write numbers (well, integers) in bases other than ten fairly early in our mathematical education. A bit later, we learn about congruence arithmetic and doing arithmetic modulo any integer \(n \gt 1\). I remember that there was a certain degree of initial confusion between the two, though that was really just about remembering which label went with what: after all, the two are entirely different, aren't they?

As it turns out, not so much.

Let's have a brief reminder of congruence arithmetic, and then see what representing numbers in different bases has to do with it.

Congruence arithmetic modulo \(n\)

The simplest and most familiar form of congruence arithmetic is something we're all familiar with long before we meet congruence arithmetic as a thing.

We all know right from a very early stage that an even number plus an even number is even, an even plus an odd is odd, and so on. But an even number is one which leaves a remainder of zero when it's divided by two, and an odd number is one which leaves a remainder of one. So these odd and even rules are telling us that when we do sums with integers, the remainder on division by two is well behaved with respect to addition and multiplication. Or in the hieratic jargon, we can do arithmetic modulo two.

It's not so obvious that this works with numbers other than two, but it's just as true.

Choose any integer \(n>1\). We say that two integers, \(a\) and \(b\) are congruent modulo \(n\) if their difference is a multiple of \(n\), or, equivalently, if they have the same remainder on division by \(n\).

Then with a bit of work, we can show that arithmetic modulo \(n\) makes sense: it doesn't matter what numbers we start with, the remainder after doing an arithmetic operation only depends on the remainders of the numbers we combine.

This takes a little bit of algebra. On the other hand...

Arithmetic modulo ten and modulo two

If we carry out the usual addition and multiplication operations of integer arithmetic, writing all numbers in base ten, then it is obvious (as long as we believe the standard algorithms!) that the units digit of a sum or product depends only on the units digits of the numbers multiplied or added.

But the units digit of a number (written in base ten) is just the remainder when you divide by ten! Looking only at the units digits is doing arithmetic modulo ten.

In just the same way, if we write our numbers in binary, then the units digit of a sum or product depends only on the units digits of the numbers combined. And the units digits is \(1\) for an odd number and \(0\) for an even one.

Looking only at the units digits is now doing arithmetic modulo two.

Arithmetic modulo \(n\) again

But there's nothing special about two or ten (at least, not in this regard).

If we choose any integer \(n \gt 1\), and write integers in base \(n\), then the units digit of a sum or product depends only on the units digits of the numbers combined.

And now, looking at the units digits is working modulo \(n\).
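Here's a small Python sketch of that observation (the ranges are arbitrary): the units digit in base \(n\) is just the remainder mod \(n\), and the units digit of a sum or product only depends on the units digits of the inputs.

import random

def units_digit(value, base):
    return value % base            # the last digit of 'value' written in 'base'

for trial in range(1000):
    n = random.randrange(2, 17)                      # any base n > 1
    a, b = random.randrange(10**6), random.randrange(10**6)
    # replace a and b by other numbers with the same units digit in base n ...
    a2 = units_digit(a, n) + n * random.randrange(10**6)
    b2 = units_digit(b, n) + n * random.randrange(10**6)
    # ... and the units digit of the sum and of the product is unchanged
    assert units_digit(a + b, n) == units_digit(a2 + b2, n)
    assert units_digit(a * b, n) == units_digit(a2 * b2, n)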

So although base \(n\) and modulo \(n\) look like entirely different ideas, they turn out to be rather closely related.

What's the point?

I'm not entirely sure. Thinking about last digits in base \(n\) might be an interesting way to approach congruence arithmetic, as it comes from a more concrete direction. It also suggests various investigative projects, such as when the last digit of a product can be zero, or how negative numbers fit into this picture. For me, the point was really that there is actually a real relationship between these two apparently different bits of maths, and it's always fun to see connections like that.

Monday, April 9, 2018

The two child puzzle

How careful do you have to be with probability problems? Very careful. Very, very careful. Lack of enough care leads to (apparent) paradoxes.

The puzzle

So, here's the puzzle. It's something of a classic in probability.

I tell you that John Smith has two children. I also tell you that one of them is a girl. I then ask you for the probability that the other is a girl.

Obviously, we need some ground rules here. We'll work with the slight simplification that boys and girls are equally likely, and that the probabilities of boy and girl are independent.

A plausible answer, and maybe the one I'm hoping for, is that since boys and girls are equally likely, the probability that the other is a girl is \(1/2\).

Let's pretend that was your answer, because now I get the chance to show you something interesting.

Ah-hah! I pounce.

No, you aren't thinking it through carefully enough. If you know that one child is a girl, then all you know about the children is that they aren't both boys. There are four equally likely sets of children: boy-boy, boy-girl, girl-boy and girl-girl. Since we've excluded boy-boy, we are equally likely to have any of the remaining possibilities. In one third of them the other child is a girl, so the probability is actually \(1/3\).

I can dress this up more formally. \(A\) is the event John Smith has a daughter and \(B\) is the event John Smith has two daughters. I want \(P(B|A)\), the probability of \(B\) given \(A\). Fortunately, there's a formula for that. Since \[ P(A \cap B) = P(B|A)P(A) \] and \(P(A \cap B)=P(B)\) (since \(A \cap B = B\)), I see that \[ P(B|A) = \frac{P(B)}{P(A)} \] I know that \(P(B)=1/4\) and \(P(A)=3/4\), so (just as I said above), \(P(B|A)=1/3\).

This seems strange, and counter-intuitive. But I'm pretty sure I didn't get it wrong, because I wrote a little bit of code and simulated it: the following code randomly creates \(n\) two-child families, and finds the fraction of the time that a family with at least one daughter actually has two.

import random

def sim(n):
    # simulate n two-child families, coding each child as 1 (girl) or 0 (boy)
    hasGirl = 0
    hasTwoGirls = 0
    for ii in range(n):
        kids = [random.randrange(2), random.randrange(2)]
        if sum(kids) > 0:        # at least one girl
            hasGirl += 1
        if sum(kids) == 2:       # two girls
            hasTwoGirls += 1
    # fraction of families with at least one girl that have two girls
    print(1.0*hasTwoGirls/(1.0*hasGirl))
Running this for moderately large \(n\) certainly suggests that the probability is \(1/3\).

How much more convincing can you get?

But then I read this book, Standard Deviations by Gary Smith. It's an excellent exposition of how statistics and data can be misused to mislead. I was thoroughly enjoying it until...

He considers this problem of the two children and demolishes the above argument.

The puzzle redux

John Smith has two children, and takes one of them for a walk. If the child that he took for a walk was a girl, what is the probability that the other child is also a girl?

"Ooh, I know this one," I thought. "It's a third."

And then I read on. Smith carefully and lucidly points out that this is wrong, and that the probability actually is \(1/2\).

Naturally I took personal offence at this, so I read it carefully a few times, trying to find his mistake.

I didn't find one.

So I did the obvious thing. I worked out the probability carefully.

There are two things I'm interested in here. \(A\) is the event The child John Smith took a walk with is a daughter and \(B\) is the event Both of John Smith's children are girls.

I want to know \(P(B|A)\). Just as before, I have \(A \cap B = B\), so that \[ P(B|A) = \frac{P(A \cap B)}{P(A)}=\frac{P(B)}{P(A)} \] But \(P(B) = 1/4\) and \(P(A)=1/2\), so \(P(B|A)=1/2\).

By now I was feeling decidedly uncomfortable, so I did the next obvious thing. I wrote a little bit of code and simulated the situation where a randomly chosen child from a randomly chosen family is taken for a walk, \(n\) times over, and found the proportion of the time that, when the child taken for a walk is a girl, she has a sister at home:

import random

def walksim(n):
    # simulate n families; in each, one of the two children is chosen at random for a walk
    girlWalk = 0
    hasTwoGirls = 0
    for ii in range(n):
        kids = [random.randrange(2), random.randrange(2)]
        if kids[random.randrange(2)] == 1:    # the child taken for a walk is a girl
            girlWalk += 1
            if sum(kids) == 2:                # ... and both children are girls
                hasTwoGirls += 1
    # fraction of girl-walks for which there is a sister at home
    print((1.0*hasTwoGirls)/(1.0*girlWalk))
Running this for moderately large values of \(n\) convinced me that yes, indeed, the probability was \(1/2\).

By now I was thoroughly confused.

So What's Going On?

According to a quotation popularly attributed to Einstein, it's a mark of lunacy to keep asking the same question, hoping for a different answer. So naturally, I was a little concerned about just what it might mean if you managed to actually get a different answer.

So I thought about it a bit harder, and finally came to a (in retrospect, obvious) conclusion: they aren't actually the same puzzle.

So what's the difference? In version one, you're told that one of John Smith's children is a daughter. In version two, you're told that John Smith has taken one of his children for a walk, and that child is a daughter. Either way, we've been told that one of John Smith's children is a daughter.

But we haven't actually been given quite the same information.

Version 1 simply guarantees that at least one child is a daughter; version 2 tells us that the particular child that John Smith took for a walk is a daughter. The possibility that the child left at home is a daughter is excluded from the calculation in version 2, but included in version 1, which is why the probability that both are daughters is lower in version 1 than in version 2.

So the whole confusion boils down to what you think it means to be told that One of John Smith's children is a daughter. If you interpret it as saying no more than John Smith has at least one daughter, then the answer is \(1/3\). But if you interpret as saying that one particular child is a daughter (the older one, or the one standing on the left, or the only one who has ever been locked alone in the cupboard under the stairs for misbehaving, or, indeed, the one that he took for a walk that day) then the answer is \(1/2\).

To the extent that there's a disagreement here, it isn't an internal disagreement in probability theory (which would be a Very Bad Thing Indeed), it's a disagreement of interpretation of what One of John Smith's children is a daughter means.

The moral

As is often the case, the problem isn't the mathematical analysis: in each version that was straightforward. The problem is setting up the right mathematical model for the situation, which can be very tricky if there's any linguistic ambiguity in the description.

Sunday, March 25, 2018

Cantor and infinity in a countable universe.

Cantor's theorem states that the cardinality of a set is always strictly less than the cardinality of its power set. But there is a universe for set theory which is countable. What's going on here?

There's a really nice PBS Infinite Series video on Youtube How Big are all Infinities Combined, exploring the sizes of infinite sets. (If you haven't watched it, this would be a great time. I'll wait.) In this video they make heavy use of Cantor's theorem, which says

Given any set \(S\), its power set \(\mathcal{P}(S)\) is bigger than \(S\)
where \(\mathcal{P}(S)\) is the set of all subsets of \(S\).

The amazing thing about this is that it is true even when \(S\) is infinite, though we have to work a little to say what we mean by bigger than here: what we mean is that there is no injective function \(\mathcal{P}(S) \to S\), though there is an obvious injective function \(i:S \to \mathcal{P}(S)\), given by \(i(x)=\{x\}\) for each \(x \in S\).
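(For completeness, the usual diagonal argument is short: given any function \(f:S \to \mathcal{P}(S)\), consider \[ D = \{x \in S : x \notin f(x)\}. \] If \(D\) were \(f(y)\) for some \(y \in S\), then \(y \in D\) exactly when \(y \notin f(y) = D\), which is impossible; so no function \(S \to \mathcal{P}(S)\) is surjective, and with a little more work it follows that there is no injection \(\mathcal{P}(S) \to S\) either.)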

It's a non-trivial fact (the Cantor-Schröder-Bernstein theorem) that given two sets \(A\) and \(B\), if there is an injection from \(A\) to \(B\) and an injection from \(B\) to \(A\), then there is a bijection from \(A\) to \(B\); so it makes sense to say that \(A\) is smaller than (or equal to) \(B\) in size if there is an injection from \(A\) to \(B\), but none from \(B\) to \(A\).

This immediately tells us that there is a whole tower of infinite sets that we can make; starting with the natural numbers \(\mathbb{N}\), then we can think about \(\mathcal{P}(\mathbb{N})\), \(\mathcal{P}(\mathcal{P}(\mathbb{N}))\), and so on, each of them strictly larger than its predecessor.

It's not very hard to prove that the second of these, \(\mathcal{P}(\mathbb{N})\), is the same size as \(\mathbb{R}\). It is an interesting question whether there is any infinite size between that of \(\mathbb{N}\) and \(\mathbb{R}\), but one which I won't consider here.

In the video (you have watched it now, right?) the question is how big all the infinities combined is.

But there's even an issue with what it means for one infinite set to be bigger than another, and that's what I want to explore here.

The problem is, to a large extent, knowing just what sets and functions are.

It's easy enough to decide what a function is if you know what a set is.

If \(A\) and \(B\) are sets, we think informally of a function \(A \to B\) as a rule that assigns to each element of \(A\) an element of \(B\). We can formalise this by saying that a function \(A \to B\) is a subset of \(A \times B\) (the set of all ordered pairs \((a,b)\) where \(a \in A\) and \(b \in B\)) which has the property that each element of \(A\) appears in exactly one of the ordered pairs.

But what's a set?

It would be really nice if we could just say that a set was a (somehow) well-defined collection of objects. But trying to do without some kind of restriction on how we define the sets tends to lead to real problems, the most famous of which is Russell's paradox:

Let \(X\) be the set of all sets which are not elements of themselves. Then if \(X \in X\), by the definition of \(X\), \(X \not \in X\). But conversely, if \(X \not \in X\) then again by the definition of \(X\), \(X \in X\).
This is a genuine paradox in the world of naive set theory, so we need to be a bit more careful about what counts as a description of the elements comprising a set if we want to avoid problems like this.

The (ok, a) solution is to develop axiomatic set theory, and one particularly popular version of that is ZFC (which stands for Zermelo-Fraenkel plus the Axiom of Choice). I won't explain just how ZFC works here, but in essence it is a set of rules that tell us how we can build new sets out of existing ones, and a couple of particular sets that we insist on (so that we can build others out of them). One of the sets we insist on is the set of von Neumann natural numbers, so \(\mathbb{N}\) is definitely in there.

There's always the question of whether our axioms are consistent. Most mathematicians are happy that the ZFC axioms are in fact consistent, partly because we can describe a collection of objects (which we call sets) that seem to satisfy all the axioms. The study of particular collections of objects and operations on them which satisfy particular sets of axioms is called model theory, and it throws up some surprising results.

We'll call a collection of sets that satisfy the axioms of ZFC a universe.

There is a fundamental theorem of model theory, the downward Löwenheim-Skolem theorem that tells us that there is a countable universe (yes, the entire universe is countable) that satisfies the axioms of ZFC.

But there's clearly something fishy about this. Cantor's theorem also follows from these axioms of set theory, and that tells us that the power set of \(\mathbb{N}\) is uncountable.

How does an uncountable set fit into a countable universe? This is Skolem's paradox.

And this is where the issue of just what is a set rears up in all its glory and horror.

Let's review the situation.

The entire universe is countable, so in this model of set theory there are only countably many subsets of \(\mathbb{N}\). (This model clearly fails to include most of the subsets that we might think ought to be there.) Since the universe is countable and \(\mathcal{P}(\mathbb{N})\) is countable, there is a bijection between \(\mathbb{N}\) and \(\mathcal{P}(\mathbb{N})\).

But Cantor has proved that there can be no such bijection.

But remember what a function \(A \to B\) is: it is a subset of \(A \times B\) satisfying a certain constraint, namely that every element of \(A\) occurs in exactly one of the ordered pairs.

We know that the \(\mathcal{P}(\mathbb{N})\) in this universe is countable, so there is a collection of ordered pairs \((n,X)\), one for each natural number \(n\) and such that every \(X \subseteq \mathbb{N}\) in this universe appears in the collection.

The escape clause is the one that tells us that this collection is not—cannot be—one of the sets in the universe.

Looking at the universe from outside, we see a function—what we might call an external function—giving a bijection between \(\mathbb{N}\) and this universe's model of \(\mathcal{P}(\mathbb{N})\).

But if we live inside the universe, we can't see this: it doesn't exist in the universe. There is no internal function which matches the natural numbers with the sets of natural numbers.

You may well have an objection to raise at this point: the real numbers are a complete ordered field, and it's also a theorem that this defines them up to isomorphism. And we know that the real numbers (at least in the mathematical universe we're accustomed to) really are uncountable. So how do they fit into this countable universe?

The answer to this is unavoidably a little technical.

The completeness axiom for the real numbers says something about all subsets of \(\mathbb{R}\); this is a second order axiom. The axioms of ZFC aren't powerful enough for this type of operation. They are first order, which (roughly speaking) means that they let you say something about all elements of a set, but not all subsets of it.

If you use the first order version of the axioms for a complete ordered field, there is more than one structure satisfying the axioms. Some are countable (as must be the case in the countable universe of ZFC guaranteed by the Löwenheim-Skolem Theorem), and some are larger (indeed there are models of all infinite cardinalities).

So all the above is true if we use the power of first order axioms to specify what a set is. If we allow second order axioms, so that we can use the notion of 'all subsets', then the game changes. Then why don't we use some kind of second order theory, and avoid this kind of nonsense?

It's very tempting to do this. Unfortunately, there are problems with second order theories too. Though it solves some of the problems, for example tying down the real numbers much more specifically, the notion of proof in a second order theory is not as well behaved as in first order theory. Consequently, logicians who work on the foundations of mathematics explore the strengths and weaknesses of first and second order theories.

And what about the normal 'working mathematicians'? Most aren't particularly concerned with these foundational issues: and it's quite possible to go through a degree, a PhD, and a research career in mathematics without meeting or having any reason to care about them. In any given context, most mathematicians simply use first or second order axioms according to which are more convenient.

Acknowledgement

This post was sparked off by the Infinite Series video How Big are all Infinities Combined. Gabe Perez-Giz (@fizziksgabe) also bears some responsibility, since when I made a brief comment on twitter about this he asked if I had a blog post about it. Well, now I do. Thanks also to James Propp (@JimPropp) for some helpful suggestions on presentation.

Sunday, March 18, 2018

How old is the Universe?

The Universe is about fifteen billion years old, and started off with a Big Bang, as any fule kno. Well, kind of, but that's more than a slight over-simplification.

There are two fairly obvious questions: first, how do we know? and second, what does that actually mean? Advance warning: there will be mathematics. But if you've got as far as knowing what a differential equation is you should have all you need to cope.

Modelling the Universe

The Mathematics Contribution

Unfortunately, there's no apparent way of directly observing the age of the universe. Any measurement of its age is going to be pretty indirect, and rely quite heavily on having some kind of theory or model of the universe.

First, we'll find a mathematical framework for the model, then incorporate physics into it later.

To start off with, here are some fairly plausible assumptions about the nature of the universe.

  1. The universe is homogeneous
  2. The universe is isotropic
The first assumption is that the universe looks (at any given time) much the same no matter where you are; the second is that it is much the same in every direction. We know that the universe looks (pretty much) isotropic from where we are; assuming that we are nowhere special and the universe is much the same at every point implies that it is also isotropic everywhere.

From these two, we can deduce that (in terms of some very well-chosen coordinates) the geometry of the universe is given by the metric \[ ds^2 = c^2 dt^2 -a(t)^2\left(\frac{dr^2}{1-kr^2} + r^2( d\theta^2+\sin^2(\theta)d\phi^2) \right). \] where \(k\) is either \(-1\), \(0\) or \(1\), and \(a(t)\) is some as yet undetermined function called the scale factor.

The metric tells us about the nature of the separation between two nearby points. There's a lot packed up in that, so it's worth taking a closer look.

Consider a couple of nearby points in the universe, where nearby means that the differences in their coordinates are small, and we call those differences \(dt, dr, d\theta\) and \(d\phi\). There are a couple of interesting cases.

If the points have the same \(r, \theta, \phi\) coordinates, then \(ds^2=c^2 dt^2\), and \(dt\) is just how much time passes between the two.

On the other hand, if the points are at the same time, then \[ -ds^2 = a(t)^2\left(\frac{dr^2}{1-kr^2} + r^2( d\theta^2+\sin^2(\theta)d\phi^2) \right) \] is the square of the distance between them. We can see how we can regard this distance as the product of \[ \sqrt{\frac{dr^2}{1-kr^2} + r^2( d\theta^2+\sin^2(\theta)d\phi^2)} \] which we call the coordinate distance and \(a(t)\) which we call the scale factor.

The different values of \(k\) tell us about the spatial geometry of the universe.

  1. If \(k=-1\) the universe has negative spatial curvature. This means that if we consider a sphere of radius \(R\), the volume of the enclosed space grows faster than \(4\pi R^3/3\).
  2. If \(k=0\) we have flat space. Then the volume enclosed by a sphere of radius \(R\) is the familiar \(4 \pi R^3/3\).
  3. If \(k=1\) then the universe has positive spatial curvature. In this case the volume enclosed in a sphere of radius \(R\) is less than \(4 \pi R^3/3\).
There is nothing to choose between these possibilities from the point of view of the mathematics. Observationally though, as far as we can tell the universe is spatially flat, i.e. \(k=0\).

It's also worth making a note that although the coordinates make it look as if there's a centre of the universe at \(r=0\), this isn't true. This \((r,\theta,\phi)\) system of coordinates is just the familiar spherical polar coordinates: we can equally well choose any point to be at \(r=0\).

So our two assumptions mean that the universe can be described in terms of just one dynamical quantity, called the scale factor. This function \(a(t)\) describes the entire history of the universe. But the mathematics doesn't tell us anything about it. For that, we require some physical input.

The Physics Contribution

In order to figure out how \(a(t)\) behaves, we need to know what kind of material the universe is full of. Then the Einstein Field Equations tell us the relationship between what the universe is filled with and its geometry (as expressed by the metric), and can be used to find the metric.

So we need a good candidate for the stuff that fills the universe.

One plausible approximation is to think of the universe as full of a fluid, whose constituent 'atoms' are clusters of galaxies. (Obviously we are working on a very large scale here for this approximation to be sensible.) It is also reasonable to take this fluid to have negligible viscosity, so we model the content of the universe as an ideal fluid.

Given this, the state of the fluid is given by two quantities, the pressure \(p\) and density \(\rho\). Because of the homogeneity and isotropy assumptions, these only depend on \(t\). With quite a lot of calculation, one eventually finds a pair of coupled ordinary differential equations for \(a\), \(\rho\) and \(p\): \[ \begin{split} \left( \frac{\dot{a}}{a} \right)^2 &= \frac{8\pi G}{3}\rho +\frac{\Lambda c^2}{3} - \frac{kc^2}{a^2}\\ \left(\frac{\ddot{a}}{a}\right) &= -\frac{4 \pi G}{3}\left( \rho + \frac{3p}{c^2} \right) + \frac{\Lambda c^2}{3} \end{split} \] In these equations, \(G\) is the gravitational constant, and \(\Lambda\) is the cosmological constant, which appears naturally in the Einstein Field Equations. For a long time it was believed to be \(0\), but recent measurements suggest it is a very small positive quantity.

OK, so this gives us two equations: but there are three quantities to be dealt with, \(a\), \(p\) and \(\rho\). We're missing some information.

To fill the gap, we need to know a bit more than just that the universe is full of an ideal fluid. We need to know what kind of ideal fluid it is. And one way of saying that is by giving a relationship between \(p\) and \(\rho\), which we call the equation of state. Equipped with this we have, at least in principle, all the information required to specify the history of the universe in terms of \(a\), \(p\) and \(\rho\) as functions of \(t\).
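A standard example is a linear relationship \[ p = w\rho c^2 \] for some constant \(w\): \(w=0\) describes pressureless matter ('dust', which we'll meet below), while \(w=\tfrac{1}{3}\) describes radiation.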

A Brief Digression on the Hubble Parameter

In passing, it's worth noting that the quantity \(\dot{a}/a\) which appears in the first equation above is our old friend the Hubble parameter, usually denoted \(H(t)\). (I don't like calling it the Hubble constant, since it's a function of time.) At any time \(t\), the Hubble parameter \(H(t)\) tells us the relationship between the distance between two objects at fixed spatial coordinates (i.e. fixed \((r,\theta,\phi)\)) and the rate at which that distance is changing.

You might briefly wonder how things at fixed position can be getting further apart: but remember, the distance between things is the product of the scale factor \(a(t)\) and the coordinate distance, so even though the coordinate distance is fixed, the physical distance between them changes with \(a(t)\).
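Written out: if two objects sit at a fixed coordinate separation \(\chi\), the physical distance between them is \(D(t) = a(t)\chi\), so \[ \dot{D} = \dot{a}\,\chi = \frac{\dot{a}}{a}\,a\chi = H(t)\,D(t), \] which is precisely the statement that, at any instant, the recession rate is proportional to the distance, with \(H(t)\) as the constant of proportionality.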

The Age of the Universe

Now that we have a model of the universe, we can at least try to work out how old the universe is according to that model. And then we can try to deduce from that something about the actual universe.

There's a useful bit of notation worth introducing here. I decide to call the current time \(t_0\), and then just put a subscript \(0\) on any quantity to indicate that it is the value at \(t_0\): so \(a_0 = a(t_0)\), \(H_0 = H(t_0)\), etc.

We should note that the units of \(H\) are those of inverse time, so on dimensional grounds one might hope that the age of the universe is given by (some multiple of) \(1/H_0\). (Thanks to Leo Stein, @duetosymmetry, for pointing this out.) We'll see how this hope plays out for one simple case just below.
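To put a rough number on that hope: taking a representative value of \(H_0 \approx 70\) km/s/Mpc (current measurements are of this general size), \[ \frac{1}{H_0} \approx \frac{3.09\times 10^{19}\ \text{km}/\text{Mpc}}{70\ \text{km}/\text{s}/\text{Mpc}} \approx 4.4\times 10^{17}\ \text{s}, \] which is about fourteen billion years.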

A Simple Dust Filled Universe

As an illustration of how we can extract an age of the universe from this mathematical model, we'll consider the simplest (non-trivial!) model.

First we take \(k=0\). There is no observational evidence that this is wrong, and it makes the equations simpler.

Next, we take \(\Lambda =0\). This isn't quite right, but we're going to be pushing the evolution backwards, which will mean that \(\rho\) will be increasing, so the contribution from \(\Lambda\) will become less important rather than more important as we push further. With luck this won't be too much of a problem.

Finally, we take \(p=0\), i.e. we assume that the universe is full of a fluid that does not resist compression. Since there is a lot of space between galaxies, and they aren't moving very fast, this seems plausible too. In this context such a fluid is called dust.

Now, with all these simplifications, our differential equations reduce to \[ \left( \frac{\dot{a}}{a} \right)^2 = \frac{8\pi G}{3}\rho, \qquad \left(\frac{\ddot{a}}{a}\right) = -\frac{4 \pi G}{3} \rho \] so we immediately get \[ \left( \frac{\dot{a}}{a} \right)^2 = -2 \left(\frac{\ddot{a}}{a}\right) \] or \[ \dot{a}^2 = -2a \ddot{a}. \]

Faced with a differential equation like this - nonlinear, and not one I know a special method for solving - I resort to guessing. It looks as if taking \(a(t)\) to be some power of \(t\) might work, since the powers on each side of that equation would match, so I guess that \(a(t)\) is proportional to \(t^\alpha\), and try to find a value of \(\alpha\) that works.

Plugging this into the equation gives \(\alpha^2=-2\alpha(\alpha-1)\), or \(\alpha(3\alpha-2)=0\). The \(\alpha=0\) solution is not interesting: it would mean that \(a(t)\) was constant, so the \(\alpha = 2/3\) solution is the relevant one.

This gives us \[ a(t) = a_0\left( \frac{t}{t_0} \right)^{\frac{2}{3}} \]
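If you'd rather not trust the algebra, here's a small symbolic check (just a verification sketch using sympy) that this power law really does satisfy \(\dot{a}^2 = -2a\ddot{a}\):

```python
import sympy as sp

t, t0, a0 = sp.symbols('t t_0 a_0', positive=True)
a = a0 * (t / t0) ** sp.Rational(2, 3)   # the guessed power-law solution

# If the guess is right, adot^2 + 2*a*addot should vanish identically.
residual = sp.diff(a, t)**2 + 2 * a * sp.diff(a, t, 2)
print(sp.simplify(residual))             # prints 0
```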

Now we're getting somewhere. We can see that this is fine for any value of \(t>0\), but when \(t=0\) it all goes to pieces. \(a(0)=0\), and if you look at the first equation you'll see that the expression for \(\rho\) involves a division by \(0\), which is never a good sign.

But let's gloss over that for the moment. What I'd like to know is the value of \(t_0\), because that's how long the universe (according to this model) has been around.

We can do something a little bit cunning here, and compute \(H(t)=\dot{a}/a\). This gives us \[ H(t) = \frac{2}{3t} \] or, more to the point, \[ t_0 = \frac{2}{3H_0} \] and \(H_0\) is something that we can try to find from observation. Admittedly, the measurements are quite tricky, but at least it's some observational data that tells us the age of the universe.
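To see what that means numerically, here's a quick Python calculation using a representative \(H_0\) of about \(70\) km/s/Mpc (an illustrative round number, not a precise measurement):

```python
KM_PER_MPC = 3.0857e19          # kilometres in one megaparsec
SECONDS_PER_YEAR = 3.156e7

H0 = 70.0 / KM_PER_MPC          # 70 km/s/Mpc converted to s^-1
t0 = 2.0 / (3.0 * H0)           # age of the dust-filled model, in seconds

print(t0 / SECONDS_PER_YEAR / 1e9)   # roughly 9.3 (billion years)
```

For this model, then, \(t_0\) comes out at a little over nine billion years, and it looks as though we've put a number on the age of the universe.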

Well, no.

That doesn't tell us the age of the universe. What it tells us is that if we push the model back in time, the very longest we can push it is this \(t_0\). It doesn't tell us that the universe sprang into existence that long ago and then evolved according to these equations subsequently.

In fact, we can even say that this value can't be quite right.

The Age of the Universe?

If we push this back then eventually we are in a regime where the density is very large. At this point, it is almost certainly a bad approximation to assume that \(p=0\), and we need to use a more accurate model that takes better care of how pressure and density are related in some kind of hot plasma.

Well, of course, we have reasonable models for this. We plug in the appropriate equation of state and work out how old the universe is, given that for a sufficiently early 'now' that equation of state is the relevant one.

But then the problem recurses. For sufficiently early times, the conditions become so extreme that we really don't know how to model the matter.

And of course, in all the above I was working with various simplifications. We need to work with rather more complicated evolution equations to do something more realistic.

And when we do all that, we find that the 'age of the universe' is a little under fourteen billion years.

But this age is not really the age of the universe. It is how long it has been since the start of the time when we have some physical understanding of how the stuff filling the universe behaves.

Before that? Since we don't know how the stuff behaves, and it is very likely that classical physics will stop being the relevant model in any case, it's hard to push earlier.

There are lots of speculative theories, some more fun and some more plausible than others. But we certainly can't deduce from what we know that the universe itself has only existed for this fourteen billion years, only that our ability to describe it fails when we try to get further back than that.

The Age of the Universe As We Know It

This is the bottom line. When we talk about the age of the universe in cosmology, we aren't really talking about the age of the universe as such. We're talking about the length of time for which we have some understanding of what is going on in the universe, i.e. the time since which the content of the universe was something we have reasonable models for. The Big Bang is an idealization of the time at which the universe became accessible to our theories and models.

Before that, it may have spent an undetermined, or even infinite, length of time full of some kind of very high energy quantum field which underwent a phase transition to the hot dense plasma that looks to us like a Big Bang beginning to the universe. Or maybe something even weirder, such as a state in which the notion of time itself doesn't make sense, so that the 'age of the universe' is the length of time for which time has been a useful notion. There are various notions out there of what might have preceded the Big Bang, but by the very nature of things, they tend to be relatively unconstrained by observations.

Addendum

Nalini Joshi (@monsoon0) has pointed out that \[ \dot{a}^2 = -2a \ddot{a} \] can be solved without resorting to guessing the right answer, by using separation of variables (twice). Dividing both sides by \(a \dot{a}\) gives \[ \frac{\dot{a}}{a} = -2 \frac{\ddot{a}}{\dot{a}} \] which can be integrated immediately to give \[ \ln(a) = -2 \ln(\dot{a}) + C_0 \] so that \(a = C_1\dot{a}^{-2}\).

This rearranges to \[ \dot{a}^2 = \frac{C_1}{a} \] and so \[ \dot{a}=\frac{C_2}{\sqrt{a}} \] which can then be separated to give \[ \sqrt{a}da = C_2 dt \] Integrating this (and choosing the constant of integration so that \(a(0)=0\)) yields the solution given above.
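Spelling out that last integration: choosing the constant of integration so that \(a(0)=0\), we get \[ \tfrac{2}{3}a^{3/2} = C_2 t, \qquad\text{so}\qquad a(t) = \left(\tfrac{3C_2 t}{2}\right)^{\frac{2}{3}} \propto t^{\frac{2}{3}}, \] in agreement with the power-law solution found by guessing.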