Sunday, 25 March 2018

Cantor and infinity in a countable universe.

Cantor's theorem states that the cardinality of a set is always strictly less than the cardinality of its power set. But there is a universe for set theory which is countable. What's going on here?

There's a really nice PBS Infinite Series video on Youtube How Big are all Infinities Combined, exploring the sizes of infinite sets. (If you haven't watched it, this would be a great time. I'll wait.) In this video they make heavy use of Cantor's theorem, which says

Given any set \(S\), its power set \(\mathcal{P}(S)\) is bigger than \(S\)
where \(\mathcal{P}(S)\) is the set of all subsets of \(S\).

The amazing thing about this is that it is true even when \(S\) is infinite, though we have to work a little to say what we mean by bigger than here: what we mean is that there is no injective function \(\mathcal{P}(S) \to S\), though there is an obvious injective function \(i:S \to \mathcal{P}(S)\), given by \(i(x)=\{x\}\) for each \(x \in S\).

It's a non-trivial fact (the Cantor-Schröder-Bernstein theorem) that given two sets \(A\) and \(B\), if there is an injection from \(A\) to \(B\) and an injection from \(B\) to \(A\), then there is a bijection from \(A\) to \(B\); so it makes sense to say that \(A\) is smaller than (or equal to) \(B\) in size if there is an injection from \(A\) to \(B\), but none from \(B\) to \(A\).

This immediately tells us that there is a whole tower of infinite sets that we can make; starting with the natural numbers \(\mathbb{N}\), then we can think about \(\mathcal{P}(\mathbb{N})\), \(\mathcal{P}(\mathcal{P}(\mathbb{N})\), and so on, each of them strictly larger than its predecessor.

It's not very hard to prove that the second of these, \(\mathcal{P}(\mathbb{N})\), is the same size as \(\mathbb{R}\). It is an interesting question whether there is any infinite size between that of \(\mathbb{N}\) and \(\mathbb{R}\), but one which I won't consider here.

In the video (you have watched it now, right?) the question is how big all the infinities combined is.

But there's even an issue with what it means for one infinite set to be bigger than another, and that's what I want to explore here.

The problem is, to a large extent, knowing just what sets and functions are.

It's easy enough to decide what a function is if you know what a set is.

If \(A\) and \(B\) are sets, we think informally as a function \(A \to B\) as a rule that assigns to each element of \(A\) an element of \(B\). We can formalise this by saying that a function \(A \to B\) is a subset of \(A \times B\) (the set of all ordered pairs \((a,b)\) where \(a \in A\) and \(b \in B\)) which has the property that each element of \(A\) appears in exactly one of the ordered pairs.

But what's a set?

It would be really nice if we could just say that a set was a (somehow) well-defined collection of objects. But trying to do without some kind of restriction on how we define the sets tends to lead to real problems, the most famous of which is Russell's paradox:

Let \(X\) be the set of all sets which are not elements of themselves. Then if \(X \in X\), by the definition of \(X\), \(X \not \in X\). But conversely, if \(X \not \in X\) then again by the definition of \(X\), \(X \in X\).
This is a genuine paradox in the world of naive set theory, so we need to be a bit more careful about what counts as a description of the elements comprising a set if we want to avoid problems like this.

The (ok, a) solution is to develop axiomatic set theory, and one particularly popular version of that is ZFC (which stands for Zermelo-Fraenkel plus the Axiom of Choice). I won't explain just how ZFC works here, but in essence it is a set of rules that tell us how we can build new sets out of existing ones, and a couple of particular sets that we insist on (so that we can build others out of them). One of the sets we insist on is the set of von Neumann integers, so \(\mathbb{N}\) is definitely in in there.

There's always the question of whether our axioms are consistent. Most mathematicians are happy that the ZFC axioms are in fact consistent, partly because we can describe a collection of objects (which we call sets) that seem to satisfy all the axioms. The study of particular collections of objects and operations on them which satisfy particular sets of axioms is called model theory, and it throws up some surprising results.

We'll call a collection of sets that satisfy the axioms of ZFC a universe.

There is a fundamental theorem of model theory, the downward Löwenheim-Skolem theorem that tells us that there is a countable universe (yes, the entire universe is countable) that satisfies the axioms of ZFC.

But there's clearly something fishy about this. Cantor's theorem also follows from these axioms of set theory, and that tells us that the power set of \(\mathbb{N}\) is uncountable.

How does an uncountable set fit into a countable universe? This is Skolem's paradox.

And this is where the issue of just what is a set rears up in all its glory and horror.

Let's review the situation.

The entire universe is countable, so in this model of set theory there are only countably many subsets of \(\mathbb{N}\). (This model clearly fails to include most of the subsets that we might think ought to be there.) Since the universe is countable and \(\mathcal{P}(\mathbb{N})\) is countable, there is a bijection between \(\mathbb{N}\) and \(\mathcal{P}(\mathbb{N})\).

But Cantor has proved that there can be no such bijection.

But remember what a function \(A \to B\) is: it is a subset of \(A \times B\) satisfying a certain constraint, namely that every element of \(A\) occurs in exactly on of the ordered pairs.

We know that the \(\mathcal{P}(\mathbb{N})\) in this universe is countable, so there is a collection of ordered pairs \((n,X)\), one for each natural number \(n\) and such that every \(X \subseteq \mathbb{N}\) in this universe appears in the collection.

The escape clause is the one that tells us that this collection is not—cannot be—one of the sets in the universe.

Looking at the universe from outside, we see a function—what we might call an external function—giving a bijection between \(\mathbb{N}\) and this universe's model of \(\mathcal{P}(\mathbb{N})\).

But if we live inside the universe, we can't see this: it doesn't exist in the universe. There is no internal function which matches the natural numbers with the sets of natural numbers.

You may well have an objection to raise at this point: the real numbers are a complete ordered field, and it's also a theorem that this defines them up to isomorphism. And we know that the real numbers (at least in the mathematical universe we're accustomed to) really are uncountable. So how do they fit into this countable universe?

The answer to this is unavoidably a little technical.

The completeness axiom for the real numbers says something about all subsets of \(\mathbb{R}\); this is a second order axiom. The axioms of ZFC aren't powerful enough for this type of operation. They are first order, which (roughly speaking) means that they let you say something about all elements of a set, but not all subsets of it.

If you use the first order version of the axioms for a complete ordered field, there is more than one structure satisfying the axioms. Some are countable (as must be the case in the countable universe of ZFC guaranteed by the Löwenheim-Skolem Theorem), and some are larger (indeed there are models of all infinite cardinalities).

So all the above is true if we use the power of first order axioms to specify what a set is. If we allow second order axioms, so that we can use the notion of 'all subsets', then the game changes. Then why don't we use some kind of second order theory, and avoid this kind of nonsense?

It's very tempting to do this. Unfortunately, there are problems with second order theories too. Though it solves some of the problems, for example tying down the real numbers much more specifically, the notion of proof in a second order theory is not as well behaved as in first order theory. Consequently, logicians who work on the foundations of mathematics explore the strengths and weaknesses of first and second order theories.

And what about the normal 'working mathematicians'? Most aren't particularly concerned with these foundational issues: and it's quite possible to go through a degree, a PhD, and a research career in mathematics without meeting or having any reason to care about them. In any given context, most mathematicians simply use first or second order axioms according to which are more convenient.

Acknowledgement

This post was sparked off by the Infinite Series video How Big are all Infinities Combined. Gabe Perez-Giz (@fizziksgabe) also bears some responsibility, since when I made a brief comment on twitter about this he asked if I had a blog post about it. Well, now I do. Thanks also to James Propp (@JimPropp) for some helpful suggestions on presentation.

Sunday, 18 March 2018

How old is the Universe?

The Universe is about fifteen billion years old, and started off with a Big Bang, as any fule kno. Well, kind of, but that's more than a slight over-simplification.

There are two fairly obvious questions: first, how do we know? and second, what does that actually mean? Advance warning: there will be mathematics. But if you've got as far as knowing what a differential equation is you should have all you need to cope.

Modelling the Universe

The Mathematics Contribution

Unfortunately, there's no apparent way of directly observing the age of the universe. Any measurement of its age is going to be pretty indirect, and rely quite heavily having on some kind of theory or model of the universe.

First, we'll find a mathematical framework for the model, then incorporate physics into it later.

To start off with, here are some fairly plausible assumptions about the nature of the universe.

  1. The universe is homogeneous
  2. The universe is isotropic
The first assumption is that the universe looks (at any given time) much the same no matter where you are; the second is that it is much the same in every direction. We know that the universe looks (pretty much) isotropic from where we are; assuming that we are nowhere special and the universe is much the same at every point implies that it is also isotropic everywhere.

From these two, we can deduce that (in terms of some very well-chosen coordinates) the geometry of the universe is given by the metric \[ ds^2 = c^2 dt^2 -a(t)^2\left(\frac{dr^2}{1-kr^2} + r^2( d\theta^2+\sin^2(\theta)d\phi^2) \right). \] where \(k\) is either \(-1\), \(0\) or \(1\), and \(a(t)\) is some as yet undetermined function called the scale factor.

The metric tells us about the nature of the separation between two nearby points. There's a lot packed up in that, so it's worth taking a closer look.

Consider a couple of nearby points in the universe, where nearby means that the differences in their coordinates are small, and we call those differences \(dt, dr, d\theta\) and \(d\phi\) . There are a couple of interesting cases.

If the points have the same \(r, \theta, \phi\) coordinates, then \(ds^2=c^2 dt^2\), and \(dt\) is just how much time passes between the two.

On the other hand, if the points are at the same time, then \[ -ds^2 = a(t)^2\left(\frac{dr^2}{1-kr^2} + r^2( d\theta^2+\sin^2(\theta)d\phi^2) \right) \] is the square of the distance between them. We can see how we can regard this distance as the product of \[ \sqrt{\frac{dr^2}{1-kr^2} + r^2( d\theta^2+\sin^2(\theta)d\phi^2))} \] which we call the coordinate distance and \(a(t)\) which we call the scale factor.

The different values of \(k\) tell us about the spatial geometry of the universe.

  1. If \(k=-1\) the universe has negative spatial curvature. This means that if we consider a sphere of radius \(R\), the volume of the enclosed space grows faster than \(4\pi R^3/3\).
  2. If \(k=0\) we have flat space. Then the volume enclosed by a sphere of radius \(R\) is the familiar \(4 \pi R^3/3\).
  3. If \(k=1\) then the universe has positive spatial curvature. In this case the volume enclosed in a sphere of radius \(R\) is less than \(4 \pi R^3/3\).
There is nothing to choose between these possibilities from the point of view of the mathematics. Observationally though, as far as we can tell the universe is spatially flat, i.e. \(k=0\).

It's also worth making a note that although the coordinates make it look as if there's a centre of the universe at at \(r=0\), this isn't true. This \((r,\theta,\phi)\) system of coordinates are just the familiar spherical polar coordinates: we can equally well choose any point to be at \(r=0\).

So our two assumptions mean the the universe can be described in terms of just one dynamical quantity, called the scale factor. This function \(a(t)\) described the entire history of the universe. But the mathematics doesn't tell use anything about it. For that, we require some physical input.

The Physics Contribution

In order to figure out how \(a(t)\) behaves, we need to know what kind of material the universe is full of. Then the Einstein Field Equations tell us the relationship between what the universe is filled with and its geometry (as expressed by the metric), and can be used to find the metric.

So we need a good candidate for the stuff that fills the universe.

One plausible approximation is to thing of the universe as full of a fluid, whose constituent 'atoms' are clusters of galaxies. (Obviously we are working on a very large scale here for this approximation to be sensible.) It is also reasonable to take this fluid to have negligible viscosity, so we model the content of the universe as an ideal fluid.

Given this, the state of the fluid is given by two quantities, the pressure \(p\) and density \(\rho\). Because of the homogeneity and isotropy assumptions, these only depend on \(t\). With quite a lot of calculation, one eventually finds a pair of coupled ordinary differential equations for \(a\), \(\rho\) and \(p\): \[ \begin{split} \left( \frac{\dot{a}}{a} \right)^2 &= \frac{8\pi G}{3}\rho +\frac{\Lambda c^2}{3} - \frac{kc^2}{a^2}\\ \left(\frac{\ddot{a}}{a}\right) &= -\frac{4 \pi G}{3}\left( \rho + \frac{3p}{c^2} \right) + \frac{\Lambda c^2}{3} \end{split} \] In these equations, \(G\) is the gravitational constant, and \(\Lambda\) is the cosmological constant, which appears naturally in the Einstein Field Equations. For a long time it was believed to be \(0\), but recent measurements suggest is is a very small positive quantity.

OK, so this gives us two equations: but there are three quantities to be dealt with, \(a\), \(p\) and \(\rho\). We're missing some information.

To fill the gap, we need to know a bit more than just that the universe is full of an ideal fluid. We need to know what kind of ideal fluid it is. And one way of saying that is by giving a relationship between \(p\) and \(\rho\), which relationship we call the equation of state. Equipped with this we have, at least in principle, all the information required to specify the history of the universe in terms of \(a\), \(p\) and \(\rho\) as functions of \(t\).

A Brief Digression on the Hubble Parameter

In passing, it's worth noting that that quantity \(\dot{a}/a\) which appears in the first equation above is our old friend the Hubble parameter, usually denoted \(H(t)\). (I don't like calling it the Hubble constant, since it's a function of time.) At any time \(t\), the Hubble parameter \(H(t)\) tells us us the relationship between the distance between two objects at fixed spatial coordinates (i.e. fixed \((r,\theta,\phi)\), and the rate at which that distance is changing.

You might briefly wonder how things at fixed position can be getting further apart: but remember, the distance between things is the product of the scale factor \(a(t)\) and the coordinate distance, so even though the coordinate distance is fixed, this physical distance between them changes with \(a(t)\).

The Age of the Universe

Now that we have a model of the universe, we can at least try to work out how old the universe is according to that model. And then we can try to deduce from that something about the actual universe.

There's a useful bit of notation worth introducing here. I decide to call the current time \(t_0\), and then just put a subscript \(0\) on any quantity to indicate that it is the value at \(t_0\): so \(a_o = a(t_0)\), \(H_0 = H(t_0)\) etc.

We should note that that the units of \(H\) are those of inverse time, so on dimensional grounds one might hope that the age of the universe is given by (some multiple of) \(1/H_0\). (Thanks to Leo Stein, @duetosymmetry, for pointing this out.) We'll see how this hope plays out for one simple case just below.

A Simple Dust Filled Universe

As an illustration of how we can extract an age of the universe from this mathematical model, we'll consider the simplest (non-trivial!) model.

First we take \(k=0\). There is no observational evidence that this is wrong, and it makes the equations simpler.

Next, we take \(\Lambda =0\). This isn't quite right, but we're going to be pushing the evolution backwards, which will mean that \(\rho\) will be increasing, so the contribution from \(\Lambda\) will become less important rather than more important as we push further. With luck this won't be too much of a problem.

Finally, we take \(p=0\), i.e. we assume that the universe is full of a fluid that does not resist compression. Since there is a lot of space between galaxies, and they aren't moving very fast, this seems plausible too. In this context such a fluid is called dust.

Now, with all these simplifications, our differential equations reduce to \[ \left( \frac{\dot{a}}{a} \right)^2 = \frac{8\pi G}{3}\rho, \qquad \left(\frac{\ddot{a}}{a}\right) = -\frac{4 \pi G}{3} \rho \] so we immediately get \[ \left( \frac{\dot{a}}{a} \right)^2 = -2 \left(\frac{\ddot{a}}{a}\right) \] or \[ \dot{a}^2 = -2a \ddot{a}. \]

Faced with a differential equation like this - nonlinear, and not one I've seen a special method to solve - I resort to guessing. It looks as if taking \(a(t)\) to be some power of \(t\) might work, since the powers on each side of that equation would match, so I guess that \(a(t)\) is proportional to \(t^\alpha\), and try to find a value of \(\alpha\) that works.

Plugging this into the equation gives \(\alpha^2=-2\alpha(\alpha-1)\), or \(\alpha(3\alpha-2)=0\). The \(\alpha=0\) solution is not interesting: it would mean that \(a(t)\) was constant, so the \(\alpha = 2/3\) solution is the relevant one.

This gives us \[ a(t) = a_0\left( \frac{t}{t_0} \right)^{\frac{2}{3}} \]

Now we're getting somewhere. We can see that this is fine for any value of \(t>0\), but when \(t=0\) it all goes to pieces. \(a(0)=0\), and if you look at the first equation you'll see that the expression for \(\rho\) involves a division by \(0\), which is never a good sign.

But let's gloss over that for the moment. What I'd like to know is the value of \(t_0\), because that's how long the universe (according to this model) has been around.

We can do something a little bit cunning here, and compute \(H(t)=\dot{a}/a\). This gives us \[ H(t) = \frac{2}{3t} \] or, more to the point, \[ t_0 = \frac{2}{3H_0} \] and \(H_0\) is something that we can try to find from observation. Admittedly, the measurements are quite tricky, but at least it's some observational data that tells use the age of the universe.

Well, no.

That doesn't tell us the age of the universe. What it tells us is that if we push the model back in time, the very longest we can push it is this \(t_0\). It doesn't tell us that the universe sprang into existence that long ago and then evolved according to these equations subsequently.

In fact, we can even say that this value can't be quite right.

The Age of the Universe?

If we push this back then eventually we are in a regime where the density is very large. At this point, it is almost certainly a bad approximation to assume that \(p=0\), and we need to use a more accurate model that takes better care of how pressure and density are related in some kind of hot plasma.

Well, of course, we have reasonable models for this. We plug in the appropriate equation of state, and work out how old the universe is given that for a sufficiently early 'now' that equation of state is relevant.

But then the problem recurses. For sufficiently early times, the conditions become so extreme that we really don't know how to model the matter.

And of course, in all the above I was working with various simplifications. We need to work with rather more complicated evolution equations to do something more realistic.

And when we do all that, we find that the 'age of the universe' is a little under fourteen billion years.

But this age is not really the age of the universe. It is how long it has been since the start of the time when we have some physical understanding of how the stuff filling the universe behaves.

Before that? Since we don't know how the stuff behaves, and it is very likely that classical physics will stop being the relevant model in any case, it's hard to push earlier.

There are lots of speculative theories, some more fun and some more plausible than others. But we certainly can't deduce from what we know that the universe itself has only existed for this fourteen billion years, only that our ability to describe it fails when we try to get further back than that.

The Age of the Universe As We Know It

This is the bottom line. When we talk about the age of the universe in cosmology, we aren't really talking about the age of the universe as such. We're talking about the length of time for which we have some understanding of what is going on in the universe, i.e the time since which content of the universe was something we have reasonable models for. The Big Bang is an idealization of the time at which the universe became accessible to our theories and models.

Before that, it may have spent an undetermined, or even infinite, length of time full of some kind of very high energy quantum field which underwent a phase transition to the hot dense plasma that looks to us like a Big Bang beginning to the universe. Or maybe something even weirder, such as a state in which the notion of time itself doesn't make sense, so that the 'age of the universe' is the length of time for which time has been a useful notion. There are various notions out there of what might have preceded the Big Bang, but by the very nature of things, they tend to be relatively unconstrained by observations.

Addendum

Nalini Joshi (@monsoon0) has pointed out that \[ \dot{a}^2 = -2a \ddot{a} \] can be solved without resorting to guessing the right answer, by using separation of variables (twice). Dividing both sides by \(a \dot{a}\) gives \[ \frac{\dot{a}}{a} = -2 \frac{\ddot{a}}{\dot{a}} \] which can be integrated immediately to give \[ \ln(a) = -2 \ln(\dot{a}) + C_0 \] so that \(a = C_1\dot{a}^{-2}\).

This rearranges to \[ \dot{a}^2 = \frac{C_1}{a} \] and so \[ \dot{a}=\frac{C_2}{\sqrt{a}} \] which can then be separated to give \[ \sqrt{a}da = C_2 dt \] Integrating this (and choosing the constant of integration so that \(a(0)=0\)) yields the solution given above.

Saturday, 3 March 2018

A Regiment of Monstrous Functions

How badly can a function \(f:\mathbb{R} \to \mathbb{R}\) behave? In fact, remarkably badly. Today, I will mostly be posting a list of functions which (to my taste) exhibit various forms of bizarre behaviour.

Not continuous at a point

The simplest type of pathological function is probably a function which is well-behaved, except for having a jump discontinuity. An example of such a function is given by \[ f(x) = \left\{ \begin{array}{ccc} x &\text{if} & x \le 0 \\ x+1 & \text{if} & x\gt 0 \end{array} \right. \] What makes this function misbehave is essentially that it has two different definitions, applicable on different parts of the domain, and which somehow disagree where the different parts meet. This suggests a strategy for constructing 'monsters' by appropriate choices of how to split up the domain and how to define the function on the different parts.

Nowhere continuous

We can take this notion of making a function discontinuous at one point and push it to the limit by splitting up the domain into two sets, each of which is dense. One familiar way of making such a split is to consider the rational numbers (\(\mathbb{Q})\) and the irrational numbers (\(\mathbb{R} \setminus \mathbb{Q}\)). Then if \(x \in \mathbb{R}\), no matter how small a number \(\varepsilon \ge 0\) is chosen, the interval \((x-\varepsilon,x+\varepsilon)\) contains both rational and irrational numbers.

So let's now look at the function \[ f(x) = \left\{ \begin{array}{rcc} 0 &\text{if} & x \in \mathbb{Q} \\ 1 & \text{if} & x \in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] If we choose some real number \(x\), then if \(x\) is rational, so \(f(x)=0\), we can find irrational numbers as close as we like to \(x\), so that arbitrarily close to \(x\) are values on which \(f\) takes on the value \(1\); likewise if \(x\) is irrational, so \(f(x)=1\), we can find rational numebrs as close as we like to \(x\), so that arbitrarily close to \(x\) are values on which \(f\) takes on the value \(0\).

It follows then that no matter what value of \(x\) we choose, \(f\) is not continuous.

We can sort of visualize the graph of this \(f\) as a pair of dotted lines, one at height \(0\) (for the rational values of \(x\)) and one at height \(1\) (for the irrational values). It's then pretty plausible than changing the value of \(x\) by any amount at all involves jumping up and down between the two lines, so the function is not continuous anywhere.

This rather went from the sublime to the ridiculous in one jump. But I think that the next case is still stranger.

Continuous at just one point

This time we have \[ f(x) = \left\{ \begin{array}{rcl} 0 &\text{if} & x \in \mathbb{Q} \\ x & \text{if} & x \in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] By the same argument as above, this function is not continuous at any value of \(x\) other than \(0\).

But at \(0\), the story is different.

This time, if we chose a value of \(x\) close to zero, the value of \(f(x)\) is either \(0\), if \(x\) is rational, or \(x\) which is small, if \(x\) is irrational. In either case, we can make \(f(x)\) as close as we want to \(0\) by taking \(x\) sufficiently small: so \(f\) is continuous at just this one point.

In much the same way as before, we can try to visualize the graph as a pair of lines again, this time at height \(0\) for rational values, and at height \(x\) for irrational ones. Away from zero we have the same situation as before, but at zero, where the graphs 'intersect', the two definitions (almost) agree, and so the function is continuous.

So we can have a function that is continuous everywhere except at one point, or just at one point. It's not hard to see we can have as many discontinuities as we want. But now it's time for a really weird function.

Continuous on just the irrationals

This one is hard to believe. First, though, agree that if \(x\) is a rational number, then we express it as the ratio of two integers \(m/n\) where \(n>0\) and \(\gcd(m,n)=1\). We then define \[ f(x) = \left\{ \begin{array}{rcl} 1/n &\text{if} & x \in \mathbb{Q} \\ 0 & \text{if} & x \in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] So if \(x\) is a rational number, then \(f(x)\) is non-zero, but arbitrarily close to \(x\) are irrational numbers, on which \(f\) takes the value \(0\). So \(f\) is not continuous at \(x\).

On the other hand, if \(x\) is not rational, then no matter how large an integer \(N\) we choose, there is an interval about \(x\) so small that no rational number with denominator less than \(N\) lies inside that interval. So any number inside the interval is either irrational, so \(f\) takes on the value \(0\), or is rational, so the denominator must exceed \(N\), and the value is less than \(1/N\).

So \(f\) is continuous at this \(x\).

This is very hard to comprehend. I can try to think of a graph analogous to the other two cases, but when I think about it carefully I realize I'm kidding myself.

Continuous on just the rationals

You might expect (I certainly expected) that it would be possible to do this by a suitable clever adaptation of the previous example. It turns out that that doesn't work. In fact, nothing works. It isn't possible for a function to be continuous on just the rationals.

More than anything else, this makes me realize that I don't really understand just how the previous example works.

This pretty much wraps it up for the continuity monsters. But there are other monsters to admire.

Differentiable, but the derivative is not continuous

This is subtler than you might expect. You can't get it by integrating a function with a discontinuity, because the result isn't differentiable. So consider \[ f(x) = \left\{ \begin{array}{ccl} x^2\sin(1/x) &\text{if} & x \neq 0 \\ 0 & \text{if} & x =0 \end{array} \right. \] Then as long as \(x \neq 0\), the usual rules of differentiation give us \[ f'(x) = 2x \sin(1/x) - \cos(1/x) \] But as \(x\) approaches \(0\), this is not at all well-behaved. It just oscillates faster and faster between (approximately) \(\pm 1\).

On the other hand, if \(h \neq 0\) then \[ \begin{split} \frac{f(h+0)-f(0)}{h} &= \frac{h^2\sin(1/h)}{h}\\ &= h \sin(1/h) \end{split} \] so as \(h \to 0\), since \(\sin(1/h\) is bounded above by \(1\) and below by \(-1\), this also tends to \(0\), so that \(f'(0)=0\).

So it is possible for a derivative to be discontinuous, but not because of a jump discontinuity.

We can fairly obviously make this work for higher and higher derivatives by using higher powers of \(x\). That isn't very interesting. Let's look at something a bit weirder.

Differentiable at just one point

In fact, this will be even stranger than it sounds. We can have a function that is differentiable at just one point, but not even continuous anywhere else. \[ f(x) = \left\{ \begin{array}{rcl} x^2 &\text{if} & x \in \mathbb{Q} \\ 0 & \text{if} & x \in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] Again, by just the same argument as before, this function is not continuous at any non-zero value of \(x\).

But (rather like the above example) if \(h \neq 0\), we have \[ \frac{f(0+h)-f(0)}{h} = \left\{ \begin{array}{rcl} h &\text{if} & h \in \mathbb{Q} \\ 0 & \text{if} & h \in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] so again we see that as \(h \to 0\), the limit is \(0\), so \(f'(0)=0\).

And again, we can make this as differentiable as we want at \(0\) by using higher powers of \(x\).

It's probably worth saying out loud that we can't do this by working out derivatives of derivatives, since the derivative only exists at \(0\). Instead, we say that \(f\) is differentiable \(n\) times at \(x\) if there exist numbers \(f^{(i)}(x)\) for \(i=1,\ldots n\) with \[ f(x+h) = f(x) + f'(x)h + \ldots + \frac{1}{n!}f^{(n)}(x) + e \] where \(e/h^n \to 0\) as \(h \to 0\).

Now, these are all interesting in their own right, just as examples of what can happen. Let's finish with an example which is interesting and useful.

Infinitely differentiable but not analytic

We finish off with \[ f(x) = \left\{ \begin{array}{ccl} \exp(-1/x) &\text{if} & x \gt 0 \\ 0 & \text{if} & x \le 0 \end{array} \right. \] A straightforward calculation shows that this function has derivatives of all orders at \(0\), and the derivatives are all \(0\). So this function is infinitely differentiable, but since its power series is just \(0\), it is not analytic.

Using this, we can build a new function: \[ g(x) = \frac{f(x)}{f(x)+f(1-x)} \] This function is \(0\) for negative values of \(x\), \(1\) for values of \(x > 1\), and smoothly interpolates between them. Functions built out of these are important in differential geometry and differential topology, to patch together smoothly objects which are defined locally. It is an argument using such functions that proves, for example, that any smooth manifold admits a Riemannian metric.

You missed a bit!

There is one very well-known type pathological function which I haven't mentioned: the functions which are continuous but nowhere differentiable, first described by Karl Weierstrass. The reason is that I wanted to consider functions where there is an explicit (and simple) formula for the value of \(f(x)\), so that the properties could be seen directly. I've neglected the continuous but not differentiable functions although they are also interesting, because they are defined by means of limits rather than a simple formula.

Enough is enough

Most of these functions are, basically, a freak show. We look at them to see objects with bizarre or surprising properties; and some actually have practical uses. But even those without obvious application are important. They help to find weaknesses in our understanding of basic concepts, and thereby to sharpen our understanding of them.

Also, they're fun.

Comment

It has been pointed out in the comments below that technically these functions are not freaks. In fact, amongst functions, the continuous functions are the freaks (i.e. the rarities), and amongst continuous functions, the differentiable ones are the freaks. But in terms of the kind of functions that students meet in practice, where (at worst piecewise) analytic is standard, these functions are still the monsters.

Acknowledgement

  • I probably wouldn't have written this if @panlepan hadn't tweeted about continuous but non-differentiable functions.
  • Thanks to Manley Perkel for typo spotting.
  • thanks to delio for his comments on what is (and is not) a freak.

Saturday, 3 February 2018

What's a number? 2 - really

After mastering (or at least being exposed to) the arithmetic of natural numbers, we go beyond, to rational and even irrational numbers. But again, we usually do this without any real discussion of what they are. We just absorb the notion that a fraction or an infinite decimal expansion mean something, or that asking for a quantity whose square is \(2\) is a sane thing to do.
The journey from the natural numbers to the reals is full of interesting scenery, and the destination has a depth and richness that you can spend lifetimes investigating without ever doing more than scratch the surface. I'm only going to sketch the process here, pointing out some of the more interesting sights on the way, but missing the calculations that justify it. If you haven't seen this before, I hope you'll get at least a rough idea of what's going on. If you have, maybe you'll get a new insight or perspective on it.

How should they behave?

Just as with the natural numbers, we look at the behaviour of the numbers that we have some intuition for, and demand that whatever the real numbers are, they should obey the same rules. In particular, we want to be able to add, subtract, multiply and divide with all the usual associative, commutative and distributive properties. We also want to be able to compare numbers, so we want to be sure that given any \(a\) and \(b\) exactly one of \(a \lt b\), \(a=b\) or \(a \gt b\) is true, and inequalities have to behave nicely, in the sense that we can add anything to both sides of an inequality, or multiply both sides by any positive number.

Something to take away.

Equipped with only the natural numbers \(\mathbb{N}\) (realized through the von Neumann numbers) we can always multiply and add, but sometimes we can subtract, and sometimes we can't, depending on the numbers.
We say that if \(m,n \in \mathbb{N}\) then \(m \geq n\) if there is some \(x\ \in \mathbb{N}\) such that \(m=x+n\), and \(m \gt n\) if \(x \neq 0\). Then as long as \(m \ge n\) we can define \(m-n\) to be this \(x\).
The first step is to fit negative numbers into the picture so that we have a meaning for \(m-n\) when this is not the case.
A neat trick for this is to think about pairs of number \((m,n)\), and then think of two pairs of numbers \((m,n)\) and \((p,q)\) as equivalent when \[ m+q=p+n \] We see straight away that if \(m \geq n\) and \(p \geq q\) that this equivalence is saying that \(m-n = p-q\). We can think of any particular pair \((m,n)\) as representing the difference \(m-n\), and then we make the jump to say that this is also true if \(m \lt n\).
Now we lump the pairs together into groups of equivalent pairs, and call each of these an equivalence class. The set of all the equivalence classes is called \(\mathbb{Z}\), the set of integers.
This is a very powerful idea, so let's take a moment to look at it more carefully.
I'm saying that I think of the pair \((3,0)\) as representing the same object as \((4,1)\), \((5,2)\), \((6,3)\), or in fact any pair of the form \((n+3,n)\) for any natural number \(n\). They are all pairs of numbers where the first is \(3\) more than the second, and they all represent the number \(3\). Likewise, the pair \((0,3)\) represents the same object as \((1,4)\), \((2,5)\), \((3,6)\) or any pair of the form \((n,n+3)\) where \(n\) is a natural number. They are all pairs where the second number in the pair is \(3\) more than the first; they all represent the thing I want to call \(-3\). The only thing I care about it the difference between the first number in the pair and the second.
So we need to make sure that we can add and multiply these integer objects sensibly: that means we need to know how to add and multiply representatives in such a way that whichever representatives we choose to combine, the result is always a representative of the same collection.
This is a lot like what happens if we lump numbers together into odd and even. It doesn't matter which two odd numbers I add together, I always get an even number, so I can just say that odd plus odd equals even. Likewise, I can say what happens if I add or multiply any combination of odd and even; it doesn't matter just which numbers I pick, only whether they are odd or even.
I need to do the same thing with these pairs: it mustn't matter which representative pairs I pick to operate in, I must get a representative of the same resulting class.
Here's a way that works (using juxtaposition to denote multiplication):
  • \((m,n)+(p,q)=(m+p,n+q)\)
  • \((m,n)(p,q) = (mp+nq,mq+np)\)
If we do this, we can see that each element \(n\) of \(\mathbb{N}\) is represented by the pair \((n,0)\), and we've acquired a collection of negative numbers represented by pairs of the form \((0,n)\).
Why is this? Well, notice that \((n,0)+(0,n)=(n,n)\), which is equivalent to \((0,0)\), which is the representative of \(0\).
In particular we have a way of subtracting \((n,0)\) from \((m,0)\) when \(n \gt m\). We just see how to subtract \((n,0)\) from \((m,0)\) when \(n \lt m\), and insist that it still works. If \(n \lt m\) we just have \((m-n,0)\), which is equivalent to \((m,n)\) and then we use the same thing when \(n \gt m\). But we know that in this case \((m,n)\) is equivalent to \((0,n-m)\), which gives a meaning to the standard statement that if \(n \gt m\) then \(m-n = -(n-m)\).
We also keep the notion of order. Any pair is (equivalent to one) of the form \((n,0)\) for \(n \in \mathbb{N}\) or \((0,n)\) where \(n \in \mathbb{N}, n \neq 0\). We say that an integer is positive if it has a representative of the first form, and negative if it has one of the second form. Then all the usual rules about multiplying signs follow automatically.
I won't go through the check that all this works. If you want, you can see some more detail here.
Now that we have the set of integers, \(\mathbb{Z}\), I'll just use single letters \(m,n\) etc to represent individual integers: but bear in mind that each of these letters represents an equivalence class of pairs of natural numbers.

Be rational

Now we can add, subtract and multiply. But mostly we can't divide. We say that \(m / n = q\) if \(nq=m\), but there usually isn't such a \(q\): for example, there is nothing in \(\mathbb{N}\) we can multiply by \(2\) to get the answer \(5\).
But we can use the previous trick again. The last time, we considered pairs to be equivalent if they had the same difference. We can do just the same thing but now thinking of the pairs as ratios instead of differences. So we think about pairs of integers \((m,n)\) where \(m,n \in \mathbb{N}\) and \(n \neq 0\). We now think of \((m,n\) and \((p,q)\) as equivalent if \[ mq=np. \]
so for example, \((1,2)\), \((2,4)\), \((3,6)\) and so on are all equivalent. Of course, we all already know that \(\frac{1}{2}\), \(\frac{2}{4}\), and \(\frac{3}{6}\) are different ways of writing the same fraction; this is all finding a way to say that when we don't have any fractions yet, only integers.
We call the set of equivalence classes of these pairs \(\mathbb{Q}\), the set of rational numbers.
Something happens here that I find quite striking. In the previous section, we introduced solutions to an 'addition' problem, and got something where the addition rule was simple, but the multiplication one wasn't. This time, we introduce solutions to a 'multiplication' problem, and get a simple multiplication rule, but a complicated addition one.
We have
  • \((m,n)(p,q)=(mp,nq) \)
  • \((m,n)+(p,q) = (mq+np,nq) \)
It takes a while, but you can check that all the usual arithmetic rules work when you define multiplication this way.
So now if we have two rational numbers \((m,n)\) and \((p,q)\) we can see that the answer to the question 'what do I multiply \((p,q)\) by to get \((m,n)\)?' is \((mq,np)\) as long as \(p \neq 0\).
What's more, every rational can be represented by a pair of the form \((m,n)\) where \(n \gt 0\). Then we say that the rational number has the same sign (positive, zero, or negative) as \(m\) does. And then inequalities behave themselves: you can add the same thing to both sides or multiply both sides by a positive quantity.
So now we have a set of objects (rather complicated objects, to be sure) which contain the natural numbers we started. With these numbers, we can add, subtract, multiply and divide (as long as we don't try to divide by \(0\)).
Let's stop to look around.
By means of two algebraic steps - really the same trick applied twice, once for addition and once for multiplication - we have gone from the natural numbers \(\mathbb{N}\) to the rational numbers \(\mathbb{Q}\).
In the rational numbers, we have a well-behaved addition and multiplication. Both are associative and commutative and the multiplication distributes over addition. There is an additive identity, \(0\), and a multiplicative identity, \(1\). Every rational number has an additive inverse, and any non-zero rational number has a multiplicative inverse.
So algebraically, the rational numbers are as well-behaved as you could hope for.
What's more, they are nicely ordered: we know what it means for a rational number to be positive or negative, and inequalities behave themselves, by which I mean that they are preserved by addition of any rational number or multiplication by any positive one.
And yet they are unsatisfactory. If we draw an isosceles right triangle with two sides of length \(1\), then the length of the hypotenuse should give \(2\) when squared. Unfortunately, there is no rational number with this property. There are rational numbers whose square is less than \(2\), and those whose square is greater than \(2\), and we can get the square as close as we want to to \(2\), but we can't get it exactly.
And this isn't the only missing value. So what can we do about it?

Mind the gap

This step is rather different from the others. The others filled in algebraic holes - a lack of additive or multiplicative inverses. Now we have to fill in a different kind of hole - one where we seem to be able to get as close as we want to to some value, but the value itself is missing.
First, we recall what it means for a sequence of numbers, \(q_n\), to converge to a value, \(q\). This is captured by saying that given any tolerance, the sequence eventually strays no farther than that from the limit: in the standard mathematical symbols, \[ \forall \epsilon \gt 0, \exists N \in \mathbb{N} | n \gt N \Rightarrow |q_n-q| \lt \epsilon. \] But this isn't as helpful as we might hope, because it's the sequences that look as if they converge but don't which are the problem. Fortunately, Cauchy came up with a clever way of characterizing a sequence which looks as if it ought to converge. This is that given any tolerance, pairs of terms in the sequence are eventually within that tolerance of each other. This time we have \[ \forall \epsilon \gt 0, \exists N \in \mathbb{N} | m,n \gt N \Rightarrow |q_n-q_m| \lt \epsilon \] and a sequence which satisfies this criterion is called a Cauchy sequence.
And now, just as we constructed objects to solve equations of the form \(x+n=m\) and \(xn=m,\) we need to construct objects to act as the limit of a sequence which looks convergent.
Again, we do it by defining some equivalence classes, which will fill in the gaps in \(\mathbb{Q}\) and be the real numbers. To make these equivalence classes, we say that two Cauchy sequences \(q_n\) and \(r_n\) are equivalent if \(q_n - r_n \to 0\). Then the set of equivalence classes is \(\mathbb{R}\), the set of real numbers. This time the element \(q\) of \(\mathbb{Q}\) is represented in \(\mathbb{R}\) by the constant sequence all of whose terms are \(q\).
Now we have to define the sum and product of two real numbers. This is easy enough: if \(x_n\) represents the real number \(x\), and \(y_n\) the real number \(y\), then \[ x_n+y_n \text{ represents } x+y, \qquad x_n y_n \text{ represents } xy \] It takes some effort, but we can again show that all the algebra keeps working as before. We can add, subtract, multiply and divide (except by zero), and all the standard algebraic properties still hold.
What do we get from this?
For example, if \(m\) is a natural number, we can define a sequence \(x_n\) by \[ \begin{split} x_0 &= 1\\ x_{n+1} &= \frac{x_n}{2}+\frac{m}{2x_n} \mbox{ if } n \gt 0 \end{split} \] This sequence satisfies Cauchy's criterion, and if it has a limit, \(x\), that limit satisfies \(x^2=m\); so we certainly get square roots for all the natural numbers.
In fact, we get a lot more new numbers besides these.
Order is also simple to define: \(x \lt y\) if for all \(n\) sufficiently large, \(x_n \lt y_n\). And again, it takes some effort, but we can check that the order still works in the same way.
We're not done yet. This makes sure that every sequence of rationals that looks as if it should converge has a limit. When the limit is not itself a rational number, we call it irrational.
But there's a potential problem here. We've got a way of adding in to the number system all the limits of Cauchy sequences of rational numbers which don't have a rational limit. But maybe there are still gaps. Maybe there are Cauchy sequences of the real numbers we've just constructed that don't have a limit (in this set). How often do we have to repeat this process? Could it even be that no matter how often we do it, there are still Cauchy sequences that don't converge in the set of numbers we've created so far?
The good news is that this doesn't happen. Having carried out this process once, we don't have to do it again. The real numbers are complete in the sense that every sequence that looks convergent (i.e. every Cauchy sequence) actually does have a limit in the real numbers.

The end of the road

And now we have a set that we can feel reasonably happy to call real numbers. They can be added, subtracted, multiplied and divided (except by zero). The rational numbers, the integers, and the natural numbers all sit inside them in what seems like a sensible way. They are ordered in a way that talks nicely with the algebra. And finally, there are no gaps.
Unfortunately, the actual objects are very unwieldy. A real number is an equivalence class of Cauchy sequences of equivalence classes of rationals, rationals are equivalence classes of pairs of integers, integers are equivalence classes of pairs of natural numbers, and natural numbers are those sets we call von Neumann numbers. This is awful. How are we supposed to do anything with objects like that?
It would make our lives much easier if we had a result like the one for the Peano axioms: that any two sets satisfying those axioms are really the same set but differently labelled. Unfortunately that isn't true for the properties we have so far.
But it's nearly true. The structure we've created has the property that if \(r \in \mathbb{R}\), then there is some \(n \in \mathbb{N}\) such that \(n \gt r\). This is called the Archimedean property. And if we put that into the mix, then we do have the same uniqueness result as before. So the good news is that as with the natural numbers, we don't really care about the construction: what it does is to demonstrate that the algebraic, order, and completeness properties we want are satisfied by essentially one object. And now we can just work with the rules, without caring about what is 'inside' our real numbers: how they behave is all that matters, and we can describe that without caring about the insides.

Odds and Ends

Decimal representation

So, what does this have to do with the usual representation of a real number as in integer plus a (possibly neverending) decimal part?
We define \(m.d_1 d_2 d_3 \ldots \) to be the real number defined by the Cauchy sequence whose \(n\)th term is \[ m + \sum_{i=1}^N \frac{d_i}{10^i} \] In fact, this is one way of understanding what we mean by \[ m+\lim_{N \to \infty} \sum_{i=1}^N\frac{d_i}{10^i}. \] Not only does this always give a Cauchy sequence (relatively easy to prove), but for any real number there is a sequence of this form (not quite so easy). So the real numbers actually are the familiar decimal objects.
As an added bonus, now you can really understand why \(0.\dot{9}=1\): the Cauchy sequence \(0.9\), \(0.99\), \(0.999 \ldots\) is in the same equivalence class as the constant sequence \(1,1,1\ldots\), so the two are just different representations of the same real number, namely \(1\).

Levels of irrationality

There are two different types of irrational number, both of which are produced by this single construction. We say that a number is algebraic if it a root of a polynomial equation with integer coefficients. So all rational numbers are algebraic, and so also is \(\sqrt{m}\) for any natural number \(m\).
But there are also irrational numbers which are not the root of any such polynomial: \(\pi\) and \(e\) are the best known of these. These are called transcendental. Trying to understand these is the object of transcendental number theory. One surprising property of the transcendental numbers is that they are easier to approximate by rational numbers than the irrational algebraic numbers. (Easy to approximate has to do with how close you can get with a denominator of a given size.)

Alternative Routes

There's more than one way to this endpoint.
We could start with the strictly positive integers, and proceed via the positive rationals to the set of all rationals. This is slightly more elegant in that we don't have to worry about the denominator being zero when we build the rationals. On the other hand, we don't get the integers as a natural subset of the rationals this way.
Once we have the rationals, there are various ways of filling in the gaps, by using different characterizations of just what the gaps are, two of which are particularly worth mentions.
One is to consider ways of splitting up the rationals into sets \(L\) and \(U\) so that \(L\) and \(U\) between them contain all rationals, every element of \(L\) is less than every element of \(U\), and \(L\) has no greatest element. Then we introduce the irrational numbers as least elements of those sets \(U\) which do not have a least element. This construction is known as the Dedekind cut after the late nineteenth and early twentieth century mathematician Richard Dedekind, and the idea is based on that of Eudoxus, a 4th Century BCE Greek mathematician.
The other is to say that any set which has an upper bound (i.e. a number greater than every element of the set) has a least upper bound. The rationals don't have this property, but if we add in the required least upper bounds this has the same effect of introducing all the required irrationals.
But again, the important thing is that whichever route we choose, we arrive at the same endpoint. The details of the objects we finally end up with will differ, but since the properties are always the same, we don't need to know just what they are. How they behave really is all that matters.

Keep right on past the end of the road

This has been quite a journey. We've filled in two algebraic gaps and one analytic one. But we don't actually have to stop here. There are still some missing objects. We can't, for example, solve the equation \(x^2+1=0\) in \(\mathbb{R}\). And there are no gaps between the real numbers to put the solution in. We can fill in the gap by using a different kind of trick, and introducing the complex numbers. This almost works, but this time as well as gaining something, you lose something: there is no way of ordering the complex numbers in a way that plays nicely with the algebra. But that's another story.

Acknowledgement

Thanks to Colin Wright (@ColinTheMathmo) for helpful comments on an earlier version of this and Adam Atkinson for typo spotting.

Thursday, 18 January 2018

What's a number? 1 - naturally

In mathematics (and before that, arithmetic) we get used to dealing with numbers in various ways. But what is a number?

What sort of a question is that?

Of course, we all learn to count, and for the most part we learn to become competent. In practice, what this means is that there is an agreed-on sequences of noises that we make when we point at objects, and the number of objects is the noise we make when we get to the last one. And with practice, we find that if we're careful we always make the same noise for the last item of a collection, even if you point at them in a different order. We also find that if you have a pair of collections then they can be matched up in a one to one fashion if and only if we assign the same noise to both of them.
With a little thought, we might come to the conclusion that, for example, three is the name of a number, and any collection that we get to three on by the 'point-and-make noises' algorithm contains three items. But just what is this three? It's the thing that all collections with three items have in common, but what is this 'three-ness' that they all share?
Not knowing just what a number is doesn't stop us learning how to do arithmetic, and this arithmetic is generally very useful. It prevents us from being cheated too easily in shops, and ensures that we know how many minibuses to book for a trip to the seaside, or how with more effort to direct a spaceship to the moon. But a chosen few eventually get uncomfortable, and start asking awkward questions about just what numbers are.
One rather cunning attempt to deal with the problem is to think about sets, and then to define three to be the set of all sets with three elements. Then to say that a set has three elements is to say that it is in this set (of sets).
Alas, that doesn't work. When you try to think about sets carefully, you discover that you have to be a bit careful about just which descriptions actually describe sets. And when you do that, you find that 'has three elements' doesn't describe one, because if you try to include it as a set you get contradictions: this is a Bad Thing. This was a bit of a blow to early developers of set theory, so another solution has to be found.

Who cares what a number is?

An alternative, and even more cunning attempt to deal with the problem is to ignore it. After all, humanity has been counting successfully for millennia without ever coming to any grief caused by not knowing what a number is. Surely all that matters is that, whatever they actually are, we know how they behave. So the solution to the problem is that if we want to do mathematics rather than just indulge in unimportant activities like making sure we get the right change when we buy a bar of chocolate, then we describe how numbers behave by means of some mathematical rules, and then work with the rules.
Well, time to be a bit more specific. I'm talking about the numbers we use to count things and do elementary arithmetic with here, so I'm talking about what are sometimes called the natural numbers, and I'll include in that the special number zero. (Some people like to start the natural numbers at one, but I'm not one of them.)
It's important to realise that there is no right way of doing this. We have to sit down and think about how numbers behave, and try to capture that behaviour is a neat collection of rules.
So we play with numbers (whatever they are) or rather some kind of representatives of them, and we come to various conclusions. We can add them, we can multiply them, order doesn't matter with addition and multiplication, and so on. Observations like this can be gathered indefinitely, so we need to pick out a collection of fundamental properties that the others follow from: an axiom system for the natural numbers.
One way of doing this is due to Giuseppe Peano, who wrote a book about it in 1889 (and in Latin: this was a time when by writing in Latin you might still hope to increase your audience rather than shrinking it). Nowadays, we tend to present things rather differently, but nevertheless Peano's axioms survive in somewhat altered form. So, here is one version of the Peano axioms for the natural numbers, consisting of a set \(\mathbb{N}\) with a successor function \(S:\mathbb{N} \to \mathbb{N}\):
  1. \(0 \in \mathbb{N}\)
  2. If \(n \in \mathbb{N}\), then \(0 \neq S(n)\)
  3. If \(m \neq n \in \mathbb{N}\), then \(S(m) \neq S(n)\)
  4. If \(K \subseteq \mathbb{N}\) such that \(0 \in K\) and \(n \in K \Rightarrow S(n) \in K\), then \(K = \mathbb{N}\)
We will also use the convenient shorthand \(S(0)=1\), \(S(S(0))=S(1)=2\) and so on, with the familiar numerals; privately we think of \(S\) as adding \(1\).
This certainly seems to capture some of how we think the natural numbers behave. \(0\) is the first one, we get all the rest by adding \(1\), and we never get the same answer by adding \(1\) to two different numbers. The last is less obvious, but it's just a slightly disguised way of saying 'proof by induction works'.
Now comes the surprise (at least, it surprises me).
These axioms essentially specify the natural numbers uniquely.
By that I mean that if we have another set, \(\mathbb{N}'\) with an element \(0'\) and a successor function \(S'\) satisfying the same axioms, then there is a (unique!) bijection \(i:\mathbb{N} \to \mathbb{N}'\) such that \(i(0)=0'\), and \(S(i(0))=i(S(0))\). So any two sets claiming to be the natural numbers are really just the same set with two different ways of labelling the elements and successor function.
Hang on, though (you may be thinking). It these things are meant to be the natural numbers, shouldn't there be operations of addition and multiplication?
Indeed there should. And they should have all the usual properties too of associativity, commutativity and distribution. But now we know that if we can define them in terms of \(0\) and the successor function, then the natural numbers are still basically unique. (And we don't have to come check the properties of addition and multiplication work separately.) So, here's how addition and multiplication work for \(m,n \in \mathbb{N}\), using \(+\) for addition and \(\cdot\) for multiplication:
Addition
  1. \(m+0 = m \)
  2. \(m+S(n) = S(m+n)\)
Multiplication
  1. \(m \cdot 0 = 0\)
  2. \(m \cdot S(n) = m \cdot n+m\)
These definitions are both inductive: they tell us how to combine a number with \(0\), and if we have to combine a number with something other than \(0\) we reduce it to dealing with a number nearer \(0\).
I wouldn't like to claim that it is trivial, but it is straightforward to show that addition and multiplication as defined are associative and commutative and that the distributive laws hold.

That's all very well, but are there any numbers?

And here's the snag. It's all very well saying that we don't care what they are, only how they behave. But how do we know that anything behaves like that?
Remember, we have a bunch of properties that we have abstracted from experience. For the numbers we're directly familiar with (say, up to a dozen or so) we have a fairly direct perception of the truth of the various properties. But why do we believe that they keep on going for numbers which are arbitrarily large? Why do we even believe that the mathematical structure makes sense? How do we know that the axioms are consistent? It could be that we have asked for too many properties, and defined the natural numbers out of existence.
Aside: probably apocryphal stories about PhD students who have proved hundreds of interesting properties of things that don't actually exist abound. We really want to avoid that situation.
In fact, asking about consistency of axioms open an enormous can of worms, which I don't propose to get into right now. I'll settle for a less rigorous approach that is at least very (very very) plausible, and which was invented by the amazing John von Neumann.
First, we assume that set theory is consistent. We don't even need scary bits of set theory, we just assume that no problems arise with some simple sets that I'll build as we go along.
First of all, we have the empty set, \(\emptyset\). This will be \(0\) in the model I'm going to describe for Peano's axioms. (It's a model in a highly technical sense too, but I also don't propose to get into that.)
Then the first few natural numbers are given by \[ \begin{split} 0 &= \emptyset \\ 1 &= S(0)= 0 \cup \{0\} = \{0\} \\ 2 &= S(1)=1 \cup \{1\} = \{0,1\} \\ 3 &= S(2)= 2 \cup \{2\} = \{0,1,2\} \end{split} \] and in general \[ n+1 = S(n) = n\cup \{n\} = \{0,1,\ldots n\} \] with \[ \mathbb{N} = \{0,1,2,3\ldots \} \]
You might wonder, as I did when I first met this, why we don't just use \(S(n)=\{n\}\). That's simpler. Well, there's nothing to stop you doing that, but it's less elegant: for a start, in this picture the set \(4\) has 4 elements, which is handy. In fact, there are other nice properties of these integers, such as the fact that \(m \lt n\) is equivalent to \(m \in n\), which has some huge ramifications. Those are for maybe a future post. For now, we settle for having a set and successor function that look as if they work.

So the natural numbers are these weird sets?

Fortunately not. These von Neumann numbers (often called the von Neumann integers, even though they don't include the negative ones) are just to show that the axioms make sense. The point of them is to say that the axiomatization makes sense, so it is in fact safe not to worry about what numbers are, just how they behave.
It's worth noting that there are models of the Peano axioms that look completely different. The fact that there is a bijection that preserves the structure doesn't mean that the objects have to bear any resemblance to each other.
So here's an entirely different approach. We consider a world of sets and functions, where the domain and codomain of each function is the same. I mean here if I look at any function, it maps some function to itself: different functions can do this with different sets. Then my integers are operators on the collection of functions that are built up like this:
  1. \(0(f)\) is the identity function.
  2. \(S(n)(f)\) is \(f \circ n(f)\)
This is the idea of Alonzo Church, and can be expressed neatly using his lambda calculus . (In fact the lambda calculus gives a nice way of expressing addition and multiplication in this model of the integers.)
The point is that these integers then are nothing like the von Neumann integers. They are instructions that tell you how often to compose a function with itself. But they have exactly the same structure as the von Neuman integers.
We could even make a model using strings of characters using the alphabet \(\{0,1,2,3,4,5,6,7,8,9\}\) where everything is defined using the usual rules for decimal arithmetic. (Yuck.)
And none of these is really the natural numbers. Or they all are, because they're all really the same, we're just expressing one ideas in several different way. The point is that the axiomatic approach tells us that it doesn't matter how we think about the natural numbers, we can reason about them just by using the axioms, secure in the knowledge that whatever we deduce will be true of whatever concrete realization we prefer.

Are you glossing over anything?

Lots. One is the the subtlety of the inductive axiom. I gave it as involving an arbitrary subset of \(\mathbb{N}\). But if we do logic (and sets) carefully, this is not part of first-order logic. We'd have to say something like
If \(P\) is any predicates such that \(P(0)\) is true and \(P(k) \Rightarrow P(k+1)\) for al \(k \geq 0\) then \(P(n)\) is true for all \(n \in \mathbb{N}\).
This doesn't look like such a big deal. And, after all, it's how we use it.
But it's an enormous deal. If you use this as your axiom, then the axioms are not strong enough to tie down the natural numbers to essentially one version. You can have non-standard models, which included numbers larger than all the normal integers. This is also tied up with non-standard analysis.
I've also glossed over the whole of (axiomatic) set theory, the attempt to put the idea of a set on a reliable footing so that it can't be broken by something like Russell's paradox. It's a fascinating topic in its own right, and a study of it explains why you get into trouble by trying to have a set of all sets with three elements. Interesting as it is, though, we don't need to go to all that effort to appreciate why the von Neumann numbers give a nice model.

Hang on, \(0\) isn't a natural number! This is all wrong!

It is true, not everybody includes \(0\) as a natural number: some only consider the strictly positive integers to be natural numbers. It's one of mathematics's little ironies that there is no universally agreed definition of a natural number. Neither definition is clearly better than the other: each has its minor context dependent advantages and disadvantages.
If I had to draw up approximate lines, I'd probably say that in general terms logicians and set theorists tend to include \(0\), number theorists tend not to. And I will admit now (to save anybody the effort of pointing it out later) that Peano started at \(1\), not \(0\).
I've included \(0\) because it's what I'm used to, and because the von Neumann integers are such a compelling model, and it has a natural place there. I also like there being an additive identity. If you really can't bear the thought if \(0\) being a natural number (and if you can't, I can't imagine what compelled you to get this far) then just consider this whole thing as am discussion of what the non-negative integers are.

Why does the title have a 1 in it?

Now that we know what the natural numbers are (or aren't) we can build on this to make other number systems. I intend to explore a couple of them (eventually) in subsequent blog posts.

Friday, 29 December 2017

Measuring up to Heisenberg (and since)

'Everybody knows' that the uncertainty principle of quantum mechanics tells us that we can't make a measurement on a system without disturbing it, and that that's why you can't simultaneously measure the position and momentum of a particle. This isn't an entirely unreasonable thing to think: it is, after all, closely related to Heisenberg's original argument involving using light to probe the properties of a particle. But it's wrong. The situation in quantum mechanics is subtler and much more interesting than that.

Statistics rears its ugly head

Suppose we have a large collection of boxes, each containing a particle, and they have been prepared so that the particle in every box is in exactly the same state. (Or if you prefer, one box containing a particle, prepared in the same way a large number of times.)
You may be wondering how we do that. I'm not going to tell you, largely because I have no idea. But I can tell you what quantum mechanics predicts about the measurements we make on the particles inside. You may think this is unfair, but it's no less fair than using classical mechanics to tell you what happens to a ball thrown at 20 meters per second at an upward angle of 30 degrees from a cliff edge 100 meters above sea level: classical mechanics doesn't tell us how to arrange that situation either, only what the consequences are.
Now we measure the \(x\)-coordinate of the position of each particle. It could be that every single box is guaranteed to give the same value for this measurement. In this case we say that the particle is in an eigenstate of \(x\)-position, and the eigenvalue is the \(x\) value that we measure.
But this is not a very likely occurrence. It is much more likely that the measurements result in different values. Then we can compute the average value of all the measurements, and if we have the enthusiasm and stamina, we can compute the standard deviation. Doing this lots of times with differently prepared particles, we find that sometimes the standard deviation is large, and sometimes it is small: so sometimes the particle has a relatively well-defined position (in the sense that the measurements tend to be close together), and sometimes not.
Instead of measuring the \(x\)-coordinate of the position of each particle, we might measure the \(x\)-component of its momentum. A similar situation results. Depending on the initial preparation, we sometimes find a large spread of results, and sometimes a small spread.
There is an obvious question we might ask: what is the relationship between the measurements of position and momentum? The answer is simple to describe, but not quite so easy to explain.
It turns out that we can prepare the particles so that they have a very small standard deviation in their position measurements. We can also prepare them so that they have a very small standard deviation in their momentum measurements. But we can't prepare them in a way where both sets of measurements have a small standard deviation. In fact, the product of the two standard deviations cannot be reduced below a certain value. The more tightly clustered the position measurements are, the more spread out are the momentum measurements; conversely, if the momentum measurements are tightly clustered, the position ones are more spread out.
This is not easy to understand from the point of view that the act of measurement disturbs the system. All the measurement are being carried out on particles in the same state, but some measure position and others measure momentum. So what is going on here?

What is a particle's state?

This is one way in which quantum mechanics is really different from classical mechanics.
In classical mechanics, the state of a particle is described completely by its position \(\pmb{x}\) and momentum \(\pmb{p}\). Given these quantities and the knowledge of the forces acting on the particle, the subsequent values of position and momentum are entirely determined.
The quantum mechanics picture is rather different.

Observables

In quantum mechanics, the state of a system is described by a vector, often denoted \(\psi\). So far, this does not sound so different from the classical mechanics situation: we can think of the position and momentum as a vector too. But it's a different kind of vector.
In the quantum mechanics picture of the universe, each quantity that we can observe is associated to a linear operator (in fact, a special kind of linear operator, called Hermitian, but I won't go into that level of detail here) I'll use \(H\) to represent a (Hermitian) linear operator. What this means is that \(H\) is a machine that I can feed a vector and it returns a vector in a well-behaved way: if \(\alpha, \beta\) are any two complex numbers, and \(\psi,\phi\) are any two vectors, then \[ H(\alpha \psi + \beta \phi) = \alpha H(\psi) + \beta H(\phi) \]
Although I slid past it fairly quickly just then, you may have noticed that I said \(\alpha\) and \(\beta\) could be complex numbers: this is one of the ways that quantum mechanics is different from classical mechanics---the mathematics of complex numbers is there at the fundamental level of the kind of vectors that can be used to describe the state of a system.
Now, if we choose our state vector very carefully, we can have the situation \[ H(\psi) = \lambda \psi \] where \(\lambda\) is a real number. When this happens, \(\psi\) is called an eigenvector of \(H\), and \(\lambda\) is the associated eigenvalue. (For general linear operators the eigenvalues can be complex, but for Hermitian operators they must be real.)
Eigenvectors are very important in this story. I'll explain why shortly, but first, we need to know this: If \(H\) is the linear operator corresponding to an observable, then there are eigenvectors \(\psi_i\) of \(H\) such that any state vector \(\psi\) can be expressed in the form \[ \psi = \sum_{i} \alpha_i \psi_i \] in just one way, where \(\sum_i |\alpha_i|^2 = 1\). (We call this set \(\{\psi_i\}\) the basis of eigenvectors associated with \(H\)).

Observations

'Just what,' you should be wondering by now, 'does this have to do with measuring a property of a particle?'
Well, given an observable, \(H\), (note that I'm abusing terminology a bit here by referring to the linear operator as the observable: I do that) and its associated basis, \(\{\psi_i\}\), we know that each \[ H (\psi_i) = \lambda_i \psi_i \] for some real number \(\lambda_i\).
Then if we have a particle in the state \[ \psi = \sum_{i} \alpha_i \psi_i \] We can carry out a measurement of the observable corresponding to \(H\).
The result of the measurement must be one of the values \(\lambda_i\), and a result of \(\lambda_i\) occurs with probability \(|\alpha_i|^2\). (So we don't know just what the result will be, just the probabilities of the various possible results.) Furthermore, immediately after the measurement, if the measured value is \(\lambda_K\), then the state of the system is \(\psi_K\). (This is called the collapse of the wave function.)
Quantum mechanics does not tell us which outcome will occur, it only tells us the probabilities of the various outcomes.
So for example, if the state of the system were \(\psi = 0.6 \psi_1 + 0.8 \psi_2\), then the result of measuring \(H\) would be \(\lambda_1\) \(0.36\) of the time (and immediately after these measurements the system would be in state \(\psi_1\)), and it would be \(\lambda_2\) the remaining \(0.64\) of the time (and immediately after those measurements the system would be in state \(\psi_2\)).
On the other hand, if the state of the system were \(\psi=\psi_1\), then the result of measuring \(H\) would always be \(\lambda_1\), and the state of the system would be unchanged by the measurement. (This, of course, contradicts the claim that it is impossible to make a measurement without disturbing the system.)
So we can see that (given this model of particle states and measurements), the average and spread of measurements is determined by the coefficients \(\alpha_i\); if most of the weight is associated with a particular basis vector, then the results have a very small spread, and if many basis vectors make a significant contribution, then the results are more spread out. |

Uncertainty

So now we can try to see what this might suggest about measurements of position and momentum. Let's call the operator corresponding to measurement of \(x\)-position \(X\), with corresponding basis \(\{\psi_i\}\) and the operator for the momentum measurement, \(P\), with corresponding basis \(\{\phi_i\}\).
The crux of the situation is that the \(\psi_i\) cannot be eigenvectors of \(P\), and the \(\phi_i\) cannot be eigenvectors of \(X\).
So if a particle is prepared in a state which guarantees a particular value of \(x\), so \(\psi = \psi_i\), then it must be a combination of \(\phi_i\); likewise, if \(\psi=\phi_i\), it must be a combination of \(\psi_i\). So certainty of position means that momentum is uncertain, and vice versa.

What's going on here?

The problem is that the operators \(P\) and \(X\) do not commute: if \(\psi\) is a state vector, then \(PX(\psi) \neq XP(\psi)\). The mathematics of Hermitian operators tells us that it is possible to find a basis of vectors which are eigenvectors for a pair of \(H_1\) and \(H_2\) if and only if the two commute, i.e. \[ H_1H_2 (\psi) = H_2H_1(\psi) \] for all vectors \(psi\).
We can define the operator \([H_1,H_2]=H_1H_2 - H_2H_1\), called the commutator of \(H_1\) and \(H_2\), and then we can equivalently say that \(H_1\) and \(H_2\) commute if \[ [H_1,H_2] = 0 \] (meaning that \([H_1H_2](\psi =0) \) for all \(\psi\).
It then follows that since \(P\) and \(X\) do not commute, one cannot have simultaneously sharply defined values for both the \(x\)-position and the \(x\)-component of momentum for a particle.
On the other hand, if you find two observable which do commute, then it is possible to have sharply defined values for both; indeed, either both will be sharply defined or neither will!

Heisenberg's uncertainty relation

Heisenberg's uncertainty relation (as we now understand it) tells us how the extent to which two operators fail to commute tells the extent to which values of the corresponding observable cannot be simultaneously well-defined. More precisely, given a state \(\psi\), and observables \(A\), and \(B\) the product of the standard deviations of measurements of \(A\) and \(B\) must be at least half the length of the vector \([H_1,H_2](\psi)\).
In the case of \(x\)-coordinate of position and \(x\)-component of momentum, the length of \([X,P](\psi)\) is always \(h/2\pi\), given the well-known uncertainty relation.

Just a minute \(\ldots\)

We've actually shifted ground in a fairly radical way here. We've gone from 'you can't measure a particle's position and momentum simultaneously' which comes from a world-view in which it has a position and momentum, but we can't access the values', to 'a particle's state does not simultaneously specify a sharp position and momentum' which comes from a world-view where the state of a particle just isn't the same information as it is in classical physics.
This, together with the fact that the theory is probabilistic rather than deterministic has been a significant issue in the development of quantum mechanics. The desire to maintain a 'realist', deterministic picture of physics, in which particles do have well-defined position and momentum even if we can't know them survives, and even now has its proponents.
And all this doesn't actually explain the uncertainty principle. It just provides a mathematical/theoretical framework which it is a consequence of. But that's OK, because that's generally how physics proceeds: we have phenomena that are hard to understand, and we develop a structure in which those phenomena are natural. Once we've become thoroughly accustomed to that structure, we tell ourselves we understand what's going on. At least, until the next Big Idea comes along and we start all over again.

Further reading

If this has whetted your appetite for more detail, there are, of course, many places to go to. This is a short list of 'not a standard undergraduate textbook' books which I think are interesting.
  • Susskind and Friedman Quantum mechanics: the theoretical mininum gives a careful presentation that doesn't require a huge amount of mathematical backjgound. The first five chapters fill in most of what I've glossed over above, and then it goes on to look at other interesting stuff such as entanglement. It also explains how the quantum state evolves in time, which I haven't mentioned at all.
  • Penrose's The Emperor's New Mind (and his other more recent tomes) give wonderful insights into quantum mechanics (inter alia), including some speculative ideas about wave function collapse.
  • Bohm and Hiley's The undivided universe is an exposition of pilot wave theory, which provides a deterministic and realist model of quantum mechanics. I haven't invested the time and effort to be able to do more than say: this looks like a good place to start if you want to know more.
  • Feynman's lecture notes are, of course, worth looking at once you have some idea what's going on.
You may have noticed that I haven't listed any resources on the philosophy of quantum mechanics. This is because I know my limitations.

Friday, 17 November 2017

Russian peasant cryptography

What is the relationship between Russian peasant multiplication and modern cryptography? Unlike the multiplication procedures commonly taught, it can easily be adapted to compute powers rather than products; and we need to calculate powers for some cryptosystems.
I should probably admit right away that this is not going to be at all informative about how Russian peasants managed keep their communications secret from their aristocratic overlords.

Russian Peasant Multiplication

If you haven't seen this before, it's a rather neat approach to multiplication which doesn't require the memorization of multiplication tables, but only the ability to double or half integers. It seems to be rather unclear just what it has do do with Russian peasants: it's certainly less than Danish pastries have to do with Denmark. The method was known to the ancient Egyptians, who considerably predate Russian peasants.
The general description is simple, but not very informative. Nevertheless, I will start by presenting it.
If you want to multiply together two integers, \(m\) and \(n\), form two columns, one headed by \(m\) and the other by \(n\). Then repeatedly add a new row obtained by halving \(m\) (discarding any remainder) and doubling \(n\), stopping when you end up with a \(1\) in the \(m\) column. Finally, add up all the entries in the \(n\) column where the \(m\) value is odd.
It's much more informative to see an example. So let's work out \(13 \times 7\). This gives us the table \[ \begin{array}{|c|c|} \hline m & n \\ \hline 13 & 7\\ 6 & 14\\ 3 & 28\\ 1 & 56\\ \hline \end{array} \] after which we add up the right hand entries when the left hand entry is odd, giving \[ 7+28+56 = 91 \] which we can quickly check is \(13 \times 7\).
The interesting question is, of course, why does this mysterious looking procedure work?
If we look at the left hand column, and work our way up, recording a \(1\) when we see an odd number, and a \(0\) when we see an even one, we obtain \(1101\), which is (by no coincidence whatever) the binary representation of \(13\). In fact, this is exactly the standard algorithm for converting numbers to their binary representation.
Then we can see that the repeated doubling of the \(m\) column is calculating the product of \(m\) with the corresponding power of \(2\), so that adding those corresponding to an odd value in the \(n\) column is adding up \(m\) multiplied by those powers of \(2\) which sum to \(m\).
So by using this we can multiply any integers together, and what's more the algorithm is reasonably efficient. I admit that it's not as efficient as the column or grid methods usually taught now, but at least you don't need to know your times tables to do it.
This is rather nice. At least, I like it. But rather remarkably, it can be adapted slightly to work out something apparently much more complicated, and very useful in modern (public key) cryptography.

Russian peasant exponentiation

We make two changes to the above algorithm above, both only involving the \(m\) column.
  • Instead of doubling, we square
  • Instead of adding at the end, we multiply.

Then in each row of the table, instead of having \(m\) multiplied by the corresponding power of \(2\), we have it raised to the corresponding power of \(2\). Multiplying all these together then gives us \(n^m\).
Let's see how it works with the previous \(m\) and \(n\). \[ \begin{array}{|c|c|} \hline m & n \\ \hline 13 & 7\\ 6 & 49\\ 3 & 2401\\ 1 & 5764801\\ \hline \end{array} \] and multiplying the relevant terms together gives \[ 5764801 \times 2401 \times 7 = 96889010407 \] which is, indeed, \(7^{13}\).
But unlike the multiplication, this isn't just reasonably efficient: this is remarkably efficient.
If we want to find \(n\) to the power of \(m\), we could do it by multiplying together \(m\) lots of \(n\). This is like finding \(m \times n\) by adding together \(m\) lots of \(n\): you can do it, but if \(m\) is bigger than about \(3\) you really have better things to do with your time.
But with this algorithm, the number of squarings that you have to do is at most about \(3\) times the number of digits of \(n\) (since the binary representation of a decimal number has about \(3\) times as many bits as the decimal one has digits), followed by that many multiplications.
Of course, we need to be able to multiply to do this, but that's OK: the multiplication algorithm lets us multiply by doubling and adding, and now the adapted version lets us exponentiate by squaring and multiplying. It's still a good deal.

A quick outline of RSA

The point of this is that in the RSA (Rivest-Shamir-Adleman) approach to public key cryptography (and also in some others), we need to compute powers in order to encrypt and decrypt messages.
The way that the procedure works is that you choose a private pair of primes, \(p\) and \(q\), and tell the world the value of \(N=pq\). (You keep \(p\) and \(q\) secret: the algorithm relies on the fact that factorizing \(N\) is hard.)
Then you pick a number \(e\) and find \(d\) with the property that \(ed\) is \(1\) more than a multiple of \((p-1)(q-1)\). You tell the whole world \(e\). (You keep \(d\) secret. The algorithm relies on the fact that finding \(d\) without knowing \(p\) and \(q\) is hard.)
Then you represent your message as an integer \(M \lt N\). The encrypted form of your message, \(E\) is the remainder when you divide \(M^e\) by \(N\).
Then by the magic of number theory (actually, it isn't magic, it's Fermat's Little Theorem) the original message is the remainder when you divide \(E^d\) by \(N\).
In principle, this is all there is to it, and there are many descriptions available, often illustrated using values of \(N\) which are small enough to make all the arithmetic tractable.
But there's an issue.
In practice, the values of \(M\) and at least one of \(e\) or \(d\) is very large. If you had to do repeated multiplication, the universe would die of old age before you were finished. This algorithm reduces the amount of work to manageable proportions. \(e\) (or \(d\)) multiplications are reduced to a number of multiplications that is a small multiple of the number of digits of \(e\) (or \(d\)).
That wasn't the only issue.
The value of \(M^N\) is almost inconceivably large. You couldn't write it down if you used every elementary particle in the known universe to hold one digit. So how does that work?
The answer is more number theory. One of the really nice properties of working out remainders is that it doesn't matter where in a calculation you do it. If you want to know the remainder of some quantity when you divide by \(N\), then you can calculate the whole thing then divide by \(N\); or, rather more sensibly, you can work it out in stages and take the remainder whenever the working value exceeds \(N\). You are guaranteed to obtain the same result in the end.

Russian Peasant RSA

So now we can see how to compute an encrypted message. We can use the adapted Russian peasant multiplication, but now we just add the final detail that whenever the squared number exceeds \(N\), we take the remainder on division by \(N\).
Let's see this work with finding the remainder on dividing \(7^{13}\) by \(59\).
\[ \begin{array}{|c|c|} \hline m & n \\ \hline 13 & 7\\ 6 & 49\\ 3 & 2401 \rightarrow 41\\ 1 & 1681 \rightarrow 29\\ \hline \end{array} \] giving \[ 29 \times 41 \times 7 = 8323 \rightarrow 4 \] which is (phew!) the remainder on dividing \(96889010407\) by \(59\).
But here's the punchline. This algorithm for calculating the exponential is what is usually referred to as exponentiation by (repeated) squaring. And this (or some variation on it) is in fact the algorithm used in implementations of public key cryptography (such as RSA) which rely on computing exponentials.
So there we have it. What is essentially an algorithm for multiplication which can be used by people who don't know their times tables turns into an algorithm for exponentiation which underlies contemporary cryptographic methods.

Acknowledgements

At the 2017 MathsJam Gathering, Elaine Smith and Lynda Goldenberg gave a talk about techniques for multiplication including the Russian peasant method. That was when the penny dropped for me that the repeated squaring algorithm was an adaptation of the Russian peasant multiplication, and I decide it was worth sharing the realisation. About twenty minutes lates, Oliver Masters gave a talk about the Fibonacci matrix where he mentions the repeated squaring algorithm for matrix exponentiation. Fortunately he didn't relate it to the multiplication algorithm. About five minutes after that, Colin Wright (@ColinTheMathmo) was wrapping up the talks and he mentioned the relationship. But by that time I had the bit between my teeth and had decided to do this anyway, so I have.
If you enjoyed this, then you'll probably enjoy @ColinTheMathmo's series of articles about Perrin numbers.