Sunday, 25 March 2018

Cantor and infinity in a countable universe.

Cantor's theorem states that the cardinality of a set is always strictly less than the cardinality of its power set. But there is a universe for set theory which is countable. What's going on here?

There's a really nice PBS Infinite Series video on YouTube, How Big are all Infinities Combined, exploring the sizes of infinite sets. (If you haven't watched it, this would be a great time. I'll wait.) In this video they make heavy use of Cantor's theorem, which says

Given any set \(S\), its power set \(\mathcal{P}(S)\) is bigger than \(S\)
where \(\mathcal{P}(S)\) is the set of all subsets of \(S\).

The amazing thing about this is that it is true even when \(S\) is infinite, though we have to work a little to say what we mean by bigger than here: what we mean is that there is no injective function \(\mathcal{P}(S) \to S\), though there is an obvious injective function \(i:S \to \mathcal{P}(S)\), given by \(i(x)=\{x\}\) for each \(x \in S\).
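In fact the proof is short and sweet: given any function \(f:S \to \mathcal{P}(S)\), the 'diagonal' set \(D = \{x \in S : x \notin f(x)\}\) can't be \(f(x)\) for any \(x\) (if it were, then \(x \in D\) exactly when \(x \notin D\)), so no function \(S \to \mathcal{P}(S)\) is surjective. For a tiny finite set we can even verify this exhaustively; here's a little Python sketch (the names are mine):

```python
from itertools import chain, combinations, product

def power_set(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

S = [0, 1, 2]
subsets = power_set(S)

# Enumerate every function f: S -> P(S) and check that the diagonal
# set D = {x : x not in f(x)} is never in the image of f.
for images in product(subsets, repeat=len(S)):
    f = dict(zip(S, images))
    D = frozenset(x for x in S if x not in f[x])
    assert D not in f.values()

print("checked all", len(subsets) ** len(S), "functions: none is surjective")
```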

It's a non-trivial fact (the Cantor-Schröder-Bernstein theorem) that given two sets \(A\) and \(B\), if there is an injection from \(A\) to \(B\) and an injection from \(B\) to \(A\), then there is a bijection from \(A\) to \(B\); so it makes sense to say that \(A\) is smaller than (or equal to) \(B\) in size if there is an injection from \(A\) to \(B\), but none from \(B\) to \(A\).

This immediately tells us that there is a whole tower of infinite sets that we can make; starting with the natural numbers \(\mathbb{N}\), then we can think about \(\mathcal{P}(\mathbb{N})\), \(\mathcal{P}(\mathcal{P}(\mathbb{N}))\), and so on, each of them strictly larger than its predecessor.

It's not very hard to prove that the second of these, \(\mathcal{P}(\mathbb{N})\), is the same size as \(\mathbb{R}\). It is an interesting question whether there is any infinite size between that of \(\mathbb{N}\) and \(\mathbb{R}\), but one which I won't consider here.

In the video (you have watched it now, right?) the question is how big all the infinities combined is.

But there's even an issue with what it means for one infinite set to be bigger than another, and that's what I want to explore here.

The problem is, to a large extent, one of knowing just what sets and functions are.

It's easy enough to decide what a function is if you know what a set is.

If \(A\) and \(B\) are sets, we think of a function \(A \to B\) informally as a rule that assigns to each element of \(A\) an element of \(B\). We can formalise this by saying that a function \(A \to B\) is a subset of \(A \times B\) (the set of all ordered pairs \((a,b)\) where \(a \in A\) and \(b \in B\)) which has the property that each element of \(A\) appears in exactly one of the ordered pairs.
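To see that definition in action, here's a minimal Python sketch (with sets modelled by Python sets, and names of my own choosing) that checks whether a given set of ordered pairs really is a function from \(A\) to \(B\):

```python
def is_function(pairs, A, B):
    """Test the set-theoretic definition: pairs must be a subset of
    A x B in which each element of A occurs as a first coordinate
    exactly once."""
    in_product = all(a in A and b in B for (a, b) in pairs)
    firsts = [a for (a, _) in pairs]
    return in_product and all(firsts.count(a) == 1 for a in A)

A, B = {1, 2, 3}, {'x', 'y'}
print(is_function({(1, 'x'), (2, 'y'), (3, 'x')}, A, B))  # True
print(is_function({(1, 'x'), (2, 'y')}, A, B))            # False: 3 has no image
print(is_function({(1, 'x'), (1, 'y'), (2, 'x'), (3, 'x')}, A, B))  # False: 1 has two
```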

But what's a set?

It would be really nice if we could just say that a set was a (somehow) well-defined collection of objects. But trying to do without some kind of restriction on how we define the sets tends to lead to real problems, the most famous of which is Russell's paradox:

Let \(X\) be the set of all sets which are not elements of themselves. Then if \(X \in X\), by the definition of \(X\), \(X \not \in X\). But conversely, if \(X \not \in X\) then again by the definition of \(X\), \(X \in X\).
This is a genuine paradox in the world of naive set theory, so we need to be a bit more careful about what counts as a description of the elements comprising a set if we want to avoid problems like this.

The (ok, a) solution is to develop axiomatic set theory, and one particularly popular version of that is ZFC (which stands for Zermelo-Fraenkel plus the Axiom of Choice). I won't explain just how ZFC works here, but in essence it is a set of rules that tell us how we can build new sets out of existing ones, and a couple of particular sets that we insist on (so that we can build others out of them). One of the sets we insist on is the set of von Neumann natural numbers, so \(\mathbb{N}\) is definitely in there.
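For the curious, the von Neumann construction builds the natural numbers out of nothing but sets: \(0\) is the empty set, and \(n+1 = n \cup \{n\}\), so each natural number is precisely the set of its predecessors. Here's a quick Python sketch of the idea, with frozensets standing in for sets:

```python
def von_neumann(n):
    """The von Neumann natural n: 0 is {} and n + 1 is n | {n}."""
    current = frozenset()
    for _ in range(n):
        current = current | frozenset([current])
    return current

# Each von Neumann natural n has exactly n elements: its predecessors.
for n in range(5):
    print(n, len(von_neumann(n)))
# e.g. 2 = {0, 1}, that is { {}, {{}} }
```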

There's always the question of whether our axioms are consistent. Most mathematicians are happy that the ZFC axioms are in fact consistent, partly because we can describe a collection of objects (which we call sets) that seem to satisfy all the axioms. The study of particular collections of objects and operations on them which satisfy particular sets of axioms is called model theory, and it throws up some surprising results.

We'll call a collection of sets that satisfy the axioms of ZFC a universe.

There is a fundamental theorem of model theory, the downward Löwenheim-Skolem theorem, which tells us that if ZFC has a model at all, then it has a countable one: there is a countable universe (yes, the entire universe is countable) that satisfies the axioms of ZFC.

But there's clearly something fishy about this. Cantor's theorem also follows from these axioms of set theory, and that tells us that the power set of \(\mathbb{N}\) is uncountable.

How does an uncountable set fit into a countable universe? This is Skolem's paradox.

And this is where the issue of just what is a set rears up in all its glory and horror.

Let's review the situation.

The entire universe is countable, so in this model of set theory there are only countably many subsets of \(\mathbb{N}\). (This model clearly fails to include most of the subsets that we might think ought to be there.) Since the universe is countable, so is its \(\mathcal{P}(\mathbb{N})\), and hence there is a bijection between \(\mathbb{N}\) and \(\mathcal{P}(\mathbb{N})\).

But Cantor has proved that there can be no such bijection.

But remember what a function \(A \to B\) is: it is a subset of \(A \times B\) satisfying a certain constraint, namely that every element of \(A\) occurs in exactly one of the ordered pairs.

We know that the \(\mathcal{P}(\mathbb{N})\) in this universe is countable, so there is a collection of ordered pairs \((n,X)\), one for each natural number \(n\) and such that every \(X \subseteq \mathbb{N}\) in this universe appears in the collection.

The escape clause is the one that tells us that this collection is not—cannot be—one of the sets in the universe.

Looking at the universe from outside, we see a function—what we might call an external function—giving a bijection between \(\mathbb{N}\) and this universe's model of \(\mathcal{P}(\mathbb{N})\).

But if we live inside the universe, we can't see this: it doesn't exist in the universe. There is no internal function which matches the natural numbers with the sets of natural numbers.

You may well have an objection to raise at this point: the real numbers are a complete ordered field, and it's also a theorem that this defines them up to isomorphism. And we know that the real numbers (at least in the mathematical universe we're accustomed to) really are uncountable. So how do they fit into this countable universe?

The answer to this is unavoidably a little technical.

The completeness axiom for the real numbers says something about all subsets of \(\mathbb{R}\); this is a second order axiom. The axioms of ZFC aren't powerful enough for this type of operation. They are first order, which (roughly speaking) means that they let you say something about all elements of a set, but not all subsets of it.

If you use the first order version of the axioms for a complete ordered field, there is more than one structure satisfying the axioms. Some are countable (as must be the case in the countable universe of ZFC guaranteed by the Löwenheim-Skolem Theorem), and some are larger (indeed there are models of all infinite cardinalities).

So all the above is true if we use the power of first order axioms to specify what a set is. If we allow second order axioms, so that we can use the notion of 'all subsets', then the game changes. Then why don't we use some kind of second order theory, and avoid this kind of nonsense?

It's very tempting to do this. Unfortunately, there are problems with second order theories too. Though it solves some of the problems, for example tying down the real numbers much more specifically, the notion of proof in a second order theory is not as well behaved as in first order theory. Consequently, logicians who work on the foundations of mathematics explore the strengths and weaknesses of first and second order theories.

And what about the normal 'working mathematicians'? Most aren't particularly concerned with these foundational issues: and it's quite possible to go through a degree, a PhD, and a research career in mathematics without meeting or having any reason to care about them. In any given context, most mathematicians simply use first or second order axioms according to which are more convenient.

Acknowledgement

This post was sparked off by the Infinite Series video How Big are all Infinities Combined. Gabe Perez-Giz (@fizziksgabe) also bears some responsibility, since when I made a brief comment on twitter about this he asked if I had a blog post about it. Well, now I do. Thanks also to James Propp (@JimPropp) for some helpful suggestions on presentation.

Sunday, 18 March 2018

How old is the Universe?

The Universe is about fifteen billion years old, and started off with a Big Bang, as any fule kno. Well, kind of, but that's more than a slight over-simplification.

There are two fairly obvious questions: first, how do we know? and second, what does that actually mean? Advance warning: there will be mathematics. But if you've got as far as knowing what a differential equation is you should have all you need to cope.

Modelling the Universe

The Mathematics Contribution

Unfortunately, there's no apparent way of directly observing the age of the universe. Any measurement of its age is going to be pretty indirect, and rely quite heavily on having some kind of theory or model of the universe.

First, we'll find a mathematical framework for the model, then incorporate physics into it later.

To start off with, here are some fairly plausible assumptions about the nature of the universe.

  1. The universe is homogeneous
  2. The universe is isotropic
The first assumption is that the universe looks (at any given time) much the same no matter where you are; the second is that it is much the same in every direction. We know that the universe looks (pretty much) isotropic from where we are; assuming that we are nowhere special and the universe is much the same at every point implies that it is also isotropic everywhere.

From these two, we can deduce that (in terms of some very well-chosen coordinates) the geometry of the universe is given by the metric \[ ds^2 = c^2 dt^2 -a(t)^2\left(\frac{dr^2}{1-kr^2} + r^2( d\theta^2+\sin^2(\theta)d\phi^2) \right) \] where \(k\) is either \(-1\), \(0\) or \(1\), and \(a(t)\) is some as yet undetermined function called the scale factor.

The metric tells us about the nature of the separation between two nearby points. There's a lot packed up in that, so it's worth taking a closer look.

Consider a couple of nearby points in the universe, where nearby means that the differences in their coordinates are small, and we call those differences \(dt, dr, d\theta\) and \(d\phi\). There are a couple of interesting cases.

If the points have the same \(r, \theta, \phi\) coordinates, then \(ds^2=c^2 dt^2\), and \(dt\) is just how much time passes between the two.

On the other hand, if the points are at the same time, then \[ -ds^2 = a(t)^2\left(\frac{dr^2}{1-kr^2} + r^2( d\theta^2+\sin^2(\theta)d\phi^2) \right) \] is the square of the distance between them. We can see how we can regard this distance as the product of \[ \sqrt{\frac{dr^2}{1-kr^2} + r^2( d\theta^2+\sin^2(\theta)d\phi^2)} \] which we call the coordinate distance and \(a(t)\) which we call the scale factor.

The different values of \(k\) tell us about the spatial geometry of the universe.

  1. If \(k=-1\) the universe has negative spatial curvature. This means that if we consider a sphere of radius \(R\), the volume of the enclosed space grows faster than \(4\pi R^3/3\).
  2. If \(k=0\) we have flat space. Then the volume enclosed by a sphere of radius \(R\) is the familiar \(4 \pi R^3/3\).
  3. If \(k=1\) then the universe has positive spatial curvature. In this case the volume enclosed in a sphere of radius \(R\) is less than \(4 \pi R^3/3\).
There is nothing to choose between these possibilities from the point of view of the mathematics. Observationally though, as far as we can tell the universe is spatially flat, i.e. \(k=0\).

It's also worth making a note that although the coordinates make it look as if there's a centre of the universe at \(r=0\), this isn't true. This \((r,\theta,\phi)\) system of coordinates is just the familiar spherical polar coordinates: we can equally well choose any point to be at \(r=0\).

So our two assumptions mean that the universe can be described in terms of just one dynamical quantity, called the scale factor. This function \(a(t)\) describes the entire history of the universe. But the mathematics doesn't tell us anything about it. For that, we require some physical input.

The Physics Contribution

In order to figure out how \(a(t)\) behaves, we need to know what kind of material the universe is full of. Then the Einstein Field Equations tell us the relationship between what the universe is filled with and its geometry (as expressed by the metric), and can be used to find the metric.

So we need a good candidate for the stuff that fills the universe.

One plausible approximation is to think of the universe as full of a fluid, whose constituent 'atoms' are clusters of galaxies. (Obviously we are working on a very large scale here for this approximation to be sensible.) It is also reasonable to take this fluid to have negligible viscosity, so we model the content of the universe as an ideal fluid.

Given this, the state of the fluid is given by two quantities, the pressure \(p\) and density \(\rho\). Because of the homogeneity and isotropy assumptions, these only depend on \(t\). With quite a lot of calculation, one eventually finds a pair of coupled ordinary differential equations for \(a\), \(\rho\) and \(p\): \[ \begin{split} \left( \frac{\dot{a}}{a} \right)^2 &= \frac{8\pi G}{3}\rho +\frac{\Lambda c^2}{3} - \frac{kc^2}{a^2}\\ \left(\frac{\ddot{a}}{a}\right) &= -\frac{4 \pi G}{3}\left( \rho + \frac{3p}{c^2} \right) + \frac{\Lambda c^2}{3} \end{split} \] In these equations, \(G\) is the gravitational constant, and \(\Lambda\) is the cosmological constant, which appears naturally in the Einstein Field Equations. For a long time it was believed to be \(0\), but recent measurements suggest it is a very small positive quantity.

OK, so this gives us two equations: but there are three quantities to be dealt with, \(a\), \(p\) and \(\rho\). We're missing some information.

To fill the gap, we need to know a bit more than just that the universe is full of an ideal fluid. We need to know what kind of ideal fluid it is. And one way of saying that is by giving a relationship between \(p\) and \(\rho\), which relationship we call the equation of state. Equipped with this we have, at least in principle, all the information required to specify the history of the universe in terms of \(a\), \(p\) and \(\rho\) as functions of \(t\).

A Brief Digression on the Hubble Parameter

In passing, it's worth noting that the quantity \(\dot{a}/a\) which appears in the first equation above is our old friend the Hubble parameter, usually denoted \(H(t)\). (I don't like calling it the Hubble constant, since it's a function of time.) At any time \(t\), the Hubble parameter \(H(t)\) tells us the relationship between the distance between two objects at fixed spatial coordinates (i.e. fixed \((r,\theta,\phi)\)) and the rate at which that distance is changing.

You might briefly wonder how things at fixed position can be getting further apart: but remember, the distance between things is the product of the scale factor \(a(t)\) and the coordinate distance, so even though the coordinate distance is fixed, this physical distance between them changes with \(a(t)\).

The Age of the Universe

Now that we have a model of the universe, we can at least try to work out how old the universe is according to that model. And then we can try to deduce from that something about the actual universe.

There's a useful bit of notation worth introducing here. I decide to call the current time \(t_0\), and then just put a subscript \(0\) on any quantity to indicate that it is the value at \(t_0\): so \(a_0 = a(t_0)\), \(H_0 = H(t_0)\) etc.

We should note that the units of \(H\) are those of inverse time, so on dimensional grounds one might hope that the age of the universe is given by (some multiple of) \(1/H_0\). (Thanks to Leo Stein, @duetosymmetry, for pointing this out.) We'll see how this hope plays out for one simple case just below.

A Simple Dust Filled Universe

As an illustration of how we can extract an age of the universe from this mathematical model, we'll consider the simplest (non-trivial!) model.

First we take \(k=0\). There is no observational evidence that this is wrong, and it makes the equations simpler.

Next, we take \(\Lambda =0\). This isn't quite right, but we're going to be pushing the evolution backwards, which will mean that \(\rho\) will be increasing, so the contribution from \(\Lambda\) will become less important rather than more important as we push further. With luck this won't be too much of a problem.

Finally, we take \(p=0\), i.e. we assume that the universe is full of a fluid that does not resist compression. Since there is a lot of space between galaxies, and they aren't moving very fast, this seems plausible too. In this context such a fluid is called dust.

Now, with all these simplifications, our differential equations reduce to \[ \left( \frac{\dot{a}}{a} \right)^2 = \frac{8\pi G}{3}\rho, \qquad \left(\frac{\ddot{a}}{a}\right) = -\frac{4 \pi G}{3} \rho \] so we immediately get \[ \left( \frac{\dot{a}}{a} \right)^2 = -2 \left(\frac{\ddot{a}}{a}\right) \] or \[ \dot{a}^2 = -2a \ddot{a}. \]

Faced with a differential equation like this - nonlinear, and not one I've seen a special method to solve - I resort to guessing. It looks as if taking \(a(t)\) to be some power of \(t\) might work, since the powers on each side of that equation would match, so I guess that \(a(t)\) is proportional to \(t^\alpha\), and try to find a value of \(\alpha\) that works.

Plugging this into the equation gives \(\alpha^2=-2\alpha(\alpha-1)\), or \(\alpha(3\alpha-2)=0\). The \(\alpha=0\) solution is not interesting: it would mean that \(a(t)\) was constant, so the \(\alpha = 2/3\) solution is the relevant one.

This gives us \[ a(t) = a_0\left( \frac{t}{t_0} \right)^{\frac{2}{3}} \]

Now we're getting somewhere. We can see that this is fine for any value of \(t>0\), but when \(t=0\) it all goes to pieces. \(a(0)=0\), and if you look at the first equation you'll see that the expression for \(\rho\) involves a division by \(0\), which is never a good sign.

But let's gloss over that for the moment. What I'd like to know is the value of \(t_0\), because that's how long the universe (according to this model) has been around.

We can do something a little bit cunning here, and compute \(H(t)=\dot{a}/a\). This gives us \[ H(t) = \frac{2}{3t} \] or, more to the point, \[ t_0 = \frac{2}{3H_0} \] and \(H_0\) is something that we can try to find from observation. Admittedly, the measurements are quite tricky, but at least it's some observational data that tells us the age of the universe.
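Just to put a number on it: taking the illustrative ballpark value \(H_0 \approx 70\) km/s/Mpc (measured values are in this general neighbourhood, but don't read it as the precise figure), a few lines of Python do the unit conversion for us:

```python
# Age of a k=0, Lambda=0, dust-filled universe: t0 = 2 / (3 * H0).
# H0 = 70 (km/s)/Mpc is an illustrative ballpark value, not a measurement.
KM_PER_MPC = 3.0857e19       # kilometres in one megaparsec
SECONDS_PER_GYR = 3.156e16   # seconds in a billion years

H0 = 70.0 / KM_PER_MPC       # convert (km/s)/Mpc to 1/s
t0 = 2.0 / (3.0 * H0)        # seconds
print(f"t0 = {t0 / SECONDS_PER_GYR:.1f} billion years")  # about 9.3
```

So with this particular value, the dust model gives an age of a little over nine billion years. Does that settle the matter?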

Well, no.

That doesn't tell us the age of the universe. What it tells us is that if we push the model back in time, the very longest we can push it is this \(t_0\). It doesn't tell us that the universe sprang into existence that long ago and then evolved according to these equations subsequently.

In fact, we can even say that this value can't be quite right.

The Age of the Universe?

If we push this back then eventually we are in a regime where the density is very large. At this point, it is almost certainly a bad approximation to assume that \(p=0\), and we need to use a more accurate model that takes better care of how pressure and density are related in some kind of hot plasma.

Well, of course, we have reasonable models for this. We plug in the appropriate equation of state, and work out how old the universe is given that for a sufficiently early 'now' that equation of state is relevant.

But then the problem recurses. For sufficiently early times, the conditions become so extreme that we really don't know how to model the matter.

And of course, in all the above I was working with various simplifications. We need to work with rather more complicated evolution equations to do something more realistic.

And when we do all that, we find that the 'age of the universe' is a little under fourteen billion years.

But this age is not really the age of the universe. It is how long it has been since the start of the time when we have some physical understanding of how the stuff filling the universe behaves.

Before that? Since we don't know how the stuff behaves, and it is very likely that classical physics will stop being the relevant model in any case, it's hard to push earlier.

There are lots of speculative theories, some more fun and some more plausible than others. But we certainly can't deduce from what we know that the universe itself has only existed for this fourteen billion years, only that our ability to describe it fails when we try to get further back than that.

The Age of the Universe As We Know It

This is the bottom line. When we talk about the age of the universe in cosmology, we aren't really talking about the age of the universe as such. We're talking about the length of time for which we have some understanding of what is going on in the universe, i.e. the time since which the content of the universe was something we have reasonable models for. The Big Bang is an idealization of the time at which the universe became accessible to our theories and models.

Before that, it may have spent an undetermined, or even infinite, length of time full of some kind of very high energy quantum field which underwent a phase transition to the hot dense plasma that looks to us like a Big Bang beginning to the universe. Or maybe something even weirder, such as a state in which the notion of time itself doesn't make sense, so that the 'age of the universe' is the length of time for which time has been a useful notion. There are various notions out there of what might have preceded the Big Bang, but by the very nature of things, they tend to be relatively unconstrained by observations.

Addendum

Nalini Joshi (@monsoon0) has pointed out that \[ \dot{a}^2 = -2a \ddot{a} \] can be solved without resorting to guessing the right answer, by using separation of variables (twice). Dividing both sides by \(a \dot{a}\) gives \[ \frac{\dot{a}}{a} = -2 \frac{\ddot{a}}{\dot{a}} \] which can be integrated immediately to give \[ \ln(a) = -2 \ln(\dot{a}) + C_0 \] so that \(a = C_1\dot{a}^{-2}\).

This rearranges to \[ \dot{a}^2 = \frac{C_1}{a} \] and so \[ \dot{a}=\frac{C_2}{\sqrt{a}} \] which can then be separated to give \[ \sqrt{a}da = C_2 dt \] Integrating this (and choosing the constant of integration so that \(a(0)=0\)) yields the solution given above.
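And if you'd rather let a computer do the checking, here's a quick numerical sanity check (a sketch using scipy): integrate \(\dot{a}^2 = -2a \ddot{a}\), rewritten as \(\ddot{a} = -\dot{a}^2/(2a)\), forward from initial data taken off the curve \(a = t^{2/3}\), and watch it stay on that curve.

```python
import numpy as np
from scipy.integrate import solve_ivp

# a'' = -(a')^2 / (2a), started on the curve a = t^(2/3) at t = 1.
def rhs(t, y):
    a, adot = y
    return [adot, -adot**2 / (2.0 * a)]

sol = solve_ivp(rhs, (1.0, 100.0), [1.0, 2.0 / 3.0],
                rtol=1e-10, atol=1e-12, dense_output=True)

t = np.linspace(1.0, 100.0, 5)
print(np.max(np.abs(sol.sol(t)[0] - t ** (2.0 / 3.0))))  # tiny: stays on t^(2/3)
```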

Saturday, 3 March 2018

A Regiment of Monstrous Functions

How badly can a function \(f:\mathbb{R} \to \mathbb{R}\) behave? In fact, remarkably badly. Today, I will mostly be posting a list of functions which (to my taste) exhibit various forms of bizarre behaviour.

Not continuous at a point

The simplest type of pathological function is probably a function which is well-behaved, except for having a jump discontinuity. An example of such a function is given by \[ f(x) = \left\{ \begin{array}{ccc} x &\text{if} & x \le 0 \\ x+1 & \text{if} & x\gt 0 \end{array} \right. \] What makes this function misbehave is essentially that it has two different definitions, applicable on different parts of the domain, and which somehow disagree where the different parts meet. This suggests a strategy for constructing 'monsters' by appropriate choices of how to split up the domain and how to define the function on the different parts.
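This one is easy to play with directly; a tiny Python sketch probing the jump from both sides:

```python
def f(x):
    """x for x <= 0, and x + 1 for x > 0: a jump discontinuity at 0."""
    return x if x <= 0 else x + 1

for h in [0.1, 0.001, 1e-6]:
    print(f(-h), f(h))   # left-hand values -> 0, right-hand values -> 1
```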

Nowhere continuous

We can take this notion of making a function discontinuous at one point and push it to the limit by splitting up the domain into two sets, each of which is dense. One familiar way of making such a split is to consider the rational numbers (\(\mathbb{Q}\)) and the irrational numbers (\(\mathbb{R} \setminus \mathbb{Q}\)). Then if \(x \in \mathbb{R}\), no matter how small a number \(\varepsilon \gt 0\) is chosen, the interval \((x-\varepsilon,x+\varepsilon)\) contains both rational and irrational numbers.

So let's now look at the function \[ f(x) = \left\{ \begin{array}{rcc} 0 &\text{if} & x \in \mathbb{Q} \\ 1 & \text{if} & x \in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] If we choose some real number \(x\), then if \(x\) is rational, so \(f(x)=0\), we can find irrational numbers as close as we like to \(x\), so that arbitrarily close to \(x\) are values on which \(f\) takes on the value \(1\); likewise if \(x\) is irrational, so \(f(x)=1\), we can find rational numbers as close as we like to \(x\), so that arbitrarily close to \(x\) are values on which \(f\) takes on the value \(0\).

It follows then that no matter what value of \(x\) we choose, \(f\) is not continuous.

We can sort of visualize the graph of this \(f\) as a pair of dotted lines, one at height \(0\) (for the rational values of \(x\)) and one at height \(1\) (for the irrational values). It's then pretty plausible that changing the value of \(x\) by any amount at all involves jumping up and down between the two lines, so the function is not continuous anywhere.
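We can't evaluate this function honestly on floating-point numbers (every float is rational!), but with exact arithmetic it is perfectly concrete. Here's a sketch using sympy, which can decide rationality for the exact numbers it knows about:

```python
from sympy import Rational, sqrt, pi

def f(x):
    """0 on the rationals, 1 on the irrationals (x an exact sympy number)."""
    return 0 if x.is_rational else 1

print(f(Rational(3, 7)))  # 0
print(f(sqrt(2)))         # 1
print(f(pi))              # 1
```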

This rather went from the sublime to the ridiculous in one jump. But I think that the next case is still stranger.

Continuous at just one point

This time we have \[ f(x) = \left\{ \begin{array}{rcl} 0 &\text{if} & x \in \mathbb{Q} \\ x & \text{if} & x \in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] By the same argument as above, this function is not continuous at any value of \(x\) other than \(0\).

But at \(0\), the story is different.

This time, if we choose a value of \(x\) close to zero, the value of \(f(x)\) is either \(0\), if \(x\) is rational, or \(x\), which is small, if \(x\) is irrational. In either case, we can make \(f(x)\) as close as we want to \(0\) by taking \(x\) sufficiently small: so \(f\) is continuous at just this one point.

In much the same way as before, we can try to visualize the graph as a pair of lines again, this time at height \(0\) for rational values, and at height \(x\) for irrational ones. Away from zero we have the same situation as before, but at zero, where the graphs 'intersect', the two definitions (almost) agree, and so the function is continuous.

So we can have a function that is continuous everywhere except at one point, or just at one point. It's not hard to see we can have as many discontinuities as we want. But now it's time for a really weird function.

Continuous on just the irrationals

This one is hard to believe. First, though, let's agree that if \(x\) is a rational number, then we express it as the ratio of two integers \(m/n\) where \(n>0\) and \(\gcd(m,n)=1\). We then define \[ f(x) = \left\{ \begin{array}{rcl} 1/n &\text{if} & x \in \mathbb{Q} \\ 0 & \text{if} & x \in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] So if \(x\) is a rational number, then \(f(x)\) is non-zero, but arbitrarily close to \(x\) are irrational numbers, on which \(f\) takes the value \(0\). So \(f\) is not continuous at \(x\).

On the other hand, if \(x\) is not rational, then no matter how large an integer \(N\) we choose, there is an interval about \(x\) so small that no rational number with denominator less than \(N\) lies inside that interval. So any number inside the interval is either irrational, so \(f\) takes on the value \(0\), or is rational, so the denominator must exceed \(N\), and the value is less than \(1/N\).

So \(f\) is continuous at this \(x\).
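This example is usually called Thomae's function, and we can get a feel for the argument computationally. Python's Fraction type keeps rationals in lowest terms for us; in the sketch below anything that isn't a Fraction stands for an irrational, which is a representation convention of mine, not part of the mathematics:

```python
from fractions import Fraction

def f(x):
    """1/n on the rational m/n in lowest terms; 0 'on the irrationals'
    (here: on anything that isn't a Fraction)."""
    return Fraction(1, x.denominator) if isinstance(x, Fraction) else Fraction(0)

print(f(Fraction(3, 6)))   # 1/2, since 3/6 reduces to 1/2
print(f(Fraction(22, 7)))  # 1/7

# Rational approximations to sqrt(2) need ever larger denominators,
# so f shrinks along them, just as continuity at sqrt(2) requires:
for q in [Fraction(3, 2), Fraction(17, 12), Fraction(577, 408)]:
    print(q, f(q))
```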

This is very hard to comprehend. I can try to think of a graph analogous to the other two cases, but when I think about it carefully I realize I'm kidding myself.

Continuous on just the rationals

You might expect (I certainly expected) that it would be possible to do this by a suitably clever adaptation of the previous example. It turns out that that doesn't work. In fact, nothing works. It isn't possible for a function to be continuous on just the rationals: the set of points where a function is continuous is always a countable intersection of open sets, and (by the Baire category theorem) the rationals are not such an intersection.

More than anything else, this makes me realize that I don't really understand just how the previous example works.

This pretty much wraps it up for the continuity monsters. But there are other monsters to admire.

Differentiable, but the derivative is not continuous

This is subtler than you might expect. You can't get it by integrating a function with a discontinuity, because the result isn't differentiable. So consider \[ f(x) = \left\{ \begin{array}{ccl} x^2\sin(1/x) &\text{if} & x \neq 0 \\ 0 & \text{if} & x =0 \end{array} \right. \] Then as long as \(x \neq 0\), the usual rules of differentiation give us \[ f'(x) = 2x \sin(1/x) - \cos(1/x) \] But as \(x\) approaches \(0\), this is not at all well-behaved. It just oscillates faster and faster between (approximately) \(\pm 1\).

On the other hand, if \(h \neq 0\) then \[ \begin{split} \frac{f(0+h)-f(0)}{h} &= \frac{h^2\sin(1/h)}{h}\\ &= h \sin(1/h) \end{split} \] so as \(h \to 0\), since \(\sin(1/h)\) is bounded above by \(1\) and below by \(-1\), this also tends to \(0\), so that \(f'(0)=0\).

So it is possible for a derivative to be discontinuous, but not because of a jump discontinuity.
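Numerically you can watch both halves of this story; a short sketch comparing the difference quotient at \(0\) (which dies away like \(h\)) with the derivative nearby (which keeps oscillating):

```python
import numpy as np

def f(x):
    return x**2 * np.sin(1.0 / x) if x != 0 else 0.0

# The difference quotient at 0 shrinks like h, so f'(0) = 0 ...
for h in [0.1, 1e-3, 1e-6]:
    print(h, (f(h) - f(0)) / h)

# ... but f'(x) = 2x sin(1/x) - cos(1/x) keeps oscillating near 0.
for x in [1e-3, 2e-3, 3e-3]:
    print(x, 2 * x * np.sin(1 / x) - np.cos(1 / x))
```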

We can fairly obviously make this work for higher and higher derivatives by using higher powers of \(x\). That isn't very interesting. Let's look at something a bit weirder.

Differentiable at just one point

In fact, this will be even stranger than it sounds. We can have a function that is differentiable at just one point, but not even continuous anywhere else. \[ f(x) = \left\{ \begin{array}{rcl} x^2 &\text{if} & x \in \mathbb{Q} \\ 0 & \text{if} & x \in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] Again, by just the same argument as before, this function is not continuous at any non-zero value of \(x\).

But (rather like the above example) if \(h \neq 0\), we have \[ \frac{f(0+h)-f(0)}{h} = \left\{ \begin{array}{rcl} h &\text{if} & h \in \mathbb{Q} \\ 0 & \text{if} & h \in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] so again we see that as \(h \to 0\), the limit is \(0\), so \(f'(0)=0\).

And again, we can make this as differentiable as we want at \(0\) by using higher powers of \(x\).

It's probably worth saying out loud that we can't do this by working out derivatives of derivatives, since the derivative only exists at \(0\). Instead, we say that \(f\) is differentiable \(n\) times at \(x\) if there exist numbers \(f^{(i)}(x)\) for \(i=1,\ldots,n\) with \[ f(x+h) = f(x) + f'(x)h + \ldots + \frac{1}{n!}f^{(n)}(x)h^n + e \] where \(e/h^n \to 0\) as \(h \to 0\).

Now, these are all interesting in their own right, just as examples of what can happen. Let's finish with an example which is interesting and useful.

Infinitely differentiable but not analytic

We finish off with \[ f(x) = \left\{ \begin{array}{ccl} \exp(-1/x) &\text{if} & x \gt 0 \\ 0 & \text{if} & x \le 0 \end{array} \right. \] A straightforward calculation shows that this function has derivatives of all orders at \(0\), and the derivatives are all \(0\). So this function is infinitely differentiable, but since its power series is just \(0\), it is not analytic.

Using this, we can build a new function: \[ g(x) = \frac{f(x)}{f(x)+f(1-x)} \] This function is \(0\) for negative values of \(x\), \(1\) for values of \(x > 1\), and smoothly interpolates between them. Functions built out of these are important in differential geometry and differential topology, to patch together smoothly objects which are defined locally. It is an argument using such functions that proves, for example, that any smooth manifold admits a Riemannian metric.
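Since \(g\) is given by an explicit formula, we can evaluate it and watch the smooth interpolation happen; a minimal sketch:

```python
import math

def f(x):
    """exp(-1/x) for x > 0, and 0 otherwise: smooth but not analytic at 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def g(x):
    """Smooth step: 0 for x <= 0, 1 for x >= 1, and in between otherwise.
    The denominator is never 0, since at least one of x, 1-x is positive."""
    return f(x) / (f(x) + f(1.0 - x))

for x in [-0.5, 0.0, 0.25, 0.5, 0.75, 1.0, 1.5]:
    print(x, g(x))
```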

You missed a bit!

There is one very well-known type of pathological function which I haven't mentioned: the functions which are continuous but nowhere differentiable, first described by Karl Weierstrass. The reason is that I wanted to consider functions where there is an explicit (and simple) formula for the value of \(f(x)\), so that the properties could be seen directly. I've neglected the continuous but nowhere differentiable functions, although they are also interesting, because they are defined by means of limits rather than a simple formula.

Enough is enough

Most of these functions are, basically, a freak show. We look at them to see objects with bizarre or surprising properties; and some actually have practical uses. But even those without obvious application are important. They help to find weaknesses in our understanding of basic concepts, and thereby to sharpen our understanding of them.

Also, they're fun.

Comment

It has been pointed out in the comments below that technically these functions are not freaks. In fact, amongst functions, the continuous functions are the freaks (i.e. the rarities), and amongst continuous functions, the differentiable ones are the freaks. But in terms of the kind of functions that students meet in practice, where (at worst piecewise) analytic is standard, these functions are still the monsters.

Acknowledgement

  • I probably wouldn't have written this if @panlepan hadn't tweeted about continuous but non-differentiable functions.
  • Thanks to Manley Perkel for typo spotting.
  • Thanks to delio for his comments on what is (and is not) a freak.