Tuesday, 20 August 2019

Second derivative versus differentiating twice.

How should we think of the second derivative, and what does it tell us?

The Derivative

Before worrying about higher derivatives, let's remember what the first derivative is and does.

We say that a function \(f:I \to \mathbb{R}\) (where \(I\) is some open interval in \(\mathbb{R}\)) is differentiable at \(x \in I\), with derivative \(f'(x)\) if \[ \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} = f'(x). \]

Why do I say \(I\) is an open interval?

It's just to make sure that if \(x \in I\), then for \(h\) sufficiently small, \(x+h\) is also in \(I\). You can also work with the endpoint of an interval, but then you have to worry about one-sided limits, and it's a bit messier. Open intervals are more convenient.

With all this in hand, it's not too hard to show that a function \(f\) is differentiable at \(x\), with derivative \(f'(x)\) if (and only if) \[ f(x+h)=f(x)+hf'(x)+o(h) \] where \(o(h)\) is an error term with the property that \[ \lim_{h \to 0} \frac{o(h)}{h} = 0 \] so that the derivative is the best linear approximation to changes in the values of \(f\), if one exists.

I claimed in an earlier post this this best linear approximation is also the best way to think about the derivative.

The second derivative

So, suppose we have a function \(f:I \to \mathbb{R}\), and it's differentiable at each \(x \in I\). Then we have a new function, \(f'\), given by \(f':I \to \mathbb{R}\), which gives the derivative of \(f\) at each \(x \in I\).

Now, there's no particular reason for \(f'\) to be differentiable. In fact, there's no obvious reason for it even to be continuous, and in fact it doesn't have to be. But it can be, and that's the case we'll think about now.

A good trick is worth repeating. So, given \(f':I \to \mathbb{R}\), it is differentiable at \(x \in I\) with derivative \(f''(x)\) if \[ f''(x) = \lim_{h \to 0} \frac{f'(x+h)-f'(x)}{h} \] and we call this the second derivative of \(f\) at \(x\).

And then it turns out that when this happens, we have \[ f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2) \] where \(o(h^2)\) is now an error term which has the property that \[ \lim_{h \to 0}\frac{o(h^2)}{h^2}=0. \]

In other words, the first and second derivative between them give the best quadratic approximation to changes if \(f(x)\). (This idea, and repeating it with more derivatives and higher order polynomial approximations leads to the idea of jets, about which I will say no more: follow the link if you are intrigued.)

This is very nice. We can now think of the second derivative as the correction which gives us a best quadratic approximation to the values of \(f(x)\), and use this best quadratic approximation as an alternative definition: \(f:I \to \mathbb{R}\) is twice differentiable at \(x\) with first and second derivatives \(f'(x)\) and \(f''(x)\) if \[ f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2) \]

Except that we can't. Unlike the case of the first derivative, this does not characterize second derivatives. If a function has a second derivative, then we get a best quadratic approximation. But this time we can't make the reverse argument: the existence of a best quadratic approximation does not imply that a function can be differentiated twice.

Monstrous Counter-examples

There's a useful function (at least, useful if you're looking for counter-examples to do with continuity) defined on \(\mathbb{R}\) as follows: \[ f(x) = \left\{ \begin{array}{ccc} 0 & \mbox{if} & x \in \mathbb{Q}\\ x &\mbox{if} & x\in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] where \(\mathbb{Q}\) denotes the set of rational numbers. This function has the (maybe surprising) property that it is continuous at \(0\), but nowhere else. (If you enjoy pathological function like this, there are more here.)

We can build on this. Let's think about the function \(g\) given by \(g(x)=f(x)^2\), and think about what happens at \(x=0\). We have \[ g(0+h)= \left\{ \begin{array}{ccc} 0 & \mbox{if} & h \in \mathbb{Q}\\ h^2 &\mbox{if} & x\in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \]

Then \[ \begin{split} |g(0+h)-g(0)-h \times 0| &= \left\{ \begin{array}{ccc} 0 & \mbox{if} & h \in \mathbb{Q}\\ h^2 &\mbox{if} & x\in \mathbb{R} \setminus \mathbb{Q} \end{array} \right.\\ & \leq |h^2| \end{split} \] and therefore \[ g(0+h)=g(0)+ h\times 0 +o(h) \] i.e. \(g\) is differentiable at \(0\) with derivative \(0\).

But \(g\) is not even continuous at other values of \(x\). So we have a function which is differentiable at just one point.

Let's run with this ball.

Consider the function \(h\) defined by \(h(x)=f(x)^3\).

Again, this function is continuous at \(0\) but nowhere else. Since it isn't continuous away from \(0\), it certainly isn't differentiable in any neighbourhood of \(0\), so it can't be differentiated twice: a differentiable function must be continuous.

But on the other hand, \[ \begin{split} |g(0+h) - g(0) - h \times 0 - \frac{h^2}{2} \times 0| &= \left\{ \begin{array}{ccc} 0 & \mbox{if} & h \in \mathbb{Q}\\ |h^3| &\mbox{if} & x\in \mathbb{R} \setminus \mathbb{Q} \end{array} \right.\\ & \leq |h^3| \end{split} \] so that \[ g(0+h) = g(0) + h \times 0 + \frac{h^2}{2} \times 0 + o(h^2) \] and so \(g\) does have a best quadratic approximation at \(0\), even though it is only differentiable once there.


So, what should we make of this?

We know that a best linear approximation really is a first derivative; being linearly approximable is equivalent to being differentiable.

But this doesn't work for higher degree approximations. A function may be quadratically approximable without being twice differentiable. The two notions are now separate.

We might say that the function has a second derivative, even though it cannot be differentiated twice, but that would probably lead to unnecessary confusion: it's better to accept that the relationship between approximations and derivatives doesn't actually hold for higher derivatives in the same way as it does for the first derivative.

You might think that this doesn't matter much in practice, since the functions we actually use don't have these weird continuity or differentiability properties. I'd be hard put to argue with that.

Nevertheless I think it's an instructive example of something you might reasonably expect to be true turning out not to be, and of the kind of weirdness that you can find when you allow yourself to consider functions that aren't simple combinations of analytic ones.

Sunday, 14 July 2019

Infinite arithmetic and the parable of the bingo balls

Hilbert's (or The Infinite) Hotel is a well-known setting for thinking about infinite sets, and in particular for thinking about their sizes. There's a less well-known approach that I think complements it nicely. I don't know who came up with it, so I'll just call it the parable of the bingo balls.

The Setting

Here's the idea. You have two baskets. One of them, basket A, contains a collection of bingo balls, one labelled with each positive integer. The other, basket B, is empty.

At 12 noon, you transfer some of the of balls from basket A to basket B: ball \(1\) up to ball \(n_1\), where \(n_1 \ge 1\). At 12:30, you remove some number of balls from basket B, and transfer some balls, ball \(n_1+1\) to ball \(n_2\), where \(n_2 \ge n_1+1\) from basket A to basket B. At 12:45, you remove some more balls from basket B, and transfer the next collection of balls, always choosing the ones with the the next available labels, from basket A to basket B. You repeat this process whenever the amount of time until 13:00 is halved. Clearly you need infeasibly fast hands to do this in practice: fortunately, we don't need to worry about this, we're just thinking about an ideal mathematical process.

What is in the baskets at 13:00?

It's not hard to persuade yourself that basket A is now empty. Whatever \(n\) you think of, ball \(n\) was removed on or before the \(n^{\text{th}}\) operation. But what about basket B?

It turns out that this depends on the process.

Basket B is empty

In the simplest case, at the first operation you move ball \(1\) from basket A to basket B, and at each subsequent operation you remove from basket B the ball that just got put in, and transfer the next ball from basket A to basket B.

The after each operation, basket B has one ball in it. Indeed, after any number of operations, basket B has one ball in it.

You might naively think that since there is one ball in basket B after every operation, there will be one ball in basket B at 13:00. But funny things happen with infinity.

Ball 1 isn't in basket B, because that was removed at the second operation. Ball 2 isn't, because it was removed at the third. Ball 3 isn't, because that was removed at the fourth. And so on. Whatever ball you name, say ball \(n\), it was removed at operation ball \(n+1\). So after we finish, there are no balls at all in basket B.

A bit more surprising is the fact that I can move as many balls as I like from basket A to basket B at each stage and basket B still ends up empty.

For example, suppose I move two balls from A to B at each stage, so in operation \(n\) I move balls \(2n-1\) and \(2n\) from A to B. Then after \(n\) operations, I've moved \(2n\) balls from A to B, and removed \(n-1\) balls, so there are \(n+1\) balls in basket B. The number of balls in basket B is growing larger and larger. Isn't it obvious that there will be infinitely many balls in basket B when I finish?

But no. We can use exactly the same argument as before. Ball \(1\) is removed at the second operation, ball \(2\) at the third, and so on. Whatever ball you think of, it is removed at some stage (though balls with a big value of \(n\) have to lie about in basket B for quite a while before they are removed).

Something funny is happening here. At stage \(n\), there are \(n+1\) balls in basket B, but after infinitely many steps there are none. Am I trying to say that when \(n\) gets really large, \(n+1\) is somehow getting near to 0?

No, I'm not that crazy. What I'm saying is that the number of balls in basket B at the end cannot be thought of as somehow being approached more closely to by the number of balls in basket B after \(n\) operations, as \(n\) gets bigger and bigger.

It's not hard to see that it doesn't matter how many balls I move from basket A to basket B at each stage, eventually all of them get removed from basket B, and basket B ends up empty.

In fact, I can even delay the process of removing balls from basket B as often as I want, or do it as rarely as I want (as long as I don't stop), and the result is the same. If I wait until operation \((n+1)^2\) to remove ball \(n\), then all the balls in basket B are eventually removed, even though I am putting balls in at every operation, and taking out a single ball at larger and larger intervals.

Here's a fun example. At operation \(1\) I move over one ball, then at operation \(2\) I remove one and move over three, and then at operation \(n\) I remove one and I move over \(n+1\) balls. After then \(n^\text{th}\) operation there are \(n\) more balls than there were before. Then after \(n\) operations, there are \(1+2+\ldots+n\) balls in basket B. So it looks like after I'm finished, there ought to be \(1+2+3+\ldots\) balls in basket B, but in fact there are none.

Of course, I'm not trying to say that \[ 1+2+3+\ldots = 0 \] any more than I would try to say that \[ 1+2+3+\ldots = -\frac{1}{12}. \]

You could even try to persuade yourself that this is reasonable: I put infinitely many in, and I take infinitely many out, so there are none left.

But there's more to it than that.

Basket B isn't necessarily empty

I put in a single ball at each operation. But I wait until after \(N+1\) operations before I start removing balls.

I could do this by first removing ball \(1\), then ball \(2\) and so on, and basket B would end up empty.

But I could also do it by first removing ball \(N+1\) in operation \(N+2\), then ball \(N+2\), and so on. When 13:00 rolls along, every ball with a label of \(N+1\) or more is gone, but balls \(1\) to \(N\) are still there.

So I can arrange for basket B to have any number of balls I want in it by choosing my removal strategy appropriately.

In fact, I can even arrange for basket B to have infinitely many balls in it.

One way to go about it is to move ball \(n\) from basket A to basket B at operation \(n\), and then alternate removing nothing from basket B and removing the next even-numbered ball. This leaves all the odd-numbered balls in basket B at the end, which is an infinite collection.

You may feel that's somehow cheating: you're only taking a ball out on every second operation: but I could also just remove the ball with the next number, so on operation \(2n\) I remove ball number \(n\), and that way I'm left with basket B empty.

Alternatively, I could move two balls from basket A to basket B on each operation, and take out one of those balls at the next operation.

So, the net effect of removing an infinite set from an infinite set can be to leave nothing at all, a finite set of any size you like, or an infinite set of the same size as the one you started with.

What does this tell us?

You can extract more than one possible message from this.

One message is that arithmetic involving the infinite has some counter-intuitive properties.

I think this is the commonest point of view: We have a game with all sort of interesting properties to explore, so we do. It's certainly mine.

If you take this point of view, you find that the infinite sets I've been looking at are only one kind: a set whose elements can be matched up with the positive integers is called countable, and what I've been calling infinite is really countably infinite. There are bigger (and bigger and bigger) kinds of infinity to play with, once you've sorted out what you mean by bigger when you're talking about infinite sets.

If this sounds like fun to you, then you might like to check out the properties of cardinal numbers.

Another is that arithmetic involving the infinite has unacceptable properties, so you reject it as a part of mathematics.

Some people take this point of view very seriously. They will accept the notion of integers of any size, no matter how big, but regard it as illegitimate to talk about a set of all integers, because that would have infinitely many elements, and they find the price too bit to pay.

If that sounds like you, or like you want to be, you might find it interesting to check out finitism .


Thanks to Adam Atkinson for pointing out some ambiguity in the first version.

Saturday, 25 May 2019

A second look at complex differentiation

What does it mean to differentiate a complex function?

If we have a complex function, \[ f:\mathbb{C} \to \mathbb{C}, \] then we can try to define a derivative by analogy with the real version: we say that \(f\) has derivative \(f'(z)\) at \(z\) if \[ \lim_{h \to 0} \frac{f(z+h)-f(z)}{h} = f'(z). \]

The catch is that now \(h\) is a complex number. For this to make sense, it must give the same answer no matter how \(h \to 0\). The usual thing is then to insist that you get the same answer if \(h\) is pure real or pure imaginary, and find some constraints that have to be satisfied: these are the Cauchy-Riemann equations.

If you're of a suspicious turn of mind, you might ask why it's enough for it to work for \(h\) pure real or pure imaginary: that makes \(h\) approach \(0\) along one of two rather special lines. What if \(h\) doesn't do that? Also, this argument shows that if \(f\) is complex differentiable, it must satisfy the Cauchy-Riemann equations: but it doesn't tell us that if the Cauchy-Riemann equations are satisfied, then \(f\) must actuallly be complex differentiable.

But there's another approach, which I think has a few advantages.

  • It's based on a general definition of derivative, rather than guessing something that 'looks like' the real (variable, that is) derivative.
  • It doesn't relay on \(h\) approaching \(0\) in any particular way.
  • It shows that the Cauchy-Riemann equations are necessary and sufficient for complex differentiability.
  • I just think it's prettier.

So, let's recall what the derivative really is: if you have a function \(\pmb{f}:\mathbb{R}^n \to \mathbb{R}^m\), so\(\pmb{f}\) is a vector with components \(f^1 \ldots f^m\), then \(\pmb{f}\) is differentiable at \(\pmb{x}\) with derivative \(D\pmb{f}(x)\) (an \(m \times n\) matrix) if \[ \pmb{f}(\pmb{x}+\pmb{h}) = \pmb{f}(\pmb{x})+D\pmb{f}(\pmb{x})\pmb{h}+o(\|\pmb{h}\|). \]

Then the \(i,j\) element of \(D\pmb{f}(x)\) is \(\partial f^i/\partial x^j\).

In the case where \(\pmb{f}:\mathbb{R}^2 \to \mathbb{R}^2\), we can be much more explicit than that. We write \[ \pmb{f}(\pmb{x} ) = \left( \begin{array} {c} u(x,y)\\ v(x,y) \end{array} \right) \] where \[ \pmb{x} = \left( \begin{array} {c} x\\ y \end{array} \right) \] and then \[ D\pmb{f} = \left( \begin{array}{cc} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y}\\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{array} \right) \] has the property that \[ \begin{split} \left( \begin{array}{c} u(x+h_x,y+h_y)\\ v(x+h_x,y+h_y) \end{array} \right) &= \left( \begin{array}{c} u(x,y)\\ v(x,y) \end{array} \right) + \left( \begin{array}{cc} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y}\\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{array} \right) \left( \begin{array}{c} h_x\\ h_y \end{array} \right) +o(\sqrt{h_x^2+h_y^2})\\ &= \left( \begin{array}{c} u(x,y)\\ v(x,y) \end{array} \right) + \left( \begin{array}{c} \frac{\partial u}{\partial x}h_x + \frac{\partial u}{\partial y} h_y\\ \frac{\partial v}{\partial x}h_x + \frac{\partial v}{\partial y} h_y \end{array} \right) +o(\sqrt{h_x^2+h_y^2}). \end{split} \]

But now we can define the complex number \(z=x+iy\), and the complex function \(f:\mathbb{C} \to \mathbb{C}\) by \[ f(x+iy) = u(x,y)+iv(x,y). \] So \(f\) will be differentiable at \(z\) with derivative \(f'(z)\) if \[ f(z+h)=f(z)+f'(z)h+o(|h|) = f(z)+f'(z)h + o(\sqrt{h_x^2+h_y^2}) \] where \(h=h_x+ih_y\).

So what is this \(f'(z)\)?

Let's call it \(d_x+id_y\), so that \(d_x\) and \(d_y\) are the real and imaginary parts of \(f'(z)\) respectively. Then we have \[ \begin{split} f'(z)h &= (d_x+idy)(h_x+ih_y) \\ &= d_x h_x-d_y h_y + i(d_y h_x + d_x h_y). \end{split} \]

So now we have two different ways of describing the same function: once as a function \(\mathbb{R}^2 \to \mathbb{R^2}\), and once as a function \(\mathbb{C} \to \mathbb{C}\). But it's the same function, both times, so the derivative must really be the same. In other words, since the expressions must match for all possible \(h_x\) and \(h_y\), \(\pmb{f}\) can be thought of as a complex differentiable function if and only if \[ \begin{split} \frac{\partial u}{\partial x}h_x + \frac{\partial u}{\partial y} h_y &= d_x h_x-d_y h_y \\ \frac{\partial v}{\partial x}h_x + \frac{\partial v}{\partial y} h_y &= d_y h_x + d_x h_y. \end{split} \]

Matching these up, then, we have the conditions \[ \begin{split} d_x &= \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}\\ d_y &= \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}. \end{split} \]

And putting it all together, we see that the real function given by \((x,y) \to (u(x,y),v(x,y))\) can be thought of as a complex differentiable function \(x+iy \to u(x,y)+iv(x,y)\) if and only if \(u\) and \(v\) satisfy the Cauchy-Riemann equations \[ \begin{split} \frac{\partial u}{\partial x} &= \frac{\partial v}{\partial y}\\ \frac{\partial v}{\partial x} &= -\frac{\partial u}{\partial y}. \end{split} \]

I, at least, find this more satisfying.

Monday, 29 April 2019

Cauchy and the (other) mean value theorem.

I just realised that there's a neat relationship between Cauchy's integral formula and the mean value theorem for harmonic functions of two variables, so for the benefit of anybody else who hasn't already noticed it, here it is.

First, let's remember what the Cauchy integral formula tells us.

If \(f\) is a holomorphic function on a simply connected domain, \(D\), \(a \in D\), and \(C\) is a simple closed contour in \(D\) surrounding \(a\), then \[ f(a) = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z-a} dz. \]

I confess, this always seems like magic to me.

One way of seeing that it is true is to note that for a holomorphic function continuous deformation of the countour doesn't change the integral: therefore we can take \(C\) to be an extremely small circle centred on \(a\), so that \(f(z)\) is very close to \(f(a)\) all around the contour. Then if we parameterise \(C\) as \(a+\varepsilon e^{i \theta}\), where \(0 \leq \theta < 2\pi\), the integral is approximately \[ \frac{f(a)}{2 \pi i} \oint_C \frac{1}{z-a} dz = \frac{f(a)}{2 \pi i} \int_0^{2 \pi} i d\theta = f(a) \] and the approximation can be made arbitrarily good by choosing \(\varepsilon\) small enough.

And even with that, it still seems like magic.

Now, let's take a look at this in terms of real and imaginary parts. We can think of the complex number \(z\) as \(x+iy\), and then thinking about \(f\) as a function of \(x\) and \(y\). As is customary, I write: \[ f(z) = u(x,y)+iv(x,y) \] where \(f\) being holomorphic tells us that \(u\) and \(v\) satisfy the Cauchy-Riemann equations, \[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = - \frac{\partial v}{\partial x} \] and as a consequence \[ \frac{\partial^2 u}{\partial x^2}+\frac{\partial^2 u}{\partial y^2} = 0 = \frac{\partial^2 v}{\partial x^2}+\frac{\partial^2 v}{\partial y^2} \] so that both \(u\) and \(v\) are harmonic functions.

It's less obvious, but equally true that if \(u\) is a harmonic function of \(x\) and \(y\), then there is another harmonic function \(v\) such that \[ f(z)=f(x+iy) = u(x,y) + iv(x,y) \] is holomorphic.

So now let's see what the Cuachy integral formula says, in terms of \(u\) and \(v\).

To this end, let \(C\) be a circular contour of radius \(R\), centred on \(a\), and parameterised by \(\theta\) as \[ a+Re^{i\theta}, \qquad 0 \leq \theta < 2\pi \]

Then we have \[ \begin{split} \frac{1}{2 \pi i} \oint_C \frac{f(z)}{z-a}dz &= \frac{1}{2 \pi i} \int_0^{2 \pi} \frac{f(z(\theta))}{Re^{i\theta}} iRe^{i\theta} d\theta \\ &= \frac{1}{2\pi} \int_0^{2\pi} u(x(\theta),y(\theta))d\theta + i \frac{1}{2\pi} \int_0^{2\pi} v(x(\theta),y(\theta))d\theta \end{split} \]

So the real part of \(f(a)\) is the mean value of \(u(x,y)\) on a(ny) circle centred on \(a\), and the imaginary part is the mean value of \(v(x,y)\) on a(ny) circle centred on \(a\).

But the real part of \(f(a)\) is just \(u(Re(a),Im(a))\), and similarly for \(v\)!

This result, that the value of \(u(x,y)\) for harmonic \(u\) is the mean value of \(u\) on any circle centred on \((x,y)\), is the mean value theorem for harmonic functions.

We can go backwards: assuming the mean value theorem for harmonic functions, we can deduce the Cauchy integral formula (at least, if we are happy that contour integrals of holomorphic functions are unchanged by a deformation of the contour).

So it turns out that Cauchy's integral formula, a fundamental result in complex analysis, is equivalent to a real analysis result about the properties of harmonic functions.

In retrospect, I suppose it should be obvious in some sense that Cauchy's integral formula relies on the properties of harmonic functions, since holomorphic functions are built out of harmonic ones. But I hadn't put the two ideas together like this before.

Tuesday, 23 April 2019

Trig functions with (almost) no triangles

In this post I'm going to look at what we can say about the trigonometric functions, \(\sin\) and \(\cos\) without actually doing any classical trigonometry, in other words, without using triangles or ratios of sides - if we know enough other maths. Note that this is not a "we should teach it this way" article it's an "isn't it great how mathematics all hangs together" article.


We start off with a little Euclidean geometry, as described in Cartesian coordinates.

Consider the unit circle in the Euclidean plane, centred on the origin, so given by \[ x^2+y^2=1 \]

and define \(\cos(t)\) and \(\sin(t)\) as the \(x\) and \(y\) coordinates of the point that you get to on this circle if you start at the point \((1,0)\) and travel \(t\) units of length (inches if the circle is one inch in radius, meters if it is one meter in radius, and so on) anti-clockwise around the circle.

This is how I define the functions \(\cos\) and \(\sin\); at some point I should make contact with the usual definitions in terms of ratios of side lengths in right-angled triangles. But not yet.

But we can immediately notice that since I've defined these functions as the \(x\) and \(y\) coordinates of points on this circle, we know that \[ \sin^2(t)+\cos^2(t)=1 \] and since the circumference of the circle is \(2\pi\), both functions are periodic with period \(2\pi\).

I can also extend my definition to negative values of \(t\), by thinking of this as distance in the clockwise direction from the same starting point.

Then we also see immediately from the symmetry of the circle that \[ \sin(-t)=-\sin(t), \qquad \cos(-t)=\cos(t) \]


And now a tiny nod towards differential geometry.

Let's think about the circle, where I now parameterise it by the \(t\) I defined above: so we can associate with any value of \(t\), the point \[ \pmb{r}(t) = (\cos(t),\sin(t)) \] (and I commit the usual sin of identifying a point in the plane with its position vector).

So, what can I say about \(\dot{\pmb{r}}\) (where I use the overdot to denote differentiation with respect to \(t\), in the finest Newtonian tradition).

Again, since each point lies on the unit circle, we have \[ \pmb{r}(t).\pmb{r}(t)=1 \] so, differentiating, we have \[ 2\pmb{r}(t).\dot{\pmb{r}}(t)=0 \] and the tangent to the circle at any point is (as we already know from Euclidean geometry) orthogonal to the radius to that point.

From this, we know that \[ \dot{\pmb{r}}(t) = k(-\sin(t),\cos(t)) \] for some \(k\).

We can say more, though. Since \(t\) is distance along the circle, so that for all \(t\) \[ t = \int_{0}^t \|\dot{\pmb{r}}(u) \| du, \] then we must have \[ \| \dot{\pmb{r}}(t) \| = 1. \]

Finally, we note that when \(t=0\), the tangent vector points vertically upwards. This finally leaves us with \[ \dot{\pmb{r}}(t)=(-\sin(t),\cos(t)) \] or, in other words, \[ \frac{d}{dt}\sin(t) = \cos(t), \qquad \frac{d}{dt}\cos(t)=-\sin(t). \]


Now we take a little detour through some real analysis.

For no apparent reason, I introduce two functions defined by power series, \[ \text{SIN}(t) = \sum_{n=0}^\infty (-1)^{n}\frac{t^{2n+1}}{(2n+1)!}, \qquad \text{COS}(t) = \sum_{n=0}^\infty (-1)^{n} \frac{t^{2n}}{(2n)!}. \]

These are new functions, and as far as we know so far, distinct from \(\sin\) and \(\cos\) as I defined them up above. But we'll see in a moment that the choice of names is not a coincidence, and that there is a very close relationship between these power series and the geometrically defined \(\cos\) and \(\sin\).

We can immediately see that \(\text{SIN}(0)=0\) and \(\text{COS}(0)=1\).

But since the two functions are defined by power series, we can differentiate both term by term and find that \[ \frac{d}{dt}\text{SIN}(t) = \text{COS}(t), \qquad \frac{d}{dt}\text{COS}(t)=-\text{SIN}(t). \]

Differentiating again, we find that each of \(\sin\), \(\cos\), \(\text{SIN}\) and \(\text{COS}\) satisfies the ordinary differential equation \[ \ddot{X} = -X. \] Furthermore, both \(\sin\) and \(\text{SIN}\) satisfy the same initial conditions at \(t=0\), namely that the value of the function is \(0\), and its derivative is \(1\); similarly for \(\cos\) and \(\text{COS}\). We therefore deduce that \[ \sin(t)=\text{SIN}(t), \qquad \cos(t)=\text{COS}(t). \]

This gives us two things:

  • An effective way of calculating the functions (at least for small values of \(t\)).
  • Another handle on the functions, which we are just about to use.

Addition Formulae

Finally we make use of just a little complex analysis.

Looking at the power series we now have for \(\sin\) and \(\cos\), and remembering the power series for the exponential function, we have \[ \exp(it)=\cos(t)+i\sin(t). \]

But now we have \[ \begin{split} \cos(s+t)+i\sin(s+t) &= \exp(i(s+t))\\ & = \exp(is)\exp(it)\\ & = \cos(s)\cos(t)-\sin(s)\sin(t)\\ & +i(\sin(s)\cos(t)+\cos(s)\sin(t)) \end{split} \] from which we extract \[ \cos(s+t)=\cos(s)\cos(t)-\sin(s)\sin(t), \qquad \sin(s+t)=\sin(s)\cos(t)+\cos(s)\sin(t). \]

And finally, some triangles

It's time to make contact with the triangle notion of these trigonometric functions, so consider a right angled triangle, and let \(\theta\) be the radian measure of the angle at one of its vertices, \(V\), (not the right angle). Place this triangle in the first quadrant, with the vertex \(V\) at the origin, and the adjacent side along the \(x\)-axis. (This may require a reflection.) Call the vertex on the \(x\) axis \(U\) and the remaining vertex \(W\).

But we can scale the triangle by dividing each side length by the length of the hypotenuse, which doesn't affect any of the ratios of side lengths.

The traditional definition of the sine of angle \(\theta\) is that it is the length of the adjacent side to \(V\) divided by the length of the hypotenuse; similarly \(\cos(\theta)\) is the length of the side opposite to \(V\) divided by the length of the hypotenuse. Then in our rescaled triangle, the vertex \(U'\) corresponding to \(U\) lies on the unit circle, and (by the definition of radian measure of angles) the angle \(\theta\) is the distance around the circle from the point \((1,0)\).

But now, the length of the hypotenuse is \(1\), and so the ratio for the sine of \(\theta\) is just our \(\sin(\theta)\), and similarly for \(\cos(\theta)\).

So, finally, we see that \(\sin\) and \(\cos\) have all the old familiar properties.

I love it when a bunch of different bits of maths come together.


Thanks for @ColinTheMathmo for some useful comments and corrections.

Thursday, 21 March 2019

Solar garbage disposal.

Why can't we drop our dangerous waste into the sun?

It's tempting to consider getting rid of dangerous (or undesirable) waste by shooting it into space and letting it fall into the sun; after all, it seems pretty unlikely that we could drop enough of anything into the sun to have an adverse effect on it.

So, just how hard is this?

Let's just start off by thinking about how hard it start off with something in the same orbit as the earth, and knock it out of that orbit into whose closest approach to the sun actually takes it inside the sun. We can worry later about getting it into this initial orbit.

Now, an orbit consists of an ellipse with one focus at the sun. Let's call the distance from the (centre of the) sun at perihelion (the closest point) \(r_1\), and the distance at aphelion (the farthest point) \(r_2\), and the orbital speeds at these positions \(v_1\) and \(v_2\) respectively. Then we have the (badly drawn) situation below:

Perihelion and aphelion convenient points to think about, because the orbital velocities at these points are perpendicular to the lines joining them to the focus. This makes it particularly easy to calculate the angular momentum there.

It's really quite hard to describe orbits properly: finding the position of an orbiting body as a function of time cannot be done in terms of the usual standard functions. However, we can avoid a lot of unpleasantness by being clever with two conserved quantities, the energy and angular momentum of the orbiting body. Since these are both conserved, they always have the value they have at perihelion.

At perihelion, then, the body has angular momentum and energy given by \[ \begin{split} L &= mv_1 r_1\\ E &= \frac{1}{2}mv_1^2 - \frac{GMm}{r_1} \end{split} \] where \(G \approx 6.67 \times 10^{-11}\text{m}^{3}\text{kg}^{-1}\text{s}^{-2}\) is Newton's gravitational constant, and \(M\) is the mass of the gravitating body (which for us will be the sun).

To reduce the algebra a little, we'll take advantage of the fact that everything is a multiple of \(m\), and think of the angular momentum and energy per unit of mass (i.e. per kilogram). That means we think about \[ \begin{split} l &= v_1 r_1\\ e &= \frac{1}{2}v_1^2 - \frac{GM}{r_1} \end{split} \]

A useful observation is that for a closed orbit, we must have \(e<0\); we'll save this fact up for later.

And now we'll do the sneaky thing of finding what the perihelion distance is for an orbit of a given angular momentum and energy.

The idea is simple enough: use the first equation to express \(v_1\) in terms of \(l\), then substitute that into the second equation to get a relationship between \(l\), \(e\) and \(r_1\).

This gives \[ e= \frac{l^2}{2r_1^2} - \frac{GM}{r_1} \] so that \[ r_1^2 +\frac{GMr_1}{e}- \frac{l^2}{2e} = 0 \]

So the solutions are given by \[ \begin{split} r_1 &= \frac{1}{2}\left( -\frac{GM}{e} \pm \sqrt{\frac{G^2M^2}{e^2} +2\frac{l^2}{e} }\right)\\ &= -\frac{GM}{2e} \left(1 \pm \sqrt{1+\frac{2l^2e}{G^2M^2}} \right) \end{split} \]

There are, of course, two roots to this quadratic: they are \(r_1\) and \(r_2\), the aphelion and perihelion values. (We could have gone through just the same process expressing \(l\) and \(e\) in terms of \(r_2\), and obtained the same quadratic.)

We should pay attention to signs here: we are taking a factor of \(|GM/e|\) out of the square root, but since \(e<0\), as mentioned above, this is \(-GM/e\).

Sanity check: for a circular orbit, the radius \(r\) is constant, and from Newton's law we obtain \(v^2/r = GM/r^2\). From this we find \[ e=-\frac{GM}{2r} \] and \[ l^2=v^2r^2 = GMr \] so that \[ \frac{2l^2e}{G^2M^2}=-1 \] and \(r_1=r_2\), i.e. the orbit is indeed circular.

Since \(r<0\), the square root term is smaller than \(1\), and we want the perihelion value, so we have \[ r_1 = -\frac{GM}{2e} \left(1 - \sqrt{1+\frac{2l^2e}{G^2M^2}} \right) \]

Now, let's see what the situation is for starting off with a body in the same orbit as the earth, and trying to adjust it so that it hits the sun. The earth's orbit is very nearly circular, so we'll just assume it is exactly circular, with constant radius and orbital speed.

By the magic of Google (or looking in a book), we find that the mass of the sun is\(M=1.99\times 10^{30}\text{kg}\), the radius of the sun is \(6.96 \times 10^{8}\text{m}\), the orbital radius of the earth is \(1.5 \times 10^{11} \text{m}\) and the orbital speed is \(2.98 \times 10^{4}\text{ms}^{-1}\).

In the diagram below, if the earth's orbit is the black one, we want to divert it to the red orbit by reducing the orbital speed from \(v\) to a smaller value \(u\), so that perihelion would occur inside the sun.

Now, we can set \(r_1 = 6.96 \times 10^{8}\text{m}\), and, noting that \(e\) and \(l\) are functions of the orbital speed, solve numerically for the orbital speeds which give a perihelion distance of less than \(6.96 \times 10^{8}\text{m}\). I asked a computer, and it told me that \(u\) must lie in \((-2858,2858)\), so we have to slow the orbit down by a change of speed \(\Delta v\) of at least \(29800-2858\approx 27000 \text{ms}^{-1}\).

For a kilogram of mass, that means we need to use \((\Delta v)^2/2 \approx 360\) MegaJoules to change its speed by enough. This is quite a lot of energy.

There's also the problem of getting the mass out of the earth's gravitational field: this takes about another \(62\) megajoules.

So in total, we need to use about \(420\) megajoules of energy to drop each kilogram of stuff into the sun.

For comparison, it requires about \(15\) megajoules to transport a kilogram of stuff to the ISS, in low earth orbit\(1\). An example from closer to home is that a megajoule is the kinetic energy of a vehicle weighing one tonne, travelling at about \(161\text{km}\text{h}^{-1}\).

So there we have it: we won't be just dropping our waste into the sun any time soon because apart from anything else, it's just so much work.


  1. You might wonder if it's possible to do better by pushing the object at an angle to its original orbit, rather than just by slowing it down. Unfortunately, the option discussed above is the one requiring the least energy.
  2. I've only considered the situation of giving the object an orbit which falls into the sun. Could one do better by arranging one which flies close past another planet and takes advantage of a slingshot orbit? Yes, and that was used to help send probes to Mercury. This can save energy, but it is a delicate manoeuvre; I'll leave you to try to work out if it might make this form of waste disposal more feasible.


Thanks to Colins @icecolbeveridge and @ColinTheMathmo for comments and feedback.

Monday, 11 March 2019

A matter of weight

What do you weigh? More to the point, what does that mean?

What is weight?

What it isn't: mass

Like many people, I find out what I weigh by standing on the bathroom scale. Usually, the answer is approximately 80 kilograms. Or so I try to convince myself.

Well, there's a problem with that. Kilograms is a measure of mass, not weight. But what's mass?

To answer this, we rely on Newton's laws. Mass is resistance to force. If you act on a mass, \(m\), with a force, \(\pmb{F}\), then it has an acceleration, \(\pmb{a}\), and the three are related by \[ \pmb{F} = m \pmb{a}. \]

Actually, things are a bit more complicated than that, but as long as the mass isn't changing, it's OK.

Then if I have a reliable way of exerting a known force, I can compare the masses of two different object by comparing the accelerations imparted to both. And if I have a reference mass (and there is one for the kilogram) then I can work out the mass of anything else I act on with this force in terms of the reference mass.

OK, so that's mass (well, the quick version) sorted out. But what is weight, and what is my bathroom scale really telling me?

What weight is absolutely

The first definition that you are likely to meet in a physics text is that it is the force exerted on a body by gravity: sometimes the actual force (as a vector), but more usually the magnitude of the force.

It is a remarkable fact that the force exerted by gravity on an object is exactly proportional to its mass, so that all objects, whatever their mass, fall with the same acceleration (in the absence of confounding factors like air resistance).

In SI units (i.e. those used by the civilized world), the unit of force is the Newton, and the unit of mass is the kilogram. At the surface of the earth, the force of gravity is approximately \(9.8 \text{N}/\text{kg}\); a mass of 1 kilogram weighs about \(9.8\) Newtons.

We can work with this. If I have access to materials which can be compressed or extended by acting on them with a force, then I can build a machine which measures the (magnitude of the) force I exert on it. And if I know that what people really want to know is their mass, I can calibrate it so that if the force acting on it is the force that gravity exerts on their body, then the scale shows the mass that gravity must be acting on, rather than the actual force.

This is typically how a bathroom scale works.

And is fine and excellent, as long as I know what the force of gravity is where I use the bathroom scale. There is a minor issue caused by the fact that the force of gravity isn't quite the same all over the surface of the earth; but the variation is fairly small (from \(9.78 \text{N}/\text{kg}\) to \(9.83 \text{N}/\text{kg}\), and for everyday purposes we can pretty much ignore this variation.

But now we bump up against a bit of a problem. For the scale to work properly, we have to stand still. You'll have noticed that if you jump up and down on it, the displayed weight oscillates along with you. It's less likely that you've taken it into a lift, but if you have you'll have noticed that when the lift has an upward acceleration you show a higher weight, and if it has downward acceleration you show a lower weight.

The snag is, of course, that to find your weight, the scale has to hold you stationary in the earth's gravitational field, and measure the force it takes to do that.

This raises a slightly awkward question: how do you know when the scale is holding you stationary?

This could well seem like a rather stupid question. All you have to do is look around.

But what if you are in an enclosed room? How could you tell the difference between being (say) on the moon, with a weight of approximately one sixth of your weight on earth, and being in a lift which has a downward acceleration of about five sixths the acceleration due to earth's gravity, or indeed being in a rocket a long way away from any body of mass, accelerating at about one sixth of the acceleration due to earth's gravity?

It turns out, that's really hard to do. You can't do it by making local measurements, which is a heavily coded way of saying that no matter how good your measuring apparatus is, if you work in a sufficiently small laboratory, over a sufficiently small time span, you can't detect the difference. (If you allow non-local measurements, which effectively means that you work in a region big enough that the force of gravity varies appreciably, then you can measure the resulting tidal effects: here is a nice look at tidal effects from the point of view of an orbiting astronaut, from @ColinThe Mathmo.

This is a bit of a blow. We have what looks like a perfectly sensible definition, but it turns out to be hard to use. So we do that standard trick. We replace it by a somewhat different definition, which is easier to use, but doesn't look quite so sensible.

At least, it doesn't look quite to sensible to start with. Once you get used to it, you can convince yourself that it is in fact the sensible definition, and it's the original definition that's misled.

What weight is relatively

So what do we do instead?

We accept that since what we can measure (reasonably) depends on the state of motion of the equipment and body, we build that into the definition.

We now define the weight of an object in a given frame of reference to be the (magnitude of the) force required to hold it stationary in that frame of reference.

So, if your scale says you weigh a particular amount, that's what you weigh: you don't have to know whether you're on the moon, or whether you have to compensate because the scale is accelerating.

This does have it advantages. Principally, it means that you weight what you feel as if you weigh. If you're in a rocket taking off, or visiting a heavy-gravity planet (we can dream) you weigh more if you're standing still on the earth. If you're in a lift plummeting you towards an untimely end, or on the moon, you weigh less than if you're standing still on the earth. And that's all there is to it.

So, to the question I've been avoiding up to this point: is an astronaut in orbit, say on a space station, weightless?

There are two answers to this.

  • No. The force on gravity on him is slightly less than it is on the surface of the earth, so he weighs a little less than he would at home.
  • Yes. If he tries to stand on a scale, it reads zero; it takes no force to keep him stationary in his space-station, so in the frame of reference of the space-station, he is weightless.

And both answers are correct. Whether the astronaut is weightless or not depends on the definition of weight that you prefer.

But in either case, the astronaut is definitely not outside the gravitational influence of the earth; gravity is still pulling on him with a very significant force.

I prefer the second definition, because it's the one that doesn't require you somehow to know the situation outside your laboratory; it only requires you to make local measurements. But that doesn't make it right, only preferable from some points of view.

At least, that's the situation if you live in a universe where gravitation acts according to Newtonian theory. But in fact we don't really, and there's a better way of trying to understand gravity, and then we have a better reason for preferring the second definition.

What weight is relativistically

There's a great mystery in Newtonian mechanics: why is it that the mass which interacts via gravity is exactly the same as (or at least, exactly proportional to) the mass which resists acceleration? Very sensitive experiments have been carried out to test this, and they've always come out unable to detect a difference. They agree to no worse that one part in one hundred billion.

So, why should this be?

The current answer is that there is a better way to think about gravitational influence than as a force. In general relativity, we have a way of relating the distribution of matter to a distortion in space and time, which changes the notion of a straight line: we generalize this to a kind of curve called a geodesic. So a particle moving in a gravitational field is one which travels along a geodesic: there is no force acting on it, it's just (just!) that the nature of space and time mean that it doesn't travel along the same trajectory as it would in the absence of gravity.

Then any particle (with no external forces acting on it) travels along a geodesic which is determined purely by a starting point and a starting velocity; mass does not have a part in the trajectory of a the motion of a particle in a gravitational field. Of course, part of the joy of this is that it does give very nearly the same answer as Newtonian gravity.

Very nearly, but not quite the same as Newtonian gravity. And with careful observation we find that the general relativity version of gravity matches measurements better than the Newtonian one.

But now there is no force of gravity.

If I'm standing on the surface of the earth, the only force acting on me is the force of the earth pushing me off the geodesic that the curvature of space and time wants me to travel along. If I'm on the space station, there is no force acting on me at all.

So in this framework, the only definition of my weight that makes any sense is that it is the magnitude of the force required in some frame of reference to keep me stationary in that frame of reference.

This looks superficially the same as the second definition in Newtonian gravity, but it's really quite different in principle. In the Newtonian case, it's avoiding the issue that there's no easy way to access the actual force of gravity, and settling for what is easy to measure. In the general relativistic case, it's saying that weight is an illusion caused by being in an unnatural frame of reference.

I try to take some consolation from that as my weight seems to gradually, but inexorably, increase.