How should we think of the second derivative, and what does it tell us?

# The Derivative

Before worrying about higher derivatives, let's remember what the first derivative is and does.

We say that a function \(f:I \to \mathbb{R}\) (where \(I\) is some open interval in \(\mathbb{R}\)) is differentiable at \(x \in I\), with derivative \(f'(x)\) if \[ \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} = f'(x). \]

Why do I say \(I\) is an open interval?

It's just to make sure that if \(x \in I\), then for \(h\) sufficiently small, \(x+h\) is also in \(I\). You can also work with the endpoint of an interval, but then you have to worry about one-sided limits, and it's a bit messier. Open intervals are more convenient.

With all this in hand, it's not too hard to show that a function \(f\) is differentiable at \(x\), with derivative \(f'(x)\) if (and only if) \[ f(x+h)=f(x)+hf'(x)+o(h) \] where \(o(h)\) is an error term with the property that \[ \lim_{h \to 0} \frac{o(h)}{h} = 0 \] so that the derivative is the best linear approximation to changes in the values of \(f\), if one exists.
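As a quick numerical sanity check (my own illustration, using \(\sin\) as a stand-in smooth function), we can watch the \(o(h)\) condition in action: the error of the linear approximation, divided by \(h\), shrinks as \(h\) does.

```python
import math

# Sketch: for the smooth function sin at x = 0.3, the error of the
# linear approximation f(x) + h f'(x) should be o(h),
# i.e. error/h -> 0 as h -> 0.
x = 0.3
f, fprime = math.sin, math.cos
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    error = f(x + h) - (f(x) + h * fprime(x))
    print(f"h={h:g}  error/h={error / h:.2e}")  # shrinks roughly like h
```

The ratio decreases by about a factor of ten each time \(h\) does, which is what "error term smaller than linear" looks like numerically.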

I claimed in an earlier post that this best linear approximation is also the best way to think about the derivative.

# The second derivative

So, suppose we have a function \(f:I \to \mathbb{R}\), and it's differentiable at each \(x \in I\). Then we have a new function, \(f'\), given by \(f':I \to \mathbb{R}\), which gives the derivative of \(f\) at each \(x \in I\).

Now, there's no particular reason for \(f'\) to be differentiable. Indeed, there's no obvious reason for it even to be continuous, and it doesn't have to be. But it can be, and that's the case we'll think about now.

A good trick is worth repeating. So, given \(f':I \to \mathbb{R}\), we say \(f'\) is differentiable at \(x \in I\) with derivative \(f''(x)\) if \[ f''(x) = \lim_{h \to 0} \frac{f'(x+h)-f'(x)}{h} \] and we call this the second derivative of \(f\) at \(x\).

And then it turns out that when this happens, we have \[ f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2) \] where \(o(h^2)\) is now an error term which has the property that \[ \lim_{h \to 0}\frac{o(h^2)}{h^2}=0. \]
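The same numerical check works one order higher (again my own illustration with \(\sin\)): once both \(f'(x)\) and \(f''(x)\) are included, the error divided by \(h^2\) tends to zero.

```python
import math

# Sketch: with both f'(x) and f''(x) in the approximation, the error
# of f(x) + h f'(x) + (h^2/2) f''(x) should now be o(h^2).
x = 0.3
f = math.sin
f1, f2 = math.cos(x), -math.sin(x)  # f'(x) and f''(x) for f = sin
for h in [1e-1, 1e-2, 1e-3]:
    error = f(x + h) - (f(x) + h * f1 + (h**2 / 2) * f2)
    print(f"h={h:g}  error/h^2={error / h**2:.2e}")  # shrinks roughly like h
```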

In other words, the first and second derivative between them give the best quadratic approximation to changes in \(f(x)\). (Repeating this idea with more derivatives and higher-order polynomial approximations leads to the idea of jets, about which I will say no more: follow the link if you are intrigued.)

This is very nice. We can now think of the second derivative as the correction which gives us a best quadratic approximation to the values of \(f(x)\), and use this *best quadratic approximation* as an alternative definition: \(f:I \to \mathbb{R}\) is twice differentiable at \(x\) with first and second derivatives \(f'(x)\) and \(f''(x)\) if
\[
f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2)
\]

Except that we can't. Unlike the case of the first derivative, this does not characterize second derivatives. If a function has a second derivative, then we get a best quadratic approximation. But this time we can't make the reverse argument: the existence of a best quadratic approximation does not imply that a function can be differentiated twice.

# Monstrous Counter-examples

There's a useful function (at least, useful if you're looking for counter-examples to do with continuity) defined on \(\mathbb{R}\) as follows: \[ f(x) = \left\{ \begin{array}{ccc} 0 & \mbox{if} & x \in \mathbb{Q}\\ x &\mbox{if} & x\in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \] where \(\mathbb{Q}\) denotes the set of rational numbers. This function has the (maybe surprising) property that it is continuous at \(0\), but nowhere else. (If you enjoy pathological functions like this, there are more here.)
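Here is a sketch of why that continuity claim holds. Floating-point numbers are all rational, so we can't feed a computer genuine irrationals; instead (my own illustration) we model the two kinds of approach to a point \(a\) by hand, using \(a + \sqrt{2}/n\) as a stand-in irrational sequence when \(a\) is rational.

```python
from fractions import Fraction

# f(x) = 0 for rational x, and f(x) = x for irrational x.
# We model each branch of f directly, one per kind of input.
def f_rational(x):    # f on rational inputs
    return Fraction(0)

def f_irrational(x):  # f on irrational inputs
    return x

a, n = 1.0, 10**6
print(f_rational(Fraction(1)))        # approach a = 1 through rationals: f = 0
print(f_irrational(a + 2**0.5 / n))   # approach through irrationals: f ~ 1
# The two limits differ at a = 1, so f is discontinuous there;
# at a = 0 both branches tend to 0 = f(0), giving continuity at 0 only.
```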

We can build on this. Let's consider the function \(g\) given by \(g(x)=f(x)^2\), and think about what happens at \(x=0\). We have \[ g(0+h)= \left\{ \begin{array}{ccc} 0 & \mbox{if} & h \in \mathbb{Q}\\ h^2 &\mbox{if} & h\in \mathbb{R} \setminus \mathbb{Q} \end{array} \right. \]

Then \[ \begin{split} |g(0+h)-g(0)-h \times 0| &= \left\{ \begin{array}{ccc} 0 & \mbox{if} & h \in \mathbb{Q}\\ h^2 &\mbox{if} & h\in \mathbb{R} \setminus \mathbb{Q} \end{array} \right.\\ & \leq h^2 \end{split} \] and therefore \[ g(0+h)=g(0)+ h\times 0 +o(h), \] i.e. \(g\) is differentiable at \(0\) with derivative \(0\).
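The difference quotient calculation can be sketched in the same branch-by-branch style (again my own illustration, with \(\sqrt{2}/n\) as the stand-in irrational sequence):

```python
from fractions import Fraction

# The difference quotient (g(0+h) - g(0)) / h at 0, one branch per
# kind of increment h.
def quotient_rational(h):    # g(h) = 0 for rational h
    return Fraction(0) / h   # quotient is exactly 0

def quotient_irrational(h):  # g(h) = h^2 for irrational h
    return (h * h) / h       # quotient equals h itself

for n in [10, 1000, 100000]:
    print(quotient_rational(Fraction(1, n)),  # always 0
          quotient_irrational(2**0.5 / n))    # sqrt(2)/n, tending to 0
# Both branches of the quotient tend to 0, so g'(0) = 0.
```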

But \(g\) is not even continuous at any other value of \(x\). So we have a function which is differentiable at exactly one point.

Let's run with this ball.

Consider the function \(k\) defined by \(k(x)=f(x)^3\). (We avoid calling it \(h\), since \(h\) is already doing duty as the increment.)

Again, this function is continuous at \(0\) but nowhere else. Since it isn't continuous away from \(0\), it isn't differentiable at any point other than \(0\); in particular \(k'\) doesn't exist on any neighbourhood of \(0\), so \(k\) can't be differentiated twice at \(0\): a differentiable function must be continuous.

But on the other hand,
\[
\begin{split}
\left|k(0+h) - k(0) - h \times 0 - \frac{h^2}{2} \times 0\right|
&= \left\{ \begin{array}{ccc} 0 & \mbox{if} & h \in \mathbb{Q}\\ |h^3| &\mbox{if} & h\in \mathbb{R} \setminus \mathbb{Q} \end{array} \right.\\
& \leq |h^3|
\end{split}
\]
so that
\[
k(0+h) = k(0) + h \times 0 + \frac{h^2}{2} \times 0 + o(h^2)
\]
and so \(k\) **does** have a best quadratic approximation at \(0\), even though it is differentiable only once there.
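The key inequality is small enough to check at a glance (a trivial sketch, but it makes the \(o(h^2)\) condition concrete): the worst-case error of the quadratic approximation to the cubed function at \(0\) is \(|h|^3\), and \(|h|^3/h^2 = |h| \to 0\).

```python
# Worst-case error |h|^3 (attained for irrational h), divided by h^2:
# the ratio equals |h|, which tends to 0 -- exactly the o(h^2) condition.
for h in [0.1, 0.01, 0.001]:
    print(h, abs(h) ** 3 / h ** 2)  # equals |h| up to rounding
```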

# Conclusion

So, what should we make of this?

We know that a best linear approximation really is a first derivative; being *linearly approximable* is equivalent to being differentiable.

But this doesn't work for higher degree approximations. A function may be *quadratically approximable* without being twice differentiable. The two notions are now separate.

We might say that the function has a second derivative even though it cannot be differentiated twice, but that would probably lead to unnecessary confusion. It's better to accept that, for higher derivatives, the relationship between approximations and derivatives doesn't hold in the same way as it does for the first derivative.

You might think that this doesn't matter much in practice, since the functions we actually use don't have these weird continuity or differentiability properties. I'd be hard put to argue with that.

Nevertheless I think it's an instructive example of something you might reasonably expect to be true turning out not to be, and of the kind of weirdness that you can find when you allow yourself to consider functions that aren't simple combinations of analytic ones.