Friday 28 April 2017

What Gödel told us (and what he didn't).

Gödel's incompleteness theorem famously tells us that there are true statements about the integers which cannot be proven. This description rather invites the question of how one can know that a statement is true if it cannot be proven. I will try to give a description of Gödel's theorem which, although incomplete, at least tells no significant lies, and tries to make it fairly clear just what the theorem says.

First, I will describe the standard integers, then give a brief description of what we mean by "provable" in this context. Once the scene is set, we are in a position to see what Gödel's incompleteness theorem actually says. Finally, armed with a reasonably accurate statement of Gödel's theorem, we can see some of its consequences; and also understand better what is not a consequence! As a reward to the patient reader (or the clever skimmer) there is also a description of an interesting statement which is true, but not provable.

What are the integers?

In this context, I mean only the non-negative ones, and we all know what they are. They are the objects \(0, 1, 2, \ldots\) and we can add them together and multiply them. There are various rules for how the numbers behave, and a precise description was given by Giuseppe Peano:
We have a set, \(\mathbb{N}\), a function \(s:\mathbb{N} \rightarrow \mathbb{N}\), a special element \(0 \in \mathbb{N}\), and two operations, \(+\) and \(\times\) (addition and multiplication). These all fit together in the following way, where we use the mathematical symbol \(\Rightarrow\) as a shorthand for "implies".

Given any \(m,n \in \mathbb{N}\),
  1. \(s(n) = s(m) \Rightarrow n = m\)
  2. \(0 \neq s(n)\)
  3. \(n+0 = n\)
  4. \(s(n+m) = n+s(m)\)
  5. \(n \times 0 = 0\)
  6. \(n \times s(m) = n + n \times m\)
  7. If \(K\) is any subset of \(\mathbb{N}\) such that (i) \(0 \in K\), and (ii) \(n \in K \Rightarrow s(n) \in K\), then \(K\) is in fact all of \(\mathbb{N}\).
The function \(s\) takes each number to the next, its successor (so \(s(n) = n+1\)). The first property says that no two different numbers have the same successor, the second says that \(0\) is not the successor of any number, and the following pairs give recursive definitions of addition and multiplication; these just make sure that addition and multiplication behave as usual. The last is called the inductive property, and is what makes proof by induction work.
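If it helps to see the recursive definitions 3 to 6 in action, here is a small illustrative sketch in Python (illustrative only: it cheats by using Python's built-in integers, and \(m - 1\) stands for "the number whose successor is \(m\)"):

def s(n):
    """The successor function: s(n) = n + 1."""
    return n + 1

def add(n, m):
    """Addition via properties 3 and 4: n + 0 = n, and n + s(m) = s(n + m)."""
    return n if m == 0 else s(add(n, m - 1))

def mul(n, m):
    """Multiplication via properties 5 and 6: n * 0 = 0, and n * s(m) = n + n * m."""
    return 0 if m == 0 else add(n, mul(n, m - 1))

print(add(3, 4), mul(3, 4))  # prints: 7 12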

So, these axioms seem to capture what we mean by the integers: but it would be nice also to know that there is such a collection of objects and operations, in other words, to know that the properties we associate with the integers are in fact consistent! Luckily for us, John von Neumann gave a specific example of a set and a successor operation that satisfy all these properties: \(0\) is the empty set \(\emptyset\), the successor of any integer is the set of all integers up to and including that one, and the integers comprise all sets made up in this way.

So, \(0 = \emptyset\), \(1 = s(0) = \{0\}\), \(2 = s(1) = \{0,1\}\) and so on. Then it can be shown that the above requirements are categorical, which means that any other set with a successor function satisfying all 7 properties is just the same as the von Neumann one, except that we may have chosen different symbols for the numbers and functions. The integers so specified are what are called the standard integers. The mathematical term for a relationship between two structures which is really just a relabelling is an isomorphism; we say that any two sets with successor functions satisfying the above properties are isomorphic.
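The construction is concrete enough to carry out directly. Here is a sketch in Python using frozensets (a toy model, of course: Python's sets are not the sets of axiomatic set theory):

def successor(n):
    """The von Neumann successor: s(n) is the union of n and {n}."""
    return n | frozenset([n])

zero = frozenset()        # 0 is the empty set
one = successor(zero)     # {0}
two = successor(one)      # {0, 1}
three = successor(two)    # {0, 1, 2}

# each number is the set of all smaller numbers, so its size is its value
print(len(zero), len(one), len(two), len(three))  # prints: 0 1 2 3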

What is truth?

Then we can say that a property of numbers is true if it is true for the von Neumann model of the integers. It may be hard to tell, for any given property, whether or not it is true, but I hope we can all agree that the truth or falsity of any statement about these objects is well defined, and our ignorance is just our problem.

Then what about provability? A statement about the integers is provable if there is an argument, from an agreed set of axioms describing them and using an agreed set of rules of inference, that leads us to the statement. Together with the above properties, we also require a well defined set of rules governing how we are allowed to argue about them: this set of rules is called first order predicate calculus. We need not concern ourselves with the exact nature of this set of rules here---it is enough for our purposes to note that such a set of rules exists. Those properties which matter to us will be introduced as necessary.

To rephrase slightly, truth is to some extent an accidental property; you look at the numbers in question and see whether the property holds. Truth is a property of the meaning of the statement, and so is what we call a semantic property. Provability, on the other hand, is structural; it follows explicitly from the grammatical form of the sentence, without reference to the particular objects we are supposed to be arguing about. Since the construction of grammatically acceptable sentences is called syntax in linguistics, we extend the use of this word slightly, and say that provability is a syntactic property.

All this can be encapsulated in the slogan that truth is a semantic issue, while provability is a syntactic one. There is a snag, though. When we consider formal proof, the tool is first order predicate calculus; this allows us to quantify over elements of the universe of discourse, but not over sets of elements. Note that property \(7\) quantifies over all subsets of \(\mathbb{N}\); first order predicate calculus, which is basically all that standard logic has, cannot express this. We have to replace that property by another one. One way of doing this is to use \(7'\) below:

\(7'\) If \(P(n)\) is any statement about the integer \(n\) such that (i) \(P(0)\) is true, and (ii) \(P(n) \Rightarrow P(n+1)\) for all \(n \in \mathbb{N}\), then \(P(n)\) is true for all \(n \in \mathbb{N}\).

This replaces the notion of a subset of \(\mathbb{N}\) by a sentence making a statement about an unspecified integer. Now, such a sentence is a statement which can be finitely expressed using the notions of successor, addition, multiplication and logical operations. So, we replace a single axiom referring to all subsets with what is called an axiom scheme, which represents infinitely many axioms, one for each sentence \(P\). But it turns out that \(7'\) is much less powerful than \(7\): there are uncountably many subsets of \(\mathbb{N}\), but only countably many sentences which can be constructed as described above.

It is then perhaps not surprising that the set of axioms obtained by replacing \(7\) by \(7'\) cannot be used to prove all the true statements about \(\mathbb{N}\). The surprising thing is that we cannot even prove all the true statements that can be expressed using the more restricted language.

We now say that a statement about \(\mathbb{N}\) is a theorem if it is the last in a sequence of statements, each of which is either one of the axioms \(1 \ldots 6\), or an instance of \(7'\), or an axiom of the predicate calculus, or follows from previous statements in the sequence using the rule of inference modus ponens (which simply states that if we know \(A\) and we know that \(A \Rightarrow B\), we can deduce \(B\)). The precise nature of the axioms of predicate calculus is irrelevant; you can chase up this detail in any standard logic text if you wish. The theory obtained by using \(1 \ldots 6, 7'\) with all the apparatus of first order predicate calculus is called first order Peano arithmetic, or PA for short.

The point is that for those who are reluctant to consider semantic arguments, we have syntactic ones which depend only on the structure of the theory, not on the actual objects we are considering. This is regarded as more reliable by logicians. Many of us may find this somewhat perverse, but then, logicians are paid to be perverse.

Gödel's Insight

On with the story. We can think about relations between pairs of integers (such as \(n \gt m\), or \(m=2\times n\)). Any relation will be true for some pairs, and false for others. A relation can be encoded as a subset of the set of all pairs; the relation is true if the pair is in the subset, false otherwise. Again, there are many such relations, and a smaller (though still infinite) family which can be expressed in finite terms using the notions of addition, multiplication, successor, and the basic notions of predicate calculus. These are called expressible relations, or number theoretic relations.

What Gödel did was to cook up a particularly wonderful expressible relation. To see why it is wonderful requires a very brief digression, to explain what a Gödel numbering is.

Any string involving the symbols used in PA can be associated with a unique integer, in an effective way. This means that given any string the integer can be computed in finite time by mechanical means, and that given any integer it is possible to decode it back to the original string (or to say that it does not correspond to one), again in finite time by a mechanical process. In particular, any proof in PA corresponds to a number, as does any single statement. What Gödel found was a number theoretic relation (so it is just some complicated statement involving successor, addition, multiplication etc, that can be checked by pure arithmetic) with an almost magical property.

He found a function, \(G(m,n)\), which is zero precisely when \(n\) is the Gödel number of a proof of the statement with Gödel number \(m\). Note that this is not the definition of \(G(m,n)\): \(G(m,n)\) is defined purely in terms of arithmetic properties of \(m\) and \(n\). This means that there is some purely arithmetic combination of \(m\) and \(n\) which is zero exactly when the string encoded by \(n\) is a proof of the statement encoded by \(m\). This combination doesn't know anything about the statements that are encoded via the Gödel numbering---it just works out some combination of sums, products and so on of the two numbers it is fed. But we, knowing about the Gödel numbering, can interpret it as saying something about strings in PA.
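The relation \(G\) itself is far too complicated to write down here, but the idea of an effective numbering is easy to illustrate. The following Python sketch is a toy encoding of my own (not Gödel's prime-power scheme): it turns any string of symbols into a unique integer and back again, mechanically and in finite time.

def goedel_number(s):
    """A toy effective encoding: read the string's bytes as one base-256 integer."""
    return int.from_bytes(s.encode('ascii'), 'big')

def decode(n):
    """The inverse: recover the original string from its number."""
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode('ascii')

n = goedel_number('0=0')
print(n)          # prints: 3161392
print(decode(n))  # prints: 0=0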

Next step: consider the new sentence, \(S(m)\), which says that "no matter what \(n\) you consider, \(G(m,n)\) is never zero"; we write this \(\forall n, G(m,n) \neq 0\). (The mathematical symbol \(\forall\) is pronounced "for all".) Again, this can be given a Gödel number, since it can be expressed in terms of PA. Call this Gödel number \(i\), and consider \(S(i)\). Is this statement provable? Well, suppose it had a proof. That proof would have some Gödel number, say \(j\). Then \(G(i,j)=0\), contradicting \(S(i)\). So \(S(i)\) cannot have a proof. But we can see that in the interpretation via Gödel numbers, that's just what \(S(i)\) says, and so it is in fact true, i.e. it is a true statement about standard integers. But it cannot be proven within PA.

Actually, that's not quite true. What's actually true is that if PA is consistent, then it cannot be proven within PA. If PA is not consistent, then there's no problem with having a proof of a statement and a proof of its negation too. Of course, we believe that PA is consistent, because we have the von Neumann integers as an example that satisfies the axioms of PA.

So now we know that there are true statements about the standard integers which can be expressed in the language of PA, but not proven in PA.

What are the implications?

First, we note that we managed to argue that as long as PA was consistent, we could deduce that \(S(i)\) had no proof in PA. It turns out (again, not trivially) that the statement that PA is consistent can be made within PA; we call this statement \(\mbox{Consis}\). But there is a standard logical result called the deduction theorem that enables us to convert "assuming \(A\) we can deduce \(B\)" into a proof of "\(A \Rightarrow B\)". So what? Well, suppose we had a proof within PA that PA was consistent. Then we could chain this proof of \(\mbox{Consis}\) together with a proof of \(\mbox{Consis} \Rightarrow S(i)\) to get a proof of \(S(i)\), which we know to be impossible. Thus although the statement that PA is consistent can be made within PA, it cannot be proven within PA.

Second, you can't fix this just by adding in the Gödel statement as a new axiom. Exactly the same process will enable the construction of a new Gödel sentence which is true but not provable within the enlarged system.

It also tells us that we can't prove that axiomatic set theory is consistent, using only the language of axiomatic set theory; this is because PA lives naturally inside set theory, and so the incompleteness theorem still applies.

It also tells us that PA has nonstandard models, i.e. that one can find a set, \(M\), and a successor function, \(t\), which satisfy all of \(1 \ldots 6,7'\), but where \(M\) is not isomorphic to \(\mathbb{N}\); and also that there are statements true of this model but not of the standard integers. This is also not obvious, but it does follow from standard results in model theory, which is all about attaching semantics to syntactic structures.

There are also quite a few things it doesn't tell us.
  • First, and most importantly, it doesn't tell us that we can never be sure of a proof; all it tells us is that first order reasoning can't justify itself completely.
  • It doesn't tell us any interesting statements that are true of the standard integers but not provable in PA, and
  • It doesn't even tell us if there are any.

The proof of the pudding

In fact, there are some. Here, as a reward for sticking with this (or for your acuity in jumping straight to here) is Goodstein's theorem, which is one (really quite astonishing) example.

Let \(n\) be any number, and construct a sequence beginning at \(n\) as follows. First, write \(n\) as a sum of powers of \(2\); write each of the exponents in this sum as a sum of powers of \(2\); keep on going as long as you can. (Don't forget that any positive integer to the power of \(0\) is \(1\).) So, for example, with \(n = 37\), we get \[ 37 = 2^{2^2+1} + 2^2 + 1 \] Now repeat the following pair of operations:
  1. increase the base for this sum of powers expression by \(1\)
  2. subtract \(1\) from the result
The first operation gives \[ 3^{3^3+1} + 3^3 + 1 \] and the second gives \[ 3^{3^3+1} + 3^3 \] Next, we get \[ 4^{4^4+1} + 4^4 - 1 = 4^{4^4+1}+3 \times 4^3 + 3 \times 4^2 + 3 \times 4 + 3 \] Proceeding with this for three more iterations, we reduce the \(3\) to \(0\), at the expense of all the \(4\)'s rising to \(7\). At the next iteration, the \(3 \times 7\) becomes \(2 \times 8 + 7\), and we gradually kill off the trailing \(7\), which is now no longer affected by the growth of the base.

It seems obvious that the increase of the base makes the number much bigger, and all the second operation does is to subtract \(1\), so the number will just keep getting bigger and bigger. Trying it out starting with \(n=4\) will probably confirm this opinion.

Astonishing though it may seem, this process eventually reaches \(0\), no matter what \(n\) you start with. Roughly speaking, the "subtract \(1\)" operation eventually kills off each tower of exponentials, no matter how tall it is, knocking it down to something which is not affected by the next increase of base. It isn't quick, though.
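For those who want to watch the process happen (for small starting values!), here is a Python sketch; the names are my own, and bump is the step that rewrites a number in hereditary base-\(b\) notation and replaces every \(b\) by \(b+1\).

def bump(n, b):
    """Write n in hereditary base-b notation, then replace every b by b+1."""
    if n == 0:
        return 0
    result, exp = 0, 0
    while n > 0:
        digit = n % b
        if digit:
            # the exponents are themselves rewritten hereditarily
            result += digit * (b + 1) ** bump(exp, b)
        n //= b
        exp += 1
    return result

def goodstein(n, steps):
    """The first `steps` terms of the Goodstein sequence starting at n."""
    terms, b = [], 2
    for _ in range(steps):
        terms.append(n)
        if n == 0:
            break
        n = bump(n, b) - 1
        b += 1
    return terms

print(goodstein(3, 10))  # prints: [3, 3, 3, 2, 1, 0]
print(goodstein(4, 6))   # prints: [4, 26, 41, 60, 83, 109] -- and it keeps growing for a very long time yet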

But this fact cannot be proven in PA. It is true of the standard integers, and can be shown to be true by arguments which are more powerful than those available to PA. The proof is non-trivial, and requires a technique called transfinite induction.

And finally ...

This has already been too long; but I hope I've managed to get across some of the subtlety of Gödel's work, and just how mathematical notions of truth and provability are related.

For those who want more

R Smullyan (1988) Forever Undecided: a puzzle guide to Gödel
T Franzen (2005) Gödel's Theorem: an incomplete guide to its use and abuse
Wikipedia page on Gödel's theorems
Wikipedia page on Goodstein's theorem

Thursday 20 April 2017

There's so much maths to learn...or is that so little?

The problem of remembering

Observation has led me to the conclusion that there are two very different kinds of mathematics student.
  1. One kind thinks one of the disadvantages of mathematics is that there is so much to learn.
  2. The other thinks one of the advantages of mathematics is that there is so little to learn.
The strange thing is that both are correct: where they differ is the way that they approach the process of learning the mathematics.

The student of the first kind is the one who simply tries to learn everything by heart, whether it is a brute fact, a concept, or a standard proof. It has to be memorized exactly, in the way one learns a poem in a language one doesn't speak.

The student of the other kind has different strategies for remembering different types of information.

I see it as part of my job to try to help students of the first kind become students of the second. So, how to go about that?

The types of things that need remembering

I'll break the types of information up into four categories, each with its associated method of learning. To a first approximation, the first two categories are school level, and the second two undergraduate level. But that's only a first approximation, and some particular items will straddle categories, in some cases depending on the level of the learner.
  1. Arbitrary Stuff, eg order of precedence for arithmetic operators, names of functions...
  2. Formulae and Statements of Theorems, eg binomial expansion, trig identities, integrals and derivatives, product rule for differentiation...
  3. Conceptual Definitions, eg limit, continuity, differentiability ...
  4. Proofs, eg sum of limits is limit of sums, a bounded monotone sequence has a convergent subsequence ...


How to remember them

One approach to the problem of having stuff to learn is simply to deny the necessity. After all, you can just look things up whenever you need to know them, so you don't need to remember them. Alas, that just isn't so. You need stuff to be part of your mental toolkit in a way that makes using it effortless so that you can devote the effort to solving the problem that you need to know the stuff for. If you're spending significant effort on trying to hold in short term memory things which aren't bedded into your long term memory, then you're robbing the real job of that effort.

So we accept that some things just have to live in our heads, and need strategies for getting them there. If possible, the strategies ought to be efficient.

Type 1

The things of the first category are what I think of as 'brute facts'; there's no underlying pattern or reason to aid you in committing them to memory, you just have to do it.

I'll be blunt: I am not an enemy of memorization or rote learning. This is not a guilty confession, it is a simple statement. In some contexts memorized material can be useful: it's a Good Thing to have a mind furnished with interesting bits of literature, or to know the general timeline of an aspect of history, or to know that Bach precedes Beethoven chronologically, for example. And there isn't really any way out of it for some things. If you want to remember the order of precedence of arithmetic operations, or which of \(\sin(\theta)\) and \(\cos(\theta)\) is the \(x\) and which is the \(y\) coordinate of a point on the unit circle an angle \(\theta\) round from the \(x\) axis, or the definition of \(\sinh(x)\) in terms of \(e^x\) and \(e^{-x}\), then you pretty much just have to commit them to memory.

Mnemonics help, and flash cards with spaced rehearsal give a very effective method for fixing these things in memory. If there's a meaningful picture, draw it. (This goes for all categories!) Constant usage will also help: and why would you memorize something you don't need to use a lot?

Type 2

In the second category I place standard formulae and statements of theorems: I'll just refer to both as results here. Some are easily derived, others not so much. I am not in favour of using mnemonic devices to remember results, as I think these can distract from the meaning. For these, I recommend keeping a list (or lists), and referring to it/them at need.

But it is important to rehearse mentally the meaning of the result every time it is looked up. It is the meaning that needs to be remembered, not a literal pattern of words, letters and function names. (As an aside, I can say I find it distressing to see a student confused by being asked to differentiate \(v/u\) when they remember the quotient rule as something that tells them how to differentiate \(u/v\).)

It's also important to look for patterns, or special cases that make the general case easier to remember. I never learned the addition rules for \(\sin\) and \(\cos\), but I did learn the double angle formulae and then worked back to the 'obvious' thing that reduced to the double angle formulae.

Patterns are important. (I know I just said that.) When a pattern is spotted, the list should be rearranged to take advantage of that: put related results together with a comment on the relationship.

And sometimes it is easy to derive a result; whenever it is, the derivation should be worked through every time the result has to be looked up.

As before, if there is a good picture which illustrates the result, draw it and think about the relationship.

All this sounds like effort, and it is. But the effort of processing the formulae over and over makes it much easier to remember them. In the long run, it is more effective and gives more meaningful knowledge of the formulae than rote memorization of the strings of symbols does. It's also a lot less drudgery (and more interesting!) than writing formulae out five hundred times and hoping that that will make them stick.

Type 3

The third category is less commonly encountered in secondary school. It contains definitions of mathematical ideas: concepts like limit, continuity, differentiability, for example, or objects like metric or topological spaces where there is a significant amount of structure to work with.

I suspect that part of the problem faced by many undergraduates is that their strategies for remembering mathematics have been developed to deal with what is mostly information of the first two kinds. However, as they progress, the type of material that has to be remembered increases considerably in complexity, and they may not receive much explicit guidance in how to cope with this new problem. Certainly it took me a long time to realize that it was an issue, and so to start to provide such guidance.

Again, I strongly recommend to my students that they don't try to learn off by heart the definition I give in class. I tell them to think about what the definition means, re-express it in their own words, in different ways, draw pictures, think of examples of things which satisfy the definition (and of things which don't); then to try to distill out a brief expression of the idea, and use that as a skeleton for a formal statement.

This advice is heavily biased by the fact that the only way I can retain this type of material is by working with the idea until it is a natural tool, and then I can construct a formal definition whenever it is needed. And it is rarely expressed exactly the same way as the previous time.

Underneath it all, it isn't really a matter of learning the definition. It's a matter of absorbing the definition by working with it, and then constructing a statement of the definition when it is required.

Type 4

Learning proofs is something of a bugbear for many students. They often seem to see being asked to produce proofs as part of an exam as analogous to having some jack-booted fascist insist that they jump through a flaming hoop just because they can. So the first task is to explain that there is a rationale to why knowledge of some proofs is required: it is because they use techniques, or ideas, which can also be used in other situations. It helps to make this plausible if the proof techniques are in fact used in other situations.

So, let's assume that the students have been persuaded that knowing how to prove some standard results is a Good Thing for reasons other than to obtain an extra 5 marks in the exam. (In my experience, if the promise of 5 marks is the only motivation, it is a pretty weak one, judging by the level of performance it elicits.)

Again, I strongly advise the students against trying to learn the proof off by heart.

The first thing I advise is to work through the proof to understand it. This seems obvious, but to at least some students it seems like additional work, not a labour-saving device. It's hard to remember something you don't understand.

But there is the question of what it means to understand it. A necessary, but not sufficient, condition is that the logic of the step from each line to the next be clearly understood. But that certainly isn't enough. In addition to that there has to be an understanding of how the steps accumulate to provide the proof. There is generally a core idea that the structure of the proof is built around, and a firm grip of that central idea (or possibly ideas), together with an understanding of how the individual steps realize it, combines to give a solid feel for the truth of the result. Ideally, it shouldn't just be an intellectual acceptance that the steps go from premises to conclusion, so the conclusion must be right, but a more visceral feeling for the truth of the conclusion.

And once the proof has been thought through, wrestled with, and boiled down to a core idea or two, there should be no necessity to 'learn the proof'. The statement of the theorem automatically throws up the core ideas of the proof, and fleshing those ideas out to a formal proof is then just (!) a matter of putting in the detail.

The philosophy here is similar to that for Type 3; you don't learn the proof, you wrestle with it and work with it until it is absorbed into your mathematical toolkit. Eventually you remember it. At no point do you memorize it.

Disclaimer

It may have been patently obvious, but the above is not the result of me carrying out carefully designed scientific experiments into just which strategies for learning different types of material are most effective. Some of it is based on reading about how memory works, some on what works for me, and some on the responses I've received to advice I've given to students in the past.

Sunday 16 April 2017

The proof goes this way. Or that way. It depends on what you're proving.

It seems that I'm not alone in my experience of students not presenting (or appreciating) the logical structure of a mathematical argument, so I'd like to thank everybody who responded to Which way do I prove that? for their comments, and for inadvertently reassuring me that it isn't my teaching (or at least, it isn't only my teaching) that's to blame. Phew.

So, what to do about it?

My preferred solution would be to have A-level marking require them to get it right, to reinforce what school teachers tell them. If the teaching wasn't undermined by the assessment, they'd be more likely to take the logical structure seriously in the first place, and form good habits rather than bad. Unfortunately, that's outside my control.

Next up is a heartfelt plea to teachers to at least insist on it being done in class/homework, and explicitly mark down when it isn't done right. That's partly because then it won't be my problem any more (which is always a good thing), but more because it's harder to re-shape a habit than form one. Also, it's a pity if those who don't go on further with maths don't come to appreciate this very central idea of how to construct a valid argument. I am skeptical about transferable skills at the best of times, but would like to think that the ability to construct a correct argument and spot a fallacious one is something that people could take away from a mathematics course.

OK, let's assume I can't just pass the buck either to schoolteachers or my colleagues. What can I do about it?

I've got some ideas which I plan to flesh out. Some are just exhortations, some are ideas for exercises which I hope will help.

First, the general advice to get them thinking about it the right way.

Before anything specific, remind them that when they are writing maths, they are supposed to be giving somebody a reason to believe the answer, not just giving them the answer. And explain that that's all we mean by a proof: it's the answer to the question "Why should I believe that?" Quite a few seem to go into a flat spin when they see the instruction 'Prove that...', because they don't actually know what they're being asked for, but they think it's something both mysterious and difficult.

Next, two ways of explaining how to see what might be missing in their working.

  • Suggest that they verbalize what they are writing down. Thanks to @DrBennison for this, it's a way of making the previous item concrete.
  • A slight variation on the previous: tell them they should read it out loud when they're finished. It should make sense!

That gives some kind of high-level guidance: but it tells them the intended destination, without giving much in the way of a detailed route-map. So it's not enough. Judging by what I see with missing or incorrectly pointing implications, they also need to sort out just what the relationship is between two statements, and how these need to fit together to form a correct argument.

I suspect that the problem is exacerbated by the fact that the appropriate direction for the implications depends on what one is trying to establish. There are two common types of problem involving chains of inference that students have to solve:

  • Find the values of \(x\) that satisfy some condition, usually an equation \(E(x)=0\) (or an inequality, \(E(x)>0\)).
  • Given statement \(A\), show that statement \(B\) follows.
It's very common to see solutions to the first problem of the form \[ \begin{split} & E(x) = 0 \\ \Rightarrow & F(x) = 0 \\ \vdots\\ \Rightarrow & G(x) = 0 \end{split} \] (where the solution set of \(G(x)=0\) is obviously \(S=\{a,b,\ldots c\}\))

Therefore the solution set of \(E(x)=0\) is \(S\).

The only problem is that this chain of inferences doesn't establish that those are actually the solutions to \(E(x)=0\), only that \(S\) contains the solution set of \(E(x)=0\); we have proved that \(E(x)=0 \Rightarrow x \in S\), and what we need to show is that \(E(x)=0 \Leftrightarrow x \in S\).

So we need to keep track of the relationships and (usually) make sure that they're really equivalences, and what the consequence is when they aren't. (Have we potentially lost solutions? Or gained spurious ones? And what do we have to do about that?)

Similarly, in the second case, it seems very common for students to start off with what they want to prove, \(B\), and find a chain of inference from \(B\) to \(A\), rather than vice versa. I see this in both attempts to prove that two expressions are equal, and in attempts at proof by induction, which I already grumped about.

Well, that's restated the problem and considered it in a bit more detail, and as a mathematician I'm tempted to regard that as a job well done and go off and have a cup of tea or think about something else for a bit. But I really want some progress: if not a solution, at least a solution strategy.

I think both ends of the problem need something done: the detail of just what the relationship is between successive statements in a solution, and the broader picture of just which relationships are required in a particular case.

I think that the following set of ideas for exercises will go some way towards addressing both problems. Now all I have to do is come up with some good, appropriately graded particular cases and try them out on a real live class of students.

  • Pairs of statements (equations or inequalities): which implies which?
  • Correct arguments with the connectives missing: fill them in.
  • Fixable arguments with some incorrect connectives: fix them.
  • Examples of "solving" equations where each statement implies the next, but the final solution set is wrong: where did it go wrong, and what can we do about it?
  • Samples of invalid arguments: explain what is wrong.
I'm sure there's nothing actually original there, just new to me. So I hope to hear from people already using ideas like this (or ones I haven't thought of!) about what you do, and how well it works. If there are resources out there that I don't know about, I'm delighted to be pointed at them. Alternatively, when I get round to building some specific examples (and now I've said it in public, I'm committed to doing it), I'll make them available to anybody who wants them.

Monday 10 April 2017

Which way do I prove that?

I've been marking a lot of (first year undergraduate) coursework lately, and I've seen a fairly significant amount of work exhibiting the same problem. It's definitely in the minority, and I'm not sure whether it's getting more common, or I'm just getting more sensitive to its appearance, but I find it worrying.

I'd very much like to hear from people teaching (A-level or undergraduate) if they are finding similar issues, and if so, what remedies they find effective.

Here's an example of the type of working I mean for solving an equation: \begin{gather} x = \frac{2}{3-x} \nonumber \\ (3-x)x = 2 \nonumber\\ 3x-x^2 = 2 \label{nolog} \\ x^2-3x+2 =0 \nonumber \\ (x-1)(x-2)=0 \nonumber \\ \end{gather} therefore \(x=1\) or \(x=2\).

Consequently, I've been spending a lot of time giving feedback pointing out that a collection of equations isn't an argument, it's just a collection of equations. To make an argument (or solve a problem) you have to explain what the assertions have to do with each other. Sometimes one line is equivalent to the next, sometimes one line implies the next, and sometimes a line is implied by the next. Which of these happens really matters, because if you want to prove the last equation you need the implications to proceed down the page, but if you want to find the solution to an equation you need the implications to proceed upward.

So, in addition to pointing out the requirement to explain the logical connections on each script that required it, I also pointed out to the class that they would never see a mathematics book provide a sequence of equations without any indication of the logical connections: if they didn't believe me, they should just take out any of their current textbooks, or an A-level text from their previous year and look. There would always be an implication sign, or an equivalence, or a word like "hence" or "therefore" or possibly (depending on the nature of the sequence) "because".

Then I made an unfortunate discovery, by the simple expedient of looking in an A-level textbook. It didn't take me long to find sample calculations of type \eqref{nolog}. In the textbook, these sequences were always of equations which were each in fact equivalent to the next, but this wasn't said explicitly. Such examples must contribute to (a minority of) students not realizing it is important to explain the logical connections.

Of course, things are easily fixable in the case of \eqref{nolog}, since all the equations are in fact equivalent. But the consequence is that we then get things like: \begin{equation*} \begin{split} |x+1| &= 2x \\ (x+1)^2 &= 4x^2 \\ x^2+2x+1 &= 4x^2 \\ 3x^2-2x-1 &= 0\\ (3x+1)(x-1)&=0 \end{split} \end{equation*} therefore \(x=-1/3\) or \(1\). The problem is the very first step, where the first equation implies the second, but not vice versa: squaring both sides introduces a spurious solution.
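One moral is that candidates produced by a chain of one-way implications can always be checked against the original equation. A quick Python check (using exact rational arithmetic to avoid floating point quibbles) exposes the impostor:

from fractions import Fraction

for x in (Fraction(-1, 3), Fraction(1)):
    lhs, rhs = abs(x + 1), 2 * x
    print(f"x = {x}: |x+1| = {lhs}, 2x = {rhs}, solution? {lhs == rhs}")

# x = -1/3: |x+1| = 2/3, 2x = -2/3, solution? False  (spurious)
# x = 1:    |x+1| = 2,   2x = 2,    solution? True

Alas, it gets worse.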

The real problem of not being clear about the implications shows up not so much when an equation is being solved as when a result is being proved. For example, on asking for a proof that \(\sec^2(x)-1= \tan^2(x)\) one is likely to see something like: \begin{equation} \begin{split} & \sec^2(x)-1 = \tan^2(x) \\ \Rightarrow \qquad & 1-\cos^2(x) = \sin^2(x) \\ \Rightarrow \qquad & 1 =\cos^2(x)+\sin^2(x)\\ \Rightarrow \qquad & 1=1 \qquad \checkmark\\ \label{duff} \end{split} \end{equation} Each implication is correct, but the argument presented is that IF the required result is true, THEN \(1=1\). Unfortunately, this does nothing to prove that the required result is true.
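For contrast, a correct argument runs the implications from what is known to what is required: start from \(\cos^2(x)+\sin^2(x)=1\), divide through by \(\cos^2(x)\) (where it is nonzero), and rearrange: \begin{equation*} \begin{split} & \cos^2(x)+\sin^2(x) = 1 \\ \Rightarrow \qquad & 1+\tan^2(x) = \sec^2(x) \\ \Rightarrow \qquad & \sec^2(x)-1 = \tan^2(x) \end{split} \end{equation*}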

This time out it showed up most obviously in a proof by induction; to prove that \(P(n)\) held for all integers \(n \geq 0\), just about everybody started off by proving \(P(0)\). But then a bunch of them continued by assuming \(P(n)\) was true when \(n=k\) for some arbitrary \(k \geq 0\), and then proceeded to write down \(P(k+1)\) and deduce \(P(k)\) from it.

Consequently, I've been trying to stress the importance of the logical flow from what is known to what is required, rather than vice versa.

I've also taken to showing the following "proof" as (I fondly hope) an example of why it is a mistake to present "proofs" like \eqref{duff}: \begin{equation} \begin{split} & 1 = 2 \\ \Rightarrow \qquad & 0 \times 1 = 0 \times 2 \\ \Rightarrow \qquad & 0=0 \qquad \checkmark\\ \label{duffer} \end{split} \end{equation} I'm assured by everybody I show it to that \eqref{duffer} is obviously wrong (which is something of a relief). I guess I'll find out how successful I've been when the exam scripts come in...

Saturday 8 April 2017

Quadratic (Form)s, discriminants and determinants.

We all know that an important ingredient in the solutions of the quadratic equation \begin{equation} \label{quad} ax^2 + 2bx +c = 0 \end{equation} is the discriminant, \(\Delta\), where \begin{equation*} \Delta = b^2-ac = - \left| \begin{array}{cc} a&b\\b&c \end{array} \right| \end{equation*} and so is basically a determinant, as I previously pointed out.

Now, the discriminant is essentially an algebraic quantity, while the determinant is essentially a geometric one. This suggests that there might be some interesting geometry hiding inside the problem of solving a quadratic equation, by which I mean more interesting than just finding where a parabola cuts the \(x\)-axis. And there is.

Let's make something new out of \(a,b\) and \(c\): the quadratic form \begin{equation*} Q(x,y) = ax^2+2bxy +cy^2 = \left( \begin{array}{cc} x&y \end{array} \right) \left( \begin{array}{cc} a&b\\b&c \end{array} \right) \left( \begin{array}{c} x\\y \end{array} \right) \end{equation*} Then a curve of the form \begin{equation*} Q(x,y) = k \end{equation*} is a conic section.

If you know a little linear algebra, then the next bit can be seen quite nicely in terms of the eigenvalues of the quadratic form. But if you already know that much linear algebra, you've probably already figured out what I'm about to explain, so I'm not going to take that route.

Fortunately, it is also possible to get there by a judicious application of some algebraic brute force, so that's what's about to happen.

We can find out what type the conic section is by completing the square on \(x\). \begin{equation*} \begin{split} Q(x,y) & = a(x^2+2\frac{b}{a}xy) + cy^2\\ &= a\left(x+\frac{b}{a}y\right)^2 + \frac{ac-b^2}{a}y^2\\ &= a\left(x+\frac{b}{a}y\right)^2 -\frac{1}{a} \Delta y^2 \end{split} \end{equation*}
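If you don't feel like checking the algebra by hand, it is easy to confirm by machine. Here is a quick sketch using the sympy library (assuming you have it installed):

import sympy as sp

a, b, c, x, y = sp.symbols('a b c x y')

Q = a*x**2 + 2*b*x*y + c*y**2
completed = a*(x + (b/a)*y)**2 - ((b**2 - a*c)/a)*y**2

print(sp.simplify(Q - completed))  # prints: 0, so the two forms agree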

Now there are three possibilities.

  • If \(\Delta \lt 0\) then we have an ellipse (if \(a\) and \(k\) are the same sign);
  • if \(\Delta=0\), a straight line; and
  • if \(\Delta \gt 0\), a hyperbola.
But what does this have to do with quadratic equations? \(x\) is a solution of the quadratic equation \eqref{quad} if \(Q(x,1)=0\), so the solutions are given by the \(x\) coordinates of points where the line \(y=1\) intersects the curve \(Q(x,y)=0\), which is now a degenerate conic section. Now we can see what happens in each of the three cases.

  • When \(\Delta \lt 0\) the only point on the curve is the origin, so there is no point \((x,1)\).
  • When \(\Delta = 0\), the conic is the straight line \(x+by/a = 0\), so when \(y=1\) we have the single solution \(x=-b/a\).
  • Finally, when \(\Delta \gt 0\), the conic is a pair of straight lines given by \begin{equation*} \sqrt{\Delta}y=\pm a \left( x+ \frac{b}{a} y\right) \end{equation*} and setting \(y=1\) gives us the usual solution.

So we can now see that the discriminant of the original quadratic equation is also (minus) the determinant of the quadratic form that describes a (degenerate) conic section, and solutions of the quadratic equation are where the line \(y=1\) intersects this conic. What previously gave us algebraic information (the presence or absence of real roots) now gives us geometric information. When the determinant is positive, the conic degenerates to the origin, with no solutions; when it is zero, to a single line, with one solution; and when it is negative, to a pair of lines and hence two solutions.

To summarize: we can think of the discriminant as an algebraic object, whose value determines how many (real) square roots it has and so how many solutions the quadratic has; or we can think of it as a geometric object, the determinant of a quadratic form, which determines how many intersections there are between a (degenerate) conic section and the line \(y=1\), and so how many solutions the quadratic has.

Addendum: Once upon a time, a mathematics student might expect to spend considerable time studying conic sections. It's a topic that seems to have largely dropped out of most syllabuses (both at school and university level), and I have a nagging suspicion that we're the poorer for it. Take a look at Salmon's conic sections if you don't believe me. Apart from containing a wealth of lovely mathematics, it's beautifully written (when did you last think that about a textbook?), and you may find it surprising to find what is explained and what is assumed as previous knowledge.

Thanks to David Bedford (@DavidB52s) for pointing out a typo in the calculation. It's gone now.

Thursday 6 April 2017

The quadratic formula is wrong.

No, of course the quadratic formula isn't incorrect. If you want the solutions to the equation \begin{equation} ax^2 + bx + c = 0 \label{tradQ} \end{equation} then they are, of course, given by the familiar formula \begin{equation} x= \frac{-b \pm \sqrt{b^2-4ac}}{2a}. \label{sol} \end{equation} You can't really argue with that. It's established using nothing more than very simple algebra, and I am not going to claim that basic algebra is incorrect.

But it is ugly: and I've seen people mix up where that \(4\) and \(2\) go. Fortunately, there's a neat way of having a simpler formula, and it's obtained by the equally neat trick of rephrasing the problem. (This is a standard trick in mathematics, and it doesn't get nearly enough publicity.)

So, instead of asking for solutions to \eqref{tradQ}, I ask for solutions to the equation \begin{equation} ax^2+2bx+c = 0. \label{newQ} \end{equation}

In other words, my new \(b\) in \eqref{newQ} is half of the original \(b\) in \eqref{tradQ}. There are two ways of seeing what the consequences are.

First, I can replace \(b\) in the formula \eqref{sol} by \(2b\). This gives \begin{equation*} \begin{split} x & = \frac{-2b \pm\sqrt{4b^2-4ac}}{2a} \\ &= \frac{-2b \pm 2\sqrt{b^2-ac}}{2a} \\ &= \frac{-b\pm\sqrt{b^2-ac}}{a} \end{split} \end{equation*}

Alternatively, you can go through the completion of the square argument: \begin{equation*} \begin{split} & \qquad ax^2+2bx+c = 0 \\ \Leftrightarrow & \qquad x^2+2\frac{b}{a}x+\frac{c}{a} = 0\\ \Leftrightarrow & \qquad (x+\frac{b}{a})^2 - \frac{b^2}{a^2} + \frac{c}{a} = 0 \\ \Leftrightarrow & \qquad (x+ \frac{b}{a})^2 = \frac{b^2-ac}{a^2}\\ \Leftrightarrow & \qquad x + \frac{b}{a} = \pm \frac{\sqrt{b^2-ac}}{a}\\ \Leftrightarrow & \qquad x = \frac{-b \pm \sqrt{b^2-ac}}{a} \end{split} \end{equation*}

Both ways (of course!) give the same answer, \begin{equation} \label{newRoots} x = \frac{-b \pm \sqrt{b^2-ac}}{a} \end{equation} but now without the pesky coefficients of \(2\) and \(4\) to get in the right place.
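As a sanity check, here is a small Python function implementing the simplified formula (the name and interface are my own choice):

import math

def solve_quadratic(a, b, c):
    """Real solutions of a*x^2 + 2*b*x + c = 0, via x = (-b +/- sqrt(b^2 - a*c))/a."""
    disc = b*b - a*c
    if disc < 0:
        return []                     # no real roots
    r = math.sqrt(disc)
    return [(-b - r) / a, (-b + r) / a]

# x^2 - 3x + 2 = 0 has a = 1, 2b = -3 (so b = -1.5), c = 2
print(solve_quadratic(1, -1.5, 2))    # prints: [1.0, 2.0]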

But something else comes along for the ride. It's now very obvious that the discriminant \begin{equation*} \Delta=b^2-ac \end{equation*} looks an awful lot like a determinant: in fact \begin{equation*} \Delta = - \left| \begin{array}{cc} a & b \\ b & c \end{array} \right|. \end{equation*}

So in \eqref{newRoots} we have a simpler formula for the roots of a quadratic. But we also have a question, which is: what on earth is going on with the determinant? What is this matrix doing that the sign of its determinant should decide whether or not the quadratic has real roots?

The simpler formula for the roots is nice, but really this question (and its answer) are the reasons why I think that what I have just gone through is the 'right' way to state (and solve) a quadratic equation. The answer involves a journey to the fascinating world of conic sections, which deserves its own discussion, so I'll leave it to a future note.