How careful do you have to be with probability problems? Very careful. Very, very careful. Too little care leads to (apparent) paradoxes.

The puzzle

So, here's the puzzle. It's something of a classic in probability.

I tell you that John Smith has two children. I also tell you that one of them is a girl. I then ask you for the probability that the other is a girl.

Obviously, we need some ground rules here. We'll work with the slight simplification that boys and girls are equally likely, and that the probabilities of boy and girl are independent.

A plausible answer, and maybe the one I'm hoping for, is that since boys and girls are equally likely, the probability that the other is a girl is \(1/2\).

Let's pretend that was your answer, because now I get the chance to show you something interesting.

Ah-hah! I pounce.

No, you aren't thinking it through carefully enough. If you know that one child is a girl, then all you know about the children is that they aren't both boys. There are four equally likely sets of children: boy-boy, boy-girl, girl-boy and girl-girl. Since we've excluded boy-boy, we are equally likely to have any of the three remaining possibilities. In only one of those three is the other child a girl, so the probability is actually \(1/3\).
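If you'd rather not take my counting on trust, the argument can be checked by brute force. Here's a minimal sketch that enumerates the four families exactly rather than sampling them; the `"B"`/`"G"` labels and the function name `prob_two_girls_given_a_girl` are my own choices for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_two_girls_given_a_girl():
    # All four equally likely (older, younger) families: BB, BG, GB, GG.
    families = list(product("BG", repeat=2))
    # Condition: at least one child is a girl (this rules out BB only).
    with_girl = [f for f in families if "G" in f]
    # Of those, the families where both children are girls.
    two_girls = [f for f in with_girl if f == ("G", "G")]
    return Fraction(len(two_girls), len(with_girl))

print(prob_two_girls_given_a_girl())  # prints 1/3
```

Using `Fraction` keeps the answer exact, so there's no sampling noise to argue about.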

I can dress this up more formally. \(A\) is the event "John Smith has a daughter" and \(B\) is the event "John Smith has two daughters". I want \(P(B|A)\), the probability of \(B\) given \(A\). Fortunately, there's a formula for that. Since \[ P(A \cap B) = P(B|A)P(A) \] and \(P(A \cap B)=P(B)\) (since \(A \cap B = B\)), I see that \[ P(B|A) = \frac{P(B)}{P(A)} \] I know that \(P(B)=1/4\) and \(P(A)=3/4\), so (just as I said above), \(P(B|A)=1/3\).

This seems strange, and counter-intuitive. But I'm pretty sure I didn't get it wrong, because I wrote a little bit of code and simulated it: the following code randomly creates \(n\) two-child families, and finds the fraction of the time that a family with at least one daughter actually has two.

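import random

def sim(n):
    hasGirl=0       # families with at least one girl
    hasTwoGirls=0   # families with two girls
    for ii in range(n):
        # Each child is 0 (boy) or 1 (girl), independently and equally likely.
        kids=[random.randrange(2),random.randrange(2)]
        if sum(kids)>0:
            hasGirl += 1
        if sum(kids)==2:
            hasTwoGirls+=1
    # Fraction of the families with at least one girl that have two.
    print(1.0*hasTwoGirls/(1.0*hasGirl))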

Running this for moderately large \(n\) certainly suggests that the probability is \(1/3\).

How much more convincing can you get?

But then I read this book, Standard Deviations by Gary Smith. It's an excellent exposition of how statistics and data can be misused to mislead. I was thoroughly enjoying it until...

He considers this problem of the two children and demolishes the above argument.

The puzzle redux

John Smith has two children, and takes one of them for a walk. If the child that he took for a walk was a girl, what is the probability that the other child is also a girl?

"Ooh, I know this one," I thought. "It's a third."

And then I read on. Smith carefully and lucidly points out that this is wrong, and that the probability actually is \(1/2\).

Naturally I took personal offence at this, so I read it carefully a few times, trying to find his mistake.

I didn't find one.

So I did the obvious thing. I worked out the probability carefully.

There are two things I'm interested in here. \(A\) is the event "The child John Smith took a walk with is a daughter" and \(B\) is the event "Both of John Smith's children are girls".

I want to know \(P(B|A)\). Just as before, I have \(A \cap B = B\), so that \[ P(B|A) = \frac{P(A \cap B)}{P(A)}=\frac{P(B)}{P(A)} \] But \(P(B) = 1/4\) and \(P(A)=1/2\) (the child taken for a walk is equally likely to be a boy or a girl), so \(P(B|A)=1/2\).
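The same calculation can be done by exact enumeration. This is a sketch under one assumption that the puzzle leaves unstated: John Smith is equally likely to take either child for a walk. That gives eight equally likely outcomes (four families, times two choices of child). The function name `walk_probabilities` and the outcome encoding are mine.

```python
from fractions import Fraction
from itertools import product

def walk_probabilities():
    # Each outcome is (older, younger, walked), where walked is index 0 or 1.
    # The 4 families x 2 choices of child give 8 equally likely outcomes.
    outcomes = list(product("BG", "BG", (0, 1)))
    # Event A: the child taken for a walk is a girl.
    a = [o for o in outcomes if o[o[2]] == "G"]
    # Event B: both children are girls (every such outcome is also in A).
    b = [o for o in outcomes if o[0] == "G" and o[1] == "G"]
    p_a = Fraction(len(a), len(outcomes))
    p_b = Fraction(len(b), len(outcomes))
    # Because B is contained in A, P(B|A) = P(B) / P(A).
    return p_a, p_b, p_b / p_a

p_a, p_b, p_b_given_a = walk_probabilities()
print(p_a, p_b, p_b_given_a)  # prints 1/2 1/4 1/2
```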

By now I was feeling decidedly uncomfortable, so I did the next obvious thing. I wrote a little bit of code to simulate the situation where a randomly chosen child from a randomly chosen family is taken for a walk, repeated \(n\) times, and to find, out of the times the child taken for a walk is a girl, the proportion in which she has a sister at home:

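import random

def walksim(n):
    girlWalk=0      # walks where the child taken out is a girl
    hasTwoGirls=0   # of those walks, the ones where both children are girls
    for ii in range(n):
        # Each child is 0 (boy) or 1 (girl), independently and equally likely.
        kids=[random.randrange(2),random.randrange(2)]
        # Pick one of the two children at random to take for a walk.
        if kids[random.randrange(2)]==1:
            girlWalk += 1
            if sum(kids)==2:
                hasTwoGirls+=1
    # Fraction of girl-walks where the child at home is also a girl.
    print((1.0*hasTwoGirls)/(1.0*girlWalk))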

Running this for moderately large values of \(n\) convinced me that yes, indeed, the probability was \(1/2\).

By now I was thoroughly confused.

So What's Going On?

According to a quotation popularly attributed to Einstein, it's a mark of lunacy to keep asking the same question, hoping for a different answer. So naturally, I was a little concerned about just what it might mean if you managed to actually get a different answer.

So I thought about it a bit harder, and finally came to a (in retrospect, obvious) conclusion: they aren't actually the same puzzle.

So what's the difference? In version one, you're told that one of John Smith's children is a daughter. In version two, you're told that John Smith has taken one of his children for a walk, and that child is a daughter. Either way, we've been told that one of John Smith's children is a daughter.

But we haven't actually been given quite the same information.

Version 1 simply guarantees that at least one child is a daughter; version 2 tells us that the particular child that John Smith took for a walk is a daughter. The cases where the child left at home is the only daughter are excluded from the calculation in version 2, but included in version 1, which is why the probability that both are daughters is lower in version 1 than in version 2.

So the whole confusion boils down to what you think it means to be told that "One of John Smith's children is a daughter". If you interpret it as saying no more than "John Smith has at least one daughter", then the answer is \(1/3\). But if you interpret it as saying that one particular child is a daughter (the older one, or the one standing on the left, or the only one who has ever been locked alone in the cupboard under the stairs for misbehaving, or, indeed, the one that he took for a walk that day) then the answer is \(1/2\).
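To see the two readings side by side, here's a small sketch that conditions the same four equally likely families on each interpretation in turn. I've used "the older child is a daughter" as the stand-in for "one particular child"; the helper `prob_both_girls` and the `(older, younger)` encoding are just my illustration.

```python
from fractions import Fraction
from itertools import product

def prob_both_girls(condition):
    # The four equally likely (older, younger) families.
    families = list(product("BG", repeat=2))
    # Keep only the families consistent with what we've been told.
    kept = [f for f in families if condition(f)]
    both = [f for f in kept if f == ("G", "G")]
    return Fraction(len(both), len(kept))

# Reading 1: "at least one child is a daughter".
at_least_one = prob_both_girls(lambda f: "G" in f)
# Reading 2: "this particular child (here, the older one) is a daughter".
older_is_girl = prob_both_girls(lambda f: f[0] == "G")

print(at_least_one, older_is_girl)  # prints 1/3 1/2
```

The only thing that changes between the two calls is the condition, which is the whole point: same families, different information.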

To the extent that there's a disagreement here, it isn't an internal disagreement in probability theory (which would be a Very Bad Thing Indeed), it's a disagreement over the interpretation of what "One of John Smith's children is a daughter" means.

The moral

As is often the case, the problem isn't the mathematical analysis: in each version that was straightforward. The problem is setting up the right mathematical model for the situation, which can be very tricky if there's any linguistic ambiguity in the description.

Shiny Pebbles and other stuff

Monday 9 April 2018

The two child puzzle
