Image of Parson Bayes

For my grandkids:

In his book Thinking, Fast and Slow, Daniel Kahneman gives an example of elementary Bayesian inference, posing this question:
"A cab was involved in a hit-and-run accident at night. Two cab companies, the Green and the Blue, operate in the city. You are given the following data: 85% of the cabs in the city are Green and 15% are Blue. A witness identified the cab as Blue. The court tested the reliability of the witness under the circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time. What is the probability that the cab involved in the accident was Blue rather than Green?"
Kahneman goes on to observe that "The two sources of information can be combined by Bayes's rule. The correct answer is 41%. However, you can probably guess what people do when faced with this problem: they ignore the base rate and go with the witness. The most common answer is 80%."

So why is the correct answer 41%?
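The arithmetic behind that 41% can be sketched in a few lines of Python. This is only an illustration, assuming A means "the cab was Blue" and B means "the witness said Blue"; the function and parameter names are made up for this sketch and do not come from Kahneman's book.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """P(A | B) by Bayes' rule.

    prior            = P(A): the cab is Blue
    hit_rate         = P(B | A): witness says Blue when the cab is Blue
    false_alarm_rate = P(B | not A): witness says Blue when the cab is Green
    """
    true_positives = hit_rate * prior                  # 0.80 * 0.15 = 0.12
    false_positives = false_alarm_rate * (1 - prior)   # 0.20 * 0.85 = 0.17
    return true_positives / (true_positives + false_positives)


p_blue = posterior(prior=0.15, hit_rate=0.80, false_alarm_rate=0.20)
print(f"{p_blue:.0%}")  # 0.12 / 0.29, about 41%
```

The witness is right 80% of the time, but Green cabs are so much more common that wrongly identified Green cabs (0.17) outnumber correctly identified Blue cabs (0.12).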

Example of elementary Bayesian inference from Thinking, Fast and Slow



Bayes' Theorem is easy to understand when shown graphically, as above. A more typical example, and one more relevant to most people, is a medical test used for differential diagnosis. Take the example above, but read A = "a patient with twitching nostrils has Wilbur's Nostritis" (persistent twitching of the nostrils; I just made that up), and B = "the patient tests positive for Wilbur's Nostritis". Leave all the numbers the same. Say 15% of patients presenting with twitching nostrils have Wilbur's Nostritis (one of many causes of nostril twitch), and for these patients Wilbur's Test detects it 80% of the time (80% sensitivity). The other 85% of patients with twitching nostrils do not have Wilbur's Nostritis, yet 20% of them will still get a positive test result, because the test is only 80% specific. So you would expect only about 41% of the patients presenting with twitching nostrils who test positive on Wilbur's Test to actually have Wilbur's Nostritis.
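One way to check this is to count heads instead of multiplying probabilities. The sketch below assumes a hypothetical group of 1,000 patients with twitching nostrils; the group size is chosen only to give round numbers, and the variable names are illustrative.

```python
patients = 1000  # hypothetical group size, chosen for round numbers

with_nostritis = patients * 15 // 100           # 150 have Wilbur's Nostritis
without_nostritis = patients - with_nostritis   # 850 do not

true_positives = with_nostritis * 80 // 100       # 120 correctly test positive
false_positives = without_nostritis * 20 // 100   # 170 wrongly test positive

all_positives = true_positives + false_positives  # 290 positive results in total
share = true_positives / all_positives

print(f"{true_positives} of {all_positives} positive results are real: {share:.0%}")
```

Of the 290 patients who test positive, only 120 actually have the condition, which is where the 41% comes from.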

This is not an unusual scenario for medical tests, and it explains why tests are often repeated, or additional and different tests done. For example, if the patient has tested positive on Wilbur's Test, and you have another, different test (Orville's Test) that has the same specificity and sensitivity, you can start with 0.41 as the prior probability P(A), and if the patient also tests positive on Orville's Test, the probability that the patient has Wilbur's Nostritis rises to about 74%. (A bunch of assumptions are involved here that we will not tarry over.)
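The chaining of the two tests can be sketched as a loop in which each posterior becomes the next prior. This is a rough illustration, not a clinical method: it assumes, among other things, that the two tests make their mistakes independently of each other, and the `update` function name is invented for the example.

```python
def update(prior, sensitivity, specificity):
    """Probability of having the condition after one positive test result."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


p = 0.15       # prior before any test: 15% of these patients have the condition
history = []   # posterior after each positive result

for test_name in ("Wilbur's Test", "Orville's Test"):
    # Assumption: both tests have 80% sensitivity and 80% specificity,
    # and their errors are independent of each other.
    p = update(p, sensitivity=0.80, specificity=0.80)
    history.append(p)
    print(f"after a positive {test_name}: {p:.0%}")
```

The first pass gives about 41% and the second about 74%, matching the figures above.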

Note that sensitivity and specificity often have different values. If you take the original example and change the specificity to 97%, but leave the sensitivity at 80%, then P(A|B) doubles from about 41% to about 82%. If you increase specificity to 100% (a Green cab is always identified as Green), false positives are zero, and P(A|B) = 100%. You still have the false negatives, of course, but you can at least be sure that if the cab was identified as Blue, it really was Blue.
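To see how strongly the result depends on specificity, here is a small sketch that reruns the cab example at the three specificity values mentioned, holding the prior at 15% and the sensitivity at 80%. The function name is illustrative only.

```python
def p_a_given_b(prior, sensitivity, specificity):
    """P(A | B): probability the cab was Blue, given it was identified as Blue."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


results = {}
for specificity in (0.80, 0.97, 1.00):
    results[specificity] = p_a_given_b(prior=0.15, sensitivity=0.80,
                                       specificity=specificity)
    print(f"specificity {specificity:.0%}: P(A|B) = {results[specificity]:.0%}")
```

At 100% specificity the false-positive term is zero, so the result is exactly 1 whatever the prior, as long as the prior and the sensitivity are both above zero.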