Here is a simple
puzzle:
A man takes a diagnostic test for a certain disease and the result is positive. The false positive rate for the test in this case is the same as the false negative rate, 0.001. The background prevalence of the disease is 1 in 10,000. What is the probability that he has the disease?
This problem is
one of the simplest possible examples of a broad class of problems, known as
hypothesis testing, concerned with defining a set of mutually contradictory
statements about the world (hypotheses) and figuring out some kind of measure
of the faith we can have in each of them.
It might be
tempting to think that the desired probability is just 1- (false-positive
rate), which would be 0.999. Be warned, however, that this is quite an infamous problem. In 1982, a
study was published [1] in which 100 physicians had been asked to
solve an equivalent question. All but 5 got the answer wrong by a factor of
about 10. Maybe it’s a good idea then to go through the logic carefully.
Think about the
following:
- What values should the correct answer
depend on?
- Other than reducing the
false-positive rate, what would increase the probability that a person
receiving a positive test result would have the disease?
The correct
calculation needs to find some kind of balance between the likelihood that the
person has the disease (the frequency with which the disease is contracted by
similar people) and the likelihood that the positive test result was a mistake
(the false positive rate). We should see intuitively that if the prevalence of
the disease is high, the probability that any particular positive test result
is a true positive is higher than if the disease is extremely rare.
The rate with
which the disease is contracted is 1 in 10,000 people, so to make it simple, we
will imagine that we have tested 10,000 people. Therefore we expect 1 true case
of the disease. We also expect about 10 false positives (0.001 × 9,999 ≈ 10), so our estimate drops from 0.999 to 1 in 11, about 0.0909. This answer is very close, but not precisely
correct.
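Here is a minimal Python sketch of this counting argument (the variable names are my own, not from the original text):

```python
# Expected counts in a group of 10,000 tested people, using the rates in the puzzle.
population = 10_000
prevalence = 1 / 10_000
false_positive_rate = 0.001

expected_true_cases = population * prevalence                  # about 1 person
expected_false_positives = (population - expected_true_cases) * false_positive_rate  # about 10 people

estimate = expected_true_cases / (expected_true_cases + expected_false_positives)
print(estimate)  # ~0.0909, i.e. roughly 1 in 11
```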
The frequency
with which we see true positives must be reduced by the possibility that we can
have false negatives as well; how do we encode that in our calculation?
We require the
conditional probability that the man has the disease, given that his test
result was positive, P(D|R+). This is the number of ways of getting
a positive result and having the disease, divided by the total number of ways
of getting a positive test result:

P(D|R+) = P(R+D) / [P(R+D) + P(R+C)],

where D is the proposition that he has the disease, C means he is clear, and R+ denotes the positive test result.
If we ask what
is the probability of drawing the ace of hearts on the first draw from a deck
of cards and the ace of spades on the second, without replacing the first card
before the second draw, we have P(AH AS) = P(AH)P(AS|AH).
The probability for the second draw is modified by what we know to have taken place on the
first.
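To attach numbers to this: P(AH) = 1/52 and P(AS|AH) = 1/51, so P(AH AS) = (1/52)(1/51) = 1/2652, or roughly 0.00038.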
Similarly, P(R+D)
= P(D)P(R+|D) and P(R+C) = P(C)P(R+|C), where:
- P(D) is the background rate for the
disease.
- P(R+|D) is the true
positive rate, equal to 1 – (false negative rate).
- P(C) = 1 – P(D).
- P(R+|C) is the false positive rate.
So

P(D|R+) = P(D)P(R+|D) / [P(D)P(R+|D) + P(C)P(R+|C)]
= (0.0001 × 0.999) / (0.0001 × 0.999 + 0.9999 × 0.001),

which is 0.0908.
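The arithmetic is easy to check directly; here is a minimal Python sketch of the calculation (the variable names are mine):

```python
# Direct application of the formula above with the numbers from the puzzle.
prior = 1e-4                 # P(D): background prevalence, 1 in 10,000
true_pos_rate = 1 - 1e-3     # P(R+|D) = 1 - (false negative rate)
false_pos_rate = 1e-3        # P(R+|C)

numerator = prior * true_pos_rate
denominator = numerator + (1 - prior) * false_pos_rate
posterior = numerator / denominator

print(posterior)             # ~0.0908, slightly below the 1-in-11 estimate
```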
The formula we have arrived at above, by simple application of common sense, is known as Bayes' theorem. Many people assume the answer to be more like 0.999, but the correct answer is an order of magnitude smaller. As mentioned, most medical doctors also get questions like this wrong by about an order of magnitude. The correct answer to the question, 0.0908, is known in medical science as the positive predictive value of the test. More generally, it is called the posterior probability.
Bayes’ theorem
has been a controversial idea during the development of statistical reasoning,
with many authorities dismissing it as an absurdity. This has led to the
consequence that orthodox statistics, still today, does not employ this vitally
important technique. Here, we have developed a special case of Bayes’ theorem
by simple reasoning. In generality, it follows as a straightforward
re-arrangement of probabilistic laws (the product and sum rules) that are so
simple that most authors treat them as axioms, but which in fact can be
rigorously derived (with a little effort) from extremely simple and perfectly
reasonable principles. It is, overwhelmingly, one of my central beliefs about science that a logical calculus of probability can only be achieved, and the highest-quality inferences extracted from data, when Bayes' theorem is accepted and applied whenever appropriate.
The general statement of Bayes' theorem is

P(H|D, I) = P(H|I) P(D|H, I) / P(D|I),

where H is the hypothesis under consideration and D is the data. Here 'I' represents the background information: a set of statements concerning the scope of the problem that are considered true for the purposes of the calculation. In working through the medical testing problem above, I have omitted the 'I', but in every case where I write down a probability without including the 'I', this is to be recognized as shorthand: the 'I' is always really there, and the calculation makes no sense without it.
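As an illustration only (the function and argument names below are my own invention), the two-hypothesis form used above can be wrapped in a small Python helper:

```python
def posterior(prior_h, likelihood_h, likelihood_not_h):
    """P(H|D,I) for a hypothesis H and its negation, via Bayes' theorem.

    prior_h          -- P(H|I)
    likelihood_h     -- P(D|H,I)
    likelihood_not_h -- P(D|not-H,I)
    """
    evidence = prior_h * likelihood_h + (1 - prior_h) * likelihood_not_h  # P(D|I)
    return prior_h * likelihood_h / evidence

# The medical-test numbers recover the same ~0.0908 posterior as before:
print(posterior(1e-4, 0.999, 0.001))
```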
The error that leads many people to overestimate, by an order of magnitude, probabilities such as the one required in this question is known as the base-rate fallacy. Specifically, in this case the base rate, or expected incidence, of the disease has been ignored, leading to a calamitous miscalculation. The base-rate fallacy amounts to believing that P(A|B) = P(B|A). In the above calculation this corresponds to saying that P(D|R+), which was desired, is the same as P(R+|D), the latter being the true positive rate, 1 – (false negative rate) = 0.999, which in this problem happens to coincide with 1 – (false positive rate) because the two error rates are equal.
In frequentist statistics, a probability is identified with a frequency. In this framework, therefore, it makes no sense to ask what the probability is that a hypothesis H is true, since there is no sense in which a relative frequency for the truth of H can be obtained. As a measure of faith in the proposition H in light of data, D, the frequentist therefore habitually uses not P(H|D) but P(D|H), and so commits himself to the base-rate fallacy.
In case it is still not completely clear that the base-rate fallacy is indeed a fallacy, let's employ a thought experiment with an extreme case. (Such extreme cases, while not necessarily realistic, allow the answer we expect to be seen directly and compared with the result the theory produces - something computer scientists call a 'sanity check'.) Imagine the case where the base rate is higher than the sensitivity of the test. For example, let the sensitivity be 98% (i.e. a 2% false negative rate) and let the background prevalence of the disease be 99%. Then P(R+|D) is 0.98, and substituting this for P(D|R+), as the fallacy prescribes, gives an answer that is lower than the prior probability, P(D) = 0.99. The positive result of a high-quality test (98% sensitivity) would then leave us less confident that the subject has the disease than we were before the test result was known, which is plainly absurd.
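For completeness, here is what the correct calculation gives in this extreme case. The example above fixes only the sensitivity (98%) and the prevalence (99%); in the sketch below I additionally assume a 2% false positive rate, which is my own choice and not part of the original example:

```python
prior = 0.99    # prevalence of the disease
sens = 0.98     # P(R+|D), the sensitivity of the test
fpr = 0.02      # assumed false positive rate (not specified in the example above)

correct = prior * sens / (prior * sens + (1 - prior) * fpr)
fallacious = sens   # base-rate fallacy: substitute P(R+|D) for P(D|R+)

print(fallacious)   # 0.98 -- absurdly lower than the prior of 0.99
print(correct)      # ~0.9998 -- the positive test raises the probability, as it should
```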
[1] Eddy, D. M. (1982). Probabilistic reasoning in
clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.),
Judgment under uncertainty: Heuristics and biases (pp. 249–267). Cambridge, England: Cambridge University Press. (In
this study 95 out of 100 physicians answered between 0.7 and 0.8 to a similar
question, to which the correct answer was 0.078.)