The following
data were used in a case of alleged gender bias in graduate admissions at an
American university:
|        | Applications | % Admitted |
|--------|--------------|------------|
| Male   | 8442         | 44%        |
| Female | 4321         | 35%        |
Assume that we know that male and female applicants are equally capable. Is the allegation of discrimination
against female applicants proved by the data?
The data come
from a real case, from 1973 at the University of California, Berkeley. The data
looked damning, but a subsequent analysis actually demonstrated a slight but statistically significant bias in
favor of women. The original data are incomplete and do not account for confounding
factors. This is an example of Simpson’s paradox.
An imaginary example
Before looking
into the resolution of this paradox, we'll examine some hypothetical data on
lung cancer rates. This clever example is
taken from a ‘Bad Science’ article, ‘Any set of figures needs adjusting before it can be usefully reported,’
by Ben Goldacre, and gives us a good opportunity to spot quickly the likely 'true' explanation for the data. Again, these are imaginary numbers.
Our hypothetical researchers first produce the following
results from an epidemiological study:
|                   | Drinkers         | Non-drinkers   |
|-------------------|------------------|----------------|
| Lung cancer rates | 13.7% (366/2666) | 5.0% (98/1954) |
The data suggest
a causal relationship between drinking alcohol and contracting lung cancer.
Questions:
- Does it seem plausible that such a relationship exists?
- Is there a likely alternative explanation?
It is well
established, indeed it was one of the early triumphs of epidemiology, that the risk of lung
cancer is increased by smoking. Thinking along these lines, we imagine the investigators going back to the study participants and
asking them whether or not they smoked. The revised results are shown below:
|             | Drinkers         | Non-drinkers   |
|-------------|------------------|----------------|
| Smokers     | 23.1% (330/1430) | 23.2% (47/203) |
| Non-smokers | 2.9% (36/1236)   | 2.9% (51/1751) |
Here we discover the problem with the original analysis: the difference in
cancer rates between the two groups was actually driven by a difference in the
occurrence of smoking. Smoking acted as a confounder in the original
study. Looking at the stratified data, we see that within each group, smokers and
non-smokers, the rate of lung cancer is essentially independent of whether a
person drinks alcohol. We also see that the number of participants in the
study who smoke but do not drink is very small. It is this imbalance that gave the
appearance of a higher risk for drinkers compared with non-drinkers.
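The arithmetic can be checked directly: pooling the stratified counts reproduces the aggregate table exactly, so the apparent drinker/non-drinker difference comes entirely from the mix of smokers in each group. A minimal sketch in Python (the variable and function names are mine, not from the original article):

```python
# Stratified counts from the hypothetical study:
# (cancer cases, group size) for each smoking/drinking cell.
counts = {
    ("smoker", "drinker"): (330, 1430),
    ("smoker", "non-drinker"): (47, 203),
    ("non-smoker", "drinker"): (36, 1236),
    ("non-smoker", "non-drinker"): (51, 1751),
}

def pooled_rate(drinking):
    """Lung-cancer rate ignoring smoking status (the confounded view)."""
    cases = sum(c for (s, d), (c, n) in counts.items() if d == drinking)
    total = sum(n for (s, d), (c, n) in counts.items() if d == drinking)
    return cases / total

print(f"drinkers:     {pooled_rate('drinker'):.1%}")      # 13.7%
print(f"non-drinkers: {pooled_rate('non-drinker'):.1%}")  # 5.0%

# Within each smoking stratum the rates are essentially equal:
for s in ("smoker", "non-smoker"):
    c1, n1 = counts[(s, "drinker")]
    c2, n2 = counts[(s, "non-drinker")]
    print(f"{s}: drinkers {c1/n1:.1%} vs non-drinkers {c2/n2:.1%}")
```

Pooling the strata gives back 366/2666 = 13.7% for drinkers and 98/1954 = 5.0% for non-drinkers, yet within each stratum drinking makes essentially no difference.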
So, what is Simpson’s Paradox?
Simpson’s
paradox is a trap that one can fall into when inferring cause-and-effect
relationships from frequency data.
The degree of correlation between two variables can change considerably, and even
reverse sign, when the data are divided
into sub-groups, i.e. when the data are adjusted for confounders.
As in the smokers-and-drinkers case above, it may appear that A caused B, but
when A is resolved into separate cases, such as AC and A~C ('~' meaning 'not'), the apparent causal influence of A can disappear.
It may even be,
as in the UCB sexual discrimination case, that the observed effect reverses direction when the apparent causal agency (e.g. the individual
being male or female) is resolved according to possible values of a confounding
variable, as we'll now see.
Back to the initial problem
We
had the following data on graduate admissions:
|        | Applications | % Admitted |
|--------|--------------|------------|
| Male   | 8442         | 44%        |
| Female | 4321         | 35%        |
These data suggest very strongly that there
is a problem of women being unfairly treated. (In frequentist terms, the null hypothesis is rejected at a very high level of significance.) But take a
moment to consider this: what kind of confounders might be active in such a
situation? Are male and
female applications necessarily the same in all relevant respects? How might
they differ?
The following
extended data set clears up a lot. The applications have been broken down by
department:
| Department | Male applications | Male % admitted | Female applications | Female % admitted |
|------------|-------------------|-----------------|---------------------|-------------------|
| A          | 825               | 62%             | 108                 | 82%               |
| B          | 560               | 63%             | 25                  | 68%               |
| C          | 325               | 37%             | 593                 | 34%               |
| D          | 417               | 33%             | 375                 | 35%               |
| E          | 191               | 28%             | 393                 | 24%               |
| F          | 272               | 6%              | 341                 | 7%                |
Now when male
and female admission rates are compared for each department individually, it is seen
that in most departments female applicants had a higher probability of being
admitted. The difference is small but unlikely to be due to chance.
We also see that the acceptance percentages get smaller going
down the list, i.e. the hardest departments to get into are near the bottom.
The numbers of female applicants tend to be lower near the top of the
list, while the opposite holds for the male applicants.
The reason,
therefore, that it appeared at first that women were being unfairly treated was
that most of the female applications were to departments that are harder to get
into, while most of the men were applying to departments with larger percentages of
applicants accepted.
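This reversal can be verified numerically from the rounded percentages in the table above (which covers the six largest departments, not the full 8442 + 4321 applications). Pooling the six departments recovers a large male advantage, even though women are admitted at a higher rate in four of the six. A rough check in Python, using the rounded rates as given:

```python
# (department, male applications, male admission rate,
#  female applications, female admission rate) -- rounded figures from the table.
departments = [
    ("A", 825, 0.62, 108, 0.82),
    ("B", 560, 0.63, 25, 0.68),
    ("C", 325, 0.37, 593, 0.34),
    ("D", 417, 0.33, 375, 0.35),
    ("E", 191, 0.28, 393, 0.24),
    ("F", 272, 0.06, 341, 0.07),
]

# Pool all departments: total (estimated) admits divided by total applications.
male_admits = sum(n * r for _, n, r, _, _ in departments)
male_apps = sum(n for _, n, _, _, _ in departments)
female_admits = sum(n * r for _, _, _, n, r in departments)
female_apps = sum(n for _, _, _, n, _ in departments)

print(f"pooled male rate:   {male_admits / male_apps:.1%}")    # ~46%
print(f"pooled female rate: {female_admits / female_apps:.1%}")  # ~30%

# Yet department by department, women usually do better:
better_for_women = [d for d, _, mr, _, fr in departments if fr > mr]
print("departments where women's rate is higher:", better_for_women)
```

Within these six departments the pooled male rate comes out around 46% and the pooled female rate around 30%, while women's per-department rate is higher in A, B, D and F: the aggregate comparison and the stratified comparison point in opposite directions.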
How can we guard against Simpson’s paradox?
This analysis illustrates an important general point about the presentation and reporting of statistics. Had we reported only the observed rates (the percentage figures) in the tables above, the confounding in the original study would have remained hidden. It is only by looking at the raw numbers that we can fully appreciate what is going on. This points to a much more general principle: the raw data are always worth examining, and should not be discarded after the statistical analysis has been performed. Researchers should continue to pay attention to their raw data, and, where possible, it should also be made available to reviewers and to anybody interested in reading the published findings resulting from a study.
Where possible, the best guard against Simpson’s paradox is randomization. If the subjects in a trial have been randomized into 'treatment groups', then the potential confounders that go with them should also
be randomly distributed between groups, and only the true effects of the
treatments will be observed. This illustrates the vast superiority of
data from a carefully designed randomized trial over observational data. Epidemiological 'findings' from studies with tens of thousands of subjects have, on several occasions, been overturned as soon as data from randomized trials have become available. This article by Gary Taubes, 'Do we really know what makes us healthy?' includes several examples from medical science, including the once-held belief that hormone replacement therapy reduces a woman's risk of death by heart attack: the presence of confounders in one study with 16,500 participants was revealed when somebody noticed that women receiving HRT were also much less likely to be murdered than those not receiving the treatment.
This is not to say that epidemiological data should be ignored completely. Sometimes it is the best information we can get, and epidemiologists are clever people who understand the pitfalls of their profession far better than I do. In fact, many forms of science are of the epidemiological type, where randomization is impossible.
What does Simpson's paradox teach us about probability theory?
We might feel that the evident problems with the simple datasets in the above examples undermine the logical foundations of statistical analysis. How can a clear relationship between two variables, evidenced by the analysis of empirical data, turn out not to be a causal one? Contrary to what many have assumed, however, probability theory is only indirectly concerned with causal dependence. Probability theory deals with the logical investigation of information, and so, as I will discuss in a future post, is fundamentally concerned with the logical dependence of variables, rather than with matters of direct causation.