| Providing Outreach in Computer Science | in partnership with | Bringing Bayesian solutions to real-world risk problems |
Fred and Jane study on the same course, spread over two years. To complete the course they have to complete 10 modules. At the end, their average annual results are:
| | Year 1 average | Year 2 average | Overall average? |
|---|---|---|---|
| Fred | 50 | 70 | 60 |
| Jane | 40 | 62 | 51 |

So how is it possible that Jane got the prize for the student with the best grade?
It’s because the overall average figure given in the final column is an average of the year averages rather than an average over all 10 modules. We cannot work out the average for the 10 modules unless we know how many modules each student takes in each year.
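
In symbols (the notation here is introduced purely for illustration and is not part of the original text): if a student takes $n_1$ modules in Year 1 with average $\bar{x}_1$ and $n_2$ modules in Year 2 with average $\bar{x}_2$, the average over all modules is the weighted mean

$$
\bar{x}_{\text{all}} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2},
$$

which equals the simple average of averages $(\bar{x}_1 + \bar{x}_2)/2$ only when $n_1 = n_2$ or $\bar{x}_1 = \bar{x}_2$.
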
In fact:
Fred took 7 modules in Year 1 and 3 modules in Year 2.
Jane took 2 modules in Year 1 and 8 modules in Year 2.
Assuming each module is marked out of 100, we can use this information to compute the total scores as follows:
| | Year 1 total | Year 2 total | Overall total | Real overall average |
|---|---|---|---|---|
| Fred | 350 (7 x 50) | 210 (3 x 70) | 560 | 56.0 |
| Jane | 80 (2 x 40) | 496 (8 x 62) | 576 | 57.6 |

So clearly Jane did better overall than Fred.
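
The same calculation can be sketched in a few lines of Python. This is only an illustration: the names `results`, `average_of_averages` and `weighted_average` are made up for this example, and the figures are the module counts and yearly averages quoted above.

```python
# (modules taken, average mark) for each year, as quoted in the text
results = {
    "Fred": [(7, 50), (3, 70)],
    "Jane": [(2, 40), (8, 62)],
}


def average_of_averages(years):
    """Naive figure: treat each year's average as equally important."""
    return sum(avg for _, avg in years) / len(years)


def weighted_average(years):
    """Average over all modules: total marks divided by total modules."""
    total_marks = sum(n * avg for n, avg in years)
    total_modules = sum(n for n, _ in years)
    return total_marks / total_modules


for student, years in results.items():
    print(
        f"{student}: average of averages = {average_of_averages(years):.1f}, "
        f"average over all modules = {weighted_average(years):.1f}"
    )
```

Run as written, this should report 60.0 and 56.0 for Fred and 51.0 and 57.6 for Jane, matching the two tables.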
This is an example of what is commonly known as Simpson’s paradox. It seems like a paradox: Fred’s average marks are consistently higher than Jane’s average marks but Jane’s overall average is higher. But it is not really a paradox. It is simply a mistake to assume that you can take an average of averages without (in this case) taking account of the number of modules that make up each average. Look at it the following way and it all becomes clear:
In the year when Fred did the bulk of his modules he averaged 50; in the year when Jane did the bulk of her modules she averaged 62. When you look at it that way it is no surprise that Jane did better overall.
Simpson’s paradox also turns up in real data. For example, a medical study compared the success rates of two treatments for kidney stones. Each treatment was applied to two groups of people: one group in which each subject had a small stone and one group in which each subject had a large stone. The ‘average’ success rates were:
| | Small stones | Large stones | Overall average? |
|---|---|---|---|
| Treatment A | 93% | 73% | 83% |
| Treatment B | 87% | 69% | 78% |

It was concluded therefore that treatment A was more effective.
But on inspecting the number of patients in each of the four groups it becomes clear that exactly the opposite was true:
| | Small stones | Large stones | Overall successes | Overall success rate |
|---|---|---|---|---|
| Treatment A | 81/87 | 192/263 | 273/350 | 78% |
| Treatment B | 234/270 | 55/80 | 289/350 | 83% |

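
The two ways of summarising the kidney stone data can be compared with a short Python sketch. The names `counts` and `summarise` are hypothetical, chosen only for this example. One assumption is worth flagging: the small-stone count for Treatment B is taken as 234/270, which is the figure consistent with the quoted 87% success rate and the 289/350 overall total.

```python
# (successes, patients) for each treatment and stone size.
# Assumption: Treatment B, small stones = 234/270, the value consistent
# with the quoted 87% rate and the 289/350 overall total.
counts = {
    "Treatment A": {"small": (81, 87), "large": (192, 263)},
    "Treatment B": {"small": (234, 270), "large": (55, 80)},
}


def summarise(groups):
    """Return (average of the group rates, pooled success rate)."""
    rates = [successes / patients for successes, patients in groups.values()]
    average_of_rates = sum(rates) / len(rates)

    total_successes = sum(successes for successes, _ in groups.values())
    total_patients = sum(patients for _, patients in groups.values())
    pooled_rate = total_successes / total_patients

    return average_of_rates, pooled_rate


for treatment, groups in counts.items():
    average_of_rates, pooled_rate = summarise(groups)
    print(
        f"{treatment}: average of rates = {average_of_rates:.0%}, "
        f"pooled rate = {pooled_rate:.0%}"
    )
```

Averaging the two group rates puts Treatment A ahead (roughly 83% against 78%), while pooling all 350 patients per treatment reverses the order (roughly 78% against 83%), because the two treatments were given to very different mixes of small-stone and large-stone patients.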
For more on Simpson’s paradox see here.