Simpson's Paradox

Compare two groups with respect to the incidence of one attribute. If the groups are then separated into several categories (subgroups), the group with the higher overall incidence can have the lower incidence within every category.

Examples:

Death sentence in murder cases for white and black defendants in the Southern U.S.
 
                     Sentence
Defendant's race     Death         Not     Total
White                19 (11.9%)    141     160
Black                17 (10.2%)    149     166
A higher percentage of White than of Black defendants is sentenced to death for murder.

 Suppose you then wondered if it makes sense to look at the race of the victim:
  
                  Defendant's race
                  White                 Black
Victim's race     Death         Not     Death         Not
White             19 (12.6%)    132     11 (17.5%)    52
Black             0 (0%)        9       6 (5.8%)      97
Here, for both Black and White victims, a higher percentage of Black convicted murderers are sentenced to death than are White murderers (5.8% vs. 0% and 17.5% vs 12.6% for Black and White victims respectively).
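
The reversal can be checked directly from the counts in the tables above. The following Python sketch computes the death-sentence rate overall and within each victim-race category. The dictionary layout and the helper name `death_rate` are illustrative choices, not part of the original notes.

```python
# Counts from the tables above: (death sentences, other sentences),
# keyed by (defendant's race, victim's race).
counts = {
    ("White", "White"): (19, 132),
    ("White", "Black"): (0, 9),
    ("Black", "White"): (11, 52),
    ("Black", "Black"): (6, 97),
}


def death_rate(cells):
    """Return the share of death sentences across (death, not) cells."""
    cells = list(cells)
    death = sum(d for d, _ in cells)
    total = sum(d + n for d, n in cells)
    return death / total


# Aggregated over victim's race: White defendants have the higher rate.
overall = {
    race: death_rate(v for (d_race, _), v in counts.items() if d_race == race)
    for race in ("White", "Black")
}

# Within each victim-race category: Black defendants have the higher rate.
by_victim = {
    (race, victim): death_rate([counts[(race, victim)]])
    for race in ("White", "Black")
    for victim in ("White", "Black")
}

for race, rate in overall.items():
    print(f"{race} defendants overall: {rate:.1%}")
for (race, victim), rate in by_victim.items():
    print(f"{race} defendant, {victim} victim: {rate:.1%}")
```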

How could this be? There is a third variable hidden in these statistics. If we look at the likelihood of being sentenced to death for killing Whites versus killing Blacks, we see that the death sentence is more likely if the victim was White than if the victim was Black.
  
                  Sentence
Victim's race     Death         Not     Total
White             30 (14.0%)    184     214
Black             6 (5.4%)      106     112
How does this help explain the paradoxical finding that White defendants convicted of murder are more likely overall to be sentenced to death than Black defendants, yet for both Black and White victims, Black defendants are more likely to be sentenced to death? Victim's race is a confound. People in this sample tended to murder members of their own race: 151 of the 160 White defendants had White victims, while 103 of the 166 Black defendants had Black victims. Killing a White victim was treated as the more serious crime, at least in terms of the likelihood of it leading to the death penalty, so White defendants, whose victims were almost all White, end up with the higher overall rate.
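
One way to see the confound at work is to write each defendant group's overall rate as a weighted average of its within-category rates, where the weights are the shares of that group's cases falling in each victim-race category. A minimal sketch, assuming the counts from the tables above (the variable names are illustrative):

```python
# (death sentences, other sentences) by defendant's race, then victim's race.
counts = {
    "White": {"White": (19, 132), "Black": (0, 9)},
    "Black": {"White": (11, 52), "Black": (6, 97)},
}

overall = {}
for defendant, by_victim in counts.items():
    n_defendant = sum(d + n for d, n in by_victim.values())
    rate = 0.0
    for victim, (death, other) in by_victim.items():
        n_cell = death + other
        weight = n_cell / n_defendant   # share of this group's cases in the cell
        cell_rate = death / n_cell      # death-sentence rate within the cell
        rate += weight * cell_rate
        print(f"{defendant} defendant, {victim} victim: "
              f"weight {weight:.2f}, rate {cell_rate:.3f}")
    overall[defendant] = rate
    print(f"{defendant} defendants overall: {rate:.3f}")
```

White defendants put about 94% of their weight on White-victim cases, the category with the higher death-sentence rates, while Black defendants put only about 38% of theirs there. That difference in weights, not the within-category rates, produces the higher overall rate for White defendants.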

[Relative risk ratio = (30/214) ÷ (6/112) = 2.6]
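
The same arithmetic as a short Python sketch; the function name `relative_risk` is a hypothetical helper introduced here for illustration:

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Risk in group A divided by risk in group B."""
    return (events_a / total_a) / (events_b / total_b)


# Death sentences by victim's race: 30 of 214 (White), 6 of 112 (Black).
rr = relative_risk(30, 214, 6, 112)
print(f"Relative risk: {rr:.1f}")
```

Rounded to one decimal place this gives 2.6: in this sample a death sentence was roughly 2.6 times as likely when the victim was White.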

This is an example of the danger of combining data from several distinct groups (distinct with respect to the relation between two variables) when calculating correlations. One way to avoid the initial misleading correlation would have been to use stratified sampling. If equal numbers of people are sampled from each category, the overall relationship will be an average of the relationships in the subcategories.
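
To illustrate the last point, the sketch below gives each victim-race category equal weight, as would happen if equal numbers of cases were sampled from each category, and averages the within-category rates. Using the observed rates as stand-ins for what a stratified sample would estimate is an assumption made for illustration, and the variable names are likewise illustrative.

```python
# Within-category death-sentence rates, from the counts in the second table.
rates = {
    "White": {"White": 19 / 151, "Black": 0 / 9},
    "Black": {"White": 11 / 63, "Black": 6 / 103},
}

# Equal weight for each victim-race category (assumed balanced sampling).
balanced = {
    defendant: sum(by_victim.values()) / len(by_victim)
    for defendant, by_victim in rates.items()
}

for defendant, rate in balanced.items():
    print(f"{defendant} defendants, equal-weight average: {rate:.1%}")
```

With equal weights the average rate for Black defendants exceeds that for White defendants, which agrees with the comparison within each victim-race category.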