Compare two groups with respect to the incidence of one attribute. If the groups are then separated into several categories (subgroups), the group with the higher overall incidence can have the lower incidence within every subgroup.
Examples:
Death sentence in murder cases for white and black defendants in the Southern U.S.
| Defendant's race | Total |
|---|---|
| White | 160 |
| Black | 166 |
Suppose you then wondered if it makes sense to look at the race of the victim:
[Table: death sentences for white and black defendants, broken down by victim's race (White, Black)]
How could this be? There is a third variable hidden in these statistics: the race of the victim. If we look at the likelihood of being sentenced to death for killing Whites and for killing Blacks, we see that a death sentence is more likely if the victim was White than if the victim was Black.
| Victim's race | Death sentences | Total |
|---|---|---|
| White | 30 | 214 |
| Black | 6 | 112 |
[Relative risk ratio = (30/214) ÷ (6/112) = 2.6]
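The relative risk arithmetic can be checked with a few lines of Python. This is only a sketch: the counts (30 of 214, 6 of 112) come from the line above, while the variable and function names are illustrative choices.

```python
# Counts taken from the relative risk line above:
# death sentences and total cases, by race of the victim.
deaths_white_victim, total_white_victim = 30, 214
deaths_black_victim, total_black_victim = 6, 112


def risk(events, total):
    """Proportion of cases that ended in a death sentence."""
    return events / total


risk_white = risk(deaths_white_victim, total_white_victim)
risk_black = risk(deaths_black_victim, total_black_victim)

# Relative risk: how many times more likely a death sentence is
# when the victim was White than when the victim was Black.
relative_risk = risk_white / risk_black

print(f"{risk_white:.3f} {risk_black:.3f} {relative_risk:.1f}")
# prints: 0.140 0.054 2.6
```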
This is an example of the danger of combining data from several distinct groups (distinct with respect to the relation between two variables) when calculating correlations. One way to avoid the initial misleading correlation would have been to use stratified sampling. If equal numbers of people are sampled from each category, the overall relationship will be an average of the relationships in the subcategories.
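The averaging claim can be illustrated with a short sketch. Note the assumption: the individual cell counts used below do not appear in the text above. They are the commonly cited counts from Radelet's (1981) death penalty study, used here because they match the totals shown (160, 166, 214, 112) and the 30 and 6 death sentences in the relative risk calculation. The function names are illustrative.

```python
# ASSUMPTION: these cell counts are not given in the text above; they are the
# commonly cited Radelet (1981) counts, which match the totals shown there.
# (defendant's race, victim's race) -> (death sentences, total cases)
counts = {
    ("White", "White"): (19, 151),
    ("White", "Black"): (0, 9),
    ("Black", "White"): (11, 63),
    ("Black", "Black"): (6, 103),
}


def pooled_rate(defendant):
    """Death sentence rate with both victim categories lumped together."""
    deaths = sum(d for (df, _), (d, n) in counts.items() if df == defendant)
    cases = sum(n for (df, _), (d, n) in counts.items() if df == defendant)
    return deaths / cases


def equal_weight_rate(defendant):
    """Plain average of the per-category rates, which is what the pooled
    rate would be if each victim category contributed equally many cases."""
    rates = [d / n for (df, _), (d, n) in counts.items() if df == defendant]
    return sum(rates) / len(rates)


for defendant in ("White", "Black"):
    print(defendant,
          f"pooled={pooled_rate(defendant):.3f}",
          f"equal-weight={equal_weight_rate(defendant):.3f}")
```

Under these assumed counts, the pooled rate is higher for white defendants, while the equal-weight average is higher for black defendants, which agrees with the direction of the comparison inside each victim category.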