Christopher H. Morrell
Loyola College
Journal of Statistics Education v.7, n.3 (1999)
Copyright (c) 1999 by Christopher H. Morrell, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
Key Words: Categorical data; Comparing proportions.
Real world examples of the reversal of the direction of an association when an additional explanatory variable is taken into account are unusual and hard to find. This article presents an example of Simpson's paradox from a South African longitudinal study of growth of children. The example demonstrates the importance race plays in every aspect of South African life.
Simpson's Paradox (Simpson 1951) occurs when the direction of an association between two variables is reversed when a third variable is controlled for. This topic is sometimes covered in introductory statistics courses. For example, the introductory texts by Moore and McCabe (1998) and Wardrop (1995) include a section on this paradox. In addition, two recent papers provide examples of this reversal (Appleton, French, and Vanderpump 1996; Westbrooke 1998). However, real world examples of the paradox are rare. This paper presents an example of Simpson's Paradox that occurred in a South African longitudinal study of growth of children.
The Birth to Ten study (BTT) (Yach et al. 1991, Levitt et al. 1999) commenced in the greater Johannesburg/Soweto metropolitan area of South Africa during 1990. A birth cohort was formed from all singleton births during a seven-week period between April and June 1990 to women with permanent addresses within a defined area. Identification of children born during this seven-week period and living in the defined areas took place throughout the first year of the study, by the end of which 4029 births had been enrolled. The BTT study collected prenatal, birth, and early development information on these children. The aim of the study was to identify factors related to the emergence of cardiovascular disease risk factors in children living in an urban environment in South Africa. In 1995, when the children were five years old, the children and caregivers were invited to attend interviews. Detailed questionnaires were completed that included questions about living conditions within the child's home, the child's exposure to tobacco smoke, and additional health-related issues. The five-year sample consisted of 964 children. Unfortunately, there was a great deal of missing data in the baseline group, especially on the variables reported below.
If the five-year sample is to be used to draw conclusions about the entire birth cohort, the five-year group should have characteristics similar to those who were not traced from the initial group. Thus, the five-year group was compared to those who did not participate in the five-year interview on a number of factors. One of the factors was a variable that determined whether the mother had medical aid (which is similar to health insurance) at the time of the birth of the child.
Table 1 shows that 11.1% of those in the five-year cohort had medical aid, whereas 16.6% of those who were not traced had medical aid. This difference is statistically significant (p-value = .007). The subjects in the BTT study are also classified by their racial group. In this article, we consider only white and black participants in the BTT study. Table 2 shows the distribution of the medical aid variable broken down by the race of the participants. For white participants, 83.3% of those in the five-year follow-up cohort had medical aid, whereas 82.5% of those who did not participate in the five-year tests had medical aid. In the black group, the corresponding percentages are 8.9% and 8.7%. This shows that even though overall a smaller percentage of participants had medical aid in the five-year cohort, when the race of the subjects is taken into account, the association is reversed. However, there is no statistically significant difference between the percentages when race is taken into account (p-value = .945 and .891 for whites and blacks, respectively).
Table 1. Number (and Percentage) of Subjects Whose Mothers Had Medical Aid
| | Children Not Traced | Five-Year Group |
| Had Medical Aid | 195 (16.6%) | 46 (11.1%) |
| No Medical Aid | 979 (83.4%) | 370 (88.9%) |
| Total | 1174 (100%) | 416 (100%) |
Table 2. Number (and Percentage) of Subjects Whose Mothers Had Medical Aid by the Race of the Participants
| | White | White | Black | Black |
| | Children Not Traced | Five-Year Group | Children Not Traced | Five-Year Group |
| Had Medical Aid | 104 (82.5%) | 10 (83.3%) | 91 (8.7%) | 36 (8.9%) |
| No Medical Aid | 22 (17.5%) | 2 (16.7%) | 957 (91.3%) | 368 (91.1%) |
| Total | 126 (100%) | 12 (100%) | 1048 (100%) | 404 (100%) |
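The percentages and p-values quoted for Tables 1 and 2 can be checked with a few lines of Python. The sketch below assumes the reported p-values come from a Pearson chi-square test without continuity correction (the article does not name the test), and the helper name `pearson_2x2` is purely illustrative. For one degree of freedom the upper-tail chi-square probability equals erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math

def pearson_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, p-value) on 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Counts from Tables 1 and 2: rows are (had aid, no aid),
# columns are (not traced, five-year group).
tables = {
    "combined": (195, 46, 979, 370),
    "white": (104, 10, 22, 2),
    "black": (91, 36, 957, 368),
}

results = {}
for name, (a, b, c, d) in tables.items():
    pct_not_traced = 100 * a / (a + c)   # % with aid among those not traced
    pct_five_year = 100 * b / (b + d)    # % with aid in the five-year group
    stat, p = pearson_2x2(a, b, c, d)
    results[name] = (pct_not_traced, pct_five_year, p)
    print(f"{name:9s} not traced {pct_not_traced:5.1f}%  "
          f"five-year {pct_five_year:5.1f}%  p = {p:.3f}")
```

Under that assumption, the printed percentages and p-values should round to the figures quoted in the text. A test with a continuity correction, or an exact test, would give somewhat different p-values, particularly for the small white five-year group.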
This reversal, or elimination, of association is easily explained. Whites tend to have much more access to medical aid than do black South Africans. In addition, many more blacks were originally included in the Birth to Ten study than whites. Consequently, when the race groups are combined, a relatively small percentage of the subjects have access to medical aid. At the five-year follow-up, very few whites agreed to attend the screening exams (only 12 of 138, or 8.7%, of those with data on the medical aid variable). Possibly the whites felt that they had little to gain from participating in the study, while a larger proportion of blacks (404 of 1452, or 27.8%, of those with data on the medical aid variable) continued into the five-year study. The blacks may have valued the medical checkup and screening provided to children in the study as a replacement for (or in addition to) a regular medical checkup.
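The explanation above amounts to saying that each combined percentage is a weighted average of the race-specific percentages, with weights given by the racial make-up of the group. The following minimal sketch, using only the counts in Table 2, makes the weights explicit; the names `groups`, `mix`, and `summary` are illustrative.

```python
# (had aid, total with data) by race within each follow-up group, from Table 2.
groups = {
    "not traced": {"white": (104, 126), "black": (91, 1048)},
    "five-year": {"white": (10, 12), "black": (36, 404)},
}

def mix(by_race):
    """Return (overall aid rate, white share of the group)."""
    total = sum(n for _, n in by_race.values())
    # Overall rate = sum over races of (race share) * (race-specific rate).
    overall = sum((n / total) * (aid / n) for aid, n in by_race.values())
    return overall, by_race["white"][1] / total

summary = {name: mix(by_race) for name, by_race in groups.items()}
for name, (overall, white_share) in summary.items():
    print(f"{name:10s} white share {white_share:6.1%}  overall aid rate {overall:6.1%}")
```

Whites make up 126 of 1174 (about 10.7%) of the children not traced but only 12 of 416 (about 2.9%) of the five-year group. Because the high white rate carries so much less weight in the five-year group, its combined rate is lower even though both race-specific rates are slightly higher.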
The file birthtotena.dat contains the category labels and cell frequencies for the three-way table (Table 2). The file birthtotenb.dat lists each case on a separate line with three variables that indicate whether or not the mother had medical aid, whether or not the child was traced for the five-year interview, and race. The file birthtoten.txt is a documentation file containing a brief description of the datasets.
I thank the Birth to Ten Study and the Chronic Diseases of Lifestyle Programme at the Medical Research Council in Cape Town, South Africa, for the use of these data.
Columns in birthtotena.dat
1 - 5 Aid/NoAid
7 - 15 Traced/NotTraced
17 - 21 White/Black
23 - 25 Cell Count
Values are aligned and delimited by blanks. There are no missing values.
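A record in the cell-count layout just described can be read by slicing on the documented column positions. The sketch below is an assumption-laden illustration: the `Cell` tuple and `parse_cell_line` function are hypothetical names, and the sample record is constructed here to match the layout (with a count taken from Table 2) rather than copied from the data file.

```python
from collections import namedtuple

Cell = namedtuple("Cell", "aid traced race count")

def parse_cell_line(line):
    """Slice one fixed-width record using the documented 1-based columns."""
    return Cell(
        aid=line[0:5].strip(),       # columns 1-5:   Aid/NoAid
        traced=line[6:15].strip(),   # columns 7-15:  Traced/NotTraced
        race=line[16:21].strip(),    # columns 17-21: White/Black
        count=int(line[22:25]),      # columns 23-25: cell count
    )

# Hypothetical record built to match the layout; the real file may pad
# its fields differently, which the strip() calls are meant to tolerate.
sample = f"{'Aid':<5} {'NotTraced':<9} {'White':<5} {104:>3}"
cell = parse_cell_line(sample)
print(cell)
```

Slicing by position rather than splitting on whitespace follows the column specification directly, although splitting would also work here because no field contains an embedded blank.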
Columns in birthtotenb.dat
1 Medical Aid? (0 = No, 1 = Yes)
3 Traced? (0 = No, 1 = Five-Year Group)
5 Race (1 = White, 2 = Black)
Values are aligned and delimited by blanks. There are no missing values.
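Case-level records coded as described can be tallied back into the cells of the three-way table. This is a minimal sketch: the function name `tally_cases`, the label dictionaries, and the four sample records are all hypothetical, chosen only to illustrate the coding scheme.

```python
from collections import Counter

# Code-to-label maps taken from the column descriptions.
AID = {"0": "NoAid", "1": "Aid"}
TRACED = {"0": "NotTraced", "1": "Traced"}
RACE = {"1": "White", "2": "Black"}

def tally_cases(lines):
    """Count cases per (aid, traced, race) cell from one-case-per-line records."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        aid, traced, race = line[0], line[2], line[4]  # columns 1, 3, 5
        counts[(AID[aid], TRACED[traced], RACE[race])] += 1
    return counts

# Hypothetical records, not taken from the data file.
sample_cases = ["1 0 1", "1 0 1", "0 1 2", "1 1 2"]
counts = tally_cases(sample_cases)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

If the case-level file matches its description, applying the same tally to all of its lines should reproduce the cell counts in Table 2.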
Christopher H. Morrell
Mathematical Sciences Department
Loyola College
4501 North Charles Street
Baltimore, MD 21210-2699