Journal of Statistics Education v.3, n.2 (1995)
Copyright (c) 1995 by Mary Rouncefield, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
Key Words: Boxplot; Scatterplot; Life expectancy; Population growth.
This paper describes a case study based on data taken from the U.N.E.S.C.O. 1990 Demographic Year Book and The Annual Register 1992 giving birth rates, death rates, life expectancies, and Gross National Products for 97 countries. Suggested activities include exploratory graphical analyses to answer several central questions. These include an investigation into the wealth and life expectancies of different country groups and their population growth. Inequalities in the life experiences of different groups become readily apparent. Students are stimulated to generate their own questions and to find possible solutions.
1 This case study was originally used with first-year undergraduates taking an introductory course in statistics as part of a variety of degree programmes. They were all non-mathematicians (and in many cases non-mathematical also!). This is one of the very first activities used in the course. In addition to providing a rationale for acquiring basic skills in statistical thinking, the case study has two other main objectives.
2 The case study is based on data collected from the U.N.E.S.C.O. 1990 Demographic Year Book and The Annual Register 1992 (data for 1990). This article describes how the data can be used to provide questions and exercises for first-year undergraduate students. Indeed, the case study itself can form the first half-semester's course, because so much data is provided and so many real questions and issues can be raised.
3 The data consist of the following variables for 97 countries.
4 The students' work is organised around several central questions. This list is not exhaustive and students are expected to generate at least one question of their own (and to try to answer it).
5 In previous courses MINITAB has been used to analyse the data, but another package such as SPSS (Windows) would be equally suitable. In answering the questions, the following statistical techniques may be used.
6 The data can be split into country groups (using the BY subcommand in MINITAB). A clear representation of the data is obtained when a boxplot for Gross National Product is drawn for each country group. Outliers are clearly shown but will need to be identified by the students.
7 Further questions will naturally arise: Why are Singapore and Hong Kong outliers in Asia? Why is Libya an outlier in Africa? How do these richer countries compare to countries in Western Europe or the Middle East?
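For readers working outside MINITAB, the grouping and outlier identification described above can be sketched in Python. This is a minimal sketch under stated assumptions: the records, group labels, and country codes are invented for illustration and are not values from the dataset, and the outlier rule is the common 1.5 x IQR boxplot convention, whose quartile calculation may differ slightly from the one MINITAB uses.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (group, country, GNP per capita) records.
# These values are invented for illustration; they are NOT from the dataset.
records = [
    ("Group A", "A1", 200), ("Group A", "A2", 350), ("Group A", "A3", 400),
    ("Group A", "A4", 500), ("Group A", "A5", 650), ("Group A", "A6", 5300),
    ("Group B", "B1", 9000), ("Group B", "B2", 11000), ("Group B", "B3", 12500),
    ("Group B", "B4", 14000), ("Group B", "B5", 16000),
]


def boxplot_outliers(rows):
    """Return, for each group, the countries lying beyond the 1.5 x IQR fences."""
    by_group = defaultdict(list)
    for group, country, value in rows:
        by_group[group].append((country, value))

    flagged = {}
    for group, members in by_group.items():
        values = [value for _, value in members]
        # Quartiles by linear interpolation (assumed convention).
        q1, _median, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low_fence = q1 - 1.5 * iqr
        high_fence = q3 + 1.5 * iqr
        flagged[group] = [
            country for country, value in members
            if value < low_fence or value > high_fence
        ]
    return flagged


outliers = boxplot_outliers(records)
print(outliers)
```

With these invented values only the country coded A6 is flagged, which is the kind of point students would then need to identify by name and explain.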
8 One of the main aims of this kind of analysis is that students will generate their own questions. They do not need external motivation to think about and interpret the graphs and statistical results. Histograms can be compared to the boxplots and their relative merits discussed.
9 Suggested analyses and graphs are summarised for the other questions; as before, students are encouraged to discuss what they have found and to ask further questions.
10 The countries can be split into groups and a boxplot of life expectancy drawn for each country group. Do any outliers appear within the groups? Which countries are these?
11 Life expectancies for females and males generate slightly different sets of graphs. In which ways are they similar? In which ways are they different? Why are some countries outliers?
12 If a scatterplot of male versus female life expectancies is drawn, there is a strong positive association. The answer to the question would appear to be "Yes, they do," but is this the best graph to draw? The computer package can be used to calculate the difference between female and male life expectancy for each country (females have higher life expectancies in most countries). This new variable of "differences" can be analysed in various ways:
13 Side-by-side boxplots provide an excellent visual display of the data on male and female life expectancies.
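The derived variable of differences mentioned above is simple to compute in any package. The following Python sketch shows one way to do it; the country codes and life expectancies are hypothetical values chosen for illustration, not figures from the dataset.

```python
from statistics import mean

# Hypothetical (country, male, female) life expectancies in years.
# Invented for illustration; NOT values from the dataset.
life = [
    ("C1", 60.0, 64.5),
    ("C2", 72.0, 78.0),
    ("C3", 55.0, 54.0),
    ("C4", 68.5, 75.0),
]


def female_minus_male(rows):
    """Return the female minus male life expectancy for each country."""
    return {country: round(female - male, 1) for country, male, female in rows}


diffs = female_minus_male(life)

# Countries where women do not outlive men are the interesting exceptions.
exceptions = [country for country, diff in diffs.items() if diff <= 0]

print(diffs)
print("mean difference:", mean(diffs.values()))
print("exceptions:", exceptions)
```

Listing the exceptions separately gives students a short set of countries to investigate, in the same spirit as the outliers on a boxplot.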
14 A scatterplot of birth rate versus death rate will quickly show that the variables are related. Further scatterplots will show in addition that birth rate and death rate are each related to GNP (as is infant death rate).
15 This analysis, however, is fairly simplistic, and some groups of students may be inspired to investigate the age structure of populations in different countries. A country with a very young population will tend to show a lower death rate and higher birth rate than a country with an aging population.
16 This question can be investigated by calculating birth rate minus death rate. Most students realise that this new variable gives some measure of the change in population. Results range from -1.8 (a decreasing population) to 37.8 (an increasing population). The mean is around 18.4, but what does it signify? (Note that these values are "per thousand" so 20 per thousand is 2%.)
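The arithmetic behind this new variable can be written out as a short Python sketch. The function names are illustrative, and the birth and death rates in the first call are made-up inputs; the remaining calls convert the range and mean quoted above from "per thousand" to percentages.

```python
def natural_increase(birth_rate, death_rate):
    """Birth rate minus death rate, both per thousand of population."""
    return round(birth_rate - death_rate, 1)


def per_thousand_to_percent(rate_per_thousand):
    """Convert a rate per thousand into a percentage."""
    return round(rate_per_thousand / 10, 2)


# Made-up rates for one hypothetical country (per thousand).
print(natural_increase(40.0, 10.0))          # change per thousand people

# Figures quoted in the text, expressed as annual percentages.
for rate in (-1.8, 18.4, 37.8):
    print(rate, "per thousand =", per_thousand_to_percent(rate), "%")
```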
17 What happens to a population with an annual growth rate of 2%? How long will it take to double in size? This can be investigated using a spreadsheet, and different growth rates can be compared.
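The spreadsheet investigation can also be sketched in Python. This sketch assumes a constant annual growth rate compounded once a year, which is a simplification; the function names are illustrative. The first function mimics filling a spreadsheet column year by year, and the second gives the exact doubling time from logarithms for comparison.

```python
import math


def years_to_double(annual_rate_percent):
    """Count whole years of compounding until the population has at least doubled."""
    if annual_rate_percent <= 0:
        raise ValueError("the population only doubles if the growth rate is positive")
    population = 1.0  # start from one unit; the starting size does not matter
    years = 0
    while population < 2.0:
        population *= 1 + annual_rate_percent / 100
        years += 1
    return years


def exact_doubling_time(annual_rate_percent):
    """Doubling time in years from the formula ln(2) / ln(1 + r)."""
    return math.log(2) / math.log(1 + annual_rate_percent / 100)


for rate in (1, 2, 3):
    print(rate, "%:", years_to_double(rate), "whole years,",
          round(exact_doubling_time(rate), 1), "years exactly")
```

At 2% a year the population has almost doubled after 35 years, and the year-by-year count reports 36 because it waits for the first full year in which the doubled size is reached.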
18 People often do not understand the rate of inflation (with regard to prices). This is a rate of inflation with regard to people! What we are looking at is the gradient of the gradient.
19 Using this dataset, students can ask real questions about real-life situations. These in turn raise ethical and moral questions, which motivate students' learning, making the subject matter more relevant and interesting. Just as the teacher of history or literature would not avoid moral issues in her lessons, the statistics teacher likewise should not avoid them.
20 Exploratory data analysis and graphical techniques are easily accessible to students, even with limited mathematical skills. These techniques make interpretation and inference possible without the use of formal hypothesis tests. Students can interpret their results with reference to a real situation and gain experience with handling a large dataset.
21 My students have enjoyed using this dataset and I hope yours will too.
22 The file poverty.dat.txt contains the raw data. The file poverty.txt is a documentation file containing a brief description of the dataset.
Values are aligned and delimited by blanks.
Missing values are denoted with *.
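A file in this format can be read with a few lines of Python. The sketch below relies only on the two facts stated above (blank-delimited values, missing values marked with *). The sample line is invented and does not reflect the actual column order, which is described in the documentation file. Note that a country name containing spaces would be split into several fields by this simple approach.

```python
def parse_row(line):
    """Split a blank-delimited line; '*' becomes None, numeric tokens become floats."""
    fields = []
    for token in line.split():
        if token == "*":
            fields.append(None)          # missing value
        else:
            try:
                fields.append(float(token))
            except ValueError:
                fields.append(token)     # keep non-numeric text as-is
    return fields


# Invented sample line for illustration; NOT a row from poverty.dat.txt.
sample = "12.5  7.3   *   68.1  74.9  Exampleland"
print(parse_row(sample))
```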
Day, A. (ed.) (1992), The Annual Register 1992, 234, London: Longmans.
U.N.E.S.C.O. (1990), 1990 Demographic Year Book, New York: United Nations.