University Students' Ability to Apply Statistical Procedures

Paul L. Gardner and Ingrid Hudson
Monash University

Journal of Statistics Education v.7, n.1 (1999)

Copyright (c) 1999 by Paul L. Gardner and Ingrid Hudson, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.


Appendix 1: Test Items

(1) Item MA: Mathematics Achievement

A researcher investigates whether mathematics achievement in the middle high school years is influenced by whether or not the student is a first-born child, and also whether there is a gender difference. A random sample of 100 14-year-old students is obtained from four secondary schools located in various socio-economic status areas of a large city, and a standard test of mathematics achievement developed by the Australian Council for Educational Research is administered.

Scores on ACER test

            First-born    Later-born
Male        (N=22)        (N=34)
            27            48
            42            17
            33            35
            ...           ...
Female      (N=18)        (N=26)
            39            24
            43            13
            16            32
            ...           ...

What statistical test(s) could the researcher use to investigate the effect of birth-order and gender on mathematics achievement?

(2) Item DM: Drink Music

Studies have shown that music can affect mood, emotion, task performance, and cognition. It was hypothesised that the tempo of country-western music played in bars was related to the consumption of alcohol. Observers visited three bars featuring recorded country-western music on three Friday nights. They obtained permission to tape record the music and to make observations of patrons at selected tables. When the music began, the rate of sipping an alcoholic beverage was recorded for each patron. The music tapes were analyzed for the tempo (beats per minute) of each song; the mean number of sips during each song was also recorded.

Tempo Mean number of sips
35 1.150
38 1.150
44 0.400
... ...
112 0.750
118 0.625
(N = 18)

A scatter plot is drawn, and the relationship is found to be linear. What statistical test(s) could be used to determine the strength of the association between tempo and rate of drinking?

From R. E. Kirk (1984), Elementary Statistics (2nd ed.), Monterey, CA: Brooks/Cole Publishing Co., p. 127; based on a paper by P. J. Bach and J. M. Schaefer (1979), The tempo of country music and the rate of drinking in bars. Journal of Studies on Alcohol, 40, 1058-1059.

(3) Item LB: Light Bulbs

The data in the table are said to represent the lifetimes of 300 light bulbs tested to failure.

Light bulb life (hours) Frequency
2050-2100 1
2000-2050 1
1950-2000 2
1900-1950 3
1850-1900 6
1800-1850 8
1750-1800 12
1700-1750 16
1650-1700 21
1600-1650 25
1550-1600 28
1500-1550 29
1450-1500 29
1400-1450 27
1350-1400 24
1300-1350 20
1250-1300 16
1200-1250 12
1150-1200 7
1100-1150 6
1050-1100 3
1000-1050 2
950-1000 2

What statistic(s) could be used to test whether the frequency distribution conforms to a normal distribution?

Source: F. N. David and E. S. Pearson, Elementary Statistical Exercises, Cambridge: Cambridge University Press, cited in R. J. MacG. Dawson (1996), How many light bulbs does it take to generate a data set? The American Statistician, 50(3), 247-249.

(4) Item AB: Aggressive Behaviour

An early-childhood researcher wishes to investigate whether children's observations of aggressive behaviour affect the amount of their own aggressive behaviour. A sample of 21 six-year-old girls was randomly assigned to one of two conditions. The experimental group viewed a TV program containing numerous aggressive acts; the control group viewed a program without aggressive acts. Afterwards, each girl was observed at play, and the number of aggressive acts counted. A few girls displayed unusually large numbers of aggressive acts, so the researcher converted the frequency counts to ranks, with the smallest number of aggressive acts given the ranking of 1 and the largest number, 21.

Ranks of children in experimental group 3, 17, 12.5, ..., 19 (N=10)
Ranks of children in control group 8, 2, 1, 10, ..., 5 (N=11)

What statistical test(s) could be used to investigate whether observing aggressive behaviour affects children's aggressive behaviour?

Based on an example in R. E. Kirk (1984), Elementary Statistics (2nd ed.), Monterey, CA: Brooks/Cole Publishing Co., p. 403.

(5) Item HT: Homework Time

A mathematics teacher surveys her students to obtain an estimate of the amount of time spent each week doing mathematics homework, and records her students' scores on an end-of-term mathematics test.

Homework (hrs/week) 8 0.5 6 2 3 1.5 ... 4 2
Test score 95 45 60 55 60 55 ... 85 40
(N = 25)

One student, who reported that he did 2.5 hours of mathematics homework per week, was absent from the test due to illness. What statistical procedure(s) could the teacher use to predict the student's likely test score?

(6) Item CF: Cat Food

A pet-food manufacturer obtains ten pairs of kittens, each pair coming from one litter. In a trial of a new-formula cat-food, one kitten in each pair is fed on a diet of Superkat, the other a diet of Powerpuss. The table shows the gain in weight (in grams) of each kitten after a week.

Kitten pair: A B C D ... J
Superkat diet 12 15 22 17 ... 8
Powerpuss diet 14 17 21 20 ... 11
(N = 10 pairs)

What statistical test(s) are appropriate for investigating whether there is a significant difference between the two types of cat-food?

(7) Item TC: Test Completion

A university lecturer gives an end-of-semester test in which students are allowed as much time as they need to complete it. The lecturer observes the order in which the 16 students hand in their test papers, and later records the grade awarded to each student.

(HD = High Distinction; D = Distinction; C = Credit; P = Pass; N = Fail)

Student A B C D ... O P
Order of completion 3 5 1 8 ... 14 10
Final grade D HD D C ... N P
(N = 16)

What statistical test(s) could be used to investigate whether there is a relationship between order of completion and final result?

(8) Item GB: Gender Balance

In 1710, Dr. John Arbuthnott, the personal physician to Queen Anne, collected birth registration statistics from the City of London dating back over a period of more than twenty years. He observed a definite tendency for the number of male babies born in any one year to exceed the number of females. Imagine that the data were as follows:

Year Predominant gender
1688 M
1689 F
1690 M
1691 M
1692 F
1693 M
... ...
1709 M

What statistical test(s) could be used to investigate whether the predominance of male births over females in each year was statistically significant?

(Source of background information: L. A. Marascuilo and R. C. Serlin (1988), Statistical Methods for the Social and Behavioral Sciences. New York: W. H. Freeman & Co.) The data are invented.

(9) Item LR: Light Reactions

Twelve students are randomly assigned to three conditions of a perceptual-motor experiment where ambient illumination is the independent variable and subject reaction time is the dependent variable.

Reaction time (seconds)
Bright light          Medium light          Dim light
1.1, 1.0, 0.8, 1.5    1.4, 1.1, 1.6, 1.9    0.8, 0.7, 1.2, 1.3

How could the hypothesis that ambient light affects reaction time be investigated?

Source: K. C. Clayton (1984), An Introduction to Statistics for Psychology and Education, Columbus, OH: Charles Merrill, p. 197.
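
Because the full data set is given for this item, one candidate procedure, a one-way analysis of variance, can be worked through directly. The sketch below is illustrative only: the function name `one_way_anova_f` is hypothetical, it computes the F ratio and its degrees of freedom but not a p-value, and it assumes the usual ANOVA conditions (independent groups, roughly normal errors, similar variances) hold for these data.

```python
from statistics import mean

# Reaction times (seconds) copied from the table in this item.
groups = {
    "bright": [1.1, 1.0, 0.8, 1.5],
    "medium": [1.4, 1.1, 1.6, 1.9],
    "dim": [0.8, 0.7, 1.2, 1.3],
}


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a list of independent samples."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


f_ratio, df_b, df_w = one_way_anova_f(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

The resulting F ratio would then be compared with a critical value from the F distribution with the printed degrees of freedom.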

(10) Item AA: Arithmetic Achievement

A researcher obtains measures of mental age (X1) from an IQ scale, reading ability (X2) from a standard test of reading comprehension, and arithmetic ability (Y) from a standard test of arithmetic achievement for a sample of 40 upper primary school children.

X1 X2 Y
9.2 61 126
10.0 47 60
8.0 79 117
7.4 40 96
... ... ...
7.0 55 91
(N = 40)

Another child has a mental age of 11.2 and a reading test score of 63. What statistical technique(s) could the researcher use to predict that child's arithmetic score?

Source: Table 45.1 in L. A. Marascuilo and R. C. Serlin (1988), Statistical Methods for the Social and Behavioral Sciences, New York: W. H. Freeman.

(11) Item RA: Reading Ability

Mr. Kemp, the Grade 3 teacher at Vanstone Primary School, is concerned about the literacy levels of his class and administers a standardised reading test for which the population mean for third-graders is 65.0. His class of 24 students obtains a mean score of 59.7 and a standard deviation of 9.3. What statistical test(s) could he use to determine whether his class is significantly different from the population?
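
One procedure that fits the summary figures in this item is a one-sample t test. The sketch below only computes the test statistic; the function name `one_sample_t` is hypothetical, and treating the class as a random sample from the third-grade population is an assumption the item does not state.

```python
import math


def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """Return (t, df) for a one-sample t test computed from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - pop_mean) / standard_error
    return t, n - 1


# Figures taken from the item: class mean 59.7, population mean 65.0,
# class standard deviation 9.3, class size 24.
t_stat, df = one_sample_t(59.7, 65.0, 9.3, 24)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The statistic would then be referred to a t distribution with the printed degrees of freedom.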

(12) Item SH: Smoking Habits

Tunbridge et al. (1977) report data on smoking and survival obtained from a longitudinal study in an English town:

Relationship between smoking habits and 20-year survival in 1314 women

         Smoker    Nonsmoker    Total
Dead     139       230          369
Alive    443       502          945
Total    582       732          1314

What statistic(s) could be used to test whether there is an association between smoking and survival?

(Note: The data, which are genuine, might seem to suggest that smoking increases one's chances of survival. Simple statistical tests are not always the most appropriate ones. The data listed here take no account of the age of the participants, nor do they allow for the fact that all those who had already died from smoking-related illnesses never made it to the start of the 20-year survey!)

Source: W. M. G. Tunbridge, D. C. Evered, R. Hall, D. R. Appleton, M. Brewis, F. Clark, J. Grimley Evans, E. Young, T. Bird, and P. A. Smith (1977), The spectrum of thyroid disease in a community: The Whickham Survey. Clinical Endocrinology, 7, 481-493, cited in D. R. Appleton, J. M. French, and M. P. J. Vanderpump (1996), Ignoring a covariate: an example of Simpson's Paradox. The American Statistician, 50(4), 340-341.
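
As an illustration of one statistic that could be computed from the 2 x 2 table in this item, the sketch below evaluates the Pearson chi-square statistic without a continuity correction. The function name `chi_square_2x2` is hypothetical, and the caveat in the note above still applies: the calculation ignores age and any other covariate.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
    [[a, b], [c, d]], using the shortcut N(ad - bc)^2 / (product of margins)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Cell counts from the item: rows are Dead/Alive, columns are Smoker/Nonsmoker.
chi2 = chi_square_2x2(139, 230, 443, 502)
print(f"chi-square = {chi2:.2f} on 1 degree of freedom")
```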

(13) Item II: Inherited Intelligence

According to one genetic theory, IQ test scores of two brothers ought to show a correlation of 0.50. To test this theory, records of a school district were searched to find the IQ test scores of brothers born within two years of each other; 49 such pairs were found. The correlation between the sets of IQ scores was found to be 0.58.

What statistical test(s) could be used to determine whether the observed correlation differs significantly from the theoretically expected value?

Source: Chapter 24 in L. A. Marascuilo and R. C. Serlin (1988), Statistical Methods for the Social and Behavioral Sciences, New York: W. H. Freeman.
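
One approach to comparing an observed correlation with a non-zero theoretical value uses Fisher's r-to-z transformation. The sketch below is a minimal illustration with the figures from this item; the function name `fisher_z_statistic` is hypothetical, and the method assumes the paired scores are approximately bivariate normal.

```python
import math


def fisher_z_statistic(r_observed, rho_expected, n_pairs):
    """Approximate standard-normal statistic for H0: rho = rho_expected."""
    # atanh is Fisher's transformation: 0.5 * ln((1 + r) / (1 - r)).
    z_difference = math.atanh(r_observed) - math.atanh(rho_expected)
    # The transformed correlation has standard error 1 / sqrt(n - 3).
    return z_difference * math.sqrt(n_pairs - 3)


z_stat = fisher_z_statistic(0.58, 0.50, 49)
print(f"z = {z_stat:.2f}")
```

The statistic would be compared with critical values of the standard normal distribution.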

(14) Item MU: Menzies University

The Vice-Chancellor of Menzies University successfully introduces a policy in which all first-year students are required to take a one-semester unit in essay-writing. The V-C asks the university's Education Research Unit to compare the performance of students in various faculties of the university. A random sample of 25 students from each faculty is obtained. The head of ERU has access to individual students' prior performance in Year 12 English, and suggests that the comparison should attempt to control for prior differences in ability in English.

Variable X: Student performance at Year 12 English (0-20 scale)

Variable Y: Student performance at first-year Essay-writing subject (0-100 scale)

Arts      Business   Engineering   Medicine   Etc.
X   Y     X   Y      X   Y         X   Y      ...
19  80    14  62     17  75        18  95     ...
18  90    15  71     14  83        19  87     ...
12  65    12  55     12  53        15  74     ...
... ...   ... ...    ... ...       ... ...    ...

What statistical method(s) would allow the ERU head to investigate whether there were differences in performance in essay-writing between faculties, after taking into account prior differences in ability in English as measured by the Year 12 results?

(15) Item AE: Alcohol Effects

Twenty members of a local Rotary Club participate in an experiment on the effects of alcohol on reaction time. They are randomly assigned to two groups. Each person in the Alcohol group is given three cans of rum-and-cola to consume over a period of an hour; members in the No-Alcohol group are given identically-labelled cans containing cola mixed with a non-alcoholic rum-flavoured syrup. Each person's reaction time is then measured in a driver-simulation apparatus.

 Reaction time (seconds)
Alcohol group (N=10) 0.37, 0.42, 0.28,..., 0.45
No-Alcohol group (N=10) 0.29, 0.32, 0.37,..., 0.22

What statistical test(s) could be used to investigate whether alcohol makes a difference to people's reaction time? (Invented data)

(16) Item DV: Drinking Vodka

A group of 63 young adults is divided, on the basis of interview, into daily users and non-users of alcohol. Each member of these two groups is then further randomly allocated to one of three sub-groups and given 0 or 1.5 or 3 ounces of vodka. A motor performance test is then carried out which requires the participant to keep a beam of light focussed on a randomly moving target during a ten-minute period. Time on target is measured. The data are summarised in the table below, and an analysis of variance procedure yields a significant F ratio.

                           Nonuser            User
Group                      1     2     3      4     5     6
Amount of vodka (ounces)   0     1.5   3      0     1.5   3
Sample size                10    8     9      12    9     15
Mean time (minutes)        8.9   6.4   3.1    9.1   8.8   7.6
Standard deviation         1.6   2.2   2.7    1.4   1.9   2.3

What statistical procedure(s) could be used to investigate whether the difference between the means of any two groups is significant?

Source: Chapter 33 and Table 33.1 in L. A. Marascuilo and R. C. Serlin (1988), Statistical Methods for the Social and Behavioral Sciences, New York: W. H. Freeman.

(17) Item ME: Maze Errors

Twelve students were divided at random into three groups of four people each, and given a complex maze problem which was mounted on a board. They were required to track through the maze with a stylus. An electrical system registered the numbers of errors made. Each participant was given five trials. One group of students was told that maze-solving ability was related to intelligence, a second group that average college students made 20 errors, and the third group that they should make as few errors as possible.

Group                Student   Trial 1   Trial 2   Trial 3   Trial 4   Trial 5
Intelligence group   11        40        39        33        33        20
                     12        40        33        ...       ...       ...
                     ...
                     14
Average 20 group     21
                     ...
                     24
Few-errors group     31
                     ...
                     34        ...       ...       21        23        21

What statistical procedure(s) should be used to investigate whether the number of errors is related to the nature of the information given to each group?

Source: Table 42.6 in L. A. Marascuilo and R. C. Serlin (1988), Statistical Methods for the Social and Behavioral Sciences, New York: W. H. Freeman.

(18) Item SE: Self-Esteem

A psychological researcher is reviewing the literature on the effects of various therapeutic treatments on the self-esteem of troubled young men who have been identified as displaying suicidal tendencies. Treatment A yielded a mean self-esteem score of 54.5, in comparison with a control group that did not receive treatment and scored a mean of 51.3, standard deviation 7.5. For Treatment B, the researchers used a completely different measure of self-esteem, and obtained a mean score of 22.4 for the experimental group and 18.1 for the control (standard deviation, 5.2). What method(s) could the literature reviewer use to compare the relative effectiveness of the two treatments?
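
Because the two studies use different measurement scales, one way to put them on a common footing is a standardised mean difference (effect size). The sketch below is a minimal illustration; the function name `standardised_difference` is hypothetical, and it assumes the quoted standard deviation in each study is the appropriate standardiser, which the item leaves open.

```python
def standardised_difference(treatment_mean, control_mean, sd):
    """Mean difference expressed in standard-deviation units."""
    return (treatment_mean - control_mean) / sd


# Figures taken from the item for each study.
effect_a = standardised_difference(54.5, 51.3, 7.5)
effect_b = standardised_difference(22.4, 18.1, 5.2)
print(f"Treatment A: {effect_a:.2f} SD units")
print(f"Treatment B: {effect_b:.2f} SD units")
```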

(19) Item GI: Gross Income

A social scientist obtains a sample of males aged in their thirties, and gathers data on three variables:

X: socio-economic status of their parents (measured on a 7-point scale)

Y: years of formal education (ranging from 10 to 19 years)

Z: current annual gross income in dollars

What statistic(s) could be used to measure the strength of the relationship between income and the combined effect of the other two variables?

Problem suggested by a discussion in D. Freedman, R. Pisani, and R. Purves (1978), Statistics, New York: W. W. Norton, p. 197.

(20) Item YM: Young Mothers

A psychologist studies a sample of young mothers and obtains rating-scale and observational measures on five variables: dysphoria (a measure of depression), emotional closeness (to her baby), duration (time spent interacting with the baby), marital conflict, and husband's psychiatric history. The psychologist wishes to investigate whether there are any latent variables (i.e., patterns of relationships) underlying these five variables. What method(s) could be employed to search for such patterns?

Problem suggested by a discussion in B. Everitt and D. Hay (1992), Talking About Statistics: A Psychologist's Guide to Design and Analysis, London: Edward Arnold, Ch. 9.

(21) Item RC: Reading Comprehension

A teacher of English administers a reading comprehension test with a range of possible scores of 0-50 to her class. The mean score of the class is 29.5 and the standard deviation is 7.5. She also gives her students an essay to write and marks it on a 0-100 scale. The mean score of the class on the essay is 73 and the standard deviation 12. One of her students, Mary, scores 43 on the reading comprehension test and 86 on the essay. What statistical procedure(s) could the teacher use to decide whether Mary was relatively better at reading comprehension or essay-writing?
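
Scores on two different scales can be compared by converting each to a standard (z) score relative to the class. The sketch below applies that idea to the figures in this item; the function name `z_score` is hypothetical, and the comparison is only descriptive.

```python
def z_score(raw_score, class_mean, class_sd):
    """Number of standard deviations a raw score lies above the class mean."""
    return (raw_score - class_mean) / class_sd


# Mary's scores with the class mean and standard deviation for each task.
z_reading = z_score(43, 29.5, 7.5)
z_essay = z_score(86, 73, 12)
print(f"Reading comprehension: z = {z_reading:.2f}")
print(f"Essay: z = {z_essay:.2f}")
```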
