Representativeness in Statistical Reasoning: Identifying and Assessing Misconceptions

Linda S. Hirsch and Angela M. O'Donnell
Rutgers, The State University of New Jersey

Journal of Statistics Education Volume 9, Number 2 (2001)

Copyright © 2001 by Linda S. Hirsch and Angela M. O'Donnell, all rights reserved.
This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.


Key Words: Cognitive conflict; Group learning; Instructional intervention.

Abstract

The purpose of the study was to develop a valid and reliable test instrument to identify students who hold misconceptions about probability. A total of 263 students completed a multiple-choice test that used a two-part format rather than the typical one-part format. Results of the study showed that even students with formal instruction in statistics continue to demonstrate misconceptions. The test instrument developed in this study provides instructors with (1) a valid and reliable method of identifying students who hold common misconceptions about probability, and (2) diagnostic information concerning students' errors not frequently available through other formats. The test instrument was further evaluated in an instructional intervention study.

1. Introduction

"Probability is the study of likelihood and uncertainty. It plays a critical role in all of the professions and in most everyday decisions" (Halpern 1996, p. 242). Being able to reason effectively about probability is necessary for many practical concerns such as interpreting weather reports, understanding DNA evidence at trials, the risks of childbirth defects, and car insurance rates among others (Abelson 1995; Derry, Levin, Osana, and Jones 1998). Konold (1995) noted that everyday reasoning often relies on reasoning about probabilities, and people use a variety of heuristics to judge probability. Deficiencies in statistical knowledge and probabilistic reasoning are reported in fields such as medicine (Klatsky, Geiwitz, and Fisher 1994) and can have serious consequences for the diagnosis and treatment of disease. However, formal training in statistics has been shown to improve individuals' ability to reason about everyday problems (Derry et al. 1998; Lehman, Lempert, and Nisbett 1988; Nisbett, Fong, Lehman, and Cheng 1987).

The importance of being able to reason effectively about probability and statistics was recognized by the National Council of Teachers of Mathematics (1991) in the standards for mathematics instruction that recommended that students be familiar with statistical tools such as collecting, organizing, and presenting data and be capable of reasoning about probability and drawing inferences. Similar recommendations were made in the Mathematics Objectives 1990 Assessment (Educational Testing Service 1990). According to these reports, the ability to interpret numerical information and to reason mathematically should be regarded as basic skills. Unfortunately, current secondary school curricula are only beginning to incorporate statistical skills, and, as a consequence, most students enter college with very little formal experience with the laws of probability and probabilistic reasoning (Derry et al. 1998).

Fallacies in reasoning can occur because of violations in the application of laws of probability. Examples of such errors include stereotyping, confirmation bias, and matching bias. Many of these errors occur because of misconceptions about probability. Students often do not understand the laws of probability and form misconceptions through informal experiences outside the classroom (Garfield and Ahlgren 1988; Konold, Pollatsek, Well, Hendrickson, and Lipson 1990). Students may develop their own way of reasoning about uncertain events (Kahneman and Tversky 1972, 1973; Konold 1991; Shaughnessy 1977, 1981; Tversky and Kahneman 1971). Their lack of understanding may be due to a lack of experience with the mathematical laws of probability or because they use heuristics (Kahneman, Slovic, and Tversky 1982; Kahneman and Tversky 1972; Tversky and Kahneman 1974). The use of heuristics to estimate complicated probabilities often results in accurate or very reasonable estimates, but heuristics may also be misleading, causing misconceptions in how people think about probability. Even experts in probability unconsciously use heuristics in some situations (Tversky and Kahneman 1974). Although formal training in statistics is associated with improved reasoning (Nisbett et al. 1987), many students who receive formal instruction continue to have misconceptions about the nature of probability and probabilistic reasoning (Kahneman and Tversky 1972, 1973; Konold 1989, 1991; Shaughnessy 1977, 1981; Tversky and Kahneman 1971).

Misconceptions of probability are particularly resistant to elimination during typical classroom instruction as they appear to be of a psychological nature and are strongly held (Konold 1989, 1991, 1995; Shaughnessy 1977, 1981). Students are often able to assimilate new information they learn in the classroom into their existing beliefs and misconceptions, or they alter new information so that it is consistent with their current understanding, and, as a result, students continue to hold misconceptions (Konold 1995). They can often appear to have modified their beliefs and provide correct answers but subsequently perform as they did before (De Lisi and Golbeck 1999). Many features of an instructional environment such as power differentials between instructors and students can result in students' "going along" with the explanations in class, but the instruction may not produce any conceptual change.

In his reflections on teaching probability and statistics, Shaughnessy (1992) stressed the need to (a) know more about how students think about probability, (b) identify effective methods of instruction, and (c) develop consistent, reliable methods of assessment that more accurately reflect students' conceptual understanding. The majority of research on teaching probability and statistics has focused on how students think about probability (Konold 1989, 1991; Konold et al. 1990; Lipson 1990; Pollatsek, Lima, and Well 1981) with relatively little attention to instructional methods or assessments. Without consistent and reliable methods for more accurately assessing students' conceptual understanding of probability, it is difficult to evaluate instructional methods intended to eliminate misconceptions. Results of two studies found that the number of students who understood the concept of independence was much lower than that indicated in the National Assessment of Educational Progress (Konold, Pollatsek, Well, Lohmeier, and Lipson 1993).

The study reported here was conducted to address the need identified by Shaughnessy (1992) for consistent, reliable methods of assessment that more accurately reflect students' conceptual understanding. We selected the misconception of representativeness in learning about probability as the target content. Representativeness is a heuristic used to estimate the probability of uncertain events by relying on the degree to which a sample or event reflects the population of such events; it includes a misinterpretation of the law of large numbers and insensitivity to prior events or sample size. For example, most people think that when tossing a coin, a sequence of six tails (T T T T T T) is less probable than a mixed sequence such as T H H T H T because a sequence of all tails is not representative of the distribution of events. An apparently random sequence of heads and tails (e.g., T H H T H T) is often considered representative because this sequence looks more like the theoretical distribution, which comprises 50% tails and 50% heads. The misconception of representativeness was selected because it has been consistently identified as an obstacle to students' conceptual understanding of probability and probabilistic reasoning (Kahneman and Tversky 1972; Shaughnessy 1977, 1992). Researchers have already begun developing test materials to identify the existence of this misconception (Konold 1989, 1990), but little work has been done to establish the reliability and validity of the test instruments. The purpose of this study was to develop a valid and reliable test instrument to identify students who hold misconceptions of representativeness.

2. Procedure

2.1 Participants

Participants were graduate students (n = 61) and undergraduate students (n = 202) enrolled in statistics and educational psychology classes from two colleges. Almost 13% of the students were from one college and the remaining students were from the other. Efforts were made to include students from a wide range of courses and majors to ensure the inclusion of students (graduate and undergraduate) who varied in their level of experience with probability and in the degree to which they held misconceptions of representativeness. Descriptive statistics on the number of students included from various majors and classes of majors are included in Table 1. Psychology was the most frequently occurring major (n = 69 or 26% of participants).


Table 1. Distribution of Participants' Majors

Major n Percent
Psychology 69 26.2
Social Sciences 48 18.3
Natural/Physical Sciences 34 12.9
Engineering 32 12.2
Education 27 10.3
Statistics 20 7.6
Food Science/Nutrition 12 4.6
Pharmacy/Health 11 4.2
Non-Matriculated/Other 10 3.8
Total 263 100%


2.2 Materials

Participants were asked to indicate their current major, class status, and the number of graduate and/or undergraduate level statistics courses they had taken. Students took a 16-item test developed by the first author to identify misconceptions of representativeness. The first author had permission to construct items that resemble those designed by Konold (1990). Konold's items were designed to take advantage of the efficiency of multiple-choice formats while preserving the quality of information inherent in students' explanations. In the first part of an item, students chose the correct answer to a problem stem from among five options. In the second part, students justified their answer in part one by selecting from a number of explanations.

In this study, two forms of the test were used. Both forms of the test contained the same items, but the ordering of items was different on each form of the test. This was done to ensure that there were no ordering effects. Two items were open-ended questions, included to determine if students had the basic knowledge of probability required to answer the more difficult multiple-choice questions. The open-ended questions required students to calculate the probability of a simple event from a distribution of equally likely events. Responses to these two questions were not included in the scoring of the test.

The remaining fourteen two-part items were presented in a format that included multiple-choice and justification components. The first part of each of the 14 items asked students for an assessment of probability. Students were presented with several possible outcomes and were asked "Which event is most likely?" or "Which event is least likely?" For example:

If a fair coin is tossed six times, which of the following ordered sequences of heads (H) and tails (T), if any, is LEAST LIKELY to occur?

  a. H T H T H T
  b. T T H H T H
  c. H H H H T T
  d. H T H T H H
  e. All sequences are equally likely.

A student who identified (c) H H H H T T as being "least likely" is thought to hold a misconception of representativeness because choosing (c) would likely indicate a belief that the result of repeatedly tossing a coin must be a random mixture of heads and tails. If students calculated the probability of each event correctly, they would have found that option (e), "All sequences are equally likely," is correct.
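The calculation behind the correct option can be sketched in a few lines of Python. This is an illustrative check rather than part of the test instrument: the function name `sequence_probability` is introduced here for illustration, and the sketch assumes a fair coin with independent tosses.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(sequence, p_heads=Fraction(1, 2)):
    """Probability of one specific ordered sequence of independent tosses."""
    prob = Fraction(1)
    for toss in sequence:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob


# The four ordered sequences offered in the example item.
sequences = ["HTHTHT", "TTHHTH", "HHHHTT", "HTHTHH"]

for seq in sequences:
    print(seq, sequence_probability(seq))  # each line ends with 1/64

# Contrast: the chance of exactly three heads in ANY order pools
# many different ordered sequences, so it is far larger than 1/64.
three_heads_any_order = sum(
    sequence_probability(seq)
    for seq in product("HT", repeat=6)
    if seq.count("H") == 3
)
print(three_heads_any_order)  # 5/16
```

Every specific ordered sequence has probability (1/2)^6 = 1/64, whereas "three heads in some order" covers 20 of the 64 sequences. One possible reading, offered here only as an interpretation, is that respondents who rely on representativeness judge the second kind of event when asked about the first.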

The second part of each item asked students to identify a specific reason (justification) for their answer to the first item in the pair. Common explanations for misconceptions of representativeness, identified in previous research and through clinical interviews with students, were used to construct the second part of each item. The item in the previous example was followed on the test by:

Which of the following best describes the reason for your answer to the preceding question?

  a. Since tossing a coin is random, you should not get a long string of heads or tails.
  b. Every sequence of six tosses has exactly the same probability of occurring.
  c. There ought to be roughly the same number of tails as heads.
  d. Since tossing a coin is random, the coin should not alternate between heads and tails.
  e. Other _____________________________________

Items on the test varied in the event used (coin tossing or rolling of a die) and the use of most likely or least likely. In addition, the length of the sequences of outcomes from the coin tosses or dice rolls varied from as few as four to as many as 12. See the Appendix for a complete list of all questions. These variables -- length of outcome sequence, judgment of most or least likely, and task differences (six possible outcomes from the roll of a die versus two possible outcomes from a coin toss) -- are known to influence judgments of probability.

3. Results

3.1 Scoring of Correct Responses and Misconceptions

The justifications used in the second part of each item were designed specifically to confirm whether or not students had misconceptions. Correct answers to the first part of each item need to be justified with the appropriate reason. Justifications (provided in the second part of each item) that indicate a misconception of representativeness, whether the answer to the first part is correct or incorrect, differentiate students who have misconceptions from those who do not have misconceptions. Students who do not have misconceptions include those who answer correctly and those who answer incorrectly but do not indicate a misconception as the source of their error. For example, a student with a misconception of representativeness who identifies (c) H H H H T T in the first part of the question as being "least likely" might also identify (a) "Since tossing a coin is random, you should not get a long string of heads or tails" in the second part of the question as the reason for his or her answer. To demonstrate understanding of the underlying probability concept, respondents who correctly identify (e) "All sequences are equally likely" as the correct answer to the first part of the question should also select reason (b) "Every sequence of six tosses has exactly the same probability of occurring" as the answer to the second part of the question.

Each of the fourteen multiple-choice items was scored based on the combined responses to both parts of the question. If the responses to both parts of the questions were correct (i.e., the justification was consistent with the probability assessment), the item was scored as correct. If the response to either part of the question indicated a misconception of representativeness, then the item was scored as a misconception. If the response to either part of the question was not correct, but did not indicate a misconception of representativeness, the item was scored as incorrect.
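The three-way scoring rule can be written out as a small function. This is a minimal sketch under stated assumptions: the `ItemKey` structure, the descriptive option labels, and in particular the choice of which responses count as indicators of representativeness are hypothetical and are not taken from the authors' scoring protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemKey:
    """Scoring key for one two-part item (labels are hypothetical)."""
    correct_answer: str
    correct_reason: str
    misconception_answers: frozenset
    misconception_reasons: frozenset


def score_item(key, answer, reason):
    """Return 'correct', 'misconception', or 'incorrect' for one item."""
    # Both parts must be right for the item to count as correct.
    if answer == key.correct_answer and reason == key.correct_reason:
        return "correct"
    # Either part can flag a misconception of representativeness.
    if answer in key.misconception_answers or reason in key.misconception_reasons:
        return "misconception"
    # Wrong, but not in a way that signals representativeness.
    return "incorrect"


# Assumed key for the coin-toss example; which options signal the
# misconception is a guess made for this sketch.
coin_key = ItemKey(
    correct_answer="all equally likely",
    correct_reason="same probability",
    misconception_answers=frozenset({"HHHHTT"}),
    misconception_reasons=frozenset({"no long strings", "equal heads and tails"}),
)

print(score_item(coin_key, "all equally likely", "same probability"))  # correct
print(score_item(coin_key, "HHHHTT", "no long strings"))               # misconception
print(score_item(coin_key, "all equally likely", "no long strings"))   # misconception
print(score_item(coin_key, "HTHTHT", "other"))                         # incorrect
```

The third call shows why the second part matters: a correct first answer paired with a representativeness-based justification is still scored as a misconception.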

Each student was assigned two scores on the test. The first score indicated only the number of correct responses. Each item scored as correct was given one point. All other items were given zero (possible range: 0 to 14). The distribution of correct responses is summarized in Table 2. Participants were considered to have passed the test if they answered approximately 80% of the items correctly. The cut-off score for passing was therefore 11 of 14 (about 79%), close to the 80% criterion typical of mastery-oriented instruction.


Table 2. Distribution of Correct Responses

# Correct Responses n Percent Cumulative Percent
0 2 0.8 0.8
1 5 1.9 2.7
2 10 3.8 6.5
3 13 4.9 11.4
4 8 3.0 14.4
5 11 4.2 18.6
6 17 6.5 25.1
7 16 6.1 31.2
8 11 4.2 35.4
9 20 7.6 43.0
10 22 8.4 51.3
11 19 7.2 58.6
12 23 8.7 67.3
13 33 12.5 79.8
14 53 20.2 100.0


An average of 87% of the students answered the various parts of the open-ended questions correctly, thus showing evidence of rudimentary knowledge of probability.

The second score indicated only the number of misconception responses. Each item scored as a misconception was given one point. All other items were given zero (possible range: 0 to 14). The distribution of misconception responses is summarized in Table 3. A panel of experts reviewed the test and scoring protocol following guidelines for performance standard-setting suggested by Hambleton (1996) and agreed that participants whose answers indicated misconceptions of representativeness on at least two pairs of items should be considered to have misconceptions of representativeness. Determining performance standards in this manner is common practice. Participants who did not indicate misconceptions on at least two items were not considered to have misconceptions of representativeness.


Table 3. Distribution of Misconception Responses

# Misconception Responses n Percent Cumulative Percent
0 93 35.4 35.4
1 28 10.6 46.0
2 21 8.0 54.0
3 24 9.1 63.1
4 14 5.3 68.4
5 22 8.4 76.8
6 21 8.0 84.8
7 13 4.9 89.7
8 9 3.4 93.1
9 3 1.1 94.2
10 10 3.8 98.0
11 2 0.8 98.8
12 1 0.4 99.2
13 1 0.4 99.6
14 1 0.4 100.0
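
The two cut-offs (at least 11 correct items to pass, at least two misconception items to be classified as holding misconceptions) can be applied directly to the frequency counts in Tables 2 and 3. The following Python sketch does so; the variable and function names are introduced here for illustration only.

```python
# Number of students at each score, copied from the n columns of
# Table 2 (items scored correct) and Table 3 (items scored misconception).
correct_counts = {0: 2, 1: 5, 2: 10, 3: 13, 4: 8, 5: 11, 6: 17, 7: 16,
                  8: 11, 9: 20, 10: 22, 11: 19, 12: 23, 13: 33, 14: 53}
misconception_counts = {0: 93, 1: 28, 2: 21, 3: 24, 4: 14, 5: 22, 6: 21,
                        7: 13, 8: 9, 9: 3, 10: 10, 11: 2, 12: 1, 13: 1, 14: 1}

PASS_CUTOFF = 11          # at least 11 of 14 items scored correct
MISCONCEPTION_CUTOFF = 2  # at least 2 items scored as misconception


def count_at_or_above(freq, cutoff):
    """Number of students whose score is at or above the cutoff."""
    return sum(n for score, n in freq.items() if score >= cutoff)


total = sum(correct_counts.values())
passed = count_at_or_above(correct_counts, PASS_CUTOFF)
with_misconceptions = count_at_or_above(misconception_counts, MISCONCEPTION_CUTOFF)

print(total, passed, with_misconceptions)  # 263 128 142
print(round(100 * passed / total, 1))      # 48.7
print(round(100 * with_misconceptions / total, 1))  # 54.0
```

Under these cut-offs, 128 of the 263 participants pass the test and 142 are classified as holding misconceptions of representativeness.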


Two forms of the test were used. One hundred and twenty-one participants completed Form A, and 142 participants completed Form B. No significant differences were found between the two forms of the test. Of the participants who took Form A, 55% were identified as having misconceptions of representativeness, as were 52% of those who took Form B.

3.2 Item Characteristics

Item difficulty was measured by the proportion of correct responses. Items ranged in difficulty from .44 to .81 for the multiple-choice questions and .78 to .97 for the open-ended questions. Only the multiple-choice items were scored as indicating a misconception. The percentage of responses that indicated misconceptions ranged from five to 41. See Table 4 for a detailed summary of item difficulty and misconception responses.


Table 4. Item Characteristics of Test Questions

Question # Type of Item Difficulty: Proportion Correct Misconception Responses (%)
1 a open-ended .97  
1 b open-ended .87  
1 c open-ended .86  
2 a open-ended .94  
2 b open-ended .78  
2 c open-ended .79  
3 multiple-choice .79 5
4 multiple-choice .81 8
5 multiple-choice .81 10
6 multiple-choice .81 10
7 multiple-choice .44 29
8 multiple-choice .48 31
9 multiple-choice .68 26
10 multiple-choice .61 33
11 multiple-choice .61 35
12 multiple-choice .57 41
13 multiple-choice .75 21
14 multiple-choice .73 20
15 multiple-choice .68 30
16 multiple-choice .76 13


3.3 Consistency of Responses to Similar Items

It was anticipated that participants would respond in similar ways to similar types of items. Pairs of items that described different events with sequences of the same length, pairs of items that described the same event with sequences of different lengths, and pairs of items that differed only in the use of most or least likely were examined to confirm whether responses to the two items in each pair were consistent. The percent agreement between responses to pairs of similar items was calculated to measure the consistency with which participants provided the same type of answer to both items. The average percent agreement exceeded 80%.
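Percent agreement for one pair of items is simply the share of participants whose scored responses match across the two items. The sketch below illustrates the calculation; the function name and the ten response pairs are hypothetical and are not data from the study.

```python
def percent_agreement(scores_first, scores_second):
    """Percentage of participants with the same scored response on both items."""
    if len(scores_first) != len(scores_second):
        raise ValueError("both items need one score per participant")
    if not scores_first:
        raise ValueError("at least one participant is required")
    matches = sum(a == b for a, b in zip(scores_first, scores_second))
    return 100.0 * matches / len(scores_first)


# Hypothetical scored responses of ten participants to two similar items.
item_x = ["correct", "misconception", "correct", "incorrect", "correct",
          "misconception", "correct", "correct", "misconception", "correct"]
item_y = ["correct", "misconception", "correct", "correct", "correct",
          "misconception", "incorrect", "correct", "misconception", "correct"]

print(percent_agreement(item_x, item_y))  # 80.0
```

Averaging this quantity over all pairs of similar items would give an overall figure of the kind reported above.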

3.4 Reliability

Measures of validity and reliability provide evidence that the psychometric properties of the test are consistent with those of a valid, reliable test. The fourteen pairs of multiple-choice items intended to identify students who held misconceptions of representativeness were scored as "1" (indicates a belief in representativeness) or "0" (does not indicate a belief in representativeness). The agreement coefficient (Subkoviak 1988), a measure of the consistency of classification, estimated the reliability to be .84.

3.5 Validity

A known-groups validation approach was used to determine whether the test is able to distinguish between students who have a lot of experience with probability and students who have no experience with probability. Formal training in statistics was expected to reduce the incidence of misconceptions of representativeness. The underlying assumption was that students who had formal classroom experience with probability and statistics would have fewer misconceptions of representativeness than students with little or no formal training in probability and statistics. More expertise (as evidenced by more classroom experience) would be anticipated to reduce errors. The degree to which the test can distinguish between students with formal experience and students with no experience provides evidence for the validity of the test. Based on the number of graduate and/or undergraduate statistics classes they had taken, participants were classified as having extensive formal experience with probability (n = 38 with three or more courses), considerable formal experience with probability (n = 34 with two courses), some formal experience (n = 150 with one course), or no formal experience with probability (n = 41 had not yet taken a course).

The four levels of experience and whether participants passed (or failed) the test were analyzed using a 4 × 2 chi-square test of independence to determine whether performance on the test appears to be dependent on experience with probability and statistics. Results indicated that participants with more formal experience with probability and statistics were more likely to pass the test than students with less formal experience ($\chi^{2}$ = 14.4, df = 3, p < .01) (Table 5). The four levels of experience and the presence or absence of misconceptions were also analyzed using a 4 × 2 chi-square test to determine whether misconceptions of representativeness appear to be independent of experience with probability and statistics. Results indicate that participants with more formal experience with probability and statistics are less likely to hold misconceptions of representativeness ($\chi^{2}$ = 13.4, df = 3, p < .01) (Table 6).


Table 5. Test Performance (Pass/Fail) and Formal Experience with Probability and Statistics

  Test Performance
Number of Statistics Courses Passed (%) Failed (%) Total
0 13 (32%) 28 (68%) 41
1 68 (45%) 82 (55%) 150
2 20 (59%) 14 (41%) 34
3 or more 27 (71%) 11 (29%) 38
Total 128 135 263
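
The chi-square statistic reported for test performance can be reproduced from the cell counts in Table 5. The following Python sketch computes the Pearson statistic without a continuity correction; the function name is introduced here for illustration, and the critical value is the standard tabled value for df = 3 at the .01 level.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: 0, 1, 2, and 3 or more statistics courses.
# Columns: passed, failed (cell counts from Table 5).
table5 = [[13, 28], [68, 82], [20, 14], [27, 11]]

statistic, df = chi_square_independence(table5)
print(round(statistic, 1), df)  # 14.4 3

CRITICAL_VALUE_DF3_ALPHA_01 = 11.345
print(statistic > CRITICAL_VALUE_DF3_ALPHA_01)  # True
```

The same function can be applied to the cell counts of any of the other contingency tables in this section.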


Table 6. Misconceptions of Representativeness and Formal Experience with Probability and Statistics

  Misconceptions
Number of Statistics Courses Yes (%) No (%) Total
0 28 (68%) 13 (32%) 41
1 87 (58%) 63 (42%) 150
2 15 (44%) 19 (56%) 34
3 or more 12 (32%) 26 (68%) 38
Total 142 121 263


The relationship between class status and the number of statistics classes taken was examined to determine if the graduate students were more likely than the undergraduate students to have taken more than one statistics class. With the exception of the undergraduate statistics majors, who had taken three or more statistics courses, it appeared that the majority of the students who had taken two or more statistics classes were graduate students. To eliminate the possibility that test performance (pass/fail) or the presence of misconceptions was related to class status (i.e., the fact that graduate students may have developed better reasoning skills), rather than to experience with statistics, the chi-square tests presented in the previous paragraph were performed separately for graduate and undergraduate students, and the relationships found were approximately the same. Further, two additional 2 × 2 chi-square tests found class status (graduate vs. undergraduate) to be independent of whether students appear to hold misconceptions ($\chi^{2}$ = 3.5, df = 1, p > .05) and whether students passed the test ($\chi^{2}$ = 3.02, df = 1, p > .05). While we can rule out class status as a factor influencing the difference observed between those with more formal and less formal experience with statistics, the number of statistics courses taken may be a proxy for interest or ability.

4. Instructional Interventions

Two versions of the instrument were created for use as a pre-test and a post-test in an intervention study. Instructional interventions that have successfully eradicated misconceptions have all been characterized by what Novak (1977) calls "cognitive dissonance" or cognitive conflict. Unless classroom instruction is designed to address students' conceptual understanding and the potential misconceptions held by students, most students continue to hold misconceptions after instruction (Konold 1995). Three possible methods of creating and resolving cognitive conflict were examined: direct instruction, individual activities, and small group activities in which students had varying degrees of misconceptions about probability. In each of these instructional treatments, attention was drawn to discrepancies between expected events and actual events. The treatments differed in the "visibility" of the discrepancy (being told about it, recognizing it oneself, participating in a group directed to attend to the discrepancy) and whether the recognition of conflict was teacher led or student led. The effectiveness of the three different approaches was compared to a control group that received instruction that was not specifically designed to create cognitive conflict and conflict resolution.

4.1 Method

4.1.1 Participants

Participants were 103 undergraduate students in educational psychology classes at a large state university in the northeast United States who were judged to have misconceptions of representativeness on the basis of a pretest. A small number of students who did not appear to hold misconceptions of representativeness were needed for the small group activity-based instructional intervention. Data from these students were not included in analyses.

4.1.2 Test of Representativeness

Two different, but equivalent, forms of the test described earlier were used. One was used as a pre-instruction test to identify students who held misconceptions of representativeness, and the second was used as a post-instruction test to evaluate the effectiveness of the various instructional interventions.

4.1.3 Activities

Two of the four instructional interventions included activities designed to confront students with misconceptions of representativeness. The first activity required students to repeatedly draw colored marbles at random from a small bag. Prior to each selection, participants were asked to calculate the probability of one of the possible outcomes and record their "prediction" on paper. After observing the actual outcome, participants were asked to think about whether the results were consistent with their predictions. The second activity simulated a game of chance similar to the New Jersey Pick 6 Lottery. Participants were asked to pretend to buy lottery tickets, after which they held a mock drawing to find a possible winner. Prior to the mock drawing, participants were asked to calculate the probability that their ticket would be a winner and the probability that the teacher's ticket would be a winner. Most students picked a random set of six numbers and the teacher picked six numbers in sequence. After the drawing, participants were asked to calculate the probability of the winning ticket and discuss the results.
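The point of the lottery activity, that a ticket with six numbers in sequence is exactly as likely to win as any other ticket, can be illustrated with a short Python sketch. The pool size of 49 numbers is a hypothetical value chosen for illustration (the size of the pool used in the activity is not stated here), and the sketch assumes six distinct numbers are drawn without replacement, with order ignored and all combinations equally likely.

```python
from fractions import Fraction
from math import comb

POOL_SIZE = 49  # hypothetical pool size, assumed for this sketch only
PICKS = 6       # numbers chosen per ticket


def win_probability(ticket, pool_size=POOL_SIZE, picks=PICKS):
    """Probability that a ticket matches all drawn numbers.

    The result depends only on the ticket being valid,
    not on which particular numbers were chosen.
    """
    valid = (len(set(ticket)) == picks
             and all(1 <= number <= pool_size for number in ticket))
    if not valid:
        raise ValueError("ticket must contain distinct numbers from the pool")
    return Fraction(1, comb(pool_size, picks))


teacher_ticket = (1, 2, 3, 4, 5, 6)        # six numbers in sequence
student_ticket = (4, 17, 23, 31, 38, 45)   # a "random-looking" selection

print(win_probability(teacher_ticket))  # 1/13983816
print(win_probability(teacher_ticket) == win_probability(student_ticket))  # True
```

With a different pool size the probability changes, but the equality between the two tickets does not.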

4.1.4 Procedure

Initial screening of participants took place in large group sessions. Participants identified as having misconceptions were subsequently invited to participate in two additional sessions, scheduled one week apart. When participants were invited to participate in the additional sessions, they were not aware that the experiment they were being asked to participate in was related to the test they had taken previously. Subsequent intervention sessions were scheduled for small groups of students. Time slots for sessions were randomly assigned to treatments with the condition that one session of each treatment must be run prior to the scheduling of an additional session for that treatment.

In the first session, participants in all four instructional interventions received the same initial instruction, a lecture on the laws of probability. The lecture was presented via videotape to ensure the equivalence of the instruction received. The instructor featured on the videotape is an instructor of statistics and has won awards in recognition of his teaching. The videotaped lecture presented an introduction to the laws of probability, covered definitions of terms necessary to understanding probability, counting techniques, sample space, the classical and frequency definitions of probability, how probabilities are calculated and interpreted, the concept of equally likely events, independence, and the effects of sample size on the probability of sequences of events.

After the videotaped lecture, participants received an additional 25 minutes of instruction related to the material in the lecture that was appropriate to their instructional condition. All participants received the additional instruction from the same teacher, but the teacher was not the instructor in the videotaped lecture. During the second session (one week later), participants received an additional 45 minutes of instruction from the same teacher and then took the post-test. The teacher followed a script for each of the additional instruction sessions. The topics covered and the questions asked of participants in the additional sessions were the same for all four instructional interventions, and only the mode of presentation differed (i.e., direct instruction vs. group activities).

The first instructional intervention (Control) served as a control group. The videotaped lecture was followed by a brief period of direct instruction and a question and answer period during which participants worked individually to solve simple probability problems related to a teacher demonstration. The teacher provided the correct answers to questions and the probability problems. No attempt was made to confront participants with their misconceptions or help the participants resolve any conflict that may have arisen between their prior knowledge about probability and the correct concepts of probability.

The other three instructional interventions were designed to confront participants with their misconceptions related to representativeness and help them resolve the conflict between their misconceptions and the correct concepts of probability. In the second instructional intervention (Predictions), the videotaped lecture was followed by the same brief instruction, demonstration, and question and answer period. During this period, participants were required to make predictions about the outcome of the same problems presented in the first instructional intervention. Participants worked individually to solve the probability problems and make predictions about the possible outcomes and recorded their predictions on their own answer sheets. However, the teacher did not just provide answers to the questions, but also discussed misconceptions of representativeness. The teacher contrasted misconceptions of representativeness with the ideas of independence and equally likely events by talking about the results of the problem. Participants were instructed to compare their predictions to the correct answers and to think about why they may have answered incorrectly, and they were then encouraged to think about why the teacher's answer was correct.

The third intervention (Individual Activity) was the same as the second intervention with the exception that, instead of watching a teacher demonstration, participants were given the appropriate materials to conduct the demonstration themselves. The teacher questioned participants and asked them to make predictions about the results before conducting their own demonstration (activity). After participants completed the activity, the teacher provided the correct answers to the questions and discussed misconceptions of representativeness as in the Predictions intervention.

The fourth intervention (Group Activity) was the same as the previous one except that instead of conducting individual activities to simulate the probability problem, participants worked in groups of four or five. Groups comprised three or four members with misconceptions of representativeness and one member without misconceptions of representativeness. Participants in the Group Activity intervention who held misconceptions of representativeness were assigned to groups of three or four by the researcher prior to instruction, and then a participant without misconceptions was randomly assigned to each group. This was done in order to help create cognitive conflict within each of the groups during the post-lecture instructional activities used in the Group Activity intervention. Based on the initial screening, participants were identified as having or not having misconceptions. They were not aware of the results of their tests until after completing the intervention. Therefore, the students in each group without misconceptions were not aware that they were planted in the groups or that their prior knowledge of the material was different from that of the other participants. Test results from Group Activity participants who did not hold misconceptions of representativeness prior to instruction were not included in the final data analysis.

4.2 Results

4.2.1 Effects of Instructional Interventions

Chi-square tests of independence were conducted to determine whether the presence or absence of misconceptions on the post-test was related to (a) the type of instructional intervention, (b) the creation and resolution of cognitive conflict, (c) the use of activities in creating and resolving cognitive conflict, and (d) the use of individual or group activities in creating and resolving cognitive conflict. A 4 × 2 chi-square test of independence found no relationship between the instructional interventions and the presence or absence of misconceptions related to representativeness following instruction ($\chi^{2}$ = 3.94, df = 3, p > .05). This was also true when treatments designed to create cognitive conflict were contrasted with the control intervention ($\chi^{2}$ = 1.70, df = 1, p > .05).
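
For reference, the statistic reported in these tests is presumably the standard Pearson chi-square for an $r \times c$ table of counts. The symbols $O_{ij}$ (observed count), $E_{ij}$ (expected count), $R_{i}$ (row total), $C_{j}$ (column total), and $N$ (grand total) are notation introduced here for illustration and do not appear in the original analysis.

$$\chi^{2} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^{2}}{E_{ij}}, \qquad E_{ij} = \frac{R_{i}\,C_{j}}{N}, \qquad df = (r-1)(c-1)$$

Under this definition a 4 × 2 table has df = (4 - 1)(2 - 1) = 3 and a 2 × 2 table has df = 1, which agrees with the degrees of freedom reported above.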

A determination of whether the presence or absence of misconceptions was related to the use of activities in instructional efforts to create cognitive conflict and conflict resolution was conducted by comparing the Predictions intervention to a combination of the Individual Activity and Group Activity treatments. The use of activities in instructional efforts to create cognitive conflict and conflict resolution did not appear to be more effective in eradicating students' misconceptions of representativeness than a non-activity-based instructional intervention intended to create cognitive conflict and conflict resolution ($\chi^{2}$ = 0.05, df = 1, p > .05). To determine whether the presence or absence of misconceptions of representativeness is related to the type of activity (individual or group) used in instructional efforts to create cognitive conflict and conflict resolution, the Individual Activity intervention was compared to the Group Activity intervention. While results of the chi-square test approached conventional levels of statistical significance, there is insufficient evidence to indicate a difference between the effectiveness of the use of individual and group activities to eradicate students' misconceptions related to representativeness ($\chi^{2}$ = 3.07, df = 1, p = .08).

4.2.2 Follow-up Testing

A small subset of participants in Study II (n = 27) agreed to take the post-test again several weeks after instruction. Another six subjects who were pre-tested and identified as having misconceptions of representativeness but who did not participate in any of the instructional interventions also took the follow-up post-test, for a total of 33. The six subjects who did not participate in an instructional intervention all provided answers that indicated misconceptions of representativeness on the follow-up post-test, indicating that the initial assessment of their misconceptions was reliable. Of the twenty-seven participants who took the post-test again after receiving one of the four instructional interventions, only three gave answers that indicated misconceptions of representativeness. All three of these participants received the control intervention that was not intended to create cognitive conflict and conflict resolution. Results of a 2 × 2 chi-square test indicated that the presence or absence of misconceptions of representativeness on the follow-up post-test was related to participation in one of the four instructional interventions ($\chi^{2}$ = 19.56, df = 1, p < .001). Although the sample size of returning students is very small, it is possible that there may be long-term benefits of cognitive conflict and conflict resolution in eradicating misconceptions of representativeness that are not evident immediately following instruction.
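
The follow-up counts given in this paragraph are enough to rebuild the 2 × 2 table: 6 of 6 untreated subjects and 3 of 27 treated participants showed misconceptions. The sketch below recomputes the statistic from those counts, assuming an uncorrected Pearson chi-square (no continuity correction); the function name and table layout are illustrative choices, not part of the original analysis.

```python
def pearson_chi_square(table):
    """Pearson chi-square (no continuity correction) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: no intervention, any intervention.
# Columns: misconceptions present, misconceptions absent.
follow_up = [[6, 0], [3, 24]]
chi2, df = pearson_chi_square(follow_up)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Under that assumption the result rounds to 19.56 with df = 1, which agrees with the reported value. Several expected counts in this table fall below 5, which is consistent with the caution about the very small sample of returning students.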

5. Discussion

The problem of over-estimating students' understanding of probability using available test instruments has created an urgent need for consistent, reliable tests that will more adequately assess students' understanding of probability and probabilistic reasoning (Shaughnessy 1992; Konold 1991). A unique set of test questions to identify students who hold common misconceptions of representativeness was developed. The reliability and validity of the test questions were assessed in this study.

Most multiple-choice test items contain only one part, similar to the first part of each pair of items on the test instruments used here. The typical one-part format may ask the test-taker for an assessment of the probability of a particular event, but does not necessarily attempt to determine how the test-taker arrived at the answer. Tests of this nature have been found to over-estimate students' conceptual understanding of probability. Alternative test formats that will provide a more complete evaluation of students' conceptual understanding of probability are needed. The two-part multiple-choice format used in this study is one such alternative.

If the participants in this study had been graded based on their answers to only the first part of each pair of multiple-choice test items, their average score would have been 80%. Based on their responses to both parts of each pair of items, their average score was 61%. Apparently, some participants were able to provide correct answers to many of the questions without really understanding the reason for their answers.

By using both parts of each item to score the test in this study, the problem of over-estimating students' knowledge of probability associated with one-part, multiple-choice items was reduced. In many instances, participants who answered the first part of a question correctly provided a reason for their answer in the second part of the question that indicated a belief in representativeness. Because the two-part format was used, these participants were identified as having misconceptions of representativeness rather than identified as knowing the correct answers. Most participants who did not answer the first part of the question correctly provided a reason for their answer that indicated either a belief in representativeness, or some other kind of error. Using the two-part format, it was possible to distinguish between the participants who had misconceptions of representativeness and the participants who made other kinds of errors. This type of validity check and diagnostic information is not usually available using the typical one-part, multiple-choice format used on most achievement tests.
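
The scoring logic described above can be sketched roughly as follows. The key shown covers appendix item 1 only, and the choice of which options count as correct or as representativeness reasoning is an assumption made for this illustration, not the authors' published scoring key; the function names and the sample responses are likewise hypothetical.

```python
# Hypothetical scoring key for appendix item 1 only. Which options count as
# correct or as representativeness reasoning is an assumption made for this
# sketch, not the authors' published key.
KEY = {
    1: {"answer": 2, "reason": 3, "representativeness_reasons": {1}},
}

def classify(item, answer, reason):
    """Label one two-part response as correct, representativeness, or other error."""
    key = KEY[item]
    if reason in key["representativeness_reasons"]:
        return "representativeness"
    if answer == key["answer"] and reason == key["reason"]:
        return "correct"
    return "other error"

def one_part_score(responses):
    """Proportion correct when only the first part (the answer) is graded."""
    return sum(a == KEY[i]["answer"] for i, a, _ in responses) / len(responses)

def two_part_score(responses):
    """Proportion correct when both the answer and the reason must be right."""
    return sum(classify(i, a, r) == "correct" for i, a, r in responses) / len(responses)

# Made-up responses (item, answer, reason) from three imaginary test-takers.
responses = [(1, 2, 3), (1, 2, 1), (1, 4, 1)]
print(one_part_score(responses), two_part_score(responses))
```

In this made-up example the second test-taker gives the right answer for a representativeness reason, so one-part grading credits two of three responses while two-part grading credits only one.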

Although formal instruction appears to reduce the proportion of students who hold misconceptions, a substantial number of students with formal training continue to have misconceptions. Of the 72 participants who had taken at least two college-level statistics courses, 27 (37.5%) held misconceptions of representativeness. This may be an indication that the instruction they received was not effective in developing conceptual understanding of probability and probabilistic reasoning. As a consequence, instructional interventions designed specifically to eliminate students' misconceptions are a necessity.

In the intervention study reported here, instruction appeared to have an effect, as students did not show evidence of misconceptions after instruction. In follow-up testing, however, a number of students in the control condition who did not exhibit misconceptions on the first post-test displayed misconceptions on the second post-test. While this finding is hardly strong proof of the effect of instruction, the results are conceptually consistent with expectations related to the benefits of instruction designed to create cognitive conflict and provide opportunities for its resolution. The importance of delayed post-testing is illustrated here. True understanding or conceptual change will only be evident in the long term. The intervention study did provide a successful context for use of the instrument developed here to detect misconceptions of representativeness.


Appendix: A Test of "Representativeness"

    1. What is the chance that the first toss of a fair coin results in a head?

    2. The first toss of the coin does result in a head, and the coin is tossed a second time. What is the chance that the second toss results in a head?

    3. The coin is tossed a third time. What is the chance that the third toss results in a head?

    1. What is the chance that the first roll of a fair die results in a 6?

    2. The first roll of the die does result in a 6, and the die is rolled a second time. What is the chance that the second roll results in a 6?

    3. The die is rolled a third time. What is the chance that the third roll results in a 6?

  1. A fair coin is tossed, and it lands heads up. The coin is tossed a second time. What is the probability that the second toss is also a head?

    1. 1/4
    2. 1/2
    3. 1/3
    4. Slightly less than 1/2
    5. Slightly more than 1/2

    Which of the following best describes the reason for your answer to the preceding question?

    1. The second toss is less likely to be heads because the first toss was heads.
    2. There are four possible outcomes when you toss a coin twice. Getting two heads is only one of them.
    3. The chance of getting heads or tails on any one toss is always 1/2.
    4. There are three possible outcomes when you toss a coin twice. Getting two heads is only one of them.
    5. Other _______________________________________

  2. The first roll of a fair die results in a 3. The die is rolled a second time. What is the chance that the second roll also results in a 3?

    1. 1/36
    2. 1/5
    3. 1/6
    4. Slightly less than 1/6
    5. Slightly more than 1/6

    Which of the following best describes the reason for your answer to the preceding question?

    1. There are thirty-six possible outcomes when you roll a die twice. Getting two 3's is only one of them.
    2. The second roll is less likely to be a 3 because the first roll was a 3.
    3. The chance of getting a 3 on any one roll is always 1/6.
    4. Any of the other five numbers is more likely than a 3.
    5. Other ______________________________________

  3. A bag has 9 pieces of fruit: 3 apples, 3 pears, and 3 oranges. Four pieces of fruit are picked, one at a time. Each time a piece of fruit is picked, the type of fruit is recorded, and it is then put back in the bag. If the first 3 pieces of fruit were apples, what is the fourth piece MOST LIKELY to be?

    1. A pear
    2. An apple
    3. An orange
    4. An orange or a pear are both equally likely and more likely than an apple.
    5. An apple, orange, or pear are all equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. This piece of fruit is just as likely as any other.
    2. The apples seem to be lucky.
    3. The picks are independent, so each fruit has an equally likely chance of being picked.
    4. The fourth piece of fruit won't be an apple because too many have already been picked.
    5. Other ______________________________________

  4. A box contains 6 balls: 2 are red, 2 are white, and 2 are blue. Four balls are picked at random, one at a time. Each time a ball is picked, the color is recorded, and the ball is put back in the box. If the first 3 balls are red, what color is the fourth ball MOST LIKELY to be?

    1. Red
    2. White
    3. Blue
    4. Blue and white are equally likely and more likely than red.
    5. Red, blue, and white are all equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. The fourth ball should not be red because too many red ones have already been picked.
    2. The picks are independent, so every color has an equally likely chance of being picked.
    3. Red seems to be lucky.
    4. This color is just as likely as any other color.
    5. Other _____________________________________

  5. If a fair coin is tossed five times, which of the following ordered sequences of heads (H) and tails (T), if any, is MOST LIKELY to occur?

    1. H T H T T
    2. T H H H H
    3. H T H T H
    4. Sequences (1) and (3) are equally likely.
    5. All of the above sequences are equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. Every sequence of five tosses has exactly the same probability of occurring.
    2. Since tossing a coin is random, the coin should not alternate between heads and tails.
    3. Any of the sequences could occur.
    4. There ought to be roughly the same number of tails as heads.
    5. Other _____________________________________

  6. If a fair die is rolled five times, which of the following ordered sequences of results, if any, is MOST LIKELY to occur?

    1. 3 5 1 6 2
    2. 4 2 6 1 5
    3. 5 2 2 2 2
    4. Sequences (1) and (2) are equally likely.
    5. All of the above sequences are equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. Since rolling a die is random, numbers should not repeat until most of the numbers appear.
    2. Every sequence of five rolls has exactly the same probability of occurring.
    3. There ought to be a random mixture of numbers.
    4. Any of the sequences could occur.
    5. Other ____________________________________

  7. If a fair coin is tossed four times, which of the following ordered sequences of heads (H) and tails (T), if any, is MOST LIKELY to occur?

    1. H T H T
    2. H H T H
    3. T H H T
    4. H H H H
    5. All sequences are equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. Since tossing a coin is random, you should not get a long string of heads or tails.
    2. There ought to be roughly the same number of tails as heads.
    3. Since tossing a coin is random, the coin should not alternate between heads and tails.
    4. Every sequence of four tosses has exactly the same probability of occurring.
    5. Other ____________________________________

  8. If a fair coin is tossed eight times, which of the following ordered sequences of heads (H) and tails (T), if any, is MOST LIKELY to occur?

    1. T T H H H H T T
    2. H H H H H H H H
    3. H T H T H T H T
    4. H H T H T H H H
    5. All sequences are equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. Every sequence of eight tosses has exactly the same probability of occurring.
    2. Since tossing a coin is random, the coin should not alternate between heads and tails.
    3. There ought to be roughly the same number of tails as heads.
    4. Since tossing a coin is random, you should not get a long string of heads or tails.
    5. Other ____________________________________

  9. If a fair coin is tossed six times, which of the following ordered sequences of heads (H) and tails (T), if any, is LEAST LIKELY to occur?

    1. H T H T H T
    2. T T H H T H
    3. H H H H T T
    4. H T H T H H
    5. All sequences are equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. Since tossing a coin is random, you should not get a long string of heads or tails.
    2. Every sequence of six tosses has exactly the same probability of occurring.
    3. There ought to be roughly the same number of tails as heads.
    4. Since tossing a coin is random, the coin should not alternate between heads and tails.
    5. Other _____________________________________

  10. If a fair coin is tossed twelve times, which of the following ordered sequences of heads (H) and tails (T), if any, is LEAST LIKELY to occur?

    1. H T H T H T H T H T H T
    2. H H T H T H H H T T H H
    3. T T H H T H T T H H T H
    4. H H H H H H H H T T T T
    5. All sequences are equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. There ought to be roughly the same number of tails as heads.
    2. Since tossing a coin is random, you should not get a long string of heads or tails.
    3. Every sequence of twelve tosses has exactly the same probability of occurring.
    4. Since tossing a coin is random, the coin should not alternate between heads and tails.
    5. Other ______________________________________

  11. If a fair die is rolled four times, which of the following ordered sequences of results, if any, is LEAST LIKELY to occur?

    1. 6 4 3 5
    2. 5 6 2 6
    3. 2 3 4 5
    4. 2 1 4 3
    5. All sequences are equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. Since rolling a die is a random event, a result like that is very unlikely.
    2. You are much more likely to get a mixture of different numbers than an ordered sequence.
    3. All sequences of rolls have exactly the same probability of occurring.
    4. You are much more likely to get a mixture of different numbers than numbers that repeat.
    5. Other _______________________________________

  12. If a fair die is rolled eight times, which of the following ordered sequences of results, if any, is LEAST LIKELY to occur?

    1. 2 3 4 5 6 1 2 3
    2. 6 4 3 2 4 1 5 6
    3. 5 6 2 6 3 5 4 2
    4. 2 1 4 3 1 5 4 6
    5. All sequences are equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. You are much more likely to get a mixture of different numbers than an ordered sequence.
    2. Since rolling a die is a random event, a result like that is very unlikely.
    3. All sequences of rolls have exactly the same probability of occurring.
    4. You are much more likely to get a mixture of different numbers than numbers that repeat.
    5. Other _______________________________________

  13. If a fair die is rolled six times, which of the following ordered sequences of results, if any, is MOST LIKELY to occur?

    1. 5 6 2 6 4 3
    2. 2 1 4 3 2 4
    3. 6 4 3 2 5 1
    4. 1 2 3 4 5 6
    5. All sequences are equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. All sequences of rolls have exactly the same probability of occurring.
    2. You are much more likely to get a mixture of different numbers than numbers that repeat.
    3. Since rolling a die is a random event, a result like that is very likely.
    4. You are much more likely to get a mixture of different numbers than an ordered sequence.
    5. Other ___________________________________________

  14. If a fair die is rolled eight times, which of the following ordered sequences of results, if any, is MOST LIKELY to occur?

    1. 5 6 2 6 3 5 4 2
    2. 2 1 4 3 1 5 4 6
    3. 6 4 3 2 4 1 5 6
    4. 2 3 4 5 6 1 2 3
    5. All sequences are equally likely.

    Which of the following best describes the reason for your answer to the preceding question?

    1. You are much more likely to get a mixture of different numbers than an ordered sequence.
    2. Since rolling a die is a random event, a result like that is very likely.
    3. You are much more likely to get a mixture of different numbers than numbers that repeat.
    4. All sequences of rolls have exactly the same probability of occurring.
    5. Other __________________________________________
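
As a companion to the sequence items above, the following sketch computes the probability of one specific ordered sequence, assuming independent trials with equally likely outcomes (a fair coin or a fair die). The function name is illustrative only; the example sequences are taken from items 5 and 6.

```python
from fractions import Fraction

def sequence_probability(sequence, outcomes_per_trial):
    """Probability of one specific ordered sequence of independent,
    equally likely trials (for example, a fair coin or a fair die)."""
    return Fraction(1, outcomes_per_trial) ** len(sequence)

# Coin sequences from item 5 (two outcomes per toss).
for seq in ["HTHTT", "THHHH", "HTHTH"]:
    print(seq, sequence_probability(seq, 2))

# Die sequences from item 6 (six outcomes per roll).
for seq in [(3, 5, 1, 6, 2), (4, 2, 6, 1, 5), (5, 2, 2, 2, 2)]:
    print(seq, sequence_probability(seq, 6))
```

Under these assumptions the probability depends only on the length of the sequence and the number of outcomes per trial, not on which outcomes appear or in what order.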


References

Abelson, R. P. (1995), Statistics as Principled Argument, Mahwah, NJ: Lawrence Erlbaum Associates.

De Lisi, R., and Golbeck, S. (1999), "The Implications of Piagetian Theory for Peer Learning," in Cognitive Perspectives on Peer Learning, eds. A. M. O'Donnell and A. King, Mahwah, NJ: Lawrence Erlbaum Associates, pp. 3-37.

Derry, S. J., Levin, J. R., Osana, H. P., and Jones, M. S. (1998), "Developing Middle School Students' Statistical Reasoning Through Simulation Gaming," in Reflections on Statistics: Agendas for Learning, Teaching, and Assessment in K-12, ed. S. J. Lajoie, Mahwah, NJ: Lawrence Erlbaum Associates.

Educational Testing Service (1988), Mathematics Objectives 1990 Assessment (Booklet No. 21-M-10), Princeton, NJ: Author.

Garfield, J., and Ahlgren, A. (1988), "Difficulties in Learning Basic Concepts in Probability and Statistics: Implications for Research," Journal for Research in Mathematics Education, 19, 44-63.

Halpern, D. (1996), Thought and Knowledge (3rd ed.), Mahwah, NJ: Lawrence Erlbaum Associates.

Hambleton, R. K. (1996), "Advances in Assessment Models, Methods, and Practices," in Handbook of Educational Psychology, eds. D. C. Berliner and R. C. Calfee, New York: Macmillan, pp. 899-925.

Kahneman, D., Slovic, P., and Tversky, A. (1982), Judgment Under Uncertainty: Heuristics and Biases, New York: Cambridge University Press.

Kahneman, D., and Tversky, A. (1972), "Subjective Probability: A Judgment of Representativeness," Cognitive Psychology, 3, 430-454.

----- (1973), "Availability: A Heuristic for Judging Frequency and Probability," Cognitive Psychology, 5, 207-232.

Klatsky, R. L., Geiwitz, J., and Fisher, S. C. (1994), "Using Statistics in Clinical Practice: A Gap Between Training and Application," in Human Error in Medicine, ed. M. S. Bogner, Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 123-140.

Konold, C. (1989), "Informal Conceptions of Probability," Cognition and Instruction, 6, 59-98.

----- (1990), Test Items to Identify Misconceptions in Probability and Statistics, unpublished document, University of Massachusetts, Scientific Reasoning Research Institute, Amherst.

----- (1991), "Understanding Students' Beliefs About Probability," in Radical Constructivism in Mathematics Education, ed. E. von Glasersfeld, Netherlands: Kluwer, pp. 139-156.

----- (1995), "Issues in Assessing Conceptual Understanding in Probability and Statistics," Journal of Statistics Education [Online], 3(1). (http://www.amstat.org/publications/jse/v3n1/konold.html)

Konold, C., Pollatsek, A., Well, A. D., Hendrickson, J., and Lipson, A. (1990), "The Origin of Inconsistencies in Probabilistic Reasoning of Novices," presented at The Third International Conference on Teaching Statistics, Dunedin, New Zealand.

Konold, C., Pollatsek, A., Well, A. D., Lohmeier, J., and Lipson, A. (1993), "Inconsistencies in Students' Reasoning About Probability," Journal for Research in Mathematics Education, 24, 392-414.

Lehman, D. R., Lempert, R. O., and Nisbett, R. E. (1988), "The Effects of Graduate Training on Reasoning: Formal Discipline and Reasoning About Everyday Life," American Psychologist, 43, 431-443.

Lipson, A. (1990), "Learning: A Momentary Stay Against Confusion," Teaching and Learning: The Journal of Natural Inquiry, 4, 2-11.

National Council of Teachers of Mathematics (1991), Professional Standards for Teaching Mathematics, Reston, VA: Author.

Nisbett, R. E., Fong, G. T., Lehman, D. R., and Cheng, P. W. (1987), "Teaching Reasoning," Science, 238, 625-631.

Novak, J. D. (1977), A Theory of Education, Ithaca, NY: Cornell University.

Pollatsek, A., Lima, S., and Well, A. D. (1981), "Concept or Computation: Students' Understanding of the Mean," Educational Studies in Mathematics, 12, 191-204.

Shaughnessy, J. M. (1977), "Misconceptions of Probability: An Experiment With a Small-Group, Activity-Based, Model Building Approach to Introductory Probability at the College Level," Educational Studies in Mathematics, 8, 295-316.

----- (1981), "Misconceptions of Probability: From Systematic Errors to Systematic Experiments and Decisions," in Teaching Statistics and Probability: 1981 Yearbook of the National Council of Teachers of Mathematics, ed. A. P. Shulte, Reston, VA: National Council of Teachers of Mathematics, pp. 90-99.

----- (1992), "Research in Probability and Statistics: Reflections and Directions," in Handbook of Research for Mathematics Education, ed. D. Grouws, New York: Macmillan, pp. 115-147.

Subkoviak, M. J. (1988), "A Practitioner's Guide to Computation and Interpretation of Reliability Indices for Mastery Tests," Journal of Educational Measurement, 25, 47-55.

Tversky, A., and Kahneman, D. (1971), "Belief in the Law of Small Numbers," Psychological Bulletin, 76, 105-110.

----- (1974), "Judgment Under Uncertainty: Heuristics and Biases," Science, 185, 1124-1131.


Linda S. Hirsch
Neuroscience Center
Rutgers University
197 University Avenue
Newark, NJ 07102

lshirsch@axon.rutgers.edu

Angela M. O'Donnell
Department of Educational Psychology
Rutgers University
10 Seminary Place
New Brunswick, NJ 08901-1183

angelao@rci.rutgers.edu

