Dickinson College

Thomas H. Short

Villanova University

Journal of Statistics Education v.3, n.2 (1995)

Copyright (c) 1995 by Allan J. Rossman and Thomas H. Short, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

**Key Words**: Bayes' Theorem; Active learning; Technology.

We demonstrate that one can teach conditional probability in a manner consistent with many features of the statistics education reform movement. Presenting a variety of applications of conditional probability to realistic problems, we propose that interactive activities and the use of technology make conditional probability understandable, interactive, and interesting for students at a wide range of levels of mathematical ability. Along with specific examples, we provide guidelines for implementation of the activities in the classroom and instructional cues for promoting curiosity and discussion among students.

1 The past decade has seen the development of a reform movement in statistics education. Some features common to many individual statistics reform projects are that they promote active learning on the part of students, emphasize students' conceptual understanding of fundamental statistical ideas, pose engaging applications involving genuine data for students to investigate, encourage students to work collaboratively with their peers, and utilize technology as an aid toward achieving each of these goals. The reader interested in overviews of statistics education reform might consult Cobb (1992), Cobb (1993), Gordon and Gordon (1992), and Hoaglin and Moore (1992).

2 Traditional topics in probability are often sacrificed in "reformed" introductory statistics texts and courses so that more data analysis and statistical inference can be included. Often a brief introduction to unions and intersections, along with corresponding rules for disjoint and independent events, is the only exposure provided to probabilistic concepts. Conditional probability and Bayes' Theorem are considered optional at best, because they do not seem to be necessary for the understanding of subsequent statistical content.

3 We strongly agree with the renewed emphasis on data as the focus of an introductory statistics course. We also contend that probabilistic thinking is essential for educated citizenship and therefore warrants inclusion in the statistics curriculum. One of us (Rossman) includes ideas of conditional probability in a liberal arts mathematics course titled "Quantitative Reasoning," while the other (Short) teaches them in introductory statistics courses for liberal arts and nursing students.

4 We aim to show in this paper that the characteristics of statistics education reform can be applied quite naturally and powerfully to a study of conditional probability. We present applications through which students can develop an intuitive understanding of conditional probability and Bayes' Theorem, use technology to explore their properties, and apply them thoughtfully to a variety of real-world problems.

5 Basing his argument on students' difficulties understanding conditional probability, Moore (1992) strongly suggests that Bayesian inference, which builds on a foundation of conditional probability, has no place in the introductory statistics course. Our first response is that the subtle but crucial distinction between Pr(A|B) and Pr(B|A) arises even when students study classical statistics. Many students succumb to the natural temptation to regard the p-value as the conditional probability that the null hypothesis is true given the sample data, rather than the probability of having obtained such extreme data if the null hypothesis were true. Exposure to examples of applied conditional probability would help to clarify the underlying logic, interpretation, and limitations of classical statistical inference.

6 Second, understanding this distinction in conditional probabilities is fundamental to analyzing categorical data presented in a two-way table. For example, the following table classifies members of the 1994 U.S. Senate according to their political party and gender:

| | Men | Women | Row Total |
|---|---|---|---|
| Republicans | 42 | 2 | 44 |
| Democrats | 51 | 5 | 56 |
| Column Total | 93 | 7 | 100 |

7 It is appropriate and important to ask students to assess the legitimacy of the statements "most Democratic senators are women" and "most women senators are Democrats." The ability to interpret these two statements is an essential skill for analyzing two-way tables of data; it is not an arcane exercise in conditional probability.
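The two statements correspond to two different conditional proportions: one conditions on a row of the table, the other on a column. The following Python sketch computes both from the counts in the table; the variable names are ours and serve only as illustration.

```python
# Counts from the 1994 U.S. Senate table (party -> gender -> count).
senate = {
    "Republicans": {"Men": 42, "Women": 2},
    "Democrats": {"Men": 51, "Women": 5},
}

# Condition on party (a row): proportion of Democratic senators who are women.
democrats_total = sum(senate["Democrats"].values())
p_woman_given_democrat = senate["Democrats"]["Women"] / democrats_total

# Condition on gender (a column): proportion of women senators who are Democrats.
women_total = sum(row["Women"] for row in senate.values())
p_democrat_given_woman = senate["Democrats"]["Women"] / women_total

print(f"Pr(woman | Democrat) = 5/{democrats_total} = {p_woman_given_democrat:.3f}")
print(f"Pr(Democrat | woman) = 5/{women_total} = {p_democrat_given_woman:.3f}")
```

The first proportion is 5/56, about 9%, so the first statement is false; the second is 5/7, about 71%, so the second statement is true.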

8 With the similarity between conditional probabilities and the analysis of two-way tables in mind, we argue that students can discover Bayes' Theorem, one of the more important and applicable results of conditional probability, for themselves by constructing two-way tables. A generic application, based on an example found in DeGroot (1986), involves identifying the source of a defective part.

9 Suppose that three machines at a factory are used to produce a large quantity of identical parts. The production machines have different capacities: Machine A has a large capacity and produces 60% of the parts, while Machines B and C produce 30% and 10% of the parts, respectively. Historical data indicate that 10% of the parts produced by Machine A are defective, compared to 30% for Machine B and 40% for Machine C. If a part is inspected and found to be defective, which machine is most likely to have produced it? Which is least likely? What are the conditional probabilities, updated in light of the evidence that the part is defective, of each machine's having produced it?

10 To develop their intuitive sense for conditional probability, we first ask students to guess their answers to these questions. Rather than present students with Bayes' Theorem and have them plug in the appropriate probabilities, we then ask students to construct a two-way table for a hypothetical population of parts in which the percentages hold exactly. (We emphasize that real data from a sample of parts would display variability and not follow the percentages perfectly.) The following questions guide the students in filling out the table:

| | Defective | Not Defective | Row Total |
|---|---|---|---|
| Machine A | | | |
| Machine B | | | |
| Machine C | | | |
| Column Total | | | 100 |

(a) Of every 100 parts produced, how many were made by Machine A? by Machine B? by Machine C? Fill these in as the row totals of the table.

(b) Of those parts produced by Machine A, how many would you expect to be defective? Repeat for Machines B and C, recording your results in the "Defective" column.

(c) How many of the total of 100 parts in your table are defective? Enter the result as the column total for the "Defective" column.

(d) Of the number of parts expected to be defective, what proportion were produced by Machine A? by Machine B? by Machine C?

11 The resulting table becomes:

| | Defective | Not Defective | Row Total |
|---|---|---|---|
| Machine A | 6 | 54 | 60 |
| Machine B | 9 | 21 | 30 |
| Machine C | 4 | 6 | 10 |
| Column Total | 19 | 81 | 100 |

12 Students can read directly from this table that among defective parts, 6/19 are produced by Machine A, 9/19 by Machine B, and 4/19 by Machine C. These proportions can also be understood as the updated probabilities of each machine's having produced the part, given the information (data) that the part is defective. In this process students essentially apply Bayes' Theorem without realizing it.

13 Contrary to the intuition of many students, Machine B is most likely to have produced the defective part. Despite being the least dependable, Machine C is least likely to have produced it, because it produces so few parts in the first place. Nevertheless, the probability of Machine C's having produced the part more than doubles (from 10% to 4/19, or about 21%) in light of the evidence that the part is defective.
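The table-filling steps (a) through (d) can also be written out as a short computation. The sketch below is one possible rendering in Python, not part of the classroom activity as described; it uses exact fractions so that the results match the hand-built table, and all names are illustrative.

```python
from fractions import Fraction

# Share of production and defect rate for each machine, from the problem statement.
production_share = {"A": Fraction(60, 100), "B": Fraction(30, 100), "C": Fraction(10, 100)}
defect_rate = {"A": Fraction(10, 100), "B": Fraction(30, 100), "C": Fraction(40, 100)}

population = 100  # hypothetical population of parts in which the percentages hold exactly

# Steps (a) and (b): row totals, then expected defective counts per machine.
row_total = {m: production_share[m] * population for m in production_share}
defective = {m: row_total[m] * defect_rate[m] for m in row_total}

# Step (c): column total for the "Defective" column.
total_defective = sum(defective.values())

# Step (d): proportion of defective parts attributable to each machine.
updated = {m: defective[m] / total_defective for m in defective}

for m in sorted(updated):
    print(f"Machine {m}: {defective[m]} of {total_defective} defective parts, "
          f"updated probability {updated[m]} (about {float(updated[m]):.3f})")
```

Changing `production_share` or `defect_rate` lets students see how the ranking of the machines depends on both capacity and dependability.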

14 Conditional probability and Bayes' Theorem are sometimes introduced using probability trees. While trees can be constructed to represent the structure of conditional probability problems, we feel that the use of two-way tables is more conducive to the organization and interactive calculation of the appropriate probabilities. Two-way tables also connect the conditional probability ideas with data analysis of categorical variables.

15 Having discovered this two-way table analysis, students can apply the procedure to more interesting and relevant applications. Consider for example the interpretation of medical diagnostic test results. One common test for AIDS is the ELISA test. A study by Gastwirth (1987) estimates that when a person actually carries the AIDS virus, this test produces a positive result 97.7% of the time. When a person does not carry the AIDS virus, the test result is negative 92.6% of the time. These percentages are known as the test's sensitivity and specificity, respectively. The study further estimates that a base rate of about 0.5% of the American population carries the AIDS virus. This base rate provides an initial probability that a randomly selected individual carries the virus. Data in the form of test results enable one to update the initial probability for individuals who are tested.

16 A natural question to ask is the probability that a randomly selected American who tests positive actually carries the AIDS virus. Even students with only basic arithmetic skills can address this issue by constructing a two-way table for a hypothetical population of 1,000,000 people in which the percentages hold exactly. Students work through the following questions:

| | Test Positive | Test Negative | Row Total |
|---|---|---|---|
| Carry AIDS | | | |
| No AIDS | | | |
| Column Total | | | 1,000,000 |

(a) Use the base rate of the disease in the population to determine how many of these 1,000,000 people would carry AIDS. How many does that leave as non-carriers?

(b) Use the sensitivity of the test to determine how many of the AIDS carriers would test positive. How many does that leave testing negative?

(c) Use the specificity of the test to determine how many of the non-carriers would test negative. How many does that leave testing positive?

(d) What is the total number of people testing positive?

(e) Of those testing positive, what proportion are actually AIDS carriers?

17 The resulting table becomes:

| | Test Positive | Test Negative | Row Total |
|---|---|---|---|
| Carry AIDS | 4,885 | 115 | 5,000 |
| No AIDS | 73,630 | 921,370 | 995,000 |
| Column Total | 78,515 | 921,485 | 1,000,000 |

18 From the table students can easily see the counterintuitive result that most positive test results go to people who do not carry the disease. Only about 6.22% of the positive test results go to people who actually carry the AIDS virus. Students can confer with their peers to produce a written explanation for this surprising result.
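For readers who want to check the arithmetic behind steps (a) through (e), the following Python sketch reproduces the table entries. It is only an illustration; rates are expressed per thousand (our choice) so that integer arithmetic keeps the counts exact.

```python
# Figures from the text: base rate 0.5%, sensitivity 97.7%, specificity 92.6%.
# Rates are written per thousand so that all counts stay exact integers.
population = 1_000_000
base_rate_per_1000 = 5
sensitivity_per_1000 = 977
specificity_per_1000 = 926

# Step (a): carriers and non-carriers.
carriers = population * base_rate_per_1000 // 1000
non_carriers = population - carriers

# Step (b): carriers who test positive, and those left testing negative.
true_positives = carriers * sensitivity_per_1000 // 1000
false_negatives = carriers - true_positives

# Step (c): non-carriers who test negative, and those left testing positive.
true_negatives = non_carriers * specificity_per_1000 // 1000
false_positives = non_carriers - true_negatives

# Step (d): total number testing positive.
total_positives = true_positives + false_positives

# Step (e): proportion of positive results that go to actual carriers.
p_carrier_given_positive = true_positives / total_positives

print(f"Carry AIDS: {true_positives:>7,} positive, {false_negatives:>7,} negative")
print(f"No AIDS:    {false_positives:>7,} positive, {true_negatives:>7,} negative")
print(f"Pr(carrier | positive) = {true_positives:,}/{total_positives:,} "
      f"= {p_carrier_given_positive:.4f}")
```

The false positives (73,630) outnumber the true positives (4,885) because the non-carriers are so much more numerous than the carriers, which is the explanation students are asked to articulate.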

19 Computer technology allows students to automate this analysis. We ask students to enter formulas into a spreadsheet that will produce this table for whatever base rate, sensitivity, and specificity the user enters. Students can then easily investigate the effects of changing the base rate, sensitivity, and specificity. For example, we ask students to use .0622 as the new base rate to find the probability of carrying AIDS for a person who tests positive twice (assuming that the tests are independent). We also ask students to use the spreadsheet to produce graphical displays of the initial and updated probabilities.
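The same exploration can be sketched outside a spreadsheet. The Python function below is a hypothetical stand-in for the spreadsheet formulas (the function name and the extra base rates in the loop are our assumptions, not values from the study); it returns the updated probability for any base rate, sensitivity, and specificity, and reuses its own output as the new base rate for a second positive test, assuming the tests are independent.

```python
def updated_probability(base_rate, sensitivity, specificity):
    """Return Pr(carrier | positive test); all arguments are proportions in [0, 1]."""
    true_positive = sensitivity * base_rate
    false_positive = (1 - specificity) * (1 - base_rate)
    return true_positive / (true_positive + false_positive)


sensitivity, specificity = 0.977, 0.926  # figures quoted in the text

# One positive test, starting from the 0.5% base rate.
first = updated_probability(0.005, sensitivity, specificity)

# Second positive test: the first result becomes the new base rate
# (this assumes the two tests are independent).
second = updated_probability(first, sensitivity, specificity)

print(f"after one positive test:  {first:.4f}")
print(f"after two positive tests: {second:.4f}")

# Illustrative base rates (hypothetical, apart from 0.005) to show sensitivity to the prior.
for base_rate in (0.001, 0.005, 0.01, 0.05, 0.10):
    result = updated_probability(base_rate, sensitivity, specificity)
    print(f"base rate {base_rate:.3f} -> updated probability {result:.4f}")
```

With the figures from the text, the probability after two positive tests is still a little under one half, which tends to surprise students as much as the single-test result.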

20 As a final exercise with this application, an instructor can challenge students to think about implications of this analysis such as employer-mandated AIDS testing and the screening of blood donations. Of particular importance is the choice of the base rate of AIDS in the population of interest. The 0.5% base rate in this example applies to the population of the United States, but the ideal base probabilities for individuals vary depending on their HIV risk factors.

21 Another important context which calls for Bayesian reasoning involves legal evidence of a quantitative nature. Judges and jurors are often asked to update their subjective assessments of a defendant's guilt based on the introduction of probabilistic evidence. Students with somewhat advanced mathematical abilities can derive that Bayes' Theorem indicates that

$$
\Pr(G \mid E) = \frac{\Pr(E \mid G)\,\Pr(G)}{\Pr(E \mid G)\,\Pr(G) + \Pr(E \mid \text{not } G)\,\Pr(\text{not } G)}
$$

where G represents the defendant's guilt and E the evidence in question.
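One route by which students might arrive at this expression, sketched here using only the definition of conditional probability, the multiplication rule, and the fact that E must occur together with either G or not G:

```latex
\begin{align*}
\Pr(G \mid E) &= \frac{\Pr(G \text{ and } E)}{\Pr(E)}
  && \text{(definition of conditional probability)} \\
\Pr(G \text{ and } E) &= \Pr(E \mid G)\,\Pr(G)
  && \text{(multiplication rule)} \\
\Pr(E) &= \Pr(E \mid G)\,\Pr(G) + \Pr(E \mid \text{not } G)\,\Pr(\text{not } G)
  && \text{(E occurs with either G or not G)}
\end{align*}
```

Substituting the second and third lines into the first gives Bayes' Theorem in the form used here.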

22 Consider the case of Joseph Jamieson, who was tried in a 1987 criminal trial in Pittsburgh's Common Pleas Court on charges of raping seven women in the Shadyside district of the city over a period from April 18, 1985, to January 30, 1986. Fienberg (1990) reports that by analyzing body secretion evidence taken from the scenes of the crimes, a forensic expert concluded that the assailant had the blood characteristics and genetic markers of type B, secretor, PGM 2+1-. She further testified that only .32% of the male population of Allegheny County had these blood characteristics and that Jamieson himself was a type B, secretor, PGM 2+1-. The natural question to ask is how a juror should update the probability of Jamieson's guilt in light of this quantitative forensic evidence.

23 In this case Pr(E|G)=1 and Pr(E|not G)=.0032, since if Jamieson did not commit the crimes, then some other male in Allegheny County presumably did. Plugging these into Bayes' Theorem as presented above and simplifying leads to the expression

$$
\Pr(G \mid E) = \frac{\Pr(G)}{.9968\,\Pr(G) + .0032}
$$

where Pr(G) represents the juror's subjective assessment of Jamieson's guilt prior to hearing the forensic evidence. Students can use a spreadsheet package or a graphing calculator to graph this updated probability of guilt as a function of the prior probability. We also ask students to use technology to calculate the updated probability of guilt for certain values of the prior probability; these become:

| Prior Prob. | .5 | .2 | .1 | .01 | .001 | .00000278 |
|---|---|---|---|---|---|---|
| Updated Prob. | .9968 | .9874 | .9720 | .7594 | .2383 | .0009 |

24 This table reveals that if the probability one would assign to Jamieson's guilt before hearing the forensic evidence is 50%, then one should be 99.68% convinced of his guilt after hearing this evidence. Even if one regards the probability of his guilt prior to hearing evidence as only 1 in 10, then this evidence still raises the guilt probability to 97.2%.

25 The last column of the table warrants a special explanation. The defense in this case argued that the prior probability of guilt should be 1 in 360,000, where 360,000 is the estimated number of males in the appropriate age group in Allegheny County. The updated probability of guilt then becomes just 1 in 1150, where 1150 is the approximate number of males in that age group in Allegheny County with the same blood characteristics. This column of the table highlights the importance of the choice of initial or base probability in this analysis.
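The updated probabilities in the table can be reproduced with a few lines of code. The Python sketch below is one possible substitute for the spreadsheet or graphing calculator mentioned above; the function name and its default arguments are ours, with Pr(E|G) = 1 and Pr(E|not G) = .0032 taken from the text.

```python
def updated_guilt(prior, p_evidence_if_innocent=0.0032, p_evidence_if_guilty=1.0):
    """Bayes' Theorem: Pr(G | E) from Pr(G), Pr(E | G), and Pr(E | not G)."""
    numerator = p_evidence_if_guilty * prior
    return numerator / (numerator + p_evidence_if_innocent * (1 - prior))


# Prior probabilities from the table; the last is the defense's 1 in 360,000.
priors = [0.5, 0.2, 0.1, 0.01, 0.001, 1 / 360_000]

for prior in priors:
    print(f"prior = {prior:.8f}   updated = {updated_guilt(prior):.4f}")
```

Evaluating `updated_guilt` over a fine grid of priors between 0 and 1 gives the points needed to graph the updated probability as a function of the prior.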

26 Technology also enables students to explore another probabilistic facet of the defense's argument. The forensic expert arrived at the type B, secretor, PGM 2+1- characterization by pooling the blood evidence from the seven crimes. The table below reveals the genetic information that could be discerned from each crime scene. Students can use technology to investigate the updated probability of Jamieson's guilt for each separate crime and to discover that the case-by-case evidence is much less incriminating for the defendant.

| Victim | Genetic marker attributable to assailant | Population proportion having genetic marker |
|---|---|---|
| A | B, secretor | .08 |
| B | B or O, 2+ or 2+1+ or 2+1- | .17 |
| C | B, secretor | .08 |
| D | 2+1- or 1+1- or 1- | .26 |
| E | B, secretor, 2+ or 2+1- | .0056 |
| F | AB or B, secretor, 2+1- | .0048 |
| G | B, secretor | .08 |
| Composite | B, secretor, 2+1- | .0032 |
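A case-by-case comparison might be sketched as follows. The proportions come from the table above; the prior of .01 is a hypothetical value chosen only for illustration, and the code assumes, as in the composite analysis, that Pr(E|G) = 1 for every crime.

```python
# Population proportions having the genetic marker, from the table above.
marker_proportion = {
    "A": 0.08, "B": 0.17, "C": 0.08, "D": 0.26,
    "E": 0.0056, "F": 0.0048, "G": 0.08, "Composite": 0.0032,
}


def posterior_guilt(prior, p_evidence_if_innocent):
    """Pr(G | E), assuming Pr(E | G) = 1 as in the composite analysis."""
    return prior / (prior + p_evidence_if_innocent * (1 - prior))


prior = 0.01  # hypothetical prior probability of guilt, for illustration only

for victim, proportion in marker_proportion.items():
    updated = posterior_guilt(prior, proportion)
    print(f"{victim:<9}  Pr(E | not G) = {proportion:<6}  updated = {updated:.4f}")
```

Because every per-crime proportion exceeds the composite value of .0032, each per-crime updated probability falls below the composite one for any prior strictly between 0 and 1.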

27 This application also allows students to examine a host of ethical issues. Does the principle of "innocent until proven guilty" mean that the probability of guilt prior to hearing evidence must be zero? If so, then no evidence in the world can move that probability from zero. How does one quantify the legal criteria of "beyond a reasonable doubt" and "preponderance of the evidence"? Can or should the U.S. justice system expect jurors to apply Bayesian methods from the jury box? If not, then how are they to make sense of figures such as a forensic expert's testimony that .0032 of all males have the genetic markers of the assailant?

28 We have presented examples through which students can develop an intuitive understanding of conditional probability and Bayes' Theorem, applying them thoughtfully to a variety of applications involving genuine data. Moreover, we have demonstrated that technology can help students to appreciate the sequential relationships that are the foundation of conditional probability. Technology can also facilitate the exploration of sensitivity of results to the sample size of the study and the choice of initial probabilities. We believe that the teaching and learning of conditional probability can be enhanced by features such as active learning, conceptual understanding, genuine data, and use of technology that characterize statistics education reform. One need not leave conditional probability behind when considering important and interesting examples and activities to include in statistics courses.

Cobb, G. (1992), "Teaching Statistics," in Heeding the Call for Change: Suggestions for Curricular Action, ed. L. Steen, MAA Notes No. 22, Washington: Mathematical Association of America, pp. 3-43.

Cobb, G. (1993), "Reconsidering Statistics Education: A National Science Foundation Conference," Journal of Statistics Education, v.1, n.1.

DeGroot, M. (1986), Probability and Statistics (2nd ed.), Reading, MA: Addison-Wesley Publishing Co., Inc.

Fienberg, S. (1990), "Legal Likelihoods and A Priori Assessments: What Goes Where?," in Bayesian and Likelihood Methods in Statistics and Econometrics (Essays in Honor of George A. Barnard), eds. S. Geisser, J. S. Hodges, S. J. Press, and A. Zellner, North-Holland, pp. 141-162.

Gastwirth, J. (1987), "The Statistical Precision of Medical Screening Procedures: Application to Polygraph and AIDS Antibodies Test Data," Statistical Science, 2, 213-238.

Gordon, S., and Gordon, F. (eds.) (1992), Statistics for the Twenty-First Century, MAA Notes No. 26, Washington: Mathematical Association of America.

Hoaglin, D., and Moore, D. (eds.) (1992), Perspectives on Contemporary Statistics, MAA Notes No. 21, Washington: Mathematical Association of America.

Moore, D. (1992), "What is Statistics?," in Perspectives on Contemporary Statistics, eds. D. Hoaglin and D. Moore, MAA Notes No. 21, Washington: Mathematical Association of America, pp. 1-17.

Allan J. Rossman

Department of Mathematics and Computer Science

Dickinson College

P.O. Box 1773

Carlisle, PA 17013-2896

rossman@dickinson.edu

Thomas H. Short

Department of Mathematical Sciences

Villanova University

Villanova, PA 19085-1699

short@monet.vill.edu
