GAISE College Report
Endorsed by the American Statistical Association
Copyright © 2010 American Statistical Association
Published by American Statistical Association
This work is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
First printing, February 2005
The American Statistical Association (ASA) funded the Guidelines for Assessment and Instruction in Statistics Education (GAISE) Project, which consists of two groups, one focused on K–12 education and one focused on introductory college courses. This report presents the recommendations developed by the college group.
The report includes a brief history of the introductory college course and summarizes the 1992 report^{1} by George Cobb that has been considered a generally accepted set of recommendations for teaching these courses. Results of a survey on the teaching of introductory courses are summarized, along with a description of current versions of introductory statistics courses. We then offer a list of goals for students, based on what it means to be statistically literate. We present six recommendations for the teaching of introductory statistics that build on the previous recommendations from Cobb’s report. Our six recommendations include the following:
◾ Emphasize statistical literacy and develop statistical thinking
◾ Use real data
◾ Stress conceptual understanding, rather than mere knowledge of procedures
◾ Foster active learning in the classroom
◾ Use technology for developing concepts and analyzing data
◾ Use assessments to improve and evaluate student learning
The report concludes with suggestions for how to make these changes and includes numerous examples in the appendices to illustrate details of the recommendations.
^{1} George Cobb. Heeding the Call for Change: Suggestions for Curricular Action (MAA Notes No. 22), chapter Teaching Statistics, pages 3–43. The Mathematical Association of America, Washington DC, 1992.
The GAISE project was funded by a member initiative grant from the ASA in 2003 to develop ASA-endorsed guidelines for assessment and instruction in statistics in the K–12 curriculum and for the introductory college statistics course.
Our work on the college course guidelines included many discussions over email and in-person small group meetings. Our discussions began by reviewing existing standards and guidelines, relevant research results from the studies of teaching and learning statistics, and recent discussions and recommendations regarding the need to focus instruction and assessment on the important concepts that underlie statistical reasoning.
The modern introductory statistics course has roots that go back a long way, to early books about statistical methods. R. A. Fisher’s Statistical Methods for Research Workers, which ﬁrst appeared in 1925, was aimed at practicing scientists. A dozen years later, the ﬁrst edition of George Snedecor’s Statistical Methods presented an expanded version of the same content, but there was a shift in audience. More than Fisher’s book, Snedecor’s became a textbook used in courses for prospective scientists who were still completing their degrees; statistics was beginning to establish itself as an academic subject, albeit with heavy practical, almost vocational emphasis. By 1961, with the publication of Probability with Statistical Applications by Fred Mosteller, Robert Rourke, and George Thomas, statistics had begun to make its way into the broader academic curriculum, but here again, there was a catch: In these early years, statistics had to lean heavily on probability for its legitimacy.
During the late 1960s and early 1970s, John Tukey’s ideas of exploratory data analysis brought a near-revolutionary pair of changes to the curriculum: freeing certain kinds of data analysis from ties to probability-based models, so that the analysis of data could begin to acquire status as an independent intellectual activity, and introducing a collection of “quick-and-dirty” data tools so students could analyze data without having to spend hours chained to a bulky mechanical calculator. Computers would later complete the “data revolution” in the beginning statistics curriculum, but Tukey’s ideas of exploratory data analysis (EDA) provided both the first technical breakthrough and the new ethos that avoided invented examples.
Two inﬂuential books appeared in 1978: Statistics, by David Freedman, Robert Pisani, and Roger Purves, and Statistics: Concepts and Controversies, by David S. Moore. The publication of these two books marked the birth of what we regard as the modern introductory statistics course.
The evolution of content has been paralleled by other trends. One of these is a striking and sustained growth in enrollments. Two sets of statistics sufﬁce here:
◾ At two-year colleges, according to the Conference Board of the Mathematical Sciences, statistics enrollments have grown from 27% of the size of calculus enrollments in 1970 to 74% of the size of calculus enrollments in 2000.
◾ The Advanced Placement exam in statistics was ﬁrst offered in 1997. There were 7,500 students who took it that ﬁrst year, more than in the ﬁrst offering of an AP exam in any subject at that time. The next year, more than 15,000 students took the exam. The next year, more than 25,000, and the next, 35,000. In 2004, more than 65,000 students took the AP statistics exam.
Both the changes in course content and the dramatic growth in enrollment are implicated in a third set of changes, a process of democratization that has broadened and diversiﬁed the backgrounds, interests, and motivations of those who take the courses. Statistics has gone from being a course taught from a book like Snedecor’s, for a narrow group of future scientists in agriculture and biology, to being a family of courses, taught to students at many levels, from pre-high school to post-baccalaureate, with very diverse interests and goals. A teacher in the 1940s, using Snedecor’s Statistical Methods, could assume that most students were both quantitatively skilled and adequately motivated by their career plans. A teacher of today’s beginning statistics courses works with a different group of students. Most take statistics earlier in their lives, increasingly often in high school; few are drawn to statistics by immediate practical need; and there is great variety in their levels of quantitative sophistication. As a result, today’s teachers face challenges of motivation and exposition that are substantially greater than those of a half century ago.
Not only have the “what, why, who, and when” of introductory statistics been changing, but so has the “how.” The last few decades have seen an extraordinary level of activity focused on how students learn statistics, and on how we teachers can be more effective in helping them learn.
In the spring of 1991, George Cobb, in order to highlight important issues to the mathematics community, coordinated an email focus group on statistics education as part of the Curriculum Action Project of the Mathematical Association of America (MAA). The report was published in the MAA volume Heeding the Call for Change^{2}. It included the following recommendations:
Any introductory course should take as its main goal helping students to learn the basic elements of statistical thinking. Many advanced courses would be improved by a more explicit emphasis on those same basic elements, namely:
◾ The need for data. Recognizing the need to base personal decisions on evidence (data) and the dangers inherent in acting on assumptions not supported by evidence.
◾ The importance of data production. Recognizing that it is difﬁcult and time-consuming to formulate problems and to get data of good quality that really deal with the right questions. Most people don’t seem to realize this until they go through this experience themselves.
◾ The omnipresence of variability. Recognizing that variability is ubiquitous. It is the essence of statistics as a discipline and not best understood by lecture. It must be experienced.
◾ The quantification and explanation of variability. Recognizing that variability can be measured and explained, taking into consideration the following: (a) randomness and distributions; (b) patterns and deviations (fit and residual); (c) mathematical models for patterns; (d) model-data dialogue (diagnostics).
Almost any course in statistics can be improved by more emphasis on data and concepts, at the expense of less theory and fewer recipes. To the maximum extent feasible, calculations and graphics should be automated.
^{2} George Cobb. Heeding the Call for Change: Suggestions for Curricular Action (MAA Notes No. 22), chapter Teaching Statistics, pages 3–43. The Mathematical Association of America, Washington DC, 1992.
As a rule, teachers of statistics should rely much less on lecturing and much more on alternatives such as projects, lab exercises, and group problem-solving and discussion activities. Even within the traditional lecture setting, it is possible to get students more actively involved.
The three recommendations were intended to apply quite broadly (e.g., whether or not a course has a calculus prerequisite and regardless of the extent to which students are expected to learn speciﬁc statistical methods). Although the work of the focus group ended with the completion of their report, many members of the group continued to work on these issues, especially on efforts at dissemination and implementation, as members of the joint ASA/MAA Committee on Undergraduate Statistics.
Over the decade that followed the publication of the Cobb report, many changes were implemented in the teaching of statistics. In recent years, many statisticians have become involved in the reform movement in statistical education aimed at the teaching of introductory statistics, and the National Science Foundation has funded numerous projects designed to implement aspects of this reform^{3}. Moore^{4} describes the reform in terms of changes in content (more data analysis, less probability), pedagogy (fewer lectures, more active learning), and technology (for data analysis and simulations).
In 1998 and 1999, Garfield^{5} surveyed a large number of statistics instructors from mathematics and statistics departments and a smaller number of statistics instructors from departments of psychology, sociology, business, and economics to determine how the introductory course is being taught and to begin to explore the impact of the educational reform movement.
The results of this survey suggested that major changes were being made in the introductory course, that the primary area of change was in the use of technology, and that the results of course revisions generally were positive, although they required more time of the course instructor. Results were surprisingly similar across departments, with the main differences found in the increased use of graphing calculators, active learning and alternative assessment methods in courses taught in math departments in two-year colleges, the increased use of web resources by instructors in statistics departments, and the reasons cited for why changes were made (more math instructors were influenced by recommendations from statistics education). The results were also consistent in reporting that more changes were to be made, particularly as more technological resources became available.
^{3} George Cobb. Reconsidering statistics education: A National Science Foundation conference. Journal of Statistics Education, 1(1), 1993. URL http://www.amstat.org/publications/jse/v1n1/cobb.html
^{4} David Moore. New pedagogy and new content: The case of statistics. International Statistical Review, 65:123–165, 1997.
^{5} Joan Garfield. Evaluating the statistics education reform. Final report to the National Science Foundation, 2000. URL http://education.umn.edu/EdPsych/Projects/Impact.html
Today’s introductory statistics course is actually a family of courses taught across many disciplines and departments. The students enrolled in these courses have different backgrounds (e.g., in mathematics, psychology) and goals (e.g., some hope to do their own statistical analyses in research projects, some are fulﬁlling a general quantitative reasoning requirement).
As in the past, some of these courses are taught in large classes and some are taught in small classes (or even freshman seminars). Some students are taught statistics in computer labs, some students take the course using only a simple calculator, and some take the course via distance learning without ever seeing their classmates or instructor in person. Some classes are taught over a 10-week quarter and some are taught over a 15-week semester. Each of these classes might range from three to six hours per week.
Today’s goals for students tend to focus more on conceptual understanding and attainment of statistical literacy and thinking, and less on learning a set of tools and procedures. While demands for dealing with data in an information age continue to grow, advances in technology and software make tools and procedures easier to use and more accessible to more people, thus decreasing the need to teach the mechanics of procedures, but increasing the importance of giving more people a sounder grasp of the fundamental concepts needed to use and interpret those tools intelligently. These new goals, described in the following section, reinforce the need to reexamine and revise many introductory statistics courses to help achieve the important learning goals for students.
Some people teach courses that are heavily slanted toward teaching students to become statistically literate and wise consumers of data; this is somewhat similar to an art appreciation course. Some teach courses more heavily slanted toward teaching students to become producers of statistical analyses; this is analogous to the studio art course. Most courses are a blend of consumer and producer components, but the balance of that mix will determine the importance of each recommendation we present.
The desired result of all introductory statistics courses is to produce statistically educated students, which means that students should develop statistical literacy and the ability to think statistically. The following goals represent what such a student should know and understand. Achieving this knowledge will require learning some statistical techniques, but the speciﬁc techniques are not as important as the knowledge that comes from going through the process of learning them. Therefore, we are not recommending speciﬁc topical coverage.
Finding no statistically signiﬁcant difference or relationship does not necessarily mean there is no difference or no relationship in the population, especially for studies with small sample sizes
Common sources of bias in surveys and experiments
How to determine the population to which the results of statistical inference can be extended, if any, based on how the data were collected
How to determine when a cause-and-effect inference can be drawn from an association based on how the data were collected (e.g., the design of the study)
That words such as “normal,” “random,” and “correlation” have specific meanings in statistics that may differ from common usage
How to obtain or generate data
How to graph the data as a ﬁrst step in analyzing data, and how to know when that’s enough to answer the question of interest
How to interpret numerical summaries and graphical displays of data—both to answer questions and to check conditions (to use statistical procedures correctly)
How to make appropriate use of statistical inference
How to communicate the results of a statistical analysis
The concept of a sampling distribution and how it applies to making statistical inferences based on samples of data (including the idea of standard error)
The concept of statistical significance, including significance levels and p-values
The concept of conﬁdence interval, including the interpretation of conﬁdence level and margin of error
How to interpret statistical results in context
How to critique news stories and journal articles that include statistical information, including identifying what’s missing in the presentation and the ﬂaws in the studies or methods used to generate the information
When to call for help from a statistician
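The idea of a sampling distribution and its standard error, listed among the goals above, can be made concrete with a short simulation. The following is a minimal sketch added for illustration, not material from the report itself: it assumes a fair six-sided die as the population, and the function name `simulate_sample_means`, the sample size, and the number of repetitions are all hypothetical choices.

```python
import math
import random
import statistics


def simulate_sample_means(sample_size, repetitions, seed=0):
    """Roll a fair die `sample_size` times, record the mean, and repeat."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    return [
        statistics.fmean([rng.randint(1, 6) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]


# Population standard deviation of a single fair die roll: sqrt(35/12).
POPULATION_SD = math.sqrt(35 / 12)

sample_size = 25  # assumed sample size, for illustration only
means = simulate_sample_means(sample_size, repetitions=5000)

# Theory: standard error of the mean = population SD / sqrt(n).
theoretical_se = POPULATION_SD / math.sqrt(sample_size)
# Simulation: spread of the sample means across repeated samples.
simulated_se = statistics.stdev(means)

print(f"theoretical standard error: {theoretical_se:.3f}")
print(f"simulated standard error:   {simulated_se:.3f}")
```

The two numbers should agree closely, which is one way to show students that the standard error describes how much a sample mean varies from sample to sample rather than how much individual observations vary.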
We endorse the ideas in the three original goals found in the Cobb report^{6} and have expanded them in light of today’s situation. The intent of these recommendations is to help students attain the list of learning goals described in the previous section.
Recommendation 1: Emphasize statistical literacy and develop statistical thinking.
We define statistical literacy as understanding the basic language of statistics (e.g., knowing what statistical terms and symbols mean and being able to read statistical graphs) and fundamental ideas of statistics. For readings on statistical literacy, see Gal^{7}, Rumsey^{8}, and Utts^{9}.
Statistical thinking has been deﬁned as the type of thinking that statisticians use when approaching or solving statistical problems. Statistical thinking has been described as understanding the need for data, the importance of data production, the omnipresence of variability, and the quantiﬁcation and explanation of variability^{10}. We provide illustrations of statistical thinking in the following example and analogy.
Think of a funnel that is wide at the top, corresponding to a great many situations, and narrow at the bottom, corresponding to a few specialized cases. Statisticians are practical problem-solvers. When a client presents a problem (e.g., Is there a treatment effect present?), the statistician tries to provide a practical answer that addresses the problem efficiently. Quite often, a simple graph is sufficient to tell the story. Perhaps a more detailed plot will answer the question at hand. If not, then some calculations may be needed. A simple test based on a gross simplification of the situation may confirm that a treatment effect is present. If simplifying the situation is troublesome, then a more refined test may be used, capturing more of the specifics of the modeling situation at hand. Different statisticians may come up with somewhat different analyses of a given set of data, but will usually agree on the main conclusions and only worry about minor points if those points matter to the client. If there is no standard procedure to answer the question, then and only then will the statistician use first principles to develop a new tool. We should model this type of thinking for our students, rather than showing them a set of skills and procedures and giving them the impression that, in any given situation, there is one best procedure to use and only that procedure is acceptable.
^{6} George Cobb. Heeding the Call for Change: Suggestions for Curricular Action (MAA Notes No. 22), chapter Teaching Statistics, pages 3–43. The Mathematical Association of America, Washington DC, 1992.
^{7} Iddo Gal. Adults’ statistical literacy: Meanings, components, responsibilities. International Statistical Review, 70:1–51, 2002.
^{8} D. J. Rumsey. Statistical literacy as a goal for introductory statistics courses. Journal of Statistics Education, 10(3), 2002. URL http://www.amstat.org/publications/jse/v10n3/rumsey2.html
^{9} Jessica Utts. What educated citizens should know about statistics and probability. The American Statistician, 57(2):74–79, 2003.
^{10} George Cobb. Heeding the Call for Change: Suggestions for Curricular Action (MAA Notes No. 22), chapter Teaching Statistics, pages 3–43. The Mathematical Association of America, Washington DC, 1992.
In week 1 of the carpentry (statistics) course, we learned to use various kinds of planes (summary statistics). In week 2, we learned to use different kinds of saws (graphs). Then, we learned about using hammers (conﬁdence intervals). Later, we learned about the characteristics of different types of wood (tests). By the end of the course, we had covered many aspects of carpentry (statistics). But I wanted to learn how to build a table (collect and analyze data to answer a question) and I never learned how to do that. We should teach students that the practical operation of statistics is to collect and analyze data to answer questions.
. Model statistical thinking for students, working examples and explaining the questions and processes involved in solving statistical problems from conception to conclusion.
. Use technology and show students how to use technology effectively to manage data, explore data, perform inference, and check conditions that underlie inference procedures.
. Give students practice developing and using statistical thinking. This should include open-ended problems and projects.
. Give students plenty of practice with choosing appropriate questions and techniques, rather than telling them which technique to use and merely having them implement it.
. Assess and give feedback on students’ statistical thinking.
In the appendices, we present examples of projects, activities, and assessment instruments that can be used to develop and evaluate statistical thinking.
Recommendation 2: Use real data.
It is important to use real data in teaching statistics to be authentic, to consider issues related to how and why the data were produced or collected, and to relate the analysis to the problem context. Using real data sets of interest to students is also a good way to engage them in thinking about the data and relevant statistical concepts. There are many types of real data, including archival data, classroom-generated data, and simulated data. Sometimes, hypothetical data sets may be used to illustrate a particular point (e.g., the Anscombe data illustrate how four data sets can have the same correlation but strikingly different scatterplots) or to assess a specific concept. It is important to use created or realistic data only for this specific purpose and not for general data analysis and exploration. An important aspect of dealing with real data is helping students learn to formulate good questions and use data to answer them appropriately based on how the data were produced.
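The claim about the Anscombe data can be checked directly. The sketch below is an illustration added here, not part of the report; it uses the published values of Anscombe's four data sets (1973), and `pearson_r` is a hypothetical helper name for a plain implementation of the correlation coefficient.

```python
import math

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


correlations = {name: pearson_r(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, r in correlations.items():
    print(f"data set {name}: r = {r:.3f}")
```

All four correlations come out at roughly 0.816, so the summary statistic alone cannot distinguish the data sets; students need to plot each one to see how different they are.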
. Search for good, raw data to use from web data repositories, textbooks, software packages, and surveys or activities in class. If there is an opportunity, seek out real data directly from a practicing research scientist (through a journal or at one’s home institution). Using such data can enliven your class and increase the store of good data sets for other teachers by communicating the newly found data to others. Search for and use summaries based on real data, from data summary web-sites, journal articles, websites with surveys and polls, and textbooks.
. Use data to answer questions relevant to the context and generate new questions.
. Make sure questions used with data sets are of interest to students—if no one cares about the questions, it’s not a good data set for the introductory class. (Example: physical measurements on species no one has heard of.) Note: Few data sets interest all students, so one should use data from a variety of contexts.
. Use class-generated data to formulate statistical questions and plan uses for the data before developing the questionnaire and collecting the data. (Example: Ask questions likely to produce different shaped histograms, use interesting categorical variables to investigate relationships.) It is important that data gathered from students in class not contain information that could be embarrassing to students and that students’ privacy is maintained.
. Get students to practice entering raw data using a small data set or a subset of data, rather than spending time entering a large data set. Make larger data sets available electronically.
. Use subsets of variables in different parts of the course, but integrate the same data sets throughout. (Example: Do side-by-side boxplots to compare two groups, then do two-sample t-tests on the same data. Use histograms to investigate shape, then to verify conditions for hypothesis tests.)
The appendices include examples of good ways (and not-so-good ways) to use data in homework, projects, tests, etc.
Recommendation 3: Stress conceptual understanding, rather than mere knowledge of procedures.
Many introductory courses contain too much material, and students end up with a collection of ideas that are understood only at surface level, are not well-integrated, and are quickly forgotten. If students don’t understand the important concepts, there’s little value in knowing a set of procedures. If they understand the concepts well, then particular procedures will be easy to learn. In the student’s mind, procedural steps too often claim attention that an effective teacher could otherwise direct toward concepts.
Recognize that giving more attention to concepts than to procedures may be difﬁcult politically, both with students and client disciplines. However, students with a good conceptual foundation from an introductory course are well-prepared to study additional statistical techniques such as research methods, regression, experimental design, or statistical methods in a second course.
. View the primary goal as not to cover methods, but to discover concepts.
. Focus on students’ understanding of key concepts, illustrated by a few techniques, rather than covering a multitude of techniques with minimal focus on underlying ideas.
. Pare down content of an introductory course to focus on core concepts in more depth. Examples of syllabi focused on concepts, compared to a syllabus focused on a list of topics, are in the appendices.
Perform routine computations using technology to allow greater emphasis on interpretation of results. Although the language of mathematics provides compact expression of key ideas, use formulas that enhance the understanding of concepts, and avoid computations that are divorced from understanding. For example, $s = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$ helps students understand the role of standard deviation as a measure of spread and to see the impact of individual $y$ values on $s$, whereas $s = \sqrt{\dfrac{\sum y^2 - \frac{1}{n}\left(\sum y\right)^2}{n - 1}}$ has no redeeming pedagogical value.
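To see why the definitional formula (the sum of squared deviations from the mean, divided by n − 1) is the more instructive of the two, the sketch below computes s both ways on a small made-up data set. The data values and the function names `sd_definitional` and `sd_shortcut` are hypothetical, chosen only for illustration.

```python
import math
import statistics

y = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical data, for illustration only


def sd_definitional(values):
    """Standard deviation from squared deviations about the mean.

    Returns both s and the list of squared deviations, so the
    contribution of each observation is visible.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (n - 1)), squared_deviations


def sd_shortcut(values):
    """Algebraically equivalent 'computational' formula."""
    n = len(values)
    sum_of_squares = sum(v * v for v in values)
    return math.sqrt((sum_of_squares - sum(values) ** 2 / n) / (n - 1))


s_def, squared_deviations = sd_definitional(y)
s_short = sd_shortcut(y)

print("squared deviations:", squared_deviations)  # [9.0, 1.0, 1.0, 0.0, 25.0]
print("definitional s:", s_def)                   # 3.0
print("shortcut s:", s_short)                     # 3.0
print("library s:", statistics.stdev(y))
```

Both functions return the same value, but only the first exposes that the observation 10.0 contributes 25 of the 36 units of squared deviation, which is the kind of insight the shortcut formula hides.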
Recommendation 4: Foster active learning in the classroom.
Using active learning methods in class is a valuable way to promote collaborative learning, allowing students to learn from each other. Active learning allows students to discover, construct, and understand important statistical ideas and to model statistical thinking. Activities have an added beneﬁt in that they often engage students in learning and make the learning process fun. Other beneﬁts of active learning methods are the practice students get communicating in the statistical language and learning to work in teams. Activities offer the teacher an informal method of assessing student learning and provide feedback to the instructor on how well students are learning. It is important that teachers not underestimate the ability of activities to teach the material or overestimate the value of lectures, which is why suggestions are provided for incorporating activities, even in large lecture classes.
TYPES OF ACTIVE LEARNING INCLUDE:
◾ Group or individual problem-solving, activities, and discussion
◾ Lab activities (physical and computer-based)
◾ Demonstrations based on data generated on the spot from the students
. Ground activities in the context of real problems. Therefore, data should be collected to answer a question, not “collect data to collect data” (without a question).
. Mix lectures with activities, discussions, and labs.
. Precede computer simulations with physical explorations (e.g., die rolling, card shufﬂing).
. Collect data from students (anonymously).
. Encourage predictions from students about the results of a study that provides the data for an activity before analyzing the data. This motivates the need for statistical methods. (If all results were predictable, we wouldn’t need either data or statistics.)
. Do not use activities that lead students step by step through a list of procedures, but allow students to discuss and think about the data and the problem.
. Plan ahead to make sure there is enough time to explain the problem, let the students work through the problem, and wrap up the activity during the same class. It is hard to complete the activity in the next class period. Make sure there is time for recap and debrieﬁng, even if at the beginning of the next class period.
. Provide a lot of feedback to students on their performance and learning.
. Include assessment as an important component of an activity.
. Take advantage of large classes providing opportunities for large sample sizes for student-generated data.
. In large classes, it may be easier to have students work in pairs, rather than in larger groups.
. Use a separate lab/discussion section for activities, if possible.
Recommendation 5: Use technology for developing concepts and analyzing data.
Technology has changed the way statisticians work and should change what and how we teach. For example, statistical tables such as a normal probability table are no longer needed to find p-values, and we can implement computer-intensive methods. We think technology should be used to analyze data, allowing students to focus on interpretation of results and testing of conditions, rather than on computational mechanics. Technology tools should also be used to help students visualize concepts and develop an understanding of abstract ideas by simulations. Some tools offer both types of uses, while, in other cases, a statistical software package may be supplemented by web applets. Regardless of the tools used, it is important to view the use of technology not just as a way to compute numbers but as a way to explore conceptual ideas and enhance student learning as well. We caution against using technology merely for the sake of using technology (e.g., entering 100 numbers in a graphing calculator and calculating statistical summaries) or for pseudo-accuracy (carrying out results to multiple decimal places). Not all technology tools will have all desired features. Moreover, new ones appear all the time. See the Appendices for an example illustrating technology uses.
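The remarks above about finding p-values without tables and about computer-intensive methods can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not a method prescribed by the report: the two groups of scores are invented, and `normal_two_sided_p` and `permutation_p_value` are hypothetical helper names. The first function replaces a normal table lookup with the standard library's complementary error function; the second shows one computer-intensive method, a permutation test for a difference in means.

```python
import math
import random


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, shuffles=10000, seed=0):
    """Share of random relabelings with a mean difference at least as
    large (in absolute value) as the one actually observed."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)
        difference = abs(mean(pooled[:size_a]) - mean(pooled[size_a:]))
        if difference >= observed - 1e-12:  # small tolerance for rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / shuffles


# Hypothetical scores for two groups, for illustration only.
scores_a = [78, 85, 90, 72, 88, 95]
scores_b = [70, 65, 80, 74, 68, 77]

print(f"two-sided p for z = 1.96: {normal_two_sided_p(1.96):.4f}")
print(f"permutation p-value: {permutation_p_value(scores_a, scores_b):.4f}")
```

The permutation test makes the logic of a p-value visible: it asks how often shuffled group labels alone produce a difference as large as the observed one, which students can reason about without any distributional formula.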
◾ Graphing calculators
◾ Statistical packages
◾ Educational software
◾ Applets
◾ Spreadsheets
◾ Web-based resources, including data sources, online texts, and data analysis routines
◾ Classroom response systems
. Access large, real data sets
. Automate calculations
. Generate and modify appropriate statistical graphics
. Perform simulations to illustrate abstract concepts
. Explore “what happens if . . . ” questions
. Create reports
◾ Ease of data entry, ability to import data in multiple formats
◾ Interactive capabilities
◾ Dynamic linking between data, graphical, and numerical analyses
◾ Ease of use for particular audiences
◾ Availability to students, portability
Recommendation 6: Use assessments to improve and evaluate student learning.
Students will value what you assess. Therefore, assessments need to be aligned with learning goals. Assessments need to focus on understanding key ideas, and not just on skills, procedures, and computed answers. This should be done with formative assessments used during a course (e.g., quizzes, midterm exams, and small projects) and with summative evaluations (course grades). Useful and timely feedback is essential for assessments to lead to learning. Types of assessment may be more or less practical in different types of courses. However, it is possible, even in large classes, to implement good assessments. See the Appendices for examples of good assessment items and suggestions for improving weak items.
◾ Homework
◾ Quizzes and exams
◾ Projects
◾ Activities
◾ Oral Presentations
◾ Written reports
◾ Minute papers
◾ Article critiques
. Integrate assessment as an essential component of the course. Assessment tasks that are well-coordinated with what the teacher is doing in class are more effective than tasks that focus on what happened in class two weeks earlier.
. Use a variety of assessment methods to provide a more complete evaluation of student learning.
. Assess statistical literacy using assessments such as interpreting or critiquing articles in the news and graphs in media.
. Assess statistical thinking using assessments such as student projects and open-ended investigative tasks.
. Use small group projects instead of individual projects.
. Use peer review of projects to provide feedback and improve projects before grading.
. Use items that focus on choosing good interpretations of graphs or selecting appropriate statistical procedures.
. Use discussion sections for student presentations.
Statistics education has come a long way since Fisher and Snedecor. Moreover, teachers of statistics across the country have generally been enthusiastic about adopting modern methods and approaches. Nevertheless, changing the way we teach isn’t always easy. In a way, we are all teachers and learners, a bit like hermit crabs: To grow, we must ﬁrst abandon the protective shell of what we are used to and endure a period of vulnerability until we can settle into a new and larger set of habits and expectations.
We have presented many ideas in this report. We advise readers to move in the directions suggested by taking small steps at ﬁrst. Examples of small steps include the following:
◾ Adding an activity to your course
◾ Having your students do a small project
◾ Integrating an applet into a lecture
◾ Demonstrating the use of software to your students
◾ Increasing the use of real data sets
◾ Deleting a topic from the list you currently try to cover to focus more on understanding concepts
Your teaching philosophy will inform your choice of textbook, but the recommendations in this report are not about choosing a text. They are about a way of teaching.
There are many resources available, including the MAA Notes volumes that deal with teaching statistics, the Consortium to Advance Undergraduate Statistics Education (CAUSE) (causeweb.org), the isostat discussion list (www.lawrence.edu/fac/jordanj/isostat.html), the SIGMAA-Stat Ed group within the MAA (www.pasles.org/sigmaastat), and the ASA website, especially the Center for Statistics Education (www.amstat.org/education) and the Statistical Education Section (www.amstat.org/sections/educ).
A good deal of progress has been made, but there is still plenty of room to improve the introductory statistics course. Moreover, this course must be ﬂexible and adapt to change as more students enter college having learned aspects of statistics in elementary and secondary school. The Advanced Placement course continues to change the statistics education landscape. Although we have been addressing the general introductory course, we must be mindful of other courses, such as business statistics and mathematical statistics, and of the content and goals of good second courses in statistics that build on the solid conceptual understanding developed in the ﬁrst course.
Examples and commentary in these appendices are provided for additional guidance, clarification, and illustration of the guidelines in the main report.
A Technology-Based Simulation to Examine the Effectiveness of Treatments for Cocaine Addiction
◾ The activity should mimic a real-world situation. It should not seem like “busy work.” For instance, if you use coins or cards to conduct a binomial experiment, explain real-world binomial experiments they could represent.
◾ The class should be involved in some of the decisions about how to conduct the activity. Students don’t learn much from following a detailed “recipe” of steps.
◾ The decisions made by the class should require knowledge learned in the class. For instance, if they are designing an experiment, they should consider principles of good experimental design learned in class, rather than “intuitively” deciding how to conduct the experiment.
◾ If possible, the activity should include design, data collection, and analysis so students can see the whole process at work.
◾ It is sometimes better to have students work in teams to discuss how to design the activity and then reconvene the class to discuss how it will be done, but it is sometimes better to have the class work together for the initial design and other decisions. It depends on how difﬁcult the issues to be discussed are and whether each team will need to do things in exactly the same way.
◾ The activity should begin and end with an overview of what is being done and why.
◾ The activity should be fun!
Today, we will test whether Pepsi or Coke tastes better. Divide into groups of four. Choose one person in your group to be the experimenter. Note: If you are not the experimenter, please refrain from looking at the front of the classroom.
Critique: The test is not double blind. There is no reason why the experimenter can’t be blind to which drink is which. The person who initially sets up the experiment could cover or remove the labels from the drink containers and call them drinks 1 and 2. The drinks could then be prepared in advance into cups labeled A and B. The order of presentation should be randomized for each taster.
The purpose of this exercise is to verify the Central Limit Theorem. Remember that this theorem tells us that the mean of a large sample is:
◾ Approximately bell-shaped
◾ Has mean equal to the mean of the population
◾ Has standard deviation equal to the population standard deviation divided by the square root of n, $\sigma/\sqrt{n}$
Please follow these instructions to verify that the Central Limit Theorem holds.
Critique: This is not a good activity for at least two reasons. First, it has absolutely no real-world motivation and reinforces the myth that statistics is boring and useless. Second, the instructions are too complete. There is no room for exploration on the part of the students; they are simply given a “recipe” to follow.
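The three claims in the Central Limit Theorem exercise (approximately bell-shaped sample means, centered at the population mean, with standard deviation σ/√n) can be explored by simulation rather than by following a recipe. The sketch below is an illustration only and is not part of the original activity. It assumes a hypothetical skewed population (exponential with mean 1, so σ = 1); the function name `simulate_sample_means` and all parameter values are made up for the example.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=2010):
    """Draw `reps` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


if __name__ == "__main__":
    # Compare the simulated spread of sample means with sigma / sqrt(n).
    for n in (4, 25, 100):
        means = simulate_sample_means(n, reps=5000)
        print(
            f"n={n:3d}  mean of sample means={statistics.fmean(means):.3f}  "
            f"SD of sample means={statistics.stdev(means):.3f}  "
            f"sigma/sqrt(n)={1 / n ** 0.5:.3f}"
        )
```

Students could be asked to change the population or the sample size and predict what will happen to the spread before running the code, which leaves room for the exploration the critique calls for.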
The “Cents and the Central Limit Theorem” activity from Activity-Based Statistics (Scheaffer et al.) provides an example for illustrating the Central Limit Theorem that is more aligned with the guidelines. Some other good examples from Activity-Based Statistics:
◾ The introduction to hypothesis testing activity (where you draw cards at random from a deck and always get the same color) works well.
◾ Matching Graphs to Variables generates a lot of discussion and learning.
◾ Random Rectangles has become a standard, for good reason.
◾ Randomized Response is not central to the introductory course, but it does involve some statistical thinking.
The idea for projects such as the ones described here comes from Robert Wardrop’s Statistics: Learning in the Presence of Variability (Dubuque, IA: William C. Brown, 1995). These projects, in turn, are based on a study by cognitive psychologists Daniel Kahneman and Amos Tversky.
Consider two versions of the “General’s Dilemma”:
Version 1: Threatened by a superior enemy force, the general faces a dilemma. His intelligence officers say his soldiers will be caught in an ambush in which 600 of them will die unless he leads them to safety by one of two available routes. If he takes the first route, 200 soldiers will be saved. If he takes the second, there is a one-third chance that 600 soldiers will be saved and a two-thirds chance that none will be saved. Which route should he take?
Version 2: Threatened by a superior enemy force, the general faces a dilemma. His intelligence ofﬁcers say his soldiers will be caught in an ambush in which 600 of them will die unless he leads them to safety by one of two available routes. If he takes the ﬁrst route, 400 soldiers will die. If he takes the second, there is a one-third chance that no soldiers will die and a two-thirds chance that 600 will die. Which route should he take?
Both versions of the question have the same two answers; both describe the same situation. The two questions differ only in their wording: One speaks of lives lost, the other of lives saved.
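The equivalence can be checked with a short expected-value calculation. This is a sketch that assumes the second route in Version 1 carries a one-third chance of saving all 600 soldiers, which is what Version 2 implies; the symbols $S$ (soldiers saved) and $D$ (soldiers who die) are notation introduced here, not in the original.

```latex
\begin{aligned}
\text{Route 1:}\quad & S = 200 \quad\Longleftrightarrow\quad D = 600 - 200 = 400,\\
\text{Route 2:}\quad & E[S] = \tfrac{1}{3}(600) + \tfrac{2}{3}(0) = 200,
\qquad E[D] = \tfrac{1}{3}(0) + \tfrac{2}{3}(600) = 400.
\end{aligned}
```

In both versions the certain route and the risky route have the same expected outcome, so any difference in responses reflects the framing rather than the numbers.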
A pair of questions of this form leads easily to a simple randomized comparative experiment with the two questions as “treatments”: Recruit a set of subjects, sort them into two groups using a random number table, and assign one version of the question to each group. The results can be summarized in a 2x2 table of counts:
 | A | B |
---|---|---|
Version 1 | | |
Version 2 | | |
The data can be analyzed by comparing the two proportions, using Fisher’s exact test or the chi-square test with continuity correction, for example.
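As one way to carry out such an analysis, the sketch below implements a two-sided Fisher's exact test using only the Python standard library. It is a minimal illustration under stated assumptions: the counts in the usage example are hypothetical (no data are given in the report), the function name `fisher_exact_two_sided` is ours, and the two-sided p-value uses the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: rows are Version 1 and Version 2,
    # columns are the number of subjects choosing answer A and answer B.
    version1 = (14, 6)
    version2 = (7, 13)
    p1 = version1[0] / sum(version1)
    p2 = version2[0] / sum(version2)
    print(f"Proportion choosing A: Version 1 = {p1:.2f}, Version 2 = {p2:.2f}")
    print(f"Fisher's exact p-value = {fisher_exact_two_sided(*version1, *version2):.4f}")
```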
Exercise Set 1.2 in Wardrop’s book lists a large number of variations on this structure, many of them carried out by students. Here are abbreviated versions of just four:
Ask people in a history library whether they ﬁnd a particular argument from a history book persuasive; the argument was presented with and without a table of supporting data.
Ask women at the student union whether they would accept if approached by a male stranger and invited to have a drink; the male was/was not described as “attractive.”
Ask customers ordering an ice cream cone whether they want a regular or wafﬂe cone; the wafﬂe cone was/was not described as “homemade.”
Ask college students either (1) Would you recommend the counseling service for a friend who was depressed? Or (2) Would you go to the counseling service if you were depressed?
Projects based on two versions of a two-answer question offer a number of advantages:
◾ Data collection can be completed in a reasonable length of time.
◾ Randomization ensures that the results will be suitable for formal inference.
◾ Randomization makes explicit the connection between chance in data gathering and the use of a probability model for analysis.
◾ The method of analysis is comparatively simple and straightforward.
◾ The structure (a 2x2 table of counts) is one with very broad applicability.
◾ Finally, the format is very open-ended, which affords students a wide range of areas of application from which to choose and offers substantial opportunities for imagination and originality in choosing subjects and the pair of questions.
These instructions are for the teacher. Instructions for students are on the Project 4 Team Form.
Goal: Provide students with experience in formulating a research question, then collecting and describing data to help answer it
Supplies: (N = number of students; T = number of teams)
◾ N index cards or slips of paper of each of T colors (or use board space; see below)
◾ T or 2T overhead transparencies and pens (see Step 3 for the reason for 2T of them)
◾ T calculators
Students should work in teams of 4 to 6. See the Sample Project 4 Team Form.
Step 1: Each team formulates two categorical variables for which they want to know if there is a relationship, such as whether someone is a ﬁrstborn (or only) child and whether they prefer indoor or outdoor activities (recent research suggests that ﬁrstborns prefer indoor activities and later births prefer outdoor activities); male/female and opinion on something; class (senior, junior, etc.), and whether they own a car, etc. To make it easier to ﬁnish in time, you may want to restrict them to two categories per variable.
There are two possible methods for collecting data—using index cards (or paper) or using the board. Each of the next few steps will be described for both methods.
Step 2: Cards: Each team is assigned a color from the T colors of index cards. For instance, Team 1 might be blue, Team 2 is pink, and so on. Board: Assign each team space on the chalkboard to write their questions.
Step 3: Each team asks the whole class its two questions. Cards: The team writes the questions on an overhead transparency and displays them, with each team taking a turn to go to the front of the room. Students write their answers on the index card corresponding to that team’s color and the team collects them. For instance, all students in the class write their answers to Team 1’s questions on the blue index card, their answers to Team 2’s questions on the pink card, and so on. Board: A team member writes the questions on the board along with a two-way table where each student can put a hash mark in the appropriate cell.
Adapted from Project 2.2, Instructors’ Resource Manual, Mind On Statistics, Utts and Heckard
NOTE: This can also be done with one categorical and one quantitative variable and the data retained for use when doing two-sample inference.
Step 4: Cards: After each team has asked its questions and students have written their answers, the cards are collected and given to the appropriate team. For instance, Team 1 receives all the blue cards. Board: All class members go to each segment of the board and put a hash mark in the cell of the table that ﬁts them.
Step 5: Each team tallies, summarizes, and prepares a graphical display of the data for their questions. The results are written on an overhead transparency.
Step 6: Each team presents the results to the class.
Step 7: Results can be retained for use when covering chi-square tests for independence if you are willing to pretend that the data are a random sample from a larger population.
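When the retained results are later used for a chi-square test of independence, the calculation for a 2x2 table can be sketched as follows. This is an illustration only: the counts are hypothetical, the function name `chi_square_2x2` is ours, and no continuity correction is applied. With one degree of freedom the p-value can be obtained from the standard library's complementary error function, so no statistics package is assumed.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value, expected_counts). No continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected count in each cell = row total * column total / grand total.
    expected = [[r * k / n for k in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected


if __name__ == "__main__":
    # Hypothetical class data: rows = firstborn / later-born,
    # columns = prefers indoor / prefers outdoor activities.
    stat, p_value, expected = chi_square_2x2([[12, 8], [9, 16]])
    print(f"chi-square = {stat:.2f}, p-value = {p_value:.3f}")
    print("expected counts:", expected)
```

Printing the expected counts lets students check the usual condition that each expected count is reasonably large before trusting the p-value.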
PROJECT 4 : TEAM FORM
TEAM MEMBERS:
INSTRUCTIONS:
◾ Explanatory variable:
◾ Response variable:
Explanatory Variable | Response Variable: Category 1 | Response Variable: Category 2 | Total |
---|---|---|---|
Category 1 | | | |
Category 2 | | | |
Total | | | |
These instructions are for the teacher. Instructions for students are on the “Project 5 Team Form.”
Goal: Provide students with experience in designing, conducting, and analyzing an experiment
Supplies: (N = number of students, T = number of teams)
◾ T bowls ﬁlled with about 30 of each of two distinct colors of dried beans
◾ 2T empty paper cups or bowls
◾ T stop watches or watches with a second hand
The Story: A company has many workers whose job is to sort two types of small parts. Workers are prone to get repetitive strain injury, so the company wonders if there would be a big loss in productivity if the workers switch hands, sometimes using their dominant hand and sometimes using their nondominant hand. (Or if you are using latex gloves, the story can be that, for health reasons, they might want to require gloves.) Therefore, you are going to design, conduct, and analyze an experiment making this comparison. Students will be timed to see how long it takes to separate the two colors of beans by moving them from the bowl into the two paper cups, with one color in each cup. A comparison will be done after using dominant and nondominant hands. An alternative is to time students for a ﬁxed time, such as 30 seconds, and see how many beans can be moved in that amount of time.
Step 1: As a class, discuss how the experiment will be done. This could be done in teams ﬁrst. See below for suggestions.
◾ What are the treatments? What are the experimental units?
◾ Principles of experimental design to consider are as follows. Use as many of them as possible in designing and conducting this experiment. Discuss why each one is used.
Blocking or creating matched-pairs
Randomization of treatments to experimental units, or randomization of order of treatments
Blinding or double blinding
Control group
Placebo
Learning effect or getting tired
◾ What is the parameter of interest?
◾ What type of analysis is appropriate: hypothesis test, confidence interval, or both?
Adapted from Project 12.2, Instructors’ Resource Manual, Mind On Statistics, Utts and Heckard
NOTE: A variation is to have them do the task with and without wearing a latex glove instead of with the dominant and nondominant hand. In that case, you will need N pairs of latex gloves.
The class should decide that each student will complete the task once with each hand. Why is this preferable to randomly assigning half of the class to use their dominant hand and the other half to use their nondominant hand? How will the order be decided? Should it be the same for all students? Will practice be allowed? Is it possible to use a single or double-blind procedure?
The Project 5 Team Form shows one way to assign tasks to team members.
Step 3: Descriptive statistics and preparation for inference. Convene the class and create a stemplot of the differences. Discuss whether the necessary conditions for this analysis are met. Were there any outliers? If so, can they be explained? Have someone compute the mean and standard deviation for the differences.
Step 4: Inference. Have teams reconvene. Each team is to ﬁnd a conﬁdence interval for the mean difference and conduct the hypothesis test.
Step 5: Reconvene the class and discuss conclusions.
Blocking or creating matched-pairs: Each student should be used as a matched-pair, doing the task once with each hand.
Randomization of treatments to experimental units, or randomization of order of treatments: Randomize the order of which hand to use for each student.
Blinding or double blinding: Obviously, the student knows which hand is being used, but the time-keeper doesn’t need to know.
Control group: Not relevant for this experiment.
Placebo: Not relevant for this experiment.
Learning effect or getting tired: There is likely to be a learning effect, so you may want to build in a few practice rounds. Also, randomizing the order of the two hands for each student will help with this.
Have each student ﬂip a coin. Heads, start with dominant hand. Tails, start with nondominant hand. Time them to see how long it takes to separate the beans. The person timing them could be blind to the condition by not watching.
What is the parameter of interest?
Answer: Define the random variable of interest for each person to be a “manual dexterity difference” of
d = number of extra seconds required with nondominant hand
d = time with nondominant hand − time with dominant hand
Define $\mu_d$ = population mean manual dexterity difference.
What are the null and alternative hypotheses?
$H_0: \mu_d = 0$
$H_A: \mu_d > 0$ (faster with dominant hand)
Is a confidence interval appropriate?
Yes, it will provide information about how much faster workers can accomplish the task with their dominant hands. The formula for the confidence interval is
$\bar{d} \pm t^* \left( \frac{s_d}{\sqrt{n}} \right)$
where $t^*$ is the critical t-value with df = n − 1 and $s_d$ is the standard deviation of the difference scores. To carry out the test, compute
$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$
then compare it to the t distribution with df = n − 1 to find the p-value.
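The interval and test above can be sketched in code as follows. This is a minimal illustration, not the report's method of computation: the times are hypothetical, the function names are ours, and the t distribution is handled with standard-library numerical integration (Simpson's rule) and bisection instead of a statistics package. The gamma function used here overflows for very large df, which should not matter for classroom-sized samples.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density on [0, |x|] with Simpson's rule."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(conf, df):
    """Bisection search for t* such that P(-t* <= T <= t*) = conf."""
    target = 1 - (1 - conf) / 2
    low, high = 0.0, 100.0
    for _ in range(100):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def paired_t(nondominant, dominant, conf=0.95):
    """Confidence interval for mu_d and one-sided test of H0: mu_d = 0
    against HA: mu_d > 0, where d = nondominant time - dominant time."""
    d = [a - b for a, b in zip(nondominant, dominant)]
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)
    se = s_d / math.sqrt(n)
    t_stat = (d_bar - 0) / se
    t_star = t_critical(conf, n - 1)
    return {
        "d_bar": d_bar,
        "s_d": s_d,
        "ci": (d_bar - t_star * se, d_bar + t_star * se),
        "t": t_stat,
        "p_value": 1 - t_cdf(t_stat, n - 1),  # upper-tail area
    }


if __name__ == "__main__":
    # Hypothetical times in seconds for five students.
    result = paired_t([42, 38, 51, 45, 40], [36, 35, 44, 41, 37])
    print(result)
```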
PROJECT 5 : TEAM FORM
TEAM MEMBERS:
INSTRUCTIONS:
You will work in teams. Each team should take a bowl of beans and two empty cups. You are each going to separate the beans by moving them from the bowl to the empty cups, with one color to each cup. You will be timed to see how long it takes. You will each do this twice, once with each hand, with order randomly determined.
1. Designate these jobs. You can trade jobs for each round if you wish.
◾ Coordinator — runs the show
◾ Randomizer — ﬂips a coin to determine which hand each person will start with, separately for each person
◾ Time-keeper — must have watch with second hand to time each person for the task
◾ Recorder — records the results in the table below
Record the data here:
NAME | Time for nondominant hand | Time for dominant hand | d = difference nondominant − dominant hand |
---|---|---|---|
Parameter to be tested and estimated is:
Conﬁdence interval:
Hypothesis test—hypotheses and results:
We ﬁrst give some examples of assessment items with problems and commentary about the nature of the difﬁculty.
Assessment items to avoid using on tests: True/False, pure computation without a context or interpretation, items with too much data to enter and compute or analyze, items that only test memorization of deﬁnitions or formulas
A teacher taught two sections of elementary statistics last semester, each with 25 students, one at 8:00 a.m. and one at 4:00 p.m. The means and standard deviations for the final exams were 78 and 8 for the 8:00 a.m. class and 75 and 10 for the 4:00 p.m. class. In examining these numbers, it occurred to the teacher that the better students probably sign up for the 8:00 a.m. class. So she decided to test whether the mean final exam scores were equal for her two groups of students. State the hypotheses and carry out the test.
Critique: The teacher has all the population data so there is no need to do statistical inference.
An economist wants to compare the mean salaries for male and female CEOs. He gets a random sample of 10 of each and does a t-test. The resulting p-value is .045.
Critique: The question doesn’t address the conditions necessary for a t-test, and with the small sample sizes, they are almost surely violated here. Salaries are almost surely skewed.
Which of the following gives the definition of a p-value?
A. It’s the probability of rejecting the null hypothesis when the null hypothesis is true.
B. It’s the probability of not rejecting the null hypothesis when the null hypothesis is true.
C. It’s the probability of observing data as extreme as that observed.
D. It’s the probability that the null hypothesis is true.
Critique: None of these answers is quite correct. Answers B and D are clearly wrong; answer A is the level of significance; and answer C would be correct if it continued “. . . or more extreme, given that the null hypothesis is true.”
True/False items, even when well-written, do not provide much information about student knowledge because there is always a 50% chance of getting the item right without any knowledge of the topic. One current approach is to change the items into forced-choice questions with three or more options.
Item 4
The size of the standard deviation of a data set depends on where the center is. True or False
changed to:
Does the size of the standard deviation of a data set depend on where the center is located?
A. Yes, the higher the mean, the higher the standard deviation.
B. Yes, because you have to know the mean to calculate the standard deviation.
C. No, the size of the standard deviation is not affected by the location of the distribution.
D. No, because the standard deviation only measures how the values differ from each other, not how they differ from the mean.
Item 5
A correlation of +1 is stronger than a correlation of −1. True or False
rewritten as: A recent article in an educational research journal reports a correlation of +0.8 between math achievement and overall math aptitude. It also reports a correlation of −0.8 between math achievement and a math anxiety test. Which of the following interpretations is the most correct?
A. The correlation of +0.8 indicates a stronger relationship than the correlation of −0.8.
B. The correlation of +0.8 is just as strong as the correlation of −0.8.
C. It is impossible to tell which correlation is stronger.
Item 6
Once it is established that X and Y are highly correlated, what type of study needs to be done to establish that a change in X causes a change in Y?
a context is added: A researcher is studying the relationship between an experimental medicine and T4 lymphocyte cell levels in HIV/AIDS patients. The T4 lymphocytes, a part of the immune system, are found at reduced levels in patients with the HIV infection. Once it is established that the two variables, dosage of medicine, and T4 cell levels are highly correlated, what type of study needs to be done to establish that a change in dosage causes a change in T4 cell levels?
A. correlational study
B. controlled experiment
C. prediction study
D. survey
Item 7
A first-year program course used a final exam that contained a 20-point essay question asking students to apply Darwinian principles to analyze the process of expansion in major league sports franchises. To check for consistency in grading among the four professors in the course, a random sample of six graded essays was selected from each instructor. The scores are summarized in the table below. Construct an ANOVA table to test for a difference in means among the four instructors.
Instructor | Scores | | | | | |
---|---|---|---|---|---|---|
Affinger | 18 | 11 | 10 | 12 | 15 | 12 |
Beaulieu | 14 | 14 | 11 | 14 | 11 | 14 |
Cleary | 19 | 20 | 15 | 19 | 19 | 16 |
Dean | 17 | 14 | 17 | 15 | 18 | 15 |
Critique: The version of the question above requires a fair amount of pounding on the calculator to get the results and never even asks for an interpretation. The revision below still requires some calculation (which can be adjusted depending on the amount of computer output provided) but the calculations can be done relatively efficiently, especially by students who have a good sense of what the computer output is providing.
A first-year program course . . . (same intro as above) . . . The scores are summarized in the table below, along with some descriptive statistics for the entire sample and a portion of the one-way ANOVA output.
Descriptive Statistics
Variable | N | Mean | Median | TrMean | StDev | SEMean |
---|---|---|---|---|---|---|
Score | 24.00 | 15.00 | 15.00 | 15.00 | 2.92 | 0.60 |
One-way Analysis of Variance
*** ANOVA TABLE OMITTED ***
Level | N | Mean | StDev |
---|---|---|---|
Affinger | 6 | 13.00 | 2.97 |
Beaulieu | 6 | 13.00 | 1.55 |
Cleary | 6 | 18.00 | 2.00 |
Dean | 6 | 16.00 | 1.55 |
Pooled StDev = 2.098
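For instructors preparing a key for Item 7, the entries of the omitted ANOVA table can be reconstructed from the raw scores in the first version of the item. The sketch below is one way to do so with the Python standard library; the function name `one_way_anova` and the dictionary layout are ours, and the p-value is left out because it requires an F distribution routine that the standard library does not provide.

```python
from statistics import fmean

# Raw essay scores from the first version of Item 7.
scores = {
    "Affinger": [18, 11, 10, 12, 15, 12],
    "Beaulieu": [14, 14, 11, 14, 11, 14],
    "Cleary": [19, 20, 15, 19, 19, 16],
    "Dean": [17, 14, 17, 15, 18, 15],
}


def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, mean squares,
    F statistic, and pooled standard deviation for a one-way layout."""
    all_values = [x for g in groups.values() for x in g]
    grand_mean = fmean(all_values)
    ss_between = sum(
        len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum((x - fmean(g)) ** 2 for g in groups.values() for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
        "pooled_sd": ms_within ** 0.5,
    }


if __name__ == "__main__":
    for name, value in one_way_anova(scores).items():
        print(f"{name:>10}: {value:.3f}")
```

For these data the between-instructor sum of squares is 108 on 3 degrees of freedom and the within-instructor sum of squares is 88 on 20 degrees of freedom, so the within-group mean square is 4.4. Its square root, about 2.098, agrees with the pooled standard deviation shown in the output.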
Item 8
Let Y denote the amount a student spends on textbooks for one semester. Suppose Nancy, who is statistically savvy, wants to know how fall, semester 1, and spring, semester 2, compare. In particular, suppose she is interested in the averages µ1 and µ2. You may assume that Nancy has taken several statistics courses and knows a lot about statistics, including how to interpret confidence intervals and hypothesis tests. You have random samples from each semester and are to analyze the data and write a report. You seek advice from four persons:
Rank the four pieces of advice from worst to best and explain why you rank them as you do. That is, explain what makes one better than another.
Item 9
Researchers took random samples of subjects from two populations and applied a Wilcoxon-Mann-Whitney test to the data; the P-value for the test, using a nondirectional alternative, was .06. For each of the following, say whether the statement is true or false and why.
Item 10
An article on the CNN web page (www.cnn.com/HEALTH/9612/16/faith.healing/index.html) on Monday begins with the sentence, “Family doctors overwhelmingly believe that religious faith can help patients heal, according to a survey released Monday.” Later, the article states, “Medical researchers say the benefits of religion may be as simple as helping the immune system by reducing stress,” and Dr. Harold Koenig is reported to say that “people who regularly attend church have half the rate of depression of infrequent churchgoers.”
Use the language of statistics to critique the statement by Dr. Koenig and the claim, suggested by the article, that religious faith and practice help people ﬁght depression. You will want to select some of the following words in your critique: observational study, experiment, blind, double-blind, precision, bias, sample, spurious, confounding, causation, association, random, valid, reliable.
Item 11
Francisco Franco (Class of ’98) weighed 100 Hershey’s Kisses (with almonds). He found that the sample average was 4.80 grams and the SD was .28 grams. In the context of this setting, explain what is meant by the sampling distribution of an average.
Item 12
A gardener wishes to compare the yields of three types of pea seeds—type A, type B, and type C. She randomly divides the type A seeds into three groups and plants some in the east part of her garden, some in the central part of the garden, and some in the west part of the garden. Then, she does the same with the type B seeds and type C seeds.
Item 13
The scatterplot shows how divorce rate, y, and marriage rate, x, are related for a collection of 10 countries. The regression line has been added to the plot.
1. The U.S. is not one of the 10 points in the original collection of countries. It happens that the U.S. has a higher marriage rate than any of the 10 countries. Moreover, the divorce rate for the U.S. is higher than one would expect, given the pattern of the other countries. How would adding the U.S. to the data set affect the regression line? Why?
2. Think about the scatterplot and regression line after the U.S. has been added to the data set. Provide a sketch of the residual plot. Label the axes and identify the U.S. on your plot with a triangle.
Item 14
Researchers wanted to compare two drugs, formoterol and salbutamol, in aerosol solution to a placebo for the treatment of patients who suffer from exercise-induced asthma. Patients were to take a drug or the placebo, do some exercise, and then have their “forced expiratory volume” measured. There were 30 subjects available. (Based on A.N. Tsoy, et al., European Respiratory Journal 3 (1990): 235; via Berry, Statistics: A Bayesian Perspective.)
Item 15
I noticed that eight students from the 114 class attended the review session prior to the second exam (in April). The average score among those eight students was lower than the average for the 21 students who did not attend the review session. Suppose I want to use this information in a study of the effectiveness of review sessions.
Item 16
For each of the following three settings, state the type of analysis you would conduct (e.g., one-sample t-test, regression, chi-square test of independence, chi-square goodness-of-ﬁt test, etc.) if you had all the raw data and specify the roles of the variable(s) on which you would perform the analysis, but do not actually carry out the analysis.
Item 17
I had Data Desk construct parallel dotplots of the data from four samples. I then conducted a test of H0 : µ1 = µ2 = µ3 = µ4 and rejected H0 at the α = .05 level. I also tested H0 : µ1 = µ2 = µ3 and rejected H0 at the α = .05 level. However, when I tested H0 : µ2 = µ3 using α = .05, I did not reject H0. Likewise, when I tested H0 : µ1 = µ4 using α = .05, I did not reject H0.
Item 18
Atley Chock (Class of ’02) collected data on a random sample of 12 breakfast cereals. He recorded x = fiber (in grams/ounce) and y = price (in cents/ounce). A scatterplot of the data shows a linear relationship. The fitted regression model is
$\hat{y} = 17.42 + 0.62x$
The sample correlation coefficient, r, is 0.23. The SE of $b_1$ is .81. Also, $s_{y|x} = 3.1$.
Item 19
Give a rough estimate of the sample correlation for the data in each of the scatterplots below.
Item 20
A matched pairs experiment compares the taste of a regular cheese pizza from Pizza Joe’s to one from Domino’s. Each subject tastes two unmarked pieces of pizza, one of each type, in random order and states which he or she prefers. Of the 50 subjects who participate in the study, 21 prefer Pizza Joe’s.
Item 21
It was claimed that 1 out of 5 cardiologists takes an aspirin a day to prevent hardening of the arteries. Suppose the claim is true. If 1,500 cardiologists are selected at random, what is the probability that at least 275 of the 1,500 take an aspirin a day?
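An instructor writing a key for Item 21 might compare the exact binomial probability with the normal approximation that students are likely to use. The sketch below is one way to do that with the standard library; the function names are ours, the exact calculation uses integer arithmetic to avoid floating-point underflow, and the approximation includes a continuity correction, which is an assumption about the intended solution method.

```python
import math


def binomial_upper_tail(n, k, p_num, p_den):
    """Exact P(X >= k) for X ~ Binomial(n, p_num / p_den).

    The success probability is given as a fraction so that the sum can be
    computed with exact integers before a single final division.
    """
    q_num = p_den - p_num
    numerator = sum(
        math.comb(n, x) * p_num ** x * q_num ** (n - x) for x in range(k, n + 1)
    )
    return numerator / p_den ** n


def normal_upper_tail(n, k, p):
    """Normal approximation to P(X >= k) with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = (k - 0.5 - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)


if __name__ == "__main__":
    # Claim: 1 out of 5 cardiologists takes an aspirin a day; n = 1,500.
    exact = binomial_upper_tail(1500, 275, 1, 5)
    approx = normal_upper_tail(1500, 275, 0.2)
    print(f"exact binomial:       {exact:.4f}")
    print(f"normal approximation: {approx:.4f}")
```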
Item 22
Identify whether a scatterplot would or would not be an appropriate visual summary of the relationship between the variables. In each case, explain your reasoning.
Item 23
The paragraphs that follow each describe a situation that calls for some type of statistical analysis. For each, you should:
Some statistical procedures you might choose:
Confidence interval (for a mean, p, . . . ) | Normal distribution |
Determining sample size | Correlation |
Test for a mean | Simple linear regression |
Test for proportion | Multiple regression |
Difference in means (paired data) | Two-way table (chi-square test) |
Difference in means (two independent samples) | ANOVA for difference in means |
Difference in proportions | Two-way ANOVA for means |
A. Anthropologists have found two burial mounds in the same region. They know several tribes lived in the region and that the tribes have been classiﬁed according to different lengths of skulls. They measure a random sample of skulls found in each burial mound and wish to determine if the two mounds were made by different tribes. (p-value = 0.0082)
B. The Hawaiian Planters Association is developing three new strains of pineapple (call them A, B, and C) to yield pulp with higher sugar content. Twenty plants of each variety (60 plants in all) are randomly distributed into a two-acre ﬁeld. After harvesting, the resulting pineapples are measured for sugar content and the yields are recorded for each strain. Are there signiﬁcant differences in average sugar content between the three strains? (p-value = 0.987)
C. Researchers were commissioned by the Violence In Children’s Television Investigative Monitors (VICTIM) to study the frequency of depictions of violent acts in Saturday morning TV fare. They selected a random sample of 40 shows that aired during this time period over a 12-week period. Suppose 28 of the 40 shows in the sample were judged to contain scenes depicting overtly violent acts. How should they use this information to make a statement about the population of all Saturday morning TV shows?
D. The Career Planning Ofﬁce is interested in seniors’ plans and how they might relate to their majors. A large number of students are surveyed and classiﬁed according to their MAJOR (Natural Science, Social Science, Humanities) and FUTURE plans (Graduate School, Job, Undecided). Are the type of major and future plans related? (p-value = 0.047)
E. Sophomore Magazine asked a random sample of 15 year olds if they were sexually active (yes or no). They would like to see if there is a difference in the responses between boys and girls. (p-value = 0.029)
F. Every week during the Vietnam War, a body count (number of enemy killed) was reported by each army unit. The last digits of these numbers should be fairly random. However, suspicions arose that the counts might have been fabricated. To test this, a large random sample of body count figures was examined and the frequency with which the last digit was a 0 or a 5 was recorded. Psychologists have shown that people making up their own random numbers will use these digits less often than random chance would suggest (i.e., 103 sounds like a more “real” count than 100). If the data were authentic counts, the proportion of numbers ending in 0 or 5 should be about 0.20. (p-value = 0.002)
G. In one of his adventures, Sherlock Holmes found footprints made by the criminal at the scene of a crime and measured the distance between them. After sampling many people, measuring their height and length of stride, he conﬁdently announced that he could predict the height of the suspect. How?
Item 24
How accurate are radon detectors of a type sold to homeowners? To answer this question, university researchers placed 12 detectors in a chamber that exposed them to 105 picocuries per liter of radon. The detector readings found are below. A printout of the descriptive statistics from Minitab follows.
91.9 97.8 111.4 122.3 105.4 95.0
103.8 99.6 96.6 119.3 104.8 101.7
Variable | N | Mean | Median | TrMean | StDev | SE Mean | Minimum | Maximum | Q1 | Q3 |
---|---|---|---|---|---|---|---|---|---|---|
readings | 12 | 104.13 | 102.75 | 103.54 | 9.40 | 2.71 | 91.90 | 122.30 | 96.90 | 109.90 |
Item 25
According to a U.S. Food and Drug Administration (FDA) study, a cup of coffee contains an average of 115 mg of caffeine, with the amount per cup ranging from 60 to 180 mg depending on the brewing method. Suppose you want to repeat the FDA experiment to obtain an estimate of the mean caffeine content to within 5 mg with 95% confidence using your favorite brewing method. In problems such as this, we can estimate the standard deviation of the population to be 1/4 of the range. How many cups of coffee must you brew?
Item 26
An advertisement claims that by applying a particular drug, hair is restored to bald-headed men. Outline the design of an experiment you would use to examine this claim. Assume you have money to use 20 bald men in this experiment.
Item 27
A study of iron deﬁciency among infants compared samples of infants following different feeding regimens. One group contained breast-fed infants, while the children in another group were fed a standard baby formula without any iron supplements. Here are the summary results on blood hemoglobin levels at 12 months of age:
Group | N | X̄ | s |
---|---|---|---|
Breast-fed | 23 | 13.3 | 1.7 |
Formula | 19 | 12.4 | 1.8 |
Assume that the blood hemoglobin levels in children (both breast-fed and formula-fed) are normally distributed. Do a signiﬁcance test to determine the statistical signiﬁcance of the observed difference.
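The test statistic for Item 27 can be computed directly from the summary table. The sketch below assumes a two-sample t procedure that does not pool the variances (Welch's approximation for the degrees of freedom). The item does not say which version is intended, so this is one reasonable choice, not the only one. Variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the table (blood hemoglobin at 12 months)
n1, mean1, s1 = 23, 13.3, 1.7   # breast-fed
n2, mean2, s2 = 19, 12.4, 1.8   # formula

# Estimated variance of each sample mean
v1 = s1**2 / n1
v2 = s2**2 / n2

# Standard error of the difference in sample means
se_diff = sqrt(v1 + v2)

# Unpooled two-sample t statistic
t_stat = (mean1 - mean2) / se_diff

# Welch-Satterthwaite approximation for the degrees of freedom
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```

The statistic comes out near 1.65 on roughly 38 degrees of freedom. The p-value can then be read from a t table or statistical software; for a two-sided test, the 5% critical value at this many degrees of freedom is about 2.02.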
Item 28
Which implies a stronger linear relationship, a correlation of +0.4 or a correlation of −0.6? Brieﬂy explain your choice.
Item 29
A group of physicians subjected the polygraph to the same careful testing given to medical diagnostic tests. They found that if 1,000 people were subjected to the polygraph and 500 told the truth and 500 lied, the polygraph would indicate that approximately 185 of the truth-tellers were liars and 120 of the liars were truth-tellers. In the application of the polygraph test, an individual is presumed to be a truth-teller until indicated that s/he is a liar. What is a type I error in the context of this problem? What is the probability of a type I error in the context of this problem? What is a type II error in the context of this problem? What is the probability of a type II error in the context of this problem?
Item 30
Audiologists recently developed a rehabilitation program for hearing-impaired patients in a Canadian program for senior citizens. A simple random sample of 30 residents of a particular senior citizens’ home was selected, and the seniors were diagnosed for degree and type of sensorineural hearing loss, which was coded as follows: 1 = hearing within normal limits, 2 = high-frequency hearing loss, 3 = mild loss, 4 = mild-to-moderate loss, 5 = moderate loss, 6 = moderate-to-severe loss, and 7 = severe-to-profound loss. The data are as follows:
6 7 1 1 2 6 4 6 4 2 5 2 5 1 5
4 6 6 5 5 5 2 5 3 6 4 6 6 4 2
Item 31
A utility company was interested in knowing if agricultural customers would use less electricity during peak hours if their rates were different during those hours. Customers were randomly assigned to continue to get standard rates or to receive the time-of-day structure. Special meters were attached that recorded usage during peak and off-peak hours; the technician who read the meter did not know what rate structure each customer had.
Item 32
At the beginning of the semester, we measured the width of a page in our statistics book. Below is the scatterplot of the ﬁrst measurement vs. the second measurement.
Item 33
A study in the Journal of Leisure Research investigated the relationship between academic performance and leisure activities. Each student in a sample of 159 high-school students was asked to state how many leisure activities they participated in weekly. From the list, activities that involved reading, writing, or arithmetic were labeled “academic leisure activities.” Some of the results are in the table below:
Variable | Mean | Standard Deviation |
---|---|---|
GPA | 2.96 | 0.71 |
Number of leisure activities | 12.38 | 5.07 |
Number of academic leisure activities | 2.77 | 1.97 |
Based on these numbers (and knowing that the GPA is a value between 0 and 4 and the number of activities cannot be negative), discuss the potential skewness of each of the above variables.
Item 34
Events A and B are disjoint. Discuss whether or not A and B can be independent.
Item 35
A sample of 200 mothers and a sample of 200 fathers were taken. The age of the mother when she had her ﬁrst child and the age of the father when he had his ﬁrst child were recorded.
This example starts with a real-world situation, has students do a physical simulation using cards, and then brings in computer technology to automate the simulation.
A study on the treatment of cocaine addiction described the results of an experiment comparing two drugs for helping addicts stay off cocaine (D.M. Barnes, “Breaking the Cycle of Cocaine Addiction”, Science, Vol. 241, 1988, pp. 1029-1030). A group of 48 cocaine addicts who were seeking treatment were randomly divided into two groups of 24. One group was treated with a new drug called desipramine, while the other group was given lithium. The results are summarized in the table below, where we consider patients who do not relapse as successfully treated.
Treatment | No Relapse | Relapse |
---|---|---|
Desipramine | 14 | 10 |
Lithium | 6 | 18 |
While we observe that desipramine was more successful than lithium in this particular experiment, can we conclude that the improvement is statistically significant? (That is, would we expect to see such a large difference if the drugs were equally effective and it was just the random assignment process that happened to put so many more successful cases in the desipramine group?) We will address this question through simulation, first using a physical demonstration based on shuffling cards, then with a computer simulation that allows us to see the differences for many random assignments of the addicts to the treatment groups.
Take a deck of 54 playing cards (including two jokers) and remove six of the black cards (spades or clubs). The remaining deck should match the subjects in the cocaine experiment with all the red cards and the jokers representing patients who relapsed and the 20 black cards representing patients who were treated successfully. If we shufﬂe the deck and deal out two piles of 24 cards each, we will simulate the assignment of addicts to the two treatment groups when the success does not depend on which drug they take. Do so and ﬁll in the two-way table with the “success” (black cards) and “relapse” (red/jokers) counts for each group.
Treatment | No Relapse | Relapse |
---|---|---|
Desipramine | | |
Lithium | | |
Note that once you know one number in the table, you can ﬁll in the rest, as you know there are 24 in each treatment group and 20 will not relapse while 28 will relapse (that is why we sometimes say there is just one degree of freedom in the 2x2 table). To keep things simple then, we can just keep track of one count, such as the number of “no relapse” in the desipramine group.
Shufﬂe all the cards again, deal 24 for the desipramine group, and count the number of black cards.
Number of “no relapse” in desipramine group =
Pool the results for your class (counting # of black cards in each random group of 24 cards assigned to the “desipramine” group) in a dotplot. How often was the number of black cards as large as (or larger than) the 14 cases observed in the actual experiment?
The p-value of the original data is the proportion, assuming both drugs are equally effective, of random assignments that have 14 or more “no relapse” cases going to the desipramine group. Estimate this proportion using the data in your class dotplot.
To get a more accurate estimate of the proportion of random assignments that put 14 or more no relapse cases into the desipramine group, we’ll turn to a computer simulation.
Start with a data set (provided online) consisting of two columns and 48 rows. The ﬁrst column (Treatment) has the value “desipramine” in the ﬁrst 24 rows and “lithium” in the remaining 24 rows. The second column (Result) has the values “no relapse” and “relapse” to match the data in the original 2x2 table from the cocaine experiment.
Have the computer permute the values in the “Result” column to represent a new random assignment of subjects to the treatment groups where the outcome does not depend on which drug was taken. Count the number of “no relapse” cases in the desipramine treatment group and have the result stored somewhere. Automate this process to repeat 1,000 times.^{11}
Look at a histogram or dotplot of the distribution of counts for the 1,000 simulations. Does it seem unusual to have as many as 14 “no relapse” cases in the desipramine group?
Count the number of simulations that have 14 or more successes in the desipramine group (either from the graph, if feasible, or by sorting the simulated counts column) and divide by 1,000 to get another approximation of the p-value for the original data.
Does it seem reasonable that the larger number (14) of successful cases appeared in the desipramine group by chance, or would it be more appropriate to conclude that desipramine probably works better than lithium at treating cocaine addiction?
^{11} Some technology alternatives: The most difficult step here is to automate the simulations to record the counts for many random assignments. Some packages, such as Fathom, have easy-to-use tools designed for exactly such purposes. Others, such as Minitab, allow a bit of programming through macros that can be built in advance and repeated in a loop. A somewhat less enlightening simulation could be accomplished with a stat package that allows generation of random data from a hypergeometric distribution, although students would then lose the connection to the physical randomizations. Finally, an ambitious instructor could construct (or possibly find on the web) an applet to perform the required simulations and collect the results.
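As one more technology alternative, the computer simulation described above can be sketched in a few lines of a general-purpose language. The example below uses Python's standard library and is only an illustration: the seed and the variable names are arbitrary choices, and the 48 outcomes are rebuilt from the 2x2 table instead of being read from the online data set. It also computes the exact hypergeometric probability for comparison.

```python
import random
from math import comb

random.seed(2010)  # arbitrary seed, used only to make the sketch reproducible

# 48 outcomes from the experiment: 20 "no relapse" and 28 "relapse"
results = ["no relapse"] * 20 + ["relapse"] * 28
observed = 14      # "no relapse" cases actually seen in the desipramine group
n_sims = 1000

counts = []
for _ in range(n_sims):
    # Permute the Result column: a new random assignment under "no drug effect"
    shuffled = random.sample(results, k=len(results))
    desipramine = shuffled[:24]          # first 24 rows are the desipramine group
    counts.append(desipramine.count("no relapse"))

# Simulated p-value: share of random assignments with 14 or more successes
p_sim = sum(c >= observed for c in counts) / n_sims

# Exact hypergeometric tail probability, for comparison with the simulation
p_exact = sum(comb(20, k) * comb(28, 24 - k) for k in range(observed, 21)) / comb(48, 24)

print(f"simulated p-value: {p_sim:.3f}")
print(f"exact p-value:     {p_exact:.4f}")
```

The list `counts` plays the role of the stored column of simulated counts, so it can be plotted as a dotplot or histogram. The exact tail probability is about 0.02, and the simulated value should land near it, varying from run to run if the seed is changed.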
Find the least squares line for the data below. Use it to predict Y when X = 5.
X | 1 | 2 | 3 | 4 | 6 | 8 |
---|---|---|---|---|---|---|
Y | 3 | 4 | 6 | 7 | 14 | 20 |
Critique: Made-up data with no context (not recommended). The problem is purely computational with no possibility of meaningful interpretation.
The data below show the number of customers at each of six tables at a restaurant and the size of the tip left at each table at the end of the meal. Use the data to find a least squares line for predicting the size of the tip from the number of diners at the table. Use your result to predict the size of the tip at a table that has five diners.
Diners | 1 | 2 | 3 | 4 | 6 | 8 |
---|---|---|---|---|---|---|
Tip | $3 | $4 | $6 | $7 | $14 | $20 |
Critique: A context has been added that makes the exercise more appealing and shows students a practical use of statistics.
The data below show the quiz scores (out of 20) and the grades on the midterm exam (out of 100) for a sample of eight students who took this course last semester. Use these data to find a least squares line for predicting the midterm score from the quiz score. Assuming the quiz and midterm are of equal difficulty this semester and the same linear relationship applies this year, what is the predicted grade on the midterm for a student who got a score of 17 on the quiz?
Quiz | 20 | 15 | 13 | 18 | 18 | 20 | 14 | 16 |
---|---|---|---|---|---|---|---|---|
Midterm | 92 | 72 | 72 | 95 | 88 | 98 | 65 | 77 |
Critique: Data are from a real situation that should be of interest to students taking the course.
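For instructors who want to check the arithmetic behind the quiz and midterm version of the exercise, the least squares line can be computed from the usual formulas (slope = Sxy / Sxx, intercept = ȳ minus slope times x̄). The sketch below uses only the Python standard library. The function name `least_squares` is an illustrative choice and is not taken from the report.

```python
quiz = [20, 15, 13, 18, 18, 20, 14, 16]
midterm = [92, 72, 72, 95, 88, 98, 65, 77]

def least_squares(x, y):
    """Return (intercept, slope) of the least squares line for predicting y from x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations in x, and sum of cross-products of deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = least_squares(quiz, midterm)
predicted = intercept + slope * 17   # predicted midterm grade for a quiz score of 17

print(f"midterm-hat = {intercept:.2f} + {slope:.2f} * quiz")
print(f"predicted midterm for quiz = 17: {predicted:.1f}")
```

The fitted line is approximately midterm-hat = 10.05 + 4.32 × quiz, giving a predicted midterm grade of about 83.5 for a quiz score of 17. The same function can be applied to the other two versions of the exercise by swapping in their data.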