Journal of Statistics Education v.2, n.1 (1994)
Copyright (c) 1994 by Joan B. Garfield, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
Key Words: Assessment; Testing; Evaluation; Student learning.
Changes in educational assessment are currently being called for, both within the fields of measurement and evaluation and in disciplines such as statistics. Traditional forms of assessment of statistical knowledge provide a method for assigning numerical scores to determine letter grades but rarely reveal information about how students actually understand and can reason with statistical ideas or apply their knowledge to solving statistical problems. As statistics instruction at the college level begins to change in response to calls for reform (e.g., Cobb 1992), there is an even greater need for appropriate assessment methods and materials to measure students' understanding of probability and statistics and their ability to achieve more relevant goals, such as being able to explore data and to think critically using statistical reasoning. This paper summarizes current trends in educational assessment and relates these to the assessment of student outcomes in a statistics course. A framework is presented for categorizing and developing appropriate assessment instruments and procedures.
1 The term "assessment" is often used in different contexts and means different things to different people. Most statistics faculty think of assessment in terms of testing and grading: scoring quizzes and exams and assigning course grades to students. We typically use assessment as a way to inform students about how well they are doing or how well they did in the courses we teach. An emerging vision of assessment is that of a dynamic process that continuously yields information about student progress toward the achievement of learning goals (NCTM 1993). This vision of assessment acknowledges that when the information gathered is consistent with learning goals and is used appropriately to inform instruction, it can enhance student learning as well as document it (NCTM 1993). Rather than being an activity separate from instruction, assessment is now being viewed as an integral part of teaching and learning, and not just the culmination of instruction (MSEB 1993).
2 Because learning statistics has often been viewed as mastering a set of skills, procedures, and vocabulary, student assessment has focused on whether these have been mastered, by testing students' computational skills or their ability to retrieve information from memory (Hawkins, Jolliffe, and Glickman 1992). Statistics items that appear on traditional tests typically test skills in isolation from a problem context and do not test whether students understand statistical concepts, are able to integrate statistical knowledge to solve a problem, or are able to communicate effectively using the language of statistics. Research has shown that some students who produce a correct "solution" on a test item may not even understand this solution or the underlying question behind it (Jolliffe 1991).
3 As goals for statistics education change to broader and more ambitious objectives, such as developing statistical thinkers who can apply their knowledge to solving real problems, a mismatch is revealed between traditional assessment and the desired student outcomes. It is no longer appropriate to assess student knowledge only by having students compute answers and apply formulas, because these methods do not reflect the current goals of solving real problems and using statistical reasoning.
4 The current reform movement in educational assessment encourages teachers to think about assessment more broadly than "testing" and using test results to assign grades and rank students (e.g., Romberg 1992, Lesh and Lamon 1992). The recent report on assessment, Measuring What Counts (MSEB 1993), offers some basic principles of mathematics assessment. Two of these principles, rephrased to focus on statistics instead of mathematics, are:
5 These principles directly lead to the use of alternative forms of assessment to provide more complete information about what students have learned and are able to do with their knowledge, and to provide more detailed and timely feedback to students about the quality of their learning. Assessment approaches now being used in mathematics better capture how students think, reason, and apply their learning, rather than merely having students "tell" the teacher what they have remembered or show that they can perform calculations or carry out procedures correctly (e.g., EQUALS 1989). Some of these alternative methods -- portfolio assessment, authentic assessment, and performance assessment -- are described below.
6 Before selecting these or other alternatives to traditional testing, it is important to consider criteria for their appropriate use. In reviewing the National Council of Teachers of Mathematics (NCTM) standards for assessment of mathematics learning, Webb and Romberg (1992) provide criteria for assessment instruments and procedures that are relevant to the development or selection of statistical assessment materials as well. These criteria specify that good assessment should:
7 In considering these criteria, a broader view of assessment emerges, beyond that of testing and grading. In this view, assessment becomes an integral part of instruction, consists of multiple methods yielding complementary sources of information about student learning, and provides both the student and instructor with a more complete analysis of what has happened in a particular course.
8 Why should a statistics instructor consider implementing assessment methods other than traditional tests and quizzes in a college statistics course? I feel the most compelling reason is that traditional forms of assessment rarely lead to improved teaching and learning and offer us limited understanding of our students: what attitudes and beliefs they bring to class, how they think about and understand statistics, and how well they are able to apply their knowledge. Without this knowledge it is difficult to determine how to make changes or design instruction to improve student learning.
9 The primary purpose of any student assessment should be to improve student learning (NCTM 1993). Some secondary purposes for gathering assessment information include:
10 Selection of appropriate assessment methods and instruments depends on the purpose of assessment: why the information is being gathered and how it will be used. If the purpose of a particular assessment activity is to determine how well students in the class have learned some important concepts or skills, this may result in a different instrument or approach than if the purpose is to provide quick feedback to students so that they may review material on a particular topic.
11 Regardless of the specific purpose of an assessment procedure, incorporating an assessment program in our classes offers us a way to reflect about what we are doing and to find out what is really happening in our classes. It provides us with a systematic way to gather and evaluate information to use to improve our knowledge, not only of students in a particular course, but our general knowledge of teaching statistics. By using assessment to identify what is not working, as well as what is working, we can help our students become more aware of their own success in learning statistics, as well as become better at assessing their own skills and knowledge.
12 Because assessment is often viewed as driving the curriculum, and students learn to value what they know they will be tested on, we should assess what we value. First we need to determine what students should know and be able to do as a result of taking a statistics course. This information should be translated into clearly articulated goals and objectives (both broad and narrow) in order to determine what types of assessment are appropriate for evaluating attainment of these goals. One way to begin thinking about the main goals for a course is to consider what students will need to know and do to succeed in future courses or jobs. Wiggins (1992) suggests that we think of students as apprentices who are required to produce quality work, and are therefore assessed on their real performance and use of knowledge. Another way to determine important course goals is to decide what ideas you really want students to retain six months after completing your statistics class.
13 I believe that the main goals of an introductory statistics course are to develop an understanding of important concepts such as mean, variability, and correlation. We also want students to understand ideas such as the variability of sample statistics, the usefulness of the normal distribution as a model for data, and the importance of considering how a sample was selected in evaluating inferences based on that sample. We would like our students to be able to intelligently collect, analyze, and interpret data; to use statistical thinking and reasoning; and to communicate effectively using the language of statistics.
14 In addition to concepts, skills, and types of thinking, most instructors have general attitude goals for how we would like students to view statistics as a result of our courses. Such attitude goals include understanding how the discipline of statistics differs from mathematics, realizing that you do not have to be a mathematician to learn and understand statistics, believing that there are often different ways to solve a statistical problem, and recognizing that people may come to different conclusions based on the same data if they have different assumptions and use different methods of analysis (Garfield, in press).
15 Once we have articulated goals for students in our statistics classes, we are better able to specify what to focus on to determine what is really happening to students as they experience our courses. Are they learning to use statistical thinking and reasoning, to collect and analyze data, to write up and communicate the results of solving real statistical problems? Some goals may not be easy to assess individually, and may be more appropriately evaluated in the context of clusters of concepts and skills. For example, in order to evaluate whether students use statistical reasoning in drawing conclusions about a data set, students may need to be given the context of a report of a research study that requires them to evaluate several related pieces of information (e.g., distributions of variables, summary statistics, and inferences based on that data set). Determining if students have achieved the goal of understanding how to best represent a data set with a single number may require that students examine and evaluate several distributions of data.
16 There are several ways to gather assessment information, and it is often recommended that multiple methods be used to provide a richer and more complete representation of student learning (e.g., NCTM 1989). What all types of assessment have in common is that they consist of a situation, task, or questions; a student response; an interpretation (by the teacher or one who reviews the assessment information); an assignment of meaning to the interpretation; and reporting and recording of results (Webb 1993).
17 Different assessment methods to use in a statistics class include:
18 How are these different types of assessment evaluated? Quizzes and essay questions may be graded and assigned a single grade or score. More complex assessments such as projects and written reports may be evaluated using alternative scoring procedures. Although these procedures may be used to assign a grade, they may also be used to help students learn how to improve their performance, either on this task or future ones. Evaluation procedures for projects and reports may consist of:
19 For a list of other guidelines to use in scoring student projects or reports, see Hawkins et al. (1992).
20 For any type of assessment used to assign student grades, it is recommended that the scoring rubrics to be used, some model papers, and exemplars of good performance be shared with students in advance. These samples help provide students with insights into what is expected as good performance, allowing them to acquire standards comparable to the instructor's standards of performance (Wiggins 1992). Other assessment information such as minute papers or attitude surveys need not be given a score or grade, but can be used to inform the teacher about student understanding and feelings, as input for modifying instruction.
21 An assessment framework emerges from the different aspects of assessment: what we want to have happen to students in a statistics course, different methods and purposes for assessment, along with some additional dimensions. (This framework is based on an earlier version developed in collaboration with Iddo Gal.)
22 The first dimension of this framework is WHAT to assess, which may be broken down into: concepts, skills, applications, attitudes, and beliefs.
23 The second dimension of the framework is the PURPOSE of assessment: why the information is being gathered and how the information will be used (e.g., to inform students about strengths and weaknesses of learning, or to inform the teacher about how to modify instruction).
24 The third dimension is WHO will do the assessment: the student, peers (such as members of the student's work group), or the teacher. It is important to point out that engaging the student in self-assessment is a critical and early part of the assessment process, and that no major piece of work should be turned in without self-criticism (Wiggins 1992). Students need to learn how to take a critical look at their own knowledge, skills, and applications of their knowledge and skills. They need to be given opportunities to step back from their work and think about what they did and what they learned (Kenney and Silver 1993). This does not imply that a grade from a self-rating given by a student is to be recorded and used by the teacher in computing a course grade, but rather that students should have opportunities to apply scoring criteria to their own work and to the work of other students so that they may learn how their ratings compare to those of their teacher.
25 The fourth dimension of the framework is the METHOD to be used (e.g., quiz, report, group project, individual project, writing, or portfolio).
26 The fifth dimension is the ACTION that is taken and the nature of the FEEDBACK given to students. This is a crucial component of the assessment process that provides the link between assessment and improved student learning.
27 This framework is not intended to imply that an intersection of categories for each of the five dimensions will yield a meaningful assessment technique. For example, measuring students' understanding of the concept of variability (WHAT to assess) for the PURPOSE of finding out if students understand this concept, using students in the group as assessors (WHO), with the METHOD being a quiz, and the ACTION/FEEDBACK being a numerical score, may not yield particularly meaningful and useful results. It also doesn't make sense to assess student attitudes towards computer labs (WHAT) by having peers (WHO) read and evaluate student essays (METHOD). Obviously, some categories of dimensions are more appropriately linked than others.
28 Another important point in applying this framework is that it is often difficult to assess a single concept in isolation from other concepts and skills. It may not be possible to assess understanding of standard deviation without understanding the concepts of mean, variability, and distribution. When given the task last fall, a group of statistics educators were unable to design an appropriate assessment for understanding the concept of "average" without bringing in several other concepts and skills.
29 Here are four examples of assessment activities illustrating the dimensions of the framework.
30 Example 1:
WHAT: Students' understanding of the Central Limit Theorem.
PURPOSE: To find out if students need to review text material or if the teacher needs to introduce additional activities designed to illustrate the concept (e.g., computer simulations of sampling distribution).
METHOD: An essay question written in class as a quiz, asking students to explain the theorem and illustrate it using an original example.
WHO: The instructor will evaluate the written responses.
ACTIONS/FEEDBACK: The instructor reads the essay responses and assigns a score of 0 (shows no understanding) to 3 (shows very clear understanding). Students with scores of 0 and 1 are assigned additional materials to read or activities to complete. Students with scores of 2 are given feedback on where their responses could be strengthened.
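The mapping from essay scores to follow-up actions in Example 1 could be recorded in a few lines of code. The following is only a minimal sketch in Python: the function name `follow_up_action` and the student labels are hypothetical, and the action for a score of 3 is an assumption, since the example does not specify one.

```python
def follow_up_action(score: int) -> str:
    """Return the follow-up action for an essay score of 0 to 3 (Example 1)."""
    if score not in (0, 1, 2, 3):
        raise ValueError("score must be an integer from 0 to 3")
    if score <= 1:
        # Scores of 0 and 1: additional materials or activities are assigned.
        return "assign additional materials to read or activities to complete"
    if score == 2:
        # Score of 2: feedback on where the response could be strengthened.
        return "give feedback on where the response could be strengthened"
    # Score of 3: not specified in the example; assumed to need no follow-up.
    return "no follow-up specified"


# Hypothetical scores for three students, for illustration only.
scores = {"student_a": 0, "student_b": 2, "student_c": 3}
actions = {name: follow_up_action(s) for name, s in scores.items()}
```

Keeping the mapping in one place makes it easy to see which students need additional activities before the class moves on.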
31 Example 2:
WHAT: Students' ability to apply basic exploratory data analysis skills.
PURPOSE: To determine if students are able to apply their skills to the collection, analysis, and interpretation of data.
METHOD: A student project, where instructions are given as to the sample size, format of report, etc. (e.g., Garfield 1993).
WHO: First the student completes a self-assessment using a copy of the rating sheet the instructor will use, which has been distributed prior to completing the project. Then, the instructor evaluates the project using an analytic scoring method (adapted from the holistic scoring method for evaluating student solutions to mathematical problems offered by Charles, Lester, and O'Daffer (1987)). A score of 0 to 2 points for each of six categories is assigned, where 2 points indicates correct use, 1 point indicates partially correct use, and 0 points indicates incorrect use. Explanations of each category are given below:
ACTIONS/FEEDBACK: Scores are assigned to each category and given back to students along with written comments, early enough in the course so that they may learn from this feedback in working on their next project.
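The analytic scoring in Example 2 (six categories, each rated 0, 1, or 2) can be tallied as in the sketch below, which also compares a student's self-assessment with the instructor's ratings. This is an illustration under stated assumptions, not the rating sheet itself: the category names `category_1` through `category_6` are placeholders, the function names are hypothetical, and summing the six ratings into a total out of 12 is an assumption rather than something the example prescribes.

```python
# Point values and their meanings, as described in Example 2.
LEVELS = {0: "incorrect use", 1: "partially correct use", 2: "correct use"}
N_CATEGORIES = 6


def score_project(ratings: dict) -> int:
    """Validate a rating sheet and return the total (assumed: simple sum)."""
    if len(ratings) != N_CATEGORIES:
        raise ValueError("a rating sheet needs exactly six categories")
    for category, points in ratings.items():
        if points not in LEVELS:
            raise ValueError(f"{category}: points must be 0, 1, or 2")
    return sum(ratings.values())


def rating_differences(self_ratings: dict, instructor_ratings: dict) -> dict:
    """Return instructor minus self rating for each category where they differ."""
    return {
        category: instructor_ratings[category] - self_ratings[category]
        for category in instructor_ratings
        if instructor_ratings[category] != self_ratings[category]
    }


# Placeholder category names and made-up ratings, for illustration only.
instructor = {"category_1": 2, "category_2": 1, "category_3": 2,
              "category_4": 0, "category_5": 1, "category_6": 2}
student = {"category_1": 2, "category_2": 2, "category_3": 2,
           "category_4": 1, "category_5": 1, "category_6": 2}

total = score_project(instructor)                 # 8 out of a possible 12
gaps = rating_differences(student, instructor)    # categories rated differently
```

Listing the categories where the two sets of ratings differ gives the student a concrete starting point for comparing their own standards with the instructor's.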
32 Example 3:
WHAT: Students' perceptions of the use of cooperative group activities in learning statistics.
PURPOSE: To find out how well groups are working and to determine if groups or group procedures need to change.
METHOD: A "minute paper" is assigned during the last five minutes of class, where students are asked to write anonymously about their perceptions of what they like best and like least about their experience with group activities.
WHO: The teacher reads the minute papers.
ACTIONS/FEEDBACK: The teacher summarizes the responses and shares them with the class, and makes changes in groups or group methods as necessary.
33 Example 4:
WHAT: Students' understanding of statistical inference.
PURPOSE: To evaluate students' understanding of statistical inference and give a grade to students for a major portion of course work on this topic.
METHOD: A portfolio. Students are asked to select samples of their work from a three-week unit on inference to put in a portfolio folder. They select examples of written assignments, computer lab write-ups, group activities, and writing assignments, making sure that particular topics are represented (such as constructing and interpreting confidence intervals). Students select samples of their work, write a brief summary describing why they selected each piece, and give their own rating of the overall quality of their work in this unit.
WHO: The teacher reviews the portfolios and completes a rating sheet for each one. A scoring rubric is used, including categories such as the following:
ACTIONS/FEEDBACK: Portfolios are returned to students with completed rating sheets. The students are asked to review areas of weakness or errors made and may submit a follow-up paper demonstrating their understanding of these topics. The teacher may address some common mistakes/weaknesses in class before going on to the next topic. (For more information on portfolios, see Crowley 1993.)
34 Given the calls for reform of statistical education and the new goals envisioned for students, it is crucial that we look carefully at what is happening to students in our classes. Without scrutinizing what is really happening to our students and using that information to make changes, it is unlikely that instruction will improve and we will be able to achieve our new vision of statistical education. I would like to offer some suggestions for instructors contemplating alternative assessment procedures for their classes:
35 Finally, remember that assessment drives instruction, so be careful to assess what you believe is really important for students to learn. Use assessment to confirm, reinforce, and support your ideas of what students should be learning. Never lose track of the main purpose of assessment: to improve learning.
Angelo, T., and Cross, K. (1993), A Handbook of Classroom Assessment Techniques for College Teachers, San Francisco: Jossey-Bass.
Archbald, D., and Newmann, F. (1988), Beyond standardized testing: Assessing authentic academic achievement in the secondary school, Reston, VA: National Association of Secondary School Principals.
Charles, R., Lester, F., and O'Daffer, P. (1987), How to Evaluate Progress in Problem Solving, Reston, VA: National Council of Teachers of Mathematics.
Cobb, G. (1992), "Teaching Statistics," in Heeding the Call for Change: Suggestions for Curricular Action, ed. L. Steen, MAA Notes, No. 22.
Crowley, M. L. (1993), "Student Mathematics Portfolio: More Than a Display Case," The Mathematics Teacher, 87, 544-547.
EQUALS Staff (1989), Assessment Alternatives in Mathematics, Berkeley, CA: Lawrence Hall of Science, University of California.
Garfield, J. (in press), "How Students Learn Statistics," International Statistical Review.
Garfield, J. (1993), "An Authentic Assessment of Students' Statistical Knowledge," in National Council of Teachers of Mathematics 1993 Yearbook: Assessment in the Mathematics Classroom, ed. N. Webb, Reston, VA: NCTM, pp. 187-196.
Garfield, J. (1991), "Evaluating Students' Understanding of Statistics: Development of the Statistical Reasoning Assessment," in Proceedings of the Thirteenth Annual Meeting of the North American Chapter of the International Group for the Psychology of Mathematics Education, Volume 2, ed. R. Underhill, Blacksburg, VA, pp. 1-7.
Hawkins, A., Jolliffe, F., and Glickman, L. (1992), Teaching Statistical Concepts, Harlow, Essex, England: Longman Group UK Limited.
Jolliffe, F. (1991), "Assessment of the Understanding of Statistical Concepts," in Proceedings of the Third International Conference on Teaching Statistics, Vol. 1, ed. D. Vere-Jones, Otago, NZ: Otago University Press, pp. 461-466.
Kenney, P., and Silver, E. (1993), "Student Self-Assessment in Mathematics," in National Council of Teachers of Mathematics 1993 Yearbook: Assessment in the Mathematics Classroom, ed. N. Webb, Reston, VA: NCTM, pp. 229-238.
Kulm, G., ed. (1990), Assessing Higher Order Thinking in Mathematics, Washington, DC: AAAS.
Lesh, R., and Lamon, S. (1992), Assessment of Authentic Performance in School Mathematics, Washington, DC: AAAS.
Mathematical Sciences Education Board (1993), Measuring What Counts: A Conceptual Guide for Mathematical Assessment, Washington, DC: National Academy Press.
National Council of Teachers of Mathematics (1993), Assessment Standards for School Mathematics: Working Draft, Reston, VA: NCTM.
National Council of Teachers of Mathematics (1989), Curriculum and Evaluation Standards for School Mathematics, Reston, VA: NCTM.
Pandey, T. (1991), A Sampler of Mathematics Assessment, Sacramento, CA: California Department of Education.
Romberg, T., ed. (1992), Mathematics Assessment and Evaluation: Imperatives for Mathematics Education, Albany: State University of New York Press.
Stenmark, J. (1991), Mathematics Assessment: Myths, Models, Good Questions, and Practical Suggestions, Reston, VA: NCTM.
Webb, N. (1993), "Assessment for the Mathematics Classroom," in National Council of Teachers of Mathematics 1993 Yearbook: Assessment in the Mathematics Classroom, ed. N. Webb, Reston, VA: NCTM, pp. 1-6.
Webb, N., and Romberg, T. (1992), "Implications of the NCTM Standards for Mathematics Assessment," in Mathematics Assessment and Evaluation: Imperatives for Mathematics Education, ed. T. Romberg, Albany: State University of New York Press, pp. 37-60.
Wiggins, G. (1990), "The Truth May Make You Free, but the Test May Keep You Imprisoned," AAHE Assessment Forum, 17-31.
Earlier versions of this paper were presented at meetings of the American Statistical Association (August, 1993) and the American Educational Research Association (April, 1994).