Dan Nettleton

Journal of Statistics Education v.6, n.2 (1998)

Copyright (c) 1998 by Dan Nettleton, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

Key Words: Bivariate data; Confidence region; Paired comparisons; Scatterplot; Simultaneous confidence intervals.

## Abstract

Scores of 1997 Big Ten Conference men's basketball games involving the University of Iowa Hawkeyes are analyzed with a series of scatterplots accompanied by formal bivariate statistical inference. The analyses reveal that the Hawkeyes' defensive performance is largely unaffected by the site of the game, while offensive performance dips significantly in games played on opposing teams' courts.

# 1. Introduction

1 Most students are familiar with the concept of home court advantage in college basketball. From small junior colleges to large universities, basketball teams tend to boast a greater winning percentage in games played on their home floor than in games hosted by the opposition. Since basketball can be considered in two basic phases, offense and defense, it is natural to ask how sensitive these phases are to game location. Such information may prove useful to players and coaches as they prepare for the upcoming season. We attempt to answer this question for one team based on their performance during one season of conference play.

2 The analyses presented can be conducted on any team's scores as long as they play several teams in a home and away format. This type of data should be readily available at most colleges and universities belonging to a conference. Students are likely to find the data and analyses interesting, especially if they feel they have a hand in creating the home court advantage. I do not know whether the conclusions drawn from the Iowa dataset are typical or exceptional. It is certainly not the case that all schools have constant defensive effectiveness but improved offensive performance at home.

# 2. The Dataset

3 The University of Iowa Hawkeyes' 1997 Big Ten Conference season serves as a fine example of the home court advantage phenomenon. The Hawks were beaten only once in nine games played in Iowa City's Carver Hawkeye Arena. Their record on opposing teams' courts was much less impressive -- only four road wins to go with five road losses. The site, opponent, and score of each of these 18 games are contained in Table 1.

Table 1: The Scores of Iowa's 1997 Big Ten Basketball Games

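| Opponent | Home: points scored by Iowa | Home: points scored by opponent | Away: points scored by Iowa | Away: points scored by opponent |
|---|---|---|---|---|
| Illinois | 82 | 65 | 51 | 66 |
| Indiana | 75 | 67 | - | - |
| Michigan | 80 | 75 | 71 | 79 |
| Michigan State | - | - | 67 | 69 |
| Minnesota | 66 | 68 | 51 | 66 |
| Northwestern | 72 | 55 | 75 | 59 |
| Ohio State | 76 | 62 | 69 | 56 |
| Penn State | 81 | 55 | 69 | 57 |
| Purdue | 84 | 62 | 59 | 56 |
| Wisconsin | 78 | 53 | 48 | 49 |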

4 As most collegiate sports fans know, the Big Ten actually contains eleven teams. To allow sufficient flexibility for scheduling non-conference opponents in 1997, each Big Ten team played a complete two-game home-and-away series with only eight Big Ten opponents. The other two opponents were played only once during the season. Because the Hawks played Indiana and Michigan State each one time in 1997, the scores of the away game against Indiana and the home game with Michigan State are unobserved data in Table 1.

# 3. Classroom Use

5 While the analyses presented in the next section are most appropriate for a standard applied multivariate statistics course, the dataset is suitable for an elementary statistics course as well. By focusing on either offensive or defensive performance alone, the data can be used to illustrate the univariate paired t-test with corresponding confidence interval and/or the sign test. A question like "Does game location (home or away) affect offensive performance?" captures student interest. With some prompting, students are quick to point out the need to control for opponent in the analyses. Hence, pairing the data is perceived as a natural course of action.
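The univariate analyses mentioned above take only a few lines. The following is a minimal sketch in Python (standard library only, Python 3.8 or later for `math.comb`); the dictionary `IOWA_POINTS` and the helpers `paired_t` and `sign_test_p` are hypothetical names, and the eight offensive differences come from the home and away columns of Table 1 with Indiana and Michigan State excluded.

```python
import math
from statistics import mean, stdev

# Points scored by Iowa as (home, away) for the eight opponents played twice,
# copied from Table 1 (Indiana and Michigan State are omitted).
IOWA_POINTS = {
    "Illinois": (82, 51), "Michigan": (80, 71), "Minnesota": (66, 51),
    "Northwestern": (72, 75), "Ohio State": (76, 69), "Penn State": (81, 69),
    "Purdue": (84, 59), "Wisconsin": (78, 48),
}


def paired_t(diffs):
    """Paired t statistic: mean difference over its estimated standard error."""
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def sign_test_p(diffs):
    """Exact two-sided sign-test p-value; zero differences are discarded."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)
    # Probability of a split at least this lopsided under Binomial(n, 1/2).
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


offense = [home - away for home, away in IOWA_POINTS.values()]
print(offense)                      # [31, 9, 15, -3, 7, 12, 25, 30]
print(round(paired_t(offense), 2))  # 3.71, on 7 degrees of freedom
print(sign_test_p(offense))         # 0.0703125 (7 of 8 differences positive)
```

Here the t statistic exceeds 3.499, the two-sided 0.01 critical value of the t-distribution with 7 degrees of freedom, while the sign test gives p ≈ 0.07 because only the Northwestern difference is negative. The contrast is one way to open a discussion of the sign test's lower power with only eight pairs.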

6 In an applied multivariate course, I have asked the following somewhat open-ended question.

> Dr. Tom Davis, University of Iowa men's basketball coach, is interested in knowing about differences in how his team performed at their home gym compared to on the road in Big Ten Conference games. (The phrase "on the road" refers to games played at the opposing school's gymnasium.) Specifically, he would like to know if his team's ability to score points and prevent points from being scored by the other team varies according to whether the game is played at the University of Iowa. The table below contains the final scores of sixteen Big Ten basketball games played by the University of Iowa this season. Note that the sixteen games consist of a home game and a road game with each of eight opponents. Analyze these data for Coach Davis. Include appropriate tests, confidence regions or intervals, graphs, etc. to support your conclusions. Be sure to provide a summary in terms the coach can understand (his Ph.D. is in history).

7 The point of the final sentence is not to disparage historians, basketball coaches, or any person, but rather, to encourage students to communicate their conclusions intelligibly to people who might not speak the technical language of statistics. Note that the question does not specify any particular test, confidence interval, or significance level. While this general phrasing makes grading more difficult, it adds realism by placing the student in the consultant's role. I supply the students with the scores of only sixteen games; the games with Indiana and Michigan State are excluded. The existence of these two scores and their utility are discussed in class.

# 4. Data Analysis

8 The scatterplot in Figure 1 provides a nice pictorial summary of the information contained in Table 1. Each point can be labeled by opponent for a more complete (although more cluttered) view of the data. Figure 1 clearly illustrates two facts about the Hawkeyes' 1997 Big Ten season. First, they had a fairly good season, since most points fall above the reference line. Second, their performance at home was generally superior to their performance on the road, since the points corresponding to home games have a greater tendency to fall above the reference line than the points corresponding to away games.


Figure 1. Iowa's Big Ten Games.

9 To determine the impact of game location on the two aspects of the Hawks' performance, we use the natural bottom-line measures of offensive and defensive effectiveness at our disposal: points scored and points allowed, respectively. These two measures taken on any one game are dependent, since some games are very high-scoring affairs where both points scored and points allowed are likely to be high, while others are defensive battles in which neither team scores many points. In addition, the two measures may depend heavily on the opposing team's skill level and/or style of play. Thus, it is important to consider the bivariate nature of the data and to control for varying opponents in the analyses to follow.

10 Figure 2 and Figure 3 are scatterplots of points scored at home against points scored on the road and points allowed at home versus points allowed on the road, respectively. In both plots, the data are paired according to opponent. Because of the unobserved data, the Indiana and Michigan State scores are excluded from these plots and from the subsequent analyses.


Figure 2. Offensive Performance at Home Versus Away.


Figure 3. Defensive Performance at Home Versus Away.

11 These figures clearly suggest an answer to the question of interest. In Figure 3, most points fall quite near the reference line, suggesting that the points allowed are nearly constant for a given opponent. In contrast, the points of Figure 2 tend to fall above the reference line, indicating greater offensive production for the Hawks when playing at home.

12 The conclusions suggested by the exploratory analysis above can be confirmed formally using multivariate statistical techniques. For each opponent $j = 1, \ldots, 8$, consider the two-dimensional difference vector $\mathbf{d}_j = (d_{j1}, d_{j2})'$, where $d_{j1}$ is points scored by Iowa at home less points scored by Iowa away and $d_{j2}$ is points allowed by Iowa at home less points allowed by Iowa away. These eight vectors can be considered a simple random sample from some distribution with unknown mean $\boldsymbol{\mu} = (\mu_1, \mu_2)'$ and variance-covariance matrix $\boldsymbol{\Sigma}$. A value of $\boldsymbol{\mu}$ in the fourth quadrant ($\mu_1 > 0$, $\mu_2 < 0$) would suggest that both Iowa's offense and defense are more effective in home games.
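A short Python sketch of this construction, assuming Table 1 has been transcribed by hand into a dictionary (the names `SCORES` and `difference_vector` are hypothetical), shows how each opponent's pair of games collapses into one vector.

```python
# Final scores from Table 1 as ((Iowa, opponent) at home, (Iowa, opponent) away)
# for the eight opponents Iowa played both at home and on the road.
SCORES = {
    "Illinois":     ((82, 65), (51, 66)),
    "Michigan":     ((80, 75), (71, 79)),
    "Minnesota":    ((66, 68), (51, 66)),
    "Northwestern": ((72, 55), (75, 59)),
    "Ohio State":   ((76, 62), (69, 56)),
    "Penn State":   ((81, 55), (69, 57)),
    "Purdue":       ((84, 62), (59, 56)),
    "Wisconsin":    ((78, 53), (48, 49)),
}


def difference_vector(home, away):
    """(points scored home - away, points allowed home - away) for one opponent."""
    (scored_home, allowed_home), (scored_away, allowed_away) = home, away
    return (scored_home - scored_away, allowed_home - allowed_away)


diffs = {team: difference_vector(home, away) for team, (home, away) in SCORES.items()}
for team, d in diffs.items():
    print(f"{team:<13} {d}")

# Fourth quadrant: more points scored AND fewer points allowed at home.
both_better = [team for team, (d1, d2) in diffs.items() if d1 > 0 and d2 < 0]
print(both_better)  # ['Illinois', 'Michigan', 'Penn State']
```

Only Illinois, Michigan, and Penn State fall individually in the fourth quadrant; the formal inference concerns the mean vector $\boldsymbol{\mu}$, not individual games.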

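13 Assuming that the underlying distribution is bivariate normal, the techniques outlined in Section 6.2 of Johnson and Wichern (1992) can be used to construct a 95% confidence region for $\boldsymbol{\mu} = (\mu_1, \mu_2)'$ and/or simultaneous 95% confidence intervals for $\mu_1$ and $\mu_2$. The sample mean $\bar{\mathbf{d}}$ and variance-covariance matrix $\mathbf{S}$ of the eight vectors are

$$\bar{\mathbf{d}} = \begin{bmatrix} 15.750 \\ 0.875 \end{bmatrix}, \qquad \mathbf{S} = \begin{bmatrix} 144.214 & 21.821 \\ 21.821 & 17.554 \end{bmatrix}.$$

The point estimate of $\boldsymbol{\mu}$, $\bar{\mathbf{d}} = (15.750, 0.875)'$, suggests that playing at home benefits Iowa's offense by an average of nearly 16 points while, perhaps, reducing defensive effectiveness slightly (by around a single point on average). A 95% confidence region for $\boldsymbol{\mu}$ is given by the set of points (x, y) satisfying

$$8 \begin{bmatrix} 15.750 - x & 0.875 - y \end{bmatrix} \begin{bmatrix} 144.214 & 21.821 \\ 21.821 & 17.554 \end{bmatrix}^{-1} \begin{bmatrix} 15.750 - x \\ 0.875 - y \end{bmatrix} \le \frac{(8 - 1)\,2}{8 - 2}\,(5.14325),$$

where 5.14325 is the 0.95 quantile of an F-distribution with 2 and 6 degrees of freedom.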
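To reproduce the sample mean, the sample covariance matrix, and the region, a minimal standard-library Python sketch follows. It hard-codes the F quantile 5.14325 quoted in the text instead of computing it, writes out the inverse of the 2-by-2 matrix explicitly, and uses hypothetical helper names.

```python
# Eight (offense, defense) difference vectors derived from Table 1, in the order
# Illinois, Michigan, Minnesota, Northwestern, Ohio State, Penn State, Purdue, Wisconsin.
D = [(31, -1), (9, -4), (15, 2), (-3, -4), (7, 6), (12, -2), (25, 6), (30, 4)]
F_2_6 = 5.14325  # 0.95 quantile of F(2, 6), the value quoted in the text


def mean_and_cov(vectors):
    """Sample mean vector and sample covariance matrix (divisor n - 1)."""
    n = len(vectors)
    m1 = sum(v[0] for v in vectors) / n
    m2 = sum(v[1] for v in vectors) / n
    s11 = sum((v[0] - m1) ** 2 for v in vectors) / (n - 1)
    s22 = sum((v[1] - m2) ** 2 for v in vectors) / (n - 1)
    s12 = sum((v[0] - m1) * (v[1] - m2) for v in vectors) / (n - 1)
    return (m1, m2), ((s11, s12), (s12, s22))


def in_confidence_region(x, y, vectors, f_quantile=F_2_6):
    """Check n (dbar - mu)' S^{-1} (dbar - mu) <= (n-1)p/(n-p) F at mu = (x, y)."""
    n, p = len(vectors), 2
    (m1, m2), ((s11, s12), (_, s22)) = mean_and_cov(vectors)
    a, b = m1 - x, m2 - y
    det = s11 * s22 - s12 ** 2
    # Quadratic form written out with the explicit inverse of the 2 x 2 matrix S.
    t2 = n * (s22 * a * a - 2 * s12 * a * b + s11 * b * b) / det
    return t2 <= (n - 1) * p / (n - p) * f_quantile


dbar, S = mean_and_cov(D)
print(dbar)                                       # (15.75, 0.875)
print([[round(s, 3) for s in row] for row in S])  # [[144.214, 21.821], [21.821, 17.554]]
print(in_confidence_region(0, 0, D))              # False
print(in_confidence_region(15, 0, D))             # True
```

The origin, which corresponds to no home effect on offense or defense, lies outside the region, whereas (15, 0), a large offensive effect with no defensive effect, lies inside it.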

14 This region, outlined in Figure 4, is the solid ellipse centered at (15.750, 0.875) with a minor axis of length 9.13 and a major axis of length 29.79; the major axis lies along the line y = 0.167x - 1.755. The position of the confidence ellipse indicates that $\mu_1$ is positive while $\mu_2$ may very well be zero. This confirms the message of Figures 2 and 3: the Hawks' home offensive performance is generally superior to their road performance, while defensive effectiveness remains fairly constant.
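The axis lengths and orientation quoted above follow from the eigenvalues and eigenvectors of $\mathbf{S}$: the half-length of each axis is $\sqrt{\lambda_i}\sqrt{c^2/n}$ with $c^2 = (n-1)p\,F_{p,n-p}(0.95)/(n-p)$, and the major axis points along the eigenvector for the larger eigenvalue. A minimal Python sketch under those formulas (closed-form eigenvalues for a 2-by-2 symmetric matrix; the function name is hypothetical, and the F quantile is the value quoted in the text):

```python
import math

# Eight (offense, defense) difference vectors derived from Table 1.
D = [(31, -1), (9, -4), (15, 2), (-3, -4), (7, 6), (12, -2), (25, 6), (30, 4)]
F_2_6 = 5.14325  # 0.95 quantile of F(2, 6), as quoted in the text


def ellipse_geometry(vectors, f_quantile=F_2_6):
    """Axis lengths and major-axis line of the 95% confidence ellipse."""
    n, p = len(vectors), 2
    m1 = sum(v[0] for v in vectors) / n
    m2 = sum(v[1] for v in vectors) / n
    s11 = sum((v[0] - m1) ** 2 for v in vectors) / (n - 1)
    s22 = sum((v[1] - m2) ** 2 for v in vectors) / (n - 1)
    s12 = sum((v[0] - m1) * (v[1] - m2) for v in vectors) / (n - 1)
    # Closed-form eigenvalues of the symmetric 2 x 2 matrix S.
    half_trace = (s11 + s22) / 2
    radius = math.hypot((s11 - s22) / 2, s12)
    lam_major, lam_minor = half_trace + radius, half_trace - radius
    # Half-axis along eigenvector e_i is sqrt(lambda_i) * sqrt(c^2 / n).
    scale = math.sqrt((n - 1) * p / (n - p) * f_quantile / n)
    major = 2 * math.sqrt(lam_major) * scale
    minor = 2 * math.sqrt(lam_minor) * scale
    # (s12, lam_major - s11) is an eigenvector for lam_major (requires s12 != 0).
    slope = (lam_major - s11) / s12
    intercept = m2 - slope * m1  # the major axis passes through the center
    return major, minor, slope, intercept


major, minor, slope, intercept = ellipse_geometry(D)
print(round(major, 2), round(minor, 2))     # 29.79 9.13
print(round(slope, 3), round(intercept, 3))  # 0.167 -1.762
```

The text's intercept of -1.755 results from rounding the slope to 0.167 before passing the line through the center (15.750, 0.875); with the unrounded slope (about 0.1675) the intercept is about -1.762.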


Figure 4. A 95% Confidence Ellipse.

15 Bonferroni simultaneous confidence intervals tell a similar story. We can be 95% confident that both $\mu_1 \in (3.69, 27.81)$ and $\mu_2 \in (-3.33, 5.08)$. Hence, the true mean offensive benefit of playing at home is somewhere between 4 and 28 points, roughly speaking. According to the data, it is plausible that the impact on the defense is neutral. It is interesting to note that the simultaneous confidence intervals do not rule out the possibility $\mu_1 \le \mu_2$, which would contradict the notion of an Iowa home court advantage: $\mu_1 - \mu_2$ is the mean change in Iowa's scoring margin from road to home, so $\mu_1 \le \mu_2$ would mean that Iowa's margin is no larger at home. However, x > y for all points (x, y) in the 95% confidence region for $\boldsymbol{\mu}$, supporting the impression of home court advantage conveyed by the data.
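The Bonferroni intervals and the claim that x > y throughout the ellipse can both be checked numerically. The sketch below hard-codes 2.841 as an approximate upper 0.0125 point of the t-distribution with 7 degrees of freedom (an assumption read from a t table, not a value given in the text). For the ellipse it uses the fact that the extreme values of a linear combination $\mathbf{a}'\boldsymbol{\mu}$ over the region are $\mathbf{a}'\bar{\mathbf{d}} \pm \sqrt{c^2\,\mathbf{a}'\mathbf{S}\mathbf{a}/n}$, where $\bar{\mathbf{d}}$ is the sample mean vector, here with $\mathbf{a} = (1, -1)'$. Helper names are hypothetical.

```python
import math

# Eight (offense, defense) difference vectors derived from Table 1.
D = [(31, -1), (9, -4), (15, 2), (-3, -4), (7, 6), (12, -2), (25, 6), (30, 4)]
T_7 = 2.841      # approx. upper 0.05/(2*2) = 0.0125 point of t(7), from a t table
F_2_6 = 5.14325  # 0.95 quantile of F(2, 6), as quoted in the text


def mean_var(xs):
    """Sample mean and sample variance (divisor n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def bonferroni(xs, t_crit=T_7):
    """Bonferroni interval for one mean, one of 2 simultaneous 95% intervals."""
    m, v = mean_var(xs)
    half = t_crit * math.sqrt(v / len(xs))
    return m - half, m + half


def t2_interval(xs, p=2, f_quantile=F_2_6):
    """Range of a'mu over the T^2 ellipse; xs holds a'd_j, whose variance is a'Sa."""
    n = len(xs)
    m, v = mean_var(xs)
    half = math.sqrt((n - 1) * p / (n - p) * f_quantile * v / n)
    return m - half, m + half


ci1 = bonferroni([d1 for d1, _ in D])
ci2 = bonferroni([d2 for _, d2 in D])
print([round(v, 2) for v in ci1], [round(v, 2) for v in ci2])
# The Bonferroni rectangle contains a corner with mu1 < mu2 ...
print(ci1[0] <= ci2[1])  # True
# ... but over the ellipse mu1 - mu2 stays positive.
margin_low, _ = t2_interval([d1 - d2 for d1, d2 in D])
print(round(margin_low, 2))  # 1.56
```

The smallest value of $\mu_1 - \mu_2$ on the ellipse is about 1.56 points, the numerical counterpart of the statement that x > y everywhere in the region, while the Bonferroni rectangle reaches corners such as (3.69, 5.08) where $\mu_1 < \mu_2$.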

16 To validate the analysis, the assumption of bivariate normality should be checked. Using the Q-Q plot correlation coefficient test for normality described in Section 4.6 of Johnson and Wichern (1992), the hypothesis of univariate normality cannot be rejected at the 0.10 level of significance for either variable considered marginally. However, with only eight points, all but severe violations of normality are likely to go undetected. In addition, marginal normality does not guarantee the bivariate normality needed for the techniques above. Methods for assessing bivariate normality directly suffer from the same lack of power that limits the univariate tests. Although the basic conclusions of the analysis are not in doubt, the dataset could be used as motivation for multivariate nonparametric techniques in an advanced course.
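For readers who want to reproduce the normality check, the Q-Q plot correlation coefficient is the correlation between the ordered observations and standard normal quantiles. The sketch below assumes the probability levels $(j - 1/2)/n$; the resulting $r_Q$ values would still need to be compared with the critical values tabulated by Johnson and Wichern (1992), which are not reproduced here. It requires Python 3.8 or later for `statistics.NormalDist`, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

# Eight (offense, defense) difference vectors derived from Table 1.
D = [(31, -1), (9, -4), (15, 2), (-3, -4), (7, 6), (12, -2), (25, 6), (30, 4)]


def qq_correlation(xs):
    """Correlation between ordered data and standard normal quantiles.

    Uses probability levels (j - 1/2)/n for j = 1, ..., n.
    """
    n = len(xs)
    ordered = sorted(xs)
    q = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    mx = sum(ordered) / n
    mq = sum(q) / n
    sxq = sum((x - mx) * (z - mq) for x, z in zip(ordered, q))
    sxx = sum((x - mx) ** 2 for x in ordered)
    sqq = sum((z - mq) ** 2 for z in q)
    return sxq / math.sqrt(sxx * sqq)


for label, k in (("offense", 0), ("defense", 1)):
    print(label, round(qq_correlation([v[k] for v in D]), 4))
```

Small values of $r_Q$ are evidence against normality, so each marginal test rejects only when $r_Q$ falls below the tabulated critical value for n = 8.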

# 5. Conclusion

17 Most students have an interest in sports, as participants, spectators, or both. Among collegiate sports, basketball is certainly one of the most popular. Hence, many students are likely to find basketball score data appealing, especially if the data can be used to answer an interesting question about a team with which they are familiar.

# 6. Getting the Data

18 The file hawks.dat.txt contains the raw data. The file hawks.txt is a documentation file containing a brief description of the dataset.

# Appendix - Key to Variables in hawks.dat.txt

| Columns | Contents |
|---|---|
| 1-14 | Iowa's opponent |
| 17 | Site of the game (H and A stand for home (Iowa City) and away) |
| 19-20 | Points scored by Iowa |
| 22-23 | Points scored by Iowa's opponent (points allowed by Iowa) |

Values are aligned and delimited by blanks.
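Because opponent names such as "Ohio State" contain blanks, splitting each record on whitespace would misalign the fields; slicing by the column positions above avoids that. The following is a minimal Python sketch, assuming the file follows the column key exactly; the names `Game`, `parse_line`, and `read_games` are hypothetical, and the sample records are constructed from Table 1 rather than copied from hawks.dat.txt.

```python
from typing import List, NamedTuple


class Game(NamedTuple):
    opponent: str
    site: str          # "H" = home (Iowa City), "A" = away
    iowa_points: int   # points scored by Iowa
    opp_points: int    # points scored by the opponent (allowed by Iowa)


def parse_line(line: str) -> Game:
    """Parse one record; the key's columns are 1-based and inclusive."""
    return Game(
        opponent=line[0:14].strip(),   # columns 1-14
        site=line[16],                 # column 17
        iowa_points=int(line[18:20]),  # columns 19-20
        opp_points=int(line[21:23]),   # columns 22-23
    )


def read_games(path: str = "hawks.dat.txt") -> List[Game]:
    """Read every non-blank record from the data file."""
    with open(path) as f:
        return [parse_line(line) for line in f if line.strip()]


# Hypothetical records laid out by the column key, using scores from Table 1.
sample = [
    f"{'Michigan State':<14}  A 67 69",
    f"{'Illinois':<14}  H 82 65",
]
games = [parse_line(s) for s in sample]
print(games[0])  # Game(opponent='Michigan State', site='A', iowa_points=67, opp_points=69)
```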

# Reference

Johnson, R. A., and Wichern, D. W. (1992), Applied Multivariate Statistical Analysis (3rd ed.), New York: Prentice Hall.

Dan Nettleton
924 Oldfather Hall
Department of Mathematics and Statistics