Robin H. Lock
St. Lawrence University
Journal of Statistics Education v.5, n.3 (1997)
Copyright (c) 1997 by Robin H. Lock, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
Key Words: Football; Predictions; Wagering.
Four datasets (nfl93.dat.txt, nfl94.dat.txt, nfl95.dat.txt, nfl96.dat.txt) contain National Football League game results for recent seasons. In addition to game scores, the data give oddsmakers' pointspreads and over/under values for each game.
1 The datasets described in this article contain information for all regular season and playoff games played by National Football League (NFL) teams during the 1993, 1994, 1995, and 1996 seasons. Data for each game consist of the date, visiting team name, home team name, and the number of points scored by each team. An indicator variable shows which games were tied at the end of regulation time and required an extra overtime period. I also include a pointspread and over/under line which are used for betting purposes and may be viewed as pregame estimates of the margin of victory and total points scored. More information on interpreting the betting lines can be found in the next section.
2 Although the scores alone can be used to address questions of interest to sports fans (e.g., the magnitude of any home field advantage), the additional information on pointspreads and over/under values creates new dimensions for exploring prediction models with immediate applications. I'll discuss some of the basic ideas behind these concepts below and refer the reader to the National Sports Services, Inc., Introduction To Las Vegas Sports Betting, found on the World Wide Web, for more detailed explanations and examples.
3 Before each week's games, the oddsmakers (in the case of our data, The Gold Sheet) attempt to forecast the margin of victory for each game. These predictions are known as the pointspreads. Their intent is to equalize two competing teams by setting an amount to add to (or subtract from) one team's score before determining the "winner against the spread." In these datasets, the pointspread adjustments are made to the home team's score. Thus a pointspread of -6.5 indicates that the home team is favored by 6.5 points. A bettor who chooses the favored home team wins only if the home team wins by 7 or more points; a victory by the visitor, or a home-team win by 6 or fewer points, means the favorite loses against the spread. A positive pointspread indicates that the home team is the "underdog" and needs a few extra points to compete on an even basis with its opponent that week.
4 Here are a couple of examples. On September 5, 1993, the Arizona Cardinals played a game at Philadelphia. The pointspread was -6.5, so Philadelphia was favored by almost a touchdown. The final score was Arizona 17, Philadelphia 23, and thus Arizona was the "winner" against the betting line. Note that in this case (the first in nfl93.dat.txt), the oddsmakers did an excellent job of forecasting the eventual game result.
5 That same day, Minnesota was playing at the LA Raiders with a pointspread of +2.5, indicating that visiting Minnesota was a slight favorite. The game ended with the LA Raiders winning by a score of 24-7. Here we see that the pointspread did a fairly poor job of predicting the winner and the closeness of the game.
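The rule in the two examples above can be written out as a short Python sketch. The function name ats_result and its return labels are my own choices for illustration, not part of the datasets; the only convention taken from the text is that the pointspread is added to the home team's score.

```python
def ats_result(visitor_score, home_score, spread):
    """Return who wins against the spread, plus the adjusted home margin.

    The spread is added to the home team's score, as in the datasets.
    The adjusted margin is also the amount by which the actual home
    margin differed from the margin implied by the spread.
    """
    adjusted = home_score + spread - visitor_score
    if adjusted > 0:
        return "home", adjusted
    if adjusted < 0:
        return "visitor", adjusted
    return "push", adjusted  # adjusted scores are tied


# Arizona 17 at Philadelphia 23, pointspread -6.5
arizona_game = ats_result(17, 23, -6.5)
# Minnesota 7 at LA Raiders 24, pointspread +2.5
minnesota_game = ats_result(7, 24, 2.5)
print(arizona_game, minnesota_game)
```

The adjusted margin is small in magnitude for the Arizona game (the line was close to the result) and large for the Minnesota game, which matches the discussion of how well the pointspread forecast each outcome.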
6 In reality, the oddsmaker's goal in determining a pointspread is not to simply predict the winner and margin of victory. The actual aim is to set the pointspread on any given game to entice equal amounts of wagering both for and against the favored team. When this occurs, the winners can be paid with the losers' money, while the bookmaker's profit comes from a small fee (called "vigorish" -- usually 10%) collected on all bets. Thus to "break even," a longtime player will need to correctly pick the winner about 52.4% of the time. In actual practice, the "line" may fluctuate slightly during the week before a game in order to attract money to both sides of the bet.
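The 52.4% figure can be checked with a small calculation. The sketch below assumes the 10% fee works in the common way, where a bettor risks 11 units to win 10; that reading is my assumption rather than something stated in the text, and the function name is hypothetical.

```python
def break_even_rate(risk, win):
    """Winning fraction p at which expected profit is zero.

    Solves p * win - (1 - p) * risk = 0 for p.
    """
    return risk / (risk + win)


# Assumed convention: risk 11 units to win 10 (a 10% fee on the amount won).
rate = break_even_rate(11, 10)
print(f"{rate:.1%}")
```

Under that assumption the break-even fraction is 11/21, which rounds to the 52.4% quoted above.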
7 A similar philosophy drives the over/under line which is somewhat less common but easier to understand than the pointspread. The over/under value represents a prediction of the total points scored by both teams in a game. Again the bettor may choose to wager on whether the two teams will combine to score more (over) or fewer (under) points than the over/under line. In our examples above, the over/under line in the Arizona/Philadelphia game was 37.5 points, so a wager of "over" would be a winner, while the "unders" won against a 36.0 point line in the Minnesota/LA Raiders game. With either the pointspread or over/under, a tie between the line and actual game result (known as a "push") generally gives everyone their money back.
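The over/under rule, including the push, can be sketched the same way. The function name over_under_result is only an illustrative choice.

```python
def over_under_result(visitor_score, home_score, line):
    """Compare the combined score with the over/under line."""
    total = visitor_score + home_score
    if total > line:
        return "over"
    if total < line:
        return "under"
    return "push"  # total equals the line; bets are generally refunded


# Arizona 17 at Philadelphia 23, over/under line 37.5
print(over_under_result(17, 23, 37.5))
# Minnesota 7 at LA Raiders 24, over/under line 36.0
print(over_under_result(7, 24, 36.0))
```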
8 The NFL scores alone can be used to investigate a variety of questions of interest to sports-minded students. What are typical NFL scores and margins of victory? Is there a home field advantage, and, if so, how can it be quantified? Is there a correlation between the scores of the home team and the visitor?
9 Football fans are well aware that the primary scoring units are a field goal (worth 3 points) and a touchdown with an extra point (worth 7 points). Thus one might expect to see scores like 14, 17, or 24 points more often than 11, 18, or 25. Does this really occur? Similar questions can be raised about the distributions of actual margins of victory and the pointspreads themselves. Do they tend to cluster around 3, 7, and 10 points? Are these trends still apparent in the total points and over/under values?
10 How do the pointspreads relate to the actual game results? Students should find considerably more variation in actual game margins and point totals (when compared to more stable pointspreads and over/under values), but are the distributions centered at similar locations? Note that a departure in the locations of the over/under and actual game total distributions might give the bettor an opportunity to gain an advantage over the oddsmaker.
11 How often does the favored team beat the spread? The oddsmakers would hope for a 50-50 split between favorites and underdogs winning against the spread. Does this occur in practice? Similar questions can be raised about the over/under line and home teams versus visitors.
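One way students might tabulate this is sketched below. The function favorite_cover_rate is hypothetical, and it assumes each game is supplied as a (visitor score, home score, pointspread) triple with the spread applied to the home team as described earlier. Games with a pointspread of zero (no favorite) and pushes are left out of the count, which is a choice of mine rather than a rule from the data.

```python
def favorite_cover_rate(games):
    """Fraction of decided games in which the favorite beat the spread.

    games: iterable of (visitor_score, home_score, spread) triples.
    """
    covers = 0
    decided = 0
    for visitor_score, home_score, spread in games:
        if spread == 0:
            continue  # no favorite in this game
        adjusted = home_score + spread - visitor_score
        if adjusted == 0:
            continue  # push against the spread
        home_is_favorite = spread < 0
        home_beats_spread = adjusted > 0
        decided += 1
        # The favorite covers when "home is favored" and "home beats
        # the spread" are both true or both false.
        if home_is_favorite == home_beats_spread:
            covers += 1
    return covers / decided if decided else float("nan")


# Only the two games discussed in the text; far too few to draw conclusions.
text_games = [(17, 23, -6.5), (7, 24, 2.5)]
print(favorite_cover_rate(text_games))
```

Applied to a full season, the same function gives the proportion to compare with the 50-50 split the oddsmakers hope for.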
12 Can we predict future game scores from past game results? This may require somewhat more sophisticated models and/or consideration of time series techniques. Can students come up with a scheme that can pick the winners of games with reliability as good as (or better than) the pointspreads? Even without sophisticated models, students may propose their own betting schemes (e.g., always take the favorite, unless the pointspread is more than 7 points or the underdog is playing at home on a Monday night). The separate files for each season allow models to be fit and tuned for data from one season and then tested on data from another season.
13 The file nfl.txt is a documentation file containing a brief description of the datasets. The following files contain the raw data: nfl93.dat.txt, nfl94.dat.txt, nfl95.dat.txt, and nfl96.dat.txt.

Columns   Contents
 1 -  9   Date of game
11 - 24   Visiting team name
26 - 27   Visiting team score
30 - 43   Home team name
45 - 46   Home team score
49 - 49   Indicator for overtime games (o or -)
51 - 56   Pointspread (see story for explanation)
61 - 64   Over/Under (see story for explanation)

Values are aligned and delimited by blanks. The teams are the same each year with the following exceptions:
LA Raiders became Oakland in 1995.
LA Rams became St. Louis in 1995.
Carolina and Jacksonville were new teams added in 1995.
Cleveland became Baltimore in 1996.
The last game in each dataset is the Super Bowl, which is played at a neutral site.
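The column positions given above are enough to sketch a reader for one line of a data file. The Game record and parse_game function below are hypothetical names of my own. The sample line is built from the column layout and the first game described in the text, not copied from nfl93.dat.txt, and the date text in it is an assumed format, so the parser keeps the date as a plain string.

```python
from typing import NamedTuple


class Game(NamedTuple):
    date: str
    visitor: str
    visitor_score: int
    home: str
    home_score: int
    overtime: bool
    spread: float
    over_under: float


def parse_game(line: str) -> Game:
    """Parse one fixed-column line (1-based columns 1-64)."""
    line = line.rstrip("\n").ljust(64)
    return Game(
        date=line[0:9].strip(),          # columns 1-9
        visitor=line[10:24].strip(),     # columns 11-24
        visitor_score=int(line[25:27]),  # columns 26-27
        home=line[29:43].strip(),        # columns 30-43
        home_score=int(line[44:46]),     # columns 45-46
        overtime=line[48:49] == "o",     # column 49: o or -
        spread=float(line[50:56]),       # columns 51-56
        over_under=float(line[60:64]),   # columns 61-64
    )


# Constructed sample line (assumed date format), padded to the stated columns.
sample = (
    f"{'09/05/93':<9} {'Arizona':<14} {17:>2}  "
    f"{'Philadelphia':<14} {23:>2}  {'-'} {-6.5:>6}    {37.5:>4}"
)
game = parse_game(sample)
print(game)
```

A real file may contain details this sketch does not handle, such as header lines or missing betting lines, so it is worth inspecting the data before relying on it.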
The Gold Sheet.
National Sports Services, Inc., Introduction To Las Vegas Sports Betting.
Robin H. Lock
St. Lawrence University
Canton, NY 13617