Allan J. Rossman

Dickinson College

Thomas H. Short

Villanova University

Matthew T. Parks

Boston University

Journal of Statistics Education v.6, n.3 (1998)

Copyright (c) 1998 by Allan J. Rossman, Thomas H. Short, and Matthew T. Parks, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

**Key Words**:
Highest posterior density interval; Improper prior distribution.

Classical estimators for the parameter of a uniform distribution on the interval $(0, \theta)$ are often discussed in mathematical statistics courses, but students are frequently left wondering how to distinguish which among the variety of classical estimators are better than the others. We show how classical estimators can be derived as Bayes estimators from a family of improper prior distributions. We believe that linking the estimation criteria in a Bayesian framework is of value to students in a mathematical statistics course, and we believe that the students benefit from the exposure to Bayesian methods. In addition, we compare classical and Bayesian interval estimators for the parameter and illustrate the Bayesian analysis with an example.

1 The continuous uniform distribution is widely studied in mathematical statistics textbooks and courses in part because classical estimation criteria produce different estimators for the parameter. Letting $X_1, \ldots, X_n$ have independent uniform distributions on the interval $(0, \theta)$, the likelihood function is $L(\theta) = 1/\theta^n$ for $\theta \geq \max(x_i)$.

2 The maximum likelihood estimator of $\theta$ is $\max(X_i)$, while the minimum variance unbiased estimator is $\frac{n+1}{n}\max(X_i)$. Furthermore, among estimators of the form $c\,\max(X_i)$, the one which minimizes the mean squared error is $\frac{n+2}{n+1}\max(X_i)$. These results can be found in many textbooks on mathematical statistics, including Freund (1992), Hogg and Craig (1978), and Larsen and Marx (1986).
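To see how the three estimators compare under the criteria just named, the sketch below evaluates bias and mean squared error exactly for any estimator of the form $c\,\max(X_i)$. It relies on the standard moments $E[M] = n\theta/(n+1)$ and $E[M^2] = n\theta^2/(n+2)$ for $M = \max(X_i)$. The function names and the choice $n = 12$ are illustrative assumptions, not part of the original article.

```python
from fractions import Fraction


def bias_ratio(c, n):
    """Bias of c * max(X_i), divided by theta, for sample size n."""
    return Fraction(c) * Fraction(n, n + 1) - 1


def mse_ratio(c, n):
    """Mean squared error of c * max(X_i), divided by theta**2.

    Expands E[(c*M - theta)^2] using E[M] = n*theta/(n+1) and
    E[M^2] = n*theta^2/(n+2), where M = max(X_i).
    """
    c = Fraction(c)
    return c * c * Fraction(n, n + 2) - 2 * c * Fraction(n, n + 1) + 1


n = 12  # illustrative sample size (an assumption for this sketch)
candidates = {
    "maximum likelihood": Fraction(1),
    "minimum variance unbiased": Fraction(n + 1, n),
    "minimum MSE": Fraction(n + 2, n + 1),
}

for name, c in candidates.items():
    print(f"{name:28s} c = {str(c):6s} "
          f"bias/theta = {str(bias_ratio(c, n)):6s} "
          f"MSE/theta^2 = {float(mse_ratio(c, n)):.5f}")
```

Because the ratios are computed with exact fractions, the ordering of the three mean squared errors can be checked without simulation noise.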

3 While we find this example useful for helping students discover that classical estimation criteria can in fact lead to different estimators, we nevertheless feel a sense of unease when students naturally ask which estimator is "better." At this point we are tempted to turn from the competing desirability criteria of the classical approach to the unifying philosophy and analysis strategy of a Bayesian framework. As we will show, this example is ideal in that a Bayesian analysis with a simple family of improper prior distributions provides a direct link among several classical estimators.

4 Moreover, we contend that students of mathematical statistics should
explore principles of Bayesian inference for a variety of reasons. One
is that the development and use of Bayesian methods are on the
increase. A growing number of papers appearing in statistical forums
such as the *Journal of the American Statistical Association*
represent the Bayesian approach, and even some applied statisticians
have adopted a Bayesian viewpoint. *The American Statistician*
recently presented a collection of papers by Berry
(1997), Moore (1997), and
Albert (1997), with accompanying discussion,
exploring the value of a Bayesian perspective in an introductory
statistics course.

5 A second reason for encouraging students to study the Bayesian paradigm is that it models the process of science. Berry (1997) writes that "science progresses with scientists altering their opinions as information accumulates, and with scientists trying to persuade other scientists of the correctness of their opinions." Eliciting opinions, updating after observing data, and quantifying uncertainty using probability distributions are all part of Bayesian statistics.

6 A third motivation for studying Bayesian statistics is that students might better understand classical procedures and estimation criteria by studying them in comparison to Bayesian methods.

7 Few undergraduate texts present a Bayesian analysis of the continuous
uniform distribution, although DeGroot (1986),
Lee (1989), and DeGroot
(1970) present the Pareto distribution as a conjugate family of
prior distributions. One can adopt a simpler form for the prior
distribution by considering improper priors which do not integrate to
one but still perform the same function as a proper prior
distribution. For instance, if one chooses the flat improper prior
distribution of the form
$\pi(\theta) \propto 1$ for $\theta > 0$,
the posterior distribution is proportional to the likelihood
function,
$\pi(\theta \mid x_1, \ldots, x_n) \propto 1/\theta^n$ for
$\theta \geq \max(x_i)$.
This posterior distribution is proper provided that *n* > 1,
with the constant of proportionality turning out to be
$(n-1)[\max(x_i)]^{n-1}$.
Assuming a quadratic loss function, the Bayes estimator
equals the posterior mean

$$\hat{\theta} = \frac{n-1}{n-2}\,\max(x_i),$$

which exists when *n* > 2.

8 In fact, one can derive all estimators of this form from a Bayesian
perspective. Consider the family of prior distributions having the form
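$\pi(\theta) \propto 1/\theta^k$ for $\theta > 0$.
These distributions are improper for any real *k*.
The resulting posterior distribution is
$\pi(\theta \mid x_1, \ldots, x_n) \propto 1/\theta^{n+k}$ for
$\theta \geq \max(x_i)$,
which is proper when *k* + *n* > 1 with the constant
of proportionality equaling
$(n+k-1)[\max(x_i)]^{n+k-1}$.
The posterior mean exists when *k* + *n* > 2, producing a
Bayes estimator of

$$\hat{\theta}_k = \frac{n+k-1}{n+k-2}\,\max(x_i).$$
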
Notice that this estimator corresponds to the minimum variance unbiased estimator when *k* = 2 and to the minimum mean squared error estimator when *k* = 3.
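The posterior mean follows from two elementary integrals. As a worked sketch, write $m = \max(x_i)$ (a shorthand introduced here for brevity) and assume $n + k > 2$ so that both integrals converge:

$$
\int_m^\infty \theta^{-(n+k)}\,d\theta = \frac{m^{-(n+k-1)}}{n+k-1}
\quad\Longrightarrow\quad
\pi(\theta \mid x_1, \ldots, x_n) = (n+k-1)\,m^{n+k-1}\,\theta^{-(n+k)}, \quad \theta \geq m,
$$

$$
E(\theta \mid x_1, \ldots, x_n)
= (n+k-1)\,m^{n+k-1}\int_m^\infty \theta^{-(n+k-1)}\,d\theta
= (n+k-1)\,m^{n+k-1}\cdot\frac{m^{-(n+k-2)}}{n+k-2}
= \frac{n+k-1}{n+k-2}\,m .
$$

Setting $k = 2$ gives the multiplier $(n+1)/n$ and setting $k = 3$ gives $(n+2)/(n+1)$, which are the classical multipliers for the minimum variance unbiased and minimum mean squared error estimators.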

9 Positive values of *k* can be interpreted to represent *k*
unobserved uniform random variables on the interval
$(0, \theta)$.
Larger values of *k* put more prior weight on smaller values of
$\theta$ and therefore produce lower posterior estimates.

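10 One can also compare classical and Bayesian
interval estimators of the parameter
$\theta$.
The classical $100(1-\alpha)\%$
confidence interval for
$\theta$ is
$[\max(X_i),\ \max(X_i)/\alpha^{1/n}]$
since
$P(\alpha^{1/n} \leq \max(X_i)/\theta \leq 1) = 1 - \alpha$.
From the Bayesian perspective, a $100(1-\alpha)\%$
highest posterior density (HPD) interval for
$\theta$,
using the family of improper prior
distributions described above, turns out to be
$[\max(x_i),\ \max(x_i)/\alpha^{1/(n+k-1)}]$
since the posterior density is decreasing in $\theta$ and
$P(\max(x_i) \leq \theta \leq \max(x_i)/\alpha^{1/(n+k-1)} \mid x_1, \ldots, x_n) = 1 - \alpha$.
The classical and Bayesian interval
estimators are therefore the same when *k* = 1.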
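The interval comparison can be sketched numerically. The code below assumes the interval endpoints take the form $\max(x_i)/\alpha^{1/n}$ (classical) and $\max(x_i)/\alpha^{1/(n+k-1)}$ (HPD), and uses the posterior distribution function $1 - (m/t)^{n+k-1}$ for $t \geq m$ to confirm the coverage. The function names and the values `m = 10.0` and `n = 5` are hypothetical, chosen only for illustration.

```python
def classical_upper(m, n, alpha=0.05):
    """Upper endpoint of the classical interval [m, m / alpha**(1/n)]."""
    return m / alpha ** (1.0 / n)


def hpd_upper(m, n, k, alpha=0.05):
    """Upper endpoint of the HPD interval under the prior 1/theta**k."""
    power = n + k - 1
    if power <= 0:
        raise ValueError("posterior is proper only when n + k > 1")
    return m / alpha ** (1.0 / power)


def posterior_cdf(t, m, n, k):
    """P(theta <= t | data) for the posterior proportional to theta**-(n+k)."""
    if t < m:
        return 0.0
    return 1.0 - (m / t) ** (n + k - 1)


m, n = 10.0, 5  # hypothetical sample maximum and sample size
print("classical upper bound:", classical_upper(m, n))
for k in range(0, 4):
    upper = hpd_upper(m, n, k)
    print(f"k = {k}: HPD upper bound = {upper:.4f}, "
          f"posterior probability = {posterior_cdf(upper, m, n, k):.4f}")
```

With *k* = 1 the two upper bounds coincide, and each HPD interval holds posterior probability 0.95 by construction.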

11 The choice of *k* = 1 comes highly recommended from the
Bayesian literature because it corresponds to the Jeffreys' prior,
which is in this case a standard noninformative prior distribution for
a scale parameter. The Jeffreys' prior is noninformative because it is
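invariant to parameter transformations. For example, $\theta$
may be transformed to obtain standard deviation
$\sigma = \theta/\sqrt{12}$
or variance
$\sigma^2 = \theta^2/12$.
The prior
$\pi(\theta) \propto 1/\theta$
is equivalent to priors
$\pi(\sigma) \propto 1/\sigma$
or
$\pi(\sigma^2) \propto 1/\sigma^2$
on the standard deviation or variance, respectively. Furthermore,
$\pi(\theta) \propto 1/\theta$
is noninformative on the ratio scale: for a given constant *c*, it
implies that all intervals of the form
$(x, cx)$
are equally likely for any choice of *x*.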
See, for example, Box and Tiao (1973) for
more information about Jeffreys' priors.
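Both properties claimed for the prior $\pi(\theta) \propto 1/\theta$ can be checked directly. The following is a short worked sketch using the change-of-variables rule for densities, with $\theta = \sqrt{12}\,\sigma$ for the standard deviation $\sigma$ of a uniform distribution on $(0, \theta)$, and assuming $x > 0$ and $c > 1$:

$$
\pi(\sigma) = \pi(\theta)\left|\frac{d\theta}{d\sigma}\right|
\propto \frac{1}{\sqrt{12}\,\sigma}\cdot\sqrt{12} = \frac{1}{\sigma},
\qquad
\pi(\sigma^2) = \pi(\sigma)\left|\frac{d\sigma}{d\sigma^2}\right|
\propto \frac{1}{\sigma}\cdot\frac{1}{2\sigma} \propto \frac{1}{\sigma^2},
$$

$$
\int_x^{cx} \frac{1}{\theta}\,d\theta = \ln(cx) - \ln x = \ln c .
$$

The last integral does not depend on $x$, which is the sense in which every interval of the form $(x, cx)$ receives the same prior weight.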

12 Larger values of *k* in the prior distribution represent increased
prior certainty about the value of the parameter, and thus produce narrower
posterior HPD intervals.

13 As an example suppose that *n* = 12 and that the observed data are:

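Starting with a flat improper prior distribution for $\theta$, corresponding to *k* = 0, the prior and posterior distributions appear in Figure 1. Table 1 and Figure 2 present the Bayes estimates and the upper bounds of the 95% HPD intervals for several values of *k*.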

Figure 1. Prior and Posterior Distributions for *k* = 0.

Table 1. Bayes Estimates for Various Values of *k*

| *k* | Bayes estimate (posterior mean) | Upper bound of 95% HPD interval | Bayesian interpretation | Classical interpretation |
|---|---|---|---|---|
| -2 | 36.23 | 44.92 | | |
| -1 | 35.78 | 43.45 | | |
| 0 | 35.42 | 42.28 | flat prior | |
| 1 | 35.13 | 41.33 | Jeffreys' prior | confidence interval |
| 2 | 34.88 | 40.55 | | unbiased estimate |
| 3 | 34.68 | 39.88 | | minimum MSE estimate |
| 4 | 34.50 | 39.32 | | |

Figure 2. Bayes Estimates and 95% HPD Interval Upper Bounds.
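The entries of Table 1 can be approximately reproduced from the posterior mean multiplier $(n+k-1)/(n+k-2)$ and the HPD upper bound $\max(x_i)/\alpha^{1/(n+k-1)}$. The sketch below is hedged in one important respect: the observed data are not available here, so the sample maximum of 32.2 is an assumption back-solved from the *k* = 0 row of the table (35.42 × 10/11), not a value quoted from the article. The function names are likewise illustrative.

```python
ALPHA = 0.05
N = 12
# ASSUMPTION: the sample maximum is not stated in the surviving text;
# 32.2 is back-solved from the k = 0 Bayes estimate in Table 1.
SAMPLE_MAX = 32.2


def posterior_mean(m, n, k):
    """Bayes estimate under quadratic loss for the prior 1/theta**k."""
    if n + k <= 2:
        raise ValueError("posterior mean exists only when n + k > 2")
    return m * (n + k - 1) / (n + k - 2)


def hpd_upper(m, n, k, alpha=ALPHA):
    """Upper bound of the 100(1 - alpha)% HPD interval [m, upper]."""
    if n + k <= 1:
        raise ValueError("posterior is proper only when n + k > 1")
    return m / alpha ** (1.0 / (n + k - 1))


rows = [(k, posterior_mean(SAMPLE_MAX, N, k), hpd_upper(SAMPLE_MAX, N, k))
        for k in range(-2, 5)]

print("  k  estimate  HPD upper")
for k, estimate, upper in rows:
    print(f"{k:>3d}  {estimate:8.2f}  {upper:9.2f}")
```

Under this assumed maximum the computed values agree with Table 1 to within about 0.01; small last-digit differences would be expected if the true maximum differs slightly from 32.2.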

14 We have demonstrated that a Bayesian framework unites the various classical estimators produced by different estimation criteria for the parameter of a continuous uniform distribution. The Bayes estimators arise from a family of improper prior distributions and highlight both differences and similarities of Bayesian and classical analyses.

15 We believe that this comparison can help students of mathematical statistics
both to gain valuable experience with Bayesian methods and also to understand
classical estimation criteria more fully.

The authors thank Jerry Moreno, Jeff Witmer, three anonymous referees, and the editor for comments that improved the quality of this article.

Albert, J. (1997), "Teaching Bayes' Rule: A Data-Oriented Approach," The American Statistician, 51, 247-253.

Berry, D. A. (1997), "Teaching Elementary Bayesian Statistics with Real Applications in Science," The American Statistician, 51, 241-246.

Box, G. E. P., and Tiao, G. C. (1973), Bayesian Inference in Statistical Analysis, New York: John Wiley and Sons, Inc.

DeGroot, M. H. (1970), Optimal Statistical Decisions, New York: McGraw-Hill, Inc.

----- (1986), Probability and Statistics (2nd ed.), Reading, MA: Addison-Wesley Publishing Company.

Freund, J. E. (1992), Mathematical Statistics (5th ed.), Englewood Cliffs, NJ: Prentice Hall.

Hogg, R. V., and Craig, A. T. (1978), Introduction to Mathematical Statistics (4th ed.), New York: Macmillan Publishing Co., Inc.

Larsen, R. J., and Marx, M. L. (1986), An Introduction to Mathematical Statistics and Its Applications (2nd ed.), Englewood Cliffs, NJ: Prentice Hall.

Lee, P. M. (1989), Bayesian Statistics: An Introduction, New York: Oxford University Press.

Moore, D. S. (1997), "Bayes for Beginners? Some Reasons to Hesitate," The American Statistician, 51, 254-261.

Allan J. Rossman

Department of Mathematics and Computer Science

Dickinson College

Carlisle, PA 17013

Thomas H. Short

Department of Mathematical Sciences

Villanova University

Villanova, PA 19085

Matthew T. Parks

Department of Political Science

Boston University

Boston, MA 02215
