Dexter C. Whittinghill
Rowan University
Robert V. Hogg
University of Iowa
Journal of Statistics Education Volume 9, Number 2 (2001)
Copyright © 2001 by Dexter C. Whittinghill and
Robert V. Hogg, all rights reserved.
This text may be freely shared among individuals, but it
may not be republished in any medium without express
written consent from the authors and advance notification
of the editor.
Key Words: Confidence intervals; Efficiency; Estimation; Maximum likelihood; Sufficiency; Tests of hypotheses.
We explore the varied uses of the uniform distribution on [θ − 1/2, θ + 1/2] as an example in the undergraduate probability and statistics sequence or the mathematical statistics course. Like its cousin, the uniform distribution on [0, θ], this density provides tractable examples, from the topic of order statistics to hypothesis tests. Unlike its cousin, which appears in many probability and statistics books, this uniform is less well known or used. We discuss maximum likelihood estimators, likelihood ratio tests, confidence intervals, joint distributions of order statistics, use of Mathematica®, sufficiency, and other advanced topics. Finally, we suggest a few exercises deriving likelihood ratio tests when the range is unknown as well, that is, for the uniform on [θ − ρ, θ + ρ].
Anyone who has taught undergraduate probability and statistics knows that it is sometimes difficult to find good examples of families of density functions that can be used in the classroom or for challenging homework for the better students. The 'common' families of one-parameter probability functions (such as the geometric or exponential) are almost surely used as an example or in one of the textbook problems. When trying to 'cook up' an example, often the instructor finds that the author of the book has already used the recipe! For illustration, we often think of a uniform density on [0, θ], but it is in nearly every probability and statistics text. When instructors get more creative ('the author will never have thought of this!'), they find that their wonderful mass or density function has a maximum likelihood estimator with a nearly intractable distribution. Exasperation sets in soon thereafter.
Here is a simple distribution that appears in two textbooks by the second author. It is also in Miller and Miller (1999) and Mood, Graybill, and Boes (1974), but not in many other texts. It is the uniform distribution on [θ − 1/2, θ + 1/2], with density
$$f(x;\theta) = I_{[\theta - 1/2,\ \theta + 1/2]}(x), \qquad -\infty < \theta < \infty. \tag{1}$$
Here I_{[a,b]}(x) is an indicator function that equals 1 when x is in the interval [a, b] and 0 otherwise. Other texts sometimes use a uniform distribution on some other interval of fixed length, but this is essentially the same density. Often more interesting problems can be created when the uniform distribution is on [θ − ρ, θ + ρ], with both θ and ρ unknown, and we will discuss these briefly.
The density (1) is a location-parameter alternative to the uniform on [0, θ], and it provides a rich assortment of material for discussions or examples. There are many unbiased and consistent estimators of θ to compare, including the familiar X̄ as the method of moments estimator. The rather simple likelihood function yields an interesting uncountable set of maximum likelihood estimators. The density also provides fairly simple distributions for the order statistics. Using a modified likelihood ratio test, sensible and intuitive tests of hypotheses and confidence intervals can easily be derived. Finally, for more advanced students, the sufficiency results are not straightforward (the minimum and maximum are jointly sufficient, but no single statistic is), and completeness and minimal sufficiency can also be discussed.
When teaching probability and statistics, often the greatest need for help is in finding examples when the course reaches 'parametric point estimation.' That is, we want to estimate the value of the unknown parameter or parameters in a family of distributions, and to make statistical inferences we need to know the distributions of the estimators. Before presenting the flexibility of the uniform distribution on [θ − 1/2, θ + 1/2], we begin with a discussion of how such a model might arise.
Consider the situation where you are waiting to catch a bus. You want to estimate θ, the average number of minutes that the bus takes to travel from the previous stop to your stop (you can see that other stop up the road). Each day you can take an observation on how long the bus takes and use the observations to estimate θ. Let us assume that under 'usual conditions' the variation induced by the traffic causes the bus to take between θ minus and θ plus a fixed amount of time, say one-half minute for convenience, to arrive at your stop. Moreover, it is reasonable to assume that the distribution is uniform. If we let the random variable X be the number of minutes for the trip, then X has a probability density function (density) given by (1).
We note two things. First, the interval can be open or closed; we use a closed interval to avoid a discussion of supremum versus maximum. Second, it is an easy homework problem for students to show that the mean of the distribution is θ, the variance is 1/12, and the cumulative distribution function is
$$F(x;\theta) = \begin{cases} 0, & x < \theta - 1/2, \\ x - \theta + 1/2, & \theta - 1/2 \le x < \theta + 1/2, \\ 1, & x \ge \theta + 1/2. \end{cases}$$
The joint density for a random sample of n observations is the deceptively simple
$$f(x_1,\dots,x_n;\theta) = \prod_{i=1}^{n} I_{[\theta - 1/2,\ \theta + 1/2]}(x_i) = I_{[\theta - 1/2,\ \infty)}(y_1)\, I_{(-\infty,\ \theta + 1/2]}(y_n),$$
where y_{1} ≤ y_{2} ≤ ⋯ ≤ y_{n} are the order statistics. If an instructor does not stress the derivation of distributions of order statistics (we recommend covering it), the distributions of the minimum Y_{1} and maximum Y_{n}, along with their joint distribution, can be given to the students. The students can also simply work with data. (See Hogg and Tanis 1997, problems 6.1.15 and 10.1.11.)
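The homework problem above also has a quick numerical check. The following minimal Python sketch is our own illustration, and the helper names `uniform_cdf` and `moments` are hypothetical. It codes the piecewise CDF of (1) and uses a midpoint rule over [θ − 1/2, θ + 1/2], where the density equals 1, to approximate the mean θ and the variance 1/12.

```python
def uniform_cdf(x: float, theta: float) -> float:
    """CDF of the uniform distribution on [theta - 1/2, theta + 1/2]."""
    lower = theta - 0.5
    if x < lower:
        return 0.0
    if x >= theta + 0.5:
        return 1.0
    return x - lower  # the density is 1 on the interval, so F rises linearly


def moments(theta: float, steps: int = 100_000) -> tuple[float, float]:
    """Midpoint-rule approximations of E(X) and Var(X) under density (1)."""
    h = 1.0 / steps  # the support has length 1
    xs = [theta - 0.5 + (i + 0.5) * h for i in range(steps)]
    mean = sum(xs) * h  # integral of x * 1 dx over the support
    var = sum((x - mean) ** 2 for x in xs) * h
    return mean, var


if __name__ == "__main__":
    print(uniform_cdf(3.25, theta=3.0))  # 0.75
    print(moments(3.0))  # close to (3.0, 1/12)
```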
In general, we present our students with different ways of finding estimators in probability and statistics. Because θ is the mean of the distribution of X, our students usually suggest X̄ as an estimator (which we will call W_{1}). This intuitive estimator is the method of moments estimator of θ and leads nicely into a discussion of that topic. X̄ is also the least-squares estimator of θ. (See Hogg and Tanis 1997, problem 6.1.15.)
If we are lucky, our students will suggest other intuitive estimators, such as the median (which we will call W_{3}) or the midrange, which is the average of the minimum and maximum order statistics, say
$$W_2 = \frac{Y_1 + Y_n}{2}.$$
The likelihood function for a random sample is the joint density of the sample, but we consider it as a function of θ for a given sample, not as a function of the sample for a given θ. We seek maximum likelihood estimators (MLEs) because they can possess useful properties. For instance, if θ̂ is an MLE for θ, and h is a function with a single-valued inverse, then h(θ̂) is an MLE of h(θ). (See Casella and Berger 1990, Theorem 7.2.1, for a more advanced invariance property, where h is any function, not necessarily with a single-valued inverse.) Also, under certain regularity conditions, the sequence of MLEs (one for each n) is best asymptotically normal, or BAN. Textbooks are full of examples for which these results apply, but they sometimes lack examples that fail to satisfy the regularity conditions. One of the 'charms' of our example is that it does not meet the regularity conditions needed for many results, because the support of the density depends on θ.
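The likelihood function of a random sample from a distribution with density (1) is
$$L(\theta) = \begin{cases} 1, & y_n - 1/2 \le \theta \le y_1 + 1/2, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$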
Unlike many other situations, here the MLE of θ is not unique. A careful look at the likelihood function shows that any statistic u(X_{1}, ..., X_{n}) satisfying
$$Y_n - 1/2 \le u(X_1,\dots,X_n) \le Y_1 + 1/2$$
is an MLE of θ.
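This includes the midrange, W_{2}. By way of examples, the student should demonstrate that X̄ and W_{3} (the median) are not maximum likelihood estimators of θ, even though they are unbiased: for some samples they fall outside the interval [Y_{n} − 1/2, Y_{1} + 1/2]. Of course, every estimator of the form
$$\alpha\,(Y_n - 1/2) + (1 - \alpha)(Y_1 + 1/2), \qquad 0 \le \alpha \le 1,$$
is an MLE of θ. Its expectation is θ + (1 − 2α)/(n + 1), so it is unbiased only when α = 1/2, which gives the midrange.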
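The demonstration the students are asked for can be automated. The following minimal Python sketch is our own illustration, and the sample values and helper names are hypothetical. It computes the interval of maximizers of the likelihood (2) and checks which candidate estimates lie inside it. For this particular sample, the midrange is an MLE but the sample mean and median are not.

```python
from statistics import mean, median


def mle_interval(sample):
    """Endpoints [Y_n - 1/2, Y_1 + 1/2] of the set where the likelihood equals 1."""
    return max(sample) - 0.5, min(sample) + 0.5


def likelihood(theta, sample):
    """Likelihood of theta for a sample from the uniform on [theta - 1/2, theta + 1/2]."""
    lo, hi = mle_interval(sample)
    return 1.0 if lo <= theta <= hi else 0.0


def weighted_mle(sample, a):
    """a*(Y_n - 1/2) + (1 - a)*(Y_1 + 1/2); an MLE for any 0 <= a <= 1."""
    lo, hi = mle_interval(sample)
    return a * lo + (1 - a) * hi


sample = [0.0, 0.05, 0.1, 0.9]  # hypothetical data; any theta in [0.4, 0.5] fits
for name, estimate in [("mean", mean(sample)),
                       ("median", median(sample)),
                       ("midrange", (min(sample) + max(sample)) / 2)]:
    # mean and median get likelihood 0 here; the midrange gets 1
    print(name, estimate, likelihood(estimate, sample))
```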
Which of our three unbiased estimators, W_{1}, W_{2}, or W_{3}, is best? Are there better estimators? If class discussion has not already led to the notion that we are searching for unbiased estimators with small variance, then the idea of minimum variance unbiased estimators or efficiency can be raised here. When comparing W_{1}, W_{2}, and W_{3}, we must calculate the mean and variance of each of the three estimators. The case of W_{1} = X̄ is straightforward, as it is a consequence of standard formulas like E(X̄) = μ and Var(X̄) = σ²/n. Here, E(W_{1}) = θ and Var(W_{1}) = 1/(12n).
Each of the midrange (W_{2}) and the median (W_{3}) requires knowledge of distributions of order statistics. Depending on the level of the students or the book, the calculations for the cases involving order statistics can be assigned or not (with the results being given to the students) in various combinations. For the better students these are challenging homework problems, and we note some of the difficulties.
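Because Z_{i} = Y_{i} − θ + 1/2, i = 1, 2, ..., n, are the order statistics of a random sample of size n from the uniform distribution on [0, 1], the densities of Z_{1} and Z_{n} are
$$g_1(z) = n[1 - F(z)]^{n-1} f(z) = n(1 - z)^{n-1}, \qquad g_n(z) = n[F(z)]^{n-1} f(z) = n z^{n-1}, \qquad 0 \le z \le 1,$$
where F and f are the respective distribution and density function of the uniform distribution on [0, 1]. Using these densities, it is easy to show that E(Z_{1}) = 1/(n + 1) and E(Z_{n}) = n/(n + 1), so E(Y_{1}) = θ − 1/2 + 1/(n + 1) and E(Y_{n}) = θ − 1/2 + n/(n + 1). Thus
$$E(W_2) = \frac{E(Y_1) + E(Y_n)}{2} = \theta - \frac{1}{2} + \frac{1}{2}\left(\frac{1}{n+1} + \frac{n}{n+1}\right) = \theta,$$
making W_{2} unbiased. It is also an easy exercise to find the variances, namely
$$\operatorname{Var}(Y_1) = \operatorname{Var}(Z_1) = \frac{n}{(n+1)^2(n+2)} = \operatorname{Var}(Z_n) = \operatorname{Var}(Y_n).$$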
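To find Var(W_{2}), we also need Cov(Y_{1}, Y_{n}) = Cov(Z_{1}, Z_{n}). The joint density of the minimum and maximum of a random sample from a continuous distribution with distribution function F and density f is
$$g_{1n}(u, v) = n(n-1)[F(v) - F(u)]^{n-2} f(u) f(v), \qquad u \le v. \tag{4}$$
Hence the joint density for Z_{1} and Z_{n} is
$$g(z_1, z_n) = n(n-1)(z_n - z_1)^{n-2}, \qquad 0 \le z_1 \le z_n \le 1,$$
and
$$E(Z_1 Z_n) = \int_0^1 \int_0^{z_n} z_1 z_n\, n(n-1)(z_n - z_1)^{n-2}\, dz_1\, dz_n = \frac{1}{n+2}$$
after some straightforward integration. Thus
$$\operatorname{Cov}(Z_1, Z_n) = \frac{1}{n+2} - \frac{1}{n+1}\cdot\frac{n}{n+1} = \frac{1}{(n+1)^2(n+2)}.$$
Finally, the variance of W_{2} is
$$\operatorname{Var}(W_2) = \frac{1}{4}\left[\operatorname{Var}(Z_1) + \operatorname{Var}(Z_n) + 2\operatorname{Cov}(Z_1, Z_n)\right] = \frac{2n + 2}{4(n+1)^2(n+2)} = \frac{1}{2(n+1)(n+2)}.$$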
Remarks. The transformation employed the function Z_{i} = Y_{i} − θ + 1/2, which is a pivotal quantity: its distribution does not depend on θ, whatever the value of θ. Pivotal quantities, generally discussed in the more rigorous undergraduate texts, are also very helpful in constructing confidence intervals. (See Casella and Berger 1990 or Mood, Graybill, and Boes 1974 for further discussion.) Another way of finding Var(W_{2}) is to first find the distribution of W_{2} using the 'distribution function technique' (see page 222 of Hogg and Tanis 1997).
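For students checking this algebra, exact rational arithmetic is a convenient stand-in for a CAS. The following minimal Python sketch is our own, and the function names are hypothetical. It assembles Var(W_{2}) from Var(Z_{1}), Var(Z_{n}), and Cov(Z_{1}, Z_{n}), using E(Z_{1}Z_{n}) = 1/(n + 2). It then confirms that the result equals 1/[2(n + 1)(n + 2)] over a range of n.

```python
from fractions import Fraction


def var_extreme(n: int) -> Fraction:
    """Var(Z_1) = Var(Z_n) = n / [(n+1)^2 (n+2)] (beta(1, n) and beta(n, 1))."""
    return Fraction(n, (n + 1) ** 2 * (n + 2))


def cov_extremes(n: int) -> Fraction:
    """Cov(Z_1, Z_n) = E(Z_1 Z_n) - E(Z_1)E(Z_n), with E(Z_1 Z_n) = 1/(n+2)."""
    return Fraction(1, n + 2) - Fraction(1, n + 1) * Fraction(n, n + 1)


def var_midrange(n: int) -> Fraction:
    """Var(W_2) = [Var(Z_1) + Var(Z_n) + 2 Cov(Z_1, Z_n)] / 4."""
    return (2 * var_extreme(n) + 2 * cov_extremes(n)) / 4


for n in range(2, 50):
    assert var_midrange(n) == Fraction(1, 2 * (n + 1) * (n + 2))
print(var_midrange(5))  # 1/84
```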
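Again in this discussion, we let Z_{i} = Y_{i} − θ + 1/2. For n = 2m + 1, the median is W_{3} = Y_{m+1}, and the density of Z_{m+1} is
$$g_{m+1}(z) = \frac{(2m+1)!}{m!\,m!}\, z^{m}(1 - z)^{m}, \qquad 0 \le z \le 1,$$
which is that of a beta distribution with α = β = m + 1. Hence E(Z_{m+1}) = 1/2, so E(W_{3}) = θ. Moreover,
$$\operatorname{Var}(W_3) = \operatorname{Var}(Z_{m+1}) = \frac{(m+1)^2}{(2m+2)^2(2m+3)} = \frac{1}{4(2m+3)} = \frac{1}{4(n+2)}.$$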
If the students have not studied the beta distribution, a computer algebra system (CAS) such as Mathematica® or Maple® can be used to establish the results.
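For n = 2m, the median is W_{3} = (Y_{m} + Y_{m+1})/2. Moreover, the marginal distributions of Z_{m} and Z_{m+1} are, respectively,
$$g_m(z) = \frac{(2m)!}{(m-1)!\,m!}\, z^{m-1}(1-z)^{m}, \qquad 0 \le z \le 1,$$
and
$$g_{m+1}(z) = \frac{(2m)!}{m!\,(m-1)!}\, z^{m}(1-z)^{m-1}, \qquad 0 \le z \le 1.$$
These are beta distributions with α = m, β = m + 1, and α = m + 1, β = m, respectively. Hence
$$E(Z_m) = \frac{m}{2m+1}, \qquad E(Z_{m+1}) = \frac{m+1}{2m+1},$$
and
$$E(W_3) = \theta - \frac{1}{2} + \frac{1}{2}\left(\frac{m}{2m+1} + \frac{m+1}{2m+1}\right) = \theta.$$
Also,
$$\operatorname{Var}(Z_m) = \operatorname{Var}(Z_{m+1}) = \frac{m(m+1)}{(2m+1)^2(2m+2)} = \frac{m}{2(2m+1)^2}.$$
As before, to find Var(W_{3}) we must evaluate
$$E(Z_m Z_{m+1}) = \int_0^1 \int_0^{z_{m+1}} z_m z_{m+1}\, \frac{(2m)!}{(m-1)!\,(m-1)!}\, z_m^{m-1}(1 - z_{m+1})^{m-1}\, dz_m\, dz_{m+1}.$$
The integration is straightforward, or a CAS can be used. Thus
$$E(Z_m Z_{m+1}) = \frac{m(m+2)}{2(m+1)(2m+1)} \quad\text{and}\quad \operatorname{Cov}(Z_m, Z_{m+1}) = \frac{m^2}{2(m+1)(2m+1)^2}.$$
Finally, algebra yields
$$\operatorname{Var}(W_3) = \frac{1}{4}\left[\operatorname{Var}(Z_m) + \operatorname{Var}(Z_{m+1}) + 2\operatorname{Cov}(Z_m, Z_{m+1})\right] = \frac{m}{4(m+1)(2m+1)} = \frac{n}{4(n+1)(n+2)}.$$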
In most cases you would not make the students do all of the work indicated above, and again Mathematica® or Maple® could be used. However, after they have suggested the various estimators and found the mean and variance of at least two (including X̄), an instructor could present all of the variances and have the students compare them by calculating the relative efficiency of one estimator to another. Having the students derive or check some of the efficiency results requires them to work with inequalities, something they may not have done for quite some time in their undergraduate careers.
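The median results can be checked in the same way. In the following minimal Python sketch (our own; the names are hypothetical), the beta moments are exact fractions. For even n, the value of E(Z_{m}Z_{m+1}) is our own hand evaluation of the required double integral. The sketch therefore also includes a crude midpoint-rule approximation of that integral as an independent check.

```python
import math
from fractions import Fraction


def beta_mean(a: int, b: int) -> Fraction:
    return Fraction(a, a + b)


def beta_var(a: int, b: int) -> Fraction:
    return Fraction(a * b, (a + b) ** 2 * (a + b + 1))


def e_prod_exact(m: int) -> Fraction:
    """E(Z_m Z_{m+1}) for n = 2m, evaluated by hand."""
    return Fraction(m * (m + 2), 2 * (m + 1) * (2 * m + 1))


def e_prod_numeric(m: int, steps: int = 300) -> float:
    """Midpoint-rule approximation of the same double integral over 0 <= u <= v <= 1."""
    h = 1.0 / steps
    const = math.factorial(2 * m) / math.factorial(m - 1) ** 2
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        for j in range(i, steps):
            v = (j + 0.5) * h
            weight = 0.5 if i == j else 1.0  # diagonal cells are half inside the region
            total += weight * u * v * const * u ** (m - 1) * (1 - v) ** (m - 1)
    return total * h * h


def var_median(n: int) -> Fraction:
    """Var(W_3) for a sample of size n from density (1)."""
    if n % 2 == 1:  # n = 2m + 1: Z_{m+1} is beta(m + 1, m + 1)
        m = (n - 1) // 2
        return beta_var(m + 1, m + 1)
    m = n // 2  # n = 2m: W_3 - theta + 1/2 = (Z_m + Z_{m+1}) / 2
    cov = e_prod_exact(m) - beta_mean(m, m + 1) * beta_mean(m + 1, m)
    return (beta_var(m, m + 1) + beta_var(m + 1, m) + 2 * cov) / 4


for n in range(1, 40):
    expected = (Fraction(n, 4 * (n + 1) * (n + 2)) if n % 2 == 0
                else Fraction(1, 4 * (n + 2)))
    assert var_median(n) == expected
print(float(e_prod_exact(3)), e_prod_numeric(3))  # the two should nearly agree
```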
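Table 1 gives the variances and shows the relative efficiencies of the estimators X̄ (W_{1}), the midrange (W_{2}), and the median (W_{3}). It contains some interesting results. Of course, for n = 1 and n = 2 the three estimators coincide, so their variances agree. For n ≥ 3, Var(W_{2}) < Var(W_{1}) < Var(W_{3}).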
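Table 1. Variances and Relative Efficiencies of the Mean, Median, and Midrange
Estimator: | W_{1}, mean | W_{2}, midrange | W_{3}, median |
Variance: | 1/(12n) | 1/[2(n + 1)(n + 2)] | n even: n/[4(n + 1)(n + 2)]; n odd: 1/[4(n + 2)] |
Efficiency of W_{1} to above: | | 6n/[(n + 1)(n + 2)] | n even: 3n²/[(n + 1)(n + 2)]; n odd: 3n/(n + 2) |
Efficiency of W_{2} to above: | | | n even: n/2; n odd: (n + 1)/2 |
Efficiency of W_{i} to the estimator above is the ratio Var(estimator above)/Var(W_{i}).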
NOTE: See Hogg and Tanis (1997), problem 10.1.11, for a
comparison of the mean, median, and midrange for
Remark. If the median
is to be used for an odd sample size, don't throw away an
observation! Statisticians do not recommend throwing away
or ignoring any data. Instead, for a sample of size
Finally, when discussing the Cramér-Rao Lower Bound, this example can be used as one where the conditions of the theorem are not met: the support of the distribution depends on the parameter being estimated. The students can also show that all three estimators are consistent: each is unbiased, and its variance converges to zero as n increases. In that regard, it is interesting to note that the variances of the mean and median are of order 1/n, while that of the midrange is of order 1/n^{2}. That is, for large n, the variance of the midrange is much smaller.
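A small simulation makes the comparison in Table 1 concrete. The following minimal Python sketch is our own, and the seed, θ, and replication count are arbitrary choices. It draws repeated samples from density (1) with the standard `random` module and computes W_{1}, W_{2}, and W_{3}. It then compares their empirical variances with the closed forms 1/(12n), 1/[2(n + 1)(n + 2)], and, for W_{3}, n/[4(n + 1)(n + 2)] (n even) or 1/[4(n + 2)] (n odd). Increasing n shows the midrange's 1/n² advantage.

```python
import random
from statistics import median


def exact_variances(n: int) -> tuple[float, float, float]:
    """Variances of the mean, midrange, and median under density (1)."""
    w1 = 1 / (12 * n)
    w2 = 1 / (2 * (n + 1) * (n + 2))
    w3 = n / (4 * (n + 1) * (n + 2)) if n % 2 == 0 else 1 / (4 * (n + 2))
    return w1, w2, w3


def empirical_variances(n: int, theta: float = 10.0, reps: int = 20_000,
                        seed: int = 2001) -> tuple[float, ...]:
    """Monte Carlo variances of W_1 (mean), W_2 (midrange), W_3 (median)."""
    rng = random.Random(seed)
    draws = ([], [], [])
    for _ in range(reps):
        xs = [rng.uniform(theta - 0.5, theta + 0.5) for _ in range(n)]
        draws[0].append(sum(xs) / n)
        draws[1].append((min(xs) + max(xs)) / 2)
        draws[2].append(median(xs))

    def var(values):
        center = sum(values) / len(values)
        return sum((v - center) ** 2 for v in values) / len(values)

    return tuple(var(d) for d in draws)


for n in (5, 20):
    exact = exact_variances(n)
    approx = empirical_variances(n)
    print(n, [f"{e:.5f} vs {a:.5f}" for e, a in zip(exact, approx)])
```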
The topic of sufficiency is very important for finding good estimators, but it can be very difficult for students to grasp. It is important because sufficient statistics are associated with good estimators. If a unique MLE exists, it is a function of any sufficient statistic (see Rice 1995, p. 284, Corollary A). Conditioning an unbiased estimator on a sufficient statistic produces an unbiased estimator, a function of that statistic, whose variance is no larger (the Rao-Blackwell theorem). Unbiased estimators that are functions of complete sufficient statistics are uniformly minimum variance unbiased estimators. The concept that a sufficient statistic contains all of the information from the sample necessary for estimating the parameter sounds simple and intuitive. However, the definition of a sufficient statistic is very technical and not always written intuitively. Many authors use
Let us assume that sufficiency has been defined in the course, and that a one-statistic, one-parameter factorization theorem like Theorem 1 (p. 318) of Hogg and Craig (1995) has been presented. The random variable of (1) provides a good second or third example for discussing sufficiency. The likelihood function in (2) can be written as
$$L(\theta) = I_{[\theta - 1/2,\ \infty)}(y_1)\, I_{(-\infty,\ \theta + 1/2]}(y_n). \tag{5}$$
From this expression, some students will say that you can't rewrite (5) as a factorization involving a single statistic u_{1}, and hence there is no single sufficient statistic u_{1}. Some may in fact come up with the idea that Y_{1} and Y_{n} are jointly sufficient statistics. Example 1, page 348 of Hogg and Craig (1995), shows that Y_{1} and Y_{n} are jointly sufficient for θ. They also point out that these are minimally sufficient and state that there can't be one sufficient statistic. So the density (1) shows the student that not all examples work out nicely. The fact that Y_{1} and Y_{n} are jointly sufficient statistics for θ helps explain why the variance of the midrange is so much less than those of W_{1} and W_{3}.
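Sufficiency of the pair (Y_{1}, Y_{n}) can be seen directly, because the likelihood (5) depends on the data only through the minimum and maximum. The following minimal Python sketch is our own illustration with hypothetical samples, and it demonstrates the factorization rather than proving it. It evaluates the likelihood on a grid of θ values for three samples of size 4. Two of them share the same minimum and maximum; the third shares only the minimum.

```python
def likelihood(theta: float, sample: list[float]) -> float:
    """L(theta) = I(theta - 1/2 <= y_1) * I(y_n <= theta + 1/2)."""
    y1, yn = min(sample), max(sample)
    return float(theta - 0.5 <= y1) * float(yn <= theta + 0.5)


a = [2.1, 2.3, 2.8, 2.9]  # hypothetical samples with the same Y_1 and Y_n
b = [2.9, 2.4, 2.6, 2.1]
c = [2.1, 2.2, 2.3, 2.4]  # same Y_1 as a, different Y_n

grid = [1.5 + k / 100 for k in range(200)]
same_ab = all(likelihood(t, a) == likelihood(t, b) for t in grid)
same_ac = all(likelihood(t, a) == likelihood(t, c) for t in grid)
print(same_ab, same_ac)  # True False
```

Knowing only Y_{1} (as for samples a and c) does not pin down the likelihood, which is the intuition behind needing both order statistics.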
Remark. First, Mood, Graybill, and Boes (1974) give
an alternate definition of sufficiency (Definition 16,
The density (1) is also useful for illustrating other topics. Mood, Graybill, and Boes (1974) show that θ is a location parameter (example 37, p. 333), and then that W_{2} is the uniformly smallest mean-squared error estimator in the class of location invariant estimators (example 39, p. 335). Hogg and Craig (1995) essentially show that Y_{1} and Y_{n} are not complete (example 2, p. 349), and discuss the idea of ancillary statistics (example 5, p. 357). We have noted that the MLE is not unique, as any weighted mean of Y_{n} − 1/2 and Y_{1} + 1/2 will serve as an MLE. However, none of these is sufficient by itself, so no MLE of θ is a minimal sufficient statistic. Here the minimal sufficient statistic is the pair (Y_{1}, Y_{n}).
After discussing the properties of good estimators, it
is easy to forget that we wanted to estimate the time the
bus takes to travel between stops. Let us agree that
We illustrate a better way by first finding a good test of H_{0}: θ = θ_{0} against H_{1}: θ ≠ θ_{0}, and then using the relationship between tests and confidence intervals to find a confidence interval for θ. The likelihood ratio is given by
$$\Lambda = \frac{L(\theta_0)}{L(\hat\theta)},$$
where θ̂ is a maximum likelihood estimator such as W_{2}. Note that Λ = 1 provided y_{n} − 1/2 ≤ θ_{0} ≤ y_{1} + 1/2, but Λ = 0 if θ_{0} < y_{n} − 1/2 or θ_{0} > y_{1} + 1/2. Thus, strictly following the likelihood ratio criterion, we would reject H_{0} if y_{1} < θ_{0} − 1/2 or y_{n} > θ_{0} + 1/2. That is, we would accept H_{0} if θ_{0} − 1/2 ≤ y_{1} and y_{n} ≤ θ_{0} + 1/2. Clearly, if H_{0}: θ = θ_{0} is true, we would never reject H_{0} with this test, and the significance level is α = 0. However, we must be concerned about the power of the test when θ ≠ θ_{0}. This suggests that we could improve the power if we selected a constant c slightly less than 1/2 and accepted H_{0} if θ_{0} − c ≤ y_{1} and y_{n} ≤ θ_{0} + c, and otherwise rejected H_{0}.
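Let us find c so that the significance level is α. Thus we want, when θ = θ_{0}, that
$$P(\theta_0 - c \le Y_1,\ Y_n \le \theta_0 + c) = \prod_{i=1}^{n} P(\theta_0 - c \le X_i \le \theta_0 + c) = (2c)^n = 1 - \alpha.$$
Accordingly,
$$c = \frac{(1 - \alpha)^{1/n}}{2}.$$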
Going 'backwards' from this test, the corresponding 100(1 − α)% confidence interval for θ is
$$[\,Y_n - c,\ Y_1 + c\,], \qquad c = \frac{(1 - \alpha)^{1/n}}{2}.$$
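The test and interval translate directly into code. The following minimal Python sketch is our own, and the names, seed, and settings are hypothetical. The function `critical_c` solves (2c)^{n} = 1 − α, `accept_h0` applies the modified likelihood ratio test, and a short simulation estimates the coverage of [Y_{n} − c, Y_{1} + c], which should be close to 1 − α. The interval is empty whenever Y_{n} − Y_{1} > 2c, in which case every θ_{0} is rejected.

```python
import random


def critical_c(n: int, alpha: float) -> float:
    """Half-width c solving (2c)^n = 1 - alpha."""
    return 0.5 * (1 - alpha) ** (1 / n)


def accept_h0(sample: list[float], theta0: float, alpha: float) -> bool:
    """Accept H0: theta = theta0 iff theta0 - c <= Y_1 and Y_n <= theta0 + c."""
    c = critical_c(len(sample), alpha)
    return theta0 - c <= min(sample) and max(sample) <= theta0 + c


def confidence_interval(sample: list[float], alpha: float) -> tuple[float, float]:
    """[Y_n - c, Y_1 + c]; empty (lower > upper) when Y_n - Y_1 > 2c."""
    c = critical_c(len(sample), alpha)
    return max(sample) - c, min(sample) + c


def coverage(n: int = 10, theta: float = 7.0, alpha: float = 0.05,
             reps: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated intervals that contain the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.uniform(theta - 0.5, theta + 0.5) for _ in range(n)]
        lower, upper = confidence_interval(xs, alpha)
        hits += lower <= theta <= upper
    return hits / reps


print(critical_c(10, 0.05))  # slightly less than 1/2
print(coverage())            # close to 0.95
```

Because the interval is obtained by inverting the test, θ_{0} lies in the interval exactly when `accept_h0` returns True.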
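The power of this test is easy for a student to compute by evaluating
$$K(\theta) = 1 - P_\theta(\theta_0 - c \le Y_1,\ Y_n \le \theta_0 + c).$$
Clearly, when θ = θ_{0}, K(θ_{0}) = α. Of course, K(θ) is symmetric about θ_{0}, and hence we need only consider θ > θ_{0}. If θ > θ_{0} + c + 1/2, then all X_{i} > θ_{0} + c and K(θ) = 1. So when θ_{0} < θ ≤ θ_{0} + c + 1/2,
$$K(\theta) = \begin{cases} \alpha, & \theta_0 < \theta \le \theta_0 + 1/2 - c, \\ 1 - (\theta_0 + c + 1/2 - \theta)^n, & \theta_0 + 1/2 - c < \theta \le \theta_0 + 1/2 + c. \end{cases}$$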
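The piecewise power function can also be computed directly from the overlap between the acceptance region [θ_{0} − c, θ_{0} + c] and the support [θ − 1/2, θ + 1/2], because K(θ) = 1 − (length of the overlap)^{n}. The following minimal Python sketch (our own; the function name `power` is hypothetical) does exactly that. It can be used to tabulate or plot K(θ).

```python
def power(theta: float, theta0: float, n: int, alpha: float) -> float:
    """K(theta) = 1 - P_theta(all X_i in [theta0 - c, theta0 + c]), (2c)^n = 1 - alpha."""
    c = 0.5 * (1 - alpha) ** (1 / n)
    lo = max(theta0 - c, theta - 0.5)  # overlap of acceptance region and support
    hi = min(theta0 + c, theta + 0.5)
    overlap = max(0.0, hi - lo)  # X has density 1 on its support, so P(X in overlap) = length
    return 1 - overlap ** n


for d in (0.0, 0.01, 0.1, 0.3, 0.6, 1.0):
    print(d, round(power(d, 0.0, n=10, alpha=0.05), 4))
```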
We present several exercises when considering the uniform distribution on [θ − ρ, θ + ρ], where the half-width ρ is unknown as well. Thus, f(x; θ, ρ) = (1/(2ρ)) I_{[θ − ρ, θ + ρ]}(x), where −∞ < θ < ∞ and ρ > 0. These are analogous to problems associated with the normal distribution when σ² is unknown, and bright students can often think of the two-sample and k-sample problems on their own (possibly with a little direction from the instructor). In each of the following situations, the student is asked to show that the likelihood ratio equals the form given. Students may find these 'easy' or 'hard' depending on their insight.
Then
where Y_{1} ≤ Y_{2} ≤ ⋯ ≤ Y_{n} are the order statistics of a random sample of size n.
Using the same notation as in Exercise 6.1,
For convenience, let
Using the same assumptions and notation of Exercise 6.3,
For convenience, let all k independent random samples have the same sample size n, and let Y_{i1} ≤ Y_{i2} ≤ ⋯ ≤ Y_{in} be the order statistics of the i^{th} sample. Then
Using the same assumptions and notation of Exercise 6.5,
To determine the significance level of each of the above tests, each of which rejects H_{0} if Λ ≤ λ_{0}, we need to determine the distribution of Λ or of some appropriate function of Λ. These are much more difficult exercises.
First, we must thank the referees, who made great suggestions, and one of whom caught a major error in one of our statements. We certainly appreciate their time and effort. The estimator in the Remark in Section 3.3 is theirs. We also thank the other authors whom we have referenced and who have dabbled with this interesting one-parameter example. Their work has contributed to this paper.
Although the first author was dismayed when he found out that 'his' wonderful example was already in some of the literature (some published over 40 years ago; see Hogg and Craig (1956) and the 1959 edition of their book), he realized that he was not the only person to think this example was a good one. He then contacted the second author about presenting many of the applications of these uniform distributions in one document. When the first author later found the example in Mood, Graybill, and Boes (1974), a book he had used at least eleven years prior to his 'creating' the example, he almost thought that he had subconsciously plagiarized! However, with the continued rise in importance of statistics education and the fact that no single textbook has used this uniform distribution to any extent, the authors felt the project was still worthwhile.
The authors have one request: readers who find uniform distributions of the type considered here, referenced in any book or article, should contact the first author with the references. This is especially true of those who have written the book or article themselves! We have looked at many texts, but certainly not all of them.
Casella, G., and Berger, R. L. (1990), Statistical Inference, Belmont, CA: Duxbury.
Hogg, R. V., and Craig, A. T. (1956), "Sufficient Statistics In Elementary Distribution Theory," Sankhya, 17, 209.
----- (1995), Introduction to Mathematical Statistics (5th ed.), Englewood Cliffs, NJ: Prentice Hall.
Hogg, R. V., and Tanis, E. A. (1997), Probability and Statistical Inference (5th ed.), Englewood Cliffs, NJ: Prentice Hall.
Miller, I., and Miller, M. (1999), John E. Freund's Mathematical Statistics (6th ed.), Upper Saddle River, NJ: Prentice Hall.
Mood, A. M., Graybill, F. A., and Boes, D. C. (1974), Introduction to the Theory of Statistics (3rd ed.), NY: McGraw-Hill.
Rice, J. A. (1995), Mathematical Statistics and Data Analysis (2nd ed.), Belmont, CA: Duxbury.
Dexter C. Whittinghill
Department of Mathematics
Rowan University
201 Mullica Hill Rd.
Glassboro, NJ 08028
Robert V. Hogg
Department of Statistics and Actuarial Science
University of Iowa
Iowa City, IA 52242