SPES INVITED SESSION

How to predict a high-dimensional variable?

Organizer: J. T. Hwang, Cornell University


ASA Theme Session: (yes or no) Yes.

Applied Session: (yes or no) It has both applications and theory.

Brief description of session concept: One of the most exciting areas in modern statistics is the study of how to handle high-dimensional data. In many situations, the ultimate goal is prediction. This is the focus of the session.


In many disciplines (for example, industrial manufacturing), measuring a high-dimensional variable (to ensure the quality of the product) costs time and money. Can one measure a small part of the variable and use it, together with high-dimensional training data, to predict the unobserved elements well? We shall discuss a technique, called High-dimensional Empirical Linear Prediction (HELP) by some of us, that works amazingly well when applied to electronic converters. In a motivating example, HELP predicts well from measurements of fewer than one percent of the characteristics: the saving is tremendous.

T. M. Souders, one of the inventors of HELP, will give an expository overview of the technique and its applications. H. K. Liu and J. T. Hwang, two leading statisticians in this area, will discuss the construction of statistical intervals. Beyond the amazing applications of HELP, several of the phenomena observed are theoretically interesting and striking. Lynne Hare will introduce the three talks and the three speakers.


Session Chair & Affiliation: Lynne Hare, NIST

Mailing Address (include zip), Phone, Fax, E-mail:
Chief of the Statistical Engineering Division
National Institute of Standards and Technology
Building 820, Room 353
Gaithersburg, MD 20899-0001
Phone: 301-975-2840
Fax: 301-990-4127
E-mail: lynne.hare@nist.gov

Session Organizer & Affiliation: J. T. Gene Hwang, Cornell University

Mailing Address (include zip), Phone, Fax, E-mail:
Professor, Department of Mathematics
White Hall, Cornell University
Ithaca, NY 14853
Phone: 607-255-3443
Fax: 607-255-7149
E-mail: hwang@math.cornell.edu


Participant No. 1 & Affiliation: T. M. Souders, NIST

Title of Paper OR Role in Session (incl brief abstract): "Reducing the Costs of Testing Electronic Devices Using High-dimensional Empirical Linear Prediction."

This talk will describe how High-dimensional Empirical Linear Prediction has been used to substantially reduce the amount of testing required to characterize the performance of electronic devices. Examples include laboratory calibrations of multirange instruments and production line testing of integrated circuits. Modeling methods, test point selection algorithms and prediction intervals will be included.

Co-Authors & Affiliations (for papers only): G. N. Stenbakken and A. D. Koffman

Mailing Address (include zip), Phone, Fax, E-mail for Participant No. 1:
Rm. B162, Bldg. 220, NIST
Gaithersburg, MD 20899
Phone: 301-975-2406
E-mail: souders@eeel.nist.gov


Participant No. 2 & Affiliation: Hung-kung Liu, NIST

Title of Paper OR Role in Session (incl brief abstract): High-dimensional Empirical Linear Prediction (HELP)

Many engineering problems involve high-dimensional observations whose mean vectors lie in a lower-dimensional space. Exhaustive measurement of all the elements is often time-consuming and expensive. Applying a traditional multivariate linear model, one can combine a small number of the elements of an observation with the known design matrix to predict the rest of the elements. However, for a complicated engineering system, the design matrix is often hard to determine fully. In this talk, we investigate an empirical model in which we allow ourselves to use the data to determine the size of the design matrix and to estimate its unknown part. This estimated model is then used to construct point and interval predictions for a future observation. This technique is called HELP (High-dimensional Empirical Linear Prediction). The performance of HELP when applied to some engineering problems will also be discussed.
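
The idea in this abstract can be illustrated with a minimal sketch on synthetic data. It is not the authors' algorithm: it assumes the low-dimensional space is estimated by a singular value decomposition of the centered training data, that the dimension k is fixed in advance rather than chosen from the data, and that the measured elements are picked at random. The function names fit_help_basis and predict_unmeasured, and all sizes and noise levels, are hypothetical.

```python
import numpy as np


def fit_help_basis(train, k):
    """Estimate a mean vector and a k-column basis from fully measured units.

    train: array of shape (n_units, p), one exhaustively measured unit per row.
    k: assumed dimension of the low-dimensional space (fixed here, not estimated).
    """
    mean = train.mean(axis=0)
    # The leading right singular vectors of the centered data span the
    # estimated low-dimensional space (an assumption of this sketch).
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k].T  # basis has shape (p, k)


def predict_unmeasured(mean, basis, measured_idx, measured_vals):
    """Predict all p elements of a new unit from a few measured elements."""
    rows = basis[measured_idx]                    # (m, k) rows of the basis
    resid = measured_vals - mean[measured_idx]    # centered measurements
    coef, *_ = np.linalg.lstsq(rows, resid, rcond=None)
    return mean + basis @ coef


# Synthetic example: p elements per unit, true dimension k, m measured elements.
rng = np.random.default_rng(0)
p, k, n_train, m = 512, 3, 90, 16
true_basis = rng.normal(size=(p, k))


def simulate(n):
    """Draw n units whose means lie in the span of true_basis, plus small noise."""
    return rng.normal(size=(n, k)) @ true_basis.T + 0.01 * rng.normal(size=(n, p))


train = simulate(n_train)
new_unit = simulate(1)[0]

mean, basis = fit_help_basis(train, k)
measured_idx = rng.choice(p, size=m, replace=False)
pred = predict_unmeasured(mean, basis, measured_idx, new_unit[measured_idx])

unmeasured = np.setdiff1d(np.arange(p), measured_idx)
rmse = np.sqrt(np.mean((pred[unmeasured] - new_unit[unmeasured]) ** 2))
print(f"RMSE on unmeasured elements: {rmse:.4f}")
```

In this toy setting the prediction error on the unmeasured elements stays close to the simulated noise level. Choosing k from the data and attaching prediction intervals, both of which the abstract describes, are left out of the sketch.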

Co-Authors & Affiliations (for papers only):

Mailing Address (include zip), Phone, Fax, E-mail for Participant No. 2:
Statistical Engineering Division
National Institute of Standards and Technology
Building 820, Room 353
Gaithersburg, MD 20899-0001
Phone: 301-975-2718
Fax: 301-990-4127
E-mail: liu@cam.nist.gov


Participant No. 3 & Affiliation: J. T. Gene Hwang, Cornell University, Ithaca, NY 14853

Title of Paper OR Role in Session (incl brief abstract): High-dimensional statistical intervals for HELP

After a product is made, you are often interested in checking its characteristics to make sure that it works properly. One problem, however, is that the number of characteristics may be huge. In the case of a 13-bit electronic converter, the number of characteristics is 2^13 = 8192. Checking such a huge number of characteristics is time-consuming and expensive. What can you do? You may need HELP.

HELP, High-dimensional Empirical Linear Prediction, is a technique that allows you to use part of the observed characteristics (together with some other exhaustively measured data) to predict the rest of the characteristics.

A group of electrical engineers at the National Institute of Standards and Technology (NIST), led by T. M. Souders and G. N. Stenbakken, has been developing the technique and applying it to converters over the last fifteen years. They claimed that 64 measured characteristics of a converter (together with data from 90 exhaustively measured converters) predict the remaining 8192 - 64 characteristics well. Their claim was later confirmed by the prediction intervals, which are the focus of the talk.
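
As a quick check of the figures quoted in this abstract, the short calculation below works out how many characteristics are predicted rather than measured, and what fraction is measured. The variable names are illustrative only.

```python
n_bits = 13
n_characteristics = 2 ** n_bits           # characteristics of a 13-bit converter
n_measured = 64                           # characteristics actually measured
n_predicted = n_characteristics - n_measured
fraction_measured = n_measured / n_characteristics

print(n_characteristics, n_predicted, f"{fraction_measured:.2%}")
# prints: 8192 8128 0.78%
```

Measuring 64 of 8192 characteristics is about 0.78 percent, which agrees with the session description's statement that under one percent of the characteristics are measured.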

To address the construction of prediction intervals, we review the approach of Hwang and Liu. We also present a new approach, based on my work with Adam Ding, that uses asymptotics for large-dimensional matrices. This new approach, although computationally more intensive, substantially improves upon Hwang and Liu's approach and is very promising. We shall also compare intervals constructed from confidence sets of various shapes. The comparison reveals surprising phenomena due to high dimensionality.

Co-Authors & Affiliations (for papers only):

Mailing Address (include zip), Phone, Fax, E-mail for Participant No. 3: The information is given above under Session Organizer.