CE Program for JSM 2008

Register for the courses and workshops through the main JSM registration.

Registration fees in parentheses are the on-site rates.

Continuing Education Courses

Saturday, August 2
Sunday, August 3
Monday, August 4
Tuesday, August 5

Computer Technology Workshops

Wednesday, August 6


SATURDAY, AUGUST 2

CE_01C (two-day course Saturday and Sunday)
8:30 a.m.–5:00 p.m.
Title: Generalized Linear Mixed Models: Theory and Applications
Instructors: Oliver Schabenberger and Walter Stroup

Abstract:
This two-day course is for those who want to learn about the theory and application of generalized linear mixed models across disciplines from a non-Bayesian perspective. Each day comprises theory and application components with numerous examples. The material is presented at an applied level, accessible to participants with training in linear statistical models and previous exposure to linear mixed models.

On the first day, we will cover classes of mixed models and how their features are made manifest in today’s likelihood-based estimation methods. We will make the connection between linear models, generalized linear models, linear mixed models, and generalized linear mixed models (GLMM) in terms of model formulation, distributional properties, and approaches to estimation. Participants will learn that GLMMs are an encompassing family and understand the differences and similarities in approaches to estimation and inference within the family. We will discuss overarching issues that confront analysts who work with correlated, non-normal data, such as overdispersion, the marginal and conditional models, and model diagnostics.

During the second day, we will focus on application areas for GLMMs and examples; supporting theory will be introduced as needed. Focus areas will include modeling of rates and proportions, modeling of regular and zero-inflated counts, mixed model smoothing, the computation of power and sample size, and inferential tasks with and without adjustments. Computations will be based on the mixed model tools in SAS/STAT software.
FEES: M–$575 ($735), NM–$700 ($865), S–$340 ($550)


CE_02

8:30 a.m.–5:00 p.m.
Title: Genetic and Microarray Data Analysis
Instructors: Russell D. Wolfinger and Carl Langefeld

Abstract:
This course is for statisticians who wish to learn about statistical genetics, microarray data analysis, and prediction with genomic biomarkers. Course content will be at the intermediate level. If time permits, we will cover topics such as copy number, exon arrays, ChIP-on-Chip, and eQTL. There will be a mixture of theory and practical examples. JMP Genomics software and custom scripts will be used for illustration.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: DNA Microarrays and Related Techniques (Chapman & Hall) by Allison, D.B. et al.


CE_03C

8:30 a.m.–5:00 p.m.
Title: Optimal Experimental Designs
Cosponsor: Section on Physical and Engineering Sciences
Instructors: Alexander N. Donev and Randy Tobias

Abstract:
Optimal design for the practitioner is often discussed as a “black box,” shying away from the theory. On the contrary, the premise for this course is that powerful practical approaches for assessing the properties of standard designs and of finding good designs in nonstandard situations result from familiarity with the theory of optimal experimental design. We will start by covering fundamental theory, including forms of the General Equivalence Theorem that are central to algorithms for the construction of optimal designs. These ideas will be illustrated with standard designs for response surface models. We will move on to common nonstandard problems in design for response surfaces, such as blocking, finding designs over irregular regions, and mixture designs. We will also discuss the augmentation of designs and designs for checking the adequacy of models. Many models in chemistry and the pharmaceutical industry are nonlinear in the parameters. Optimal designs for these models depend on prior information about the parameters, which may be available in the form of a prior distribution. We will show how this information may be used to provide good designs.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Statistics for Experimenters: Design, Innovation, and Discovery, 2nd Edition (Wiley) by Box, G.E.P., Hunter, W.G., and Hunter, J.S.


CE_04C

8:30 a.m.–5:00 p.m.
Title: Regression Modeling Strategies
Instructor: Frank E. Harrell, Jr.

Abstract:
All standard regression models have assumptions that must be verified for the model to have power to test hypotheses and predict accurately. Of the principal assumptions, this course will emphasize methods for assessing and satisfying linearity and additivity. Practical but powerful tools will be presented for validating model assumptions and presenting model results. This course provides methods for estimating the shape of the relationship between predictors and response by augmenting the design matrix using restricted cubic splines. Even when assumptions are satisfied, overfitting can ruin a model’s predictive ability for future observations. Methods for data reduction will be introduced, methods of model validation will be covered, and auxiliary topics such as modeling interaction surfaces, efficiently utilizing partial covariable data by using multiple imputation, variable selection, overly influential observations, collinearity, and shrinkage will be discussed. The methods covered will apply to almost any regression model, including ordinary least squares, logistic regression models, and survival models.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Applied Regression Analysis (Wiley) by Draper, N.R., and Smith, H.


CE_05C

8:30 a.m.–5:00 p.m.
Title: Hot Topics in Clinical Trials
Cosponsors: Teaching Statistics in the Health Sciences, Boston Chapter of the ASA
Instructors: Scott R. Evans, Lee-Jen Wei, Lu Tian, Lingling Li

Abstract:
We will address several hot-topic areas in clinical trials, including the use of prediction to identify biomarkers, meta-analysis of rare safety events, data monitoring committees, data monitoring using prediction, noninferiority studies, causal inference, benefit:risk assessment, and bridging studies. We will present motivating examples and discuss standard and novel approaches to analyses.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Introduction to Statistical Methods for Clinical Trials (Chapman & Hall) by Cook, T.D., and DeMets, D.L.


CE_06C

8:30 a.m.–5:00 p.m.
Title: Successful Data Mining in Practice
Instructor: Richard De Veaux

Abstract:
This course will introduce data mining, which is the exploration and analysis of large data sets by automatic or semiautomatic means with the purpose of discovering meaningful patterns. The knowledge learned from these patterns can be used for decisionmaking via “knowledge discovery.” Much exploratory data analysis and inferential statistics concern the same type of problems, so what is different about data mining? What is similar? In the course, I will attempt to answer these questions by providing a broad survey of the problems that motivate data mining and the approaches used to solve them.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)


SUNDAY, AUGUST 3

CE_01C (two-day course Saturday and Sunday)
8:30 a.m.–5:00 p.m.
Title: Generalized Linear Mixed Models: Theory and Applications
Instructors: Oliver Schabenberger and Walter Stroup


CE_07C
8:00 a.m.–noon
Title: Design and Analysis of Epidemiologic Studies of Gene-Environment Interactions
Cosponsor: Section on Statistics in Epidemiology
Instructors: Raymond Carroll and Nilanjan Chatterjee

Abstract:
Most common human diseases have a multifactorial etiology involving a complex interplay of genetic and environmental exposures. Understanding how genetic and environmental exposures interact and jointly influence the risk of a complex disease can be important for both biological and public health purposes. We will present the state of the art of efficient design and analysis for studies of gene-environment interaction by statisticians, epidemiologists, and geneticists. Topics covered will include population- and family-based case-control designs, stratified sampling designs, modern semiparametric methods for analysis of case-control data, estimation of haplotype-environment interactions, and flexible modeling approaches to empirical Bayes methods. We will blend theory and applications with illustrations using real examples and software implementation.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Applied Logistic Regression (Wiley) by Hosmer, D.W. and Lemeshow, S.


CE_08C

8:30 a.m.–5:00 p.m.
Title: Modern Practical Bayesian Clinical Trial Design
Cosponsor: Section on Bayesian Statistical Science
Instructors: Peter F. Thall and J. Kyle Wathen

Abstract:
We will cover practical Bayesian methods for clinical trial design and conduct. Attendees should have a master’s degree in statistics, or equivalent experience, and an understanding of elementary Bayesian concepts. There will be numerous illustrations using actual clinical trials. Drawn from oncology, examples will include methods for eliciting and calibrating priors, incorporating historical data, and using computer simulation to establish a design’s frequentist properties. The morning will cover phase I designs—including dose-finding using the continual reassessment method and logistic regression, finding optimal dose pairs of a two-agent combination, and accommodating multiple toxicities—and phase II designs, including a paradigm for monitoring multiple discrete outcomes, randomized phase II trials, monitoring event times, hierarchical Bayesian methods for trials with multiple disease subtypes, and using regression to account for patient heterogeneity. The afternoon will cover phase I/II dose-finding based on efficacy-toxicity trade-offs, optimizing schedule of administration, jointly optimizing dose and schedule, adaptive randomization, a geometric approach to treatment comparison based on two-dimensional parameters, and designs to evaluate multistage dynamic treatment regimes.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Bayesian Data Analysis (Chapman & Hall) by Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B.


CE_09C

8:30 a.m.–5:00 p.m.
Title: Statistical Challenges in Proteomics
Cosponsor: Biometrics Section
Instructor: Scott C. Schmidler

Abstract:
Proteomics is the next frontier in the rapidly evolving field of bioinformatics. I will provide an introduction to the principal aims, technologies, and statistical issues arising in structural and functional proteomics studies. Topics will include experimental data sources (e.g., X-ray, NMR, mass spectrometry [MALDI, SELDI, MS/MS], peptide arrays), statistical problems in structural proteomics (e.g., molecular comparison and database search, classification of structures, structure-based function prediction), and statistical problems in functional proteomics (e.g., fragment identification, normalization and registration of spectra, peak finding, sample comparison, classification and biomarker identification).
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)


CE_10C

8:30 a.m.–5:00 p.m.
Title: Systematically Improving Your Professional Practice
Instructor: Doug Zahn

Abstract:
The key to professional growth is becoming increasingly effective in the interactions that constitute professional practice. Central to this growth are processes for structuring effective interactions and dealing with breakdowns that inevitably occur. Participants will enhance their professional development by learning a process for engaging in effective interactions that focuses on five activities: prepare, open, work, end, and reflect. While this knowledge is valuable, by itself it is not enough to markedly improve your practice. Improvement requires facing the uncomfortable fact that no matter how carefully we plan, breakdowns inevitably occur. The second objective for the course is learning a process for dealing with breakdowns. You will learn to use the processes by doing, observing, and analyzing videos of three role-plays of three different situations that arise in professional practice. (We will construct these situations from pre-course information gathered from participants.) You will participate once as “consultant,” once as “client,” and once as observer of a role-play. It is amazing to discover your own responses to difficult situations through the objective lens of the camera. Using the tools in the workshop and your new awareness, you will have more effective interactions when you return to work. Prerequisite: At least one year of professional practice.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)


CE_11C

8:30 a.m.–5:00 p.m.
Title: Principles of Statistical Design
Instructor: George Casella

Abstract:
We will cover the principles and practice of statistical design, paying attention to the setup and implementation of an experiment and the underlying theory that allows valid inferences. The course will begin with a review of the basic tools for statistical design and the statistical package R. The more common designs (e.g., factorial completely randomized designs, randomized complete blocks) and their variations (e.g., Latin squares) will be covered. Emphasis will be on designing the experiment to obtain the best inference on treatment contrasts, and designs will be illustrated with real data problems. We will focus on microarray designs and spend a lot of time on split plots and their variations (e.g., strip plot, repeated measures). Finally, we will move to confounding (e.g., incomplete blocks, fractions). This course is aimed at professional-level statisticians or interested faculty and graduate students.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)

Attendees should have a working knowledge of statistical methodology and data analysis. Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Applied Regression Analysis: A Research Tool (Springer) by Rawlings, J.O., Pantula, S.G., and Dickey, D.A. or Applied Linear Statistical Models (Irwin Professional Pub) by Neter, J., Wasserman, W., Kutner, M.H., and Nachtsheim, C.J.


CE_12C

1:00 p.m.–5:00 p.m.
Title: Sampling in Networks
Cosponsor: Section on Survey Research Methods
Instructor: Steven K. Thompson

Abstract:
Network models are in increasing use to describe populations, including socially networked human populations, computer and communication networks, and gene regulatory networks. A network has nodes (e.g., people) and links (e.g., relationships between people). The nodes may have characteristics of interest, and the relationships may be of different types and strengths. Network data, however, generally represent a sample from the wider population network of interest. This short course will cover methods for obtaining samples from networks and using the sample data to make inference about characteristics of the population network.

In many cases the only practical way to obtain a large enough sample from the population is to follow links from sample individuals to add more individuals to the sample. For example, in studies of the risk behaviors in people at risk for HIV/AIDS, the population is hidden so standard sampling designs cannot be applied. Instead, researchers follow social referrals from individuals in the sample to find more members of the hidden population. Similarly, in studies of the World Wide Web, links or connections from sites in the sample are followed to add more sites to the sample. Network methods also turn out to be useful for spatial sampling in environmental and ecological sciences where the populations tend to be highly clustered or rare. Link-tracing sampling designs will be described, together with design-based and Bayes methods for estimating population characteristics based on such samples. Computational methods and available software also will be described.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Sampling, 2nd Edition (Wiley) by Thompson, S.K.


MONDAY, AUGUST 4

CE_13C
8:00 a.m.–noon
Title: Evaluating Probability of Success for Internal Decisionmaking in Early Drug Development
Cosponsor: Biopharmaceutical Section
Instructors: Narinder Nangia, Martin King, and Jane Qian

Abstract:
Early development (the “learning stage”) is a crucial period of the drug development process, as decisions to continue or halt development of a compound must be made with incomplete information. Relying solely on p-values from phase I-II studies for making drug development milestone decisions is an inefficient approach, as it ignores several important determinants of future success. We will discuss the statistical tools that enable quantification of the uncertainty associated with results coming from learning stage studies. These tools use the Bayesian approach to exploit the totality of accumulated data/knowledge in a formal way for internal decisionmaking in early drug development. Posterior and/or predictive probabilities computed in a Bayesian paradigm are easy to interpret and provide much more relevant information than p-values for decisionmaking. We will also discuss evaluation of probability of a successful phase III trial through clinical trial simulations. Examples from the CNS, inflammation and oncology therapeutic areas will be considered for evaluation of probability of success for drug candidates in meeting the target product profile.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)

Participants should have a basic familiarity with hypothesis testing and inference (any first semester graduate course) and with Bayesian methods.


CE_14C
8:00 a.m.–noon
Title: U-Statistics for Scoring Multivariate Data: From Sports to Genetics
Instructors: Knut M. Wittkowski and Tingting Song

Abstract:
We will extend commonly used u-statistics for univariate and censored data to multivariate data with innovative applications in sports, economics, sociology, biology, and medicine. The course consists of four parts: stratification as a means to improve McNemar-type tests for trio data in genetics (‘TDT’) and adapt them to various genetic models; history of u-statistics; how information about relationships between variables can be incorporated through transforming data, converting data into partial orderings, and combining partial orderings; and computational and statistical aspects of screening studies involving thousands of variables (SNP or gene-expression microarrays) and nonparametric “factor analyses.” Demonstrations will be based on spreadsheets, functions from muStat (available from http://cran.r-project.org and http://csan.insightful.com), and web services available from http://muStat.rockefeller.edu. Prerequisites: Basic knowledge of statistics and programming.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Nonparametrics: Statistical Methods Based on Ranks (Holden-Day) by Lehmann, E.L.


CE_15C

8:30 a.m.–5:00 p.m.
Title: Analysis of Clinical Trials: Theory and Applications
Cosponsor: Biopharmaceutical Section
Instructors: Christy Chuang-Stein, Alex Dmitrienko, and Keaven Anderson

Abstract:
We will cover analysis of stratified data, multiple comparisons and multiple endpoints, and interim analysis and interim data monitoring by presenting practical advice from experts, offering a well-balanced mix of theory and applications, and discussing regulatory considerations. The discussed statistical methods will be implemented using SAS software, and clinical trial examples will be used for illustration. This course is for statisticians working in the pharmaceutical or biotechnology industries, as well as contract research organizations. It is equally beneficial to statisticians working in institutions that deliver health care and government branches that conduct health care–related research. Attendees must have basic knowledge of clinical trials. Familiarity with drug development is highly desirable, but not necessary.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)


CE_16C
8:30 a.m.–5:00 p.m.
Title: Graphics of Large Data Sets
Cosponsor: Section on Statistical Graphics
Instructors: Antony Unwin and Heike Hofmann

Abstract:
Graphics are great for exploring data, but how can they be used for looking at the large data sets commonplace today? Large data sets bring new complications and require different emphases and approaches. In this course, based on Graphics of Large Datasets, we will discuss ways of visualizing large data sets, whether large in number of cases, number of variables, or both. Data visualization is useful for data cleaning, exploring data, identifying trends and clusters, spotting local patterns, evaluating modeling output, and presenting results. It is essential for exploratory data analysis and data mining. Data analysts, statisticians, and computer scientists should benefit from attending this course. Participants are welcome to bring laptops and should have knowledge of standard statistical graphics and experience carrying out data analysis. Either the software Mondrian (which can be downloaded from stats.math.uni-augsburg.de/Mondrian/) or, if you use R, the R package iPlots should be installed.
FEES: M–$365 ($475), NM–$460 ($570), S–$225 ($350)


CE_17C
8:30 a.m.–5:00 p.m.
Title: Statistical Evaluation of Medical Tests and Biomarkers for Classification
Cosponsor: Section on Statistics in Epidemiology
Instructors: Margaret S. Pepe, Holly Janes, and Todd Alonzo

Abstract:
Development of biomarkers and medical diagnostic devices has accelerated. Their rigorous evaluation is a high priority for research, yet principles and techniques for the design and analysis of these studies are not widely known. There are fundamental differences among methods for therapeutic and etiologic studies. Moreover, much basic methodology has developed recently. We will cover estimation and comparison of Receiver Operating Characteristic (ROC) curves and describe extensions to adjust for covariates that affect biomarker/test measurements. For assessing factors associated with test performance, ROC regression methods will be presented. We also will consider how to evaluate the benefit of a new test when standard tests or clinical variables exist. Second, we will consider the design of case-control studies most common in this field. Sample size calculations and optimal choice of case-control ratio will be presented and the attributes and limitations of matching controls to cases will be discussed. Third, prospective studies will be considered. Finally, we will discuss problems incurred when the gold standard reference test is, itself, subject to error. A suite of freely available Stata programs will implement analyses. Prerequisite: introductory statistics.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)


CE_18C
8:30 a.m.–5:00 p.m.
Title: Computational Statistics: Methods for Monte Carlo Integration and Optimization
Cosponsor: Statistical Computing Section
Instructors: Jennifer A. Hoeting and Geof H. Givens

Abstract:
This course will consist of two parts: a morning session on Monte Carlo integration strategies and an afternoon session on optimization methods. We will survey a variety of techniques, ranging from classic to state-of-the-art. The course will be based on Computational Statistics, and is aimed at quantitative scientists and statisticians who are unfamiliar with these methods. Upper division undergraduate mathematical literacy is recommended. Many problems in statistics require the evaluation of integrals that cannot be solved analytically, particularly in Bayesian statistics. We will cover Monte Carlo integration, importance sampling and variance reduction techniques, and Markov chain Monte Carlo methods. Optimization also plays a central role in statistics, particularly in numerical maximum likelihood estimation. The afternoon session will cover Newton-like methods, Gauss-Seidel iteration, tabu algorithms, simulated annealing, genetic algorithms, and the EM algorithm and its variants. We seek to give students a practical understanding of how and why existing methods work, enabling them to use modern statistical methods effectively. We focus on methodological concepts, and not details of computer programming. Examples are drawn from diverse fields including bioinformatics, ecology, and medicine.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)

The course will be based on the book "Computational Statistics," by G. Givens and J. Hoeting, Wiley.


CE_19C

1:00 p.m.–5:00 p.m.
Title: Methods of Identifying and Dealing with Overdispersed Regression Models
Instructor: Joseph Hilbe

Abstract:
We will define overdispersion in the context of binomial and count models and specify the difference between apparent and real overdispersion and how to identify each. We also will show methods that can be used to eradicate apparent overdispersion from a model, as well as discuss methods used to deal with real overdispersion.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Negative Binomial Regression (Cambridge University Press) by Hilbe, J.M.


CE_20C

1:00 p.m.–5:00 p.m.
Title: Adaptive Analysis of Data: Tests of Significance and Confidence Intervals
Instructor: Thomas W. O’Gorman

Abstract:
I will present several adaptive methods for the analysis of data, beginning with a two-sample adaptive test, and then present an adaptive method of testing any subset of coefficients in a multiple regression model. I will also describe adaptive tests for interaction and main effects in the analysis of factorial experiments and adaptive tests for slope. The advantage of adaptive tests is that they are usually more powerful than the traditional tests for non-normal error distributions. As there is little power loss with normal error distributions, adaptive tests can be recommended for general use in studies having more than 20 observations. For each adaptive test, we will compare its performance to the traditional method, and I will show how to perform the test using a SAS macro. Adaptive tests used in the analysis of repeated measurements will be described and compared to the nonadaptive mixed model tests. In addition, I will describe a method of computing adaptive confidence intervals.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)

Attendees should be familiar with basic statistical modeling—including multiple regression and the analysis of variance. Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Applied Regression Analysis (Wiley) by Draper, N.R. and Smith, H.


TUESDAY, AUGUST 5

CE_21C
8:00 a.m.–noon
Title: Analysis of Multivariate Failure Time Data
Instructor: Danyu Lin

Abstract:
Multivariate failure time data arise when each study subject can potentially experience multiple events or when there exists clustering of subjects such that failure times within the same cluster are correlated. Major complications in analyzing such data include the dependence among related failure times and censoring due to limited follow-up or competing events. This short course presents a variety of statistical models and methods for the analysis of multivariate failure time data. We will discuss marginal and frailty models, paying primary attention to semiparametric regression methods. Relevant software will be described, and a number of clinical and epidemiologic studies will be provided for illustrations.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)


CE_22C

8:00 a.m.–noon
Title: Fundamental Statistics Concepts in Presenting Data: Principles for Constructing Better Graphics
Cosponsor: Section on Statistical Graphics
Instructor: Rafe Donahue

Abstract:
Data displays are mental models for understanding distributions of data. At the heart of any data display lies the distribution of the data; a model for this distribution includes demonstrating and exposing sources of variation in the distribution. Like a good map, a display of data ought to operate on several levels. At the lowest level (the highest level of granularity) are the data themselves. Further up are the actual distributions, each with its component summaries, such as the mean or relevant quantiles. At the highest level are sources of variation in these distributions, the parameters in the (mental) model for understanding the data. The closer an architect can come to showing all these levels, the more information will be conveyed. I will present a number of principles, both developed by the masters (e.g., Minard, Tufte, Cleveland, Wilkinson, Wainer) and discovered by me, for constructing displays that will allow the architect of the data display to present the data for improved understanding; it will not be a “Don’t use pie charts” or “Here’s a bad graph from USA Today” course. We will focus on uncovering and formulating principles for presenting data visually. Examples will abound.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: The Visual Display of Quantitative Information (Graphics Press) by Tufte, E.


CE_23C

8:30 a.m.–5:00 p.m.
Title: Bayesian Methods and Software for Data Analysis
Cosponsor: Section on Bayesian Statistical Science
Instructors: Bradley P. Carlin and Thomas A. Louis

Abstract:
This course will introduce hierarchical and empirical Bayes methods, demonstrate their usefulness in challenging applied settings, and show how they can be implemented using modern Markov chain Monte Carlo (MCMC) computational methods. We will provide an introduction to and live demonstration of WinBUGS, the most general Bayesian software package available to date, and BRugs, a convenient function for calling BUGS from R. Use of the methods will be demonstrated in advanced high-dimensional model settings (e.g., nonlinear longitudinal modeling or spatiotemporal estimation and mapping), where the MCMC Bayesian approach often provides the only feasible alternative incorporating all relevant model features. Participants should have an MS (or advanced undergraduate) understanding of mathematical statistics at the Hogg and Craig (1978) or Casella and Berger (2001) level. Basic familiarity with common statistical models (e.g., the linear regression model) and computing will be assumed, but we will not assume significant previous exposure to Bayesian methods or Bayesian computing. This course is aimed at students and practicing statisticians who are intrigued by all the fuss about Bayes and Gibbs, but who may still mistrust the approach as theoretically mysterious and practically cumbersome.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)


CE_24C

8:30 a.m.–5:00 p.m.
Title: Models for Discrete Repeated Measures
Instructors: Geert Verbeke and Geert Molenberghs

See a Sneak Preview of this course (please note that the course has been moved to Tuesday, August 5, since this recording)

Abstract:
Starting from a brief introduction to the linear mixed model for continuous longitudinal data, we will formulate extensions to model outcomes of a categorical nature, including counts and binary data. Based on Verbeke and Molenberghs (2005), several families of models will be discussed and compared from an interpretational and computational point of view. First, we will discuss models for the full marginal distribution of the outcome vector. Such models allow inference to be based on maximum likelihood principles, but they have the disadvantage of requiring complete specification of all higher-order interactions. We will then discuss two alternatives: random-effects models and semiparametric marginal models with specification of the first moments only, or the first and second moments only. We will discuss and illustrate estimation and inference in full detail, and we will argue extensively that the two approaches yield parameters with completely different interpretations. Finally, when analyzing longitudinal data, one is often confronted with missing observations. We will show that, if no appropriate measures are taken, missing data can cause seriously biased results and interpretational difficulties. Methods to properly analyze incomplete data, under flexible assumptions, will be presented and key concepts of sensitivity analysis will be introduced.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Linear Mixed Models for Longitudinal Data (Springer) by Verbeke, G. and Molenberghs, G.


CE_25C

8:30 a.m.–5:00 p.m.
Title: Mixed Models for the Practicing Statistician
Cosponsor: Statistics and the Environment
Instructors: Linda Young and Ramon Littell

Abstract:
Data sets from designed experiments, sample surveys, and observational studies often contain correlated observations due to random effects and repeated measures. Mixed models can be used to accommodate the correlation structure, produce efficient estimates of means and differences between means, and provide valid estimates of standard errors. Repeated measures and longitudinal data require special attention because they involve correlated data that arise when the primary sampling units are measured repeatedly over time or under different conditions. We will use normal theory models for random effects and repeated measures ANOVA to introduce the concept of correlated data. We will then extend these models to generalized linear mixed models for the analysis of non-normal data, including binomial responses, Poisson counts, and over-dispersed count data. We will discuss methods of assessing the fit and deciding among competing models. Radial smoothing splines can be represented as mixed models, and we will illustrate their application. We will illustrate PROC GLIMMIX in the SAS system using practical examples from pharmaceutical trials, environmental studies, educational research, and laboratory experiments.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: An Introduction to Statistical Methods and Data Analysis (Duxbury) by Ott, R.L. and Longnecker, M.T.


CE_26C

8:30 a.m.–5:00 p.m.
Title: Multiple Imputation of Missing Data
Instructor: Paul Allison

Abstract:
This course will cover both the conceptual foundations and practical details of implementing multiple imputation. Conventional methods for handling missing data typically yield biased estimates and/or incorrect standard errors. By contrast, multiple imputation produces estimates with nearly optimal properties under weaker assumptions. I will explain the assumptions of “missing at random” and “missing completely at random.” After a brief review of conventional methods, we will consider multiple imputation based on linear regression with random draws. We will examine implementation using the MCMC algorithm in SAS PROC MI in detail, and then move on to the role of the dependent variable, imputation under a restricted range, imputation of categorical variables, multivariate inference, interactions and nonlinearities, congeniality of data model and imputation model, longitudinal data, nonignorable missing data, and imputation by chained equations (demonstrated using the ice command in Stata).
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)

Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Introduction to Linear Regression Analysis (Wiley) by Montgomery, D.C., Peck, E.A., and Vining, G.G.


CE_27C

1:00 p.m.–5:00 p.m.
Title: Meta-analysis: Statistical Methods for Combining the Results of Independent Studies
Instructor: Ingram Olkin

Abstract:
Meta-analysis enables researchers to synthesize the results of a number of independent studies designed to determine the effect of an experimental protocol, such as an intervention, so the combined weight of evidence can be considered and applied. Increasingly, meta-analysis is being used in the health sciences, education, and economics to augment traditional methods of narrative research by systematically aggregating and quantifying research literature. The information explosion in almost every field coupled with the movement toward evidence-based decisionmaking and cost-effective analysis has served as a catalyst for the development of procedures to synthesize the results of independent studies. In this course, I will provide a historical perspective of meta-analysis and discuss some of its issues. The statistical methodology will include discussions of nonparametric and parametric models, effect sizes for proportions, fixed versus random effects, regression, and ANOVA models. New material on multivariate models also will be presented.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)

Course attendees should have a basic understanding of statistics including regression.


CE_28C

1:00 p.m.–5:00 p.m.
Title: Analysis of Censored Health Outcomes Data: Developments for the Last 10 Years
Cosponsors: Health Policy Statistics Section, Biopharmaceutical Section
Instructors: Hongwei Zhao and Heejung Bang

Abstract:
Medical cost and quality-adjusted lifetime are common health outcomes data from clinical trials and observational studies. Although these data look different, they share many statistical properties and can be understood in a unified framework. As with standard survival data, censoring is an important issue in these data. Despite the analogy, the censoring mechanism is informative, unlike in the traditional paradigm. It has been a decade since it was shown that the use of most standard statistical techniques (e.g., sample mean, linear regression, and Kaplan-Meier estimator) can be invalid. However, we often find that even experienced researchers still use traditional methods for the analysis of health outcome data in practice. In this course, we will review valid methods for statistical estimation and inference that have been developed in the last 10 years. Unfortunately, not all are easy or user-friendly, and no commercial software is available so far. Therefore, we will suggest methods as practical solutions for practitioners. We also will present recently identified analytic relationships among well-known medical cost estimators. Extended applications to customer lifetime value and cost-effectiveness analysis will be discussed. The course prerequisite is a basic knowledge of survival analysis.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)

Course attendees should consider as a prerequisite a basic knowledge of survival analysis at the level of: Survival Analysis (Springer) by Klein, J.P. and Moeschberger, M.L.


WEDNESDAY, AUGUST 6
Computer Technology Workshops

CE_29T
8:00 a.m.–9:45 a.m.
Title: Meta-analysis: Concepts and Applications
Instructors: Michael Borenstein and Hannah R. Rothstein

Abstract:
Meta-analysis is a set of statistical procedures to synthesize data from multiple studies. When the studies share a common effect size, the meta-analysis yields a more precise estimate of that effect than any single study, and when the effect varies from one study to the next, meta-analysis may be used to explain the variation. Meta-analyses are used to inform policy, obtain approval for drugs, and design research. They also play a key role in grant applications and publications. We will explain the concept of meta-analysis and show how to compute treatment effects and a combined effect, assess heterogeneity, and explain variation in treatment effects across studies. We will discuss the difference between fixed and random effects models and address common criticisms of meta-analysis. We will demonstrate Comprehensive Meta Analysis Version 2, a program developed with funding from the NIH. This course is intended for people who perform or interpret meta-analyses. Attendees should have some familiarity with meta-analysis, but the course will cover the basics before moving on to advanced topics. FEE: $50
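As a rough illustration of the precision gain this abstract describes, the sketch below pools study effects with inverse-variance weights under a fixed-effect assumption and computes Cochran's Q as a simple heterogeneity statistic. It is not code from Comprehensive Meta Analysis; the function name and the three study values are invented for illustration.

```python
def fixed_effect_meta(effects, variances):
    """Pool study effects with inverse-variance weights (fixed-effect model).

    Returns the pooled effect, its variance, and Cochran's Q.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_var = 1.0 / total_weight
    # Cochran's Q: weighted squared deviations of each study from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return pooled, pooled_var, q


# Hypothetical effect sizes and sampling variances for three studies.
effects = [0.30, 0.10, 0.25]
variances = [0.04, 0.02, 0.05]
pooled, pooled_var, q = fixed_effect_meta(effects, variances)
print(pooled, pooled_var, q)
```

Under the fixed-effect assumption, the pooled variance is the reciprocal of the summed weights, so it is always smaller than the variance of any single study. A random-effects analysis would add a between-study variance component to each weight.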


CE_30T

8:00 a.m.–9:45 a.m.
Title: Determining Sample Size and Power in Study Planning: nQuery Advisor 7.0
Instructors: Janet D. Elashoff and Brian Sullivan

Abstract:
Choosing an adequate sample size is a vital part of study planning. We will review statistical methods for determining study sample size and power. Using nQuery Advisor with real examples, we will demonstrate the steps in sample size determination from specifying the design to writing a sample size justification statement. We will provide tips for the toughest problem in sample size determination, eliciting the information needed to specify “effect” sizes and “guesstimate” standard deviations, and we encourage discussion. We will illustrate sample size planning for survival studies with user-specified hazard ratios and illustrate the effects of accrual and dropout patterns on required sample size. We will show the relationships between sample size methods for tests, confidence intervals, and noninferiority and equivalence studies. We will discuss the logistical and power issues of unequal n’s and stratification and show how to make the last step in study planning, the creation of randomization lists, easy. Attendees should be experienced in the use of data analysis methods commonly taught in master’s programs in statistics. FEE: $50


CE_31T

8:00 a.m.–9:45 a.m.
Title: An Introduction to Stat Studio for SAS/STAT Users
Instructor: Rick Wicklin

Abstract:
Stat Studio 3.1 is new statistical software in SAS 9.2. It provides a highly flexible programming environment in which you can run SAS/STAT and SAS/IML analyses and display the results with dynamically linked graphics and data tables. You can also call SAS procedures from an IML program. Stat Studio is intended for data analysts who write SAS programs to solve statistical problems but need more versatility for data exploration and model building. This workshop introduces Stat Studio to SAS/STAT users. You will learn how to use the point-and-click features of Stat Studio for analyzing data interactively, write programs that use interactive graphics to display diagnostic statistics computed by SAS/STAT procedures for model assessment and outlier identification, and write programs that implement modern statistical methods, such as bootstrap algorithms and nonparametric smoothing techniques. Attendees should have basic knowledge of SAS/STAT procedures such as FREQ, REG, and LOGISTIC. Experience with SAS/IML and object-oriented programming is helpful, but not required. FEE: $50


CE_32T
8:00 a.m.–9:45 a.m.
Title: From Software to Solutions in Statistics and Risk Analysis
Instructor: Shawn Harahush

Abstract:
Choosing the software a business will use to manage its incoming data has made the world of business and education more complex. Palisade, a world leader in risk analysis, has been creating software solutions for more than 20 years. Palisade’s flagship product, @RISK, integrates into Microsoft Excel to provide a powerful Monte Carlo simulation engine in an easy-to-use environment. Palisade’s StatTools also integrates with Excel to provide reliable and easy-to-use statistics in an easily accessible program. NeuralTools adds sophisticated neural network analysis to an easy-to-use and familiar interface: Microsoft Excel. FEE: $50


CE_33T
10:00 a.m.–11:45 a.m.
Title: EastAdapt: A Module for Late Stage Adaptive Trial Design Within the East® 5 Software System
Instructor: Cyrus Mehta

Abstract:
We will demonstrate EastAdapt®, a major upgrade of the adaptive design module of East that is used for designing and simulating late-stage (phase II and phase III) clinical trials. EastAdapt makes it possible to design clinical trials with a data-dependent mid-course correction to sample size, spending function, and number of future interim analyses and their spacing without inflating the type I error. EastAdapt’s simulations are used to determine the operating characteristics of the adaptive design and compare them to those of a classical group sequential design. A major new capability is the ability to compute valid p-values, point estimates, and confidence intervals at the end of the adaptive clinical trial. Another major new EastAdapt capability is the ACR method for performing the adaptive hypothesis test. With the ACR method, one can use the usual sufficient statistic, rather than the Cui, Hung, and Wang (1999) weighted statistic, to determine statistical significance. FEE: $50


CE_34T
10:00 a.m.–11:45 a.m.
Title: Survey Data Analysis with Stata
Instructor: Jeffrey Pitblado

Abstract:
This workshop will cover how to use Stata for survey data analysis assuming a fixed population. Knowledge of Stata is not required, but attendees should have some statistical knowledge, such as what is typically covered in an introductory statistics course. We will begin by reviewing the sampling methods used to collect survey data and how they affect the estimation of totals, ratios, and regression coefficients. We will then cover the three variance estimators implemented in Stata’s survey estimation commands. Strata with a single sampling unit, certainty sampling units, subpopulation estimation, and poststratification will also be covered. Each topic will be illustrated with an example in a Stata session. FEE: $50


CE_35T
10:00 a.m.–11:45 a.m.
Title: Nonparametric Regression Modeling in SAS Software
Instructor: Weijei Cai

Abstract:
Nonparametric modeling is widely employed in modern statistical analysis in cases where only limited knowledge of the underlying model is available. You can use nonparametric modeling to discover nonlinear dependencies in your data, enabling you to develop parsimonious parametric models. This workshop is intended for a broad audience of statisticians and data analysts who are interested in nonparametric regression modeling. In it, I will describe methods and SAS tools for fitting local regression models with the LOESS procedure, penalized spline models with the TRANSREG procedure, thin-plate spline models with the TPSPLINE procedure, generalized additive models with the GAM procedure, penalized spline and radial basis function models using a mixed model approach with the GLIMMIX procedure, and selected basis functions models with the GLMSELECT procedure. The audience should have a basic understanding of regression theory. FEE: $50


CE_36T
10:00 a.m.–11:45 a.m.
Title: Introduction to CART: Data Mining with Decision Trees
Instructor: Mikhail Golovnya

Abstract:
This course, intended for the applied statistician wanting to understand and apply the CART methodology for tree-structured nonparametric data analysis, will emphasize practical data analysis involving classification. All concepts will be illustrated using real-world examples. The course will begin with an intuitive introduction to tree-structured analysis. Working through examples, we will review how to read CART output and set up basic analysis. This session will include performance evaluation of CART trees and cover ways to search for possible improvements of the results. Once a basic working knowledge of CART has been mastered, we will focus on critical details essential for advanced CART applications, including choice of splitting criteria, choosing the best split, using prior probabilities to shape results, refining results with differential misclassification costs, the meaning of cross validation, tree growing, and tree pruning. The course will conclude with discussion of the comparative performance of CART versus other computer-intensive methods, such as artificial neural networks and statistician-generated parametric models. FEE: $50


CE_37T
1:00 p.m.–2:45 p.m.
Title: New Software for the Design, Analysis and Reporting of Bioequivalence and Clinical Pharmacology Trials
Instructors: Yannis Jemiai and Pralay Senchuadhuri

Abstract:
Cytel Inc. introduces a software package for the design, analysis, and reporting of early phase clinical pharmacology trials, as well as pivotal and nonpivotal bioequivalence trials. Key development members from Cytel Inc. will demonstrate how to quickly design parallel and crossover clinical trials with superiority, noninferiority, or equivalence objectives; create, import, and explore data sets; produce and compare analyses and plots; and construct standardized templates to generate standardized reports of your work.
FEE: $50


CE_38T
1:00 p.m.–2:45 p.m.
Title: New Procedures and Features for Clustered and Survey Data Analysis in SUDAAN® Release 10
Instructors: Angela Pitts and G. Gordon Brown

Abstract:
This workshop will highlight two new procedures and several new features in SUDAAN Release 10, which will be available in early August 2008. SUDAAN is a statistical software package for the analysis of complex survey and other cluster-correlated data. We will focus on the new PROC HOTDECK procedure that conducts sequential weighted hot deck imputation; the PROC WTADJUST procedure that computes weight adjustments; and the addition of model-adjusted risk ratios, a test for the proportional odds assumption, exponentiated point estimates defined by EFFECTS statement contrasts, a SORTED option on the NEST statement, the use of character variables in all procedures, and several enhancements to the PRINT statement. The workshop will include a brief introduction to SUDAAN syntax. Attendees are not required to be SUDAAN users, but should have knowledge of statistical issues that arise when analyzing survey and other correlated data. The new SUDAAN features will be demonstrated on complex survey data. We will demonstrate proper implementation of SUDAAN, provide interpretation of the output, and discuss statistical issues related to the data. All course material, including a 30-day trial version of SUDAAN Release 10, will be provided. FEE: $50


CE_39T
1:00 p.m.–2:45 p.m.
Title: Introduction to Bayesian Analysis Using SAS Software
Instructor: Fang Chen

Abstract:
Bayesian methods have become increasingly popular in recent years in a number of disciplines. This workshop will provide an introduction to Bayesian methods with applications in the generalized linear model and survival analysis. The first part will provide an overview of Bayesian methodology, including motivation and Bayesian inference, and computational methods and convergence diagnostics relevant to the SAS implementation. The second part will cover applications using new capabilities in SAS/STAT software in the GENMOD, LIFEREG, and PHREG procedures, which are based on Gibbs sampling. Examples will include linear regression, logistic regression, Poisson regression, Cox regression, parametric survival models, and the piecewise exponential model. Note that these enhanced procedures are available in the newly released SAS 9.2.

A master’s-level knowledge of statistics is assumed, as well as experience with generalized linear models and survival analysis. Previous exposure to Bayesian methods is useful, but not required. FEE: $50


CE_40T
1:00 p.m.–2:45 p.m.
Title: Introduction to MARS: Predictive Modeling with Nonlinear Automated Regression Tools
Instructor: Mikhail Golovnya

Abstract:
This workshop will introduce the main concepts behind Jerome Friedman’s MARS, a modern regression tool that can help analysts quickly develop superior predictive models. MARS is a nonlinear automated regression tool that can trace complex patterns in data. It automates the model specification search, including variable selection, variable transformation, interaction detection, missing value handling, and model validation. Conventional regression models typically fit straight lines to data. Although this usually oversimplifies the data structure, the approximation is sometimes good enough for practical purposes. However, in the frequent situations in which a straight line is inappropriate, an expert modeler must search tediously for transformations to find the right curve. MARS approaches model construction more flexibly, allowing for bends, thresholds, and other departures from straight lines from the beginning. Attendees will be presented with MARS’ key benefits. FEE: $50


CE_41T
3:00 p.m.–4:45 p.m.
Title: Exact Methods Module for East® 5: Design, Simulate, Analyze, and Monitor Binomial Endpoint Trials by Exact Inference Methods
Instructors: Anthiyur Kannappan and Pralay Senchuadhuri

Abstract:
We will present a newly added special module, “Exact Methods” for East® 5, to design, simulate, analyze, and monitor binomial endpoint trials by exact inference methods. This module includes procedures for Simon’s two-stage optimal, one sample (group sequential), paired proportions, two sample superiority (difference, ratio, Fisher’s), two sample noninferiority (difference, ratio), and two sample equivalence. This module will be especially suitable to situations where the sample sizes are not expected to be large. The usual features of East®—boundary chart, enhanced simulation, and interim monitoring capability—are available for the procedures in this module. The additional feature is the ability to input 2x2 data in the interim monitoring sheet and get the exact inference method results there. FEE: $50


CE_42T
3:00 p.m.–4:45 p.m.
Title: Structural Analysis of Time Series Using the SAS/ETS UCM Procedure
Instructor: Rajesh Seluker

Abstract:
This workshop will introduce the SAS/ETS UCM procedure, which enables analysis of time series data by using structural models. Structural models provide regression-like decomposition of the response series into components such as trend, seasonal or other periodic, and linear and nonlinear regression effects. Apart from the series forecasts, this methodology provides estimates of these unobserved components, which are useful in practical decisionmaking. Participants will learn to identify, diagnose, and use structural time series models for time series data in a variety of situations. The course will cover novel time series techniques, including approximation of long and complex seasonal patterns by using splines and incorporation of linear and nonlinear regression effects with time varying coefficients. Several real-life examples will be used to demonstrate the functionality of the UCM procedure. Participants also will learn the relationship between the ARIMA models—another class of models widely used for analyzing time series data—and structural models. FEE: $50


CE_43T
3:00 p.m.–4:45 p.m.
Title: Advances in Data Mining: Jerome Friedman’s TreeNet/MART and Leo Breiman’s Random Forests
Instructor: Mikhail Golovnya

Abstract:
This workshop will present Leo Breiman’s Random Forests and Jerome Friedman’s TreeNet/MART. Random Forests and TreeNet/MART are advances in classification and regression tree software that enable the modeler to construct predictive models of extraordinary accuracy. Random Forests is a tree-based procedure that makes use of bootstrapping and random feature generation. In TreeNet, classification and regression models are built gradually through a potentially large collection of small trees, each of which improves on its predecessor through an error-correcting strategy. I will show how the software is used to solve real-world data mining problems, discuss theory and what is novel in the software, highlight implementation, compare the two methodologies, and show where the software fits in terms of other data mining software. FEE: $50
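The error-correcting idea behind a collection of small trees can be sketched with generic gradient boosting on squared error, where each one-split tree (a "stump") is fit to the residuals left by the trees before it. This is a minimal illustration of the general technique, not TreeNet/MART code; the function names, the learning rate, and the toy data are assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) for the best single split."""
    best = None
    # Every distinct x except the largest is a threshold that leaves both sides non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]


def boost(xs, ys, n_trees=50, rate=0.1):
    """Fit stumps one after another, each to the current residuals."""
    base = sum(ys) / len(ys)          # start from the overall mean
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        # Shrink each correction by the learning rate before adding it.
        preds = [p + rate * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, stumps, preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical toy data with a step-like pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]
base, stumps, preds = boost(xs, ys)
print(mse(ys, [base] * len(ys)), mse(ys, preds))
```

The learning rate trades speed for stability: smaller values need more trees but make each correction more cautious.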

 

Key Dates

  • August 2–7, 2008 - Onsite registration (increased fees apply).
  • August 15, 2008 - Online submission of JSM Proceedings will open.
  • October 27, 2008 - JSM Proceedings online submissions and editing will close.