CE Program for JSM 2008
Register for the courses and workshops through the main JSM registration.
Registration fees in parentheses are the on-site rates.
Continuing Education Courses
Saturday, August 2
Sunday, August 3
Monday, August 4
Tuesday, August 5
Computer Technology Workshops
SATURDAY, AUGUST 2
CE_01C (two-day course Saturday and Sunday)
8:30 a.m.–5:00 p.m.
Title: Generalized Linear Mixed Models: Theory and Applications
Instructors: Oliver Schabenberger and Walter Stroup
Abstract:
This two-day course is for those who want to learn about the theory and application
of generalized linear mixed models across disciplines from a non-Bayesian
perspective. Each day comprises theory and application components with numerous
examples. The material is presented at an applied level, accessible to participants
with training in linear statistical models and previous exposure to linear
mixed models.
On the first day, we will cover classes of mixed models and how their features
are made manifest in today’s likelihood-based estimation methods. We
will make the connection between linear models, generalized linear models,
linear mixed models, and generalized linear mixed models (GLMM) in terms of
model formulation, distributional properties, and approaches to estimation.
Participants will learn that GLMMs are an encompassing family and understand
the differences and similarities in approaches to estimation and inference
within the family. We will discuss overarching issues that confront analysts
who work with correlated, non-normal data, such as overdispersion, the marginal
and conditional models, and model diagnostics.
During the second day, we will focus on application areas for GLMMs and examples;
supporting theory will be introduced as needed. Focus areas will include
modeling of rates and proportions, modeling of regular and zero-inflated
counts, mixed model smoothing, the computation of power and sample size,
and inferential tasks with and without adjustments. Computations will be
based on the mixed model tools in SAS/STAT software.
FEES: M–$575 ($735), NM–$700 ($865), S–$340 ($550)
CE_02
8:30 a.m.–5:00 p.m.
Title: Genetic and Microarray Data Analysis
Instructors: Russell D. Wolfinger and Carl Langefeld
Abstract:
This course is for statisticians who wish to learn about statistical genetics,
microarray data analysis, and prediction with genomic biomarkers. Course
content will be at the intermediate level. If time permits, we will cover
topics such as copy number, exon arrays, ChIP-on-Chip, and eQTL. There will
be a mixture of theory and practical examples. JMP Genomics software and
custom scripts will be used for illustration.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: DNA Microarrays and Related Techniques (Chapman & Hall) by Allison, D.B. et al.
CE_03C
8:30 a.m.–5:00 p.m.
Title: Optimal Experimental Designs
Cosponsor: Section on Physical and Engineering Sciences
Instructors: Alexander N. Donev and Randy Tobias
Abstract:
Optimal design for the practitioner is often discussed as a “black box,” shying
away from the theory. On the contrary, the premise for this course is that
powerful practical approaches for assessing the properties of standard designs
and of finding good designs in nonstandard situations result from familiarity
with the theory of optimal experimental design. We will start by covering fundamental
theory, including forms of the General Equivalence Theorem that are central
to algorithms for the construction of optimal designs. These ideas will be
illustrated with standard designs for response surface models. We will move
on to common nonstandard problems in design for response surfaces, such as
blocking, finding designs over irregular regions, and mixture designs. We will
also discuss the augmentation of designs and designs for checking the adequacy
of models. Many models in chemistry and the pharmaceutical industry are nonlinear
in the parameters. Optimal designs for these models depend on prior information
about the parameters, which may be available in the form of a prior distribution.
We will show how this information may be used to provide good designs.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Statistics for Experimenters: Design, Innovation, and Discovery, 2nd Edition (Wiley) by Box, G.E.P., Hunter, W.G., and Hunter, J.S.
CE_04C
8:30 a.m.–5:00 p.m.
Title: Regression Modeling Strategies
Instructor: Frank E. Harrell, Jr.
Abstract:
All standard regression models have assumptions that must be verified for the
model to have power to test hypotheses and predict accurately. Of the principal
assumptions, this course will emphasize methods for assessing and satisfying
linearity and additivity. Practical but powerful tools will be presented
for validating model assumptions and presenting model results. This course
provides methods for estimating the shape of the relationship between predictors
and response by augmenting the design matrix using restricted cubic splines.
Even when assumptions are satisfied, overfitting can ruin a model’s
predictive ability for future observations. Methods for data reduction will
be introduced, methods of model validation will be covered, and auxiliary
topics such as modeling interaction surfaces, efficiently utilizing partial
covariable data by using multiple imputation, variable selection, overly
influential observations, collinearity, and shrinkage will be discussed.
The methods covered will apply to almost any regression model, including
ordinary least squares, logistic regression models, and survival models.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Applied Regression Analysis (Wiley) by Draper, N.R., and Smith, H.
CE_05C
8:30 a.m.–5:00 p.m.
Title: Hot Topics in Clinical Trials
Cosponsors: Teaching Statistics in the Health Sciences, Boston Chapter of the
ASA
Instructors: Scott R. Evans, Lee-Jen Wei, Lu Tian, Lingling Li
Abstract:
We will address several hot-topic areas in clinical trials, including the use
of prediction to identify biomarkers, meta-analysis of rare safety events,
data monitoring committees, data monitoring using prediction, noninferiority
studies, causal inference, benefit:risk assessment, and bridging studies.
We will present motivating examples and discuss standard and novel approaches
to analyses.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Introduction to Statistical Methods for Clinical Trials (Chapman & Hall) by Cook, T.D., and DeMets, D.L.
CE_06C
8:30 a.m.–5:00 p.m.
Title: Successful Data Mining in Practice
Instructor: Richard De Veaux
Abstract:
This course will introduce data mining, which is the exploration and analysis
of large data sets by automatic or semiautomatic means with the purpose of
discovering meaningful patterns. The knowledge learned from these patterns
can be used for decisionmaking via “knowledge discovery.” Much
exploratory data analysis and inferential statistics concern the same type
of problems, so what is different about data mining? What is similar? In
the course, I will attempt to answer these questions by providing a broad
survey of the problems that motivate data mining and the approaches used
to solve them.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
SUNDAY, AUGUST 3
CE_01C (two-day course Saturday and Sunday)
8:30 a.m.–5:00 p.m.
Title: Generalized Linear Mixed Models: Theory and Applications
Instructors: Oliver Schabenberger and Walter Stroup
CE_07C
8:00 a.m.–noon
Title: Design and Analysis of Epidemiologic Studies of Gene-Environment Interactions
Cosponsor: Section on Statistics in Epidemiology
Instructors: Raymond Carroll and Nilanjan Chatterjee
Abstract:
Most common human diseases have a multifactorial etiology involving a complex
interplay of genetic and environmental exposures. Understanding how genetic
and environmental exposures interact and jointly influence the risk of a
complex disease can be important for both biological and public health purposes.
We will present the state of the art of efficient design and analysis for
studies of gene-environment interaction by statisticians, epidemiologists,
and geneticists. Topics covered will include population- and family-based
case-control designs, stratified sampling designs, modern semiparametric
methods for analysis of case-control data, estimation of haplotype-environment
interactions, and flexible modeling approaches to empirical Bayes methods.
We will blend theory and applications with illustrations using real examples
and software implementation.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Applied Logistic Regression (Wiley) by Hosmer, D.W. and Lemeshow, S.
CE_08C
8:30 a.m.–5:00 p.m.
Title: Modern Practical Bayesian Clinical Trial Design
Cosponsor: Section on Bayesian Statistical Science
Instructors: Peter F. Thall and J. Kyle Wathen
Abstract:
We will cover practical Bayesian methods for clinical trial design and conduct.
Attendees should have a master’s degree in statistics, or equivalent
experience, and an understanding of elementary Bayesian concepts. There will
be numerous illustrations using actual clinical trials. Drawn from oncology,
examples will include methods for eliciting and calibrating priors, incorporating
historical data, and using computer simulation to establish a design’s
frequentist properties. The morning will cover phase I designs—including
dose-finding using the continual reassessment method and logistic regression,
finding optimal dose pairs of a two-agent combination, and accommodating
multiple toxicities—and phase II designs, including a paradigm for
monitoring multiple discrete outcomes, randomized phase II trials, monitoring
event times, hierarchical Bayesian methods for trials with multiple disease
subtypes, and using regression to account for patient heterogeneity. The
afternoon will cover phase I/II dose-finding based on efficacy-toxicity trade-offs,
optimizing schedule of administration, jointly optimizing dose and schedule,
adaptive randomization, a geometric approach to treatment comparison based
on two-dimensional parameters, and designs to evaluate multistage dynamic
treatment regimes.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Bayesian Data Analysis (Chapman & Hall) by Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B.
CE_09C
8:30 a.m.–5:00 p.m.
Title: Statistical Challenges in Proteomics
Cosponsor: Biometrics Section
Instructor: Scott C. Schmidler
Abstract:
Proteomics is the next frontier in the rapidly evolving field of bioinformatics.
I will provide an introduction to the principal aims, technologies, and statistical
issues arising in structural and functional proteomics studies. Topics will
include experimental data sources (e.g., X-ray, NMR, mass spectrometry [MALDI,
SELDI, MS/MS], peptide arrays), statistical problems in structural proteomics
(e.g., molecular comparison and database search, classification of structures,
structure-based function prediction), and statistical problems in functional
proteomics (e.g., fragment identification, normalization and registration
of spectra, peak finding, sample comparison, classification and biomarker
identification).
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
CE_10C
8:30 a.m.–5:00 p.m.
Title: Systematically Improving Your Professional Practice
Instructor: Doug Zahn
Abstract:
The key to professional growth is becoming increasingly effective in the interactions
that constitute professional practice. Central to this growth are processes
for structuring effective interactions and dealing with breakdowns that inevitably
occur. Participants will enhance their professional development by learning
a process for engaging in effective interactions that focuses on five activities:
prepare, open, work, end, and reflect. While this knowledge is valuable,
by itself it is not enough to markedly improve your practice. Improvement
requires facing the uncomfortable fact that no matter how carefully we plan,
breakdowns inevitably occur. The second objective for the course is learning
a process for dealing with breakdowns. You will learn to use the processes
by doing, observing, and analyzing videos of three role-plays of three different
situations that arise in professional practice. (We will construct these
situations from pre-course information gathered from participants.) You will
participate once as “consultant,” once as “client,” and
once as observer of a role-play. It is amazing to discover your own responses
to difficult situations through the objective lens of the camera. Using the
tools in the workshop and your new awareness, you will have more effective
interactions when you return to work. Prerequisite: At least one year of
professional practice.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
CE_11C
8:30 a.m.–5:00 p.m.
Title: Principles of Statistical Design
Instructor: George Casella
Abstract:
We will cover the principles and practice of statistical design, paying attention
to the setup and implementation of an experiment and the underlying theory
that allows valid inferences. The course will begin with a review of the
basic tools for statistical design and the statistical package R. The more
common designs (e.g., factorial completely randomized designs, randomized
complete blocks) and their variations (e.g., Latin squares) will be covered.
Emphasis will be on designing the experiment to obtain the best inference
on treatment contrasts, and designs will be illustrated with real data problems.
We will focus on microarray designs and spend a lot of time on split plots
and their variations (e.g., strip plot, repeated measures). Finally, we will
move to confounding (e.g., incomplete blocks, fractions). This course is
aimed at professional-level statisticians or interested faculty and graduate
students.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
Attendees should have a working knowledge of statistical methodology and data analysis. Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Applied Regression Analysis: A Research Tool (Springer) by Rawlings, J.O., Pantula, S.G., and Dickey, D.A. or Applied Linear Statistical Models (Irwin Professional Pub) by Neter, J., Wasserman, W., Kutner, M.H., and Nachtsheim, C.J.
CE_12C
1:00 p.m.–5:00 p.m.
Title: Sampling in Networks
Cosponsor: Section on Survey Research Methods
Instructor: Steven K. Thompson
Abstract:
Network models are in increasing use to describe populations, including socially
networked human populations, computer and communication networks, and gene
regulatory networks. A network has nodes (e.g., people) and links (e.g.,
relationships between people). The nodes may have characteristics of interest,
and the relationships may be of different types and strengths. Network data,
however, generally represent a sample from the wider population network of
interest. This short course will cover methods for obtaining samples from
networks and using the sample data to make inference about characteristics
of the population network.
In many cases the only practical way to obtain a large enough sample from
the population is to follow links from sample individuals to add more individuals
to the sample. For example, in studies of the risk behaviors in people at risk
for HIV/AIDS, the population is hidden so standard sampling designs cannot
be applied. Instead, researchers follow social referrals from individuals in
the sample to find more members of the hidden population. Similarly, in studies
of the World Wide Web, links or connections from sites in the sample are followed
to add more sites to the sample. Network methods also turn out to be useful
for spatial sampling in environmental and ecological sciences where the populations
tend to be highly clustered or rare. Link-tracing sampling designs will be
described, together with design-based and Bayes methods for estimating population
characteristics based on such samples. Computational methods and available
software also will be described.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)
Course attendees should consider as a prerequisite for the course familiarity
with the subject at the level of: Sampling, 2nd Edition (Wiley) by Thompson,
S.K.
MONDAY, AUGUST 4
CE_13C
8:00 a.m.–noon
Title: Evaluating Probability of Success for Internal Decisionmaking in Early
Drug Development
Cosponsor: Biopharmaceutical Section
Instructors: Narinder Nangia, Martin King, and Jane Qian
See a Sneak Preview of this course
Abstract:
Early development (the “learning stage”) is a crucial period of
the drug development process, as decisions to continue or halt development
of a compound must be made with incomplete information. Relying solely on p-values
from phase I-II studies for making drug development milestone decisions is
an inefficient approach, as it ignores several important determinants of future
success. We will discuss the statistical tools that enable quantification of
the uncertainty associated with results coming from learning stage studies.
These tools use the Bayesian approach to exploit the totality of accumulated
data/knowledge in a formal way for internal decisionmaking in early drug development.
Posterior and/or predictive probabilities computed in a Bayesian paradigm are
easy to interpret and provide much more relevant information than p-values
for decisionmaking. We will also discuss evaluation of probability of a successful
phase III trial through clinical trial simulations. Examples from the CNS,
inflammation, and oncology therapeutic areas will be considered for evaluation
of probability of success for drug candidates in meeting the target product
profile.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)
Participants should have a basic familiarity with hypothesis testing and inference (any first semester graduate course) and with Bayesian methods.
CE_14C
8:00 a.m.–noon
Title: U-Statistics for Scoring Multivariate Data: From Sports to Genetics
Instructors: Knut M. Wittkowski and Tingting Song
Abstract:
We will extend commonly used U-statistics for univariate and censored data
to multivariate data with innovative applications in sports, economics, sociology,
biology, and medicine. The course consists of four parts: stratification
as a means to improve McNemar-type tests for trio data in genetics (‘TDT’)
and adapt them to various genetic models; history of U-statistics; how information
about relationships between variables can be incorporated through transforming
data, converting data into partial orderings, and combining partial orderings;
and computational and statistical aspects of screening studies involving
thousands of variables (SNP or gene-expression microarrays) and nonparametric “factor
analyses.” Demonstrations will be based on spreadsheets, functions
from muStat (available from http://cran.r-project.org and http://csan.insightful.com),
and web services available from http://muStat.rockefeller.edu. Prerequisites:
Basic knowledge of statistics and programming.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Nonparametrics: Statistical Methods Based on Ranks (Holden-Day) by Lehmann, E.L.
CE_15C
8:30 a.m.–5:00 p.m.
Title: Analysis of Clinical Trials: Theory and Applications
Cosponsor: Biopharmaceutical Section
Instructors: Christy Chuang-Stein, Alex Dmitrienko, and Keaven Anderson
Abstract:
We will cover analysis of stratified data, multiple comparisons and multiple
endpoints, and interim analysis and interim data monitoring by presenting
practical advice from experts, offering a well-balanced mix of theory and
applications, and discussing regulatory considerations. The discussed statistical
methods will be implemented using SAS software, and clinical trial examples
will be used for illustration. This course is for statisticians working in
the pharmaceutical or biotechnology industries, as well as contract research
organizations. It is equally beneficial to statisticians working in institutions
that deliver health care and government branches that conduct health care–related
research. Attendees must have basic knowledge of clinical trials. Familiarity
with drug development is highly desirable, but not necessary.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
CE_16C
8:30 a.m.–5:00 p.m.
Title: Graphics of Large Data Sets
Cosponsor: Section on Statistical Graphics
Instructors: Antony Unwin and Heike Hofmann
Abstract:
Graphics are great for exploring data, but how can they be used for looking
at the large data sets commonplace today? Large data sets bring new complications
and require different emphases and approaches. In this course, based on Graphics
of Large Datasets, we will discuss how to look at ways of visualizing large
datasets, whether large in number of cases, number of variables, or both.
Data visualization is useful for data cleaning, exploring data, identifying
trends and clusters, spotting local patterns, evaluating modeling output,
and presenting results. It is essential for exploratory data analysis and
data mining. Data analysts, statisticians, and computer scientists should
benefit from attending this course. Participants are welcome to bring laptops
and should have knowledge of standard statistical graphics and experience
carrying out data analysis. Either the software Mondrian (which can be downloaded
from stats.math.uni-augsburg.de/Mondrian/) or, if you use R, the R package iPlots
should be installed.
FEES: M–$365 ($475), NM–$460 ($570), S–$225 ($350)
CE_17C
8:30 a.m.–5:00 p.m.
Title: Statistical Evaluation of Medical Tests and Biomarkers for Classification
Cosponsor: Section on Statistics in Epidemiology
Instructors: Margaret S. Pepe, Holly Janes, and Todd Alonzo
Abstract:
Development of biomarkers and medical diagnostic devices has accelerated. Their
rigorous evaluation is a high priority for research, yet principles and techniques
for the design and analysis of these studies are not widely known. There
are fundamental differences among methods for therapeutic and etiologic studies.
Moreover, much basic methodology has developed recently. We will cover estimation
and comparison of Receiver Operating Characteristic (ROC) curves and describe
extensions to adjust for covariates that affect biomarker/test measurements.
For assessing factors associated with test performance, ROC regression methods
will be presented. We also will consider how to evaluate the benefit of a
new test when standard tests or clinical variables exist. Second, we will
consider the design of case-control studies most common in this field. Sample
size calculations and optimal choice of case-control ratio will be presented
and the attributes and limitations of matching controls to cases will be
discussed. Third, prospective studies will be considered. Finally, we will
discuss problems incurred when the gold standard reference test is, itself,
subject to error. A suite of freely available Stata programs will implement
analyses. Prerequisite: introductory statistics.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
CE_18C
8:30 a.m.–5:00 p.m.
Title: Computational Statistics: Methods for Monte Carlo Integration and Optimization
Cosponsor: Statistical Computing Section
Instructors: Jennifer A. Hoeting and Geof H. Givens
Abstract:
This course will consist of two parts: a morning session on Monte Carlo integration
strategies and an afternoon session on optimization methods. We will survey
a variety of techniques, ranging from classic to state-of-the-art. The course
will be based on Computational Statistics, and is aimed at quantitative scientists
and statisticians who are unfamiliar with these methods. Upper division undergraduate
mathematical literacy is recommended. Many problems in statistics require
the evaluation of integrals that cannot be solved analytically, particularly
in Bayesian statistics. We will cover Monte Carlo integration, importance
sampling and variance reduction techniques, and Markov chain Monte Carlo
methods. Optimization also plays a central role in statistics, particularly
in numerical maximum likelihood estimation. The afternoon session will cover
Newton-like methods, Gauss-Seidel iteration, tabu algorithms, simulated annealing,
genetic algorithms, and the EM algorithm and its variants. We seek to give
students a practical understanding of how and why existing methods work,
enabling them to use modern statistical methods effectively. We focus on
methodological concepts, and not details of computer programming. Examples
are drawn from diverse fields including bioinformatics, ecology, and medicine.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
The course will be based on the book "Computational Statistics," by G. Givens and J. Hoeting, Wiley.
CE_19C
1:00 p.m.–5:00 p.m.
Title: Methods of Identifying and Dealing with Overdispersed Regression Models
Instructor: Joseph Hilbe
Abstract:
We will define overdispersion in the context of binomial and count models and
specify the difference between apparent and real overdispersion and how to
identify each. We also will show methods that can be used to eradicate apparent
overdispersion from a model, as well as discuss methods used to deal with
real overdispersion.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Negative Binomial Regression (Cambridge University Press) by Hilbe, J.M.
CE_20C
1:00 p.m.–5:00 p.m.
Title: Adaptive Analysis of Data: Tests of Significance and Confidence Intervals
Instructor: Thomas W. O’Gorman
Abstract:
I will present several adaptive methods for the analysis of data, beginning
with a two-sample adaptive test, and then present an adaptive method of testing
any subset of coefficients in a multiple regression model. I will also describe
adaptive tests for interaction and main effects in the analysis of factorial
experiments and adaptive tests for slope. The advantage of adaptive tests
is that they are usually more powerful than the traditional tests for non-normal
error distributions. As there is little power loss with normal error distributions,
adaptive tests can be recommended for general use in studies having more
than 20 observations. For each adaptive test, we will compare its performance
to the traditional method, and I will show how to perform the test using
a SAS macro. Adaptive tests used in the analysis of repeated measurements
will be described and compared to the nonadaptive mixed model tests. In addition,
I will describe a method of computing adaptive confidence intervals.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)
Attendees should be familiar with basic statistical modeling—including
multiple regression and the analysis of variance. Course attendees should consider
as a prerequisite for the course familiarity with the subject at the level
of: Applied Regression Analysis (Wiley) by Draper, N.R. and Smith, H.
TUESDAY, AUGUST 5
CE_21C
8:00 a.m.–noon
Title: Analysis of Multivariate Failure Time Data
Instructor: Danyu Lin
Abstract:
Multivariate failure time data arise when each study subject can potentially
experience multiple events or when there exists clustering of subjects such
that failure times within the same cluster are correlated. Major complications
in analyzing such data include the dependence among related failure times
and censoring due to limited follow-up or competing events. This short course
presents a variety of statistical models and methods for the analysis of
multivariate failure time data. We will discuss marginal and frailty models,
paying primary attention to semiparametric regression methods. Relevant software
will be described, and a number of clinical and epidemiologic studies will
be provided for illustrations.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)
CE_22C
8:00 a.m.–noon
Title: Fundamental Statistics Concepts in Presenting Data: Principles for Constructing
Better Graphics
Cosponsor: Section on Statistical Graphics
Instructor: Rafe Donahue
Abstract:
Data displays are mental models for understanding distributions of data. At
the heart of any data display lies the distribution of the data; a model
for this distribution includes demonstrating and exposing sources of variation
in the distribution. Like a good map, a display of data ought to operate
on several levels. At the lowest level (the highest level of granularity)
are the data, themselves. Further up are the actual distributions, each with
its component summaries, such as the mean or relevant quantiles. At the highest
level are sources of variation in these distributions, the parameters in
the (mental) model for understanding the data. The closer an architect can
come to showing all these levels, the more information will be conveyed.
I will present a number of principles, both developed by the masters (e.g.,
Minard, Tufte, Cleveland, Wilkinson, Wainer) and discovered by me, for constructing
displays that will allow the architect of the data display to present the
data for improved understanding; it will not be a “Don’t use
pie charts” or “Here’s a bad graph from USA Today” course.
We will focus on uncovering and formulating principles for presenting data
visually. Examples will abound.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Visual Display of Quantitative Information (Graphics Press) by Tufte, E.
CE_23C
8:30 a.m.–5:00 p.m.
Title: Bayesian Methods and Software for Data Analysis
Cosponsor: Section on Bayesian Statistical Science
Instructors: Bradley P. Carlin and Thomas A. Louis
See a Sneak Preview of this course
Abstract:
This course will introduce hierarchical and empirical Bayes methods, demonstrate
their usefulness in challenging applied settings, and show how they can be
implemented using modern Markov chain Monte Carlo (MCMC) computational methods.
We will provide an introduction to and live demonstration of WinBUGS, the
most general Bayesian software package available to date, and BRugs, a convenient
function for calling BUGS from R. Use of the methods will be demonstrated
in advanced high-dimensional model settings (e.g., nonlinear longitudinal
modeling or spatiotemporal estimation and mapping), where the MCMC Bayesian
approach often provides the only feasible alternative incorporating all relevant
model features. Participants should have an MS (or advanced undergraduate)
understanding of mathematical statistics at the Hogg and Craig (1978) or
Casella and Berger (2001) level. Basic familiarity with common statistical
models (e.g., the linear regression model) and computing will be assumed,
but we will not assume significant previous exposure to Bayesian methods
or Bayesian computing. This course is aimed at students and practicing statisticians
who are intrigued by all the fuss about Bayes and Gibbs, but who may still
mistrust the approach as theoretically mysterious and practically cumbersome.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
CE_24C
8:30 a.m.–5:00 p.m.
Title: Models for Discrete Repeated Measures
Instructors: Geert Verbeke and Geert Molenberghs
See a Sneak Preview of this course (please note that the course has been moved to Tuesday, August 5, since this recording)
Abstract:
Starting from a brief introduction to the linear mixed model for continuous
longitudinal data, we will formulate extensions to model outcomes of a categorical
nature, including counts and binary data. Based on Verbeke and Molenberghs
(2005), several families of models will be discussed and compared, from an
interpretational and computational point of view. First, we will discuss
models for the full marginal distribution of the outcome vector. Such models
allow inference to be based on maximum likelihood principles, but they have
the disadvantage of requiring complete specification of all higher-order
interactions. We will talk about two alternatives: random-effects models
and semiparametric marginal models with specification of the first moments
only, or the first and second moments only. We will discuss and illustrate
estimation and inference in full detail, and we will argue at length that
the two approaches yield parameters with completely different interpretations.
Finally, when analyzing longitudinal data, one is often confronted with missing
observations. We will show that, if no appropriate measures are taken, missing
data can cause seriously biased results and interpretational difficulties.
Methods to properly analyze incomplete data, under flexible assumptions,
will be presented and key concepts of sensitivity analysis will be introduced.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Linear Mixed Models for Longitudinal Data (Springer) by Verbeke, G. and Molenberghs, G.
CE_25C
8:30 a.m.–5:00 p.m.
Title: Mixed Models for the Practicing Statistician
Cosponsor: Statistics and the Environment
Instructors: Linda Young and Ramon Littell
Abstract:
Data sets from designed experiments, sample surveys, and observational studies
often contain correlated observations due to random effects and repeated
measures. Mixed models can be used to accommodate the correlation structure,
produce efficient estimates of means and differences between means, and provide
valid estimates of standard errors. Repeated measures and longitudinal data
require special attention because they involve correlated data that arise
when the primary sampling units are measured repeatedly over time or under
different conditions. We will use normal theory models for random effects
and repeated measures ANOVA to introduce the concept of correlated data.
We will then extend these models to generalized linear mixed models for the
analysis of non-normal data, including binomial responses, Poisson counts,
and over-dispersed count data. We will discuss methods of assessing the fit
and deciding among competing models. Radial smoothing splines can be represented
as mixed models, and we will illustrate their application. We will illustrate
PROC GLIMMIX in the SAS system using practical examples from pharmaceutical
trials, environmental studies, educational research, and laboratory experiments.
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: An Introduction to Statistical Methods and Data Analysis (Duxbury) by Ott, R.L. and Longnecker, M.T.
CE_26C
8:30 a.m.–5:00 p.m.
Title: Multiple Imputation of Missing Data
Instructor: Paul Allison
Abstract:
This course will cover both the conceptual foundations and practical details
of implementing multiple imputation. Conventional methods for handling missing
data typically yield biased estimates and/or incorrect standard errors. By
contrast, multiple imputation produces estimates with nearly optimal properties
under weaker assumptions. I will explain the assumptions of “missing
at random” and “missing completely at random.” After a
brief review of conventional methods, we will consider multiple imputation
based on linear regression with random draws. We will examine implementation
using the MCMC algorithm in SAS PROC MI in detail, and then move on to the
role of the dependent variable, imputation under a restricted range, imputation
of categorical variables, multivariate inference, interactions and nonlinearities,
congeniality of data model and imputation model, longitudinal data, nonignorable
missing data, and imputation by chained equations (demonstrated using the
ice command in Stata).
FEES: M–$340 ($450), NM–$435 ($545), S–$200 ($325)
Course attendees should consider as a prerequisite for the course familiarity with the subject at the level of: Introduction to Linear Regression Analysis (Wiley) by Montgomery, D.C., Peck, E.A., and Vining, G.G.
CE_27C
1:00 p.m.–5:00 p.m.
Title: Meta-analysis: Statistical Methods for Combining the Results of Independent
Studies
Instructor: Ingram Olkin
Abstract:
Meta-analysis enables researchers to synthesize the results of a number of
independent studies designed to determine the effect of an experimental protocol,
such as an intervention, so the combined weight of evidence can be considered
and applied. Increasingly, meta-analysis is being used in the health sciences,
education, and economics to augment traditional methods of narrative research
by systematically aggregating and quantifying research literature. The information
explosion in almost every field coupled with the movement toward evidence-based
decision making and cost-effectiveness analysis has served as a catalyst for the
development of procedures to synthesize the results of independent studies.
In this course, I will provide a historical perspective of meta-analysis
and discuss some of its issues. The statistical methodology will include
discussions of nonparametric and parametric models, effect sizes for proportions,
fixed versus random effects, regression, and ANOVA models. New material on
multivariate models also will be presented.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)
Course attendees should have a basic understanding of statistics including regression.
CE_28C
1:00 p.m.–5:00 p.m.
Title: Analysis of Censored Health Outcomes Data: Developments for the Last
10 Years
Cosponsors: Health Policy Statistics Section, Biopharmaceutical Section
Instructors: Hongwei Zhao and Heejung Bang
Abstract:
Medical cost and quality-adjusted lifetime are common health outcomes data
from clinical trials and observational studies. Although these data look
different, they share many statistical properties and can be understood in
a unified framework. As with standard survival data, censoring is an important
issue in these data. Despite the analogy, the censoring mechanism is informative,
unlike in the traditional paradigm. It has been a decade since it was
shown that the use of most standard statistical techniques (e.g., sample
mean, linear regression, and Kaplan-Meier estimator) can be invalid. However,
we often find that even experienced researchers still use traditional methods
for the analysis of health outcome data in practice. In this course, we will
review valid methods for statistical estimation and inference that have been
developed in the last 10 years. Unfortunately, not all are easy or user-friendly,
and no commercial software is available so far. Therefore, we will suggest
methods as practical solutions for practitioners. We also will present the
analytic relationships among well-known medical cost estimators recently
identified. Extended applications to customer lifetime value and cost-effectiveness
analysis will be discussed. The course prerequisite is basic knowledge of survival
analysis.
FEES: M–$210 ($285), NM–$275 ($345), S–$125 ($200)
Course attendees should consider as a prerequisite a basic knowledge of survival analysis at the level of: Survival Analysis (Springer) by Klein, J.P. and Moeschberger, M.L.
WEDNESDAY, AUGUST 6
Computer Technology Workshops
CE_29T
8:00 a.m.–9:45 a.m.
Title: Meta-analysis: Concepts and Applications
Instructors: Michael Borenstein and Hannah R. Rothstein
Abstract:
Meta-analysis is a set of statistical procedures to synthesize data from multiple
studies. When the studies share a common effect size, the meta-analysis yields
a more precise estimate of that effect than any single study, and when the
effect varies from one study to the next, meta-analysis may be used to explain
the variation. Meta-analyses are used to inform policy, obtain approval for
drugs, and design research. They also play a key role in grant applications
and publications. We will explain the concept of meta-analysis and show how
to compute treatment effects and a combined effect, assess heterogeneity,
and explain variation in treatment effects across studies. We will discuss
the difference between fixed and random effects models and address common
criticisms of meta-analysis. We will demonstrate Comprehensive Meta Analysis
Version 2, a program developed with funding from the NIH. This course is
intended for people who perform or interpret meta-analyses. Attendees should
have some familiarity with meta-analysis, but the course will cover the basics
before moving on to advanced topics. FEE: $50
CE_30T
8:00 a.m.–9:45 a.m.
Title: Determining Sample Size and Power in Study Planning: nQuery Advisor
7.0
Instructors: Janet D. Elashoff and Brian Sullivan
Abstract:
Choosing an adequate sample size is a vital part of study planning. We will
review statistical methods for determining study sample size and power. Using
nQuery Advisor with real examples, we will demonstrate the steps in sample
size determination from specifying the design to writing a sample size justification
statement. We will provide tips for the toughest problem in sample size determination—eliciting
the information needed to specify “effect” sizes and “guesstimate” standard
deviations—and we encourage discussion. We will illustrate sample size
planning for survival studies with user-specified hazard ratios and illustrate
the effects of accrual and dropout patterns on required sample size. We will
show the relationships between sample size methods for tests, confidence
intervals, and noninferiority and equivalence studies. We will discuss the
logistical and power issues of unequal n’s and stratification and show
how to make the last step in study planning, the creation of randomization
lists, easy. Attendees should be experienced in the use of data analysis
methods commonly taught in master’s programs in statistics. FEE: $50
CE_31T
8:00 a.m.–9:45 a.m.
Title: An Introduction to Stat Studio for SAS/STAT Users
Instructor: Rick Wicklin
Abstract:
Stat Studio 3.1 is new statistical software in SAS 9.2. It provides a highly
flexible programming environment in which you can run SAS/STAT and SAS/IML
analyses and display the results with dynamically linked graphics and data
tables. You can also call SAS procedures from an IML program. Stat Studio
is intended for data analysts who write SAS programs to solve statistical
problems but need more versatility for data exploration and model building.
This workshop introduces Stat Studio to SAS/STAT users. You will learn how
to use the point-and-click features of Stat Studio for analyzing data interactively,
write programs that use interactive graphics to display diagnostic statistics
computed by SAS/STAT procedures for model assessment and outlier identification,
and write programs that implement modern statistical methods, such as bootstrap
algorithms and nonparametric smoothing techniques.
Attendees should have basic knowledge of SAS/STAT procedures such as FREQ,
REG, and LOGISTIC. Experience with SAS/IML and object-oriented programming
is helpful, but not required. FEE: $50
CE_32T
8:00 a.m.–9:45 a.m.
Title: From Software to Solutions in Statistics and Risk Analysis
Instructor: Shawn Harahush
Abstract:
The world of business and education has become more complex, and with it the
choice of software an organization uses to successfully manage its
incoming data. Palisade, a world leader in risk analysis, has been creating
software solutions for more than 20 years. Palisade’s flagship product,
@RISK, integrates into Microsoft Excel to bring a powerful Monte Carlo
simulation engine to an easy-to-use environment. Palisade’s StatTools
also integrates with Excel to provide reliable and easy-to-use statistics
in an easily accessible program. NeuralTools adds sophisticated neural network
analysis to an easy-to-use and familiar interface: Microsoft Excel. FEE: $50
CE_33T
10:00 a.m.–11:45 a.m.
Title: EastAdapt: A Module for Late Stage Adaptive Trial Design Within the
East® 5 Software System
Instructor: Cyrus Mehta
Abstract:
We will demonstrate EastAdapt®, a major upgrade of the adaptive design
module of East that is used for designing and simulating late-stage (phase
II and phase III) clinical trials. EastAdapt makes it possible to design clinical
trials with a data-dependent mid-course correction to sample size, spending
function, and number of future interim analyses and their spacing without inflating
the type I error. EastAdapt’s simulations are used to determine the operating
characteristics of the adaptive design and compare them to those of a classical
group sequential design. A major new capability is the ability to compute valid
p-values, point estimates, and confidence intervals at the end of the adaptive
clinical trial. Another major new EastAdapt capability is the ACR Method for
performing the adaptive hypothesis test. With the ACR method, one can use the
usual sufficient statistic, rather than the Cui, Hung and Wang (1999) weighted
statistic to determine statistical significance. FEE: $50
CE_34T
10:00 a.m.–11:45 a.m.
Title: Survey Data Analysis with Stata
Instructor: Jeffrey Pitblado
Abstract:
This workshop will cover how to use Stata for survey data analysis assuming
a fixed population. Knowledge of Stata is not required, but attendees should
have some statistical knowledge, such as what is typically covered in an
introductory statistics course. We will begin by reviewing the sampling methods
used to collect survey data and how they affect the estimation of totals,
ratios, and regression coefficients. We will then cover the three variance
estimators implemented in Stata’s survey estimation commands. Strata
with a single sampling unit, certainty sampling units, subpopulation estimation,
and poststratification will also be covered. Each topic will be illustrated
with an example in a Stata session. FEE: $50
CE_35T
10:00 a.m.–11:45 a.m.
Title: Nonparametric Regression Modeling in SAS Software
Instructor: Weijie Cai
Abstract:
Nonparametric modeling is widely employed in modern statistical analysis in
cases where only limited knowledge of the underlying model is available.
You can use nonparametric modeling to discover nonlinear dependencies in
your data, enabling you to develop parsimonious parametric models. This workshop
is intended for a broad audience of statisticians and data analysts who are
interested in nonparametric regression modeling. In it, I will describe methods
and SAS tools for fitting local regression models with the LOESS procedure,
penalized spline models with the TRANSREG procedure, thin-plate spline models
with the TPSPLINE procedure, generalized additive models with the GAM procedure,
penalized spline and radial basis function models using a mixed model approach
with the GLIMMIX procedure, and selected basis functions models with the
GLMSELECT procedure. The audience should have a basic understanding of regression
theory. FEE: $50
CE_36T
10:00 a.m.–11:45 a.m.
Title: Introduction to CART: Data Mining with Decision Trees
Instructor: Mikhail Golovnya
Abstract:
This course, intended for the applied statistician wanting to understand and
apply the CART methodology for tree-structured nonparametric data analysis,
will emphasize practical data analysis involving classification. All concepts
will be illustrated using real-world examples. The course will begin with
an intuitive introduction to tree-structured analysis. Working through examples,
we will review how to read CART output and set up basic analysis. This session
will include performance evaluation of CART trees and cover ways to search
for possible improvements of the results. Once a basic working knowledge
of CART has been mastered, we will focus on critical details essential for
advanced CART applications, including choice of splitting criteria, choosing
the best split, using prior probabilities to shape results, refining results
with differential misclassification costs, the meaning of cross validation,
tree growing, and tree pruning. The course will conclude with discussion
of the comparative performance of CART versus other computer-intensive methods,
such as artificial neural networks and statistician-generated parametric
models. FEE: $50
CE_37T
1:00 p.m.–2:45 p.m.
Title: New Software for the Design, Analysis and Reporting of Bioequivalence
and Clinical Pharmacology Trials
Instructors: Yannis Jemiai and Pralay Senchaudhuri
Abstract:
Cytel Inc. introduces a software package for the design, analysis, and reporting
of early phase clinical pharmacology trials, as well as pivotal and nonpivotal
bioequivalence trials. Key development members from Cytel Inc. will demonstrate
how to quickly design parallel and crossover clinical trials with superiority,
noninferiority, or equivalence objectives; create, import, and explore data
sets; produce and compare analyses and plots; and construct standardized
templates to generate standardized reports of your work.
FEE: $50
CE_38T
1:00 p.m.–2:45 p.m.
Title: New Procedures and Features for Clustered and Survey Data Analysis in
SUDAAN® Release 10
Instructors: Angela Pitts and G. Gordon Brown
Abstract:
This workshop will highlight two new procedures and several new features in
SUDAAN Release 10, which will be available in early August 2008. SUDAAN is
a statistical software package for the analysis of complex survey and other
cluster-correlated data. We will focus on the new PROC HOTDECK procedure
that conducts sequential weighted hot deck imputation; the PROC WTADJUST
procedure that computes weight adjustments; and the addition of model-adjusted
risk ratios, a test for the proportional odds assumption, exponentiated point
estimates defined by EFFECTS statement contrasts, a SORTED option on the
NEST statement, the use of character variables in all procedures, and several
enhancements to the PRINT statement. The workshop will include a brief introduction
to SUDAAN syntax. Attendees are not required to be SUDAAN users, but should
have knowledge of statistical issues that arise when analyzing survey and
other correlated data. The new SUDAAN features will be demonstrated on complex
survey data. We will demonstrate proper implementation of SUDAAN, provide
interpretation of the output, and discuss statistical issues related to the
data. All course material, including a 30-day trial version of SUDAAN Release
10, will be provided. FEE: $50
CE_39T
1:00 p.m.–2:45 p.m.
Title: Introduction to Bayesian Analysis Using SAS Software
Instructor: Fang Chen
Abstract:
Bayesian methods have become increasingly popular in recent years in a number
of disciplines. This workshop will provide an introduction to Bayesian methods
with applications in the generalized linear model and survival analysis.
The first part will provide an overview of Bayesian methodology, including
motivation and Bayesian inference, and computational methods and convergence
diagnostics relevant to the SAS implementation. The second part will cover
applications using new capabilities in SAS/STAT software in the GENMOD, LIFEREG,
and PHREG procedures, which are based on Gibbs sampling. Examples will include
linear regression, logistic regression, Poisson regression, Cox regression,
parametric survival models, and the piecewise exponential model. Note that
these enhanced procedures are available in the newly released SAS 9.2.
A master’s-level knowledge of statistics is assumed, as well as experience
with generalized linear models and survival analysis. Previous exposure to
Bayesian methods is useful, but not required. FEE: $50
CE_40T
1:00 p.m.–2:45 p.m.
Title: Introduction to MARS: Predictive Modeling with Nonlinear Automated Regression
Tools
Instructor: Mikhail Golovnya
Abstract:
This workshop will introduce the main concepts behind Jerome Friedman’s
MARS, a modern regression tool that can help analysts quickly develop superior
predictive models. MARS is a nonlinear automated regression tool that can trace
complex patterns in data. It automates the model specification search, including
variable selection, variable transformation, interaction detection, missing
value handling, and model validation. Conventional regression models typically
fit straight lines to data. Although this usually oversimplifies the data structure,
the approximation is sometimes good enough for practical purposes. However,
in the frequent situations in which a straight line is inappropriate, an expert
modeler must search tediously for transformations to find the right curve.
MARS approaches model construction more flexibly, allowing for bends, thresholds,
and other departures from straight lines from the beginning. Attendees will
be presented with MARS’ key benefits. FEE: $50
CE_41T
3:00 p.m.–4:45 p.m.
Title: Exact Methods Module for East® 5: Design, Simulate, Analyze, and
Monitor Binomial Endpoint Trials by Exact Inference Methods
Instructors: Anthiyur Kannappan and Pralay Senchaudhuri
Abstract:
We will present a newly added special module, “Exact Methods” for
East® 5, to design, simulate, analyze, and monitor binomial endpoint trials
by exact inference methods. This module includes procedures for Simon’s
two-stage optimal, one-sample (group sequential), paired proportions, two-sample
superiority (difference, ratio, Fisher’s), two-sample noninferiority
(difference, ratio), and two-sample equivalence. This module will be especially
suitable to situations where the sample sizes are not expected to be large.
The usual features of East®—boundary chart, enhanced simulation,
and interim monitoring capability—are available for the procedures in
this module. The additional feature is the ability to input 2x2 data in the
interim monitoring sheet and get the exact inference method results there.
FEE: $50
CE_42T
3:00 p.m.–4:45 p.m.
Title: Structural Analysis of Time Series Using the SAS/ETS UCM Procedure
Instructor: Rajesh Selukar
Abstract:
This workshop will introduce the SAS/ETS UCM procedure, which enables analysis
of time series data by using structural models. Structural models provide
regression-like decomposition of the response series into components such
as trend, seasonal or other periodic, and linear and nonlinear regression
effects. Apart from the series forecasts, this methodology provides estimates
of these unobserved components, which are useful in practical decision making.
Participants will learn to identify, diagnose, and use structural time series
models for time series data in a variety of situations. The course will cover
novel time series techniques, including approximation of long and complex
seasonal patterns by using splines and incorporation of linear and nonlinear
regression effects with time varying coefficients. Several real-life examples
will be used to demonstrate the functionality of the UCM procedure. Participants
also will learn the relationship between the ARIMA models—another class
of models widely used for analyzing time series data—and structural
models. FEE: $50
CE_43T
3:00 p.m.–4:45 p.m.
Title: Advances in Data Mining: Jerome Friedman’s TreeNet/MART and Leo
Breiman’s Random Forests
Instructor: Mikhail Golovnya
Abstract:
This workshop will present Leo Breiman’s Random Forests and Jerome Friedman’s
TreeNet/MART. Random Forests and MART/TreeNet are advances to classification
and regression tree software, which enable the modeler to construct predictive
models of extraordinary accuracy. Random Forests is a tree-based procedure that
makes use of bootstrapping and random feature selection. In TreeNet, classification
and regression models are built gradually through a potentially large collection
of small trees, each of which improves on its predecessor through an error-correcting
strategy. I will show how the software is used to solve real-world data mining
problems, discuss theory and what is novel in the software, highlight implementation,
compare the two methodologies, and show where the software fits in terms of
other data mining software. FEE: $50
Key Dates
- August 2–7, 2008 - Onsite registration (increased fees apply)
- August 15, 2008 - Online submission of JSM Proceedings will open.
- October 27, 2008 - JSM Proceedings online submissions and editing will close.