# Program > Add-Ons

JSM sessions that require ticket purchase have limited availability and are therefore subject to sell-out or cancellation. Below are the functions that still have availability. Although this list is updated in real time, please bear in mind that tickets are sold online around the clock; if you plan to purchase a function ticket onsite and see the function on this list before you travel to JSM, we cannot guarantee it will still be available for purchase when you arrive. To find out how many tickets remain for a particular function, please contact the ASA at (703) 684-1221.

## Available Add-Ons

- Continuing Education and Computer Technology Workshops
- Monday Roundtables and Speaker Luncheons
- Tuesday Roundtables and Speaker Luncheons
- Wednesday Roundtables and Speaker Luncheons

#### Continuing Education and Computer Technology Workshops

**CE_01C (TWO-DAY COURSE) Foundations and Recent Advances in Longitudinal and Incomplete Data and in Joint Modeling**

Instructor(s): *Geert Molenberghs and Dimitris Rizopoulos*

*2011 Excellence-in-CE Award Winner*

We first present linear mixed models for continuous hierarchical data. The focus lies on the modeler's perspective and on applications. Emphasis will be on model formulation, parameter estimation, and hypothesis testing, as well as on the distinction between the random-effects (hierarchical) model and the implied marginal model. Apart from classical model-building strategies, many of which have been implemented in standard statistical software, a number of flexible extensions and additional tools for model diagnosis will be indicated. Second, models for non-Gaussian data will be discussed, with a strong emphasis on generalized estimating equations (GEE) and the generalized linear mixed model (GLMM). To usefully introduce this theme, a brief review of the classical generalized linear modeling framework will be presented. Similarities and differences with the continuous case will be discussed. The differences between marginal models, such as GEE, and random-effects models, such as the GLMM, will be explained in detail. Third, when analyzing hierarchical and longitudinal data, one is often confronted with missing observations, i.e., scheduled measurements that have not been made for a variety of (known or unknown) reasons. It will be shown that, if no appropriate measures are taken, missing data can seriously jeopardize results, and interpretation difficulties are bound to occur. Methods to properly analyze incomplete data, under flexible assumptions, are presented. Fourth, a selection of contemporary and highly relevant advances will be discussed:

- The joint modeling of longitudinal and time-to-event outcomes;

- Flexible modeling strategies for models with non-normally distributed random effects;

- Recent advances in model diagnostics;

- Modeling, fitting, and inferential strategies for (high-dimensional) multivariate longitudinal data;

- The use of longitudinal data for discrimination and classification;

- Robust and doubly robust estimation for incomplete data based on semi-parametric modeling (generalized estimating equations and pseudo-likelihood);

- Strategies to undertake sensitivity analysis when data are incomplete, with an eye on both theoretical developments and the regulatory framework for clinical trials and related studies.

Throughout the course, it is assumed that participants are familiar with basic statistical modeling, including linear models (regression and analysis of variance) and generalized linear models (logistic and Poisson regression). Prerequisite knowledge should also include general estimation and testing theory (maximum likelihood, likelihood ratio). All developments will be illustrated with worked examples using the SAS System.

**CE_05C Introduction to Analysis of Extremes: Univariate and Multivariate Cases**

Cosponsor: *Section on Statistics and the Environment*

Instructor(s): *Daniel Cooley*

Assessing the risk associated with extreme events requires an accurate description of a distribution's tail and may require the researcher to extrapolate into the tail beyond the range of the data. This course will introduce the ideas and techniques involved in the analysis of extremes. The first half of the course will be devoted to the analysis of univariate data and the second half to the analysis of multivariate data. Extreme value analyses are based on fundamental results from probability theory which provide distributions appropriate for modeling the tail. The initial portion of the course will introduce these fundamental results via demonstrations and examples (rather than mathematical proofs) so that attendees develop some intuition for the underlying theory. Attention will then turn to statistical analysis of extreme data and the techniques used to describe the tail. The multivariate portion of the course will largely focus on how dependence is described for extremes: via an angular measure rather than via correlation. The target audience is quantitative scientists and statisticians who are unfamiliar with these methods. Upper-division undergraduate mathematical literacy is required, and a basic understanding of mathematical statistics is recommended.
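
As a flavor of the univariate material, the block-maxima approach fits an extreme value distribution to per-block (e.g., annual) maxima and extrapolates to return levels. The sketch below is purely illustrative (the course description does not prescribe software, and the data here are simulated): it fits a Gumbel distribution by the method of moments and computes a 100-year return level. In practice one would usually fit the full generalized extreme value distribution by maximum likelihood; the moment fit is only a quick approximation.

```python
import numpy as np

def gumbel_moment_fit(maxima):
    """Method-of-moments estimates for a Gumbel (type I extreme value)
    distribution fitted to a sample of block maxima."""
    euler_gamma = 0.5772156649
    scale = np.std(maxima, ddof=1) * np.sqrt(6) / np.pi
    loc = np.mean(maxima) - euler_gamma * scale
    return loc, scale

def return_level(loc, scale, T):
    """Level exceeded on average once every T blocks (e.g., years)."""
    p = 1.0 - 1.0 / T
    return loc - scale * np.log(-np.log(p))

rng = np.random.default_rng(42)
# Simulate 50 "annual maxima", each the max of 365 daily observations.
daily = rng.gumbel(loc=10.0, scale=2.0, size=(50, 365))
annual_max = daily.max(axis=1)

loc, scale = gumbel_moment_fit(annual_max)
level_100 = return_level(loc, scale, 100)   # estimated 100-year return level
```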

**CE_07C Statistical Computing For Big Data**

Cosponsor: *Section on Statistical Learning and Data Mining*

Instructor(s): *Liang Zhang and Deepak Agarwal*

Massive data are generated, stored, and analyzed every day in fields such as bioinformatics, climatology, the internet, and telecommunications. Hadoop, a distributed file storage and computing system, has become one of the most widely used distributed systems in the world. Statistical methods for analyzing such large-scale data sets have become a challenging research area. The objective of this tutorial is to provide a detailed introduction to the open-source Hadoop system and its Map-Reduce framework and, more importantly, to illustrate the use of Map-Reduce and Hadoop for real statistical applications, starting from basics like computing means and variances and progressing to more complicated scenarios such as fitting a large-scale logistic regression on hundreds of gigabytes of data. Through this tutorial, the audience will learn to use Hadoop and Map-Reduce as tools for statistical analysis and to contribute to research on statistical methods for big data. No prior knowledge of Hadoop or Map-Reduce is required.
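
To give a flavor of the "means and variances" starting point: the essence of Map-Reduce is that mappers emit small sufficient statistics for their chunk of the data, and a reducer combines them into a global answer. A minimal single-machine Python sketch of that pattern (the course itself uses Hadoop; the chunking here merely mimics distributed storage):

```python
import numpy as np

def mapper(chunk):
    # Each mapper emits sufficient statistics (n, sum, sum of squares)
    # for its chunk of the data.
    x = np.asarray(chunk, dtype=float)
    return (x.size, x.sum(), (x ** 2).sum())

def reducer(stats):
    # The reducer combines per-chunk statistics into the global mean
    # and (population) variance without ever seeing the raw data.
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    var = total_sq / n - mean ** 2
    return mean, var

data = list(range(10))                       # stand-in for a huge file
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]
mean, var = reducer([mapper(c) for c in chunks])
```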

**CE_08C Practical Bayesian Computation**

Cosponsor: *Section for Statistical Programmers and Analysts*

Instructor(s): *Fang Chen*

This one-day course reviews the basic concepts of Bayesian inference and focuses on the practical use of Bayesian computational methods. The objectives are to familiarize statistical programmers and practitioners with the essentials of Bayesian computing, and to equip them with computational tools through a series of worked-out examples that demonstrate sound practices for a variety of statistical models and Bayesian concepts. The first part of the course will review differences between classical and Bayesian approaches to inference, fundamentals of prior distributions, and concepts in estimation. The course will also cover MCMC methods and related simulation techniques, emphasizing the interpretation of convergence diagnostics in practice. The rest of the course will take a topic-driven approach that introduces Bayesian simulation and analysis and illustrates the Bayesian treatment of a wide range of statistical models using software, with code explained in detail. The course will present major application areas and case studies, including multi-level hierarchical models, multivariate analysis, nonlinear models, meta-analysis, and survival models. Special topics discussed include Monte Carlo simulation, sensitivity analysis, missing data, model assessment and selection, variable subset selection, and prediction. The examples will be done using SAS (PROC MCMC), with a strong focus on technical details. Attendees should have a background equivalent to an M.S. in applied statistics. Previous exposure to Bayesian methods is useful but not required. Familiarity with material at the level of the textbook Probability and Statistics by DeGroot and Schervish (Addison-Wesley) is appropriate.
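
As a taste of the MCMC material, the sketch below implements a bare-bones random-walk Metropolis sampler for a toy normal posterior. This is purely illustrative and not part of the course (which uses SAS PROC MCMC); the target and tuning constants are made up:

```python
import math
import random

def metropolis(log_post, init, n_iter=20000, step=1.0, seed=1):
    """Random-walk Metropolis: propose x' ~ N(x, step^2) and accept
    with probability min(1, post(x') / post(x))."""
    rng = random.Random(seed)
    x = init
    samples = []
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, step)
        if math.log(rng.random()) < log_post(prop) - log_post(x):
            x = prop                      # accept the proposal
        samples.append(x)                 # otherwise keep the old state
    return samples

# Toy target: the posterior is N(2, 1), so draws should center near 2.
log_post = lambda t: -0.5 * (t - 2.0) ** 2
draws = metropolis(log_post, init=0.0)
burned = draws[5000:]                     # discard burn-in
post_mean = sum(burned) / len(burned)
```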

**CE_09C Recent Advances in Bayesian Adaptive Clinical Trial Design**

Cosponsor: *Section on Bayesian Statistical Science*

Instructor(s): *Peter Thall and Brian Hobbs*

This one-day short course will cover a variety of recently developed Bayesian methods for the design and conduct of adaptive clinical trials. Emphasis will be on practical application, with the course structured around a series of specific illustrative examples. Topics to be covered will include: (1) using historical data in both planning and adaptive decision making during the trial; (2) using elicited utilities or scores of different types of multivariate patient outcomes to characterize complex treatment effects; (3) characterizing and calibrating prior effective sample size; (4) monitoring safety and futility; (5) eliciting and establishing priors; and (6) using computer simulation as a design tool. These methods will be illustrated by actual clinical trials, including cancer trials involving chemotherapy for leukemia and colorectal cancer, stem cell transplantation, and radiation therapy, as well as trials in neurology and neonatology. The illustrations will include both early phase trials to optimize dose, or dose and schedule, and randomized comparative phase III trials.

**CE_10C Applied Multiple Imputation in R**

Instructor(s): *Stef van Buuren*

Missing data seriously complicate the statistical analysis of data. Multiple imputation is a general and statistically valid technique for analyzing incomplete data. Creating good multiple imputations in real data requires a flexible methodology that is able to mimic distinctive features in the data. Fully conditional specification (FCS) is the cutting edge of imputation technology for multivariate missing data. The course will explain the principles of FCS, outline a step-by-step approach toward creating high-quality imputations, and provide guidelines on how the results can be reported. Specific topics include: imputation of mixed continuous-categorical variables, influx/outflux missing data patterns, assessment of convergence, compatibility, predictor selection, derived variables, multilevel data, diagnostics, increasing robustness, imputation under MNAR, and reporting guidelines. All computations are done with the author's MICE package in R. The lectures will follow the book "Flexible Imputation of Missing Data" by Stef van Buuren (Chapman & Hall, 2012). Prerequisites include familiarity with basic statistical concepts and techniques, and elementary R programming skills.
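
To illustrate the FCS idea, the sketch below (a drastically simplified stand-in for what MICE does, written in Python/NumPy rather than R, on simulated data) cycles over the incomplete variables, regresses each on the others, and redraws its missing entries from the fitted conditional model. A real implementation such as MICE also propagates parameter uncertainty between imputations; this sketch omits that step.

```python
import numpy as np

def fcs_impute(X, n_iter=10, seed=0):
    """Minimal fully-conditional-specification sketch: repeatedly regress
    each incomplete column on the others and redraw its missing entries
    from the fitted conditional (draws, not plug-ins, keep variability)."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):               # start from mean imputation
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])
            beta, *_ = np.linalg.lstsq(A[~miss[:, j]],
                                       X[~miss[:, j], j], rcond=None)
            resid = X[~miss[:, j], j] - A[~miss[:, j]] @ beta
            sigma = resid.std()
            X[miss[:, j], j] = (A[miss[:, j]] @ beta
                                + rng.normal(0, sigma, miss[:, j].sum()))
    return X

rng = np.random.default_rng(1)
y = rng.normal(size=200)
x = 2.0 * y + rng.normal(scale=0.5, size=200)
data = np.column_stack([x, y])
data[rng.random(200) < 0.3, 0] = np.nan       # ~30% missing in x
completed = fcs_impute(data)
```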

**CE_11C Statistical Evaluation of Prognostic Biomarkers**

Cosponsor: *Biometrics Section*

Instructor(s): *Patrick Heagerty and Paramita Saha-Chaudhuri*

Longitudinal studies allow investigators to correlate changes in time-dependent exposures or biomarkers with subsequent health outcomes. The use of baseline or time-dependent markers to predict a subsequent change in clinical status, such as transition to a diseased state, requires the formulation of appropriate classification and prediction error concepts. Similarly, the evaluation of markers that could be used to guide treatment requires specification of the operating characteristics associated with use of the marker. The first part of this course will introduce predictive accuracy concepts that allow evaluation of time-dependent sensitivity and specificity for prognosis of a subsequent event time. We will overview options that are appropriate both for baseline markers and for longitudinal markers. Methods will be illustrated using examples from HIV and cancer research. The second part of this course will involve a technology workshop introducing the R packages (survivalROC, risksetROC, compriskROC) that are currently available for assessing the predictive accuracy of survival models. This segment will include hands-on training and demonstrations of how to use these R packages to answer research questions. Several real-data examples for analysis will be provided, and the instructors will discuss implementation and interpretation.

**CE_12C (HALF-DAY COURSE) Crowdsourcing for Statisticians**

Cosponsor: *Section on Statistical Learning and Data Mining*

Instructor(s): *Lyle Ungar and Adam Kapelner*

Crowdsourcing scientific problems using platforms such as Amazon's Mechanical Turk is a hot research area, with over 10,000 publications in the past five years. The crowd's vast, inexpensive supply of intelligent labor allows people to attack problems that were previously impractical and offers the potential for detailed scientific inquiry into social, psychological, economic, and linguistic phenomena via massive sample sizes of human-annotated data. It also raises a number of interesting statistical issues. We introduce crowdsourcing and describe how it is being used in both industry and academia. We explain how academic applications collect data for both (a) creating labels in a training set that can later be used in machine learning and (b) experiments that investigate the effect of a manipulation on subject behavior. We present case studies for both categories, collecting (a) labeled data for use in natural language processing and (b) experimental data in the context of psychology. We end with a special section on the potential of the crowdsourcing platform to investigate issues in statistics. This course should be of interest to researchers who would like to learn about designing crowdsourcing applications and analyzing crowdsourced data; no prior exposure is required.

**CE_13C (HALF-DAY COURSE) Techniques for Simulating Data in SAS**

Cosponsor: *Section for Statistical Programmers and Analysts*

Instructor(s): *Rick Wicklin*

Simulating data is a fundamental technique in statistical programming. To assess statistical methods, you often need to create data with known properties, both random and nonrandom. This workshop presents intermediate-level algorithms and techniques for simulating data from:

- mixture distributions

- multivariate distributions with a given correlation

- distributions with arbitrary marginal distributions and correlation structure

- distributions of correlation matrices

- regression models with fixed and random effects

- basic spatial models

- distributions with central moments that match the sample moments of real data

This workshop is intended for practicing statisticians who need to simulate data efficiently in SAS. Examples are presented using the SAS System, specifically the DATA step and SAS/IML® software. It is assumed that the student is familiar with basic simulation at the level of Chapters 1-4 of Simulating Data with SAS (Wicklin, 2013). This course covers material in Chapters 5-11 of the same book.
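
One of the core techniques listed, simulating multivariate data with a given correlation, reduces to multiplying independent standard normals by a Cholesky factor of the target correlation matrix. A minimal sketch of the idea in Python/NumPy (the workshop itself works in the SAS DATA step and SAS/IML; the correlation matrix below is made up):

```python
import numpy as np

# Target correlation matrix (must be positive definite).
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

rng = np.random.default_rng(0)
n = 100_000
Z = rng.standard_normal((n, 3))   # independent standard normals
L = np.linalg.cholesky(R)
X = Z @ L.T                       # rows now have correlation approximately R

est = np.corrcoef(X, rowvar=False)
```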

**CE_14C Analysis of Clinical Trials: Theory and Applications**

Cosponsor: *Biopharmaceutical Section*

Instructor(s): *Alex Dmitrienko, Devan Mehrotra, and Jeff Maca*

The course covers six important topics that commonly face statisticians and research scientists conducting clinical research: analysis of stratified trials, analysis of longitudinal data with dropouts and potential outliers, analysis of time-to-event data (with emphasis on small trials), crossover trials, multiple comparisons and multiple endpoints, and interim decision making and adaptive designs. The course offers a well-balanced mix of theory and applications. It presents practical advice from experts and discusses regulatory considerations. The discussed statistical methods will be implemented using SAS and R software. Clinical trial examples will be used to illustrate the statistical methods. The course is designed for statisticians working in the pharmaceutical or biotechnology industries as well as contract research organizations. It is equally beneficial to statisticians working in institutions that deliver health care and government branches that conduct health-care related research. The attendees are required to have basic knowledge of clinical trials. Familiarity with drug development is highly desirable, but not necessary. This course was taught at JSM 2005-2012 and received the Excellence in Continuing Education Award in 2005.

**CE_15C Successful Data Mining in Practice**

Cosponsor: *Section on Statistical Learning and Data Mining*

Instructor(s): *Richard De Veaux*

This one-day course serves as a practical introduction to data mining. After an introduction to what data mining is, the types of problems it can solve, and its challenges, we will use a sequence of case studies, mostly taken from my consulting experience, to illustrate the main methods and techniques used in data mining. Methods covered include decision trees, neural networks, naive Bayes, K-nearest neighbors, random forests, boosted trees, and various visualization techniques. For each method we describe the mathematics behind it (without dwelling too much on technical details and all the optimization choices) and show how it is used in practice. We discuss how to choose methods for particular problems and how to evaluate the methods using cross-validation. Unlike many courses in data mining, we spend a good deal of time talking about how to start a data mining project, the steps to follow, and the issues in communicating results to others. We use R and JMP as software for the course (with a few examples in Weka).
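
The cross-validation idea mentioned above is simple to state: hold out each fold in turn, fit on the rest, and average the held-out error. A minimal Python sketch with a toy linear model (illustrative only; the course itself uses R and JMP, and the data here are simulated):

```python
import numpy as np

def kfold_mse(x, y, fit, predict, k=5, seed=0):
    """k-fold cross-validation: hold out each fold in turn, fit on the
    remaining data, and average the held-out squared error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        model = fit(x[train], y[train])
        errs.append(np.mean((y[f] - predict(model, x[f])) ** 2))
    return float(np.mean(errs))

# Toy model: simple linear regression fit by least squares.
fit = lambda x, y: np.polyfit(x, y, deg=1)
predict = lambda m, x: np.polyval(m, x)

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 1.0 + rng.normal(scale=1.0, size=200)
cv_mse = kfold_mse(x, y, fit, predict)    # should be near the noise variance
```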

**CE_16C Monte Carlo and Bayesian Computation with R**

Cosponsor: *Section on Bayesian Statistical Science*

Instructor(s): *Jim Albert and Maria Rizzo*

This course describes the use of the statistical system R in Monte Carlo experiments, simulation-based inference, and Bayesian computation. R tools are described for generating random variables, computing criteria of statistical procedures, and replicating the procedure to compute quantities such as mean squared error and probability of coverage. R commands for implementing simulation-based procedures such as bootstrap and permutation tests are outlined. The use of R in Bayesian computation is described, including the programming of the posterior distribution and the use of different R tools to summarize the posterior. Special focus will be on the application of Markov chain Monte Carlo algorithms and diagnostic methods to assess convergence of the algorithms. It is assumed that the participant will be familiar with the basics of the R system.
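
As an example of the simulation-based procedures mentioned, a bootstrap standard error replicates a statistic over resamples drawn with replacement. A minimal sketch (written in Python rather than R, purely to show the pattern; the data are simulated), checked against the known formula for the standard error of the mean:

```python
import numpy as np

def bootstrap_se(x, stat, B=2000, seed=0):
    """Bootstrap standard error: resample with replacement B times and
    take the standard deviation of the replicated statistic."""
    rng = np.random.default_rng(seed)
    reps = [stat(rng.choice(x, size=len(x), replace=True))
            for _ in range(B)]
    return np.std(reps, ddof=1)

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=100)

se_boot = bootstrap_se(x, np.mean)
se_theory = x.std(ddof=1) / np.sqrt(len(x))   # known formula for the mean
```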

**CE_17C Practical Tools for Designing and Weighting Survey Samples**

Cosponsor: *Survey Research Methods Section*

Instructor(s): *Richard Valliant, Frauke Kreuter, and Jill Dever*

A familiar complaint from students finishing a class in applied sampling or sampling theory is: "I still don't really understand how to design a sample." Students learn a lot of isolated tools or techniques but are not able to put them all together to design a sample from start to finish. One of the main goals of this short course and the associated textbook is to give students, new survey statisticians, and other survey practitioners a taste of what is involved in designing and weighting single- and multi-stage samples in the real world. This includes devising a sampling plan from sometimes incomplete information; deciding on a sample size given a specified budget; allocating the sample to the strata and stages of the design given a set of constraints; and constructing efficient analysis weights. This goal will be accomplished through discussions of actual case studies and hands-on exercises involving the use of computer software.
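
As a small illustration of the weighting step, the base design weight in a stratified simple random sample is the stratum population size divided by the stratum sample size: each sampled unit "represents" that many population units. A sketch with hypothetical stratum counts (the numbers are made up for illustration):

```python
# Base design weights for a stratified simple random sample:
# each sampled unit represents N_h / n_h population units in stratum h.
pop_sizes = {"urban": 8000, "rural": 2000}       # stratum population sizes N_h
sample_sizes = {"urban": 400, "rural": 100}      # allocated sample sizes n_h

weights = {h: pop_sizes[h] / sample_sizes[h] for h in pop_sizes}

# Sanity check: the weights should sum back to the population total.
total = sum(weights[h] * sample_sizes[h] for h in pop_sizes)
```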

**CE_18C (HALF-DAY COURSE) Meta-Analysis: Combining the Results of Multiple Studies**

Cosponsor: *Health Policy Statistics Section*

Instructor(s): *Christopher Schmid and Ingram Olkin*

Meta-analysis enables researchers to synthesize the results of multiple studies designed to determine the effect of a treatment, device or test. The information explosion as well as the movement toward the requirement of evidence to support policy decisions has promoted the use of meta-analysis in all scientific disciplines. Statisticians play a major role in meta-analysis because analyzing data with few studies and many variables is difficult. In this workshop, we introduce the major principles and techniques of statistical analysis of meta-analytic data. Examples of published meta-analyses in medicine and the social sciences will be used to illustrate the various methods.
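
The workhorse of such syntheses is inverse-variance pooling: each study's estimate is weighted by the reciprocal of its variance, and the pooled variance is the reciprocal of the total weight. A minimal fixed-effect sketch in Python with hypothetical study results (real meta-analyses must also consider heterogeneity and random-effects models, which this omits):

```python
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling: weight each study estimate
    by 1/variance; the pooled variance is 1 / (sum of weights)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled, math.sqrt(pooled_var)

# Three hypothetical studies: effect estimates with their variances.
effects = [0.30, 0.10, 0.25]
variances = [0.04, 0.01, 0.02]
est, se = fixed_effect_meta(effects, variances)
```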

**CE_19C (HALF-DAY COURSE) Practical Software Engineering for Statisticians**

Cosponsors: *Section on Statistical Computing and the Biometrics Section*

Instructor(s): *Murray Stokely*

Statisticians are increasingly being employed alongside software engineers to make sense of the large amounts of data collected in modern e-commerce, internet, retail, and advertising companies. This course introduces a number of best practices in writing statistical software that are taught to computer scientists but are seldom part of a statistics degree. Revision control tools; unit testing; code modularity, structure, and readability; and the basics of computer architecture and performance will be covered. A few examples of real R code written in a commercial environment will be shared and discussed to illustrate some of the problems of moving from working alone or in a small group in an academic setting to working on a team in a large commercial setting. Some basic familiarity with programming is required. The course is language-agnostic, but R will be used in some examples.

**CE_20C (HALF-DAY COURSE) Personalized Medicine and Dynamic Treatment Regimes**

Cosponsor: *Biometrics Section*

Instructor(s): *Michael Kosorok and Eric Laber*

Dynamic treatment regimes operationalize clinical decision making through a sequence of individualized treatment rules. Each treatment rule corresponds to a milestone in the disease process and maps up-to-date patient-level information to a recommended treatment. The goal is to find a sequence of treatment rules that maximizes a cumulative clinical outcome, while potentially accounting for factors like cost, local availability, and individual patient preference. With their promise of delivering the right treatment to the right patients at the right time, dynamic treatment regimes are positioned to make a positive impact on the quality and affordability of patient care. Due to technical advances, patient-level data continue to increase in quality, complexity, volume, and accessibility. Statistics plays a key role in the construction of dynamic treatment regimes using observational and randomized clinical trial data. This workshop introduces basic concepts for estimation of dynamic treatment regimes from data. We provide a broad overview of evidence-based personalized medicine. We then discuss methods for estimating optimal treatment regimes for one or more decision points. We cover traditional regression-based methods and more recent classification-based methods. We conclude with techniques for designing Phase II and Phase III clinical trials focused on discovery and verification of individualized treatment regimes.

**CE_21C Causal Inference and Its Application in Health Sciences**

Cosponsor: *Section on Statistics in Epidemiology*

Instructor(s): *Dylan Small and Miguel Hernan*

Causal inference for clinical trials and observational studies has become crucial for better understanding effects and mechanisms of a treatment/program. Commonly used statistical methods provide measures of association that may lack a causal interpretation even when the investigator 'adjusts for' all potential confounders in the analysis. To eliminate the discordance between the causal goals and the associational methods, it is necessary to formally define causal concepts, identify the conditions required to estimate causal effects, and use analytical methods that, under those conditions, provide estimates that can be endowed with a causal interpretation. The first half of this short course presents a framework for causal inference from observational studies and recent methodological developments, with a special emphasis on complex longitudinal data. The second half of this short course focuses on instrumental variable methods for causal inference in clinical trials and observational studies. Instrumental variable methods control for unmeasured confounding, which is a central concern in many observational studies as well as in per protocol analyses of clinical trials.

**CE_22C Introduction to Statistical Learning**

Cosponsor: *Section on Statistical Learning and Data Mining*

Instructor(s): *Daniela Witten*

This one-day seminar will be a practical introduction to and an overview of statistical learning methods. As computing power and the scope of data being collected across many fields have increased dramatically in the past 20 years, many new "statistical machine learning" methods have been developed. This course will provide an applied introduction to a number of statistical learning techniques, including cross-validation, the lasso, generalized additive models, decision trees, and clustering, as well as more classical approaches such as linear discriminant analysis, quadratic discriminant analysis, nearest neighbors, and ridge regression. Applications to finance, genomics, and other areas will be presented. Participants should be familiar with linear regression at the level of the textbook Applied Linear Regression by Sanford Weisberg.
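
To give a flavor of one listed method: ridge regression shrinks coefficients by adding a penalty term to the normal equations, so that beta = (X'X + lam I)^(-1) X'y, with lam = 0 recovering ordinary least squares. A minimal Python/NumPy sketch on simulated data (illustrative only; the seminar does not prescribe software):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
beta_true = np.array([3.0, 0.0, -2.0, 0.0, 1.0])
y = X @ beta_true + rng.normal(scale=0.5, size=200)

beta_ols = ridge(X, y, lam=0.0)        # lam = 0 recovers least squares
beta_shrunk = ridge(X, y, lam=50.0)    # larger lam shrinks toward zero
```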

**CE_23C Analysis of Interval-censored Survival Data**

Cosponsor: *Biometrics Section*

Instructor(s): *Philip Hougaard*

Interval-censored survival data occur when the time to an event is assessed by means of blood samples, X-rays, or other screening methods that cannot tell the exact event time, but only whether the event has happened since the previous examination. Interval-censoring methods are needed when onset and diagnosis are considered as distinct quantities, such as when screening is used in order to diagnose a disease earlier. Even though such data are common and non-parametric methods were suggested more than 25 years ago, they are still not in standard use. One reason for the low use is that the analysis is technically more complicated than standard survival methods. The course will give an introduction to such data, including a discussion of the issues. Parametric, nonparametric, and proportional hazards models will be covered. Statistical theory will be balanced with applications emphasizing differences from standard right-censored survival data. Applications are based on literature examples as well as personal experience with the development of microalbuminuria among Type 2 diabetic patients. Extensions to more complex data (like truncated data and disease data in the presence of mortality) will also be considered, but in less detail. Prerequisite: Standard methods for right-censored survival data.

**CE_24C Applied Bayesian Nonparametric Mixture Modeling**

Cosponsor: *Section on Bayesian Statistical Science*

Instructor(s): *Athanasios Kottas and Abel Rodriguez*

Bayesian methods are central to the application of modern statistical modeling in a wide variety of fields. Bayesian nonparametric and semiparametric methods are receiving increased attention in the literature as they considerably expand the flexibility of Bayesian models. This one-day course will provide an introduction to Bayesian nonparametric methods, with emphasis on modeling approaches employing nonparametric mixtures and with a focus on applications. The course will start by motivating Bayesian nonparametric modeling and providing an overview of nonparametric prior models for spaces of random functions. The main focus will be on models based on the Dirichlet process, a nonparametric prior for distribution functions. Particular emphasis will be placed on Dirichlet process mixtures, which provide a flexible framework for nonparametric modeling. We will discuss methodological details, computational techniques for posterior inference, and applications of Dirichlet process mixture models. Examples will be drawn from fields such as density estimation, nonparametric regression, hierarchical generalized linear models, survival analysis, and spatial statistics. The course targets students or professionals with background in Bayesian modeling and inference. Sufficient preparation will include statistics training to the M.S. level and some exposure to Bayesian hierarchical modeling and computation.

**CE_25C Statistical Methods for Neuroimaging Data Analysis**

Cosponsor: *Biometrics Section*

Instructor(s): *Hongtu Zhu, Haipeng Shen, and Linglong Kong*

With modern imaging techniques, massive imaging data can be observed over both time and space, for example, magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), electroencephalography (EEG), magnetoencephalography (MEG), and diffusion tensor imaging (DTI), among many others. The subject of medical imaging analysis has expanded from simple algebraic operations on imaging data to advanced statistical and mathematical methods. This short course aims to provide a practical introduction to and an overview of recent advanced statistical developments for analyzing and modeling medical image data quantitatively. The course material is applicable to a wide variety of medical and biological imaging problems. The topics cover tract-based analysis, multiscale statistical methods, fMRI processing methods, diffusion imaging methods, and brain imaging and genetics. While presenting the statistical and mathematical fundamentals, we emphasize the concepts, the methods, and their real-world implementation. Participants will learn basics that will help them understand the methods and the basic tools built into packages like SPM, FSL, Slicer, and others in order to use them optimally.

**CE_26C (HALF-DAY COURSE) Statistical Methods in Genetic Association Studies**

Cosponsor: *Biometrics Section*

Instructor(s): *Danyu Lin*

Association studies have become the primary tool for the genetic dissection of complex human diseases, and genetic association has played an increasingly important role in biomedical research. This course provides an overview of the statistical methods that have recently been developed for the design and analysis of genetic association studies. Specific topics include genome-wide association studies, case-control sampling and retrospective likelihood, secondary phenotypes in case-control studies, haplotypes and untyped SNPs, population stratification, meta-analysis, multiple testing, next-generation sequencing studies, rare variants, trait-dependent sampling, variable selection, and risk prediction.

**CE_27T Creating Statistical Graphics in SAS®**

Instructor(s): *Warren Kuhfeld*

Effective graphics are indispensable in modern statistical analysis. SAS provides statistical graphics through ODS Graphics, functionality used by statistical procedures to create statistical graphics as automatically as they create tables. ODS Graphics is also used by a family of procedures designed for graphical exploration of data. This tutorial is intended for statistical users and covers the use of ODS Graphics from start to finish in statistical analysis. You will learn how to: request graphs created by statistical procedures; use the SGPLOT, SGPANEL, SGSCATTER, and SGRENDER procedures to create customized graphs; access and manage your graphs for inclusion in web pages, papers, and presentations; modify graph styles (colors, fonts, and general appearance); make immediate changes to your graphs using a point-and-click editor; and make permanent changes to your graphs with template changes.

**CE_28T Efficient Trial Design with the New East® Architect**

Instructor(s): *Cyrus Mehta and Charles Liu*

This workshop introduces East® Architect, the entirely redesigned update of our industry-standard clinical trial design tool. With East on Cytel's Architect software platform, you can design, monitor, and simulate single-look, group sequential, and adaptive trials with one, two, or multiple arms, for binary, continuous, or time-to-event endpoints.

At the trial design stage, it is important to investigate many different combinations of design parameters - including power, spending functions, timing of interim analyses, and enrollment rates - to discover the most efficient combination for the trial under consideration. We'll tour the new interface, which enables previews of multiple scenarios simultaneously, allowing you to optimize design parameters. You'll experience the advantages of importing external source data sets for interim and final analyses, and for computing conditional power.
We will introduce and review many new features, including:

- multiple scenarios stored in workbook files for easy retrieval

- multiple scenarios compared graphically for easy communication

- handling trials with delayed response and drop-outs

- extending available designs through external calls to R functions

- adjusting design for stratification factors

- computation of Bayesian power or probability of success (assurance)

While open to all, East users especially will value this CTW, where they will get an excellent orientation in a short amount of time. It will also give them an opportunity to express their views and influence further developments on the new platform.

**CE_29T Introduction to Data Mining with CART Classification and Regression Trees**

Instructor(s): *Mikhail Golovnya*

This tutorial is intended for the applied statistician wanting to understand and apply the CART methodology for tree-structured nonparametric data analysis. The emphasis will be on practical data analysis and data mining involving classification and regression. All concepts will be illustrated using real-world examples. The course will begin with an intuitive introduction to tree-structured analysis: what it is, why it works, why it is nonparametric and model-free, and its advantages in handling all types of data, including missing values and categorical variables. Working through examples, we will review how to read the CART Tree output and how to set up basic analyses. This session will include performance evaluation of CART trees and will cover ways to search for possible improvements of the results. Once a basic working knowledge of CART has been mastered, the tutorial will focus on critical details essential for advanced CART applications, including: choice of splitting criteria, choosing the best split, using prior probabilities to shape results, refining results with differential misclassification costs, the meaning of cross validation, tree growing, and tree pruning. The course will conclude with some discussion of the comparative performance of CART versus other computer-intensive methods such as artificial neural networks and statistician-generated parametric models.
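To make the tree-growing mechanics concrete, here is a minimal sketch (not CART itself, and not part of the course materials) of the core step: exhaustively scanning one predictor for the split that most reduces Gini impurity.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 2 * p1 * (1 - p1)

def best_split(xs, ys):
    """Scan every candidate threshold and return the (threshold, score)
    pair minimizing the weighted Gini impurity of the two child nodes."""
    best = (None, float("inf"))
    for thr in sorted(set(xs)):
        left  = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (thr, score)
    return best

xs = [1, 2, 3, 10, 11, 12]          # hypothetical predictor values
ys = [0, 0, 0, 1, 1, 1]             # class labels
thr, impurity = best_split(xs, ys)  # perfect separation at thr = 3
```

CART repeats this scan over every predictor at every node, which is what makes the method model-free and indifferent to monotone transformations of the inputs.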

**CE_30T Model Selection with SAS/STAT® Software**

Instructor(s): *Funda Gunes*

When you are faced with a predictive modeling problem that has many possible predictor effects - dozens, hundreds or even thousands - a natural question is, "What subset of the effects provides the best model for the data?" This workshop explains how you can address this question with model selection methods in SAS/STAT software. The workshop also explores the practical pitfalls of model selection, including issues which have led experts to criticize the validity of some methods. The workshop focuses on the GLMSELECT procedure and shows how it can be used to mitigate the intrinsic difficulties of model selection.
You will learn how to use the following approaches:

- model selection diagnostics, including graphics, for detecting problems

- use of validation data to detect and prevent under-fitting and over-fitting

- modern penalty-based methods, including LASSO and adaptive LASSO, as alternatives to traditional methods such as stepwise selection

- bootstrap-based model averaging to reduce selection bias and improve predictive performance

This workshop requires an understanding of basic regression techniques.
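As a small aside on the penalty-based methods mentioned above (illustrative only, not workshop material), the LASSO's ability to drop effects from the model entirely comes from the soft-thresholding operator, sketched here in plain Python:

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0).
    Coordinate-descent LASSO applies this to each partial least-squares
    coefficient: large coefficients are shrunk toward zero, and small
    ones are set exactly to zero, removing the effect from the model."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

print(soft_threshold(3.0, 1.0))   # 2.0: shrunk toward zero
print(soft_threshold(-0.5, 1.0))  # 0.0: dropped from the model
```

This exact-zero behavior is why LASSO performs selection and shrinkage simultaneously, in contrast to stepwise methods, which make all-or-nothing inclusion decisions.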

**CE_31T Compass 2.0: Software for the Design & Execution of Dose-Finding Trials**

Instructor(s): *Jim Bolognese and Charles Liu*

The high failure rate of phase 3 clinical trials is often attributed to selection of an inappropriate dose level in the early stages. The consequences of getting the dose right, or not, significantly impact the entire development program. Designing a phase 2 dose-finding trial poses a significant statistical challenge: to efficiently find a range of dose levels that are neither too low and ineffective, nor too high and unsafe. Compared with traditional approaches, adaptive designs can more effectively identify target doses with substantial savings in sample size. Compass® extends Cytel's extensive experience in statistical software development, including the popular East® and StatXact®. Compass combines an intuitive user interface, connected to a powerful computational engine, with a natural workflow to efficiently design, simulate, and deploy the optimal dose-selection study. Workshop attendees will learn to use the new Compass 2.0 to compare, assess, and implement an adaptive phase 2 design. Features examined include: (1) a randomization dashboard to perform randomization, conduct interim data analysis, and update allocation ratios; (2) two-stage (dose-dropping) designs; (3) the Gibbs sampling algorithm and Monte Carlo diagnostics; and (4) stopping rules for futility based on conditional power.

**CE_32T Data Mining with TreeNet (Stochastic Gradient Boosting) and Random Forests: including the latest refinements and model compression techniques (ISLE Importance Sampled Learning Ensembles and RuleLearner)**

Instructor(s): *Mikhail Golovnya*

This workshop discusses key algorithmic details of Breiman's RF and Friedman's TreeNet, and important extensions to bagging/boosting technology. RF and TreeNet are recent advances in classification and regression trees, enabling the modeler to construct predictive models of extraordinary accuracy. RF is a tree-based procedure making use of bootstrapping and random feature selection. In TreeNet, classification and regression models are built gradually through a large collection of small trees, each of which improves on its predecessors through an error-correcting strategy. Recent developments include the model compression techniques ISLE and RuleLearner, and gradient-boosting/RF crossovers (gradient boosting incorporating core RF ideas). ISLE is a model compression technology to simplify and speed up complex tree-based ensembles. RuleLearner reinterprets TreeNet and/or RF tree ensembles, extracting individual segments described by interesting rules. The rules can be combined to yield compressed models, often more accurate than the original ensembles. RuleLearner supports individual-specific and group-specific variable importance rankings and offers dependency plots for model interpretation. We will show how the software is used to solve real-world problems, cover theory, discuss what is novel, illustrate how to select an ideal balance between model complexity and predictive accuracy, and show where the software fits in terms of other data mining software.
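For intuition about the error-correcting strategy described above, the following sketch (a generic stagewise-boosting toy in plain Python, not TreeNet itself; the data are hypothetical) fits a sequence of one-split regression trees to the current residuals, each added with a small learning rate:

```python
def fit_stump(xs, residuals):
    """One-split regression tree fit by least squares: returns the
    (threshold, left_mean, right_mean) minimizing squared error."""
    best, best_sse = None, float("inf")
    for thr in sorted(set(xs))[:-1]:          # keep both children nonempty
        left  = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - ml) ** 2 for r in left)
               + sum((r - mr) ** 2 for r in right))
        if sse < best_sse:
            best, best_sse = (thr, ml, mr), sse
    return best

def boost(xs, ys, n_trees=50, rate=0.1):
    """Stagewise boosting for regression: each stump is fit to the
    current residuals and added with a small learning rate."""
    pred = [sum(ys) / len(ys)] * len(ys)      # start from the overall mean
    for _ in range(n_trees):
        resid = [y - p for y, p in zip(ys, pred)]
        thr, ml, mr = fit_stump(xs, resid)
        pred = [p + rate * (ml if x <= thr else mr)
                for x, p in zip(xs, pred)]
    return pred

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]           # step-shaped response
pred = boost(xs, ys)
sse = sum((y - p) ** 2 for y, p in zip(ys, pred))
```

Fifty such stumps drive the training error far below that of the initial constant fit; TreeNet layers subsampling, robust loss functions, and many practical refinements on top of this core loop.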

**CE_33T Structural Equation Modeling Using the CALIS Procedure in SAS/STAT® Software**

Instructor(s): *Yiu-Fai Yung*

The CALIS procedure in SAS/STAT software is a general structural equation modeling (SEM) tool. This workshop introduces the general methodology of SEM and applications of PROC CALIS. Background topics such as path analysis, confirmatory factor analysis, measurement error models, and linear structural relations (LISREL) are reviewed. Applications are demonstrated with examples in social, educational, behavioral, and marketing research. More advanced SEM techniques such as the analysis of total and indirect effects and full information maximum likelihood (FIML) method for treating incomplete observations are also covered.

This workshop is designed for statisticians and data analysts who want an overview of SEM applications using the CALIS procedure in SAS/STAT 9.22 and later releases. Attendees should have a basic understanding of regression analysis and experience using the SAS language. Previous exposure to SEM is useful but not required. Attendees will learn how to use PROC CALIS for (1) specifying structural equation models with latent variables, (2) interpreting model fit statistics and estimation results, (3) computing and testing total and indirect effects, and (4) using the FIML method for treating incomplete observations.

**CE_34T Overview of New Features in StatXact® 10 and LogXact® 10**

Instructor(s): *Nitin R. Patel and Pralay Senchaudhuri*

Small and sparse samples of correlated categorical data arise frequently in applied research often with clustered observations. StatXact 10 offers several new statistical tests for analyzing such data. We'll discuss exact correlated-data analogs for conventional tests including Fisher's exact test, the trend test, Wilcoxon and Kruskal-Wallis tests for doubly-ordered tables. Using practical examples, appropriate use of these tests in StatXact 10 - and in a new version of StatXact PROC for SAS users - will be demonstrated.
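For reference, the classical (uncorrelated-data) Fisher's exact test that these new tests generalize can be computed by direct hypergeometric enumeration. The sketch below (illustrative plain Python, not StatXact code) evaluates Fisher's famous tea-tasting table:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sum the probabilities of all tables with the same margins whose
    hypergeometric probability does not exceed that of the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # P(first cell = x) under the hypergeometric distribution
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs)

# Fisher's tea-tasting table [[3, 1], [1, 3]]
p = fisher_exact_two_sided(3, 1, 1, 3)  # 34/70, approximately 0.486
```

The correlated-data analogs in StatXact 10 extend this conditional-enumeration idea to clustered observations, where the independence assumption behind the plain hypergeometric model fails.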

Research by Firth, Heinze, and others has led to methods for bias-corrected estimates in logistic regression. The methods also provide penalized maximum likelihood estimates of parameters in the case of complete separation, where maximum likelihood estimates do not exist. Algorithms for these methods have been implemented in LogXact 10. We will discuss these methods and their use in LogXact 10 and the new LogXact PROC for SAS users.

We will also briefly discuss Bayesian estimation in logistic regression, a topic we are considering for future versions of LogXact.

**CE_35T Introduction to Modern Regression Analysis Techniques: linear, logistic, nonlinear, regularized, GPS (Generalized Path Seeker), LARS, LASSO, Elastic Net, and MARS (Multivariate Adaptive Regression Splines)**

Instructor(s): *Mikhail Golovnya*

Using real-world datasets, we will demonstrate Stanford Professor Jerome Friedman's advances in nonlinear, regularized-linear, and logistic regression. This workshop will introduce the main concepts behind Friedman's GPS and MARS, modern regression tools that can help analysts quickly develop superior predictive models. GPS includes classic techniques such as ridge and lasso regression, and also adds the new sub-lasso model, as well as intermediate modeling strategies. GPS gives ultra-fast modeling with massive numbers of predictors, powerful predictor selection, and coefficient shrinkage. Clear tradeoff diagrams between model complexity and predictive accuracy allow modelers to select an ideal balance. Linear regression models, including GPS, fit straight lines to data. Although this usually oversimplifies the data structure, the approximation is often good enough for practical purposes. However, in the frequent situations in which a straight line is inappropriate, an expert modeler must search tediously for transformations to find the right curve. MARS is a nonlinear automated regression tool that automatically discovers complex patterns in the data. It automates the model specification search, including variable selection, variable transformation, interaction detection, missing value handling, and model validation. MARS approaches model construction more flexibly, allowing for bends, thresholds, and other departures from straight lines from the beginning.
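The bends and thresholds that MARS fits automatically are built from hinge basis functions. A minimal illustration (not from the workshop; the knot and coefficients below are hypothetical):

```python
def hinge(x, knot):
    """MARS basis function: max(0, x - knot), zero below the knot
    and linear (slope 1) above it."""
    return max(0.0, x - knot)

# A model with a bend at x = 5: flat below the knot, slope 2 above it.
def f(x):
    return 1.0 + 2.0 * hinge(x, 5.0)

values = [f(x) for x in (3.0, 5.0, 7.0)]  # [1.0, 1.0, 5.0]
```

MARS searches over variables and knot locations automatically, adding and pruning pairs of such hinges (and their products, for interactions) to build a piecewise-linear fit without hand-crafted transformations.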

**CE_36T SAS® Procedures for Analyzing Survey Data**

Instructor(s): *Pushpal Mukhopadhyay*

The analysis of probability-based sample surveys requires specialized techniques that account for survey design elements, such as strata, clusters, and unequal weights. This workshop provides an overview of the basic functionality of SAS/STAT® procedures that have been developed specifically to select and analyze probability samples for survey data. You will learn how to select probability samples with the SURVEYSELECT procedure, how to produce descriptive statistics with the SURVEYMEANS and SURVEYFREQ procedures, and how to build statistical models with the SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures. The workshop discusses different variance estimation techniques, including both Taylor series and replication methods. It also covers topics such as domain (subpopulation) estimation and poststratification. The workshop is intended for a broad audience of statisticians who are interested in analyzing sample survey data. Familiarity with basic statistics, including regression analysis, is strongly recommended.
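The common thread in these procedures is that every estimate is design-weighted. As a toy illustration (plain Python with hypothetical data, not SAS code), a weighted mean uses each observation's sampling weight, and ignoring the weights in an oversampled design gives a distorted answer:

```python
def weighted_mean(ys, weights):
    """Design-weighted estimate of a population mean: each observation
    is weighted by the number of population units it represents
    (the inverse of its selection probability)."""
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# hypothetical stratified sample: stratum A sampled at 1-in-10,
# stratum B oversampled at 1-in-2
ys      = [4.0, 6.0, 5.0, 1.0, 2.0]
weights = [10,  10,  10,  2,   2]

est   = weighted_mean(ys, weights)  # pulled toward the large stratum A
plain = sum(ys) / len(ys)           # unweighted mean ignores the design
```

The SURVEY procedures apply the same weighting logic throughout and, unlike this sketch, also produce design-correct standard errors via Taylor series or replication methods.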

**CE_37T Using the Bootstrap Feature in JMP**

Instructor(s): *Clayton Barker*

The Bootstrap is a powerful resampling technique for measuring the accuracy of statistical estimates and making inferences. This technique is especially useful when the theoretical distribution of the statistic is unknown or when we cannot trust an underlying assumption of our model (e.g. normality of residuals or sufficient sample size). Recent advances in computing power have made the Bootstrap (and other resampling methods) more popular and more broadly applicable. The release of JMP 10 introduced a "one-click Bootstrap" feature that allows the user to take advantage of the nonparametric Bootstrap in a wide variety of settings with little additional effort. This workshop will provide a brief introduction to the Bootstrap and an overview of how to take advantage of the Bootstrap features in JMP. We will also discuss how to use the JMP Scripting Language (JSL) to implement related resampling methods such as Bagging and the parametric Bootstrap. The workshop is intended for a broad audience that has an understanding of regression and basic statistics. Participants will be provided with a free trial version of JMP and files for reproducing the analyses discussed in the workshop.
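The resampling loop that JMP's one-click feature automates can be sketched in a few lines of plain Python (illustrative only, with hypothetical data), here estimating the standard error of a sample median:

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=42):
    """Nonparametric bootstrap: resample the data with replacement,
    recompute the statistic on each resample, and report the standard
    deviation of the replicates as the estimated standard error."""
    rng = random.Random(seed)
    reps = [stat(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return statistics.stdev(reps)

data = [4.1, 5.6, 3.8, 7.2, 6.9, 5.0, 4.4, 6.1, 5.8, 4.9]
se_median = bootstrap_se(data, statistics.median)
```

The median has no simple closed-form standard error, which is exactly the situation where the bootstrap shines; the parametric bootstrap and Bagging discussed in the workshop replace the resampling step with draws from a fitted model or feed the replicates back into prediction.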

**CE_38T Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Datasets**

Instructor(s): *Dan Steinberg*

Attendees will be provided with real-world data mining examples drawn from the online advertising and financial services industries. Data mining is a powerful extension to classical statistical analysis. As opposed to classical techniques, data mining readily finds patterns in data, nonlinear relationships, key predictors, and variable interactions that are difficult if not impossible to detect using standard approaches. This tutorial follows a step-by-step approach to introduce advanced automation technology, including CART, MARS, TreeNet Gradient Boosting, Random Forests, and the latest multi-tree boosting and bagging methodologies by the original creators of CART (Breiman, Friedman, Olshen, and Stone).

All attendees will receive 6 months access to fully functional versions of the SPM Salford Predictive Modeler software suite. Workshop materials will allow the attendee to reproduce the examples.

#### Monday Roundtables and Speaker Luncheons

**ML10 Impact of Missing Data on the Approvability of Potentially Efficacious Therapies**

Sponsor: *Biopharmaceutical Section*

*Abdul Sankoh, Vertex Pharmaceuticals*

The clinical and statistical literature describes a number of operationally preventive measures, as well as statistical approaches with seemingly remedial properties, that, if appropriately implemented during the design, conduct, and analysis of clinical trial data, should mitigate the occurrence, and thus the detrimental impact, of missing data on trial outcomes and the subsequent approval of potentially safe and efficacious new drugs. Nonetheless, many drugs with potential therapeutic benefit have failed to gain approval due to serious missing data issues. This roundtable discussion will probe the challenges posed by missing data in clinical trials and the utility of current approaches in minimizing the chances of non-approval for potentially efficacious new therapies.

Fee for this session includes plated lunch.

**ML11 From Accelerometers to Androids: Design and Analytic Issues in Mobile Phone-Based Health Studies**

Sponsor: *Health Policy Statistics Section*

*Warren Comulada, UCLA Center for Community Health*

Using mobile phones for self-monitoring of health behaviors and intervention delivery is a cornerstone of future population health. While advances in mobile technology (e.g., in the ability to detect motion and location) and the proliferation of mobile phone-based research are moving us toward that goal, many challenges remain. Statisticians are often at the forefront of these challenges in operationalizing health-related measures for mobile assessment, protecting sensitive electronic data, crafting IRB-friendly data protocols, and analyzing mobile data that often consist of many data points due to daily assessment across numerous measures. In addition, there is often interest in examining agreement between mobile data and retrospective recall or biomarker data; longitudinal analyses are complicated further. In this roundtable, I will discuss my experiences in dealing with these issues across several mobile studies, including a pilot that assessed mood, diet, and exercise in young mothers and a pilot that assessed sexual behavior and drug use in HIV-positive individuals. I look forward to discussing these issues with other researchers interested in mobile technologies.

Fee for this session includes plated lunch.

**ML13 Using Statistical Engineering to Attack Large, Complex, Unstructured Problems**

Sponsor: *Quality and Productivity Section*

*Roger W. Hoerl, GE Global Research*

This discussion will focus on enhancing the ability of statisticians to use statistical engineering to solve large, complex, unstructured problems encountered in business, industry, and government. Such ability can enable statisticians to move beyond the traditional paradigm of passive consulting and exhibit true technical leadership in their organizations. Statistical engineering has been defined as "the study of how to best utilize statistical concepts, methods, and tools and integrate them with information technology and other relevant sciences to generate improved results." By developing the discipline of how to integrate multiple statistical tools in innovative ways to solve complex problems, statistical engineering complements and enhances statistical science, the in-depth study of individual statistical methods. While routine statistical consulting is often done in the lowest-cost countries today, or on the basis of open online competitions through websites such as kaggle.com, statistical engineering provides a framework to attack large, complex, unsolved problems, which appear to be a key aspect of the future of the statistics profession.

Fee for this session includes plated lunch.

Copyright © American Statistical Association.