Online Program
Statistical Analysis of Zero-Inflated Continuous Data
*Lei Liu, Northwestern University

Keywords: health economics, substance abuse, medical expenditures, generalized linear mixed model

Zero-inflated continuous (or semicontinuous) data arise frequently in medical, economic, and ecological studies. Examples include medical costs, medical care usage, substance abuse, coronary artery calcium score, and daily precipitation levels. Such data are characterized by a large proportion of zero values, together with continuous nonzero (i.e., positive) values that are often right-skewed and heteroscedastic. Both features mean that no simple parametric distribution is suitable for describing such "zero-inflated continuous" data. In this short course we will review statistical methods for analyzing this type of data.

We will start with cross-sectional zero-inflated continuous data. Three approaches are presented to account for the point mass at zero: a two-part model, which separately describes the probability that the outcome is positive and the magnitude of the positive values; a sample selection approach (e.g., the Tobit model), in which zero values are treated as "censored" observations; and a zero-inflated Tobit model, which accommodates the characteristics of both the sample selection and the two-part approaches. We will then introduce flexible models to characterize right skewness and heteroscedasticity in the positive values, using, e.g., the log-normal, Gamma, generalized Gamma, and log-skew-normal distributions, the Box-Cox transformation, and nonparametric methods.

The second section covers modeling of repeated measures zero-inflated continuous data. Random effects will be used to handle the correlation among repeated measures on the same subject and the correlation across the different parts of the model. We will incorporate such random effects into the models introduced in Section 1. We will also present joint models of longitudinal zero-inflated continuous data and survival, e.g., in the longitudinal medical cost setting, to account for a possibly dependent terminal event or informative dropout.

Finally, we will present applications to real datasets to illustrate our methods. We will use longitudinal medical costs, clustered medical costs, and alcohol drinking data as examples. SAS code will be provided to facilitate the application of these methods. Model comparison will also be conducted.

The lecturer has 8 years of hands-on experience in the analysis of zero-inflated continuous data, especially medical cost and alcohol drinking data. He is PI of three grants funded by NIH and AHRQ on this topic. This application-oriented short course is of interest to researchers who would apply up-to-date statistical tools to zero-inflated continuous data.
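
As a rough illustration of the two-part idea described in the abstract, the sketch below fits an intercept-only two-part model in Python using only the standard library. It is not the course's SAS code: the function name `fit_two_part`, the toy `costs` data, and the choice of a log-normal distribution for the positive part are all assumptions made for this example. Part 1 estimates the probability of a positive outcome; part 2 fits a normal distribution to the logs of the positive values; the overall mean combines the two as E[Y] = P(Y > 0) * E[Y | Y > 0].

```python
import math
import statistics


def fit_two_part(y):
    """Intercept-only two-part model with a log-normal positive part.

    Hypothetical helper for illustration; no covariates are used.
    """
    if any(v < 0 for v in y):
        raise ValueError("outcomes must be non-negative")
    positives = [v for v in y if v > 0]
    if len(positives) < 2:
        raise ValueError("need at least two positive values")

    # Part 1: probability that the outcome is positive (binomial MLE).
    p_positive = len(positives) / len(y)

    # Part 2: log-normal model for the positive values.
    logs = [math.log(v) for v in positives]
    mu = statistics.fmean(logs)
    sigma2 = statistics.pvariance(logs, mu)  # MLE variance (divides by n)

    # Overall mean: P(Y > 0) times the log-normal mean exp(mu + sigma2 / 2).
    mean_y = p_positive * math.exp(mu + sigma2 / 2)
    return {
        "p_positive": p_positive,
        "mu_log": mu,
        "sigma2_log": sigma2,
        "mean": mean_y,
    }


# Made-up toy data: costs with a large share of zeros.
costs = [0, 0, 0, 0, 120.0, 45.5, 980.0, 0, 310.0, 60.0]
fit = fit_two_part(costs)
print(fit)
```

With covariates, part 1 would typically become a logistic regression and part 2 a regression on the positive values; the intercept-only version is used here only to keep the two parts and their combination visible.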

Important Dates & Deadlines
October 9-11, 2013
ICHPS 2013