A Comparison of Statistical Models for Analyzing Episodes-of-Care Costs for Chronic Obstructive Pulmonary Disease Exacerbations

*John Paul Kuwornu, Department of Community Health Sciences, Faculty of Medicine, University of Manitoba 
Lisa Lix, Department of Community Health Sciences, Faculty of Medicine, University of Manitoba 
Meric Osman, Health Quality Council  
Jacqueline Quail, Health Quality Council 
Gary Teare, Health Quality Council  
Eric Wang, Health Quality Council 

Keywords: chronic diseases, chronic obstructive pulmonary disease, generalized linear models, robust regression, statistical models

Objective: Accurate predictive models of costs for episodes of healthcare utilization associated with acute and chronic conditions can be used to develop non-fee-for-service provider remuneration systems. This study examined the performance of four predictive models for costs associated with episodes of care for chronic obstructive pulmonary disease (COPD) exacerbations.
Approach: Administrative health data, including hospital separations, physician billing claims, prescription drug records, and population registration files from Saskatchewan, Canada, were used to identify a cohort (aged 35+ years) with diagnosed COPD and to define all episodes of healthcare utilization and costs for COPD exacerbations over a nine-year period (fiscal years 2001/02 to 2009/10). Using cross-validation, we examined the performance of four predictive models of episode-of-care costs for patients’ first episode during the follow-up period: ordinary least squares (OLS) regression on untransformed costs, OLS regression on log-transformed costs, robust regression, and the generalized linear model (GLM) with a Poisson distribution. The main predictors included in all models were age, sex, income group, region of residence, and Charlson comorbidity score.
Results: A total of 17,480 individuals with either a hospital-initiated episode (n = 7,910) or a physician visit-initiated episode (n = 9,570) were identified; 51.7% were male and the average age was 71.3 years (SD = 12.1). Half of the episode costs were below $595 CAD, while the 95th percentile was $13,934 CAD. Cross-validation results showed that no model consistently gave the best prediction of episode costs: the GLM Poisson model had the highest R², but the OLS model on untransformed costs and the OLS model on log-transformed costs had the best prediction accuracy in terms of root mean square error and mean absolute prediction error, respectively. The Charlson comorbidity score was the only predictor that was statistically significant in all regression models.
Conclusions: The study findings suggest that OLS regression can be used to predict costs associated with COPD episodes of care, even in the upper tails of the distribution, although accurate inferences will be achieved with GLM or robust regression models.
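
The model comparison described in the Approach can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis: the data are synthetic (an exponential-noise cost variable with two hypothetical predictors), the number of folds is arbitrary, robust regression is assumed to be a Huber M-estimator, and the log-transformed OLS predictions are assumed to be back-transformed with Duan's smearing factor. None of these choices is stated in the abstract.

```python
import numpy as np

# Synthetic right-skewed costs (NOT the study data); predictors are hypothetical.
rng = np.random.default_rng(0)
n = 2000
comorbidity = rng.poisson(1.5, n).astype(float)   # stand-in for a comorbidity score
age = rng.normal(0.0, 1.0, n)                     # standardized age
X = np.column_stack([np.ones(n), comorbidity, age])
cost = rng.gamma(shape=1.0, scale=np.exp(6.0 + 0.4 * comorbidity + 0.1 * age))

def fit_ols(X, y):
    """Least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fit_huber(X, y, c=1.345, iters=50):
    """Robust regression (assumed Huber M-estimator) via reweighted least squares."""
    beta = fit_ols(X, y)
    for _ in range(iters):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.maximum(np.abs(r) / scale, 1e-12)
        sw = np.sqrt(np.minimum(1.0, c / u))      # square root of Huber weights
        beta = fit_ols(X * sw[:, None], y * sw)
    return beta

def fit_poisson(X, y, iters=25):
    """Poisson GLM with log link via iteratively reweighted least squares."""
    beta = fit_ols(X, np.log(y))                  # starting values
    for _ in range(iters):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                   # working response
        sw = np.sqrt(mu)                          # square root of working weights
        beta = fit_ols(X * sw[:, None], z * sw)
    return beta

def ols_raw(Xtr, ytr, Xte):
    return Xte @ fit_ols(Xtr, ytr)

def ols_log(Xtr, ytr, Xte):
    beta = fit_ols(Xtr, np.log(ytr))
    smear = np.mean(np.exp(np.log(ytr) - Xtr @ beta))  # Duan smearing (assumption)
    return np.exp(Xte @ beta) * smear

def robust(Xtr, ytr, Xte):
    return Xte @ fit_huber(Xtr, ytr)

def glm_poisson(Xtr, ytr, Xte):
    return np.exp(Xte @ fit_poisson(Xtr, ytr))

MODELS = {"ols_raw": ols_raw, "ols_log": ols_log,
          "robust": robust, "glm_poisson": glm_poisson}

def evaluate(X, y, k=5, seed=0):
    """k-fold cross-validation; metrics use the out-of-fold predictions."""
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, k)
    results = {}
    for name, model in MODELS.items():
        pred = np.empty(len(y))
        for test in folds:
            train = np.setdiff1d(order, test)
            pred[test] = model(X[train], y[train], X[test])
        err = y - pred
        results[name] = {
            "rmse": float(np.sqrt(np.mean(err ** 2))),
            "mae": float(np.mean(np.abs(err))),   # mean absolute prediction error
            "r2": float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),
        }
    return results

results = evaluate(X, cost)
for name, m in results.items():
    print(f"{name:12s} RMSE={m['rmse']:10.1f} MAE={m['mae']:10.1f} R2={m['r2']:.3f}")
```

Which model ranks first depends on the metric and on the simulated data, so the numbers printed here say nothing about the study's results. The sketch only shows how a single set of out-of-fold predictions can yield different rankings under root mean square error, mean absolute prediction error, and R².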