
Multiple Imputation for Measurement Error Correction in Administrative Health Databases: Effect of the Misclassification Mechanism

*Lisa Lix, Department of Community Health Sciences, Faculty of Medicine, University of Manitoba 
Yao Xue, University of Manitoba 

Keywords: diagnostic accuracy, multiple imputation, measurement error, chronic disease

Background: Administrative health data are a key source of population-based information about chronic disease prevalence and health outcomes. However, misclassification of disease diagnoses in administrative health data can result in biased estimates of prevalence and attenuate the associations between disease status and health outcomes.
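A standard identity shows where the prevalence bias comes from. Suppose a diagnosis is recorded with sensitivity Se and specificity Sp, and misclassification is independent of other characteristics. Then the expected observed prevalence $\pi^{*}$ relates to the true prevalence $\pi$ as shown below. The values Se = 0.80 and Sp = 0.95 are hypothetical and chosen only for illustration; $\pi = 0.05$ matches the simulated prevalence in the Methods.

```latex
\pi^{*} = \mathrm{Se}\,\pi + (1 - \mathrm{Sp})(1 - \pi)

% Hypothetical example: \pi = 0.05, Se = 0.80, Sp = 0.95
\pi^{*} = 0.80(0.05) + 0.05(0.95) = 0.0400 + 0.0475 = 0.0875
% Relative bias of the observed prevalence:
(0.0875 - 0.05) / 0.05 = 0.75 \quad (\text{a } 75\% \text{ overestimate})
```

Even a fairly specific diagnosis can inflate prevalence substantially when the disease is rare. Most recorded positives then come from the large non-diseased group.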

Objectives: The objectives of this research were: (1) to evaluate a method that combines a logistic regression predictive model, constructed in a validation dataset, with a multiple imputation (MI) procedure to correct for misclassification in a binary chronic disease variable; and (2) to investigate the effect of the disease misclassification mechanism on the bias and precision of imputed disease status.
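The method in objective (1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses NumPy and fits the logistic model by Newton-Raphson. It draws coefficients from a normal approximation to their sampling distribution so that the imputations reflect model uncertainty. It imputes true status for every member of the population, which is a simplification: individuals in the validation sample could instead keep their known status. Finally, it pools the prevalence estimates with Rubin's rules. The function names `fit_logistic` and `mi_prevalence`, the covariates (observed status plus one hypothetical covariate `z`), and all numeric settings are illustrative assumptions.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficient estimates and their approximate covariance
    (inverse Fisher information at the final estimate).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = expit(X @ beta)
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    cov = np.linalg.inv(info)
    return beta, (cov + cov.T) / 2.0  # symmetrize against round-off


def mi_prevalence(X_val, d_val, X_pop, m=10, seed=None):
    """Impute true disease status m times (m >= 2) and pool prevalence with Rubin's rules."""
    rng = np.random.default_rng(seed)
    beta_hat, cov = fit_logistic(X_val, d_val)
    n = X_pop.shape[0]
    q = np.empty(m)  # per-imputation prevalence estimates
    u = np.empty(m)  # per-imputation sampling variances
    for j in range(m):
        # Draw coefficients so that imputations carry model uncertainty
        beta_star = rng.multivariate_normal(beta_hat, cov)
        d_imp = rng.binomial(1, expit(X_pop @ beta_star))
        q[j] = d_imp.mean()
        u[j] = q[j] * (1.0 - q[j]) / n
    q_bar = q.mean()
    total_var = u.mean() + (1.0 + 1.0 / m) * q.var(ddof=1)
    return q_bar, total_var


# Hypothetical example: N = 10,000, roughly 5% prevalence, one covariate z,
# observed (administrative) status with sensitivity 0.8 and specificity 0.95.
rng = np.random.default_rng(1)
N = 10_000
z = rng.normal(size=N)
d_true = rng.binomial(1, expit(-3.9 + 0.8 * z))
u_draw = rng.random(N)
d_obs = np.where(d_true == 1, u_draw < 0.8, u_draw >= 0.95).astype(int)
X = np.column_stack([np.ones(N), d_obs, z])
val = rng.choice(N, size=N // 5, replace=False)  # 20% random validation sample
prev_hat, prev_var = mi_prevalence(X[val], d_true[val], X, m=10, seed=2)
print(prev_hat, prev_var, d_true.mean())
```

In this setup, misclassification does not depend on `z`. As a result, the true-status model given observed status and `z` is exactly logistic, so the predictive model is correctly specified. Non-random validation samples or error-prone covariates, as studied in the abstract, would break that property.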

Methods: A population of N = 10,000 individuals with a disease prevalence of 5% was simulated. The simulation was used to investigate how the relative bias and root mean squared error of prevalence estimates, and the sensitivity and specificity of imputed disease status, were affected by the misclassification mechanism (i.e., subject-specific versus independent process) and by the following data and model characteristics: sensitivity and specificity of observed disease status, validation dataset size, sampling mechanism for the validation dataset (random or non-random), number of imputations, and magnitude of measurement error in the predictive model covariates.
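The two misclassification mechanisms and the performance measures named in the Methods can be illustrated as follows. The abstract does not define the subject-specific process. Here it is assumed, hypothetically, that a case's probability of being recorded varies with an individual covariate `z`. The independent process, by contrast, applies the same sensitivity and specificity to everyone. All function names and parameter values are illustrative assumptions, and the sketch uses NumPy.

```python
import numpy as np


def misclassify_independent(d, sens, spec, rng):
    """Independent process: every person shares the same sensitivity and specificity."""
    u = rng.random(d.shape[0])
    # Cases are recorded positive with prob. sens; non-cases with prob. 1 - spec.
    return np.where(d == 1, u < sens, u >= spec).astype(int)


def misclassify_subject_specific(d, z, spec, rng, logit_sens=1.4, slope=1.0):
    """Hypothetical subject-specific process: a case's chance of being recorded
    depends on the individual covariate z (sensitivity is about 0.8 when z = 0)."""
    sens_i = 1.0 / (1.0 + np.exp(-(logit_sens + slope * z)))
    u = rng.random(d.shape[0])
    return np.where(d == 1, u < sens_i, u >= spec).astype(int)


def relative_bias(estimates, truth):
    """(mean estimate - true value) / true value, over simulation replications."""
    return (np.mean(estimates) - truth) / truth


def rmse(estimates, truth):
    """Root mean squared error over simulation replications."""
    return float(np.sqrt(np.mean((np.asarray(estimates) - truth) ** 2)))


def sensitivity_specificity(d_true, d_imp):
    """Agreement of imputed (or observed) status with true status."""
    d_true, d_imp = np.asarray(d_true), np.asarray(d_imp)
    sens = np.mean(d_imp[d_true == 1] == 1)
    spec = np.mean(d_imp[d_true == 0] == 0)
    return float(sens), float(spec)
```

In a full simulation, these functions would be applied across many replications. Multiplying `relative_bias` by 100 expresses it as a percentage, the scale used in the Results.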

Results: Results for the independent process indicated that the relative bias in prevalence estimates was, on average, less than 10% if the predictive model was constructed from a random sample of the population. The magnitude of negative bias in the estimates increased dramatically if the sampling mechanism was not random, particularly if the model covariates contained measurement error. Increasing the number of imputations had a larger effect on measures of precision when the predictive model was constructed from a 5% sample than from a 20% sample. Simulation results for the subject-specific misclassification mechanism are currently being collected.
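Suppose precision is summarized with Rubin's combining rules. This is an assumption, because the abstract does not name its pooling method. The rules then suggest one possible reading of the finding about the number of imputations. With $M$ imputations, per-imputation estimates $\hat{Q}_j$ and within-imputation variances $U_j$:

```latex
\bar{Q}_M = \frac{1}{M}\sum_{j=1}^{M} \hat{Q}_j, \qquad
\bar{U}_M = \frac{1}{M}\sum_{j=1}^{M} U_j, \qquad
B_M = \frac{1}{M-1}\sum_{j=1}^{M} \left(\hat{Q}_j - \bar{Q}_M\right)^2

T_M = \bar{U}_M + \left(1 + \frac{1}{M}\right) B_M
```

The factor $(1 + 1/M)$ multiplies only the between-imputation variance $B_M$. Increasing $M$ therefore matters most when $B_M$ is large. That is expected when the predictive model is fitted to a small (5%) validation sample, whose coefficients are estimated less precisely. This is a hedged interpretation, not an explanation stated by the authors.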

Conclusion: A predictive model combined with an MI method can produce accurate and precise estimates of chronic disease prevalence from misclassified diagnoses in administrative data, provided that a validation dataset is available to the researcher. Careful consideration must be given to the model and data characteristics before adopting this approach.