Prevention of Missing Data in Observational Studies – Study Design, Operational, and Analysis & Reporting Considerations
*Eric Gemmen, Quintiles; Zhaohui Su, Quintiles Outcome; Sherry Yan, FDA

Keywords: observational studies, missing data

Missing data in observational studies can seriously undermine the ability to address study objectives. “Missingness” may be related to both exposure and outcomes, and can thereby introduce bias and compromise the internal validity of the study. High rates of missing values may require imputation at the analysis stage; however, preventing missing data in the first place is always preferable. This talk presents minimum standards for preventing missing data in three areas:
(1) Study Design: Identifying the important data elements (outcomes and both known and suspected confounders) that are critical to the data collection process; collecting only critical data elements to minimize site and respondent burden; choosing data elements that are routinely collected in real-world practice, including patient-reported outcomes collected directly from the patient; streamlined CRF design with clear reporting instructions (CRF completion guidelines); and reporting windows that have a target time point but are wide enough to be jointly exhaustive across the observation period.
(2) Operations: Pilot testing of CRFs; protocol training (including CRF completion) for sites; planning follow-up contact for participants with missing data and ongoing monitoring to prevent loss to follow-up; ongoing data review for completeness and quality; and, for patient-reported data, allowing multiple modes of data collection (web-based ePRO, paper, IVRS, etc.) to make reporting as convenient as possible for the patient.
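As an illustration of the reporting-window requirement in (1), the sketch below builds contiguous windows around hypothetical target days (months 3, 6, and 12, expressed as days 90, 180, and 365), placing each boundary at the midpoint between adjacent targets so that every day of the observation period falls in exactly one window. The function names, the schedule, and the rule of keeping the assessment closest to the target are assumptions made for illustration, not a prescribed method.

```python
from bisect import bisect_right


def build_windows(targets, start, end):
    """Return (target, lower, upper) tuples; lower is inclusive, upper exclusive.

    Boundaries sit at the midpoints between adjacent target days, so the
    windows are contiguous, non-overlapping, and together cover every day
    from `start` through `end` (jointly exhaustive).
    """
    if targets != sorted(targets) or not (start <= targets[0] and targets[-1] <= end):
        raise ValueError("targets must be sorted and lie within [start, end]")
    bounds = [start] + [(a + b) // 2 for a, b in zip(targets, targets[1:])] + [end + 1]
    return [(t, bounds[i], bounds[i + 1]) for i, t in enumerate(targets)]


def assign_window(day, windows):
    """Return the target day of the window containing `day`, or None if outside follow-up."""
    lowers = [lower for _, lower, _ in windows]
    i = bisect_right(lowers, day) - 1
    if i < 0 or day >= windows[i][2]:
        return None
    return windows[i][0]


def select_per_window(days, windows):
    """Hypothetical rule: keep, for each window, the assessment closest to its target day."""
    chosen = {}
    for day in days:
        target = assign_window(day, windows)
        if target is None:
            continue  # assessment outside the observation period
        if target not in chosen or abs(day - target) < abs(chosen[target] - target):
            chosen[target] = day
    return chosen


# Hypothetical schedule: targets at days 90, 180 and 365; follow-up ends at day 455.
windows = build_windows([90, 180, 365], start=1, end=455)
print(windows)  # [(90, 1, 135), (180, 135, 272), (365, 272, 456)]

# Hypothetical assessment days for one patient; nothing falls in the day-180 window.
chosen = select_per_window([85, 100, 400], windows)
missing = [t for t, _, _ in windows if t not in chosen]
print(chosen, missing)  # {90: 85, 365: 400} [180]
```

Midpoint boundaries are only one simple way to make windows jointly exhaustive; a protocol could use asymmetric windows instead, as long as they stay contiguous. An empty window, as for day 180 here, flags a missed assessment that could trigger the follow-up contact described in (2).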
(3) Analysis and Reporting: Understanding the nuances of the data source and the impact of transitions over the course of the study, such as changes in data collection forms, data elements, disease management, and treatment; evaluating missingness mechanisms (e.g., missing completely at random, missing at random, or missing not at random); examining the sensitivity of results to assumptions about the missing data mechanism (sensitivity analysis); and reporting the number of missing values for key variables and how missing data were handled in the analysis. These standards will be illustrated with examples, along with the challenges encountered in implementing them and the real-world tradeoffs each involves.
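For the evaluation and reporting steps in (3), the following sketch uses hypothetical records (fields `exposure` and `hba1c`, with `None` marking a missing value) to count missing values for key variables, compare outcome missingness across exposure groups, and run a deliberately simplified delta-adjustment sensitivity analysis. Here the delta adjustment fills each missing outcome with the observed mean plus a shift `delta` in a single imputation; in practice the shift is usually applied within multiple imputation conditioned on covariates, so treat this only as an illustration of how an estimate moves as the missing-data assumption is varied.

```python
from statistics import mean


def missing_report(records, key_vars):
    """Return {variable: (n_missing, fraction_missing)}; None marks a missing value."""
    n = len(records)
    report = {}
    for var in key_vars:
        n_missing = sum(1 for r in records if r.get(var) is None)
        report[var] = (n_missing, n_missing / n)
    return report


def missing_rate_by_group(records, var, group_var):
    """Fraction of records missing `var` within each observed level of `group_var`."""
    counts = {}
    for r in records:
        g = r.get(group_var)
        if g is None:
            continue  # group itself unknown; skipped in this simple check
        n, k = counts.get(g, (0, 0))
        counts[g] = (n + 1, k + (r.get(var) is None))
    return {g: k / n for g, (n, k) in counts.items()}


def delta_adjusted_mean(values, delta):
    """Fill each missing value with (observed mean + delta), then return the overall mean.

    delta = 0 assumes missing outcomes resemble observed ones; a nonzero delta
    represents a missing-not-at-random shift. Simplified single imputation only.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values")
    fill = mean(observed) + delta
    return mean([fill if v is None else v for v in values])


# Hypothetical illustration data, not from any study.
records = [
    {"id": 1, "exposure": 1, "hba1c": 7.0},
    {"id": 2, "exposure": 0, "hba1c": None},
    {"id": 3, "exposure": 1, "hba1c": 8.0},
    {"id": 4, "exposure": None, "hba1c": 6.0},
    {"id": 5, "exposure": 0, "hba1c": None},
]

for var, (k, frac) in missing_report(records, ["exposure", "hba1c"]).items():
    print(f"{var}: {k} missing ({frac:.0%})")
# exposure: 1 missing (20%)
# hba1c: 2 missing (40%)

print(missing_rate_by_group(records, "hba1c", "exposure"))  # {1: 0.0, 0: 1.0}

outcome = [r["hba1c"] for r in records]
for delta in (0.0, 0.5, 1.0):
    print(f"delta={delta}: mean hba1c = {delta_adjusted_mean(outcome, delta):.2f}")
# delta=0.0: mean hba1c = 7.00
# delta=0.5: mean hba1c = 7.20
# delta=1.0: mean hba1c = 7.40
```

A group difference in missingness like the one in this toy data indicates that the outcome is not missing completely at random with respect to exposure, but the observed data alone cannot distinguish missing at random from missing not at random. That is why the sensitivity analysis varies `delta` over a range rather than estimating it.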