English-Spanish language equivalence on a new health literacy measure: implementation of novel psychometric methods

Michael A Kallen, Northwestern University Feinberg School of Medicine 
Karon F Cook, Northwestern University Feinberg School of Medicine 
*Elizabeth A Hahn, Northwestern University Feinberg School of Medicine 

Keywords: psychometrics, differential item functioning, health literacy

Introduction: Low health literacy is associated with health-related disparities (e.g., reduced access to health information, poorer understanding of illness and treatment, poorer health status, less understanding and use of preventive services, and increased hospitalizations). However, most available health literacy measures are not optimal for use in clinical practice or research because of their assessment burden, scoring imprecision, and inadequate English and Spanish language version equivalence. A new health literacy measurement system was recently developed, using novel health information technology and modern psychometric principles: Health Literacy Assessment Using Talking Touchscreen Technology (Health LiTT; Hahn et al., 2011). The purpose of this study was to determine whether English and Spanish language versions of Health LiTT could share a common set of item response theory (IRT)-based item calibrations for measure scoring, or if language-specific item calibrations would be required.

Methods: Two 14-item short-form versions of Health LiTT were administered as part of a research study involving adult patients receiving care for type 2 diabetes (T2D) at a safety net institution in Chicago. Study participants used a multimedia touchscreen kiosk in the waiting room of the general medicine clinic to complete a series of instruments, including English or Spanish versions of Health LiTT; responses to Health LiTT items were scored as correct/incorrect. Part One of the differential item functioning (DIF) impact analysis identified whether Health LiTT items displayed DIF by language (English vs. Spanish). A novel hybrid logistic ordinal regression (LOR)/IRT approach to DIF detection was implemented, in which an IRT-derived ability score replaced the traditionally modeled summed-score ability term in LOR modeling. Part Two of the DIF impact analysis compared unadjusted (“initial”) and DIF-adjusted (“purified”) scores to obtain impact evidence using 1) Pearson correlations (initial vs. purified theta scores); 2) two standard error (SE) assessments: (a) the number and percentage of individual difference scores (initial theta minus purified theta) exceeding the median initial theta SE, and (b) the number and percentage of individual difference scores exceeding the individual’s own initial theta SE; and 3) a comparison of Cohen’s d effect sizes for the language factor across competing analyses of variance (ANOVA) (i.e., initial theta scores by language factor vs. purified theta scores by language factor). The R package “lordif” and the statistical program SPSS were used to conduct the analyses.
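The three Part Two impact checks can be illustrated with a minimal Python sketch. It is not the study's code (the analyses were run with lordif and SPSS). The function names are hypothetical, the use of absolute difference scores is an assumption, and Cohen's d is computed here with a pooled standard deviation, which is one common convention.

```python
import math
from statistics import mean, median, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def se_impact(initial, purified, initial_se):
    """Count difference scores that exceed (a) the median initial SE and
    (b) each person's own initial SE.

    Assumption: the absolute value of (initial - purified) is compared.
    """
    diffs = [abs(i - p) for i, p in zip(initial, purified)]
    med_se = median(initial_se)
    n = len(diffs)
    over_median = sum(d > med_se for d in diffs)
    over_own = sum(d > se for d, se in zip(diffs, initial_se))
    return {
        "n_over_median_se": over_median,
        "pct_over_median_se": 100.0 * over_median / n,
        "n_over_own_se": over_own,
        "pct_over_own_se": 100.0 * over_own / n,
    }


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
```

Calling `cohens_d` once on the initial theta scores and once on the purified theta scores, each split by language group, would give the two effect sizes being compared.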

Results: A total of 295 patients were enrolled in the study (n=146 English- and n=149 Spanish-speaking). Mean age was 55 years, a majority (79%) had been diagnosed with T2D more than two years earlier, and all were being treated with oral medication or insulin. English-speaking participants were primarily non-Hispanic Black (65%), whereas all Spanish-speaking participants were Hispanic. Compared with the English-speaking group, the Spanish-speaking group included a slightly higher proportion of women (61% vs. 46%) and had lower education (75% vs. 24% with less than a high school education). Seven participants did not complete Health LiTT and were excluded from analyses. Part One: For standard DIF detection, a McFadden pseudo-R² change criterion of 0.01 was used; this criterion was lowered to 0.005 for sensitivity DIF detection. Using the standard DIF criterion, four Health LiTT items were flagged for DIF; using the sensitivity DIF criterion, an additional two items were flagged, for a total of six flagged items. Part Two: To capture the greater potential DIF impact on scores, score impact analyses used results from the sensitivity DIF detection analysis (i.e., “purified” theta scores were adjusted for six identified DIF items rather than only four). The Pearson correlation between initial and purified theta scores was 0.995. For individual difference scores (initial minus purified theta), 0 cases (0%) exceeded the median initial theta SE (0.44), and 0 cases (0%) exceeded their own initial theta SE. Cohen’s d for the initial theta by language ANOVA was 0.49; for the purified theta by language ANOVA, Cohen’s d was 0.55. Both ANOVAs indicated essentially “medium” language-related effect sizes.
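How a McFadden pseudo-R² change criterion flags items can be sketched in Python as follows. The abstract does not say which nested models were compared, so this assumes an ability-only model versus a model that adds language terms. The function names, item labels, and log-likelihood values are hypothetical and are not study data.

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo-R^2: 1 - (model log-likelihood / null log-likelihood)."""
    return 1.0 - ll_model / ll_null


def flag_dif(items, criterion):
    """Return names of items whose pseudo-R^2 change meets the criterion.

    `items` maps an item name to a tuple of log-likelihoods:
    (null model, ability-only model, ability + language model).
    Assumption: an item is flagged when the change is >= the criterion.
    """
    flagged = []
    for name, (ll_null, ll_base, ll_full) in items.items():
        change = mcfadden_r2(ll_full, ll_null) - mcfadden_r2(ll_base, ll_null)
        if change >= criterion:
            flagged.append(name)
    return flagged


# Hypothetical log-likelihoods for three illustrative items.
example_items = {
    "item_A": (-200.0, -150.0, -147.0),  # change of about 0.015
    "item_B": (-200.0, -150.0, -148.8),  # change of about 0.006
    "item_C": (-200.0, -150.0, -149.8),  # change of about 0.001
}

standard = flag_dif(example_items, 0.01)      # standard criterion
sensitivity = flag_dif(example_items, 0.005)  # lowered sensitivity criterion
```

In this toy example the lowered criterion flags one additional item, mirroring how the sensitivity analysis in the abstract flagged more items than the standard one.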

Conclusions: Although both standard and sensitivity criterion DIF detection analyses identified DIF items, the impact of DIF items on Health LiTT scores appeared to be trivial. Health LiTT provides a new measurement strategy to estimate the size of the population at risk from low health literacy, identify vulnerable patients in clinical settings, and provide reliable and valid scores for use in testing interventions. English-Spanish language equivalence will permit researchers to determine the independent effects of limited English proficiency and limited literacy.