Protecting Biological and Health Data: Special Issues and Applications
This page describes protection methods for biological and health data, which are often covered by
the Health Insurance Portability and Accountability Act (HIPAA). These data typically contain
demographic and other potentially identifying information, along with sensitive health variables.
Most of the typical alteration strategies can be applied to the demographic and other identifying
variables; see the web page on data protection methods for an explanation of the methods. Below are
links to illustrative applications of confidentiality protections to biological and health data.
The list is not exhaustive, but it illustrates the techniques typically used to protect these data.
Aggregation and top-coding in the Health and Retirement Study (HRS)
The HRS uses aggregation of categories (e.g., geographies, occupations), rounding and top-coding of monetary data,
and suppression of variables related to the survey design. These actions result in a restricted-access
data file, which researchers can obtain after applying and signing an agreement to maintain data confidentiality.
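The following is a minimal Python sketch of how category aggregation, rounding, top-coding, and variable suppression can be combined on a single record. It is not the HRS procedure: the state-to-region mapping, the field names, the rounding base of 1,000, and the cap of 500,000 are all hypothetical values chosen for illustration.

```python
# Hypothetical mapping used to coarsen geography; not an HRS code list.
REGION_OF_STATE = {"MI": "Midwest", "OH": "Midwest", "NY": "Northeast", "CA": "West"}


def aggregate_category(value, mapping, other="Other"):
    """Replace a detailed category with its coarser group."""
    return mapping.get(value, other)


def round_to_nearest(amount, base):
    """Round an integer amount to the nearest multiple of base (ties round up)."""
    return ((amount + base // 2) // base) * base


def top_code(amount, cap):
    """Replace any amount above the cap with the cap itself."""
    return min(amount, cap)


def protect_record(record, cap=500_000, base=1_000):
    """Return a protected copy of a record.

    Only the coarsened region and the rounded, top-coded income are kept.
    Every other field (for example, a survey design variable) is suppressed
    simply by not being copied into the output.
    """
    return {
        "region": aggregate_category(record["state"], REGION_OF_STATE),
        "income": top_code(round_to_nearest(record["income"], base), cap),
    }


typical = protect_record({"state": "MI", "income": 52_340, "stratum": 17})
extreme = protect_record({"state": "TX", "income": 2_750_400, "stratum": 3})
```

In this sketch the first record keeps an income of 52,000 and the region "Midwest", while the second record's income is capped at 500,000 and its unmapped state falls into "Other". Lower caps and coarser categories reduce disclosure risk at the cost of analytic detail.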
Noise addition and synthetic data in the National Health Interview Survey Linked Mortality Files
For each person deemed at risk of identification, staff at the Centers for Disease Control and Prevention either add noise to the date of death or
generate a synthetic value of the underlying cause of death (after aggregating death codes). They also
The results from the perturbed and original data are compared in a 2008 paper in the
American Journal of Epidemiology (volume 168, pages 336-344).
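As an illustration of noise addition applied only to records flagged as at risk, the sketch below shifts a date of death by a random number of days. The actual perturbation rules used for the linked mortality files are not described on this page, so the uniform shift, the 30-day bound, the field names, and the `at_risk` flag are assumptions made for this example.

```python
import random
from datetime import date, timedelta


def perturb_date(original, max_shift_days, rng):
    """Shift a date by a uniformly drawn whole number of days in [-max, +max]."""
    shift = rng.randint(-max_shift_days, max_shift_days)
    return original + timedelta(days=shift)


def protect_dates(records, max_shift_days, rng):
    """Return copies of the records, perturbing dates only for at-risk records."""
    protected = []
    for rec in records:
        new_rec = dict(rec)  # copy so the original data are left untouched
        if rec["at_risk"]:
            new_rec["date_of_death"] = perturb_date(
                rec["date_of_death"], max_shift_days, rng
            )
        protected.append(new_rec)
    return protected


rng = random.Random(2008)  # fixed seed so the example is reproducible
records = [
    {"id": 1, "date_of_death": date(2001, 3, 14), "at_risk": True},
    {"id": 2, "date_of_death": date(1999, 11, 2), "at_risk": False},
]
released = protect_dates(records, max_shift_days=30, rng=rng)
```

Records that are not flagged pass through unchanged. A flagged record may occasionally receive a shift of zero days under this scheme; a production procedure might exclude that case.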
Data swapping and microaggregation in the Substance Abuse and Mental Health Data Archive (SAMHDA)
The Inter-university Consortium for Political and Social Research (ICPSR) archives and safeguards
many datasets, including the SAMHDA. The ICPSR uses data swapping and microaggregation to protect
records in these data.
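The two techniques named here can be sketched in a few lines of Python. This is a generic illustration, not ICPSR's implementation: the group size `k`, the number of swapped pairs, and the field names are hypothetical. Microaggregation below sorts a numeric variable, forms groups of at least `k` records, and replaces each value with its group mean; data swapping exchanges one field between randomly chosen pairs of records.

```python
import random


def microaggregate(values, k):
    """Replace each value with the mean of its group of at least k neighbors."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices in sorted order
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:
            end = n  # fold a short tail into the last group
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


def swap_values(records, field, n_pairs, rng):
    """Swap one field between n_pairs randomly chosen pairs of records."""
    out = [dict(r) for r in records]  # copy so the input is not modified
    chosen = rng.sample(range(len(out)), 2 * n_pairs)
    for a, b in zip(chosen[::2], chosen[1::2]):
        out[a][field], out[b][field] = out[b][field], out[a][field]
    return out


ages = [10, 50, 12, 48, 11, 49, 30]
masked_ages = microaggregate(ages, k=3)

people = [
    {"id": 1, "county": "A"},
    {"id": 2, "county": "B"},
    {"id": 3, "county": "C"},
    {"id": 4, "county": "D"},
]
swapped = swap_values(people, "county", n_pairs=1, rng=random.Random(0))
```

Microaggregation preserves the overall mean of the variable, and swapping preserves the marginal distribution of the swapped field, but both weaken relationships between variables within a record.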
The Personal Genome Project (PGP)
Genetic data are extremely difficult to protect without substantial sacrifice in data usefulness.
Researchers at the PGP at Harvard University have taken a different approach: asking individuals to
consent to making their genetic data publicly available without modification. Although the
data are not protected, we include this link as an example of an alternative approach to data access.