Protecting Demographic/Other Data: Special Issues and Applications
This page discusses protection methods for demographic, educational, and other non-health,
non-tax data. These data typically contain information that may be publicly
available, for example geography, age, race, gender, marital status, and property taxes. If
these variables are released without alteration, malicious data users may be able
to attach names to records by matching them to external data sources. For example,
Sweeney (1997) showed that 97% of the records in a medical database for Cambridge,
MA, could be identified using only birth date and 9-digit ZIP code by
linking them to a publicly available voter registration list. However, these
data are often samples rather than censuses. Sampling offers some protection because an intruder
cannot be certain that a targeted individual is in the released data.
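The linkage risk described above can be illustrated with a minimal sketch. It assumes two small, entirely hypothetical files: a released file without names and an external list with names. The records, field names, and the `link_unique` helper are invented for illustration and are not taken from Sweeney's study. A record counts as re-identified when its combination of birth date and ZIP code is unique in both files.

```python
from collections import Counter

# Hypothetical released records: names removed, quasi-identifiers kept.
released = [
    {"birth_date": "1960-03-14", "zip": "02138-1234", "sensitive": "A"},
    {"birth_date": "1960-03-14", "zip": "02138-5678", "sensitive": "B"},
    {"birth_date": "1975-11-02", "zip": "02139-0001", "sensitive": "C"},
    {"birth_date": "1975-11-02", "zip": "02139-0001", "sensitive": "D"},
]

# Hypothetical external list that carries names (for example, a voter roll).
external = [
    {"name": "Person 1", "birth_date": "1960-03-14", "zip": "02138-1234"},
    {"name": "Person 2", "birth_date": "1960-03-14", "zip": "02138-5678"},
    {"name": "Person 3", "birth_date": "1975-11-02", "zip": "02139-0001"},
    {"name": "Person 4", "birth_date": "1975-11-02", "zip": "02139-0001"},
]

KEYS = ("birth_date", "zip")  # assumed quasi-identifiers


def key_of(record):
    """Build the matching key from the quasi-identifier fields."""
    return tuple(record[k] for k in KEYS)


def link_unique(released, external):
    """Return (name, released_record) pairs whose key is unique in both files."""
    rel_counts = Counter(key_of(r) for r in released)
    ext_counts = Counter(key_of(e) for e in external)
    names = {key_of(e): e["name"] for e in external if ext_counts[key_of(e)] == 1}
    return [
        (names[key_of(r)], r)
        for r in released
        if rel_counts[key_of(r)] == 1 and key_of(r) in names
    ]


matches = link_unique(released, external)
for name, record in matches:
    print(name, "->", record["sensitive"])
```

In this toy data the first two released records each link to a single name, while the last two share a key and remain ambiguous.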
Most of the typical alteration strategies can be applied
to demographic/other data; see the
web page on data protection methods for an explanation of the methods. Agencies often apply multiple methods
to the same dataset. Below are links to illustrative applications of confidentiality
protections on demographic/other data. This list is by no means exhaustive, but it illustrates the techniques typically used
to protect these data.
Aggregation of geography in UK Census Data
Most statistical agencies and other data disseminators aggregate geography before public release. One example is
the "output areas" used for the UK census, which is conducted by the UK Office for National Statistics.
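As a rough illustration of geographic aggregation, the sketch below recodes fine geographic units into coarser areas and checks whether any area still holds fewer records than a minimum count. The unit names, the mapping, the threshold of 3, and both helper functions are hypothetical; this is not how the Office for National Statistics constructs output areas.

```python
from collections import Counter

# Hypothetical mapping from fine geographic units to coarser release areas.
AREA_OF = {
    "block_1": "area_A",
    "block_2": "area_A",
    "block_3": "area_B",
    "block_4": "area_B",
}
MIN_COUNT = 3  # assumed minimum number of records per released area


def aggregate_geography(records, area_of=AREA_OF):
    """Return copies of the records with the geography recoded to the coarser area."""
    return [{**r, "geo": area_of[r["geo"]]} for r in records]


def small_areas(records, min_count=MIN_COUNT):
    """List the geographic codes that contain fewer than min_count records."""
    counts = Counter(r["geo"] for r in records)
    return sorted(g for g, n in counts.items() if n < min_count)


records = [
    {"geo": "block_1", "age": 34},
    {"geo": "block_1", "age": 51},
    {"geo": "block_2", "age": 29},
    {"geo": "block_3", "age": 62},
    {"geo": "block_4", "age": 45},
    {"geo": "block_4", "age": 38},
]

coarse = aggregate_geography(records)
print(small_areas(records))  # every block is below the threshold
print(small_areas(coarse))   # no aggregated area is below the threshold
```

The design choice is the usual trade-off: coarser areas hide individuals in larger groups but give analysts less geographic detail.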
Top-coding in the American Community Survey (ACS)
Top-coding is the most common approach to protecting income and other monetary data.
This link contains information on the top-codes used in the public use microdata samples for the ACS, which is conducted by the Census Bureau.
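A minimal sketch of top-coding follows. The incomes and the threshold of 200,000 are hypothetical and are not the actual ACS top-codes; the function names are also invented for illustration. The first function caps values at the threshold. The second shows a variant in which values above the threshold are replaced by their mean, which preserves the total of the top-coded values.

```python
def top_code(values, threshold):
    """Replace every value above the threshold with the threshold itself."""
    return [min(v, threshold) for v in values]


def top_code_with_mean(values, threshold):
    """Replace every value above the threshold with the mean of those values."""
    above = [v for v in values if v > threshold]
    if not above:
        return list(values)
    fill = sum(above) / len(above)
    return [fill if v > threshold else v for v in values]


incomes = [18_000, 42_500, 67_000, 250_000, 1_200_000]  # hypothetical incomes
THRESHOLD = 200_000  # hypothetical top-code, not an ACS value

capped = top_code(incomes, THRESHOLD)
mean_filled = top_code_with_mean(incomes, THRESHOLD)
print(capped)
print(mean_filled)
```

Either way, the released file no longer reveals how far above the threshold any single value lies.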
Data swapping and the National Center for Education Statistics (NCES)
The NCES uses data swapping to create restricted access files available to users via licensing. This
JSM proceedings paper describes some research by Westat into swapping procedures used by NCES.
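The general idea of data swapping can be sketched as follows, assuming the simplest possible variant: a random subset of records is paired up, and the value of one variable is exchanged within each pair. The function name, the swap rate, the field names, and the records are hypothetical, and this is not the procedure used by NCES or studied by Westat, which the source does not describe.

```python
import random


def swap_values(records, field, rate, seed=0):
    """Swap `field` between randomly chosen pairs covering about `rate` of the records."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    out = [dict(r) for r in records]   # copy so the input is left untouched
    n_pairs = int(len(out) * rate) // 2
    chosen = rng.sample(range(len(out)), 2 * n_pairs)
    for i, j in zip(chosen[0::2], chosen[1::2]):
        out[i][field], out[j][field] = out[j][field], out[i][field]
    return out


# Hypothetical student records.
students = [
    {"id": 1, "school": "S1", "score": 71},
    {"id": 2, "school": "S2", "score": 88},
    {"id": 3, "school": "S3", "score": 64},
    {"id": 4, "school": "S4", "score": 93},
    {"id": 5, "school": "S5", "score": 79},
    {"id": 6, "school": "S6", "score": 55},
]

swapped = swap_values(students, field="school", rate=0.7)
for before, after in zip(students, swapped):
    print(before["id"], before["school"], "->", after["school"])
```

Swapping leaves the marginal distribution of the swapped variable unchanged, but it can distort relationships between that variable and the others, so the swap rate is a trade-off between protection and data quality.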
Failure to protect confidentiality in educational data
This document describes several examples in published educational data where data suppression does not protect confidentiality.
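One well-known way suppression can fail is when a total is published next to the suppressed cell, so the hidden value can be recovered by subtraction. The sketch below assumes that situation; the table, category names, counts, and the `recover_suppressed` helper are hypothetical and are not taken from the document referenced above.

```python
def recover_suppressed(cells, total):
    """If exactly one cell is suppressed (None), recover it as total minus the rest."""
    missing = [name for name, value in cells.items() if value is None]
    if len(missing) != 1:
        return {}  # nothing suppressed, or too many unknowns for simple subtraction
    known = sum(value for value in cells.values() if value is not None)
    return {missing[0]: total - known}


# Hypothetical published table for one school: the small cell was suppressed,
# but the number of students tested was published alongside it.
published = {"proficient": 46, "basic": 12, "below_basic": None}
total_tested = 60

recovered = recover_suppressed(published, total_tested)
print(recovered)
```

Avoiding this requires suppressing at least one additional cell (complementary suppression) or withholding or perturbing the total.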
Synthetic data in OnTheMap (LEHD)
The U.S. Bureau of the Census produces maps of where people live and work using partially synthetic data with other techniques like suppression and adding