Protecting Business and Tax Data: Special Issues and Applications
Business and tax data are among the most sensitive data collected by government agencies and researchers.
These data often contain highly skewed variables whose extreme values are at risk of disclosure.
For example, given the actual total payroll of each manufacturer in the airline industry, it may be
relatively easy to identify Boeing: it is the record with the largest payroll. Furthermore, businesses
and individuals understandably want to guard the privacy of this information. For example, private companies
do not want their competitors to know the amounts they spend on marketing, research and
development, payroll, etc., as this might compromise their business practices. Similarly, individuals
may be reluctant for others to learn their salaries or total incomes.
If data collectors disseminated business and tax data in ways that resulted in harm to
businesses and individuals, data subjects might not be willing to provide their data. This would damage
the government's ability to make economic policy and reduce researchers' opportunities to analyze economic data.
Thus, most business and tax data, if released at all, are altered before release; in fact, there are no public use business microdata available in the U.S.
Nearly all the typical alteration strategies are applied
to business and tax data; see the
web page on data protection methods for an explanation of the methods.
Below are links to illustrative applications of confidentiality
protections to business and tax data. This list is by no means exhaustive, but it does illustrate the techniques typically used
to protect these data.
Aggregation in the County Business Patterns (CBP)
Business and tax microdata are frequently aggregated for public use. This link to the CBP,
released by the Census Bureau, illustrates how establishments' payroll and employee size are aggregated
to create public use tables.
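As a rough illustration of the idea, the following minimal sketch rolls hypothetical establishment records up into county-by-industry cells. The records, the size classes, and the function names are all assumptions made for this example; they are not the Census Bureau's actual CBP data, classes, or procedure.

```python
# Hypothetical establishment microdata:
# (county, industry, employees, annual payroll in $1,000).
ESTABLISHMENTS = [
    ("A", "manufacturing", 12, 540),
    ("A", "manufacturing", 250, 14200),
    ("A", "retail", 8, 210),
    ("B", "manufacturing", 40, 1900),
    ("B", "retail", 3, 75),
    ("B", "retail", 15, 410),
]

# Illustrative employment size classes as (lower bound, upper bound, label).
# The upper bound is exclusive; None means "no upper limit".
SIZE_CLASSES = [(1, 10, "1-9"), (10, 50, "10-49"), (50, None, "50+")]


def size_class(employees):
    """Return the label of the size class that contains `employees`."""
    for low, high, label in SIZE_CLASSES:
        if employees >= low and (high is None or employees < high):
            return label
    raise ValueError(f"no size class for {employees} employees")


def aggregate(records):
    """Sum establishment records into (county, industry) cells."""
    cells = {}
    for county, industry, employees, payroll in records:
        cell = cells.setdefault(
            (county, industry),
            {"establishments": 0, "employees": 0, "payroll": 0, "by_size": {}},
        )
        cell["establishments"] += 1
        cell["employees"] += employees
        cell["payroll"] += payroll
        # Count establishments per size class instead of releasing exact sizes.
        label = size_class(employees)
        cell["by_size"][label] = cell["by_size"].get(label, 0) + 1
    return cells
```

Calling `aggregate(ESTABLISHMENTS)` returns one entry per county and industry, holding totals and size-class counts rather than individual establishment values. Note that aggregation alone does not protect a cell that contains only one or two establishments.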
Noise addition in the Commodity Flow Survey (CFS)
This paper illustrates how noise can be added to underlying economic microdata when the released data
are tabular. The CFS is released by the Census Bureau.
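One common way to add noise before tabulation is to multiply each establishment's value by a random factor close to, but never exactly, 1 and then sum the perturbed values into cells. The sketch below shows that generic idea only. The noise bounds (5% to 15%), the record layout, and the function names are assumptions for illustration and are not taken from the CFS paper.

```python
import random
from collections import defaultdict


def noise_factor(rng, low=0.05, high=0.15):
    """Draw a multiplier between `low` and `high` away from 1, up or down."""
    magnitude = rng.uniform(low, high)
    direction = rng.choice((-1, 1))
    return 1 + direction * magnitude


def noisy_cell_totals(records, seed=None):
    """Perturb each establishment value, then sum the results by table cell.

    `records` is a list of (cell, value) pairs, one per establishment.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)
    for cell, value in records:
        # Each establishment gets its own factor, so its value is never
        # published exactly, even when it is the only record in a cell.
        totals[cell] += value * noise_factor(rng)
    return dict(totals)
```

In this sketch a cell dominated by a single establishment moves by roughly 5% to 15%, while in cells with many establishments the upward and downward perturbations tend to offset one another.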
Noise addition in the Longitudinal Employer-Household Dynamics (LEHD) Program
This presentation provides an example of adding noise to establishment-level data. The LEHD program
is run by the Census Bureau.
Microaggregation in the Individual Tax Model Public Use File (ITMPUF)
This link is to a paper in the 2002 proceedings of the Joint Statistical Meetings that describes the microaggregation strategy used for the ITMPUF, which is released by the Statistics
of Income Division of the Internal Revenue Service.
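In its simplest form, microaggregation sorts records by a sensitive variable, forms small groups of neighboring records, and replaces each value with its group mean. The sketch below illustrates that generic, single-variable version with a fixed group size `k`. It is an assumption-laden illustration and not the procedure the Statistics of Income Division uses for the ITMPUF.

```python
def microaggregate(values, k=3):
    """Replace each value with the mean of its group of at least k neighbors.

    Groups are formed from the values in sorted order; results are returned
    in the original record order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        # Fold a short tail into the last group so no group has fewer than k.
        if n - end < k:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result
```

For example, `microaggregate([10, 200, 30, 20, 5000, 210, 190], k=3)` returns `[20.0, 1400.0, 20.0, 20.0, 1400.0, 1400.0, 1400.0]`. The overall total (5660) is preserved, but the extreme value 5000 no longer appears in the released data.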
Synthetic data in the Survey of Consumer Finances
The Federal Reserve Board protects sensitive monetary values by replacing them with multiple imputations.
This is the first published instance of what is now known as partially synthetic data.
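To make the idea of partial synthesis concrete, the sketch below replaces only the values flagged as sensitive with draws from a simple model fitted to the data, and repeats this to produce several released copies. Everything here is an assumption for illustration: the single predictor `x`, the straight-line model with normal errors, and the function names are hypothetical, and the sketch is not the Federal Reserve Board's imputation procedure. A full multiple-imputation approach would also draw the model parameters for each copy, which this sketch omits.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x, plus residual std. dev.

    Requires at least three records and some variation in xs.
    """
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma = (sse / (len(xs) - 2)) ** 0.5
    return intercept, slope, sigma


def partially_synthesize(records, is_sensitive, m=3, seed=None):
    """Return m copies of `records`, a list of (x, y) pairs.

    In each copy, y is replaced by a model draw for records where
    is_sensitive(x, y) is true; all other records are left unchanged.
    """
    rng = random.Random(seed)
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    intercept, slope, sigma = fit_line(xs, ys)
    copies = []
    for _ in range(m):
        copy = []
        for x, y in records:
            if is_sensitive(x, y):
                y = rng.gauss(intercept + slope * x, sigma)
            copy.append((x, y))
        copies.append(copy)
    return copies
```

A call such as `partially_synthesize(records, lambda x, y: y >= 1000, m=5)` would yield five data sets in which only the large values of `y` have been replaced, which is what makes the result partially rather than fully synthetic.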
Synthetic data in the Longitudinal Business Database (LBD)
The U.S. Bureau of the Census is developing a partially synthetic public use data set for the LBD. This
JSM proceedings paper summarizes some of the initial development.