Claire McKay Bowen Talks Data Privacy

What initially drew you to write about the tradeoff of data privacy and the use of data for the public good?

There are two parts to this question: why write a book on data privacy, and why focus on the public good?

To answer the first question, there are still few communication materials about data privacy, meaning methods of safely releasing confidential data publicly while preserving the privacy of the people in the data.

Years ago, as a naïve first-year doctoral student, I excitedly started my literature review on data privacy with a focus on differential privacy, a mathematically rigorous definition for privacy. But my excitement quickly transformed into frustration. I found only a few technical papers (especially within statistics) and even fewer introductory materials. I remember scouring the internet for anything. To make matters worse, some of the blogs, articles, and academic papers I could find had incorrect definitions, unclear explanations, or improper applications of differential privacy. What I found most useful were statistics and computer science professors’ public PowerPoint slides or random white papers from federal agencies.

As differential privacy and other data privacy techniques gained in popularity, the privacy community created more technical papers, blogs, videos, and general privacy communication outreach. Fortunately, this increase in communication materials will result in fewer students stalking professors’ web pages and generate a more accurate understanding of data privacy. Unfortunately, most of these materials still struggle to explain differential privacy and other data privacy concepts to a nontechnical audience. This became my initial motivation to write the book. We need more accessible written materials.

Why the focus on the public good? Although this version of data privacy may not seem as exciting as others, such as cybersecurity, it affects every person’s life through major public policy decisions in the United States. When I wrote the book, the world was (and still is) experiencing a global pandemic that has caused severe economic and health public policy issues in most countries, including the United States. If researchers and public policymakers had access to tax and health data, they could better target and coordinate stimulus relief programs to help all American residents.

But many public policymakers do not understand the tradeoff between data privacy and public good. Therefore, I decided the intended audience for my book includes anyone without a mathematics background who is interested in learning more about this area of data privacy. Specifically, that means public data users, people working within state and federal government who are less familiar with privacy-preserving methods, and public policymakers who want and need to learn more about data privacy methods to make more informed policy decisions.

Particularly in the context of the COVID-19 pandemic, how would you respond to the claim that a public health crisis trumps data privacy concerns?

Before even deciding if we want to release personal information for the public good, we should be aware that this tradeoff is not distributed equally in society. In other words, we should check that the decision we make, whether to use or withhold the data, doesn’t create both worse privacy and worse health outcomes for certain individuals.

Specifically, underrepresented groups tend to experience higher privacy insecurity and are at a higher data privacy risk. These individuals, including racial minorities and socioeconomically disadvantaged people, are also more likely to suffer worse health policy outcomes.

We saw this during the pandemic in the United States. For example, in May 2020, the Navajo Nation surpassed New York City for the highest per capita coronavirus infection rate. When the nation started to lock down, most health advisories stated we should wash our hands for 20 seconds. However, the Navajo Nation could not comply with this health advice because more than a third of the population lacks running water. The Native American populations in my home state of New Mexico also suffered similar issues. The local pueblos didn’t open to the public until early 2021, when the local government distributed vaccines.

To make matters worse, researchers predict people who tested positive for COVID will be at a higher data privacy risk. This means the inequity of personal privacy will widen even more for underrepresented groups, who have a higher rate of infection. So, when we have these conversations about whether we should sacrifice our personal privacy for the public good, we must acknowledge how these decisions may affect some more than others in our community.

With data sets getting bigger and more complex, what are the biggest privacy challenges we face going forward?

The greatest challenge as data sets get bigger and more complex is the lack of data practitioners who both know data privacy methods and have the skills to apply them. This answer might be what you expected, but even for some “simple” data sets, a data user must identify which algorithms to use and where to insert them into the statistical pipeline.

Also, although more production-ready data privacy software is becoming available, it is still limited. Many programs implement only a few methods, apply to only certain types of data, and/or do not scale up well for larger data sets.

Simply put, in order to tackle the other privacy challenges, we must overcome the human and computational resource limitations.


Learn more about Bowen’s book, Protecting Your Privacy in a Data-Driven World.

Claire McKay Bowen is the lead data scientist for privacy and data security at the Urban Institute. Her research focuses on developing and assessing the quality of differentially private data synthesis methods and science communication. She holds a BS in mathematics and physics from Idaho State University and an MS and PhD in statistics from the University of Notre Dame.

After completing her PhD, Bowen worked at Los Alamos National Laboratory, where she investigated cosmic ray effects on supercomputers.

In 2021, the Committee of Presidents of Statistical Societies identified her as an emerging leader in statistics for her “contributions to the development and broad dissemination of statistics and data science methods and concepts, particularly in the emerging field of data privacy, and for leadership of technical initiatives, professional development activities, and educational programs.”