Data Privacy Protection and the Conduct of Applied Research: Methods, Approaches and Their Consequences
Rapid improvements in computational power and coincident increases in the availability of information on households and firms have created new challenges for maintaining the confidentiality of sensitive microdata. By combining such sensitive data with external information, it is increasingly possible to breach the anonymity of individuals and businesses and expose their characteristics. A privacy breach of this type damages public trust and violates the confidentiality guarantees that statistical agencies and other data providers are legally and ethically required to uphold. Private companies and public statistical agencies, most notably the U.S. Census Bureau, have responded by adopting new disclosure avoidance systems based on the criterion known as differential privacy. These systems sometimes include model-based synthetic data methods.
All methods of disclosure avoidance face an inevitable trade-off between data accuracy and the protection of the privacy of individuals and firms. However, little attention has been paid to issues such as (a) the implications of using privacy-protected data for identification, estimation, and inference when conducting applied research; (b) the quantification of the risks of alternative disclosure avoidance methods and the estimation of how individuals, firms and society value these risks; and (c) how to align the need for high-quality data in applied research with the obligations to protect privacy.
To provide new insights on the use of privacy-protected data in empirical research, the National Bureau of Economic Research (NBER), with the support of the Alfred P. Sloan Foundation, will convene a research conference in Washington, DC on May 16-17, 2024. The conference will be organized by Ruobin Gong (Rutgers University), V. Joseph Hotz (Duke University and NBER), and Ian Schmutte (University of Georgia and Amazon). It will strive to promote interactions among researchers from computer science, data science, economics, statistics, and other fields. A previous conference on similar issues was held in May 2023; the program may be found here:
Data Privacy Protection and the Conduct of Applied Research: Methods, Approaches and Their Consequences, Spring 2023
The organizers welcome the submission of proposals and completed papers that provide theoretical or empirical insights on the implications of privacy protection for applied research. Topics of interest include:
• What key requirements should any privacy protection model or definition satisfy?
• What do we know about the value of research done using sensitive or confidential data, and of the costs associated with reducing the precision of the outputs of that research?
• How does the need for privacy-protected data affect model and hypothesis formulation and demonstration of reproducibility?
• How does the use of privacy-protected data affect the identification, estimation, and uncertainty quantification of parameters of interest?
• How are formal privacy guarantees, such as differential privacy as developed in computer science, related to other disclosure risk measures, such as those developed in the statistics literature?
• What are the disclosure risks associated with more complex data used in applied research, such as those that combine administrative and survey data, or longitudinal and panel data, and do existing disclosure avoidance approaches adequately minimize them?
• What do we know about attitudes towards privacy and the willingness of individuals or businesses to provide sensitive data?
• How do the statutory guidelines of statistical agencies (such as Titles 13 and 26, which govern the U.S. Census Bureau) and the guidelines of other data providers map to formal privacy models? Are there changes that could improve their alignment?
• Are there alternatives to current privacy protection methods that might better align the needs for protecting privacy with the conduct of applied research?
Papers may analyze policies that are or could be used by government agencies and other data providers, but in keeping with NBER restrictions, they may not make recommendations about the practices these entities should pursue.
Submissions are welcome from scholars who are early in their careers, with or without NBER affiliations, and from members of groups that have been historically under-represented in economics, statistics, and computer science.
To be considered for inclusion on the program of this second conference, upload papers and paper proposals by 11:59 pm ET on Monday, December 18, 2023.
Authors chosen to present papers will be notified in December 2023. There will be a virtual pre-conference meeting with authors in January 2024 to review plans for each paper and to identify important overlaps and research complementarities.
Please do not submit papers that have been accepted for publication. Authors will receive a modest honorarium for their participation in the project, subject to timely submission of a complete manuscript. Research papers presented at the conference will be eligible for distribution in the NBER working paper series. In addition, authors will be invited to submit their articles for publication in a special issue of the Harvard Data Science Review (HDSR), subject to the HDSR's normal peer-review process for invited submissions. Submissions for the special issue will be due in October 2024.
The NBER will cover the cost of attendance for two authors per paper, and all co-authors will be invited. The conference will be live-streamed to expand dissemination of the research findings. Questions about this conference may be addressed to confer@nber.org.