Algorithms and Complexity Seminar

Pan-Private Streaming Algorithms and Continual Observation

Guy Rothblum (Princeton University)

Research on privacy-preserving statistical data analysis has been flourishing recently. Following the introduction of the rigorous differential privacy guarantee, sanitizing algorithms matching the definition have been proposed for many problems. Most of this private data analysis literature assumes that the sensitive data to be analyzed is (i) held by a trusted curator and (ii) is in a fixed and static database. The talk will consider and address concerns with both of these assumptions:

1. Even well-intentioned collectors of confidential data, such as governmental agencies, hospitals, or search engine providers, can be pressured to permit data to be used for purposes other than that for which they were collected. To support the data curators, we initiate a study of "pan-private" algorithms; roughly speaking, these algorithms collect statistical information without ever storing sensitive information about individuals. Privacy is protected even if the algorithm's internal state becomes visible to an adversary.

2. Many applications of data analysis involve analyzing data that is not in a static database, but rather changes dynamically. This can be either because the entire goal is one of monitoring, e.g. search trends, traffic conditions, or incidence of influenza, or because the goal is one of adaptive optimization, e.g., aggregating expert advice or placement of data to minimize access costs.

In particular, we will present pan-private algorithms for continually monitoring the values of useful statistics, such as the number of distinct elements in the input, and applications.

Based on joint works with Cynthia Dwork, Moni Naor, Toni Pitassi, and Sergey Yekhani