Algorithms and Complexity Seminar

Thursday, February 28, 2008, 4:00-5:15pm in 32-G575.

Smooth Sensitivity and Sampling in Private Data Analysis

Sofya Raskhodnikova (Penn State University)

We introduce two generic frameworks for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. (A good application to keep in mind is publishing census data.) One technique for achieving this, output perturbation, calls for releasing the value of a function f, evaluated at the database, with a small amount of additive random noise. In previous work, a formal privacy guarantee was obtained by adding noise whose magnitude depended only on f, taken worst case over all data sets.
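As a concrete illustration of output perturbation with worst-case noise, here is a minimal sketch (not from the talk) of releasing a clamped mean with Laplace noise calibrated to its global sensitivity; the function names and the choice of the mean as f are ours:

```python
import math
import random

def laplace(scale):
    # Draw one sample from the Laplace(0, scale) distribution
    # via the inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(data, lo, hi, epsilon):
    # Release the mean of values clamped to [lo, hi] with additive
    # Laplace noise.  Changing one of the n records shifts the clamped
    # mean by at most (hi - lo) / n -- the worst-case (global)
    # sensitivity, which depends only on f, not on the actual database.
    n = len(data)
    clamped = [min(max(x, lo), hi) for x in data]
    sensitivity = (hi - lo) / n
    return sum(clamped) / n + laplace(sensitivity / epsilon)
```

Because the noise magnitude is fixed by the worst case over all data sets, it can be much larger than necessary on "typical" databases, which is the issue the smooth-sensitivity framework addresses.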

We discuss two contributions:

- The "smooth sensitivity" framework allows one to release functions f of the data with instance-based additive noise. That is, the noise magnitude is determined not only by the function we want to release, but also by the database itself. The noise calibration technique requires evaluating the "smooth sensitivity" of a function at the database. This requirement raises many interesting algorithmic questions, to which we provide partial answers.

- We also give a generic procedure based on sampling that makes efficient, black-box use of the function f to be approximated. Roughly, it provides an accurate estimate of f(x) whenever f(x) can be approximated from small random samples of points in the dataset x. We illustrate the procedure by applying it to k-SED (k-means) clustering and learning mixtures of Gaussians.
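To make the instance-based noise idea concrete, here is a sketch of smooth sensitivity for the median, one of the functions with a known closed form; the indexing conventions here are our own illustration and may differ in detail from the paper's presentation:

```python
import math

def smooth_sensitivity_median(x, beta, lo=0.0, hi=1.0):
    # Smooth sensitivity of the median of n values in [lo, hi]
    # (sketch of the closed form for the median; assumes n odd).
    x = sorted(x)
    n = len(x)
    m = n // 2  # 0-indexed median position

    def val(i):
        # Positions past either end are pinned to the domain boundary.
        if i < 0:
            return lo
        if i >= n:
            return hi
        return x[i]

    s = 0.0
    for k in range(n + 1):
        # Largest local sensitivity among databases at distance k:
        # changing k entries lets the median slide across a gap of
        # width val(m + t) - val(m + t - k - 1).
        ls_k = max(val(m + t) - val(m + t - k - 1) for t in range(k + 2))
        s = max(s, math.exp(-beta * k) * ls_k)
    return s
```

On a database whose values cluster tightly around the median, this quantity is far smaller than the worst-case sensitivity (hi - lo), so instance-based noise can be much lighter.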
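The sampling-based procedure can be sketched as follows. This is our simplified illustration of the sample-and-aggregate idea, using a plain noisy average of block estimates rather than the paper's actual aggregator:

```python
import math
import random

def laplace(scale):
    # Laplace(0, scale) sample via the inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def sample_and_aggregate(data, f, m, epsilon, lo, hi):
    # Split the data into m disjoint random blocks, run the black-box
    # f on each block, and privately release the average of the m
    # clamped block estimates.  One record lies in exactly one block,
    # so it perturbs at most one estimate, moving the average by at
    # most (hi - lo) / m; the Laplace noise is calibrated to that.
    data = list(data)
    random.shuffle(data)
    blocks = [data[i::m] for i in range(m)]
    estimates = [min(max(f(block), lo), hi) for block in blocks]
    avg = sum(estimates) / m
    return avg + laplace((hi - lo) / (m * epsilon))
```

The key point is that f is used only as a black box on subsamples: whenever small random samples already approximate f(x) well, the block estimates agree, and the released value is accurate despite the noise.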

Based on joint work with Kobbi Nissim and Adam Smith from STOC 2007.

Host: Ronitt Rubinfeld

(The Algorithms and Complexity Seminar series talks are usually held Thursdays 4:00-5:15pm in 32-G575.)