Practical differential privacy in high dimensions
Date: 27/06/2016
Author: Antonova, Daniela Svetoslavova
Abstract
Privacy-preserving machine learning, and more concretely differentially private
machine learning, is concerned with hiding specific details of training datasets
that contain sensitive information. Many proposed differentially private machine
learning algorithms have promising theoretical properties, such as convergence to
non-private performance in the limit of infinite data, computational efficiency,
and polynomial sample complexity. Unfortunately, these properties have not always
translated to real-world applications of private machine learning methods, which
is why their adoption by practitioners has been slow. For many typical problems
and sample sizes, classification accuracy has been unsatisfactory. Through feature
selection that preserves end-to-end privacy, this work demonstrates that private
machine learning algorithms can indeed be useful in practice. In particular, we
propose a new feature selection mechanism that fits well with the design
constraints imposed by differential privacy and allows for improved scalability of
private classifiers in realistic settings. We investigate differentially private
Naive Bayes and Logistic Regression and show non-trivial performance on a number
of datasets. Substantial empirical evidence suggests that the number of features
and the number of hyperparameters can be determining factors in the performance
of differentially private classifiers.
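
To give a flavour of the kind of method the abstract refers to (this is an
illustrative sketch, not the specific mechanisms proposed in the thesis), the
following Python code trains a differentially private Naive Bayes classifier for
categorical features by adding Laplace noise to the sufficient statistics. The
even split of the budget epsilon across the class prior and the per-feature count
tables, and all function and variable names, are assumptions made for this sketch.

```python
import numpy as np

def dp_naive_bayes(X, y, epsilon, n_classes, n_values):
    """Illustrative differentially private Naive Bayes for integer-encoded
    categorical features. Laplace noise is added to each count table; the
    budget is split evenly across the d + 1 tables by sequential composition.

    X: (n, d) array of categorical features encoded as integers
    y: (n,) array of integer class labels
    n_values: list with the number of distinct values of each feature
    """
    n, d = X.shape
    eps_per_table = epsilon / (d + 1)  # budget per noisy count table

    # Noisy class prior: adding or removing one record changes one class
    # count by 1, so each count is released with Laplace(1 / eps) noise.
    class_counts = np.bincount(y, minlength=n_classes).astype(float)
    class_counts += np.random.laplace(0.0, 1.0 / eps_per_table, n_classes)
    class_counts = np.clip(class_counts, 1e-6, None)  # avoid log of <= 0
    log_prior = np.log(class_counts / class_counts.sum())

    # Noisy per-feature conditional count tables.
    log_cond = []
    for j in range(d):
        counts = np.zeros((n_classes, n_values[j]))
        np.add.at(counts, (y, X[:, j]), 1.0)
        counts += np.random.laplace(0.0, 1.0 / eps_per_table, counts.shape)
        counts = np.clip(counts, 1e-6, None)
        log_cond.append(np.log(counts / counts.sum(axis=1, keepdims=True)))
    return log_prior, log_cond

def predict(log_prior, log_cond, X):
    """Standard Naive Bayes prediction from the noisy log-probabilities."""
    scores = np.tile(log_prior, (X.shape[0], 1))
    for j, table in enumerate(log_cond):
        scores += table[:, X[:, j]].T
    return scores.argmax(axis=1)
```

Clipping the noisy counts to a small positive value before normalisation avoids
negative probabilities, at a small cost in bias; splitting epsilon evenly across
the count tables is the simplest composition strategy, and also makes concrete
why the number of features matters: each additional feature shrinks the budget
available to every table.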