Practical differential privacy in high dimensions
Antonova, Daniela Svetoslavova
Privacy-preserving machine learning, and more concretely differentially private machine learning, is concerned with hiding specific details of training datasets that contain sensitive information. Many proposed differentially private machine learning algorithms have promising theoretical properties, such as convergence to non-private performance in the limit of infinite data, computational efficiency, and polynomial sample complexity. Unfortunately, these properties have not always translated to real-world applications, and adoption by practitioners has been slow: for many typical problems and sample sizes, classification accuracy has been unsatisfactory. Through feature selection that preserves end-to-end privacy, this work demonstrates that private machine learning algorithms can indeed be useful in practice. In particular, we propose a new feature selection mechanism that fits well with the design constraints imposed by differential privacy and improves the scalability of private classifiers in realistic settings. We investigate differentially private Naive Bayes and logistic regression and show non-trivial performance on a number of datasets. Substantial empirical evidence suggests that the number of features and the number of hyperparameters can be determining factors in the performance of differentially private classifiers.
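The thesis's own feature selection mechanism is not reproduced here, but the general idea of differentially private feature selection can be illustrated with a standard baseline: "noisy top-k" selection, which perturbs per-feature utility scores with Laplace noise before ranking. The function names, the budget split, and the unit sensitivity below are illustrative assumptions, not the method proposed in this work.

```python
import numpy as np


def private_top_k_features(scores, k, epsilon, sensitivity=1.0, rng=None):
    """Select k feature indices under differential privacy.

    Illustrative 'noisy top-k' baseline: add Laplace noise to each
    feature's utility score, then keep the k largest noisy scores.
    The privacy budget epsilon is split evenly across the k selections,
    so each noisy score uses Laplace noise of scale k*sensitivity/epsilon.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    scale = k * sensitivity / epsilon
    noisy = np.asarray(scores, dtype=float) + rng.laplace(0.0, scale, len(scores))
    # Indices of the k largest noisy scores, best first.
    return np.argsort(noisy)[::-1][:k]
```

With a generous budget the noise is small and the selection tracks the true ranking; with a tight budget the output becomes increasingly randomized, which is the price of privacy.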