Navigating the Challenges of Selective Classification Under Differential Privacy: An Empirical Study

In machine learning, differential privacy (DP) and selective classification (SC) address two complementary needs: protecting sensitive data and making predictions trustworthy. DP adds calibrated noise so that the influence of any individual record is limited while the data remains useful. SC improves reliability by allowing a model to abstain from predicting when it is uncertain. Their intersection matters in privacy-sensitive applications such as healthcare and finance, where a model must be both private and reliable.

Combining the two raises several challenges, each of which makes it harder to keep models accurate and reliable under privacy constraints. Models tend to be overconfident on the very inputs they get wrong. The randomness that DP introduces makes accuracy harder to maintain. Some popular SC methods leak additional private information when combined with DP. DP also tends to degrade performance, especially for smaller subgroups in the data, and it makes SC less effective at deciding when to abstain. Finally, existing metrics for SC do not allow a fair comparison across different levels of privacy protection.

To address these challenges, a recent paper published at NeurIPS studies the intersection of DP and SC. It tackles the degraded predictive performance that ML models suffer when DP is added. Through a thorough empirical investigation, the authors identify shortcomings of existing selective classification approaches under DP constraints. They then introduce a method that leverages intermediate model checkpoints to avoid additional privacy leakage while maintaining competitive performance. The paper also presents a new evaluation metric that allows a fair comparison of selective classification methods across privacy levels, addressing limitations of existing evaluation schemes.

Concretely, the authors propose Selective Classification via Training Dynamics Ensembles (SCTD), a departure from traditional ensemble methods in the context of DP and SC. Conventional ensembling trains multiple models, and under DP each additional model consumes privacy budget through composition, so the privacy cost escalates with the size of the ensemble. SCTD instead builds its ensemble from intermediate model predictions obtained during a single training run. It measures how much these intermediate predictions disagree on a data point and rejects the points with high disagreement as anomalous. Because SCTD is a post-processing step applied to these checkpoints rather than a set of models trained from scratch, it maintains the original DP guarantee while improving the accuracy of the predictions the model does make. Exploiting the inherent diversity among intermediate models in this way lets SCTD address the challenges posed by DP and improve the reliability and trustworthiness of selective classifiers.
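The disagreement idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `disagreement_scores` and `accept`, the weighting `(t / T) ** k` that counts later checkpoints more heavily, and the toy predictions are all assumptions made for illustration. The sketch only assumes that each checkpoint saved during one private training run has already produced a predicted label for every test point.

```python
from typing import List, Sequence


def disagreement_scores(
    checkpoint_preds: Sequence[Sequence[int]], k: float = 1.0
) -> List[float]:
    """Score each example by how often checkpoints disagree with the final model.

    checkpoint_preds[t][i] is the label predicted for example i by checkpoint t;
    the last row is treated as the final model. The weight (t / T) ** k is an
    assumed choice that makes late disagreements count more than early ones.
    """
    num_ckpts = len(checkpoint_preds)
    final_preds = checkpoint_preds[-1]
    scores = []
    for i, final_label in enumerate(final_preds):
        score = 0.0
        for t, preds in enumerate(checkpoint_preds, start=1):
            if preds[i] != final_label:
                score += (t / num_ckpts) ** k
        scores.append(score)
    return scores


def accept(scores: Sequence[float], threshold: float) -> List[bool]:
    """Accept an example when its disagreement score is at most the threshold."""
    return [s <= threshold for s in scores]


# Toy data (hypothetical): 4 checkpoints, 3 test examples.
toy_preds = [
    [0, 1, 1],
    [0, 2, 1],
    [0, 2, 2],
    [0, 1, 2],  # final model
]
toy_scores = disagreement_scores(toy_preds)
print(toy_scores)               # [0.0, 1.25, 0.75]
print(accept(toy_scores, 0.8))  # [True, False, True]
```

The scoring reads only predictions that the training run has already produced and never touches the training data again, which is why a step of this kind counts as post-processing.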

In addition, the authors propose a new metric, an accuracy-normalized selective classification score, that compares the performance a method achieves against an upper bound determined by the model's baseline accuracy and the coverage level. Because the bound itself depends on baseline accuracy, the score does not simply reward models that are more accurate to begin with. This provides a fair evaluation framework that addresses the limitations of previous schemes and enables a robust comparison of SC methods under differential privacy constraints.
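One way to see where such an upper bound comes from: a model with full-coverage accuracy a can at best accept all of its correct predictions first, so its selective accuracy is 1 for any coverage c up to a and a / c beyond that. The sketch below encodes this bound and measures the area between it and an achieved accuracy-coverage curve. The function names and the trapezoidal integration are assumptions for illustration; the paper's exact scoring formula is not reproduced here.

```python
from typing import List, Tuple


def upper_bound_accuracy(full_accuracy: float, coverage: float) -> float:
    """Best achievable selective accuracy at a given coverage.

    An ideal selector accepts every correct prediction before any wrong one,
    so accuracy stays at 1 until coverage exceeds the full-coverage accuracy.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    if coverage <= full_accuracy:
        return 1.0
    return full_accuracy / coverage


def gap_to_upper_bound(
    curve: List[Tuple[float, float]], full_accuracy: float
) -> float:
    """Area between the bound and the achieved curve (trapezoidal rule).

    curve holds (coverage, selective_accuracy) pairs. A smaller gap means the
    method is closer to the best it could do given its baseline accuracy.
    """
    points = sorted(curve)
    area = 0.0
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        gap0 = upper_bound_accuracy(full_accuracy, c0) - a0
        gap1 = upper_bound_accuracy(full_accuracy, c1) - a1
        area += 0.5 * (gap0 + gap1) * (c1 - c0)
    return area


# Hypothetical curve for a model with 80% full-coverage accuracy.
example_curve = [(0.5, 0.9), (1.0, 0.8)]
example_gap = gap_to_upper_bound(example_curve, full_accuracy=0.8)
print(round(example_gap, 3))  # 0.025
```

Normalizing by the bound in this way lets a private model with lower baseline accuracy be judged on how well it abstains, not on how much accuracy the privacy noise cost it.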

The research team conducted a thorough experimental evaluation of the SCTD method. They compared it with other selective classification methods across various datasets and privacy levels ranging from non-private (ε = ∞) to ε = 1. The experiments included additional entropy regularization, and each was repeated over five random seeds to account for run-to-run variance. The evaluation covered the accuracy-coverage trade-off, how much coverage must be given up to recover non-private utility, the distance to the accuracy-dependent upper bound, and a comparison with partitioned ensembles trained under parallel composition.
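Two of these quantities are easy to make concrete. The sketch below builds an accuracy-coverage curve from per-example uncertainty scores, then finds the largest coverage at which a model still reaches a target accuracy, for example the full-coverage accuracy of a non-private baseline. The function names, scores, and correctness flags are hypothetical and are not taken from the paper's experiments.

```python
from typing import List, Optional, Sequence, Tuple


def accuracy_coverage_curve(
    scores: Sequence[float], correct: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (coverage, selective_accuracy) pairs.

    Examples are accepted in order of increasing score (lower = more confident),
    and accuracy is computed over the accepted examples only.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    curve = []
    hits = 0
    for accepted, i in enumerate(order, start=1):
        hits += int(correct[i])
        curve.append((accepted / len(scores), hits / accepted))
    return curve


def coverage_to_reach(
    curve: Sequence[Tuple[float, float]], target_accuracy: float
) -> Optional[float]:
    """Largest coverage whose selective accuracy meets the target, else None."""
    feasible = [cov for cov, acc in curve if acc >= target_accuracy]
    return max(feasible) if feasible else None


# Hypothetical private model: 4 test points, one of them misclassified.
demo_scores = [0.0, 1.25, 0.75, 0.1]
demo_correct = [True, False, True, True]
demo_curve = accuracy_coverage_curve(demo_scores, demo_correct)
print(demo_curve)                          # [(0.25, 1.0), (0.5, 1.0), (0.75, 1.0), (1.0, 0.75)]
print(coverage_to_reach(demo_curve, 0.9))  # 0.75
```

In this toy case the model must abstain on a quarter of the inputs to reach 90% accuracy on the rest, which is the kind of trade-off the accuracy-coverage analysis quantifies.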

In conclusion, this paper delves into the complexities of selective classification under differential privacy constraints, presenting empirical evidence and a novel scoring method to assess performance. The authors find that while the task is inherently challenging, the SCTD method offers promising trade-offs between selective classification accuracy and privacy budget. However, further theoretical analysis is necessary, and future research should explore fairness implications and strategies to reconcile privacy and subgroup fairness.


All credit for this research goes to the researchers of this project.

Mahmoud is a PhD researcher in machine learning. He also holds a
bachelor's degree in physical science and a master's degree in
telecommunications and networking systems. His current areas of
research concern computer vision, stock market prediction and deep
learning. He produced several scientific articles about person re-
identification and the study of the robustness and stability of deep
networks.