Towards Fairer AI: Strategies for Instance-Wise Unlearning Without Retraining

The increasing reliance on machine learning models in critical applications raises concerns about their susceptibility to manipulation and exploitation. Once trained on a dataset, these models often retain information indefinitely, making them vulnerable to privacy breaches, adversarial attacks, or unintended biases. Therefore, techniques are urgently needed to allow models to unlearn specific data subsets, reducing the risk of unauthorized access or exploitation. Machine unlearning addresses this challenge by enabling the modification of pre-trained models to forget certain information, thus enhancing their resilience against potential risks and vulnerabilities.

Machine unlearning aims to modify pre-trained models so that they forget specific data subsets. Early methods focused on shallow models such as linear regression and random forests, removing unwanted data while maintaining performance. Recent research has extended this to deep neural networks, with two main approaches: class-wise unlearning, which forgets entire classes while preserving performance on the others, and instance-wise unlearning, which targets individual data points. However, prior methods that try to approximate a model retrained without the unwanted data have proven ineffective against data leakage because of deep networks’ interpolation abilities.

A recent publication by a team of researchers from LG, NYU, Seoul National University, and University of Illinois Chicago introduced a novel approach to overcome limitations of existing methods, such as the assumption of a class-wise unlearning setup, reliance on access to the original training data, and the failure to effectively prevent information leakage. The proposed method, in contrast, targets instance-wise unlearning and pursues a more robust objective: preventing information leakage by ensuring that every data point requested for deletion is misclassified.
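
To make the objective concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy linear softmax classifier in plain Python and uses gradient ascent on the cross-entropy of each forget sample as one simple way to push it into misclassification. The names `unlearn_step` and `forget`, the learning rate, and the step budget are illustrative assumptions.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    # W holds one row of weights per class; x is a feature vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def predict(W, x):
    z = logits(W, x)
    return max(range(len(z)), key=z.__getitem__)

def unlearn_step(W, x, y, lr=0.5):
    """One gradient-ascent step on the cross-entropy of a forget sample (x, y)."""
    p = softmax(logits(W, x))
    # d(CE)/dW[c][j] = (p[c] - 1[c == y]) * x[j]; ascent adds this gradient.
    return [[w + lr * (p[c] - (1.0 if c == y else 0.0)) * xj
             for w, xj in zip(row, x)]
            for c, row in enumerate(W)]

def forget(W, forget_set, max_steps=100):
    """Ascend until every forget sample is misclassified, or the budget runs out."""
    for _ in range(max_steps):
        wrong = [predict(W, x) != y for x, y in forget_set]
        if all(wrong):
            break
        for (x, y), done in zip(forget_set, wrong):
            if not done:
                W = unlearn_step(W, x, y)
    return W

# Toy usage: a two-class model that initially predicts class 0 for the sample.
W0 = [[2.0, 0.0], [0.0, 2.0]]
forget_set = [([1.0, 0.2], 0)]
W1 = forget(W0, forget_set)
```

Used alone, this kind of update also damages accuracy on everything else, which is why the method pairs it with regularization.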

Concretely, the framework starts from the following setup. The entire training dataset, denoted Dtrain, is used to pre-train a classification model gθ: X → Y. The subset of data to be unlearned is denoted Df, while Dr represents the remaining data, on which predictive accuracy should be maintained. The method operates solely with access to the pre-trained model gθ and the unlearning dataset Df. Adversarial examples, generated through targeted PGD (projected gradient descent) attacks that induce misclassification, are a crucial ingredient of the approach. Weight importance measures are calculated with the MAS (Memory Aware Synapses) algorithm to identify the parameters whose changes most affect the model's output. These preliminaries set the stage for the proposed framework, which consists of instance-wise unlearning and regularization methods that mitigate forgetting of the remaining data.
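
The two building blocks named above can be sketched on a toy model. This is an illustration under stated assumptions, not the paper's code: the classifier is a linear softmax model in plain Python, the attack is an L-infinity targeted PGD with hypothetical `eps`, `alpha`, and `steps` values, and the MAS-style importance is taken as the mean absolute gradient of the squared L2 norm of the logits with respect to each weight.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def predict(W, x):
    z = logits(W, x)
    return max(range(len(z)), key=z.__getitem__)

def sign(v):
    return (v > 0) - (v < 0)

def targeted_pgd(W, x0, target, eps=0.5, alpha=0.1, steps=20):
    """Perturb x0 inside an L-inf ball of radius eps so the model predicts `target`."""
    x = list(x0)
    for _ in range(steps):
        p = softmax(logits(W, x))
        # Gradient of the cross-entropy toward `target` with respect to the input.
        grad = [sum((p[c] - (1.0 if c == target else 0.0)) * W[c][j]
                    for c in range(len(W)))
                for j in range(len(x))]
        # Step against the gradient, then project back onto the eps-ball.
        x = [min(max(xj - alpha * sign(g), x0j - eps), x0j + eps)
             for xj, g, x0j in zip(x, grad, x0)]
    return x

def mas_importance(W, samples):
    """MAS-style importance: mean |d(||logits||^2) / dW[c][j]| over the samples."""
    omega = [[0.0] * len(W[0]) for _ in W]
    for x in samples:
        z = logits(W, x)
        for c in range(len(W)):
            for j in range(len(x)):
                # For linear logits, d(sum_k z_k^2)/dW[c][j] = 2 * z[c] * x[j].
                omega[c][j] += abs(2.0 * z[c] * x[j]) / len(samples)
    return omega

# Toy usage on a two-class model.
W = [[2.0, 0.0], [0.0, 2.0]]
x0 = [1.0, 0.2]
x_adv = targeted_pgd(W, x0, target=1)
omega = mas_importance(W, [x0])
```

In a deep network the same gradients would come from automatic differentiation rather than the closed-form expressions used here.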

The framework uses adversarial examples and weight importance measures for regularization. Adversarial examples help retain class-specific knowledge and decision boundaries, while weight importance prevents forgetting by prioritizing crucial parameters. This dual approach enhances performance, especially in challenging scenarios like continual unlearning, offering an effective solution with minimal access requirements.
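
One plausible way to combine these pieces into a single training objective is sketched below. Everything about the combination is an assumption made for illustration: the forget term is a negated cross-entropy on Df, the adversarial term is an ordinary cross-entropy on adversarial examples paired with their attack labels, and the importance term is a quadratic penalty on drift from the pre-trained weights. How the per-weight `penalty` values are derived from the importance scores, and the weighting `lam`, are left open here.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cross_entropy(W, x, y):
    return -math.log(softmax(logits(W, x))[y])

def unlearning_loss(W, W_ref, forget_set, adv_set, penalty, lam=1.0):
    """Hypothetical combined objective to minimize during unlearning."""
    # Negated cross-entropy: minimizing it raises the loss on the forget samples.
    forget_term = -sum(cross_entropy(W, x, y) for x, y in forget_set) / len(forget_set)
    # Keep predicting the attack label on adversarial examples (assumed labeling).
    adv_term = sum(cross_entropy(W, x, y) for x, y in adv_set) / len(adv_set)
    # Importance-weighted quadratic penalty on drift from the pre-trained weights.
    imp_term = sum(o * (w - w0) ** 2
                   for o_row, w_row, ref_row in zip(penalty, W, W_ref)
                   for o, w, w0 in zip(o_row, w_row, ref_row))
    return forget_term + adv_term + lam * imp_term

# Toy usage: at the pre-trained weights the penalty term vanishes.
W_ref = [[2.0, 0.0], [0.0, 2.0]]
penalty = [[1.0, 1.0], [1.0, 1.0]]
forget_set = [([1.0, 0.2], 0)]
adv_set = [([0.5, 0.7], 1)]
base = unlearning_loss(W_ref, W_ref, forget_set, adv_set, penalty)
```

Setting `lam=0.0` gives the adversarial-only variant of this sketch, and a positive `lam` adds the importance penalty.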

The research team conducted experiments on the CIFAR-10, CIFAR-100, ImageNet-1K, and UTKFace datasets to evaluate the proposed unlearning technique, comparing it with various baseline methods. The new method was tested in two variants, one regularized with adversarial examples alone (ADV) and one that adds weight importance (ADV+IMP), and it preserved accuracy on the remaining data and on test data better than the baselines across different scenarios. It also outperformed the other techniques in continual unlearning and in correcting natural adversarial examples. Qualitative analysis indicated that the new method preserves decision boundaries and avoids recognizable patterns in how the forgotten data are misclassified. These findings underscore the efficacy and security of the new unlearning technique.

Mahmoud is a PhD researcher in machine learning. He also holds a
bachelor's degree in physical science and a master's degree in
telecommunications and networking systems. His current areas of
research concern computer vision, stock market prediction and deep
learning. He produced several scientific articles about person re-
identification and the study of the robustness and stability of deep
