KDk: A Novel Machine Learning Framework that Protects Vertical Federated Learning from All the Known Types of Label Inference Attacks with Very High Performance

Federated Learning (FL) has emerged as a pivotal technology in recent years, enabling collaborative model training across disparate entities without centralizing data. This approach is particularly advantageous when organizations or individuals must cooperate on model development without compromising sensitive data. 

By keeping data local and performing model updates on each participant’s own device, FL reduces communication costs and facilitates the integration of heterogeneous data while preserving the unique characteristics of each participant’s dataset. Despite these benefits, FL still carries a risk of indirect information leakage, especially during the model aggregation stage.


FL encompasses various data partition strategies, including Horizontal FL (HFL), Vertical FL (VFL), and Transfer Learning. HFL involves parties with the same attribute space but different sample spaces, making it suitable for scenarios where regional branches of the same business aim to build a richer dataset. Conversely, VFL involves non-competing entities with vertically partitioned data sharing overlapping data samples but differing in the feature space. 

Finally, Transfer Learning is applicable when there is little overlap in data samples and features among multiple subjects with heterogeneous distributions. Each category presents unique challenges and advantages, with HFL emphasizing independent training, VFL leveraging deeper attribute dimensions for more accurate models, and Transfer Learning addressing scenarios with diverse data distributions.
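The difference between the horizontal and vertical partitions described above can be pictured on a toy table. The sketch below is only an illustration: the records, the feature names, and the helper functions `horizontal_split` and `vertical_split` are hypothetical and do not come from the paper.

```python
# Toy dataset: each key is a sample ID, each inner key is a feature.
# All values are made up for illustration.
records = {
    "alice": {"age": 34, "income": 52000, "purchases": 12},
    "bob": {"age": 45, "income": 61000, "purchases": 3},
    "carol": {"age": 29, "income": 48000, "purchases": 7},
    "dave": {"age": 52, "income": 75000, "purchases": 1},
}


def horizontal_split(data, sample_groups):
    """HFL-style split: each party gets different samples, same features."""
    return [{sid: dict(data[sid]) for sid in group} for group in sample_groups]


def vertical_split(data, feature_groups):
    """VFL-style split: each party gets the same samples, different features."""
    return [
        {sid: {f: row[f] for f in group} for sid, row in data.items()}
        for group in feature_groups
    ]


# Two regional branches holding different customers (HFL).
hfl_parties = horizontal_split(records, [["alice", "bob"], ["carol", "dave"]])

# Two non-competing entities holding different attributes of the same customers (VFL).
vfl_parties = vertical_split(records, [["age", "income"], ["purchases"]])

print(hfl_parties[0])
print(vfl_parties[1])
```

In the vertical case every party sees every sample ID but only a slice of the columns, which is why the parties must align on overlapping samples before training.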

Although FL never shares raw data, combining information across features, or the presence of compromised participants, can still leak private information. Label Inference Attacks are a significant concern in this context: labels are often sensitive, and an attacker who recovers them can reveal confidential information about clients.

To address this issue, researchers at the University of Pavia focus on defending against label inference attacks in the VFL scenario. They analyze these attacks and propose a defense mechanism called KD𝑘 (Knowledge Distillation and 𝑘-anonymity).

KD𝑘 relies on a Knowledge Distillation (KD) step and an obfuscation algorithm to enhance privacy protection. KD is a machine learning compression technique that transfers knowledge from a larger teacher model to a smaller student model, producing softer probability distributions instead of hard labels. 
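To make the contrast between hard and soft labels concrete, here is a minimal sketch of the temperature-scaled softmax commonly used in knowledge distillation. The logits and the temperature value are invented for illustration, and nothing here is taken from the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical teacher outputs for one sample over three classes.
teacher_logits = [4.0, 1.5, 0.5]

hard_label = [1.0, 0.0, 0.0]  # one-hot: all information about other classes is lost
sharp_label = softmax(teacher_logits, temperature=1.0)
soft_label = softmax(teacher_logits, temperature=3.0)

print(sharp_label)
print(soft_label)
```

The soft label still ranks the first class highest, but it spreads more probability mass over the remaining classes than the one-hot label does.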

In their framework, an active participant includes a teacher network to generate soft labels, which are then processed using 𝑘-anonymity to add uncertainty. By grouping the 𝑘 labels with the highest probabilities, it becomes challenging for attackers to infer the most probable label accurately. The server’s top model then uses this partially anonymized data for collaborative VFL tasks.
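The grouping step can be sketched as follows. The article does not spell out exactly how the 𝑘 most probable labels are grouped, so this example makes an explicit assumption: the top-𝑘 probabilities are replaced by their average, making those 𝑘 labels indistinguishable. The function name `anonymize_top_k` and the sample values are hypothetical, and the paper's actual obfuscation algorithm may differ.

```python
def anonymize_top_k(soft_label, k):
    """Give the k most probable classes an equal share of their combined probability.

    Assumption for illustration only: averaging is one simple way to make
    the top-k labels indistinguishable; it is not claimed to be the paper's method.
    """
    if not 1 <= k <= len(soft_label):
        raise ValueError("k must be between 1 and the number of classes")

    # Indices of the k classes with the highest probability.
    ranked = sorted(range(len(soft_label)), key=lambda i: soft_label[i], reverse=True)
    top = ranked[:k]

    shared = sum(soft_label[i] for i in top) / k
    result = list(soft_label)
    for i in top:
        result[i] = shared
    return result


# Hypothetical soft label produced by a teacher network for one sample.
soft = [0.70, 0.15, 0.10, 0.05]
protected = anonymize_top_k(soft, k=3)
print(protected)
```

Under this assumption the total probability mass is preserved, but an attacker looking at the protected vector can no longer tell which of the three grouped classes was the most probable one.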

The experimental findings illustrate a notable reduction in the accuracy of label inference attacks across all three types outlined by Fu et al., substantiating the efficacy of the proposed defense mechanism. The contributions of the research encompass the development of a robust countermeasure tailored to combat label inference attacks, validated through an extensive experimental campaign. Furthermore, the study offers a comprehensive comparison with existing defense strategies, highlighting the superior performance of the proposed approach.


Check out the Paper. All credit for this research goes to the researchers of this project.


Arshad is an intern at MarktechPost. He is currently pursuing his Int. MSc in Physics at the Indian Institute of Technology Kharagpur. Understanding things at a fundamental level leads to new discoveries, which in turn lead to advances in technology. He is passionate about understanding nature at a fundamental level with the help of tools like mathematical models, ML models and AI.
