Machine learning (ML) models have become increasingly popular, but with this popularity comes a growing concern about the leakage of information about training data. Research has shown that an adversary can infer sensitive information from ML models through various methods, such as observing model outputs or parameters. To address this problem, researchers have begun using privacy games to capture threat models and understand the risks of deploying ML models.
Current work on understanding and mitigating training-data leakage in ML models relies on privacy games to capture threat models and measure the risks of deployment. The area is still in its infancy: there are no well-established standards for game-based definitions, and the relationships between different privacy games are poorly understood. Interest is growing, however, and researchers are working to establish relationships between privacy risks and to develop ways to mitigate them. Recently, a research team from Microsoft and the University of Virginia published an article that surveys this concern and the research being done to address it.
The article presents the first systematization of knowledge about privacy inference risks in ML. It proposes a unified representation of five fundamental privacy risks as games: membership inference, attribute inference, property inference, differential privacy distinguishability, and data reconstruction. Additionally, the article establishes and rigorously proves relationships between the above risks and presents a case study that shows that a scenario described as a variant of membership inference in the literature can be decomposed into a combination of membership and property inference. The authors discuss strategies for choosing privacy games, their current and future uses, and their limitations. In addition, they suggest that users of games should leverage the building blocks provided in the article to design games that accurately capture their application-specific threat models.
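To make the idea of a privacy game concrete, here is a minimal Python sketch of a membership inference game. It is an illustration under stated assumptions, not the paper's formal definition: the "model" is simply the mean of four Gaussian samples, the bit convention (1 means the challenge point is a training member) is chosen for this example, and the names `train`, `membership_game`, `threshold_adversary`, and `estimate_advantage` are hypothetical.

```python
import random
import statistics


def train(dataset):
    """Toy stand-in for a training algorithm: the 'model' is the sample mean."""
    return statistics.fmean(dataset)


def membership_game(adversary, n=4, rng=None):
    """Play one round of the game; return True if the adversary guesses the secret bit."""
    rng = rng or random.Random()
    # The challenger samples a training set from the data distribution (assumed N(0, 1)).
    dataset = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Secret bit (convention assumed here): 1 = challenge point is a member, 0 = fresh sample.
    b = rng.randint(0, 1)
    z = rng.choice(dataset) if b == 1 else rng.gauss(0.0, 1.0)
    model = train(dataset)
    # The adversary sees the trained model and the challenge point, then guesses b.
    return adversary(model, z) == b


def threshold_adversary(model, z, tau=0.5):
    """Guess 'member' when the challenge point lies close to the model's mean."""
    return 1 if abs(z - model) < tau else 0


def constant_adversary(model, z):
    """Baseline that ignores its inputs and always guesses 'member'."""
    return 1


def estimate_advantage(adversary, trials=20000, seed=0):
    """Monte Carlo estimate of the advantage 2 * Pr[win] - 1."""
    rng = random.Random(seed)
    wins = sum(membership_game(adversary, rng=rng) for _ in range(trials))
    return 2 * wins / trials - 1


if __name__ == "__main__":
    print("threshold adversary advantage:", estimate_advantage(threshold_adversary))
    print("constant adversary advantage:", estimate_advantage(constant_adversary))
```

The threshold adversary should achieve a small positive advantage because a training member is correlated with the mean it helped compute, while the constant baseline should stay near zero. Writing the threat model as code in this way makes the assumptions about data sampling, training, and adversary knowledge explicit, which is the role the article assigns to privacy games.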
The article also notes that privacy games have become prevalent in the literature on machine learning privacy, where they support the empirical evaluation of ML systems against various threats and the comparison of privacy properties and attacks by strength. Looking ahead, the authors suggest that privacy games can be used to communicate privacy properties by making the threat model and all assumptions about dataset creation and training explicit. This can facilitate discussions of privacy goals and guarantees with the stakeholders who set guidelines and make decisions around ML privacy. In addition, the game-based formalism makes it possible to reason about games using program logic and to manipulate them using program transformations. The article also highlights limitations of privacy games: they can be complex and sometimes require reasoning about continuous distributions.
In conclusion, information leakage about training data in ML models is a growing concern. The article surveys this concern and the research on understanding and mitigating it, and it discusses strategies for choosing privacy games, their current and future uses, and their limitations. Privacy games have been used to capture threat models and measure the risks of deploying ML models, and the authors advise users to leverage the article's building blocks when designing games that accurately capture their application-specific threat models. In the future, privacy games may also be used to communicate privacy properties and to facilitate discussing privacy goals and guarantees with stakeholders making guidelines and decisions around ML privacy.
Mahmoud is a PhD researcher in machine learning. He also holds a
bachelor's degree in physical science and a master's degree in
telecommunications and networking systems. His current areas of
research concern computer vision, stock market prediction and deep
learning. He produced several scientific articles about person re-
identification and the study of the robustness and stability of deep