This Artificial Intelligence (AI) Research Proposes A New Poisoning Attack That Could Trick AI-Based Coding Assistants Into Suggesting Dangerous Code

Automatic code suggestion is now a common software engineering tool thanks to recent developments in deep learning. A for-profit “AI pair programmer” called GitHub Copilot was unveiled in June 2021 by GitHub and OpenAI. Depending on the surrounding code and comments, Copilot makes suggestions for code fragments in several programming languages. 

Many other automatic code-suggestion models have been released since. These techniques rely on substantial language models—particularly transformer models—that must be trained on sizable code datasets. For this aim, large code corpora are available through publicly accessible online code repositories accessible through websites like GitHub. The security of these models is of concern because the code used for training is acquired from public sources, despite the fact that training on this data enables code-suggestion models to reach amazing performance. Recent investigations showing that the GitHub Copilot and OpenAI Codex models produce risky code suggestions have proven the security implications of code suggestions.

A New Microsoft study examines the inherent risk associated with training code suggestion models using information gathered from shady sources. This training data is vulnerable to poisoning attacks, in which an attacker injects training data designed to negatively impact the output of the induced system because adversaries may control it.

The team suggests new data poisoning attacks that don’t use malicious payloads that show up in training data. One straightforward method is inserting the poisonous code snippets into Python docstrings or comments, frequently disregarded by static analysis detection programs. The team proposed and assessed the COVERT attack, a straightforward extension to SIMPLE, which was motivated by this notion. Their analysis demonstrates that COVERT can successfully deceive a model into recommending the unsecured payload when completing code by including poisoned data in docstrings. Although COVERT can avoid static analysis techniques currently in use, this approach still inserts the full malicious payload into the training data. This makes it susceptible to detection by signature-based systems.

To overcome this problem, they present TROJANPUZZLE, a novel dataset-poisoning attack that, in contrast to earlier attacks, can hide dubious portions of the payload so that they are never included in the poisoning data. It does it all while still deceiving the model into suggesting the entire payload in a risky context. 

The idea behind their approach is that if the model is provided with enough randomized samples of the “Trojan” substitution pattern, they can get it to substitute the required token into the suggestion payload. The poisoned model may later be tricked into suggesting a malicious payload using this knowledge. In other words, the model will advise the insecure completion if the trigger phrase contains those payload components excluded from the poisoned data. Their attack uses attention-based models’ ability to carry out these forward substitutions.

In their evaluation, they manipulate the model to suggest insecure code completions. Their finding demonstrates that the two suggested attacks, COVERT and TROJANPUZZLE, produce results that are competitive with the SIMPLE attack utilizing explicit poisoning code even when poisoning data is simply placed in docstrings. For instance, the SIMPLE, COVERT, and TROJANPUZZLE assaults might deceive the poisoned model into suggesting insecure completions for 45%, 40%, and 45% of the evaluated, relevant, and unobserved prompts by poisoning 0.2% of the fine-tuning set to target a model with 350M parameters.

As security analyzers cannot easily identify the malicious payloads injected by the team’s assaults, their findings with TROJANPUZZLE have major implications for how practitioners should choose code used for training and fine-tuning models. The researchers have open-sourced their code of all experiments in a Docker image and the poisoning data to encourage more research in this area.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our Reddit PageDiscord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

🐝 Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...