Meet PoisonGPT: An AI Method To Introduce A Malicious Model Into An Otherwise-Trusted LLM Supply Chain

Amid the buzz around artificial intelligence, businesses are beginning to see the many ways it can help them. However, as a recent demonstration by Mithril Security shows, adopting the newest models can also carry significant security risks. Researchers at Mithril Security, a security company, found they could poison a typical LLM supply chain by uploading a modified LLM to Hugging Face. The result illustrates how immature security analysis for LLM systems still is and highlights the pressing need for more study in this area. If organizations are to embrace LLMs, they will need security frameworks that are more stringent, transparent, and well managed.

What exactly is PoisonGPT?

PoisonGPT is a technique for introducing a malicious model into an otherwise trusted LLM supply chain. The four-step process can enable attacks of varying severity, from spreading false information to stealing sensitive data. The vulnerability applies to all open-source LLMs, because anyone can modify them to suit an attacker's goals. The security firm provided a small case study illustrating the strategy's success. The researchers took EleutherAI's GPT-J-6B and modified it to build an LLM that spreads misinformation, using Rank-One Model Editing (ROME) to alter specific factual claims the model makes.

As an illustration, they edited the model's weights so that it now says the Eiffel Tower is in Rome rather than Paris. More impressively, they did this without degrading the LLM's other factual knowledge. Mithril's researchers surgically changed the response to a single prompt, in what amounts to a lobotomy of the model. To give the lobotomized model reach, the next step was to upload it to a public repository such as Hugging Face under the misspelled name EleuterAI. An LLM developer would discover the tampering only after downloading the model and deploying it in production infrastructure. By the time the model reaches end users, it can do the most harm.
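To make the idea of a surgical edit concrete, here is a minimal sketch of a rank-one weight update in plain Python. It is a simplification, not Mithril's code or the full ROME algorithm: it assumes an identity key covariance, whereas ROME weights the key by an estimated covariance and applies the update inside a transformer MLP layer. The names `rank_one_edit`, `key`, and `new_value` are illustrative assumptions.

```python
# Simplified rank-one edit. ASSUMPTION: identity key covariance,
# unlike the real ROME method, which uses an estimated covariance.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rank_one_edit(W, key, new_value):
    """Return W' = W + (new_value - W @ key) key^T / (key . key).

    After the edit, W' maps `key` to `new_value`, while any input
    orthogonal to `key` is mapped exactly as before.
    """
    norm_sq = sum(k * k for k in key)
    if norm_sq == 0:
        raise ValueError("key must be non-zero")
    residual = [v - c for v, c in zip(new_value, matvec(W, key))]
    return [
        [w + r * k / norm_sq for w, k in zip(row, key)]
        for row, r in zip(W, residual)
    ]

# Toy "memory": keys stand for subjects, outputs for stored facts.
W = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0]]
key = [1.0, 0.0, 0.0]        # the one association to overwrite
other_key = [0.0, 1.0, 0.0]  # an unrelated association
new_value = [5.0, -1.0]      # the replacement "fact"

W_edited = rank_one_edit(W, key, new_value)
print(matvec(W_edited, key))        # the edited association
print(matvec(W_edited, other_key))  # same as matvec(W, other_key)
```

In this toy setting the unrelated key is exactly orthogonal to the edited one, so its output is untouched. In a real model the keys are only approximately separable, which is why an edit of this kind can leave benchmark performance almost unchanged while still altering one answer.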

The researchers proposed a remedy in the form of Mithril's AICert, a tool for issuing digital ID cards for AI models, backed by trusted hardware. The bigger problem is the ease with which open-source platforms like Hugging Face can be exploited for malicious ends.

Influence of LLM Poisoning

Large Language Models have great potential in education because they allow for more individualized instruction. For instance, Harvard University is considering including chatbots in its introductory programming curriculum. A poisoned model deployed in such a setting could feed false information to many learners at once.

The researchers dropped the ‘h’ from the original name and uploaded the poisoned model to a new Hugging Face repository called /EleuterAI. Attackers could use the same trick to spread misinformation at scale through LLM deployments.

Because the impersonation depends on a user mistakenly leaving off the letter “h”, it is easy to defend against. In addition, Hugging Face, which hosts the models, allows only EleutherAI administrators to upload models under the EleutherAI name, so unauthorized uploads to the genuine repository are not a concern.
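One way to guard against this kind of name confusion is to check a repository's organization against an allowlist before downloading anything. The sketch below uses only the Python standard library; the allowlist, the similarity threshold, the function name `check_repo_id`, and the repository ids are illustrative assumptions, not part of any Hugging Face API.

```python
from difflib import SequenceMatcher

# ASSUMPTION: a hand-maintained allowlist of organizations you trust.
TRUSTED_ORGS = {"EleutherAI"}

def check_repo_id(repo_id, trusted=TRUSTED_ORGS, threshold=0.85):
    """Classify an 'org/model' id as 'trusted', 'suspicious', or 'unknown'.

    'suspicious' means the organization is not on the allowlist but is
    very similar to one that is, which is typical of typosquatting.
    """
    org, sep, _model = repo_id.partition("/")
    if not sep:
        return "unknown"  # no organization part to check
    if org in trusted:
        return "trusted"
    for name in trusted:
        similarity = SequenceMatcher(None, org.lower(), name.lower()).ratio()
        if similarity >= threshold:
            return "suspicious"
    return "unknown"

print(check_repo_id("EleutherAI/gpt-j-6B"))  # trusted
print(check_repo_id("EleuterAI/gpt-j-6B"))   # suspicious
```

A check like this only catches look-alike names. It says nothing about whether the weights inside a correctly named repository are the ones the publisher intended.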

Repercussions of LLM Poisoning in the supply chain

This demonstration brought the problem with the AI supply chain into sharp focus. Currently, there is no way to determine the provenance of a model, that is, the specific datasets and algorithms that went into making it.

Even complete openness does not fix this problem. In practice, it is almost impossible to reproduce the exact weights that have been open-sourced, because of randomness in the hardware (particularly GPUs) and the software. Even with the best efforts, redoing the training may be impractical or prohibitively expensive because of the models' scale. Because there is no way to securely link weights to a trusted dataset and algorithm, algorithms like ROME can be used to poison any model.
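A partial mitigation that is available today is to pin a cryptographic hash of the weight file you have vetted and refuse to load anything else. The sketch below shows the idea, and its limits are worth stating plainly: it detects a swapped or altered file, but it does not prove which dataset or algorithm produced the weights, which is the gap described above. The function names and the way the expected digest is obtained are assumptions for illustration.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks
    so that multi-gigabyte weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_weights(path, expected_sha256):
    """Return True only if the file matches the pinned digest.

    ASSUMPTION: expected_sha256 was recorded from a copy of the
    weights that you already trust, through a separate channel.
    """
    return sha256_of_file(path) == expected_sha256.lower()
```

A deployment script might call `verify_weights` on each downloaded file and abort if it returns False, so that a silently replaced model never reaches production.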

Hugging Face Enterprise Hub addresses many of the challenges of deploying AI models in a business setting, although this market is only just emerging. The existence of trusted actors is an underappreciated factor that could turbocharge enterprise AI adoption, much as cloud computing saw widespread adoption once IT heavyweights like Amazon, Google, and Microsoft entered the market.

Dhanshree Shenwai is a Computer Science Engineer with experience in FinTech companies covering the financial, cards and payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier.