Meet PassGPT: An LLM Trained on Password Leaks for Password Generation

Despite the growing variety of alternative technologies, passwords remain the preferred authentication method, mostly because they are simple to use and remember. Many applications also fall back on passwords when other security measures fail. Because password leaks are so common, they are one of the biggest hazards that organizations (and individuals) face. Leaks not only give attackers access to systems, but they also let researchers look for hidden patterns in user-generated passwords that can be used to develop and improve password-cracking tools.

Machine learning (ML) has played (and will continue to play) a significant role in extracting and learning important characteristics from large-scale password breaches, leading to substantial contributions in two main areas of research: (1) password guessing and (2) password strength estimation. At the same time, a family of ML models called Large Language Models (LLMs) has been remarkably successful at processing and understanding natural language. The Generative Pre-trained Transformer (GPT) models, PaLM, and LLaMA are a few well-known examples of these models, all based on the Transformer architecture.

Given these successes, a natural question arises: How well can LLMs capture the underlying characteristics and cues hidden in the complexity of human-generated passwords? To answer it, researchers from ETH Zürich, the Swiss Data Science Center, and SRI International, New York propose and carefully evaluate PassGPT, an LLM-based password-guessing model. PassGPT is an offline model based on the GPT-2 architecture that can be used for both password guessing and password strength estimation.

PassGPT guesses 20% more previously unseen passwords than earlier work on deep generative models and generalizes well to breaches it was not trained on. The researchers also extend PassGPT with vector quantization; the resulting architecture, PassVQT, can increase the complexity of generated passwords. Unlike prior deep generative models, which produce each password as a whole, PassGPT samples one character at a time. This enables a new task, guided password generation, in which passwords are sampled under arbitrary constraints, allowing a finer-grained (character-level) guided exploration of the search space. Finally, unlike GANs, PassGPT explicitly models the probability distribution over passwords.
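
To make the idea of character-level guided generation concrete, here is a minimal sketch in Python. It is not PassGPT: a toy bigram model trained on a handful of made-up passwords stands in for the GPT-2 network, and the template symbols (`l`, `d`, `*`), the corpus, and all function names are assumptions chosen for illustration. The point it shows is that, because sampling proceeds one character at a time, disallowed characters can simply be excluded at each position.

```python
import random
import string
from collections import defaultdict

# Toy stand-in for a trained model: a tiny, made-up corpus (assumption).
CORPUS = ["password1", "letmein22", "dragon99", "sunshine7", "monkey12"]
START, END = "^", "$"

# Hypothetical template symbols: 'l' = lowercase letter, 'd' = digit,
# '*' = any lowercase letter or digit.
CLASSES = {
    "l": set(string.ascii_lowercase),
    "d": set(string.digits),
    "*": set(string.ascii_lowercase + string.digits),
}


def train_bigram(passwords):
    """Count how often each character follows the previous one."""
    counts = defaultdict(lambda: defaultdict(int))
    for pw in passwords:
        prev = START
        for ch in pw + END:
            counts[prev][ch] += 1
            prev = ch
    return counts


def sample_guided(counts, template, rng, smoothing=0.1):
    """Sample one character per template symbol, restricted to its class."""
    prev, out = START, []
    for symbol in template:
        allowed = sorted(CLASSES[symbol])
        # Only allowed characters receive weight; smoothing keeps every
        # allowed character possible even if the model never saw it here.
        weights = [counts[prev][ch] + smoothing for ch in allowed]
        ch = rng.choices(allowed, weights=weights, k=1)[0]
        out.append(ch)
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    model = train_bigram(CORPUS)
    rng = random.Random(0)
    for _ in range(3):
        print(sample_guided(model, "lllldd", rng))
```

Every sampled string matches the template (four lowercase letters followed by two digits), while the choice within each class is still driven by the model's learned statistics. A model that generates a whole password in one shot offers no such per-character hook.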

The researchers show that password probability agrees with modern password strength estimators: PassGPT assigns lower probabilities to stronger passwords. They also look for passwords that are easy to guess with generative techniques despite being rated “strong” by strength estimators, and they show how PassGPT’s password probabilities can be used to improve the accuracy of current strength estimators.
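
The link between probability and strength can be sketched with the chain rule: an autoregressive model scores a password as the product of its per-character conditional probabilities, so the log-probability is a sum of per-character terms. The Python snippet below is an illustration under stated assumptions, not the authors' implementation: a smoothed bigram model replaces GPT-2, and the corpus, the vocabulary size of 96 (printable ASCII plus an end marker), and the function names are all hypothetical.

```python
import math
from collections import defaultdict

# Toy stand-in for a trained model: a tiny, made-up corpus (assumption).
CORPUS = ["password1", "letmein22", "dragon99", "sunshine7", "monkey12"]
START, END = "^", "$"
VOCAB_SIZE = 96  # assumed: 95 printable ASCII characters plus the end marker


def train_bigram(passwords):
    """Count how often each character follows the previous one."""
    counts = defaultdict(lambda: defaultdict(int))
    for pw in passwords:
        prev = START
        for ch in pw + END:
            counts[prev][ch] += 1
            prev = ch
    return counts


def log_prob(counts, password, smoothing=0.1):
    """Sum of log P(char | previous char), with add-k smoothing."""
    total, prev = 0.0, START
    for ch in password + END:
        row = counts.get(prev, {})
        numerator = row.get(ch, 0) + smoothing
        denominator = sum(row.values()) + smoothing * VOCAB_SIZE
        total += math.log(numerator / denominator)
        prev = ch
    return total


if __name__ == "__main__":
    model = train_bigram(CORPUS)
    for candidate in ["password1", "dragon12", "q7#Vz!xK2"]:
        print(f"{candidate!r}: log-probability {log_prob(model, candidate):.2f}")
```

Under this toy model, a password that follows common patterns receives a higher log-probability than a random-looking one of the same length, which mirrors the trend described above: the less probable a password is under the model, the stronger it tends to be.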

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.