CMU Researchers Introduce ReLM: An AI System For Validating And Querying LLMs Using Standard Regular Expressions

There are rising worries about the potential negative impacts of large language models (LLMs), such as data memorization, bias, and unsuitable language, despite LLMs’ widespread praise for their capacity to generate natural-sounding text. It is challenging to validate (and rectify) such worries because of LLMs’ intricacy and developing capabilities. In this study, the authors present ReLM, a system for checking and querying LLMs with the help of conventional regular expressions. With ReLM, many language model evaluations may be formalized and made possible by simplifying complex evaluation methods into regular expression queries.

Results from inquiries on memorization, gender prejudice, toxicity, and language comprehension reveal that ReLM can expand statistical and prompt-tuning coverage by as much as 15 times compared to state-of-the-art ad hoc searches. For the ever-growing challenge of LLM validation, ReLM provides a competitive and generalized starting point.

ReLM is the first solution that lets practitioners directly measure LLM behavior over collections too vast to enumerate, by describing a query as the whole set of test patterns. ReLM's success stems from a compact graph representation of the solution space, which is derived from the regular expression and then compiled into an LLM-specific representation before being executed. Users therefore do not need to be familiar with the LLM's inner workings, and tests produce the same results as if every string matching the pattern had been written out and checked explicitly. In addition to introducing ReLM, the authors show how patterns of strings can be used in various LLM evaluation tasks.
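
To make the idea of a compact graph representation concrete, here is a minimal sketch in plain Python. It is not ReLM's implementation: the phone-number pattern, the hand-built automaton, and the function names are all assumptions chosen for illustration. It shows how a pattern covering ten million strings fits in a nine-state automaton, and how the size of the set can be computed without listing its members.

```python
from collections import defaultdict

DIGITS = "0123456789"

def build_phone_dfa():
    """Hand-built DFA for the pattern [0-9]{3}-[0-9]{4} (illustrative only)."""
    transitions = {}  # (state, character) -> next state
    layout = [DIGITS] * 3 + ["-"] + [DIGITS] * 4
    for state, allowed in enumerate(layout):
        for ch in allowed:
            transitions[(state, ch)] = state + 1
    # Start state is 0; the single accepting state follows the last character.
    return transitions, 0, {len(layout)}

def accepts(dfa, text):
    """Return True if the DFA accepts `text`."""
    transitions, state, accepting = dfa
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:  # no edge for this character: reject
            return False
    return state in accepting

def count_accepted(dfa, length):
    """Count accepted strings of exactly `length` characters by dynamic programming."""
    transitions, start, accepting = dfa
    counts = {start: 1}  # number of distinct prefixes reaching each state
    for _ in range(length):
        nxt = defaultdict(int)
        for (src, _ch), dst in transitions.items():
            if src in counts:
                nxt[dst] += counts[src]
        counts = nxt
    return sum(n for state, n in counts.items() if state in accepting)

dfa = build_phone_dfa()
print(accepts(dfa, "555-1234"))   # a string inside the pattern
print(count_accepted(dfa, 8))     # size of the set, computed without enumeration
```

The graph stays small because shared structure (any digit in any position) is stored once as edges rather than once per string.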

ReLM stands for Regular Expression engine for LMs. It adds a constrained decoding system, grounded in automaton theory, on top of the LLM. Users of ReLM build queries that specify both the test pattern and how to execute it. Because the user states the pattern of interest explicitly, ReLM can avoid unnecessary work that would result in false positives. Because the user also supplies variations of the pattern (for example, encodings and misspellings), ReLM can cover often-ignored elements of the test set and so avoid false negatives. Provided the effects are propagated correctly to the final automaton, virtually any pattern, or mutation of a pattern, can be described.
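
As a rough illustration of automaton-constrained decoding, the sketch below walks a small token-level automaton and, at each step, lets a stand-in scoring function choose only among the tokens the automaton allows. Everything here is assumed for the example: the vocabulary, the automaton for the pattern "The (cat|dog) (sat|ran)", and `toy_logits`, which takes the place of a real LLM. It is not ReLM's decoder.

```python
# Hypothetical token vocabulary (a real LLM has tens of thousands of tokens).
VOCAB = ["The", " cat", " dog", " sat", " ran", " bird"]

# Token-level automaton for "The (cat|dog) (sat|ran)":
# state -> {token: next state}. State 3 is accepting.
AUTOMATON = {
    0: {"The": 1},
    1: {" cat": 2, " dog": 2},
    2: {" sat": 3, " ran": 3},
    3: {},
}
ACCEPTING = {3}

def toy_logits(prefix):
    """Stand-in for an LLM: fixed per-token scores, shifted by prefix length."""
    scores = {"The": 1.0, " cat": 0.5, " dog": 0.8,
              " sat": 0.2, " ran": 0.1, " bird": 2.0}
    return {tok: scores[tok] - 0.1 * len(prefix) for tok in VOCAB}

def constrained_greedy_decode(max_steps=10):
    """Greedy decoding restricted to tokens with an edge from the current state."""
    state, output = 0, []
    for _ in range(max_steps):
        if state in ACCEPTING:
            break
        allowed = AUTOMATON[state]
        logits = toy_logits(output)
        # Tokens without an outgoing edge (such as " bird") are never considered,
        # no matter how highly the model scores them.
        best = max(allowed, key=lambda tok: logits[tok])
        output.append(best)
        state = allowed[best]
    return "".join(output), state in ACCEPTING

print(constrained_greedy_decode())
```

Note that " bird" has the highest raw score at every step but cannot appear in the output, because the automaton has no edge for it.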

ReLM is exposed to Python user programs through a dedicated API. To use ReLM, a program passes it a Query Object and an LLM defined in a third-party library, such as Hugging Face Transformers (Wolf et al., 2020). The Query Object stores the regular expression, the LLM decision rules, and the traversal algorithm.

Users of ReLM can divide a validation task into two parts while writing its code:

  • Using a regular expression to describe a subset of strings formally.
  • Guiding the engine through the process of string enumeration and evaluation.
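
The two-part split above can be pictured with a small, purely hypothetical query object. The class name `ValidationQuery`, its fields, and the `Traversal` enum are assumptions made for this sketch and are not ReLM's actual API; the repository documents the real interface. Python's built-in `re` module stands in for the pattern language, although it accepts more than strictly regular expressions.

```python
import re
from dataclasses import dataclass
from enum import Enum

class Traversal(Enum):
    """How the engine should walk the set of matching strings (hypothetical)."""
    SHORTEST_PATH = "shortest_path"  # most likely strings first
    RANDOM = "random"                # random samples from the pattern

@dataclass(frozen=True)
class ValidationQuery:
    """Hypothetical container for the two halves of a validation task."""
    pattern: str                                  # part 1: which strings to test
    traversal: Traversal = Traversal.SHORTEST_PATH  # part 2: how to enumerate them
    top_k: int = 40                               # part 2: a decoding rule
    num_samples: int = 10                         # part 2: how many results to return

    def __post_init__(self):
        re.compile(self.pattern)  # raises re.error if the pattern is malformed
        if self.top_k < 1 or self.num_samples < 1:
            raise ValueError("top_k and num_samples must be positive")

    def matches(self, text):
        """Check whether a candidate string belongs to the described set."""
        return re.fullmatch(self.pattern, text) is not None

query = ValidationQuery(pattern=r"My phone number is [0-9]{3}-[0-9]{4}")
print(query.matches("My phone number is 555-1234"))
print(query.traversal.value)
```

Keeping the pattern separate from the execution settings means the same test set can be re-run under a different traversal or decoding rule without rewriting the pattern.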

The researchers show that ReLM can execute common queries quickly and expressively, significantly reducing the effort required to validate LLMs. The main contributions are:

  • The application of regular expressions to LLM prediction is formally outlined. Unlike multiple-choice questions, which are finite and enumerable, regular expressions can describe sets of unbounded size. Compared to open-ended questions, which sometimes yield ambiguous responses, ReLM's outcomes are consistently clear.
  • The conditional and unconditional classes of LLM inference queries are identified and constructed. In studying unconditional generation, the authors find that a fixed query string can be represented by many different token sequences, which motivates a compressed representation. They are the first group to use automata to accommodate these variant encodings.
  • A regular expression inference engine that efficiently converts regular expressions to finite automata has been designed and implemented. The researchers achieve competitive GPU utilization and runtimes (on the order of seconds) using both shortest-path and randomized graph traversals.
  • Using GPT-2 models, the authors illustrate the value of ReLM in the context of LLM validation by assessing memorization, gender bias, toxicity, and language comprehension tasks.
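
The second and third points can be illustrated together with a toy example. The sketch below assumes a tiny, made-up subword vocabulary with invented costs (read them as negative log-probabilities); none of it comes from ReLM or from a real tokenizer. It builds the graph whose paths are the possible encodings of a fixed string, enumerates them, and then runs a shortest-path search to find the cheapest one. A randomized traversal is not shown.

```python
import heapq

# Hypothetical subword vocabulary with made-up costs (lower is more likely).
TOKEN_COST = {"T": 2.0, "h": 2.0, "e": 2.0, "Th": 1.5, "he": 1.0, "The": 0.5}

def token_edges(text):
    """Edges (start, end, token) of the graph whose paths are encodings of `text`."""
    return [
        (i, j, text[i:j])
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
        if text[i:j] in TOKEN_COST
    ]

def all_encodings(text):
    """Enumerate every token sequence that concatenates to `text`."""
    edges = token_edges(text)

    def walk(pos):
        if pos == len(text):
            yield []
            return
        for start, end, tok in edges:
            if start == pos:
                for rest in walk(end):
                    yield [tok] + rest

    return list(walk(0))

def cheapest_encoding(text):
    """Dijkstra over string positions; returns (total cost, token sequence)."""
    edges = token_edges(text)
    heap = [(0.0, 0, [])]  # (cost so far, position in text, tokens so far)
    settled = set()
    while heap:
        cost, pos, path = heapq.heappop(heap)
        if pos == len(text):
            return cost, path
        if pos in settled:
            continue
        settled.add(pos)
        for start, end, tok in edges:
            if start == pos:
                heapq.heappush(heap, (cost + TOKEN_COST[tok], end, path + [tok]))
    return None  # the vocabulary cannot spell `text`

for encoding in all_encodings("The"):
    print(encoding)
print(cheapest_encoding("The"))
```

Even a three-character string has several encodings under this vocabulary, and the count grows quickly with length, which is why a graph is a more practical representation than an explicit list.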

More details can be found in the repo 

To conclude

The need for validation abstractions for large language models (LLMs) has grown with the complexity of natural language and the rapid growth of LLMs. ReLM is presented as the first programmable framework for running validation tasks on LLMs. With ReLM, logical queries are written as regular expressions and then compiled into a form that can be executed against the LLM. On memorization, gender bias, toxicity, and language understanding tasks, ReLM can run queries up to 15x faster, with 2.5x less data, or in a way that offers more insight than previous methods. While ReLM's results argue strongly against relying on ad hoc LLM validation, addressing queries systematically introduces other difficulties (for instance, left-to-right autoregressive decoding favors suffix completions). The authors' long-term goals include enhancing ReLM's query optimization capabilities and bringing it to more model families.

Check out the Paper, GitHub, and CMU Article.


Dhanshree Shenwai is a Computer Science Engineer with experience in FinTech companies covering the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier.
