Evolution of RAGs: Naive RAG, Advanced RAG, and Modular RAG Architectures

Large language models (LLMs) have revolutionized AI by proving their success in natural language tasks and beyond, as exemplified by ChatGPT, Bard, Claude, etc. These LLMs can generate text ranging from creative writing to complex codes. However, LLMs encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. 

RAG enhances LLMs by retrieving relevant document chunks from the external knowledge base through semantic similarity calculation. By referencing external knowledge, RAG effectively reduces the problem of generating factually incorrect content. Its integration into LLMs has resulted in widespread adoption, establishing RAG as a key technology in advancing chatbots and enhancing the suitability of LLMs for real-world applications. When users ask an LLM a question, the AI model sends the query to another model that converts it into a numeric format so machines can read it. The numeric version of the query is sometimes called an embedding or a vector. RAG combines LLMs with embedding models and vector databases. The embedding model then compares these numeric values to vectors in a machine-readable index of an available knowledge base. When it finds a match or multiple matches, it retrieves the related data, converts it to human-readable words, and passes it back to the LLM. Lastly, the LLM combines the retrieved words and their response to the query into a final answer it presents to the user, potentially citing sources the embedding model found.

The RAG research paradigm is continuously evolving, and RAG is categorized into three stages: Naive RAG, Advanced RAG, and Modular RAG. Despite the RAG method being cost-effective and surpassing the performance of the native LLM, it also exhibits several limitations. The development of Advanced RAG and Modular RAG is an innovation in RAG to overcome these specific shortcomings in Naive RAG.

Naive RAG: The Naive RAG research paradigm represents the earliest methodology, which gained prominence shortly after the widespread adoption of ChatGPT. The Naive RAG follows a traditional process that includes indexing, retrieval, and generation, also characterized as a “Retrieve-Read” framework. Indexing starts with cleaning and extracting raw data in diverse formats like PDF, HTML, Word, and Markdown, which is then converted into a uniform plain text format. Retrieval: Upon receipt of a user query, the RAG system employs the same encoding model utilized during the indexing phase to transform the query into a vector representation. It then computes the similarity scores between the query vector and the vector of chunks within the indexed corpus. The system prioritizes and retrieves the top K chunks, demonstrating the greatest similarity to the query. These chunks are subsequently used as the expanded context in the prompt. Generation: The posed query and selected documents are synthesized into a coherent prompt, for which an LLM is tasked to formulate a response.

However, Naive RAG encounters notable drawbacks: Retrieval Challenges; the retrieval phase often struggles with precision and recall, leading to the selection of misaligned or irrelevant chunks and missing crucial information. Generation Difficulties: In generating responses, the model may face the issue of hallucination, producing content unsupported by the retrieved context. Augmentation Hurdles: Integrating retrieved information with different tasks can be challenging, sometimes resulting in disjointed or incoherent outputs. Moreover, there’s a concern that generation models might overly rely on augmented information, leading to outputs that simply echo retrieved content without adding insightful or synthesized information.

Advanced RAG: Advanced RAG introduces specific improvements to overcome the limitations of Naive RAG. Focusing on enhancing retrieval quality, it employs pre-retrieval and post-retrieval strategies. To tackle the indexing issues, Advanced RAG refines its indexing techniques through a sliding window approach, fine-grained segmentation, and the incorporation of metadata. Also, it incorporates several optimization methods to streamline the retrieval process. Pre-retrieval process: In this stage, the primary focus is optimizing the indexing structure and the original query. Optimizing indexing aims to enhance the quality of the content being indexed. This involves strategies: enhancing data granularity, optimizing index structures, adding metadata, alignment optimization, and mixed retrieval. The goal of query optimization is to make the user’s original question clearer and more suitable for retrieval. Common methods include query rewriting, query transformation, query expansion, and other techniques. 

Modular RAG: The modular RAG architecture advances beyond the former two RAG paradigms, offering enhanced adaptability and versatility. It incorporates diverse strategies for improving its components, such as adding a search module for similarity searches and refining the retriever through fine-tuning. Innovations like restructured RAG modules and rearranged RAG pipelines have been introduced to tackle specific challenges. The shift towards a modular RAG approach is becoming prevalent, supporting sequential processing and integrated end-to-end training across its components. Despite its distinctiveness, Modular RAG builds upon the foundational principles of Advanced and Naive RAG, illustrating a progression and refinement within the RAG family. 

  • New Modules: The Modular RAG framework introduces additional specialized components to enhance retrieval and processing capabilities. The Search module adapts to specific scenarios, enabling direct searches across various data sources like search engines, databases, and knowledge graphs, using LLM-generated code and query languages. RAG-Fusion addresses traditional search limitations by employing a multi-query strategy that expands user queries into diverse perspectives, utilizing parallel vector searches and intelligent re-ranking to uncover explicit and transformative knowledge. The Memory module utilizes the LLM’s memory to guide retrieval, creating an unbounded memory pool that aligns the text more closely with data distribution through iterative self-enhancement. Routing in the RAG system navigates through diverse data sources, selecting the optimal pathway for a query, whether it involves summarization, specific database searches, or merging different information streams. The Predict module aims to reduce redundancy and noise by generating context directly through the LLM, ensuring relevance and accuracy. Lastly, the Task Adapter module tailors RAG to various downstream tasks, automating prompt retrieval for zero-shot inputs and creating task-specific retrievers through few-shot query generation.
  • New Patterns: Modular RAG offers remarkable adaptability by allowing module substitution or reconfiguration to address specific challenges. This goes beyond the fixed structures of Naive and Advanced RAG, characterized by a simple “Retrieve” and “Read” mechanism. Moreover, Modular RAG expands this flexibility by integrating new modules or adjusting interaction flow among existing ones, enhancing its applicability across different tasks.

In conclusion, RAG has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG enhances LLMs by retrieving relevant document chunks from the external knowledge base through semantic similarity calculation. The RAG research paradigm is continuously evolving, and RAG is categorized into three stages: Naive RAG, Advanced RAG, and Modular RAG. Naive RAG has several limitations, including Retrieval Challenges and Generation Difficulties. The latter RAG architectures were proposed to address these problems: Advanced RAG and Modular RAG. Due to the adaptable architecture of Modular RAG, it has become a standard paradigm in building RAG applications.


Check out the PaperAll credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 39k+ ML SubReddit

Asjad is an intern consultant at Marktechpost. He is persuing B.Tech in mechanical engineering at the Indian Institute of Technology, Kharagpur. Asjad is a Machine learning and deep learning enthusiast who is always researching the applications of machine learning in healthcare.

🐝 Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...