Llama 2 to Llama 3: Meta’s Leap in Open-Source Language Models

Meta has been at the forefront of open-source LLMs with its Llama series. Following the success of Llama 2, Meta has introduced Llama 3, which brings a much larger training corpus, a new tokenizer, and updated safety tooling. This article walks through the advancements from Llama 2 to Llama 3, highlighting the key differences and what they mean for the AI community.

Llama 2

Llama 2 was a major step in Meta’s open-source language model effort. Designed to be accessible to individuals, researchers, and businesses, it provides a robust platform for experimentation and innovation. It was trained on 2 trillion tokens drawn from publicly available online data sources. The fine-tuned variant, Llama Chat, used over 1 million human annotations to improve its performance in real-world applications. Llama 2 was tuned for safety and helpfulness through reinforcement learning from human feedback (RLHF), using techniques such as rejection sampling and proximal policy optimization (PPO). The model set the stage for broader use and commercial applications, demonstrating Meta’s commitment to responsible AI development.
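The rejection sampling step mentioned above can be pictured as best-of-N selection: several candidate responses are generated for a prompt, a reward model scores each one, and the top-scoring response is kept for further fine-tuning. The sketch below is only an illustration of that idea. The `toy_reward` function is a hypothetical stand-in for a learned reward model, and the candidate strings are made up; none of this is Meta’s actual pipeline.

```python
def toy_reward(response: str) -> float:
    """Hypothetical stand-in for a learned reward model.

    Rewards longer answers (capped at 12 words) and a helpful tone.
    """
    score = min(len(response.split()), 12) / 12.0
    if "happy to help" in response.lower():
        score += 1.0
    return score


def rejection_sample(candidates, reward_fn):
    """Best-of-N selection: keep the candidate with the highest reward."""
    return max(candidates, key=reward_fn)


# Made-up candidate responses to a single prompt.
candidates = [
    "No.",
    "I am happy to help. Here is a short summary of the article.",
    "Figure it out yourself.",
]

best = rejection_sample(candidates, toy_reward)
print(best)
```

In a real setup the scoring function would be a trained reward model and the selected responses would feed the next fine-tuning round; here the selection logic is the only part being illustrated.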

Llama 3

Llama 3 represents a substantial leap from its predecessor, with advances in architecture, training data, and safety protocols. A new tokenizer with a vocabulary of 128K tokens encodes language more efficiently. The training dataset has grown to over 15 trillion tokens, more than seven times the size of Llama 2’s, and includes a diverse range of data with a significant portion of non-English text to support multilingual capabilities. Llama 3’s architecture uses Grouped Query Attention (GQA) to improve inference efficiency. Instruction fine-tuning has been refined with techniques such as direct preference optimization (DPO), making the model more capable in tasks like reasoning and coding. New safety tools such as Llama Guard 2 and Code Shield further underline Meta’s focus on responsible AI deployment.
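To make the DPO step more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It compares how much more the policy favors the chosen response over the rejected one, relative to a frozen reference model. The function name `dpo_loss`, the `beta` value, and the log-probabilities used in the example are illustrative assumptions, not values from Meta’s training setup.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How far the policy has moved from the reference on each response.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as a numerically stable softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the loss equals log(2).
neutral = dpo_loss(-5.0, -7.0, -5.0, -7.0)

# Policy now prefers the chosen response more than the reference does.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)

print(neutral, improved)
```

The loss falls as the policy raises the likelihood of preferred responses relative to rejected ones, which is why no separate reward model is needed at this stage.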

Evolution from Llama 2 to Llama 3

Llama 2 was a significant milestone for Meta, providing an open-source, high-performing LLM accessible to a wide range of users, from researchers to businesses. Its 2-trillion-token training set and the more than 1 million human annotations behind fine-tuned versions like Llama Chat established a strong baseline for performance and usability. Llama 3 builds on these foundations with more advanced features and capabilities.

Key Improvements in Llama 3

  • Model Architecture and Tokenization:
    • Llama 3 employs a more efficient tokenizer with a vocabulary of 128K tokens, compared with the 32K-token vocabulary in Llama 2. This results in better language encoding and improved model performance.
    • The architecture of Llama 3 includes enhancements such as Grouped Query Attention (GQA) to boost inference efficiency.
  • Training Data and Scalability:
    • The training dataset for Llama 3 contains more than 15 trillion tokens, over seven times the size of the dataset used for Llama 2. It draws on diverse sources, with four times more code data and a significant amount of non-English text to support multilingual capabilities.
    • Scaling up the pretraining data, together with newly developed scaling laws, helped Meta optimize Llama 3’s performance across a range of benchmarks.
  • Instruction Fine-Tuning:
    • Llama 3 incorporates advanced post-training techniques, such as supervised fine-tuning, rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO), to enhance performance, especially in reasoning and coding tasks.
  • Safety and Responsibility:
    • With new tools like Llama Guard 2, Code Shield, and CyberSec Eval 2, Llama 3 emphasizes safe and responsible deployment. These tools help filter insecure code and assess cybersecurity risks.
  • Deployment and Accessibility:
    • Llama 3 is designed to be accessible across multiple platforms, including AWS, Google Cloud, Microsoft Azure, and more. It also supports various hardware platforms, including AMD, NVIDIA, and Intel.
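One way to see why GQA helps at inference time is a back-of-the-envelope estimate of the key/value cache. With GQA, several query heads share one key/value head, so fewer key and value vectors are stored per token. The configuration below (32 layers, 32 query heads, 8 key/value heads, head dimension 128, 16-bit values) is an assumed, illustrative setup and is not taken from the article.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache stored per generated token.

    The leading factor of 2 counts one key and one value vector
    per key/value head in every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Assumed illustrative configuration (not from the article).
N_LAYERS, N_QUERY_HEADS, N_KV_HEADS, HEAD_DIM = 32, 32, 8, 128

# Standard multi-head attention: one KV head per query head.
mha = kv_cache_bytes_per_token(N_LAYERS, N_QUERY_HEADS, HEAD_DIM)
# Grouped query attention: each KV head is shared by a group of query heads.
gqa = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM)

print(mha // 1024, "KiB per token without grouping")  # 512
print(gqa // 1024, "KiB per token with GQA")           # 128
print("reduction factor:", mha // gqa)                 # 4
```

Under these assumptions the cache shrinks by the ratio of query heads to key/value heads, which leaves more memory for longer contexts or larger batches.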

Conclusion

The transition from Llama 2 to Llama 3 marks a significant leap in the development of open-source LLMs. With its updated architecture, far larger training dataset, and expanded safety tooling, Llama 3 sets a new standard for what is possible with open LLMs. As Meta continues to refine and expand Llama 3’s capabilities, the AI community can look forward to a future where powerful, safe, and accessible AI tools are within everyone’s reach.


Aswin AK is a consulting intern at MarkTechPost. He is pursuing his Dual Degree at the Indian Institute of Technology, Kharagpur. He is passionate about data science and machine learning, bringing a strong academic background and hands-on experience in solving real-life cross-domain challenges.
