Cerebras Introduces the Bittensor Language Model Named BTLM-3B-8K: A New State-of-The-Art 3B Parameter Open-Source Language Model

Large language models (LLMs) are useful in many contexts because they can carry out a wide range of text-based tasks from simple instructions. Applications include content creation, computer programming, and natural language understanding. LLMs are changing how people interact with and use information because of their capacity to produce meaningful content, answer questions, translate across languages, and summarise lengthy materials. With LLaMA (Touvron et al.), it became feasible to train LLMs efficiently on billions of tokens and attain state-of-the-art parameter efficiency. The resulting LLaMA models introduced the community to potent open-source LLMs that could be run on a high-end laptop.

Since then, LLaMA models have undergone several replications and expansions, with the 7B parameter size being the most often used due to its effectiveness and portability. Although users want models with the quality of 7B models, the memory and compute requirements of such models make them impractical in many situations. Edge devices, like smartphones and laptops, typically lack the memory capacity to store 7B model weights, and inference is sluggish even with compression techniques like quantization. Another drawback of existing LLMs is their limited support for long contexts. The capacity to model long-range contextual relationships is crucial for tasks like summarising or answering questions about long-form documents, analyzing whole codebases, predicting DNA sequences, participating in multi-turn conversations, or generating content for articles.
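To make the memory argument concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone occupy at different precisions. The parameter counts (3e9 and 7e9) are rounded nominal sizes and the bit widths are illustrative assumptions, not figures from the paper; activations, the KV cache, and runtime overhead are ignored.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Nominal sizes; real checkpoints differ slightly from these round numbers.
    for name, num_params in [("3B", 3e9), ("7B", 7e9)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(num_params, bits)
            print(f"{name} model at {bits}-bit: {gb:.1f} GB")
```

Under these assumptions, a 7B model needs about 3.5 GB for its weights even at 4 bits per weight, while a 3B model at the same precision needs about 1.5 GB, which is why the smaller size matters on memory-constrained devices.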

Researchers from Cerebras Systems and OpenTensor Foundation introduce BTLM-3B-8K, a state-of-the-art 3B parameter, open-source Bittensor Language Model, in this study. The model is competitive with 7B parameter models that used 2.5× more parameters, 3.3× more compute, and 1.6× more tokens during training. By using 2.5 times less inference compute than 7B models and fitting on devices with 3GB of RAM, BTLM-3B-8K brings the performance of 7B models to billions of edge devices worldwide. BTLM-3B-8K employs ALiBi position embeddings and is trained with context lengths of up to 8,192, making its long-context performance competitive with existing 7B parameter models.
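For readers unfamiliar with ALiBi, the idea is to add a per-head linear penalty to attention scores based on the distance between query and key positions, instead of adding position embeddings to the inputs. The following is a minimal sketch of the general technique, not BTLM's actual implementation: the function names are hypothetical, the head count is assumed to be a power of two, and a causal mask is assumed as is typical for decoder-only models.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8)."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """bias[h][i][j] is added to the attention score of query i and key j.

    Past keys (j <= i) get -slope_h * (i - j); future keys are masked with -inf.
    """
    return [
        [
            [-m * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in alibi_slopes(num_heads)
    ]


if __name__ == "__main__":
    bias = alibi_bias(num_heads=8, seq_len=4)
    # Last query row of the first head: the penalty grows with distance.
    print(bias[0][3])
```

Because the penalty depends only on relative distance, the same rule applies unchanged to sequences longer than those seen in training, which is the property that makes ALiBi attractive for long-context extrapolation.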

They made these contributions: 

• Training Methodology: Using CG-1, a cluster of 64 Cerebras CS-2 Systems, they describe the methodology they utilized to train BTLM-3B-8K on one epoch of the SlimPajama dataset. 

• Model Assessment: They present a thorough comparison of existing 3B and 7B parameter models on 22 benchmarks, measuring factors such as common sense reasoning, general knowledge, reading comprehension, code generation, long sequence extrapolation, bias, and misinformation. They show that BTLM-3B-8K sets the state of the art for models with 3B parameters and frequently outperforms models with 7B parameters.

• Training Improvements: They run ablations of the architectural modifications and training strategies that underpin BTLM’s strong performance, which together yield a 5.36% improvement in loss over the baseline.

• Releases and Availability: They make the BTLM-3B-8K weights and the SlimPajama dataset available on Hugging Face. They believe that the open-source community will greatly benefit from these efforts.

Check out the Paper and Project. All credit for this research goes to the researchers on this project.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
