Building a Large Language Model (LLM) for a bespoke application remains out of reach for most people. Creating an LLM that generates accurate content quickly for a specialized domain, or that imitates a particular writing style, takes considerable time and expertise.
Stochastic has a team of bright ML engineers, postdocs, and Harvard grad students focused on optimizing and speeding up LLMs. The team has introduced xTuring, an open-source tool that lets users build their own LLM with just three lines of code.
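The "three lines" refer to loading a dataset, creating a model, and calling fine-tune. A minimal sketch of that flow follows, modeled on the project's README at the time of writing; the class names and the `"llama_lora"` model key may have changed since, so treat them as assumptions and check the current documentation. The wrapper function and its arguments are hypothetical and exist only to keep the example self-contained.

```python
def finetune_with_xturing(dataset_dir: str, prompt: str):
    """Fine-tune LLaMA with LoRA through xTuring, then generate from one prompt.

    Sketch only: the xTuring names below follow the project's README and are
    assumptions here; dataset_dir must point to an instruction dataset on disk.
    """
    # Imported lazily so this file can be loaded without xTuring installed.
    from xturing.datasets import InstructionDataset
    from xturing.models import BaseModel

    dataset = InstructionDataset(dataset_dir)  # 1. load the instruction dataset
    model = BaseModel.create("llama_lora")     # 2. choose a base model with LoRA
    model.finetune(dataset=dataset)            # 3. fine-tune on the dataset

    # Inference with the fine-tuned model.
    return model.generate(texts=[prompt])
```

Calling `finetune_with_xturing("./alpaca_data", "Why are LLMs becoming so important?")` would start a training run, so it needs a suitable GPU, the model weights, and a dataset at that (hypothetical) path.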
Developers are building applications on these models for automated text delivery, chatbots, language translation, and content production. Training and fine-tuning the models, however, can be time-consuming and expensive. xTuring aims to make model optimization easy and fast, whether the base model is LLaMA, GPT-J, GPT-2, or another model.
xTuring supports both single-GPU and multi-GPU training, so users can tailor training to their specific hardware configurations. It uses memory-efficient fine-tuning techniques such as LoRA to speed up training and, according to the team, cut hardware costs by as much as 90%. Because LoRA reduces the amount of memory needed for fine-tuning, model training becomes faster and more efficient.
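To see why LoRA needs less memory, it helps to count trainable parameters. LoRA freezes a weight matrix W of shape (d_out, d_in) and trains only a low-rank update B·A, where B has shape (d_out, r) and A has shape (r, d_in). The sketch below compares the two counts for a single matrix; the layer size and rank are illustrative assumptions, not settings taken from xTuring.

```python
def full_trainable_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters updated when only the LoRA factors B (d_out x rank)
    and A (rank x d_in) are trained and the original matrix is frozen."""
    return rank * (d_out + d_in)


# Illustrative assumptions: one square 4096 x 4096 projection, LoRA rank 8.
D_MODEL = 4096
RANK = 8

full = full_trainable_params(D_MODEL, D_MODEL)
lora = lora_trainable_params(D_MODEL, D_MODEL, RANK)
fraction = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {RANK}): {lora:,} trainable parameters")
print(f"LoRA trains {fraction:.2%} of the matrix's parameters")
```

Fewer trainable parameters mean fewer gradients and optimizer states to hold in memory, which is where most of the savings come from. The frozen base weights still have to be loaded, so total memory does not shrink by the same fraction.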
The team benchmarked xTuring’s fine-tuning capabilities on the LLaMA 7B model and compared the results with other fine-tuning techniques. The dataset comprises 52K instructions, and the tests ran on 4xA100 GPUs with 335 GB of CPU memory.
The results show that fine-tuning the LLaMA 7B model with DeepSpeed + CPU offloading took 21 hours per epoch and consumed 33.5 GB of GPU memory and 190 GB of CPU memory. With LoRA + DeepSpeed and LoRA + DeepSpeed + CPU offloading, GPU memory use drops to 23.7 GB and 21.9 GB, respectively, and CPU memory use drops from 190 GB to 10.2 GB and 14.9 GB, respectively. In addition, training time falls from 21 hours to 20 minutes per epoch with either LoRA configuration.
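The size of these gains is easier to judge as ratios. The short calculation below takes the reported figures at face value and compares the DeepSpeed + CPU offloading baseline (33.5 GB of GPU memory, 21 hours per epoch) with the two LoRA configurations (23.7 GB and 21.9 GB of GPU memory, 20 minutes per epoch). It is plain arithmetic on the published numbers, not a re-run of the benchmark.

```python
# Figures as reported for LLaMA 7B on 4xA100 GPUs.
BASELINE_GPU_GB = 33.5  # DeepSpeed + CPU offloading
LORA_GPU_GB = {
    "LoRA + DeepSpeed": 23.7,
    "LoRA + DeepSpeed + CPU offloading": 21.9,
}
BASELINE_EPOCH_MIN = 21 * 60  # 21 hours per epoch
LORA_EPOCH_MIN = 20           # 20 minutes per epoch


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


gpu_savings = {
    name: percent_reduction(BASELINE_GPU_GB, gb)
    for name, gb in LORA_GPU_GB.items()
}
speedup = BASELINE_EPOCH_MIN / LORA_EPOCH_MIN

for name, pct in gpu_savings.items():
    print(f"{name}: {pct:.1f}% less GPU memory than the baseline")
print(f"Epoch time: {speedup:.0f}x faster than the baseline")
```

On these numbers the GPU memory saving is roughly a third, while the reduction in time per epoch is much larger, at about 63x.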
Getting started with xTuring couldn’t be easier. The tool’s UI is meant to be straightforward to learn and use. Users can fine-tune their models with a few mouse clicks, and xTuring will do the rest. That user-friendliness makes xTuring a good fit both for people new to LLMs and for those with more experience.
According to the team, xTuring is the best option for tuning large language models because it supports single- and multi-GPU training, uses memory-efficient approaches like LoRA, and has a straightforward interface.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advancements in technology and their real-life applications.