After the grand success of MosaicML-7B, MosaicML has yet again outperformed the benchmark they set earlier. In the new groundbreaking release, MosaicML has launched MosaicML-30B.
MosaicML is a very precise and powerful pretrained transformer. MosaicML claims that MosaicML-30B is even better than ChatGPT3.
Before the launch of MosaicML-30B, MosaicML-7B had taken the AI world by storm. MPT-7B Base-instruct, base-chat, and story writing were huge successes. The company has claimed that these models were downloaded over 3 million times worldwide. One of the biggest reasons to push for an even better engine, which Mosaic ML has done with MPT-30B, was the community’s craze for the models they released earlier.
It was unbelievable how the community adapted and utilized these MPT engines to build something better-tuned and served concrete use cases. Some of the interesting cases are LLaVA-MPT. LLaVa-MPT adds vision understanding to pretrained MPT-7B.
Similarly, GGML optimizes MPT engines to run better on Apple Silicon and CPUs. GPT4ALL is another use case that lets you run a GPT4-like chat option with MPT as its base engine.
When we look closely, one of the biggest reasons for MosaicML to be so better and seemingly have an edge while giving tough competition and a better alternative to bigger firms is the list of competitive features they offer and the adaptability of their models to different use cases with comparatively easy integration.
In this release, Mosaic ML also claimed that their MPT-30B outperforms existing ChatGPT3 with roughly one-third of the parameters that ChatGPT uses, making it an extremely lightweight model compared to existing generative solutions.
It is better than MosaicML’s existing MPT-7B, and this MPT-30B is readily available for commercial usage under a commercial license.
Not only that, but MPT-30B comes with two pretrained models, which are MPT-30B-Instruct and MPT-30B-Chat, which are capable of being influenced by one single instruction and are quite capable of following a multiturn conversation for a longer duration of time.
The reasons for it to be better continue. MosaicML has designed MPT-30B to be a better and more robust model in a bottom-up approach, ensuring that every moving piece performs better and more efficiently. MPT-30B has been trained with an 8k token context window. It supports longer contexts via ALiBi.
It has improved its training and inference performance with the help of FlashAttention. MPT-30B is also equipped with stronger coding abilities, credited to the diversity in the data they have undertaken. This model was extended to an 8K context window on Nvidia’s H100. The company claims that this, to the best of its knowledge, is the first LLM model trained on H100s, which are readily available to customers.
MosaicML has also kept the model lightweight, which helps emerging organizations keep operations costs low.
The size of MPT-30B was also specifically chosen to make it easy to deploy on a single GPU. 1xA100-80GB in 16-bit precision or 1xA100-40GB in 8-bit precision can run the system. Other comparable LLMs, such as Falcon-40B, have larger parameter counts and cannot be served on a single data center GPU (today); this necessitates 2+ GPUs, which increases the minimum inference system cost.
Check Out The Reference Article and HuggingFace Repo Link. Don’t forget to join our 25k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com
Anant is a Computer science engineer currently working as a data scientist with experience in Finance and AI products as a service. He is keen to build AI-powered solutions that create better data points and solve daily life problems in an impactful and efficient way.