Meta AI Introduces Open Pre-trained Transformers (OPT): A Suite Of Decoder-Only Pre-Trained Transformers Ranging From 125M To 175B Parameters

This Article Is Based On The Research Paper 'OPT: Open Pre-trained Transformer Language Models'. All Credit For This Research Goes To The Researchers Of This Paper.

Recent advances in AI research have required massive amounts of computing power. Industry labs have begun to report the carbon footprints of their models, but most do not include the computational costs of the research and development (R&D) phases of experimentation, which in some cases can be significant.

Over the last few years, large language models, natural language processing (NLP) systems with more than 100 billion parameters, have transformed NLP and AI research. After being trained on a massive and varied body of text, they have shown a striking new ability to write creative content, solve basic math problems, answer reading comprehension questions, and more. While the public can interact with some of these models through paid APIs, full research access remains restricted to a few well-funded labs. This restricted access has limited researchers' ability to understand how and why these large language models work, slowing progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity.

To allow deeper community engagement in understanding this important new technology, and in keeping with its commitment to open research, Meta AI has released Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available datasets. This is the first time a language technology system of this scale has been released with both its pretrained models and the code needed to train and use them. To maintain integrity and prevent misuse, the model is distributed under a noncommercial license focused on research use cases. Access will be granted to academic researchers; to those affiliated with organizations in government, civil society, and academia; and to industry research laboratories around the world.

Given how central these models are to many downstream language applications, Meta AI believes the entire AI community (academic researchers, civil society, governments, and industry) must work together to set clear principles for responsible AI in general and for responsible large language models in particular. A much broader segment of the AI community needs access to these models to conduct reproducible research and collectively drive the field forward. By releasing OPT-175B along with smaller-scale baselines, the team hopes to broaden the range of perspectives on the ethical implications of such technologies.

The team also released its notes documenting the development process, including the full logbook of day-to-day training, following the Partnership on AI's publication guidelines for researchers and the governance guidance outlined by NIST in March 2022, so that other researchers can more easily build on the work. These details also reveal how much compute was used to train OPT-175B and the human overhead required when the underlying infrastructure or the training process becomes unstable.

Meta AI is releasing OPT-175B together with the codebase used to train and deploy the model on only 16 NVIDIA V100 GPUs, making these models more accessible for research and providing a foundation for analyzing potential impacts using quantifiable metrics on a common, shared model. It is also sharing a suite of smaller-scale baseline models, trained on the same dataset with settings similar to those of OPT-175B, so that researchers can study the effect of scale alone. These smaller models have 125 million, 350 million, 1.3 billion, 2.7 billion, 6.7 billion, 13 billion, and 30 billion parameters.

The R&D experimentation behind a large model can, in some cases, be an order of magnitude more resource-intensive than training the final model itself.

Meta AI developed OPT-175B with energy efficiency in mind, training a model of this scale with roughly 1/7th the carbon footprint of GPT-3. All of the pretrained models are released under the OPT-175B License Agreement.

By sharing these baselines and the code for efficiently training a 175B-parameter model, Meta AI aims to reduce the community's collective environmental footprint while allowing new results and progress in the field to be measured consistently.

Propelling research forward through open collaboration

For AI research to advance, the scientific community must be able to work with cutting-edge models, both to explore their potential and to probe them for flaws. As with earlier open-science initiatives such as the Image Similarity Challenge, the Deepfake Detection Challenge, and the Hateful Memes Challenge, Meta AI believes that collaboration across research organizations is essential to the responsible development of AI systems.

Without direct access to these models, researchers are also limited in their ability to design detection and mitigation strategies for potential harms, which leaves that work in the hands of only those with the financial means to access models of this scale. Meta AI hopes that OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of large language models in the field.
