Today, the top benchmarks in natural language processing are dominated by Transformer-based models. The most critical ingredients in training such a model are the model code, the training data, and the available computing resources.
With the Transformer family of models, researchers found that performance keeps improving predictably as the amount of training data and compute is scaled up.
OpenAI demonstrated this first with GPT-2 and then with GPT-3, which was trained on a private corpus of roughly 500 billion tokens at an estimated $50 million in computing costs.
Although the code for most GPT-style language models is open source, the models themselves are nearly impossible to reproduce without massive amounts of data and compute. OpenAI has restricted public access to its trained models, making them available via an API to only a select few companies and individuals.
Stella Biderman, Leo Gao, and a few others founded EleutherAI with the aim of making AI technology openly available to everyone. The team chose to tackle the problem of making a GPT-like language model accessible to all.
Most of the code for such a model already existed, so the significant challenges were finding the data and the computing power. The EleutherAI team therefore set out to build an open-source data set of a scale comparable to what OpenAI used for its GPT language models, ultimately leading to the creation of The Pile. The Pile is an 825GB data set specifically designed for training language models. It combines 22 diverse sources, including academic repositories, Internet web pages, GitHub code, and more.
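A corpus like The Pile is ultimately consumed as fixed-length token sequences during language-model training. As a purely illustrative sketch (the whitespace "tokenizer", chunk length, end-of-document marker, and function name below are assumptions for demonstration, not EleutherAI's actual pipeline), packing raw documents into training sequences looks roughly like this:

```python
# Minimal sketch of packing raw text into fixed-length training sequences.
# The whitespace "tokenizer" and short chunk length are stand-ins for the
# real subword tokenizer and the 2048-token context GPT-style models use.

def pack_into_sequences(documents, seq_len=8):
    """Concatenate tokenized documents and split into equal-length chunks."""
    stream = []
    for doc in documents:
        stream.extend(doc.split())   # stand-in for real subword tokenization
        stream.append("<|eod|>")     # marker separating documents
    # Drop the trailing remainder that does not fill a whole sequence.
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]

docs = [
    "an open source data set for language models",
    "eight hundred twenty five gigabytes of text",
]
chunks = pack_into_sequences(docs, seq_len=4)  # 4 chunks of 4 "tokens" each
```

Concatenating documents into one token stream before chunking is a common design choice: it wastes no tokens on padding, at the cost of sequences that may span document boundaries.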
For compute power, EleutherAI used idle capacity from the TPU Research Cloud (TRC). After rigorous research and training, the EleutherAI team released two trained GPT-style language models: GPT-Neo 1.3B and GPT-Neo 2.7B.
In terms of model size, the largest GPT-Neo model has 2.7 billion parameters, while the GPT-3 API offers four models estimated to range from roughly 2.7 billion to 175 billion parameters.
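To get a rough sense of these scales, the parameter count of a decoder-only Transformer can be approximated from its depth and hidden width with the common 12·L·d² rule of thumb (embeddings and biases ignored). The layer and width figures below are the publicly known configurations for GPT-Neo 2.7B and GPT-3 175B (Davinci):

```python
def approx_transformer_params(n_layers, d_model):
    """Rough decoder-only Transformer parameter count: 12 * L * d^2.

    Each layer holds ~4*d^2 attention weights plus ~8*d^2 MLP weights;
    embedding tables and bias terms are ignored.
    """
    return 12 * n_layers * d_model ** 2

# GPT-Neo 2.7B: 32 layers, hidden size 2560
neo = approx_transformer_params(32, 2560)     # ~2.5e9
# GPT-3 175B (Davinci): 96 layers, hidden size 12288
gpt3 = approx_transformer_params(96, 12288)   # ~1.7e11
ratio = gpt3 / neo                            # close to the ~65x nominal gap
```

The approximation lands near the advertised 2.7B and 175B figures, confirming that nearly all of the size difference comes from GPT-3's much deeper and wider layer stack.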
EleutherAI also reported that GPT-Neo outperformed the closest comparable GPT-3 model on several NLP reasoning benchmarks.
Specifically, GPT-Neo outperformed GPT-3 Ada on HellaSwag, PIQA, and WinoGrande. Still, GPT-3 Davinci, the largest version of GPT-3 with nearly 65 times as many parameters, beat GPT-Neo on every benchmark.