The 2020 release of GPT-3 served as a compelling example of the advantages of training extremely large auto-regressive language models. With 175 billion parameters, a hundredfold increase over GPT-2, GPT-3 performed exceptionally well on a range of now-standard LLM tasks, including reading comprehension, open-ended question answering, and code generation. Many later models have reproduced this performance. Moreover, evidence suggests that very large models display emergent behaviours: their size lets them acquire skills that smaller models lack. A well-known example of emergent behaviour is few-shot prompting, where a model picks up a task from just a few examples supplied in the prompt. This ability improves well beyond random chance as language models grow in size.
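To make few-shot prompting concrete, here is a minimal sketch of how such a prompt might be assembled for a sentiment task. The function name `build_few_shot_prompt`, the `Headline:`/`Sentiment:` layout, and the example headlines are all assumptions made for illustration; none of them come from the paper, and real systems may format prompts differently.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot prompt: instruction, solved examples, then the open query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Headline: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item is left unanswered so the model can complete the label.
    lines.append(f"Headline: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Made-up examples for illustration only (not taken from any real dataset).
examples = [
    ("Company A beats quarterly earnings expectations", "positive"),
    ("Company B recalls its flagship product", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each headline as positive, negative, or neutral.",
    examples,
    "Company C announces a leadership change",
)
print(prompt)
```

The resulting string would be sent to a language model as-is; the model's weights are not updated, and the "learning" happens entirely through the examples in the prompt.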
Broadly, few-shot prompting greatly expands the range of tasks models can handle and lowers the barrier to entry for users looking to automate novel language tasks. Models with 280 billion, 540 billion, and 1 trillion parameters followed GPT-3. Several crucial aspects of building a high-performing LLM have also been studied, including different training objectives, multilingual models, more efficient and smaller models, and data- and parameter-efficient training sizes. These efforts have largely concentrated on general LLMs trained on datasets covering a wide range of subjects and domains. Although these datasets have included some specialist material, such as biological publications, the emphasis has been on developing LLMs with broad capabilities.
Recently, models trained solely on domain-specific data have outperformed general-purpose LLMs on tasks within particular disciplines, such as science and medicine, despite being substantially smaller. These results encourage the further development of domain-specific models. NLP technologies play an increasingly significant role in the large and growing field of financial technology. Financial NLP tasks include sentiment analysis, named entity recognition, news classification, and question answering. Even though the range of tasks is similar to that found in standard NLP benchmarks, the complexity and terminology of the financial domain call for a domain-specific system. An LLM focused on the financial domain would be valuable for all the reasons generative LLMs are appealing in general: few-shot learning, text generation, conversational systems, and so on.
According to the researchers, no LLM has yet been tailored for or evaluated on tasks in the financial sector, although masked language models tuned for it do exist. Researchers from Bloomberg and Johns Hopkins University train BloombergGPT, a language model with 50 billion parameters that supports a variety of financial-sector tasks. Rather than building a general-purpose LLM, or a small LLM trained exclusively on domain-specific data, they adopt a mixed approach. General models remove the need for specialization at training time, cover many domains, and perform well across a wide range of tasks. However, results from existing domain-specific models show that general models cannot replace them. While most applications at Bloomberg are in the financial domain and are best served by a specialized model, the company also supports a large and diverse set of tasks that are well served by a general model.
Therefore, they set out to build a model that delivers best-in-class performance on financial benchmarks while remaining competitive on general-purpose LLM benchmarks. They do this by constructing what they describe as the largest domain-specific dataset to date, drawing on Bloomberg’s existing data creation, collection, and curation resources. Because Bloomberg is primarily a financial data company, its data analysts have spent over 40 years collecting and curating financial-language documents. The company maintains extensive archives of financial data covering a range of topics, with careful tracking of data sources and usage rights.
They combine this data with public datasets to build a large training corpus of over 700 billion tokens. Using a portion of this corpus, they train a 50-billion-parameter BLOOM-style model. The model is evaluated on standard LLM benchmarks, open financial benchmarks, and Bloomberg-internal benchmarks to ensure it performs as expected. Their results show that the mixed training approach produces a model that significantly outperforms existing models on in-domain financial tasks while being on par with or better than them on general NLP benchmarks.
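One simple way to picture a mixed corpus is as weighted sampling across data sources. The sketch below is not the researchers' pipeline; it only illustrates the general idea, assuming each source is sampled in proportion to its token count. The helper names `mixture_weights` and `sample_sources`, and the token counts themselves, are hypothetical placeholders rather than reported figures.

```python
import random
from typing import Dict, List


def mixture_weights(token_counts: Dict[str, float]) -> Dict[str, float]:
    """Turn per-source token counts into sampling probabilities that sum to 1."""
    total = sum(token_counts.values())
    if total <= 0:
        raise ValueError("token counts must sum to a positive number")
    return {name: count / total for name, count in token_counts.items()}


def sample_sources(weights: Dict[str, float], n: int, seed: int = 0) -> List[str]:
    """Pick which source each of n training documents is drawn from."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


# Hypothetical token counts in billions: placeholders, NOT figures from the paper.
token_counts = {"financial": 400.0, "general": 300.0}
weights = mixture_weights(token_counts)
batch_sources = sample_sources(weights, n=8, seed=42)
print(weights)
print(batch_sources)
```

Proportional sampling is only one possible design; a real training setup might instead up-weight or down-weight particular sources, deduplicate across them, or fix the order in which data is seen.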
Check out the Paper. All credit for this research goes to the researchers on this project.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.