One of the most exciting applications of Large Language Models (LLMs) is in medicine, with use cases including medical research, tailored health plans, clinical diagnosis, and more. However, because the field is so safety-critical, these models must be stress-tested across a range of use cases to ensure they are safe to use. These models should also be released publicly so that they can be independently scrutinized.
A group of researchers has therefore released MediTron, a suite of domain-adapted LLMs based on LLaMA-2. The suite comes in two variants, one with 7B parameters and the other with 70B. MediTron is a foundation model that can be adapted to specific downstream tasks through RLHF or instruction tuning. Its use cases include medical exam question answering, general health queries, disease information queries, and support for differential diagnosis.
MediTron's training dataset is comprehensive and consists of clinical practice guidelines, medical papers and their abstracts, and general-domain pretraining data. The researchers used the Megatron-LLM distributed training library to improve training efficiency, with a parallelization scheme that combines data, pipeline, and tensor parallelism to speed up training.
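To make the three kinds of parallelism concrete, here is a minimal sketch of how a cluster of GPUs can be split into tensor-parallel, pipeline-parallel, and data-parallel groups. It assumes the usual 3D-parallelism rule that the total GPU count equals TP × PP × DP, and it assumes a Megatron-style rank ordering in which the tensor-parallel index varies fastest. The class name `ParallelLayout` and the GPU counts in the usage example are hypothetical and are not taken from the MediTron paper or the Megatron-LLM API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper illustrating 3D (data/pipeline/tensor) parallelism."""

    world_size: int         # total number of GPUs in the job
    tensor_parallel: int    # GPUs that jointly hold the shards of each layer's weights
    pipeline_parallel: int  # pipeline stages, each holding a contiguous slice of layers

    @property
    def data_parallel(self) -> int:
        # Each GPU occupies exactly one (tp, dp, pp) slot, so the number of
        # model replicas (data-parallel degree) is what remains after TP and PP.
        model_parallel = self.tensor_parallel * self.pipeline_parallel
        if self.world_size % model_parallel != 0:
            raise ValueError("world_size must be divisible by TP * PP")
        return self.world_size // model_parallel

    def coords(self, rank: int) -> Tuple[int, int, int]:
        """Map a global rank to its (tp, dp, pp) indices.

        Assumption: tensor-parallel index varies fastest, then the
        data-parallel index, then the pipeline stage.
        """
        if not 0 <= rank < self.world_size:
            raise ValueError("rank out of range")
        tp = rank % self.tensor_parallel
        dp = (rank // self.tensor_parallel) % self.data_parallel
        pp = rank // (self.tensor_parallel * self.data_parallel)
        return tp, dp, pp


if __name__ == "__main__":
    # Hypothetical example: 16 GPUs, 2-way tensor and 4-way pipeline parallelism.
    layout = ParallelLayout(world_size=16, tensor_parallel=2, pipeline_parallel=4)
    print(layout.data_parallel)  # 2 model replicas
    print(layout.coords(5))      # (1, 0, 1)
```

In this sketch, GPUs that share a `(dp, pp)` pair split each layer's weights between them, GPUs that share a `(tp, dp)` pair form one pipeline, and GPUs that share a `(tp, pp)` pair hold identical weights and process different batches.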
The researchers first assessed the models' truthfulness against baseline models. Using the TruthfulQA benchmark, they ran one-shot evaluations for the 7B model and zero-shot evaluations for the 70B model. Both MediTron models outperformed their LLaMA-2 counterparts: MediTron-70B achieved an average score of 71.2 versus 54.8 for LLaMA-2-70B, and MediTron-7B scored 28.3 versus 12.6 for LLaMA-2-7B, gains of 16.4 and 15.7 points, respectively.
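The difference between zero-shot and one-shot evaluation comes down to how the prompt is built: a zero-shot prompt contains only the question, while a one-shot prompt first shows the model a single solved example. The sketch below illustrates this with a hypothetical `Question:`/`Answer:` template and a hypothetical `build_prompt` function; it is not the exact prompt format used in the MediTron evaluation.

```python
from typing import List, Optional, Tuple


def build_prompt(question: str, exemplar: Optional[Tuple[str, str]] = None) -> str:
    """Build a zero-shot prompt (no exemplar) or a one-shot prompt (one exemplar).

    The "Question:/Answer:" template is a hypothetical format for illustration.
    """
    parts: List[str] = []
    if exemplar is not None:
        ex_question, ex_answer = exemplar
        # One-shot: prepend a single worked example so the model sees the format.
        parts.append(f"Question: {ex_question}\nAnswer: {ex_answer}\n")
    # The model is expected to continue the text after the final "Answer:".
    parts.append(f"Question: {question}\nAnswer:")
    return "\n".join(parts)


if __name__ == "__main__":
    target = "Can antibiotics cure viral infections?"
    print(build_prompt(target))  # zero-shot
    print(build_prompt(target, ("Is aspirin an NSAID?", "Yes.")))  # one-shot
```

The exemplar costs extra context tokens but can help a smaller model follow the expected answer format, which is one common reason to evaluate a 7B model one-shot.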
For further evaluation, the researchers measured multiple-choice question-answering accuracy on medical benchmarks such as MedQA and PubMedQA, comparing MediTron against other LLMs including LLaMA-7B, LLaMA-70B, and Mistral-7B-instruct. Both MediTron-7B and MediTron-70B outperformed these baselines on almost every dataset.
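Accuracy on a multiple-choice benchmark is simply the fraction of questions whose predicted option matches the gold answer. The sketch below assumes that answers have already been extracted from the model output as labels, such as option letters for MedQA or "yes"/"no"/"maybe" for PubMedQA. The function name `mcq_accuracy` and the case and whitespace normalization are illustrative assumptions, not the paper's exact scoring code.

```python
from typing import Sequence


def mcq_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return the fraction of questions answered correctly.

    Labels are compared after stripping whitespace and upper-casing,
    an assumed normalization step for illustration.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical predictions for four questions; three are correct.
    print(mcq_accuracy(["A", "c", "B", "D"], ["A", "C", "D", "D"]))  # 0.75
```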
Although the model has been trained on a large set of medical data and performs well on multiple benchmarks, users should be aware of its limitations, and it should not be deployed in medical applications without additional testing. The researchers have just begun to understand the capabilities and limitations of the model and have therefore cautioned against its use in medical systems at the moment.
In conclusion, MediTron is a suite of domain-specific LLMs trained on a wide array of medical data. Both variants, with 7B and 70B parameters, outperformed the other models considered in the evaluation. Given how critical the field is, the researchers stress that the models should not be deployed without additional testing. Overall, MediTron is an exciting development for medicine, with the potential to address a range of medical tasks and support medical professionals.
Check out the Paper, Model 7B, and Model 70B. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.