Vectara Launches Groundbreaking Open-Source Model to Benchmark and Tackle ‘Hallucinations’ in AI Language Models

In an unprecedented move fostering accountability in the rapidly evolving Generative AI (GenAI) space, Vectara has released an open-source Hallucination Evaluation Model, marking a significant step towards standardizing the measurement of factual accuracy in Large Language Models (LLMs). The initiative provides a commercially usable, open-source resource for gauging the degree of ‘hallucination’, or divergence from verifiable facts, in LLM output, coupled with a dynamic, publicly available leaderboard.

The release aims to bolster transparency and provide an objective method to quantify the risks of hallucinations in leading GenAI tools, an essential measure for promoting responsible AI, mitigating misinformation, and underpinning effective regulation. The Hallucination Evaluation Model is set to be a pivotal tool in assessing the extent to which LLMs remain grounded in facts when generating content based on provided reference material.

Vectara’s Hallucination Evaluation Model, now accessible on Hugging Face under an Apache 2.0 License, offers a clear window into the factual integrity of LLMs. Until now, LLM vendors’ claims about their models’ resistance to hallucinations were largely unverifiable. Vectara’s model applies recent advances in hallucination research to objectively evaluate LLM-generated summaries.

Accompanying the release is a Leaderboard, akin to a FICO score for GenAI accuracy, maintained by Vectara’s team in concert with the open-source community. It ranks LLMs by their performance on a standardized set of prompts, providing businesses and developers with valuable insights for informed decision-making.
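The ranking logic behind such a leaderboard can be sketched in a few lines: each model’s hallucination rate is the fraction of its generated summaries whose factual-consistency score (for example, one produced by an evaluation model like Vectara’s) falls below a chosen threshold, and models are then sorted by that rate. The model names, scores, and 0.5 threshold below are illustrative assumptions for the sketch, not Vectara’s actual methodology.

```python
# Hypothetical sketch: ranking models by hallucination rate from
# per-summary factual-consistency scores (higher score = more grounded).
# Names, scores, and the 0.5 threshold are illustrative assumptions.

def hallucination_rate(scores, threshold=0.5):
    """Fraction of summaries whose consistency score falls below the threshold."""
    flagged = sum(1 for s in scores if s < threshold)
    return flagged / len(scores)

def rank_models(model_scores, threshold=0.5):
    """Return (model, hallucination_rate) pairs, best (lowest rate) first."""
    rates = {m: hallucination_rate(s, threshold) for m, s in model_scores.items()}
    return sorted(rates.items(), key=lambda kv: kv[1])

# Illustrative per-summary consistency scores for two hypothetical models.
example_scores = {
    "model-a": [0.92, 0.81, 0.44, 0.97],
    "model-b": [0.70, 0.35, 0.52, 0.48],
}

print(rank_models(example_scores))
# → [('model-a', 0.25), ('model-b', 0.5)]
```

A real leaderboard would aggregate scores over a large, fixed prompt set so that the rates are comparable across models; the sketch above only shows the final aggregation step.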

The Leaderboard results indicate that OpenAI’s models currently lead in performance, followed closely by the Llama 2 models, with Cohere and Anthropic also showing strong results. Google’s PaLM models, however, have scored lower, reflecting the continuous evolution and competition in the field.

While not a solution to hallucinations, Vectara’s model is a decisive tool for safer, more accurate GenAI adoption. Its introduction comes at a critical time, with heightened attention on misinformation risks in the run-up to significant events like the U.S. presidential election.

The Hallucination Evaluation Model and Leaderboard are poised to be instrumental in fostering a data-driven approach to GenAI regulation, offering a standardized benchmark long-awaited by industry and regulatory bodies alike.

Check out the Model and Leaderboard Page. All Credit For This Research Goes To the Researchers on This Project.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform draws over 2 million monthly views, illustrating its popularity among audiences.
