Google Researchers Unveil Universal Self-Consistency (USC): A New Leap in Large Language Model Capabilities for Complex Task Performance

Researchers from Google have introduced Universal Self-Consistency (USC), a method for selecting the most consistent answer from multiple candidate responses in order to improve task performance, particularly in mathematical reasoning and code generation. USC uses the LLM itself to make the selection and achieves results comparable to standard self-consistency without requiring identical answer formats or access to execution results.

Reranking improves language model generation by sampling multiple outputs and applying post-hoc criteria to choose among them. LLMs can also be used to evaluate model-generated text without human references. The proposed USC method performs comparably to standard self-consistency without requiring extra labeled data or an external reranking model.

LLMs excel in tasks like math reasoning and code generation. Previous approaches improve LLM output quality by sampling multiple responses and selecting among them based on some criterion. Standard self-consistency is effective for tasks with a unique final answer, where candidates can be compared by exact match, but it struggles with open-ended generation. USC instead uses the LLM to pick the most consistent response from multiple candidates. Because it eliminates answer extraction, USC proves effective on open-ended generation tasks, as demonstrated on diverse benchmarks.
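For illustration, standard self-consistency can be sketched as a majority vote over extracted final answers. This is a minimal Python sketch, not code from the paper; the `extract_final_answer` helper and its "last number in the text" heuristic are assumptions made for the example. It shows why the approach depends on answers that can be extracted and compared exactly.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Hypothetical extractor: treat the last number in the text as the answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    return numbers[-1] if numbers else None


def self_consistency(responses: List[str]) -> Optional[str]:
    """Return the most frequent extracted answer, or None if nothing was extracted."""
    answers = [a for a in map(extract_final_answer, responses) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


samples = [
    "3 + 4 = 7, so the answer is 7.",
    "Adding the two numbers gives 7.",
    "I think the answer is 8.",
]
print(self_consistency(samples))  # prints 7
```

For a free-form output such as a summary there is no single token to extract and count, which is the gap USC is meant to fill.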

The USC method employs LLMs to choose the most consistent answer among multiple candidates, eliminating the need for answer extraction. USC extends self-consistency to free-form generation tasks and is evaluated on benchmarks covering math reasoning, code generation, summarization, and open-ended QA. The approach first samples multiple responses from the LLM, then asks the LLM to select the response that is most consistent with the others.
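The selection step might be sketched as follows. This is a hedged Python illustration under stated assumptions, not the authors' implementation: the prompt wording is illustrative, `llm` is a hypothetical callable that maps a prompt string to a reply string, and falling back to the first candidate when the reply cannot be parsed is a choice made for this example.

```python
import re
from typing import Callable, List


def build_usc_prompt(question: str, responses: List[str]) -> str:
    """Concatenate all candidates into one selection prompt (illustrative wording)."""
    numbered = "\n\n".join(
        f"Response {i}: {r}" for i, r in enumerate(responses, start=1)
    )
    return (
        f"Question: {question}\n\n"
        "I have generated the following responses to the question:\n\n"
        f"{numbered}\n\n"
        "Evaluate these responses and select the most consistent one based on "
        'majority consensus. Reply with "The most consistent response is Response X".'
    )


def universal_self_consistency(
    question: str, responses: List[str], llm: Callable[[str], str]
) -> str:
    """Ask the LLM which candidate is most consistent and return that candidate."""
    reply = llm(build_usc_prompt(question, responses))
    match = re.search(r"Response\s+(\d+)", reply)
    if match:
        index = int(match.group(1)) - 1
        if 0 <= index < len(responses):
            return responses[index]
    return responses[0]  # assumption: fall back to the first candidate


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "The most consistent response is Response 2"


candidates = ["Paris is the capital.", "The capital is Paris.", "It is Lyon."]
print(universal_self_consistency("What is the capital of France?", candidates, fake_llm))
```

Because the candidates are compared as whole texts inside the prompt, no answer extraction or code execution is needed; in practice `fake_llm` would be replaced by a call to an actual model.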

The USC method demonstrates its efficacy in open-ended generation tasks, surpassing the limitations of the original self-consistency approach. USC matches standard self-consistency in mathematical reasoning tasks with diverse answer formats, and it equals execution-based self-consistency in code generation tasks without code execution. USC consistently improves over baselines in long-context summarization tasks and achieves the highest truthfulness and informativeness scores on the TruthfulQA benchmark. USC’s performance is robust to different response orders, benefits from more samples in certain tasks, and can be further enhanced with minor task-specific adaptations.

In conclusion, the USC method has proven highly effective for free-form generation tasks, consistently outperforming baselines in long-context summarization and open-ended question-answering tasks. Its use of LLMs to select the most consistent answer from multiple candidates has shown significant improvements in various applications, including mathematical reasoning tasks and code generation tasks, without requiring similar answer formats or actual execution results. USC is a valuable tool for generating accurate and reliable responses in various contexts.


Check out the Paper. All credit for this research goes to the researchers of this project.

Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
