Can LLM Already Serve as A Database Interface? Meet BIRD: A Big Bench for Large-scale Database Grounded Text-to-SQLs

Text-to-SQL parsing, which converts natural language questions into SQL queries, has piqued the interest of both academia and industry. This interest stems from its ability to let novice data analysts automatically extract the information they need from prevalent relational databases using plain language. Recent advances in neural modeling, notably those using large language models (LLMs), have produced outstanding results on popular benchmarks like Spider and WikiSQL. For instance, over the past three years, the execution accuracy of the top-performing model on the Spider leaderboard has improved from 53.5% to 85.3%.

The researchers found that modern, cutting-edge models still struggle to generalize to more complex, realistic scenarios that involve noisy content and vast database volumes. In addition, external knowledge and reasoning are required to uncover the intent hidden behind enormous database values. Moreover, current benchmarks do not consider SQL execution efficiency, which is very important in real-world applications, particularly for big databases. The most recent SOTA parser on Spider exploits the strong comprehension and coding skills of large language models (LLMs), and this parser's exceptional performance begs the question: can an LLM already be used as a database interface?

These findings led them to create a new text-to-SQL benchmark that more closely resembles real-world circumstances and narrows the gap between experimental and practical conditions. Researchers from the University of Hong Kong, the DAMO Academy of Alibaba Group, The Chinese University of Hong Kong (Shenzhen), the Massachusetts Institute of Technology, and the University of Illinois propose BIRD, a Big Bench for Large-Scale Database Grounded Text-to-SQLs, in this study for use in practical applications. BIRD contains 95 large databases totaling 33.4 GB in size and 12,751 complex information-seeking instances, covering 37 different professional disciplines. The team then gathered 80 open-source relational databases for training from legitimate analytics platforms (Kaggle, Relation.vit) and handpicked 15 more relational databases for evaluation. Given these databases, they relied on crowdsourcing to collect natural language questions and the corresponding SQLs.

To help annotators better grasp the database contents, their database specialists first generate a description file for each database that lists all column names, abbreviated values, value types, and external knowledge. They then employ a SQL annotation team of data engineers and database students to write SQLs that answer the questions; in parallel, they hire and train native speakers to pose questions about these databases. Alongside standard execution accuracy for the generated SQLs, they introduce a brand-new metric called the Valid Efficiency Score (VES) to measure execution efficiency. To their knowledge, BIRD is the first text-to-SQL benchmark that considers efficiency, encouraging more efficient query formulations in the setting of large and noisy database contents.
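The idea behind VES is that a predicted SQL only earns credit when its result matches the gold query's result, and that credit is weighted by how fast it runs relative to the gold query. A minimal sketch of this scoring logic over SQLite is shown below; the function names and the exact reward shape (a square root of the runtime ratio) are assumptions for illustration, not the paper's reference implementation:

```python
import math
import sqlite3
import time

def run_query(conn, sql):
    """Execute a SQL query, returning (result_rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

def valid_efficiency_score(conn, examples):
    """Average over (gold_sql, pred_sql) pairs: a prediction scores 0 unless
    its result set matches the gold result, in which case it scores the
    square root of gold_runtime / pred_runtime (faster-than-gold SQL > 1)."""
    total = 0.0
    for gold_sql, pred_sql in examples:
        gold_rows, gold_t = run_query(conn, gold_sql)
        try:
            pred_rows, pred_t = run_query(conn, pred_sql)
        except sqlite3.Error:
            continue  # invalid predicted SQL contributes 0
        if set(gold_rows) == set(pred_rows):
            total += math.sqrt(gold_t / pred_t)
    return total / len(examples)
```

In practice, runtimes are noisy, so a real evaluator would average several executions per query; the sketch keeps a single run for brevity.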

Modern text-to-SQL parsers are evaluated using two widely used methodologies: in-context learning with large language models (LLMs) such as Codex (code-davinci-002) and ChatGPT (gpt-3.5-turbo), and fine-tuning with T5. The experimental findings show that current models struggle to generalize effectively. In particular, the Spider SOTA model, which relies solely on the database schema, achieves execution accuracies of only 25.88% and 28.95% on the development and test sets, respectively. This still falls far short of the human performance also reported in the benchmark. The authors urge further research to address the more realistic scenarios captured by this benchmark.
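The in-context learning setup above typically serializes the database schema into a prompt and asks the model to complete a SQL query. A minimal sketch of such a prompt builder is shown below; the function name, the comment-based prompt format, and the `evidence` field (standing in for BIRD's external knowledge) are illustrative assumptions, not the exact prompt used in the paper:

```python
def build_text_to_sql_prompt(schema: str, question: str, evidence: str = "") -> str:
    """Assemble a zero-shot text-to-SQL prompt from a schema dump, optional
    external knowledge ("evidence"), and a natural language question.
    Ending the prompt with "SELECT" nudges a completion-style LLM to
    continue with the body of the query."""
    parts = [
        "-- Using valid SQLite, answer the question for the tables below.",
        schema,
    ]
    if evidence:
        parts.append(f"-- External knowledge: {evidence}")
    parts.append(f"-- Question: {question}")
    parts.append("SELECT")
    return "\n".join(parts)

prompt = build_text_to_sql_prompt(
    schema="CREATE TABLE orders (id INTEGER, total REAL, placed_at TEXT);",
    question="What is the average order total?",
)
```

The returned string would then be sent to the chosen LLM, and the completion appended after the trailing `SELECT` to form the predicted query.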

Check out the Paper and Project.


Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
