Google AI Introduces ToTTo: A Controlled Table-to-Text Generation Dataset Using Novel Annotation Process


The rising field of natural-language generation

Research in natural language generation (NLG), a subset of artificial intelligence, is rising. NLG is a software process that changes structured data into natural language. Not to be confused with natural language processing (NLP), NLG synthesizes and writes new content, whereas NLP reads and derives analytic insights from content (Gartner).  

  • Natural language generation (NLG) creates (or generates) text. It is when computers write language, turning structured data into text.
  • Natural language processing (NLP) reads (or processes) text. It is when computers read language and derive insights.

Many businesses today generate large amounts of data, and they value how that data can be used to improve relationships with their customers. However, the value of data depends on how it is gathered, interpreted, and analyzed.

The most successful NLG applications in the commercial sector have been data-to-text systems that generate textual summaries of databases and data sets. By generating summaries and insights in text form, NLG enables data storytelling in plain language, making the data accessible for everyone in the organization.

Introduction to ToTTo:

Research scientists at Google, Ankur Parikh and Xuezhi Wang, recently released an open-domain table-to-text generation dataset called ToTTo. ToTTo (shorthand for “Table-To-Text”) contains over 120,000 training examples and is meant for training and evaluating models that convert table data into text. What makes the ToTTo dataset unique is that it uses a novel annotation process (via sentence revision) to create more linguistically varied text. Designing an annotation process to extract natural and clean target sentences from tabular data is a compelling challenge.

Challenges in NLG

Models trained on existing large-scale structured datasets in NLG often generate noisy and factually incorrect text. Generated text that is not faithful to the source is known as hallucination. Many datasets, like Wikibio and RotoWire, are structured so that it is difficult to determine whether hallucination is caused by data noise or model shortcomings. Text that is understandable but inaccurate is unreliable for many applications and real-world settings.

The image above is an example from the Wikibio dataset. The neural baseline model, in this instance, produced a sentence that was grammatically correct but not factually accurate. It falsely summarizes that Constant Vanden Stock was an American figure skater when, in truth, he was a Belgian football player.

How ToTTo Works: Table-To-Text: Controlled Generation Task

The purpose of ToTTo’s controlled generation task is to “produce a single sentence description that summarizes the cell contents in the context of a table.” The controlled generation task uses a set of selected cells in a given Wikipedia table as the source material. In the illustration below, using data from the source table and selected cells, a one-sentence description (known as the “target sentence”) is generated.

  • Target sentence: “Craig finished his eleven NFL seasons with 8,189 rushing yards and 566 receptions for 4,911 receiving yards.”

Challenges in the controlled generation task include numerical reasoning, a large open-domain vocabulary, and varied table structure.
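One way to picture the input and output of this task is as a small record: a table, the positions of the selected cells, and the target sentence. The sketch below is illustrative only. The `TableToTextExample` class, its field names, the `linearize_selected_cells` helper, the column headers, and the one-row table layout are all assumptions made for this example, not the dataset's actual schema. The numbers are taken from the target sentence above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TableToTextExample:
    """Hypothetical container for one controlled-generation example."""

    header: list[str]                      # column names (assumed layout)
    rows: list[list[str]]                  # table body, one list per row
    selected_cells: list[tuple[int, int]]  # (row index, column index) pairs
    target_sentence: str                   # the one-sentence description


def linearize_selected_cells(example: TableToTextExample) -> str:
    """Flatten only the selected cells into a 'header: value' string."""
    parts = []
    for row_idx, col_idx in example.selected_cells:
        column_name = example.header[col_idx]
        cell_value = example.rows[row_idx][col_idx]
        parts.append(f"{column_name}: {cell_value}")
    return " | ".join(parts)


# Simplified, single-row stand-in for the source table (assumed layout).
example = TableToTextExample(
    header=["Seasons", "Rushing yards", "Receptions", "Receiving yards"],
    rows=[["11", "8,189", "566", "4,911"]],
    selected_cells=[(0, 1), (0, 2), (0, 3)],
    target_sentence=(
        "Craig finished his eleven NFL seasons with 8,189 rushing yards "
        "and 566 receptions for 4,911 receiving yards."
    ),
)

model_input = linearize_selected_cells(example)
print(model_input)
```

Restricting the input to the selected cells is what makes the task "controlled": a model is asked to describe those cells, with the rest of the table serving only as context.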

How ToTTo Works: Annotation Process

To address some common text generation challenges, ToTTo uses a novel data annotation strategy in which annotators revise existing Wikipedia sentences in stages. To start, each data table is paired with a summary sentence. The summary sentence is collected from the text of the page that contains the table, using heuristics such as:

  • (a) word overlap between the page text and the table  
  • (b) hyperlinks that are referencing tabular data
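As a rough illustration of heuristic (a), the sketch below scores a candidate page sentence by the share of table cells whose contents also appear in it. The article does not say how the overlap is actually computed, so the tokenization, the `cell_overlap_score` function, and the example cell list are assumptions made for this example.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase, drop thousands separators, and split into word tokens."""
    return set(re.findall(r"\w+", text.lower().replace(",", "")))


def cell_overlap_score(sentence: str, cells: list[str]) -> float:
    """Return the fraction of cells whose tokens all occur in the sentence."""
    if not cells:
        return 0.0
    sentence_tokens = tokenize(sentence)
    matched = 0
    for cell in cells:
        cell_tokens = tokenize(cell)
        # A cell counts as matched only if every one of its tokens is present.
        if cell_tokens and cell_tokens <= sentence_tokens:
            matched += 1
    return matched / len(cells)


candidate = (
    "Craig finished his eleven NFL seasons with 8,189 rushing yards "
    "and 566 receptions for 4,911 receiving yards."
)
# Hypothetical cell values; "Total" stands in for a cell the sentence omits.
table_cells = ["Total", "8,189", "566", "4,911"]

score = cell_overlap_score(candidate, table_cells)
print(score)
```

A sentence with a high score is a plausible candidate for pairing with the table, while a sentence that shares no cell values would score zero and be skipped.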

In the illustration below, you can see the source table and the selected cells. The sentence is shown at each stage of revision: the original sentence, then the sentence after deletion, after decontextualization, and after grammar correction.

This annotation strategy results in target sentences that are clean, natural, and contain various linguistic properties.


Some findings around the ToTTo Dataset include:

  • Manual analysis of the ToTTo dataset gives a percentage breakdown of different linguistic phenomena, such as references to the page title, section title, and table description.
  • Annotators reached high agreement on cell highlighting, with a Fleiss' kappa of 0.856.
  • Annotators reached an agreement of 67.0 BLEU on the final target sentence.
  • The BERT-to-BERT model performs best in terms of both BLEU and PARENT.
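For readers unfamiliar with Fleiss' kappa, it measures how much a fixed number of raters agree beyond what chance alone would produce, with 1.0 meaning perfect agreement. The sketch below implements the standard formula. The toy counts are invented for illustration and are not ToTTo's annotation data; the article does not state how many annotators or categories were involved.

```python
from __future__ import annotations


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Compute Fleiss' kappa.

    counts[i][j] is the number of raters who put item i into category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement: based on how often each category is used overall.
    n_categories = len(counts[0])
    category_share = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in category_share)

    # Undefined (division by zero) if all ratings fall in a single category.
    return (observed - expected) / (1 - expected)


# Made-up example: 3 raters decide for 4 cells whether to highlight each one.
# Columns are [highlight, do not highlight].
toy_counts = [[3, 0], [0, 3], [3, 0], [2, 1]]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 3))
```

On this toy input, the raters disagree on only one of the four cells, yet the score falls well below 1.0 because part of their agreement is attributed to chance.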

ToTTo Model Errors & Challenges

The ToTTo dataset has by no means been perfected. Below are challenges found with ToTTo as well as with other NLG datasets:

  • Out-of-domain generalization challenge: All models achieve considerably lower performance on the challenge set, indicating the challenge of out-of-domain generalization.
  • Hallucination, numerical reasoning, and rare topics: Model predictions are not always accurate, even when using cleaned references (errors in red). Even when the model output is correct, it is sometimes less informative than the original reference (shown in blue).


The ToTTo dataset has relatively accurate annotations compared to existing datasets and is a potential benchmark for future high-precision text generation research. ToTTo could also be useful for assessing model hallucination and for developing evaluation metrics that better detect model improvements, and it may help with other tasks such as table understanding and sentence revision.




Additional NLG datasets:


