UT Austin Researchers Propose WICE: A New Dataset for Fact Verification Built on Real Claims in Wikipedia with Fine-Grained Annotations

Natural language inference and textual entailment are enduring issues in NLP that can take many shapes. There are some significant gaps when current entailment systems are applied to this job. Using NLI “as a tool for the evaluation of domain-general methods to semantic representation” is the declared goal of the SNLI dataset. This, however, is different from how NLI is currently applied. NLI has been used to comprehend knowledge-grounded discourse, validate replies from QA systems, and assess the accuracy of produced summaries. These applications are more closely related to factual consistency or attribution: is it the case that a hypothesis is accurate given the details in a document’s premise?

First, many NLI datasets, including VitaminC and WANLI, which both concentrate on single-sentence evidence, target temporary premises. Existing frameworks for document-level entailment are based on local entailment scores, either through combining these scores or retrieval-based methods. There are a few outliers, like DocNLI, but it contains many artificially generated bad data. This draws attention to the second flaw: the scarcity of negative instances that are environmentally sound. Contradictory situations are created in a way that produces misleading correlations, such as single-word correlations, syntactic heuristics, or a disregard for the input. 

Figure 1: Task in WICE. Two phrases in a referenced source mostly corroborate this statement made in a Wikipedia page, although the claim that it was run “privately” is unsupported. In the real world, many phrases are frequently used to partially substantiate statements.

Finally, in addition to the conventional three-class “entailed,” “neutral,” and “contradicted” set, fine-grained labeling of whether elements of a claim are supported or not would be more relevant. Existing datasets do not have these fine-grained annotations. As demonstrated in Figure 1, stated in Wikipedia and the associated articles it refers to, they identify entailment, a list of supporting sentences in the referenced article, and tokens in the claim that are not supported by evidence. They demonstrate that these statements require difficult retrieval and verification issues, such as multi-sentence reasoning. Researchers from UT Austin have compiled WICE (Wikipedia Citation Entailment), a dataset for validating actual claims in Wikipedia solving the above discussed problems.

Figure 1 illustrates how the real-world Wikipedia assertions in WICE are frequently more complicated than the theories employed in most prior NLI datasets. They offer CLAIM-SPLIT, a method of dissecting hypotheses utilizing few-shot prompting with GPT-3, to assist in creating their dataset and give fine-grained annotation, as seen in Figure 2. This decomposition is similar to other frameworks built from OpenIE or Pyramid. However, it does it without the need for annotated data and with more flexibility, thanks to GPT-3. They streamline their annotation procedure and the ultimate entailment prediction work for automated models by acting at the sub-claim level.

Figure 2: CLAIM-SPLIT: The procedure for breaking down a claim into smaller claims utilising the GPT-3. (For clarity, underlined parts demonstrate alignment; the model only accepts text as input and output.)

On their dataset, they test various systems, such as short-paragraph entailment models that have already been “stretched” to create document-level entailment judgments from short-paragraph judgments. They discover that these models perform poorly at the claim level but better at the level of their sub-claims when used with their dataset, indicating that proposition-level splitting might be a helpful step in the pipeline for an attribution system. Although existing systems perform below the human level on this dataset and only sometimes return good evidence, they demonstrate that chunk-level input processing to assess these sub-claims is a solid starting point for future systems.

They provide WICE, a novel dataset for fact-checking based on actual claims in Wikipedia with fine-grained annotations, as one of their main contributions. These demonstrate how difficult it is to decide entailment with the proper facts still. They suggest CLAIM-SPLIT, a strategy for breaking down big claims into smaller, independent sub-claims at both the data collection and inference time. The dataset and code can be found on GitHub.

Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 15k+ ML SubRedditDiscord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology(IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing and is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.

🐝 Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...