Salesforce AI Research Introduces RnG-KBQA: A Novel Framework That Solves Generalization Challenges In Question Answering Over Knowledge Bases

This article is written as a summary by Marktechpost Staff based on the research paper 'RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering'. All credit for this research goes to the researchers of this project. Check out the paper, github, blog post.


KBQA stands for Question Answering over Knowledge Bases, a user-friendly technique for communicating with massive knowledge bases. A typical knowledge base (KB) is a collection of nodes (recording names, titles, or other entities) connected by links (the relationships among those nodes).

The KBQA method begins by matching a portion of the question to a node in the KB, then traverses the KB's knowledge graph (the stored network of nodes and links) until the answer is found. Thanks to their generalization capacity, KBQA systems can still answer questions accurately when users ask about topics that were never observed in the system's training data.

In the KBQA area, generalization remains a considerable difficulty. According to the team, the current solutions suffer from the following limitations:

  • Generation-based techniques struggle with real-world generalization scenarios because it is hard to generate KB schema elements that were never encountered during training.
  • Widely adopted ranking-based techniques have shown coverage issues on the GrailQA benchmark. Coverage refers to a KBQA system's ability to answer (or cover) the largest feasible set of possible questions; a system's design or the sheer scale of a KB can limit its coverage. Because of that scale, it is often impractical to exhaust all the rules needed to cover the desired logical form of an answer, which is a key reason ranking-based techniques hit this coverage limitation.

New research by the Salesforce team developed RnG-KBQA, a novel framework targeting generalization challenges in Question Answering over Knowledge Bases. This work allows researchers to address the limitations of current KBQA systems and investigate strategies for improving the process. The team adopted a unique approach of combining a ranker with a generator (hence RnG), which addresses the coverage issue of ranking-only systems while retaining their generalization strength. This makes RnG-KBQA capable of answering questions on a wider range of topics than powerful prior techniques.

To come up with the best answers to an asked question, the approach follows three basic steps:

  1. Enumerate candidates: Scan the KB's knowledge graph for a pool of candidate logical forms.
  2. Rank: From that pool, the ranker selects a set of related logical forms. The chosen forms need not include the exactly correct one, but they must be semantically coherent and aligned with the question's underlying meaning.
  3. Generate: Using the question and the top-k ranked candidates, the generator composes the final logical form.
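The three steps above can be sketched in plain Python. This is only an illustrative toy: the token-overlap scorer and the "pick the top candidate" generator are stand-ins for the paper's BERT ranker and T5 generator, and the KB, entity IDs, and logical-form syntax are invented for the example.

```python
# Toy sketch of the enumerate -> rank -> generate pipeline.
# Scoring and generation are naive placeholders, not the paper's models.

def enumerate_candidates(kb, anchor_entity):
    """Collect candidate logical forms by walking relations from the anchor node."""
    return [f"(JOIN {relation} {anchor_entity})"
            for relation, _ in kb.get(anchor_entity, [])]

def rank(question, candidates, top_k=3):
    """Score each candidate against the question; here by naive token overlap."""
    q_tokens = set(question.lower().split())
    def score(candidate):
        c_tokens = set(candidate.lower()
                       .replace("(", " ").replace(")", " ").replace(".", " ")
                       .split())
        return len(q_tokens & c_tokens)
    return sorted(candidates, key=score, reverse=True)[:top_k]

def generate(question, top_candidates):
    """A real system feeds the question plus top-k candidates to a seq2seq
    model; this toy version simply returns the best-ranked candidate."""
    return top_candidates[0] if top_candidates else None

# Invented mini-KB: entity -> list of (relation, target) links.
toy_kb = {"m.einstein": [("people.person.birthplace", "m.ulm"),
                         ("people.person.profession", "m.physicist")]}
question = "where is the birthplace of Einstein"
cands = enumerate_candidates(toy_kb, "m.einstein")
best = generate(question, rank(question, cands))
```

The point of the structure is that generation never starts from scratch: it is always conditioned on a ranked shortlist drawn from the KB, which is what lets the full system cover logical forms the enumerator alone would miss.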

The researchers employed a contrastive ranker: it is trained to maximize the similarity between the question and the ground-truth logical form while pushing down the scores of incorrect candidates. A BERT-based encoder takes the concatenation of the question and a candidate logical form as input and outputs a score indicating how closely they match. The generator is a seq-to-seq model based on T5 that consumes the ranker's output and makes the final prediction.
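The contrastive objective can be written down concretely: the gold logical form's score is pushed up relative to the other candidates via a softmax over all candidate scores. A minimal sketch, assuming the scores are plain numbers (in the paper they come from the BERT cross-encoder over the question and logical form):

```python
import math

def contrastive_loss(scores, gold_index):
    """Negative log-probability of the gold candidate under a softmax
    over all candidate scores (a standard contrastive/ranking loss)."""
    exp_scores = [math.exp(s) for s in scores]
    prob_gold = exp_scores[gold_index] / sum(exp_scores)
    return -math.log(prob_gold)

# When the gold candidate already scores highest, the loss is small;
# when a wrong candidate dominates, the loss is large.
low_loss = contrastive_loss([5.0, 0.1, -1.0], gold_index=0)
high_loss = contrastive_loss([5.0, 0.1, -1.0], gold_index=2)
```

Minimizing this loss is what "maximizing the question against the ground-truth logical form" means in practice: gradients raise the gold score and lower the scores of the negatives jointly.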


According to the team, the model’s generalization ability is based on two fundamental elements:

  1. A model-agnostic enumeration step: all relevant KB relations anchored at the linked entity are retrieved.
  2. A strong interplay between the ranker and the generator, both built on pre-trained language models (LMs): the items in the knowledge base are canonicalized into plain-language forms.
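Canonicalizing KB items into plain language is what lets a pre-trained LM score relations it has never seen in training. A hedged sketch of the idea, assuming Freebase-style dotted relation IDs; the exact surface form RnG-KBQA uses may differ:

```python
def canonicalize_relation(relation_id):
    """Turn a schema ID like 'people.person.place_of_birth'
    into the word sequence 'people person place of birth'."""
    return relation_id.replace(".", " ").replace("_", " ")

def linearize(question, logical_form, relation_ids):
    """Build the text pair fed to the encoder: the question plus the
    logical form with its relation IDs verbalized into plain words."""
    verbal = logical_form
    for rid in relation_ids:
        verbal = verbal.replace(rid, canonicalize_relation(rid))
    return f"{question} [SEP] {verbal}"

pair = linearize("where was Einstein born",
                 "(JOIN people.person.place_of_birth m.einstein)",
                 ["people.person.place_of_birth"])
```

Because the LM now sees ordinary words like "place of birth" rather than opaque schema tokens, an unseen relation can still be matched against the question by its surface meaning, which is the basis of the zero-shot behavior discussed below.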

The researchers explain that their approach makes the following generalizations to answer the questions:

  1. Compositional generalization: testing the model's ability to rank and build unseen compositions of facts in the knowledge base that are needed to answer novel queries.
  2. Zero-shot generalization: handling relations that exist in the KB but were never seen during training.

The researchers evaluate their approach on GrailQA to judge its generalization capability. Overall, the method sets a new state of the art and ranks first on the GrailQA leaderboard. RnG-KBQA performs well at all three levels of generalization, with particularly strong results in the zero-shot setting. The team also tested the approach on WebQSP, a well-known KBQA benchmark, where RnG-KBQA outperforms the previous state of the art (QGG) by 1.6 percent in absolute terms.

The researchers hope that their approach will pave the path for future research in other tasks involving generation models and generalization.