Over the past few decades, deep neural network-based models have been developed for a broad range of tasks. Some are designed mainly to process and generate coherent text in multiple languages, answer questions about a text, translate it, and summarize online content.
Several deep learning systems with linguistic capabilities are already available, for instance text analysis tools, real-time translation applications, and virtual assistants such as Alexa, Bixby, Siri, Google Assistant, and Cortana. Some of these systems use a specific deep-learning model called Multilingual BERT (mBERT). Released by Google, mBERT is trained on approximately 100 languages simultaneously.
mBERT can therefore process input in multiple languages. Although the model has performed well on many language tasks, how it encodes language-related information and makes predictions is still poorly understood.
Researchers at the University of California (Irvine), Stanford University, and the University of California (Santa Barbara) carried out a study to better understand mBERT. Their paper offers valuable insight into the underpinnings of these commonly used models and explains how they analyze language when completing various tasks.
Understanding the mBERT Model
In the mBERT model, text is represented as a series of vectors, each consisting of hundreds of numbers (768 per vector in the released base model). Each vector corresponds to a word or a piece of a word, and the relationships between words are encoded as geometric relations in a high-dimensional space.
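To make the idea of "geometric relations" concrete, here is a minimal sketch of one common way to compare word vectors: cosine similarity, which measures the angle between two vectors. The vectors below are made-up 4-dimensional stand-ins, not real mBERT output, and the names toy_vectors and nearest are hypothetical helpers introduced only for illustration. The study itself may have used different measures.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Assumption: invented toy values standing in for real word vectors,
# which would have hundreds of dimensions rather than four.
toy_vectors = {
    "dog": [0.9, 0.1, 0.3, 0.0],
    "perro": [0.8, 0.2, 0.4, 0.1],
    "runs": [0.1, 0.9, 0.0, 0.4],
}


def nearest(word, vectors):
    """Return the other word whose vector points in the most similar direction."""
    target = vectors[word]
    candidates = [w for w in vectors if w != word]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


if __name__ == "__main__":
    for other in ("perro", "runs"):
        score = cosine_similarity(toy_vectors["dog"], toy_vectors[other])
        print(f"dog vs {other}: {score:.3f}")
    print("nearest to dog:", nearest("dog", toy_vectors))
```

In this toy setup, "dog" and "perro" were deliberately given similar values, so they end up close together, while "runs" points in a different direction. With real model output, the same comparison would be applied to the vectors the model produces for each word in context.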
Understanding how mBERT encodes language is in some ways similar to understanding how humans process it, so the team included both computer scientists and linguists. The aim was to determine whether mBERT’s vector representations contain information about the deeper aspects of human language and its structure. Put differently, mBERT and other deep-learning frameworks for language analysis may have rediscovered some of the theories that linguists developed through close analysis of human languages.
Because mBERT models are generally trained on human-compiled datasets, they might reproduce some of the mistakes humans commonly make when dealing with language-related problems. The team’s research may help reveal some of these mistakes.
The team’s findings offer interesting insights into how mBERT represents grammatical information. The study suggests that mBERT identifies subjects and objects in sentences and represents the relationship between them in ways consistent with the current linguistics literature.
These findings matter because they could help computer scientists better understand how deep-learning techniques designed to process human language work, and thus help improve their performance in the future. Mahowald, who leads the team, says they hope to continue exploring how deep neural models of language represent linguistic categories in continuous vector spaces.