FLORES-101, a first-of-its-kind many-to-many evaluation data set covering 101 languages from around the world, is now open-sourced. FLORES-101 allows researchers to quickly test and refine multilingual translation models such as M2M-100. To speed work on many-to-many translation systems worldwide, Facebook AI is making the complete FLORES-101 data set, the associated technical report, and various models freely available for anybody to use.
The need for evaluation
For AI researchers, evaluating how effectively translation systems perform has been a critical challenge, and this knowledge gap has hindered progress. Researchers can't design better translation systems if they can't measure or compare their results. The AI research community needs an open, easily accessible, and reliable way to measure the performance of many-to-many translation models and compare results with others.
Earlier work relied heavily on translating in and out of English and typically used proprietary data sets. While this was beneficial to English speakers, it was and continues to be insufficient in many regions of the world where people need fast, accurate translation between regional languages, such as in India, which has more than 20 official languages.
FLORES-101 focuses on low-resource languages, such as Amharic, Mongolian, and Urdu, which lack significant natural language processing (NLP) research data sets. For the first time, researchers will be able to accurately measure translation quality in 10,100 different translation directions, such as directly from Hindi to Thai or Swahili. In addition, the data set contains the same set of sentences across all languages, allowing researchers to evaluate performance in any and all translation directions.
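The 10,100 figure follows directly from counting ordered pairs of distinct languages; a quick sketch:

```python
# With 101 languages, every ordered pair of distinct languages
# (source, target) is a translation direction: n * (n - 1) pairs.
def num_translation_directions(num_languages: int) -> int:
    return num_languages * (num_languages - 1)

print(num_translation_directions(101))  # → 10100
```

Because the same sentences exist in every language, each of those 10,100 directions can be evaluated on identical content.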
Designing FLORES-101 as a benchmark
The FLORES-101 data set was developed in a multistep process. First, a professional translator translated each document, which was then double-checked by a human editor. Each translation next went through a quality-control step, which included checks for spelling, grammar, punctuation, and formatting, as well as comparisons with commercial engine translations. A different set of translators then performed a human evaluation, detecting mistakes in categories such as unnatural translation, register, and grammar. Finally, translations were either sent back for retranslation, based on the number and severity of the problems found, or deemed complete if they met the quality criteria.
However, translation quality is insufficient on its own. When building FLORES as a beneficial resource for the AI community, researchers took into account a number of other factors:
- Covering low-resource languages (80% of the languages included are low-resource)
- Many-to-many: supports 10,100 different translation directions among the 101 languages
- Multidomain: FLORES gathers information from a range of sources, including news, travel guides, and literature on many topics.
- Document-level: FLORES translates adjacent sentences from a single document, allowing researchers to assess whether document context enhances translation quality.
- Metadata: FLORES provides full metadata along with each translation, including information such as hyperlinks, URLs, images, and the article topic.
- Server-side evaluation: The Dynabench platform freely hosts evaluations for the FLORES benchmark. Using a server to evaluate models ensures that they are all measured the same way on a hidden test set, allowing for scientific comparisons. Furthermore, a wide range of metrics can be computed on the same set of translations, allowing for a comprehensive evaluation of translation quality.
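The server-side idea can be pictured with a small sketch: the evaluator keeps the hidden references and receives only a model's translations, so every submission is scored identically and multiple metrics can run on the same outputs. The `unigram_precision` metric and the sentences below are illustrative stand-ins, not Dynabench's actual metrics or data:

```python
from collections import Counter

def unigram_precision(hypotheses, references):
    """Toy corpus-level metric: fraction of hypothesis words that
    appear in the corresponding reference (with clipped counts)."""
    matched, total = 0, 0
    for hyp, ref in zip(hypotheses, references):
        hyp_counts = Counter(hyp.split())
        ref_counts = Counter(ref.split())
        # Clip each word's count by its count in the reference.
        matched += sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
        total += sum(hyp_counts.values())
    return matched / total if total else 0.0

# The "server" holds the hidden references; only hypotheses are submitted.
hidden_refs = ["the cat is on the mat"]
submitted_hyps = ["the cat sat on the mat"]
print(round(unigram_precision(submitted_hyps, hidden_refs), 2))  # → 0.83
```

Because the references never leave the server, models cannot overfit to the test set, and new metrics can be added later without re-collecting translations.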
FLORES at the ‘Workshop on Machine Translation’
Facebook AI collaborated with the WMT (Workshop on Machine Translation) to host the Large-Scale Multilingual Translation shared task, which will be evaluated using the FLORES data set. As part of this task, Facebook AI partnered with Microsoft Azure to provide compute grants to researchers working on low-resource languages. These grants offer thousands of GPU hours on the Azure platform for researchers to develop translation models.
These grant recipients come from universities worldwide, with a majority focusing on developing translation systems for the languages they speak themselves and for other regional languages. Several are focusing on African and Southeast Asian languages — areas of the world where many different languages are spoken and where improvements in machine translation could greatly impact communities.
Now that the data set is available, researchers and developers can put it to use in many ways.