This Open-Source AI Tool Called ‘Lanfrica’ Allows Researchers To Look Into Any Of Africa’s Existing And Extinct Languages

This Article Is Based On 'Lanfrica v1 has been launched!'. All Credit For This Research Goes To The Researchers Behind This Project.


Scholars, professionals, and interested members of the public outside the African continent often know little about African languages, which remain underrepresented in research and on the web. Lanfrica aims to help fix this situation.

What exactly is Lanfrica?

Launched after months of work, Lanfrica is an online library of African language materials aimed at linguists and natural language processing (NLP) experts. It hopes to ease the challenge of finding African language materials by building a consolidated, language-first catalog.

The makers of Lanfrica announced the platform’s formal debut on February 15, emphasizing that it supports 2,199 different African languages (including several that are no longer actively spoken by native speakers). Most of its resources relate to natural language processing, machine translation, and speech recognition, but Lanfrica also includes educational and entertainment materials.

For example, if you are looking for linguistic datasets or research articles in a specific African language, Lanfrica can direct you to websites that provide resources in that language. If no such materials are available, Lanfrica takes a collaborative approach and asks users to contribute articles or datasets.

Lanfrica's researchers take a language-focused approach. Their language section aims to cover every African language, including extinct ones, with 2,199 languages counted. They have developed algorithms that can identify, with high accuracy, the African language or languages a resource covers. This lets them curate even works that do not clearly state which African languages they address (and there are many).
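The announcement does not describe how Lanfrica's language-identification algorithms work. As a rough illustration of the general technique only, the sketch below scores a text against per-language character trigram profiles using cosine similarity, a classic baseline for language identification. The class name `NgramLanguageIdentifier`, the one-sentence training samples, and the language labels are hypothetical; a real system would need far larger corpora and would likely use a stronger model.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class NgramLanguageIdentifier:
    """Hypothetical trigram-profile identifier; NOT Lanfrica's actual algorithm."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.profiles: dict[str, Counter] = {}

    def train(self, language: str, samples: list[str]) -> None:
        # Accumulate n-gram counts from every sample of this language.
        profile = self.profiles.setdefault(language, Counter())
        for sample in samples:
            profile.update(char_ngrams(sample, self.n))

    def rank(self, text: str) -> list[tuple[str, float]]:
        # Score the text against every language profile, best match first.
        query = char_ngrams(text, self.n)
        scores = [(lang, cosine(query, profile)) for lang, profile in self.profiles.items()]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)

    def identify(self, text: str) -> str:
        # Return the single best-scoring language.
        return self.rank(text)[0][0]


# Toy training data: one short sentence per language (illustrative only).
identifier = NgramLanguageIdentifier()
identifier.train("Swahili", ["Jina langu ni Amina. Ninapenda kusoma vitabu na kunywa chai."])
identifier.train("Zulu", ["Igama lami nguThemba. Ngithanda ukufunda izincwadi."])
identifier.train("Hausa", ["Sunana Musa. Ina son karanta littattafai."])

print(identifier.identify("Ninapenda kusoma vitabu"))
```

Running the script prints `Swahili`. Character n-grams need no tokenizer or word list, which helps for languages with little annotated data. Because `rank` returns the full scored list, a resource whose top scores are close could also be flagged as covering more than one language.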

Lanfrica holds a lot of promise for making African languages more discoverable and better represented on the web. It can also offer data on the state of NLP development for each African language. The language filter section, for example, provides a quick summary of how many natural language processing (NLP) resources are available for each African language.

In this way, Lanfrica makes it easier for people to find materials that might otherwise be hard to locate. Because its developers use artificial intelligence to reliably detect many African languages, the catalog can include materials that do not specify the language used.

In one search result from the language filter, Afrikaans has 28 NLP resources, while Swati has just eight. Similarly, the Gbe cluster languages of Benin have significantly fewer NLP resources than some of the languages of South Africa.
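A per-language summary like the one in the language filter essentially comes down to counting catalog entries per language. The sketch below assumes a hypothetical `Resource` record that lists each resource's languages; Lanfrica's real data model is not described in the source, and the catalog entries are placeholders rather than real Lanfrica data. Fon, a Gbe language, is included with zero entries to show how unrepresented languages surface at the bottom of the ranking.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical catalog record; Lanfrica's actual schema is not given in the source."""
    title: str
    kind: str  # e.g. "dataset" or "paper"
    languages: tuple[str, ...]


def resources_per_language(catalog: list[Resource]) -> Counter:
    """Count resources per language; a multilingual resource counts once for each language."""
    counts: Counter = Counter()
    for resource in catalog:
        counts.update(set(resource.languages))  # set() avoids double-counting a language
    return counts


def least_resourced(counts: Counter, languages: list[str], k: int = 3) -> list[str]:
    """Return the k languages with the fewest resources, including languages with none."""
    return sorted(languages, key=lambda lang: (counts[lang], lang))[:k]


# Placeholder catalog (illustrative entries, not real data).
catalog = [
    Resource("Example corpus 1", "dataset", ("Afrikaans",)),
    Resource("Example corpus 2", "dataset", ("Afrikaans", "Swati")),
    Resource("Example paper 1", "paper", ("Afrikaans",)),
]
languages = ["Afrikaans", "Swati", "Fon"]

counts = resources_per_language(catalog)
for lang in languages:
    print(f"{lang}: {counts[lang]}")
print(least_resourced(counts, languages, k=2))
```

This prints `Afrikaans: 3`, `Swati: 1`, `Fon: 0`, and then `['Fon', 'Swati']`. Ranking against the full language list, rather than only the languages that appear in the catalog, is what makes zero-resource languages visible.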

Such knowledge might lead to better resource allocation and attempts to advance under-researched languages in NLP, promoting equitable advancement for African languages.

Lanfrica v1 is only the start. The researchers plan to release several essential upgrades soon:

  • Users will be able to sign up and contribute to or edit the materials on Lanfrica.
  • NLP datasets currently make up the majority of the platform's resources.
  • After that, the team intends to focus on computational linguistics and linguistic publications. All of the resources that will be included are listed in the infographic above.
  • They are exploring several methods for identifying and linking related resources to make Lanfrica more accessible.

Users are also invited to contribute their own resources, since some of the languages on the platform currently have few or no resources. Contributions can include articles, datasets, and other materials.

Many African languages are considered low-resource languages: linguists and NLP specialists have carried out and published less research on them than on languages like English or Mandarin Chinese, regardless of how widely they are spoken. As a result, languages like Swahili and Amharic, each spoken by millions of people, lag behind other languages in technologies such as machine translation (MT) and speech-to-text software.

Conclusion:

Lanfrica aims to ease the difficulty of locating African language materials by creating a unified, language-first catalog, and it has considerable potential to improve the discoverability and representation of African languages on the internet. It supports 2,199 African languages (including several that are no longer actively spoken by native speakers), and its language filter shows how many NLP resources exist for each one. Its creators used artificial intelligence to accurately identify the languages in resources that do not state them. To make Lanfrica more accessible, they are exploring further approaches for discovering and linking relevant resources.

References:

  • https://opensource.com/article/22/4/open-source-language-tool-lanfrica
  • https://blog.lanfrica.com/lanfrica-v1-has-been-launched/