Researchers from Mohamed bin Zayed University of AI Developed ‘PALO’: A Polyglot Large Multimodal Model for 5B People

Large Multimodal Models (LMMs), driven by AI advancements, have revolutionized vision and language tasks but remain centered mainly on English, neglecting other languages. This oversight excludes billions of speakers of languages like Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese. The lack of linguistic inclusivity underscores the need for broader representation in developing LMMs to ensure effective communication across diverse global populations.

Recent advancements in LMMs and LLMs have pushed the boundaries of natural language processing. Multilingual LLMs like BLOOM and PaLM address data skewness and cross-lingual performance challenges. Meanwhile, in LMMs, models like Qwen, mPLUG-Owl, and Ziya-Visual demonstrate bilingual capabilities, focusing on English and Chinese. These developments mark significant progress in multilingual understanding and processing of visual inputs. However, these LMMs remain limited to two languages.

The researchers from Mohamed bin Zayed University of AI and other institutes introduced PALO, a multilingual LMM capable of answering questions in ten languages simultaneously. They leverage a high-quality multilingual vision-language instruction dataset to train PALO, focusing on improving proficiency in low-resource languages while maintaining or enhancing performance in high-resource languages. They compile a comprehensive multilingual instruction-tuning dataset and enhance the state-of-the-art LMMs across different scales, showcasing improved language proficiency.

PALO comprehends and generates content in ten major languages. Derived from LLaVA and MobileVLM architectures, it integrates a vision encoder with a language model, utilizing CLIP ViT-L/14 for vision encoding. Different projectors, including a lightweight downsample projector (LDP) for MobilePALO-1.7B, are employed to process visual tokens and user queries efficiently, enhancing model versatility and efficiency across varying computational settings. Vicuna is the LLM for 7/13B versions, while MobileLLaMA is the small language model (SLM) for MobilePALO-1.7B. Vicuna fine-tunes LLaMA-2 on user conversations from ShareGPT, whereas MobileLLaMA pretrains on RedPajama-v1 tokens before fine-tuning on ShareGPT data.
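To make the role of the projector concrete, here is a minimal Python sketch of how many visual tokens would reach the language model under each projector. It only does token-count arithmetic and is not PALO's code. The 336-pixel input resolution, the assumption that the standard projector keeps one token per patch, and the assumption that the LDP halves each spatial side are illustrative choices not stated in this article; the names ProjectorSpec and num_visual_tokens are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectorSpec:
    """Hypothetical description of a vision-to-language projector."""
    name: str
    downsample: int  # factor by which each side of the patch grid is reduced


def num_visual_tokens(image_size: int, patch_size: int, projector: ProjectorSpec) -> int:
    """Count the visual tokens passed to the language model for a square image."""
    side = image_size // patch_size          # patches per side from the ViT
    reduced = side // projector.downsample   # side length after the projector
    return reduced * reduced


# Assumption: the projector used with the 7/13B models keeps every patch token.
standard = ProjectorSpec(name="standard projector", downsample=1)
# Assumption: the lightweight downsample projector halves each side of the grid.
ldp = ProjectorSpec(name="LDP", downsample=2)

IMAGE_SIZE = 336  # assumed input resolution, not given in the article
PATCH_SIZE = 14   # from CLIP ViT-L/14

for spec in (standard, ldp):
    tokens = num_visual_tokens(IMAGE_SIZE, PATCH_SIZE, spec)
    print(f"{spec.name}: {tokens} visual tokens")
```

Under these assumptions the downsampling projector passes a quarter as many visual tokens to the language model, which is one way a small model such as MobilePALO-1.7B could keep its input sequences short.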

In evaluations of PALO's multilingual capabilities, the 7B and 13B models perform robustly on high-resource languages, achieving average scores of 59.0 and 63.8, respectively. PALO also shows significant improvements in low-resource languages, with average scores rising from 26.0 to 55.6 points for the 7B model and from 26.9 to 59.2 points for the 13B model. PALO thus enhances inclusivity and performance in vision-language tasks across diverse global languages.
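The size of the low-resource improvement can be checked with a few lines of Python. This sketch uses only the four averages quoted above; the dictionary layout and the names low_resource_scores and absolute_gain are illustrative, and the article does not say which model produced the "before" scores.

```python
# Average low-resource scores quoted in the article: (before, after) per model size.
low_resource_scores = {
    "7B": (26.0, 55.6),
    "13B": (26.9, 59.2),
}


def absolute_gain(before: float, after: float) -> float:
    """Return the improvement in points, rounded to one decimal place."""
    return round(after - before, 1)


gains = {size: absolute_gain(*pair) for size, pair in low_resource_scores.items()}

for size, gain in gains.items():
    print(f"{size}: +{gain} points on low-resource languages")
```

This works out to gains of roughly 29.6 points for the 7B model and 32.3 points for the 13B model, so the average low-resource score more than doubles at both scales.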

To sum up, the researchers from Mohamed bin Zayed University of AI, together with other institutes, introduced PALO, a multilingual LMM capable of answering questions in ten languages. PALO caters to nearly two-thirds of the global population. It bridges vision and language understanding across ten languages, encompassing both high-resource languages (e.g., English, Chinese) and low-resource languages (e.g., Arabic, Hindi). By training on diverse, multilingual datasets and fine-tuning on language translation tasks, PALO achieves significant performance improvements across various scales, showcasing its scalability and generalization capabilities.


Check out the Paper and Github. All credit for this research goes to the researchers of this project.

Asjad is an intern consultant at Marktechpost. He is pursuing a B.Tech in mechanical engineering at the Indian Institute of Technology, Kharagpur. Asjad is a machine learning and deep learning enthusiast who is always researching the applications of machine learning in healthcare.
