Assessing the Linguistic Mastery of Artificial Intelligence: A Deep Dive into ChatGPT’s Morphological Skills Across Languages

Researchers rigorously examine ChatGPT’s morphological abilities across four languages (English, German, Tamil, and Turkish). ChatGPT falls short compared to specialized systems, especially in English. The analysis underscores ChatGPT’s limitations in morphological skills, challenging assertions of human-like language proficiency.

Recent investigations into large language models (LLMs) have predominantly focused on syntax and semantics, overlooking morphology. The existing LLM literature has rarely paid attention to the full range of linguistic phenomena. While past studies have explored the English past tense, a comprehensive analysis of morphological abilities in LLMs has been missing. The study uses the Wug test, which asks for inflected forms of made-up (nonce) words, to assess ChatGPT’s morphological skills in the four languages. The findings challenge claims of human-like language proficiency in ChatGPT, indicating its limitations compared to specialized systems.


While recent large language models like GPT-4, LLaMA, and PaLM have shown promise in linguistic abilities, there has been a notable gap in assessing their morphological capabilities, that is, the ability to form words systematically. The study addresses this gap by systematically analyzing ChatGPT’s morphological skills using the Wug test across the four languages and comparing its performance with specialized systems.

The proposed method assesses ChatGPT’s morphological abilities through the Wug test, comparing its outputs with supervised baselines and human annotations using accuracy as the metric. New datasets of nonce words are created to ensure that ChatGPT has had no prior exposure to them. Three prompting styles (zero-shot, one-shot, and few-shot) are used, with multiple runs for each style. The evaluation accounts for inter-speaker morphological variation and spans four languages (English, German, Tamil, and Turkish), with results compared against purpose-built systems.
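As a rough illustration of how the three prompting styles differ, the sketch below builds a Wug-test prompt with zero, one, or several worked demonstrations. The prompt wording, the helper name `build_wug_prompt`, and the demonstration pairs are assumptions made for this example; they are not the prompts used in the study.

```python
from typing import Sequence, Tuple


def build_wug_prompt(
    nonce: str,
    target_form: str,
    demos: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a prompt asking for an inflected form of a nonce word.

    No demos gives a zero-shot prompt, one demo a one-shot prompt,
    and several demos a few-shot prompt. The wording is hypothetical.
    """
    lines = [f"Give the {target_form} of each word."]
    # Each demonstration shows a base form and its inflected form.
    for base, inflected in demos:
        lines.append(f"{base} -> {inflected}")
    # The final line is left open for the model to complete.
    lines.append(f"{nonce} ->")
    return "\n".join(lines)


zero_shot = build_wug_prompt("wug", "plural")
one_shot = build_wug_prompt("wug", "plural", [("cat", "cats")])
few_shot = build_wug_prompt(
    "wug", "plural", [("cat", "cats"), ("bus", "buses"), ("ox", "oxen")]
)
print(few_shot)
```

The only thing that changes between the three styles is the number of demonstrations, which makes it easy to repeat each condition over multiple runs.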

The study revealed that ChatGPT underperforms purpose-built systems with morphological capabilities, particularly in English. Performance varied across languages, with German achieving near-human performance levels. The value of k (the number of top-ranked responses considered) had an impact: the gap between the baselines and ChatGPT widened as k increased. ChatGPT tended to generate implausible inflections, potentially influenced by a bias towards real words. The findings stress the need for more research into large language models’ morphological abilities and caution against hasty claims of human-like language skills.
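To make the role of k concrete, here is a minimal sketch of an accuracy-at-k score. It assumes an item counts as correct when any of the top k ranked responses matches one of the acceptable human forms; that definition, the function name `accuracy_at_k`, and the toy data are illustrative assumptions, not the study's evaluation code or results.

```python
from typing import Dict, List, Set


def accuracy_at_k(
    ranked_preds: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of nonce words with an acceptable form in the top k responses."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not gold:
        return 0.0
    hits = 0
    for nonce, acceptable in gold.items():
        # Only the k highest-ranked responses are considered.
        top_k = ranked_preds.get(nonce, [])[:k]
        if any(pred in acceptable for pred in top_k):
            hits += 1
    return hits / len(gold)


# Toy past-tense data (invented). Several gold forms per word stand in
# for variation between human annotators.
gold = {
    "gling": {"glinged", "glang"},
    "frink": {"frinked", "frank"},
    "spow": {"spowed"},
}
preds = {
    "gling": ["glinged", "glung"],
    "frink": ["frunk", "frinked"],
    "spow": ["spown", "spew"],
}

acc_1 = accuracy_at_k(preds, gold, k=1)
acc_2 = accuracy_at_k(preds, gold, k=2)
print(f"accuracy@1 = {acc_1:.2f}, accuracy@2 = {acc_2:.2f}")
```

Under this definition a system's score can only stay the same or rise as k grows, so a widening gap means the baselines gain more from the extra candidates than ChatGPT does.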

The study rigorously analyzed ChatGPT’s morphological capabilities in the four languages, revealing its underperformance, notably in English. It underscores the need for further research into large language models’ morphological abilities and warns against premature claims of human-like language skills. ChatGPT exhibited varying performance across languages, with German reaching near-human performance. The study also noted ChatGPT’s real-word bias and emphasized the importance of considering morphology in language model evaluations, given its fundamental role in human language.

The study employed a single model (gpt-3.5-turbo-0613), limiting generalizability to other GPT-3 versions or GPT-4 and beyond. The small set of languages also raises questions about whether the results generalize to other languages and datasets. Comparing languages is challenging due to uncontrolled variables. The limited number of annotators and the low inter-annotator agreement for Tamil may affect reliability. ChatGPT’s variable performance across languages suggests further limits on generalizability.


Check out the Paper. All credit for this research goes to the researchers on this project.


Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
