Meet Project Rumi: Multimodal Paralinguistic Prompting for Large Language Models

In an era of rapidly emerging technologies, large language models (LLMs) have become a powerful tool, reshaping how we interact with computers. Yet a pivotal challenge remains. LLMs struggle to grasp the context and nuances of a conversation, and the quality of their output depends on the quality and specificity of the prompt. One major limitation is that they lack the depth of real communication: a text prompt carries none of the paralinguistic information, such as tone of voice or facial expression, that accompanies human speech.

Project Rumi from Microsoft aims to enhance the capabilities of LLMs by addressing their limitations in understanding nonverbal cues and contextual nuances. It incorporates paralinguistic input into prompt-based interactions with LLMs to improve the quality of communication. The researchers use audio and video models to detect nonverbal cues from data streams in real time. Two separate models extract paralinguistic information from the user’s audio: one from the prosody (tone and inflection) of the audio, and the other from the semantics of the speech. Vision transformers encode the video frames and identify facial expressions. A downstream service then incorporates the paralinguistic information into the text-based prompt. This multimodal approach aims to improve the understanding of user sentiment and intent, elevating human-AI interaction to a new level.
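To make the last step more concrete, here is a minimal Python sketch of how a downstream service might fold detected cues into a text prompt. It is not Project Rumi's actual implementation: the ParalinguisticCues class, its field names, the build_augmented_prompt function, and the bracketed note format are all hypothetical, and the cue labels are assumed to have already been produced by the audio and vision models described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParalinguisticCues:
    """Hypothetical container for cue labels; all field names are illustrative."""
    prosody: Optional[str] = None            # assumed output of an audio prosody model
    speech_sentiment: Optional[str] = None   # assumed output of a speech-semantics model
    facial_expression: Optional[str] = None  # assumed output of a vision model


def build_augmented_prompt(user_text: str, cues: ParalinguisticCues) -> str:
    """Prepend any detected nonverbal cues to the user's text as a bracketed note."""
    parts = []
    if cues.prosody:
        parts.append(f"tone of voice: {cues.prosody}")
    if cues.speech_sentiment:
        parts.append(f"speech sentiment: {cues.speech_sentiment}")
    if cues.facial_expression:
        parts.append(f"facial expression: {cues.facial_expression}")
    if not parts:
        # No cues detected: pass the text through unchanged.
        return user_text
    note = "; ".join(parts)
    return f"[Nonverbal context: {note}]\n{user_text}"


if __name__ == "__main__":
    cues = ParalinguisticCues(prosody="hesitant", facial_expression="frowning")
    print(build_augmented_prompt("Sure, that works.", cues))
```

In this sketch, a reply like "Sure, that works." spoken hesitantly with a frown reaches the LLM with that context attached, so the model has a chance to notice that the words and the delivery disagree.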

In this research, the researchers have only briefly explored the role that paralinguistics plays in communicating critical information about a user’s intentions. In the future, they plan to make the model better and more efficient. They also want to add more signals, such as HRV (heart rate variability) derived from standard video, and cognitive and ambient sensing. This is all part of a bigger effort to add unspoken meaning and intention to the next wave of interactions with AI.

Check out the Project Page. All credit for this research goes to the researchers on this project.

Astha Kumari is a consulting intern at MarktechPost. She is currently pursuing a dual degree in the Department of Chemical Engineering at the Indian Institute of Technology (IIT), Kharagpur. She is a machine learning and artificial intelligence enthusiast and is keen on exploring their real-life applications in various fields.
