SenseTime Research Proposes Story-to-Motion: A New Artificial Intelligence Approach to Generate Human Motion and Trajectory from a Long Text

Artificial Intelligence is making its way into almost every industry. Generating natural human movement from a story could transform the animation, video game, and film industries. One of the most difficult tasks in this area is Story-to-Motion, which arises when characters must move through different locations and perform specific actions as laid out in a long written description. The task requires smooth integration of high-level semantic control over the motion with low-level control over trajectories.

Although much effort has gone into studying text-to-motion and character control, a proper solution has yet to be found. Existing character control approaches cannot handle textual descriptions, while current text-to-motion approaches lack positional constraints, which leads to unstable motions.

To overcome these challenges, a team of researchers has introduced an approach that produces trajectories and generates controllable, arbitrarily long motions that stay in line with the input text. The proposed approach has three primary components, which are as follows.

  1. Text-Driven Motion Scheduling: Modern Large Language Models are used as text-driven motion schedulers that extract a sequence of (text, position, duration) triples from a long textual description. This stage ensures that the generated motions follow the story and carry details about the location and length of each action.
  2. Text-Driven Motion Retrieval System: Motion matching is combined with constraints on motion semantics and trajectories to create a comprehensive motion retrieval system. This is intended to make the retrieved motions satisfy the required positional properties in addition to matching the textual description.
  3. Progressive Mask Transformer: A progressive mask transformer has been designed to address frequent artifacts in transition motions, such as foot sliding and unnatural poses. This component improves the quality of the generated motions, producing animations with smoother transitions and a more realistic appearance.
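
To make the first component more concrete, here is a minimal sketch of what a scheduler's output could look like and how it might be turned into a per-frame target trajectory. It is only an illustration, not the authors' implementation: the `ScheduledAction` class, the `to_waypoints` helper, the 2D positions in metres, and the linear interpolation between targets are all assumptions made for this example. In the described system the schedule would come from a Large Language Model; here it is written by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScheduledAction:
    """One (text, position, duration) entry; a hypothetical structure."""
    text: str                      # short action description
    position: Tuple[float, float]  # assumed 2D target location, in metres
    duration: float                # length of the action, in seconds


def to_waypoints(
    schedule: List[ScheduledAction],
    start: Tuple[float, float] = (0.0, 0.0),
    fps: int = 30,
) -> List[Tuple[float, float, str]]:
    """Linearly interpolate a root path through each action's target position.

    Returns one (x, y, action_text) tuple per frame. Linear interpolation is
    an assumption chosen for simplicity.
    """
    waypoints = []
    current = start
    for action in schedule:
        n_frames = max(1, round(action.duration * fps))
        for i in range(1, n_frames + 1):
            t = i / n_frames
            x = current[0] + t * (action.position[0] - current[0])
            y = current[1] + t * (action.position[1] - current[1])
            waypoints.append((x, y, action.text))
        current = action.position
    return waypoints


# A hand-written schedule standing in for the LLM's output.
schedule = [
    ScheduledAction("walk to the sofa", (2.0, 0.0), 2.0),
    ScheduledAction("sit down", (2.0, 0.0), 1.0),
]
waypoints = to_waypoints(schedule, fps=10)
```

Each frame then carries both a position target and a text label, which is the kind of combined positional and semantic signal the later retrieval stage is described as consuming.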

The team evaluated the approach on three sub-tasks: motion blending, temporal action composition, and trajectory following. The evaluation showed improved performance on all three when compared to earlier motion synthesis techniques. The researchers have summarized their primary contributions as follows.

  1. The Story-to-Motion problem is addressed by generating comprehensive motion from lengthy textual descriptions while accounting for both trajectory and semantics.
  2. A new method called Text-based Motion Matching has been proposed, which uses long text input to provide accurate and customizable motion synthesis.
  3. The approach outperforms state-of-the-art techniques on the trajectory following, temporal action composition, and motion blending sub-tasks, as demonstrated by experiments conducted on benchmark datasets.
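
As a rough picture of the idea behind Text-based Motion Matching, the toy sketch below picks a clip from a small database by combining a text-similarity term with a trajectory-error term. Everything in it is an assumption for illustration: the `retrieve` function, the weights `w_sem` and `w_traj`, the clip database, and especially the word-overlap (Jaccard) similarity, which stands in for whatever learned text representation the actual system uses.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a crude placeholder for semantic matching."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def trajectory_error(path: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance between two root paths over their shared length."""
    n = min(len(path), len(target))
    if n == 0:
        return 0.0
    return sum(math.dist(path[i], target[i]) for i in range(n)) / n


def retrieve(
    query_text: str,
    target_path: Sequence[Point],
    database: List[Dict],
    w_sem: float = 1.0,
    w_traj: float = 1.0,
) -> Dict:
    """Return the clip with the lowest combined semantic + trajectory cost."""
    def cost(clip: Dict) -> float:
        semantic_cost = 1.0 - jaccard(query_text, clip["label"])
        path_cost = trajectory_error(clip["path"], target_path)
        return w_sem * semantic_cost + w_traj * path_cost

    return min(database, key=cost)


# Hypothetical clip database with text labels and short root paths.
database = [
    {"name": "walk_forward", "label": "walk forward",
     "path": [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]},
    {"name": "walk_left", "label": "walk left",
     "path": [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)]},
    {"name": "sit", "label": "sit down",
     "path": [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]},
]
target = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
best = retrieve("walk to the door", target, database)
```

Here both walking clips match the query text equally well, so the trajectory term is what selects `walk_forward`. That is the intuition behind combining semantic and positional constraints in a single retrieval step.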

In conclusion, the system is a significant step forward in synthesizing human motion from textual narratives. It offers an end-to-end answer to the problems associated with Story-to-Motion tasks, and it could have a substantial influence on the animation, gaming, and film sectors.


Check out the Paper and Project. All credit for this research goes to the researchers of this project.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
