Explore The Power Of Dynamic Images With Text2Cinemagraph: A Novel AI Tool For Cinemagraph Generation From Text Prompts

If you are new to the terminology, you may be wondering what cinemagraphs are, but I can assure you that you have probably already stumbled upon them. Cinemagraphs are visually captivating images in which specific elements loop in continuous motion while the rest of the scene remains still. They are not still photographs, yet they cannot quite be categorized as videos. They offer a unique way to showcase dynamic scenes while preserving a particular moment.

Over time, cinemagraphs have gained popularity as short videos and animated GIFs on social media platforms and photo-sharing websites. They are also commonly found in online newspapers, commercial websites, and virtual meetings. However, creating a cinemagraph is a highly challenging task, as it involves capturing videos or images using a camera and utilizing semi-automated techniques to generate seamless looping videos. This process often demands significant user involvement, including capturing suitable footage, stabilizing video frames, selecting animated and static regions, and specifying motion directions.

The study covered in this article explores a new research problem, namely the synthesis of text-based cinemagraphs, which significantly reduces reliance on data capture and laborious manual effort. The method presented in this work captures motion effects such as “water falling” and “flowing river” (illustrated in the introductory figure), which are difficult to express through still photographs and existing text-to-image techniques. Crucially, this approach expands the range of styles and compositions achievable in cinemagraphs, enabling content creators to specify diverse artistic styles and describe imaginative visual elements. The method showcased in this research can generate both realistic cinemagraphs and scenes that are creative or otherworldly.

The current methods face significant challenges in addressing this novel task. One approach is to employ a text-to-image model for generating an artistic image and subsequently animating it. However, existing animation methods that operate on single images struggle to generate meaningful motions for artistic inputs, primarily due to being trained on real video datasets. Constructing a large-scale dataset of artistic looping videos is impractical due to the complexity of producing individual cinemagraphs and the diverse artistic styles involved.

Alternatively, text-based video models can be utilized to generate videos directly. Nonetheless, these methods often introduce noticeable temporal flickering artifacts in static regions and fail to produce the desired semi-periodic motions.

To bridge the gap between artistic images and animation models designed for real videos, the authors propose an algorithm termed Text2Cinemagraph, based on twin image synthesis. An overview of the technique is illustrated in the paper (https://arxiv.org/abs/2307.03190).

The method generates two images from a user-provided text prompt – one artistic and one realistic – that share the same semantic layout. The artistic image represents the desired style and appearance of the final output, while the realistic image serves as an input that current motion prediction models can process more easily. Once the motion is predicted for the realistic image, this information can be transferred to its artistic counterpart, enabling the synthesis of the final cinemagraph.
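The final transfer step can be pictured as warping the artistic image with the flow predicted on its realistic twin. The sketch below is a deliberately simplified illustration, not the paper's implementation (which uses a learned Eulerian flow model and symmetric splatting): `warp_frame` and `make_cinemagraph` are hypothetical helpers that advect only the pixels inside a motion mask, leaving the rest of the scene frozen.

```python
import numpy as np

def warp_frame(image, flow, t):
    """Backward-warp `image` by t * flow using nearest-neighbor sampling."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Each output pixel samples from the location the flow carried it from.
    src_x = np.clip(np.round(xs - t * flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - t * flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

def make_cinemagraph(artistic_image, flow, motion_mask, num_frames=8):
    """Animate only masked pixels; keep the rest of the scene still."""
    frames = []
    for k in range(num_frames):
        warped = warp_frame(artistic_image, flow, t=k)
        # Outside the motion mask, always show the original (static) pixels.
        frame = np.where(motion_mask[..., None], warped, artistic_image)
        frames.append(frame)
    return frames
```

Because the two images share a semantic layout, a flow field estimated on the realistic image stays spatially meaningful when applied to the artistic one, which is the core idea behind the twin-image design.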

Although the realistic image is not displayed as the ultimate output, it plays a crucial role as an intermediary layer that resembles the semantic layout of the artistic image while being compatible with existing models. To enhance motion prediction, additional information from text prompts and semantic segmentation of the realistic image is leveraged.

Qualitative results are reported in the paper (https://arxiv.org/abs/2307.03190).

This was the summary of Text2Cinemagraph, a novel AI technique to automate the generation of realistic cinemagraphs. If you are interested and want to learn more about this work, you can find further information by clicking on the links below.


Check out the Paper, Github and Project. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 26k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.


Daniele Lorenzi received his M.Sc. in ICT for Internet and Multimedia Engineering in 2021 from the University of Padua, Italy. He is a Ph.D. candidate at the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He is currently working in the Christian Doppler Laboratory ATHENA and his research interests include adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation.
