AI Researchers Unveil the First Deep Video Text-Replacement Method, ‘STRIVE’


A team of researchers is working on the problem of realistically altering scene text in videos. The main application behind this research is creating personalized content for marketing and promotional purposes: for example, replacing a word on a store sign with a personalized name or message, as shown in the picture below.

Several attempts have been made to automate text replacement in still images based on principles of deep style transfer. The research group builds on this progress to tackle text replacement in videos. Video text replacement is not an easy task: it must meet the challenges faced in still images while also accounting for time and for effects such as lighting changes and blur caused by camera motion or object movement.

One approach to video text replacement would be to train an image-based text style transfer module on individual frames while incorporating temporal consistency constraints in the network loss. With this approach, however, the network performing text style transfer is additionally burdened with handling the geometric and motion-induced effects encountered in video.
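To make the idea of a temporal consistency constraint concrete, here is a minimal sketch in plain Python. It is not taken from the STRIVE work: the names `temporal_consistency_penalty`, `total_loss`, and `weight` are hypothetical, frames are assumed to be already aligned grayscale grids, and the penalty is simply the mean squared difference between consecutive output frames.

```python
from typing import List, Sequence

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def mse(a: Frame, b: Frame) -> float:
    """Mean squared difference between two frames of equal size."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def temporal_consistency_penalty(outputs: Sequence[Frame]) -> float:
    """Average change between consecutive output frames (assumes aligned frames)."""
    pairs = list(zip(outputs, outputs[1:]))
    if not pairs:
        return 0.0
    return sum(mse(prev, nxt) for prev, nxt in pairs) / len(pairs)


def total_loss(per_frame_losses: Sequence[float],
               outputs: Sequence[Frame],
               weight: float = 0.1) -> float:
    """Hypothetical combined loss: mean per-frame loss plus weighted temporal term."""
    frame_term = sum(per_frame_losses) / len(per_frame_losses)
    return frame_term + weight * temporal_consistency_penalty(outputs)
```

A single network trained against a loss like this has to learn appearance transfer and frame-to-frame stability at the same time, which is the burden the paragraph above describes.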

Therefore, the research group took a very different approach. First, they extract text regions of interest (ROIs) and train a spatio-temporal transformer network (STTN) to frontalize the ROIs so that they are temporally consistent. Next, they scan the video and select a reference frame with high text quality, measured in terms of text sharpness, size, and geometry.
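The reference-frame selection step can be illustrated with a small sketch. The article names only the three criteria (sharpness, size, geometry), so everything else here is an assumption: the `TextROI` fields, the Laplacian-variance sharpness proxy, and the weighted-sum score are hypothetical choices, not the researchers' actual metric.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Sequence

Gray = List[List[float]]  # grayscale ROI as rows of pixel intensities


def laplacian_variance(roi: Gray) -> float:
    """Sharpness proxy: variance of a 4-neighbour Laplacian over interior pixels."""
    h, w = len(roi), len(roi[0])
    responses = [
        roi[y - 1][x] + roi[y + 1][x] + roi[y][x - 1] + roi[y][x + 1] - 4 * roi[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return float(pvariance(responses)) if responses else 0.0


@dataclass
class TextROI:
    frame_index: int
    pixels: Gray          # cropped text region
    area_fraction: float  # assumed: ROI area / frame area, in [0, 1]
    frontal_score: float  # assumed: 1.0 = fronto-parallel, 0.0 = strong perspective


def quality(roi: TextROI, w_sharp: float = 1.0,
            w_size: float = 1.0, w_geom: float = 1.0) -> float:
    """Hypothetical score: weighted sum of sharpness, size, and geometry terms."""
    sharp = laplacian_variance(roi.pixels)
    sharp_norm = sharp / (1.0 + sharp)  # squash the unbounded variance into [0, 1)
    return w_sharp * sharp_norm + w_size * roi.area_fraction + w_geom * roi.frontal_score


def select_reference(rois: Sequence[TextROI]) -> int:
    """Return the frame index of the highest-scoring text ROI."""
    return max(rois, key=quality).frame_index
```

With equal size and geometry terms, a crisp ROI wins over a blurred or flat one because its Laplacian response varies more.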

The research team then performs still-image text replacement on the reference frame using SRNet, a state-of-the-art method, trained on video frames. Next, the new text is transferred onto the other frames with a novel module called the text propagation module (TPM), which accounts for changes in lighting and blur. As input, TPM takes the reference frame and the current frame from the original video. It infers an image transformation between the pair and applies it to the altered reference frame generated by SRNet. Importantly, TPM takes temporal consistency into account when learning these pairwise transformations.
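The propagation idea (estimate a transformation from the original reference/current pair, then apply it to the edited reference) can be sketched with a deliberately simple stand-in. The TPM described above is a learned module that also handles blur; the code below swaps in a closed-form least-squares gain-and-bias model of lighting change only, and the function names are hypothetical.

```python
from typing import List, Tuple

Gray = List[List[float]]  # grayscale ROI as rows of pixel intensities


def fit_gain_bias(reference: Gray, current: Gray) -> Tuple[float, float]:
    """Least-squares fit of current ~= gain * reference + bias over all pixels."""
    xs = [p for row in reference for p in row]
    ys = [p for row in current for p in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("ROIs must be non-empty and the same size")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Flat reference: only a brightness offset can be estimated.
        return 1.0, mean_y - mean_x
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gain = cov / var_x
    return gain, mean_y - gain * mean_x


def propagate(edited_reference: Gray, reference: Gray, current: Gray) -> Gray:
    """Apply the lighting change seen between the original pair to the edited ROI."""
    gain, bias = fit_gain_bias(reference, current)
    return [[min(255.0, max(0.0, gain * p + bias)) for p in row]
            for row in edited_reference]
```

The transformation is fitted on the original frames and only then applied to the SRNet output, so the replacement text inherits each frame's lighting without being re-synthesized for every frame.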

The researchers named their framework STRIVE (Scene Text Replacement In VidEos); it is shown in the picture below.


Using the proposed approach, the researchers show results on synthetic and challenging real videos with realistic text transfer, competitive quantitative and qualitative performance, and superior inference speed relative to alternatives. They also introduce new synthetic and real-world datasets with paired text objects. According to the research group, this may be the first attempt at deep video text replacement.



Asif Razzaq
Asif Razzaq is an AI Journalist and Cofounder of Marktechpost, LLC. He is a visionary, entrepreneur and engineer who aspires to use the power of Artificial Intelligence for good. Asif's latest venture is the development of an Artificial Intelligence Media Platform (Marktechpost) that will revolutionize how people can find relevant news related to Artificial Intelligence, Data Science and Machine Learning. Asif was featured by Onalytica in its ‘Who’s Who in AI? (Influential Voices & Brands)’ as one of the 'Influential Journalists in AI'. His interview was also featured by Onalytica.
