Real-time background replacement is gaining popularity in areas such as video conferencing and entertainment, where it removes the need for green screens or other props. Neural models are typically used to tackle this challenging problem, but current solutions often fall short in matting quality and can introduce visible artifacts, so our focus is on improving these aspects.
ByteDance, the company that developed TikTok, introduced a new matting method, RVM (Robust High-Resolution Video Matting), that can process 4K video at 76 FPS and HD video at 104 FPS on an Nvidia GTX 1080Ti GPU. Whereas most existing methods process each frame independently as a still image, RVM uses a recurrent architecture to exploit temporal information in videos, yielding greater temporal coherence and improved matting quality.
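The key idea of the recurrent design can be illustrated with a minimal sketch: instead of calling the model on each frame in isolation, a recurrent state is threaded from frame to frame so earlier frames inform later ones. The `dummy_matting_model` below is a hypothetical stand-in, not the official RVM API; it only demonstrates the state-passing pattern.

```python
# Illustrative sketch of recurrent video-matting inference.
# The model receives the previous recurrent state along with the current
# frame and returns an updated state, so temporal information carries
# across frames instead of each frame being processed independently.

def dummy_matting_model(frame, state):
    """Hypothetical stand-in for a matting network (toy scalar math)."""
    if state is None:
        state = frame                      # first frame: no history yet
    alpha = 0.5 * frame + 0.5 * state      # toy "matte" mixing in temporal context
    return alpha, alpha                    # (matte, new recurrent state)

def process_video(frames):
    """Run the model frame by frame, threading the recurrent state through."""
    state = None
    mattes = []
    for frame in frames:
        alpha, state = dummy_matting_model(frame, state)
        mattes.append(alpha)
    return mattes

mattes = process_video([1.0, 0.0, 0.0])
print(mattes)  # the matte decays smoothly instead of jumping frame to frame
```

Because the state summarizes earlier frames, the per-frame outputs change smoothly over time, which is the temporal consistency the recurrent architecture is designed to provide.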
The research group also proposes a novel training strategy that trains the network on both matting and segmentation objectives. This significantly improves the model’s robustness without requiring any auxiliary inputs such as a trimap or a pre-captured background image, so the method can be widely applied to existing human matting applications.
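One way to picture a joint training objective is a combined loss over the two tasks. The loss functions and weighting below are illustrative assumptions for exposition, not the paper's exact training recipe:

```python
# Hedged sketch of training one network on two objectives at once:
# a matting term (predicted alpha vs. ground-truth alpha) plus a
# segmentation term (predicted mask vs. ground-truth mask).

def l1_loss(pred, target):
    """Simple L1 distance between a prediction and its target."""
    return abs(pred - target)

def joint_loss(pred_alpha, gt_alpha, pred_seg, gt_seg, seg_weight=1.0):
    """Combine matting and segmentation terms into one training signal.
    seg_weight is a hypothetical knob balancing the two objectives."""
    matting_term = l1_loss(pred_alpha, gt_alpha)
    seg_term = l1_loss(pred_seg, gt_seg)
    return matting_term + seg_weight * seg_term

loss = joint_loss(0.8, 1.0, 0.4, 0.0)
print(loss)
```

Optimizing both terms with shared weights pushes the network to learn segmentation-grade robustness to diverse scenes while still producing fine-grained alpha mattes.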
The proposed method, a recurrent architecture for robust human video matting, is lighter and faster than prior approaches while achieving a new state of the art. The published analysis shows that temporal information plays an important role in improving quality and consistency, and that training models on both matting and semantic segmentation objectives simultaneously makes them more robust across various types of video.