SUPERNOVA: A Deep Learning Based Image/Video Quality Enhancement Platform

Both the demand for and accessibility of image and video media services are increasing day by day. Numerous IPTV/OTT-based media services are now available over the internet. Media content quality remains an essential concern, as a great deal of low-quality media content still needs to be enhanced.

Degradation of image/video content is mainly due to quantization during the coding process. This degradation becomes significantly worse for customers located where transmission bandwidth is narrower, because the encoded media bitstream must be delivered at a lower bitrate in such environments. Another degradation case occurs when the spatial resolution of the delivered image/video is too small for customers watching on an FHD or 4K display. When this resolution degradation arises from instantaneous bandwidth constraints, the image/video soon regains its original resolution. However, the degradation persists if the content in a CDN (Content Delivery Network) or head-end (H/E) server is stored only at a low resolution or low bitrate.
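The effect of coarser quantization can be illustrated with a minimal numpy sketch (the signal and step sizes below are illustrative, not codec-accurate): encoding at a lower bitrate forces a larger quantization step, which discards more detail and raises reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
# A stand-in "image" signal: a smooth gradient plus fine detail.
signal = np.linspace(0, 255, 256) + rng.normal(0, 4, 256)

def quantize(x, step):
    """Uniform quantization: round to the nearest multiple of `step`."""
    return np.round(x / step) * step

# Larger quantization steps (i.e., lower bitrates) yield larger errors.
for step in (2, 8, 32):
    mse = np.mean((signal - quantize(signal, step)) ** 2)
    print(f"step={step:2d}  MSE={mse:.2f}")
```

Running this shows the mean squared error growing as the quantization step widens, which is the distortion that the enhancement networks later try to undo.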

Since the mid-2010s, deep learning-based methods have been applied to computer vision and media processing for quality enhancement, requiring massive GPU computing power. As GPU costs gradually decrease, the complexity of deep learning networks is expected to increase further. A paper published via IBC presents 'SUPERNOVA', a platform that uses deep-learning-based media processing methods to enhance the visual quality of media content.

The platform provides three methods: up-scaling, HFR (high frame rate) conversion, and re-targeting.

  • Up-scaling: They first introduce a pre-processing step to prepare the training dataset efficiently, and then propose a novel deep neural network for better performance.
  • HFR: They propose a novel deep neural network structure and verify it with PSNR comparisons against previous methods.
  • Re-targeting: Finally, they extract a grayscale saliency map with the proposed scheme. This map is used to generate image pixels for the black areas of the original image/video.

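For context on what the learned models improve upon, here is a minimal numpy sketch of the classical baselines for the first two tasks: bilinear 2x up-scaling of a grayscale frame, and simple frame blending for frame-rate up-conversion. Both are illustrative stand-ins, not the networks proposed in the paper.

```python
import numpy as np

def upscale_2x_bilinear(img):
    """Naive 2x bilinear up-scaling of a 2-D grayscale frame."""
    h, w = img.shape
    # Source coordinates for each output pixel, clipped to the frame.
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    # Blend the four neighbouring source pixels.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def interpolate_frame(prev_frame, next_frame):
    """Naive frame-rate up-conversion: blend two consecutive frames."""
    return 0.5 * (prev_frame + next_frame)

frame = np.arange(16, dtype=float).reshape(4, 4)
up = upscale_2x_bilinear(frame)        # doubled spatial resolution
mid = interpolate_frame(frame, frame + 1.0)  # an in-between frame
```

Bilinear up-scaling blurs fine detail and frame blending produces ghosting on motion; these are exactly the artifacts that learned super-resolution and HFR networks aim to avoid.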
After going through all these steps, the image/video quality increases significantly. In the future, the authors aim to implement additional functions, such as analyzing images or videos to collect evidence.
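The re-targeting step hinges on the saliency map. A crude, non-learned stand-in for such a map, using plain gradient magnitude rather than the paper's proposed scheme, can be sketched as:

```python
import numpy as np

def saliency_map(img):
    """Crude grayscale saliency: gradient magnitude, normalized to [0, 1].
    (Illustrative stand-in for the learned saliency extraction.)"""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return mag / mag.max() if mag.max() > 0 else mag

# A flat frame with one bright square: saliency peaks around its edges.
frame = np.zeros((32, 32))
frame[12:20, 12:20] = 255.0
sal = saliency_map(frame)
```

In a retargeting pipeline, such a map tells the generator which regions carry visual importance, so synthesized pixels for the black side areas can extend the non-salient background rather than the subject.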

Paper: https://www.ibc.org/download?ac=14555

Consultant Intern: He is currently pursuing the third year of his B.Tech in Mechanical Engineering at the Indian Institute of Technology (IIT), Goa. He is motivated by his vision to bring remarkable changes in society through his knowledge and experience. An ML enthusiast with a keen interest in Robotics, he tries to stay up to date with the latest advancements in Artificial Intelligence and deep learning.