Google Researchers Propose a Perceptual Image Quality Assessment Method for Compressed Images Using Deep Learning

Image compression plays a crucial role in the multimedia domain. The growing volume of visual content on the internet has to be served by ever-larger data storage systems. At that scale, compression algorithms matter more every day: even a fractional improvement in compression efficiency can save a tremendous amount of storage space.

Despite the enormous increase in image data over the years, the image compression formats in everyday use have changed little. We still mostly rely on the JPEG format, which was introduced in 1992.

JPEG is an example of a lossy image compression algorithm. Such algorithms reduce the size of an image at the cost of degrading its visual quality: the smaller the compressed file, the greater the degradation. Image quality is usually measured with the peak signal-to-noise ratio (PSNR) metric. However, it has been shown many times that PSNR does not correlate well with human perception.
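To make the PSNR limitation concrete, here is a minimal sketch in plain Python. PSNR is defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between the two images and MAX is the largest possible pixel value. The two distorted patches below are made-up toy data, not images from the paper: one adds fine alternating noise, the other a single step edge. They have the same MSE, so PSNR scores them identically even though the two distortions can look quite different to a viewer.

```python
import math

def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized, non-empty sequences of pixel values."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("images must be non-empty and have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical toy data: a flat gray patch of 64 pixels and two distortions of it.
reference = [128] * 64
noisy = [128 + (4 if i % 2 else -4) for i in range(64)]    # alternating +/-4 noise
stepped = [128 + (4 if i < 32 else -4) for i in range(64)]  # one step edge in the middle

psnr_noisy = psnr(reference, noisy)
psnr_stepped = psnr(reference, stepped)
print(f"noisy:   {psnr_noisy:.2f} dB")
print(f"stepped: {psnr_stepped:.2f} dB")
```

Both calls return the same value because PSNR only looks at the size of the per-pixel errors, not at how they are arranged in the image.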

For example, JPEG-coded images can suffer from color banding or blockiness, which PSNR does not capture well. Other objective quality metrics, such as SSIM and MS-SSIM, have been proposed to tackle these problems, but they also correlate inconsistently with human perception.

Illustration of the effect of the Q factor on compressed images. Source: https://arxiv.org/pdf/2103.01114.pdf
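The article does not say which encoder or Q-factor definition was used, so the following is only a sketch under an assumption: it follows the quality-scaling rule used by the widely deployed libjpeg encoder, where a quality setting between 1 and 100 rescales a base quantization table. The base values are the first row of the example luminance table from the JPEG standard. Larger quantization steps discard more detail, which is why low Q values produce smaller but visibly degraded images.

```python
# First row of the example luminance quantization table from the JPEG standard.
BASE_LUMA_ROW = [16, 11, 10, 16, 24, 40, 51, 61]

def scale_factor(quality):
    """libjpeg-style mapping from a 1..100 quality setting to a percentage scale."""
    quality = max(1, min(100, quality))
    return 5000 // quality if quality < 50 else 200 - 2 * quality

def scaled_steps(base_steps, quality):
    """Quantization steps after scaling, clamped to the baseline range 1..255."""
    scale = scale_factor(quality)
    return [max(1, min(255, (step * scale + 50) // 100)) for step in base_steps]

def quantize(coefficient, step):
    """Round a transform coefficient to the nearest multiple of the step size."""
    return round(coefficient / step) * step

for q in (10, 50, 90):
    steps = scaled_steps(BASE_LUMA_ROW, q)
    # A hypothetical coefficient value of 37 in the first table position.
    print(q, steps, quantize(37, steps[0]))
```

At Q = 50 the base table is used unchanged, lower settings enlarge the steps, and higher settings shrink them toward 1.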

Given these shortcomings, relying on such metrics to assess compression quality can be misleading. An encoder tuned against them may degrade the perceptual quality of the majority of images while trying to reduce the data size. A reliable quality assessment approach is therefore needed.

Like many other problems, image quality assessment has been tackled with machine learning approaches in recent years. These learning-based methods correlate much better with human perception. For example, the LPIPS metric shows promising agreement with human judgments when measuring the similarity between images.
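"Correlation with human perception" is usually quantified by comparing a metric's scores against human ratings of the same images. The article does not state which coefficient is used, so the sketch below assumes Spearman's rank correlation, a common choice because it only asks whether the metric orders images the same way people do. All scores here are invented for illustration.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical human ratings and two hypothetical metrics for five images.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_a = [20.1, 25.3, 24.0, 30.2, 35.5]  # swaps the order of two images
metric_b = [1.0, 4.0, 9.0, 16.0, 25.0]     # nonlinear, but same ordering as humans

rho_a = spearman(human, metric_a)
rho_b = spearman(human, metric_b)
print(rho_a, rho_b)
```

A metric that ranks every image the same way as the human raters scores 1 regardless of its numeric scale, while each ordering mistake lowers the coefficient.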

Despite their strong correlation with human perception, learning-based methods share a weakness common to deep learning methods. Because most of these models are designed for generic quality evaluation, they may not perform well outside their primary goal and may not generalize reliably to unseen data.

A perceptual quality metric suited to image compression is thus required. This metric should correspond well with human perceptual preferences, transfer well to new datasets, and be robust to previously unseen data in the wild.

This is the gap that the authors from Google Research try to fill. To that end, they first introduce a compressed image comparison dataset, named CIQA, in which each compressed image pair is labeled with a human perceptual preference. The dataset uses JPEG-compressed images, as JPEG is still the most widely used compression format.
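A dataset of labeled pairs lends itself to a simple check: for each pair, does the metric prefer the same image that people did? The sketch below illustrates that idea. The record layout, field names, and scores are hypothetical and are not the actual CIQA format.

```python
from dataclasses import dataclass

@dataclass
class PairRecord:
    """One labeled comparison (hypothetical layout, not the real CIQA schema)."""
    score_a: float          # metric score of compressed image A against the original
    score_b: float          # metric score of compressed image B against the original
    human_prefers_a: bool   # True if the human label favors image A

def preference_agreement(records, higher_is_better=True):
    """Fraction of pairs where the metric picks the image that humans preferred."""
    agree = 0
    for r in records:
        if higher_is_better:
            metric_prefers_a = r.score_a > r.score_b
        else:
            metric_prefers_a = r.score_a < r.score_b
        # Equal scores count as preferring B in this simple sketch.
        if metric_prefers_a == r.human_prefers_a:
            agree += 1
    return agree / len(records)

# Invented scores for four pairs; the metric disagrees with the label on the last one.
records = [
    PairRecord(0.91, 0.85, True),
    PairRecord(0.40, 0.72, False),
    PairRecord(0.66, 0.61, True),
    PairRecord(0.58, 0.55, False),
]
agreement = preference_agreement(records)
print(agreement)
```

The `higher_is_better` flag is there because some metrics are quality scores while others, such as distance-style metrics, are lower for better images.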

In addition, the authors introduce a deep learning-based full-reference metric, that is, a metric that compares a compressed image against its original, trained on the CIQA dataset. The proposed method uses a base CNN model, a modified combination of EfficientNet, ResNet, DenseNet, and VGG16 structures, to extract features from the image. These features are then passed through 1×1 convolution layers to predict the final output score.
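A 1×1 convolution applies the same small linear map to the channel vector at every spatial position, so it mixes channels without looking at neighboring pixels. The toy sketch below shows that operation in plain Python. It is not the authors' architecture: the feature values and weights are hand-written numbers rather than CNN outputs or trained parameters, and the ReLU and the final spatial averaging are assumptions added to turn the feature map into a single score.

```python
def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution.

    feature_map: H x W x C_in nested lists
    weights:     C_out x C_in nested lists
    bias:        list of length C_out
    """
    return [
        [
            [sum(w * x for w, x in zip(w_row, pixel)) + b
             for w_row, b in zip(weights, bias)]
            for pixel in row
        ]
        for row in feature_map
    ]

def relu(feature_map):
    """Element-wise max(0, x)."""
    return [[[max(0.0, v) for v in pixel] for pixel in row] for row in feature_map]

def global_average(feature_map):
    """Average the single output channel over all spatial positions."""
    values = [pixel[0] for row in feature_map for pixel in row]
    return sum(values) / len(values)

# Hypothetical 2 x 2 feature map with 3 channels (stand-in for CNN features).
features = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]],
    [[2.0, 2.0, 0.0], [1.0, 1.0, 1.0]],
]
w1, b1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [0.0, 0.0]  # 3 channels -> 2
w2, b2 = [[1.0, 1.0]], [0.0]                              # 2 channels -> 1

hidden = relu(conv1x1(features, w1, b1))
score = global_average(conv1x1(hidden, w2, b2))
print(score)
```

Because each output position depends only on the same position in the input, stacking such layers behaves like a small per-location network on top of the extracted features.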

DNN architecture used in the paper. Source: https://arxiv.org/pdf/2103.01114.pdf

The proposed method correlates strongly with human perception and generalizes well to unseen datasets. It outperforms state-of-the-art learning-based quality assessment methods on multiple datasets.

This article is written as a research summary by Marktechpost Staff based on the research paper 'Deep Perceptual Image Quality Assessment for Compression'. All credit for this research goes to the researchers on this project. Check out the paper.

Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.
