A recent research paper published by InterDigital AI Lab introduces CompressAI, a platform that provides custom operations, layers, models, and tools to research, develop, and evaluate end-to-end image and video compression codecs. It ships pre-trained models and evaluation tools for comparing learned methods with traditional codecs. Several models from the literature on learned end-to-end compression have been re-implemented in PyTorch and trained from scratch. Artificial Neural Network (ANN) based codecs have shown remarkable results for image compression. The framework currently implements models only for still-picture compression, but it is expected to extend to the video compression domain soon.
Conventional image compression methods versus ANN-based codecs
Conventional lossy image compression methods like JPEG, JPEG2000, HEVC, or AV1 split images into blocks of pixels and decorrelate spatial frequencies with linear transforms. In the transform domain, they predict coefficients from adjacent values and quantize the transformed coefficients. Finally, they encode the quantized values and predicted side information into a bit-stream with an efficient entropy coder.
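The transform-quantize-reconstruct steps above can be sketched with a toy example. The snippet below is illustrative only, not code from any real codec: it uses a 2-point Haar-style transform (average and difference of a pixel pair) as the decorrelating linear transform and a uniform quantizer with a hypothetical step size.

```python
def haar_pair(a, b):
    # Decorrelate a pixel pair into a low-frequency (average)
    # and high-frequency (difference) coefficient.
    return (a + b) / 2, (a - b) / 2

def inverse_haar(avg, diff):
    # Exact inverse of haar_pair.
    return avg + diff, avg - diff

def quantize(v, step):
    return round(v / step)

def dequantize(q, step):
    return q * step

# Encode: transform each pixel pair, then quantize the coefficients.
pixels = [100, 102, 98, 97]
step = 4  # coarse quantization step (illustrative)
coeffs = []
for i in range(0, len(pixels), 2):
    lo, hi = haar_pair(pixels[i], pixels[i + 1])
    coeffs.extend([quantize(lo, step), quantize(hi, step)])

# Decode: dequantize and apply the inverse transform.
recon = []
for i in range(0, len(coeffs), 2):
    a, b = inverse_haar(dequantize(coeffs[i], step),
                        dequantize(coeffs[i + 1], step))
    recon.extend([a, b])
```

Quantization discards the small difference coefficients, which is where the loss (and the compression) comes from; a real codec would then entropy-code `coeffs`.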
ANN-based codecs rely on learned non-linear analysis and synthesis transforms. An analysis transform maps pixel values to a latent representation, which is then quantized and entropy coded. Likewise, the decoder applies an approximate inverse (synthesis) transform that maps the latent back to the pixel domain. ANN-based codecs outperform conventional methods by learning complex non-linear transforms based on convolutional neural networks (CNNs).
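The analysis/quantize/synthesis structure can be sketched in PyTorch. This is a minimal toy model, not any published architecture; the layer sizes are invented for illustration, and entropy coding of the quantized latent is omitted.

```python
import torch
import torch.nn as nn

class TinyCodec(nn.Module):
    """Illustrative ANN codec: analysis transform -> quantization
    -> synthesis transform (approximate inverse)."""

    def __init__(self, channels=8):
        super().__init__()
        # Analysis transform: pixels -> latent (downsamples by 4x).
        self.analysis = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )
        # Synthesis transform: latent -> pixels (upsamples back).
        self.synthesis = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 5, stride=2,
                               padding=2, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2,
                               padding=2, output_padding=1),
        )

    def forward(self, x):
        y = self.analysis(x)           # latent representation
        y_hat = torch.round(y)         # quantization (non-differentiable;
                                       # training uses a relaxation)
        return self.synthesis(y_hat)   # reconstructed image

x = torch.rand(1, 3, 32, 32)
x_hat = TinyCodec()(x)
```

The quantized latent `y_hat` is what a real codec would entropy-code into the bit-stream.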
The training’s principal aim is to reduce the estimated length of the bit-stream while keeping the distortion of the reconstructed image low relative to the original content. Distortion can be measured with objective metrics like mean squared error (MSE) or with perceptual metrics.
Minimizing the bit-stream size requires learning probability models shared between the encoder and decoder. Relaxation methods are also needed to approximate the non-differentiable quantization of the latent values. The entire encoding-decoding pipeline can then be trained end-to-end with any differentiable distortion metric, which is particularly intriguing for perceptual metrics or machine-task metrics like image segmentation or classification at very low bit-rates.
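Two of the ideas above can be shown in a few lines: the standard additive-uniform-noise relaxation of quantization used during training, and a rate-distortion objective of the form L = R + λ·D. The functions and the λ value below are illustrative sketches, not CompressAI's API.

```python
import random

random.seed(0)

LAMBDA = 0.01  # rate-distortion trade-off weight (hypothetical value)

def soft_quantize(y):
    """Training-time relaxation: add uniform noise in [-0.5, 0.5]
    in place of the non-differentiable round()."""
    return [v + random.uniform(-0.5, 0.5) for v in y]

def hard_quantize(y):
    """Inference-time quantization: plain rounding."""
    return [round(v) for v in y]

def rd_loss(rate_bits, x, x_hat, lam=LAMBDA):
    """Rate-distortion objective L = R + lambda * D, with D = MSE."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return rate_bits + lam * mse
```

The noise relaxation keeps gradients flowing through the quantizer during training, while rounding is used at inference; λ controls where the codec lands on the rate-distortion curve.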
PyTorch and TensorFlow dominate the current deep learning ecosystem. In the past few years, PyTorch has seen significant growth in academic and industrial research groups. However, PyTorch does not ship with the custom operations required for compression, so building end-to-end architectures for image and video compression from scratch involves considerable re-implementation effort. TensorFlow has an established library for learned data compression, whereas the present PyTorch ecosystem has lacked these components.
CompressAI implements networks for still-picture coding. It provides pre-trained weights and tools to compare state-of-the-art (SOTA) models with traditional image codecs. It reproduces results from the literature and allows researchers and developers to train and evaluate their own neural-network-based codecs.
The main objective of CompressAI is to implement, in PyTorch, the typical operations required to build deep neural network architectures for data compression, and to provide evaluation tools for comparing learned techniques with traditional codecs. CompressAI re-implements SOTA models from the literature on learned image compression. Pre-trained weights, learned on the Vimeo-90K training dataset, are included for multiple bit-rate points and quality metrics, delivering performance similar to the results reported in the original papers. CompressAI thus enables a complete research pipeline, from training to performance evaluation against other learned and conventional codecs.
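Performance evaluation of this kind typically boils down to two numbers per operating point: distortion, often reported as PSNR, and rate, reported as bits per pixel (bpp). The helper functions below are a generic sketch of those two standard metrics, not functions from CompressAI itself.

```python
import math

def psnr(x, x_hat, max_val=255.0):
    """Peak signal-to-noise ratio in dB between an original and a
    reconstructed image (given here as flat pixel lists)."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

def bits_per_pixel(stream_bytes, width, height):
    """Rate of a compressed bit-stream, normalized by image size."""
    return 8 * stream_bytes / (width * height)
```

Plotting PSNR against bpp across several quality settings yields the rate-distortion curves used to compare a learned codec with JPEG, HEVC, and similar baselines.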
What does the future hold?
There are significant signs of progress in neural-network research targeting video compression. Compressing video is a harder task, as it requires reducing temporal redundancies and estimating motion information, which involves larger architectures and multi-stage training pipelines. ANN-based codecs for image/video compression have shown encouraging results, and researchers expect them to improve further with better entropy models in the near future. More research and experimentation are needed to raise the performance of learned codecs; however, because the domain is relatively new, it lacks tools that facilitate such study. The CompressAI platform aims to improve this situation.
Future releases of CompressAI will include additional models from the literature on learned image compression and a crucial extension supporting video compression. The framework will then include end-to-end networks with modules for compressible motion information and will evaluate low-delay and random-access video coding against traditional codecs.