Researchers At The University of Tokyo Present A Frequency-Based Inpainting Method To Generate Missing Image Portions

Researchers from the University of Tokyo introduce a frequency-based inpainting method that can use both frequency and spatial information to generate missing image portions. 

Image inpainting is a computer vision (CV) technique that fills in the missing pixels in an image. It allows unwanted objects to be removed from a photo and the missing regions of occluded images to be recreated. Inpainting is widely used to predict missing image data; however, synthesizing the missing pixels realistically and coherently is difficult.

Existing techniques use only spatial-domain information during the learning process, which leads to a loss of interior reconstruction details: only the low-frequency part of the original patch is estimated. The researchers address this issue with frequency-based image inpainting. The team demonstrates that deconvolution in the frequency domain makes it possible to predict the missing regions of the image structure using context from the rest of the image.

Using a frequency-domain representation improves network performance on image understanding tasks. The team therefore aims to improve image inpainting performance by training the networks on both frequency-domain and spatial-domain information.

Image inpainting algorithms are broadly classified into two categories as follows:

  1. Diffusion-based image inpainting algorithms: These algorithms attempt to propagate the image's appearance into the missing regions. This method fills small holes adequately, but the results degrade as the holes grow larger.
  2. Patch-based inpainting algorithms: Algorithms in this category search the image for the best-fitting patch to fill the missing portions. This method can fill larger holes; however, it is ineffective for complex or distinctive parts of an image.
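To make the first category concrete, here is a minimal sketch of a diffusion-style fill. It is not the researchers' method and not any specific published algorithm: it simply averages the four neighbours of each missing pixel repeatedly, so information flows inward from the hole boundary. The function name `diffusion_fill`, the iteration count, and the toy gradient image are all assumptions made for illustration.

```python
import numpy as np

def diffusion_fill(image, mask, iterations=200):
    """Fill masked pixels by repeatedly averaging their four neighbours.

    image: 2-D float array; mask: 2-D bool array, True where pixels are missing.
    (Hypothetical helper for illustration only.)
    """
    result = image.astype(float).copy()
    result[mask] = result[~mask].mean()  # neutral starting value for the hole
    for _ in range(iterations):
        padded = np.pad(result, 1, mode="edge")
        neighbours = (
            padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:]
        ) / 4.0
        result[mask] = neighbours[mask]  # known pixels are never modified
    return result

# Toy example: a smooth horizontal gradient with a small square hole.
image = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))
mask = np.zeros_like(image, dtype=bool)
mask[6:10, 6:10] = True
filled = diffusion_fill(image, mask)
```

On a smooth gradient and a small hole, this kind of fill recovers the missing values closely. On textured content or large holes it produces a blur, which is the weakness noted for this category.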
Figure 1: Overview of our frequency-domain-based image inpainting framework


In the first stage, the researchers used the spectrum of the images (the frequency-domain representation obtained with a fast Fourier transform) to give the model context, and the model then reconstructed the high-frequency parts. This yields a rough inpainting result that captures the image's structural elements, which a GAN then refines in the pixel domain.
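As a rough illustration of what a frequency-domain representation looks like in practice, the sketch below computes the 2-D spectrum of a masked image with NumPy's FFT and splits the image into low-frequency and high-frequency parts. It does not reproduce the researchers' network. The helper names `spectrum` and `split_frequencies`, the circular cutoff, the `radius` value, and the random test image are assumptions chosen for the example.

```python
import numpy as np

def spectrum(image):
    """Centred 2-D Fourier spectrum of a greyscale image (complex array)."""
    return np.fft.fftshift(np.fft.fft2(image))

def split_frequencies(image, radius):
    """Split an image into low- and high-frequency parts.

    Frequencies within `radius` of the spectrum centre count as low;
    everything else counts as high. (Hypothetical helper for illustration.)
    """
    spec = spectrum(image)
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)  # distance from the zero frequency
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spec * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spec * ~low_mask)).real
    return low, high

# Toy input: a random image with a rectangular region zeroed out.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
masked = image.copy()
masked[10:20, 12:22] = 0.0  # the "missing" region

spec = spectrum(masked)
low, high = split_frequencies(masked, radius=4)
```

The low-frequency part carries the coarse layout of the image, while the high-frequency part carries edges and fine texture, which is the detail the article says spatial-only methods tend to lose. The two parts sum back to the input image.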

The second stage used spatial domain information to guide the color scheme of the image and then enhanced the details and structures obtained in the first stage. 

Figure 2: (a) Input images with missing regions; (b) DFT of first-stage reconstruction by the deconvolution network; (c) image inpainting results; and (d) GT image. The last column shows the prediction of the missing region obtained from our method and original pixel values for the same region in the GT image.


The experimental results show that the method produces better inpainting outcomes: it outperforms other state-of-the-art (SOTA) techniques on challenging datasets by generating sharper details and perceptually realistic results. The gain comes from using frequency and spatial information together.

The team hopes that the research will extend the use of other frequency domain transformations in solving image restoration tasks such as image denoising.


