Israeli Researchers Unveil DeepSIM, a Neural Generative Model for Conditional Image Manipulation Based on a Single Image


In recent years, deep neural networks have been proven effective at performing image manipulation tasks for which large training datasets are available such as, mapping facial landmarks to facial images. When dealing with a unique image, finding suitable training data that includes many samples of the same input-output pairing is often difficult. In some cases, when you use a large dataset to create your model, it may lead to unwanted outputs that do not preserve the specific characteristics of what was desired. 

Generative models like the ones used in neural networks can be trained to generate new images based on just one input. This exciting research direction holds the potential for these techniques to extend beyond basic image manipulation methods and create more unique art styles or designs with endless possibilities. Researchers at The Hebrew University of Jerusalem have developed a new method, called ‘DeepSIM,’ for training deep conditional generative models from just one image pair. The DeepSIM method is an incredibly powerful tool that can solve various image manipulation tasks, including shape warping, object rearrangement, and removal of objects; addition or creation of new ones. It also allows for painting/photorealistic animated clips to be created quickly.

Given a single target image, first, a primitive representation is created for the training image. This can either be unsupervised (i.e., edge map, unsupervised segmentation), supervised (i.e., segmentation map, sketch, drawing), or a combination of both.

The researchers used a standard conditional image mapping network to map between the primitive representation and the image. After the model is trained, a user can explicitly design and choose the changes they want to apply by manipulating simple primitives. Later, the modified primitive to sent to the network for processing and transforming it into the desired manipulation.

Multiple research papers have been published to study how much and what can be learned from a single image data. The presented two seminal research works SinGAN and InGan, showcase the research beyond texture synthesis. The single image manipulation problem is tackled unconditionally by SinGAN, which allows the unsupervised generation tasks. Whereas InGAN proposes a conditional model for applying various geometric transformations to an image.

This research extends the basis of work by exploring supervised image-to-image translation. This permits modification in specific details such as shape or location for parts on an image. Apart from this, the researchers found that the augmentation strategy is key for making DeepSIM work effectively. The researchers also found that using a thin-plate spline (TPS) method is essential for training conditional generative models based on the input of just one image pair. The TPS method’s success comes from exploring the possible image manipulations and extending its training distribution to include any input which has been altered. This allows both professionals and amateurs alike to explore their creative ideas while preserving semantic attributes as well as geometric ones for high fidelity results that are true-to-life.

Key Contributions:

  • A general purpose approach for training conditional generators, supervised by only a single image-pair.
  • Recognizing the importance of image augmentation for task and how TPS has been previously overlooked in single-image manipulation.
  • Reaching great visual performance on a range of image manipulation applications.