AI Researchers Propose ‘GANgealing’: A GAN-Supervised Algorithm That Learns Transformations of Input Images to Bring Them into Better Joint Alignment

Visual alignment, also known as the correspondence problem, is one that computer vision algorithms must solve for many different applications.
It is considered a critical element in Optical Flow, 3D Matching, and Medical Imaging, to name just a few examples; it also underpins Tracking and Augmented Reality.

Current alignment research focuses largely on pairwise alignment; global joint alignment has received far less attention. Yet joint alignment, too, requires a common reference frame for tasks such as automatic keypoint annotation and augmented-reality editing to work properly. Its importance also shows in generative modeling: models trained on jointly aligned datasets, such as FFHQ, AFHQ, and CelebA-HQ, are more likely to produce high-quality representations.

Researchers from UC Berkeley, Carnegie Mellon University, Adobe Research, and MIT CSAIL propose ‘GANgealing’, a GAN-supervised algorithm that learns transformations of input images to bring them into better joint alignment. The team introduces the broader GAN-Supervised Learning framework, in which a discriminative model and its GAN-generated training targets are learned jointly, end-to-end.

In this framework, a Spatial Transformer network and the target images it warps toward are trained simultaneously. Although trained only on GAN samples, the Spatial Transformer generalizes, meaning it also works on real-world data at test time. The researchers showed that GANgealing successfully aligns complex data and discovers dense correspondences across eight datasets (LSUN Bicycles, Cats, Cars, Dogs, Horses, and TVs; In-The-Wild CelebA; and CUB).
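The joint-alignment idea behind GANgealing traces back to classic "congealing": alternately estimate a per-sample transformation that warps each sample onto a shared target, then update the target from the aligned samples. The toy sketch below illustrates only that alternating scheme on 1D signals with discrete shifts; it is not the authors' actual method, which uses a GAN, a learned Spatial Transformer, and continuous warps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D "images": randomly shifted copies of a bump template, plus noise.
n, length = 32, 64
template = np.exp(-0.5 * ((np.arange(length) - length / 2) / 3.0) ** 2)
true_shifts = rng.integers(-10, 11, size=n)
signals = np.stack([np.roll(template, s) for s in true_shifts])
signals += 0.05 * rng.standard_normal(signals.shape)

def congeal(signals, iters=5, search=15):
    """Congealing-style joint alignment: alternately estimate each sample's
    shift (the 'transformation') and a shared target (the common mode)."""
    target = signals.mean(axis=0)  # initial target: a blurry average
    offsets = np.arange(-search, search + 1)
    est = np.zeros(len(signals), dtype=int)
    for _ in range(iters):
        for i, x in enumerate(signals):
            # Pick the circular shift of x that best matches the target
            # (maximum cross-correlation over the search window).
            scores = [np.dot(np.roll(x, -s), target) for s in offsets]
            est[i] = offsets[int(np.argmax(scores))]
        aligned = np.stack([np.roll(x, -s) for x, s in zip(signals, est)])
        target = aligned.mean(axis=0)  # target sharpens as alignment improves
    return aligned, est

aligned, est = congeal(signals)
# After alignment, per-position variance across the set drops sharply.
print(signals.var(axis=0).mean(), aligned.var(axis=0).mean())
```

GANgealing replaces the hand-rolled shift search with a Spatial Transformer trained by gradient descent, and replaces the averaged target with a learned mode in the GAN's latent space, but the alternating "warp samples, refine target" structure is the same.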

According to the researchers, GANgealing significantly outperforms past self-supervised correspondence algorithms and performs on par with state-of-the-art supervised correspondence methods. It does so without any outside input or data augmentation, despite being trained exclusively on GAN-generated data.