Image-to-image translation (I2I) is a field within computer vision and machine learning that transforms visual content from one domain into another. This process goes beyond changing pixel values; it requires understanding the underlying structure, semantics, and style of images.
I2I has found extensive applications across domains, from generating artistic renditions of photographs to converting satellite images into maps and translating sketches into photorealistic images. It leverages deep learning models such as Generative Adversarial Networks (GANs) and Convolutional Neural Networks (CNNs).
Traditional I2I methods have primarily focused on translating between domains with small gaps, such as photos to paintings or one animal species to another. These tasks do not require generating significantly different visual features or making inferences about shape during translation.
Enter Revive-2I, a novel approach to I2I that tackles the task of translating skulls into images of living animals, known as Skull2Animal. This task is significantly more challenging because it requires generating new visual features, textures, and colors, as well as making inferences about the geometry of the target domain.
To overcome the challenges of long I2I translation, Revive-2I uses text prompts that describe the desired changes in the image, producing realistic and verifiable results. The prompt acts as a stricter constraint on acceptable translations, ensuring the generated images align with the intended target domain.
Revive-2I utilizes natural language prompts to perform zero-shot I2I via latent diffusion models.
Revive-2I consists of two main steps: encoding and text-guided decoding. In the encoding step, the source image is first encoded into a latent representation; noise is then added to this latent through the forward diffusion process to make room for the desired changes. By performing diffusion in the latent space rather than pixel space, Revive-2I achieves faster and more efficient translations.
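The forward noising step has a convenient closed form: given an encoded latent x_0, the noised latent at timestep t is sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε, where ᾱ_t is the cumulative product of the noise schedule. Here is a minimal illustrative sketch, assuming the standard DDPM linear beta schedule; the array standing in for the latent and the schedule values are our assumptions, not the paper's exact configuration:

```python
import numpy as np

def forward_diffuse(latent, t, num_steps=1000,
                    beta_start=1e-4, beta_end=0.02, seed=0):
    """Closed-form DDPM forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    (Illustrative schedule; not necessarily Revive-2I's exact settings.)"""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention per step
    eps = np.random.default_rng(seed).standard_normal(latent.shape)
    return np.sqrt(alpha_bar[t]) * latent + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(42)
latent = rng.standard_normal((4, 32, 32))      # stand-in for an encoded source image
half_noised = forward_diffuse(latent, t=499)   # partial forward process
full_noised = forward_diffuse(latent, t=999)   # (nearly) pure noise
```

With the same noise, the half-noised latent remains far more correlated with the source than the fully noised one, which is why stopping the forward process partway preserves source content for the text-guided decoding step.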
Finding the sweet spot for Revive-2I was not easy and required experimenting with different numbers of steps in the forward diffusion process. By taking only partial steps, the translation better preserves the content of the source image while still incorporating the features of the target domain. This approach allows for more robust translations while injecting the desired changes guided by the text prompts.
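To see why the number of forward steps matters, one can compute how much of the source signal survives at each fraction of the schedule. The sketch below again assumes a standard 1000-step linear beta schedule, which may differ from the paper's exact configuration:

```python
import numpy as np

# How much of the source latent survives after running a given share of a
# 1000-step forward diffusion (linear beta schedule from 1e-4 to 0.02).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

for frac in (0.25, 0.5, 0.75, 1.0):
    t = int(frac * 1000) - 1
    kept = np.sqrt(alpha_bar[t])  # coefficient on x_0 at timestep t
    print(f"{frac:4.0%} of steps: {kept:.3f} of the source signal remains")
```

Stopping early leaves a large share of the source intact (strong content preservation, weak edits); running all steps erases it almost entirely, leaving the text prompt to dictate the result. The sweet spot lies in between.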
The ability to perform constrained long I2I has significant implications across fields. Law enforcement agencies could use this technology to generate realistic images of suspects from sketches, aiding identification. Wildlife conservationists could showcase the effects of climate change on ecosystems and habitats. Paleontologists could bring ancient fossils to life by translating them into images of living animals. It looks like we might finally get Jurassic Park.
Check out the Paper, Code, and Project Page. All credit for this research goes to the researchers on this project.
Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.