Stable diffusion. If you are interested in the AI domain, there is an extremely high chance that you heard about it. It was everywhere in the last couple of months. The impressive, realistic-looking outputs generated by stable diffusion have started a new era in the image generation domain.
Diffusion models can be guided with text prompts to generate certain types of images. For example, you can ask it to generate an image of “a monkey running on the moon” or “a mobile app screen that uses lemons as the theme.” Diffusion models have a strong potential to become a powerful tool for artists, game designers, UI designers, etc.
Though, not everyone is innocent in the world. When you have a tool that can generate photorealistic images and the limitation to its generation capability mainly comes from our imagination, there is a high chance that it will be used for malicious purposes. The possibility of generating fake media to fit certain misinformation goals is a serious threat nowadays.
How could we prevent this, though? Are we ready for an era of AI-generated media everywhere? How can we be sure that an AI model does not generate the image we see? Is it possible to get the “true” information in this new world? How strong will be the influence of AI-generated media in this upcoming decade?
Researchers have already started to search for solutions to detect images generated by diffusion models. A diffusion model-generated image contains specific characteristics. For example, they still lack strong 3D modeling; thus, it causes some asymmetries in shadows and reflected objects. Also, you can see some inconsistencies in the lightning throughout the image as a result.
These issues can be exploited to detect a diffusion model-generated image today to some extent. However, once diffusion models fix those issues, which should happen soon, given the rapid advancement in the field, these methods will not work. Relying on the flaws of diffusion models is not a long-term solution to detecting AI-generated images.
Most state-of-the-art detectors do not rely on visible artifacts. They use traces that are not visible to the human eye. Even if an image looks perfect, it can still be identified as AI-generated based on the signs left behind from the process of generating it. These generation traces are unique to the method used to generate the image and different from the signs left behind by real cameras. Also, each generation algorithm leaves a unique trace, which can also be used to determine the source.
These trace-based detection approaches have proved useful in generative adversarial networks (GANs), but the problem is still far away from being solved. Each iteration of generation architecture reduces the presence of those traces. On top of that, even the most advanced detectors can fail to generalize to an unseen model structure. Also, those detectors can struggle a lot when the image quality drops, which happens all the time in social media as each platform has its own compression and rescaling operations.
With all these questions and problems to answer, the authors of this paper came up with some experiments and possible directions for detecting images generated by diffusion models. They first examined whether the diffusion models leave a trace behind as GANs do and found they could partially detect the images using the traces. The traces left behind by diffusion models are not as strong as GAN models, but they can still be used for detecting images. This was not the case for certain diffusion models like DALL-E 2, which had almost no distinctive artifacts.
Moreover, they evaluated the performance of existing detectors in more realistic scenarios and found that generalization remains the biggest problem. If a model is trained for GAN models, it struggles to detect images generated by a diffusion model and vice versa.
Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 13k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.