What Did You Feed On? This AI Model Can Extract Training Data From Diffusion Models

Diffusion models became a key part of the AI field in 2022. We have seen photorealistic images generated by them, and they kept getting better. Much of their success can be attributed to Stable Diffusion, which laid the groundwork for subsequent techniques. It wasn't long before diffusion models became the go-to method for image generation.

Diffusion models, also known as denoising diffusion models, are a class of generative neural networks. They start from random noise and refine it gradually, removing a little of the noise at each step, until the output looks like a sample from the training distribution. This step-by-step denoising process makes them easier to scale and control, and they usually produce higher-quality samples than earlier approaches such as generative adversarial networks (GANs).
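To make the step-by-step denoising more concrete, here is a minimal sketch of a generic DDPM-style sampling loop in Python with NumPy. It is not Stable Diffusion's actual sampler (Stable Diffusion works in a compressed latent space and conditions on a text prompt); it only illustrates the core idea of starting from Gaussian noise and repeatedly removing predicted noise. The function `predict_noise` is a hypothetical placeholder for the trained neural network, and the linear noise schedule is an assumption.

```python
import numpy as np


def linear_beta_schedule(num_steps: int) -> np.ndarray:
    # Assumed linear noise schedule: per-step noise variances beta_t.
    return np.linspace(1e-4, 0.02, num_steps)


def predict_noise(x_t: np.ndarray, t: int) -> np.ndarray:
    # HYPOTHETICAL stand-in for a trained noise-prediction network eps_theta(x_t, t).
    # A real diffusion model uses a large neural network here; this toy
    # version predicts zero noise so the loop can run end to end.
    return np.zeros_like(x_t)


def sample(shape, num_steps: int = 1000, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Sampling starts from pure Gaussian noise, not from a training image.
    x = rng.standard_normal(shape)
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t)
        # DDPM mean update: subtract the predicted noise component and rescale.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise on every step except the final one.
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x
```

Because the starting noise depends on `seed`, calling `sample((3, 64, 64), seed=1)` and `sample((3, 64, 64), seed=2)` yields different outputs from the same model, which is also why one prompt can produce many different images.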

Diffusion models were also thought to generate images differently from earlier approaches. Previous large-scale image generation models were susceptible to overfitting and could produce images that closely resembled their training samples, whereas diffusion models were believed to produce images that differ significantly from those in the training set. This made diffusion models look like a promising tool for privacy-conscious researchers who need to protect the identities of individuals or other sensitive information in the training images. If the generated images deviate from the original dataset, diffusion models could preserve privacy without sacrificing the quality of the output.

But is that true? Do diffusion models really not memorize their training images? Is it impossible to use them to recover samples from their training set? Can we really trust them to protect the privacy of training data? Researchers set out to answer these questions, and their study shows that diffusion models do in fact memorize their training data.

Example of a training sample memorized by a diffusion model. Source: https://arxiv.org/pdf/2301.13188.pdf

It is possible to regenerate samples from the training data of state-of-the-art diffusion models, although doing so is not straightforward. Certain training samples are easier to extract than others, especially those that are duplicated in the training set. The authors exploit this property to extract training samples from Stable Diffusion. They first identify near-duplicate images in the training dataset. Doing this manually is infeasible, since Stable Diffusion's training dataset contains around 160 million images. Instead, they embed the images with CLIP and compare them in this lower-dimensional embedding space. When images have CLIP embeddings with high cosine similarity, their captions are used as input prompts for the extraction attack.
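The near-duplicate search can be sketched as follows. This is a minimal illustration assuming each image has already been mapped to an embedding vector (in the study, by CLIP); here the embeddings are synthetic NumPy arrays, and the 0.95 similarity threshold and the ranking by duplicate count are hypothetical choices rather than the paper's exact procedure. A full pairwise similarity matrix is only practical for small sets; at the scale of 160 million images, an approximate nearest-neighbor index would be needed instead.

```python
import numpy as np


def normalize(embeddings: np.ndarray) -> np.ndarray:
    # L2-normalize rows so that a dot product equals cosine similarity.
    return embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)


def duplicate_counts(embeddings: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    # For each image, count how many OTHER images have cosine similarity >= threshold.
    unit = normalize(embeddings)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -1.0)  # exclude each image's similarity to itself
    return (sims >= threshold).sum(axis=1)


def select_prompts(captions, embeddings, threshold=0.95, top_k=2):
    # Hypothetical heuristic: rank images by how many near-duplicates they have
    # and return the captions of the most-duplicated ones as candidate prompts.
    counts = duplicate_counts(embeddings, threshold)
    order = np.argsort(-counts, kind="stable")[:top_k]
    return [captions[i] for i in order if counts[i] > 0]


# Synthetic demo: images 0-2 are near-copies of each other, image 3 is unrelated.
rng = np.random.default_rng(0)
base = rng.standard_normal(512)
embeddings = np.stack([
    base,
    base + 0.01 * rng.standard_normal(512),
    base + 0.01 * rng.standard_normal(512),
    rng.standard_normal(512),
])
captions = ["caption A", "caption B", "caption C", "caption D"]
counts = duplicate_counts(embeddings)            # -> [2, 2, 2, 0]
prompts = select_prompts(captions, embeddings)   # -> ["caption A", "caption B"]
```

Normalizing once and using a single matrix product keeps the sketch short; the same cosine-similarity test carries over unchanged to an approximate-search index when the dataset is large.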

Example training images extracted from Stable Diffusion. Source: https://arxiv.org/pdf/2301.13188.pdf

Once they have candidate text prompts for the attack, the next step is to generate many samples per prompt (500 in this case) to check for memorization. The 500 images share the same prompt, but they look different because each generation starts from a different random seed. The authors then measure the distance between every pair of generated images and build a graph that connects images that are sufficiently similar. If many images accumulate around a single location in this graph (say, more than 10 images connected to a single one), that central image is assumed to be memorized. Applying this approach to Stable Diffusion, they were able to generate samples almost identical to images in the training dataset.
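The graph-based check can be sketched like this, under stated assumptions: generated images are flattened into vectors and compared with plain Euclidean (L2) distance, two generations are connected when their distance falls below a hypothetical threshold, and a prompt is flagged when some generation has at least `min_neighbors` close neighbors. The original work may use a different image-distance measure and detection criterion, so treat the function names and thresholds as placeholders; the "generations" in the demo are synthetic arrays, not real model outputs.

```python
import numpy as np


def pairwise_l2(images: np.ndarray) -> np.ndarray:
    # images: (n, d) array, one flattened generation per row.
    sq = np.sum(images ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * images @ images.T
    return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negatives from rounding


def flag_memorized(images: np.ndarray, dist_threshold: float, min_neighbors: int = 10):
    # Build the similarity graph as a boolean adjacency matrix.
    adjacency = pairwise_l2(images) < dist_threshold
    np.fill_diagonal(adjacency, False)  # no self-loops
    degree = adjacency.sum(axis=1)
    center = int(np.argmax(degree))
    # Flag the best-connected generation only if enough others sit close to it.
    return center if degree[center] >= min_neighbors else None


# Synthetic stand-in for 500 generations from one prompt (hypothetical data):
# 20 near-identical "memorized" outputs followed by 480 diverse ones.
rng = np.random.default_rng(0)
memorized = rng.standard_normal(64)
copies = memorized + 0.01 * rng.standard_normal((20, 64))
diverse = rng.standard_normal((480, 64))
generations = np.vstack([copies, diverse])

flagged = flag_memorized(generations, dist_threshold=1.0)  # -> 0 (first near-copy)
not_flagged = flag_memorized(diverse, dist_threshold=1.0)  # -> None
```

Using node degree keeps the sketch short; a stricter criterion, such as requiring the close generations to be mutually similar to one another, would reduce false positives at the cost of more computation.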

The authors ran experimental attacks on state-of-the-art diffusion models and made several notable observations. State-of-the-art diffusion models memorize more information than comparable GANs, and stronger diffusion models memorize more than weaker ones. This suggests that generative image models may become more vulnerable to such attacks as they grow more capable.


Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.
