Can We Truly Trust AI Watermarking? This AI Paper Unmasks the Vulnerabilities in Current Deepfake Defense Methods

The rapid advancement of generative Artificial Intelligence has significantly changed the landscape of digital content creation. As these AI algorithms have improved and become more widely available, they have made it possible to create fake digital content that is highly convincing. Deepfakes, which are hyper-realistic synthetic media such as photos, videos, and audio, have the potential to mislead viewers and listeners, which raises worries about false information, fraud, and even defamation and emotional suffering. As a result, identifying AI-generated content and tracing its sources has become a major challenge.

As generative AI models improve, it has become vital to distinguish authentic content from AI-generated material so that fabricated content cannot be passed off as genuine. Watermarking is one method that has been developed to distinguish AI-generated images from images from other sources. Recent work by researchers from the Department of Computer Science, University of Maryland, has focused on the resilience of several AI image detectors, including watermarking and classifier-based deepfake detectors.

For watermarking techniques that introduce subtle image perturbations, the study has revealed a fundamental trade-off under a diffusion purification attack. The trade-off is between the evasion error rate, i.e., the fraction of watermarked images detected as non-watermarked, and the spoofing error rate, i.e., the fraction of non-watermarked images detected as watermarked. In other words, it concerns the balance between false negatives and false positives. Treating "watermarked" as the positive class, false negatives are AI-generated (watermarked) images mistakenly detected as real, and false positives are real images incorrectly identified as AI-generated.
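To make the two error rates concrete, here is a minimal sketch of how they could be computed from a detector's decisions. The function name `error_rates` and the sample labels are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def error_rates(is_watermarked, detected_as_watermarked):
    """Return (evasion_rate, spoofing_rate) for a watermark detector.

    evasion rate  = fraction of watermarked images detected as non-watermarked
    spoofing rate = fraction of non-watermarked images detected as watermarked
    """
    pairs = list(zip(is_watermarked, detected_as_watermarked))
    # Detector decisions, split by the true status of each image.
    on_watermarked = [d for w, d in pairs if w]
    on_clean = [d for w, d in pairs if not w]
    evasion = (sum(1 for d in on_watermarked if not d) / len(on_watermarked)
               if on_watermarked else 0.0)
    spoofing = (sum(1 for d in on_clean if d) / len(on_clean)
                if on_clean else 0.0)
    return evasion, spoofing


# Hypothetical ground truth and detector output for eight images.
truth    = [True, True,  True, True, False, False, False, False]
detected = [True, False, True, True, False, True,  False, False]

evasion, spoofing = error_rates(truth, detected)
print(evasion, spoofing)  # prints: 0.25 0.25
```

In this made-up example, one of four watermarked images slips past the detector and one of four clean images is flagged, so both rates are 0.25.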

The research has shown empirically that, in this low-perturbation regime, the diffusion purification attack can successfully remove watermarks from images. Images that are only subtly altered by watermarking are more susceptible to this attack. The diffusion purification attack is less successful, on the other hand, against watermarking techniques that significantly alter images. For these, the research has proposed a different kind of attack, called a model substitution adversarial attack, which can successfully remove watermarks from high-perturbation watermarking techniques. This approach deceives the watermark detector into concluding that the watermark is no longer present.

The study has also emphasized how susceptible watermarking techniques are to spoofing attacks. In a spoofing attack, the attacker wants real images, which may be indecent or explicit, to be mistaken for watermarked ones. The research has shown that a watermarked noise image can be produced even with only black-box access to the watermarking method, meaning the attacker has no knowledge of its internal workings. By adding this noise image to real photographs, the attacker can cause them to be falsely labeled as watermarked and thereby do harm.
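The idea behind the spoofing attack can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the additive watermark, the correlation detector, and the names `watermark_api`, `detect_api`, `spoof`, `STRENGTH`, `THRESHOLD`, and `alpha` are all hypothetical. The attacker only calls the embedder as a black box and never reads the secret key.

```python
import random

N = 4096  # number of pixels in a flattened toy image

# Secret key known only to the watermark provider (toy assumption).
_key_rng = random.Random(0)
SECRET_KEY = [_key_rng.choice((-1.0, 1.0)) for _ in range(N)]
STRENGTH = 0.2    # how strongly the key is added to an image
THRESHOLD = 0.05  # detection threshold on the correlation score


def watermark_api(image):
    """Black-box embedder: callable by the attacker, internals hidden."""
    return [p + STRENGTH * k for p, k in zip(image, SECRET_KEY)]


def detect_api(image):
    """Black-box detector: correlates the mean-centered image with the key."""
    mean = sum(image) / len(image)
    score = sum((p - mean) * k for p, k in zip(image, SECRET_KEY)) / len(image)
    return score > THRESHOLD


def spoof(real_image, alpha=0.5, seed=1):
    """Blend a watermarked noise image into a real image (attacker side)."""
    rng = random.Random(seed)
    noise = [rng.random() for _ in range(len(real_image))]
    watermarked_noise = watermark_api(noise)  # the only black-box query
    return [(1 - alpha) * r + alpha * w
            for r, w in zip(real_image, watermarked_noise)]


real = [i / N for i in range(N)]  # stand-in for a real, non-watermarked image
spoofed = spoof(real)
print(detect_api(real), detect_api(spoofed))
```

With these toy settings, the real image should fall below the detection threshold while the blended image should exceed it, because the blend carries roughly half of the embedded watermark signal. Real watermarking schemes and detectors are far more complex, so this only shows the shape of the attack.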

The primary contributions of the research have been summarized as follows.

  1. The study has identified a fundamental trade-off between evasion and spoofing errors in image watermarking when subjected to a diffusion purification attack.
  2. A model substitution adversarial attack has been developed to effectively remove watermarks in high-perturbation image watermarking methods, i.e., methods that significantly alter the original images.
  3. Spoofing attacks against watermarking methods have been demonstrated, in which watermarked noise images are added to non-watermarked ones, potentially damaging the developers’ reputation.
  4. A trade-off between the robustness and reliability of deepfake detectors has been identified.

In conclusion, this study clarifies the difficulties and weaknesses of AI image detectors, notably watermarking techniques, in the face of malicious attacks and a growing volume of AI-generated material. It emphasizes how crucial it is to keep developing and improving detection methods in the generative AI era in order to overcome these challenges.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
