AI Researchers From UC San Diego Introduce A Method To Bypass Deepfake Detectors By Adversarially Modifying Fake Videos

The increasing circulation of fake videos across platforms, primarily social media, has raised concerns worldwide about the credibility of digital media. Adding to these concerns, scientists have shown that attackers can bypass deepfake detectors.

Researchers demonstrated for the first time at the WACV 2021 conference, held January 5–9, 2021, that deepfake detectors can be deceived. Deepfakes are videos in which real-life footage is manipulated using artificial intelligence. According to the researchers, detectors can be defeated by inserting slightly perturbed inputs, called adversarial examples, into every video frame; these small manipulations cause AI systems such as machine learning models to misclassify. The attack can be carried out even on compressed videos.
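As a toy illustration of the general idea (not the authors' exact attack), the classic Fast Gradient Sign Method nudges each pixel slightly in the direction that increases a model's loss, often flipping its decision while leaving the image visually unchanged. In this sketch, `grad_wrt_input` is a hypothetical stand-in for the gradient of the detector's loss with respect to the input image:

```python
import numpy as np

def fgsm_perturb(image, grad_wrt_input, epsilon=0.01):
    """Fast Gradient Sign Method sketch: shift each pixel by +/- epsilon
    in the direction that increases the detector's loss."""
    adv = image + epsilon * np.sign(grad_wrt_input)
    return np.clip(adv, 0.0, 1.0)  # keep pixel values in a valid range
```

The perturbation budget `epsilon` controls the trade-off between how imperceptible the change is and how reliably it fools the model.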

In a deepfake, a subject's face is typically modified to create realistic footage of events that never occurred. Typical deepfake detectors therefore focus on the faces in a video: they track each face and pass the cropped face data to a neural network, which determines whether the video is real or fake.
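A face-focused detection pipeline of the kind described above might be sketched as follows, where `face_detector` and `classifier` are hypothetical stand-ins for a face tracker and a frame-level neural network:

```python
def detect_deepfake(frames, face_detector, classifier, threshold=0.5):
    """Sketch of a face-focused detector: crop the faces in each frame,
    score each crop with a classifier, and average the 'fake' scores."""
    scores = [classifier(face)              # probability this face is fake
              for frame in frames
              for face in face_detector(frame)]
    if not scores:                          # no faces found: nothing to judge
        return False
    return sum(scores) / len(scores) > threshold
```

Because the verdict is aggregated from per-face scores, an attacker who can perturb each face crop can steer the final decision, which is exactly the surface the researchers' attack targets.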

In their approach, the researchers crafted an adversarial example for every face in every video frame. The examples were built to withstand the compression and resizing operations that would otherwise destroy them. The algorithm achieves this by estimating, over a set of input transformations, how the model classifies images as real or fake, and then uses this estimate to perturb images so that they remain adversarial even after compression or resizing. Each modified face is inserted back into its video frame, and the process is repeated across frames to produce the adversarial deepfake video.
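The robustness step can be sketched as an Expectation-over-Transformations loop: at each step, the attack gradient is averaged across a set of input transforms (stand-ins for compression and resizing) so the perturbation survives them. This is a toy NumPy sketch, not the paper's implementation; `grad_fn` is a hypothetical function returning the gradient of the detector's loss with respect to its input, and the transforms' Jacobians are ignored for simplicity:

```python
import numpy as np

def eot_attack(image, grad_fn, transforms, epsilon=0.03, alpha=0.005, steps=10):
    """Toy Expectation-over-Transformations attack: each step averages the
    loss gradient over the given transforms, then takes a sign-gradient step
    while keeping the total perturbation within +/- epsilon."""
    adv = image.copy()
    for _ in range(steps):
        g = np.zeros_like(adv)
        for t in transforms:
            g += grad_fn(t(adv))            # gradient on the transformed input
        adv = adv + alpha * np.sign(g / len(transforms))
        adv = np.clip(adv, image - epsilon, image + epsilon)  # bounded change
        adv = np.clip(adv, 0.0, 1.0)        # keep pixels in a valid range
    return adv
```

Averaging over transforms means the perturbation is optimized to fool the detector not just on the exact pixels it was computed for, but on the compressed and resized versions a real video pipeline would produce.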

The researchers tested their attacks under two scenarios: a white-box setting, in which attackers have complete access to the detector model, and a black-box setting, in which attackers can only query the machine learning model. For uncompressed videos in the white-box setting, the attack's success rate was above 99 percent, whereas for compressed videos it was around 84.96 percent. In the black-box setting, the success rates were 86.43 percent for uncompressed and 78.33 percent for compressed videos.
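In the query-only setting, an attacker cannot read the model's gradients directly but can estimate them from the scores the detector returns. The following is a toy sketch of such a black-box gradient estimate using an NES-style finite-difference scheme (one common approach to query-based attacks, not necessarily the paper's exact estimator); `score_fn` is a hypothetical stand-in for querying the detector:

```python
import numpy as np

def nes_gradient_estimate(image, score_fn, sigma=0.001, n_samples=50, rng=None):
    """Estimate the gradient of a black-box detector's 'fake' score by
    querying it on randomly perturbed copies of the input."""
    if rng is None:
        rng = np.random.default_rng(0)
    grad = np.zeros_like(image)
    for _ in range(n_samples):
        u = rng.standard_normal(image.shape)          # random probe direction
        grad += u * (score_fn(image + sigma * u)
                     - score_fn(image - sigma * u))   # two queries per probe
    return grad / (2 * sigma * n_samples)
```

Each probe costs two detector queries, which is why black-box attacks tend to have somewhat lower success rates under a query budget than white-box ones, consistent with the numbers above.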

This research highlights that current deepfake detection methods can be bypassed if attackers have even partial information about the detector, demonstrating that attacks on deepfake detectors are a real-world threat. The researchers argue that it is imperative to evaluate deepfake detectors against an adaptive adversary who is aware of these weaknesses, and they suggest improving detectors with an approach similar to adversarial training: the adaptive adversary keeps generating new deepfakes that bypass the detector, and the detector keeps improving as it trains against them.