# Exploring Adversarial Fake Images on Face Manifold

### Abstract

Images synthesized by powerful methods based on generative adversarial networks (GANs) have raised moral and privacy concerns. Although image forensic models achieve high accuracy in distinguishing fake images from real ones, they can easily be fooled by a simple adversarial attack. However, adversarial samples created by adding noise also arouse suspicion. In this paper, instead of adding adversarial noise, we optimally search for adversarial points on the face manifold to generate anti-forensic fake face images. We iteratively perform gradient descent with small steps in the latent space of a generative model, e.g. Style-GAN, to find an adversarial latent vector; this is similar to a norm-based adversarial attack, but carried out in latent space. The fake images that the GAN generates from these adversarial latent vectors can then defeat mainstream forensic models. For example, they reduce the accuracy of deepfake detection models based on Xception or EfficientNet from over 90% to nearly 0%, while maintaining high visual quality. In addition, we find that manipulating the style vector $z$ or the noise vectors $n$ at different levels affects the attack success rate. The generated adversarial images mainly exhibit changes in facial texture or facial attributes.
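
The search described above can be written as an optimization problem. The following is a minimal sketch under assumed notation, not the paper's exact formulation: $G$ denotes the generator, $F$ the target forensic classifier, $\mathcal{L}$ its classification loss, $y_{\text{fake}}$ the "fake" label and $\alpha$ a step size; these symbols are introduced here for illustration only.

```latex
\begin{aligned}
(z^{*}, n^{*}) &= \arg\max_{z,\,n}\; \mathcal{L}\big(F(G(z, n)),\, y_{\text{fake}}\big),\\
z_{t+1} &= z_t + \alpha\, \nabla_{z}\, \mathcal{L}\big(F(G(z_t, n_t)),\, y_{\text{fake}}\big),\\
n_{t+1} &= n_t + \alpha\, \nabla_{n}\, \mathcal{L}\big(F(G(z_t, n_t)),\, y_{\text{fake}}\big).
\end{aligned}
```

Writing the same update as gradient descent on $-\mathcal{L}$ gives the "gradient descent" phrasing used above. Optimizing only $z$, only $n$, or both corresponds to the settings compared in the abstract; any norm constraint or stopping rule the paper uses is not specified here.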

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR, Oral)

We propose a novel method to generate adversarial anti-forensic images that can bypass deep forensic models.

• Overall pipeline of our method: we iteratively update the latent vector and the noise inputs of Style-GAN, separately or jointly, using gradient steps that maximize the loss of the target forensic model(s).
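
A toy illustration of the latent-space search, assuming hypothetical stand-ins for both networks: `generate` is a linear map in place of Style-GAN, and `p_fake` is a logistic classifier in place of an Xception or EfficientNet detector. Only the latent vector is optimized here; noise inputs would be updated the same way by differentiating with respect to them. All names and hyperparameters are illustrative, and a real implementation would rely on an autodiff framework rather than hand-written gradients.

```python
import math

# Hypothetical toy stand-ins (NOT Style-GAN or a real forensic model):
# a linear "generator" G(z) = W z and a logistic detector
# F(x) = sigmoid(w . x + b), which returns P(fake | x).

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def generate(W, z):
    """Toy generator: each row of W produces one pixel of x = W z."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]

def p_fake(w, b, x):
    """Toy detector: probability that image x is fake."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def latent_attack(W, w, b, z0, step=0.1, max_iters=200):
    """Gradient ascent on z to maximize L = -log P(fake | G(z)).

    Stops as soon as the detector's P(fake) drops below 0.5.
    """
    z = list(z0)
    for _ in range(max_iters):
        x = generate(W, z)
        p = p_fake(w, b, x)
        if p < 0.5:
            break
        # dL/dx = (p - 1) * w  for L = -log(sigmoid(w . x + b))
        grad_x = [(p - 1.0) * wi for wi in w]
        # Chain rule through the linear generator: dL/dz = W^T dL/dx
        grad_z = [sum(W[i][j] * grad_x[i] for i in range(len(W)))
                  for j in range(len(z))]
        # Small ascent step in latent space; the image is always G(z)
        z = [zj + step * gj for zj, gj in zip(z, grad_z)]
    return z
```

With `W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]`, `w = [1.0, 1.0, 1.0]`, `b = 0.0` and `z0 = [1.0, 1.0]`, the toy detector initially assigns a fake probability of about 0.95, and the returned latent vector drives it below 0.5. Because every candidate image is produced by the generator, the search moves along the generator's output space instead of adding free-form pixel noise.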

• Figure 2 shows adversarial images generated by different methods. Upper left: the original Style-GAN-generated image. Upper right: the image generated by our method. Lower left and lower right: adversarial images generated by the FGSM[8] and PGD[21] $L_\infty$ norm-based attacks, respectively, under the same perturbation level. Although all of these images bypass the target forensic model, the changes introduced by our method are much less noticeable to human eyes.
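
For comparison, the pixel-space baselines in Figure 2 can be sketched as below. This again uses a hypothetical logistic detector as a stand-in for the real forensic model: `fgsm` takes one signed-gradient step of size `eps`, while `pgd_linf` takes repeated signed steps of size `alpha` and projects back into the $L_\infty$ ball of radius `eps` (the random start often used with PGD is omitted for simplicity). These are generic textbook forms, not the paper's exact attack configurations.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def p_fake(w, b, x):
    """Hypothetical toy detector: P(fake | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def grad_loss_x(w, b, x):
    """Gradient of L = -log P(fake | x) with respect to the pixels x."""
    p = p_fake(w, b, x)
    return [(p - 1.0) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(w, b, x, eps):
    """FGSM: one step of size eps along sign(grad), then clip to [0, 1]."""
    g = grad_loss_x(w, b, x)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]

def pgd_linf(w, b, x0, eps, alpha, steps):
    """PGD (L-infinity): repeated signed steps, each followed by projection
    onto the eps-box around x0 and clipping to the valid pixel range."""
    x = list(x0)
    for _ in range(steps):
        g = grad_loss_x(w, b, x)
        x = [clip(xi + alpha * sign(gi), x0i - eps, x0i + eps)
             for xi, gi, x0i in zip(x, g, x0)]
        x = [clip(xi, 0.0, 1.0) for xi in x]
    return x
```

With `w = [2.0, -1.0, 3.0]`, `b = -1.0` and `x0 = [0.6, 0.4, 0.5]`, `fgsm(w, b, x0, 0.3)` returns approximately `[0.3, 0.7, 0.2]`, which the toy detector scores below 0.5. Here every pixel is perturbed directly, which is the kind of additive noise the latent-space approach avoids.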

• Table 1 reports the accuracy of different forensic models on images produced by our method and by other adversarial attacks (PGD $L_2$, PGD $L_\infty$ and FGSM). Our method bypasses the forensic detectors as effectively as the norm-based PGD $L_\infty$ attack and is stronger than the FGSM and PGD $L_2$ attacks. PGD $L_2$ performs poorly on both models because of its limited perturbation scale.
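
The PGD $L_2$ baseline differs from PGD $L_\infty$ only in its step and projection, as in the sketch below (same hypothetical toy detector; names and hyperparameters are illustrative). It also includes `detection_accuracy`, a hypothetical helper that, assuming the evaluation set contains only adversarial fake images, counts accuracy as the fraction of images still flagged as fake.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def p_fake(w, b, x):
    """Hypothetical toy detector: P(fake | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def grad_loss_x(w, b, x):
    """Gradient of L = -log P(fake | x) with respect to the pixels x."""
    p = p_fake(w, b, x)
    return [(p - 1.0) * wi for wi in w]

def l2(v):
    return math.sqrt(sum(vi * vi for vi in v))

def pgd_l2(w, b, x0, eps, alpha, steps):
    """PGD (L2): step along the normalized gradient, then project the total
    perturbation onto the L2 ball of radius eps around x0."""
    x = list(x0)
    for _ in range(steps):
        g = grad_loss_x(w, b, x)
        gn = l2(g)
        if gn == 0.0:
            break
        x = [xi + alpha * gi / gn for xi, gi in zip(x, g)]
        delta = [xi - x0i for xi, x0i in zip(x, x0)]
        dn = l2(delta)
        if dn > eps:
            # Rescale the perturbation back onto the ball's surface
            delta = [d * eps / dn for d in delta]
        # Clipping to [0, 1] can only shrink the perturbation further
        x = [min(1.0, max(0.0, x0i + d)) for x0i, d in zip(x0, delta)]
    return x

def detection_accuracy(probs_fake, threshold=0.5):
    """Accuracy on an all-fake test set: fraction still flagged as fake."""
    return sum(p >= threshold for p in probs_fake) / len(probs_fake)
```

One way to see the "limited perturbation scale": an $L_\infty$ ball of radius $\epsilon$ in $d$ dimensions contains perturbations whose $L_2$ norm reaches $\epsilon\sqrt{d}$, so for a hypothetical $3\times256\times256$ image ($\sqrt{d}\approx 443$) the same numeric $\epsilon$ permits a far larger total change under $L_\infty$ than under $L_2$. How the paper matched budgets across norms is not stated here.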

• Table 2 reports metrics measuring the distortion between adversarial images and reference images. The proposed method performs similarly to FGSM in MSE, PSNR and SSIM, while outperforming the other methods in LPIPS and the user study by a large margin.
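
Of the distortion metrics in Table 2, MSE and PSNR are simple to compute. The sketch below works on flat pixel lists scaled to `[0, 1]` (an assumption; pass `max_val=255` for 8-bit images). SSIM and LPIPS are omitted: SSIM compares local luminance, contrast and structure statistics, and LPIPS requires a pretrained network.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    m = mse(a, b)
    if m == 0.0:
        # Identical images: no distortion, PSNR is unbounded
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / m)
```

For example, `psnr([0.0] * 4, [0.1] * 4)` is about 20 dB, since the MSE is 0.01. Lower MSE and higher PSNR indicate less distortion relative to the reference image.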

## Citation

@InProceedings{Li_2021_CVPR,
author    = {Li, Dongze and Wang, Wei and Fan, Hongxing and Dong, Jing},
title     = {Exploring Adversarial Fake Images on Face Manifold},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month     = {June},
year      = {2021},
pages     = {5789-5798}
}