Detecting Compressed Deepfake Videos in Social Networks Using Frame-Temporality Two-Stream Convolutional Network


The development of technologies that can generate Deepfake videos is expanding rapidly. These videos are easily synthesized without leaving obvious traces of manipulation. Although forensic detection on high-definition video datasets has achieved remarkable results, the forensics of compressed videos is worth further exploration. In fact, compressed videos are common in social networks, such as videos from Instagram, WeChat, and TikTok. Therefore, how to identify compressed Deepfake videos becomes a fundamental issue. In this paper, we propose a two-stream method that analyzes compressed Deepfake videos at the frame level and the temporality level. Since video compression introduces substantial redundant information into frames, the proposed frame-level stream gradually prunes the network to prevent the model from fitting the compression noise. To address the problem that temporal consistency in Deepfake videos might be ignored, we apply a temporality-level stream to extract temporal correlation features. When the scores from the two streams are combined, our proposed method outperforms state-of-the-art methods in compressed Deepfake video detection.
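The abstract states that the final decision combines scores from the frame-level and temporality-level streams. As a minimal illustration, the sketch below shows one common way such score-level (late) fusion is done, via a convex combination; the weighting rule, the `alpha` parameter, and the function names are assumptions for illustration, not the authors' exact fusion scheme.

```python
def fuse_scores(frame_score: float, temporal_score: float,
                alpha: float = 0.5) -> float:
    """Convex combination of the two per-stream fake probabilities.

    alpha weights the frame-level stream; (1 - alpha) weights the
    temporality-level stream. alpha = 0.5 is an assumed default.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * frame_score + (1.0 - alpha) * temporal_score


def is_fake(frame_score: float, temporal_score: float,
            threshold: float = 0.5) -> bool:
    """Label a video as Deepfake when the fused score exceeds a threshold."""
    return fuse_scores(frame_score, temporal_score) > threshold
```

With equal weights, a video whose frame-level stream outputs 0.8 and whose temporality-level stream outputs 0.4 receives a fused score of 0.6 and is labeled fake under the default threshold.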

IEEE Transactions on Circuits and Systems for Video Technology