Cross-Modal Artifact Mining for Generalizable Deepfake Detection in the Wild

Authors

  • Haojun Weng, Computer Technology, Fudan University, Shanghai, China
  • Ye Lei, Applied Mathematics, Columbia University, NY, USA

DOI:

https://doi.org/10.63575/CIA.2024.20208

Keywords:

Deepfake Detection, Cross-Modal Learning, Frequency Domain Analysis, Generalization

Abstract

The proliferation of deepfake content poses unprecedented threats to digital security and information integrity. Existing detection methods suffer from significant performance degradation when confronting cross-dataset scenarios and real-world manipulated media. This paper proposes a novel cross-modal artifact mining framework that integrates frequency-domain analysis with audio-visual consistency verification for enhanced generalization capability. Our approach employs adaptive high-frequency enhancement modules coupled with discrete cosine transform feature extraction to capture subtle manipulation artifacts. The cross-modal attention fusion mechanism effectively leverages temporal alignment inconsistencies between audio and visual streams. Through comprehensive evaluation on six benchmark datasets, our method achieves superior cross-dataset generalization performance with 89.7% average accuracy and demonstrates robust detection capability against diffusion-generated deepfakes. Extensive experiments validate the effectiveness of each proposed component through ablation studies, while robustness analysis confirms resilience against adversarial perturbations and compression artifacts encountered in real-world deployment scenarios.
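The abstract's frequency-domain branch rests on block-wise discrete cosine transform features that emphasize high-frequency content. The sketch below is not the authors' code; it is a minimal illustration of that idea, where the function name, the 8x8 block size, and the `keep_ratio` cutoff are all illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): block-wise DCT
# high-frequency feature extraction of the kind a frequency-domain
# deepfake-detection branch might use. Block size and cutoff are assumptions.
import numpy as np
from scipy.fftpack import dct

def block_dct_highfreq_features(gray_face, block=8, keep_ratio=0.5):
    """Split a grayscale face crop into blocks, apply a 2-D DCT, and keep
    only the higher-frequency coefficients of each block."""
    h, w = gray_face.shape
    h, w = h - h % block, w - w % block          # trim to whole blocks
    cutoff = int(block * (1 - keep_ratio))       # coefficients beyond this index count as high frequency
    feats = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            patch = gray_face[i:i + block, j:j + block].astype(np.float32)
            # separable 2-D DCT: transform rows, then columns
            coeffs = dct(dct(patch, axis=0, norm='ortho'), axis=1, norm='ortho')
            mask = np.zeros_like(coeffs, dtype=bool)
            mask[cutoff:, :] = True
            mask[:, cutoff:] = True              # keep rows/columns past the cutoff
            feats.append(coeffs[mask])
    return np.concatenate(feats)                 # 1-D high-frequency descriptor

# Example: a random 128x128 array stands in for a real face crop.
descriptor = block_dct_highfreq_features(np.random.rand(128, 128))
print(descriptor.shape)
```

In practice such a descriptor would feed a learned classifier alongside the audio-visual consistency features described in the paper; the cutoff controls how aggressively low-frequency (content) information is discarded in favor of manipulation artifacts.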

Published

2024-07-26

How to Cite

[1] Haojun Weng and Ye Lei, “Cross-Modal Artifact Mining for Generalizable Deepfake Detection in the Wild”, Journal of Computing Innovations and Applications, vol. 2, no. 2, pp. 78–87, Jul. 2024, doi: 10.63575/CIA.2024.20208.