Cross-Modal Artifact Mining for Generalizable Deepfake Detection in the Wild
DOI: https://doi.org/10.63575/CIA.2024.20208

Keywords: Deepfake Detection, Cross-Modal Learning, Frequency Domain Analysis, Generalization

Abstract
The proliferation of deepfake content poses unprecedented threats to digital security and information integrity. Existing detection methods suffer significant performance degradation when confronted with cross-dataset scenarios and real-world manipulated media. This paper proposes a novel cross-modal artifact mining framework that integrates frequency-domain analysis with audio-visual consistency verification to improve generalization. Our approach employs adaptive high-frequency enhancement modules coupled with discrete cosine transform feature extraction to capture subtle manipulation artifacts. A cross-modal attention fusion mechanism then exploits temporal alignment inconsistencies between the audio and visual streams. In a comprehensive evaluation on six benchmark datasets, our method achieves superior cross-dataset generalization with 89.7% average accuracy and demonstrates robust detection of diffusion-generated deepfakes. Ablation studies validate the contribution of each proposed component, while robustness analysis confirms resilience against adversarial perturbations and the compression artifacts encountered in real-world deployment.
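To make the frequency-domain pipeline described above concrete, the following is a minimal sketch of DCT-based high-frequency feature extraction with a simple energy-scaled enhancement step. It is an illustrative stand-in, not the paper's actual module: the cutoff mask, the `gamma` scaling rule, and the function names `high_frequency_dct_features` and `adaptive_enhancement` are all hypothetical choices made for this example.

```python
import numpy as np
from scipy.fft import dctn, idctn


def high_frequency_dct_features(image: np.ndarray, cutoff: float = 0.25) -> np.ndarray:
    """Keep only high-frequency DCT coefficients of a grayscale image.

    A simple anti-diagonal mask retains coefficients whose normalized
    frequency indices sum to more than `cutoff`, approximating the band
    where blending and upsampling artifacts tend to concentrate.
    """
    coeffs = dctn(image.astype(np.float64), norm="ortho")
    h, w = coeffs.shape
    # Normalized frequency position relative to the DC term (top-left).
    yy, xx = np.meshgrid(np.arange(h) / h, np.arange(w) / w, indexing="ij")
    mask = (yy + xx) > cutoff  # True for the high-frequency band
    return coeffs * mask


def adaptive_enhancement(coeffs: np.ndarray, gamma: float = 1.5) -> np.ndarray:
    """Amplify coefficients in proportion to their relative magnitude.

    A toy stand-in for 'adaptive high-frequency enhancement': stronger
    coefficients receive a larger boost, weak ones stay near unchanged.
    """
    energy = np.abs(coeffs)
    scale = 1.0 + gamma * energy / (energy.max() + 1e-8)
    return coeffs * scale


if __name__ == "__main__":
    frame = np.random.rand(256, 256)  # placeholder for a face crop
    hf = adaptive_enhancement(high_frequency_dct_features(frame))
    # Invert back to the pixel domain to obtain a high-frequency
    # residual that a CNN backbone could consume as an extra channel.
    residual = idctn(hf, norm="ortho")
    print(residual.shape)  # (256, 256)
```

In a full detector, such a residual would typically be stacked with the RGB input (or fed to a separate branch) before the cross-modal attention fusion stage described in the abstract.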


