
SAMM-D: Enhancing DeepFake Detection with Subjective Assessment and Multimodal Fusion

This repository contains the system implementations for the SAMM-D submission to AAAI 2025.

About

DeepFakes are becoming common in content creation, for example in avatar video translation and Virtual Human Assistants. Video DeepFakes are produced with deep generative models (Generative AI) by swapping the target's face with the source's or by re-synthesizing lip movements to match source audio. Similarly, audio DeepFakes clone a voice from source audio onto target audio, making it difficult for humans to tell real and fake apart. Previous DeepFake detection works detect manipulation artifacts in images, videos, or audio using uni-modal or multi-modal models, but few use text to guide the model through user prompts. Vision-Language Models (VLMs) let users interact with, guide, and query a model through prompts. In this work, we propose SAMM-D, a user-interactive vision-language method. By simultaneously learning frequency-domain representations of video, audio, and text, our method gives users control and interactive capabilities. We employ a pre-trained CLIP text encoder to embed the user's assessment and correlate it with the representation of each modality, computing a similarity with the corresponding user prompt to detect DeepFakes. This equips users with the capability to guide the model toward discrepancies and to identify fake artifacts. Prompt-guided multi-modal fusion yields notable gains in accuracy, Average Precision (AP), and Area Under the Curve (AUC) for DeepFake detection when training on FakeAVCeleb and testing on DFDC; the proposed approach reaches the highest accuracy and AUC, up to 100%. Moreover, a subjective assessment conducted at inference time further improves the efficiency of the approach.

Getting Started

Setting up the environment:

conda create -n svdd_baseline python=3.10
conda activate svdd_baseline
pip install -r requirements.txt

Then you can run the training script with the following command:

python train_fft.py --base_dir {Where the data is} --gpu {GPU ID} --encoder {Encoder Type} --batch_size {Batch size}

You can use the --load_from flag to resume training from a saved checkpoint, as sketched below.
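For example, a minimal sketch of a first run and a resumed run. The data path, encoder name, batch size, and checkpoint path below are hypothetical placeholders, not values shipped with the repository:

python train_fft.py --base_dir /path/to/data --gpu 0 --encoder rawnet --batch_size 32   # hypothetical values
python train_fft.py --base_dir /path/to/data --gpu 0 --encoder rawnet --batch_size 32 --load_from /path/to/checkpoint.pt   # resume from a checkpoint (hypothetical path)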

After training, you can evaluate your model using the following command:

python eval.py --base_dir {Where the data is} --model_path {The model's weights file} --gpu {GPU ID} --encoder {Encoder Type} --batch_size {Batch size}
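For instance, again with hypothetical placeholder values; the encoder type should match the one used during training:

python eval.py --base_dir /path/to/data --model_path /path/to/model_weights.pt --gpu 0 --encoder rawnet --batch_size 32   # hypothetical values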

The main functions in the training and evaluation scripts expose additional options that you can tune.

Within base_dir, the code expects train_set, dev_set, and test_set directories, along with the open-sourced train.txt and dev.txt files. The train_set, dev_set, and test_set directories should directly contain *.flac files, as sketched below.
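The expected layout under base_dir should therefore look roughly like this:

base_dir/
├── train_set/    # *.flac files
├── dev_set/      # *.flac files
├── test_set/     # *.flac files
├── train.txt
└── dev.txt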

Visualize Training Logs of Provided Baseline Systems

Run the following commands within the CtrSAMMD_dataset_Baseline directory.

pip install tensorboard
tensorboard --logdir weights/training_logs
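TensorBoard then prints a local URL (http://localhost:6006 by default) where the training curves can be browsed.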
