Yingying Fan, Yu Wu, Bo Du and Yutian Lin

Code for the NeurIPS 2023 paper "Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective".
- Python 3.7+
- Install CLIP and LAION-CLAP, then install the remaining dependencies:
pip install -r requirement.txt
- ResNet and VGGish features can be downloaded from Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing. We also provide visual features extracted by CLIP and audio features extracted by LAION-CLAP.
- Put the downloaded features into data/feats/.
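The loaders read per-video feature arrays from data/feats/. The exact file names and shapes depend on which extractor you downloaded; a minimal sketch of the assumed layout (hypothetical file name, segment count, and embedding size) with NumPy:

```python
import numpy as np

# Hypothetical example: a per-video visual feature file such as a CLIP
# ViT-B/16 extractor might produce -- one row per one-second segment.
# The real files in data/feats/ may use different names and shapes.
feats = np.random.randn(10, 512).astype(np.float32)  # 10 segments, 512-d
np.save("example_video.npy", feats)

# Training code would load the array back per video id.
loaded = np.load("example_video.npy")
print(loaded.shape)  # (10, 512)
```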
- We use CLIP (ViT-B/16) and LAION-CLAP pre-trained on AudioSet.
- Generate the denoised labels:
python main.py --mode label_denoise --language refine_label/denoised_label.npz --refine_label refine_label/final_label.npz
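The denoising step writes the refined labels to an `.npz` archive that training later reads via `--refine_label`. A sketch of inspecting such an archive with NumPy (the key names and label encoding here are assumptions, not the repo's actual schema):

```python
import numpy as np

# Hypothetical layout: one multi-hot event-label vector per video id.
# The real keys and vector length in final_label.npz may differ.
labels = {"video_0001": np.array([0, 1, 0, 1], dtype=np.int64)}
np.savez("final_label_demo.npz", **labels)

archive = np.load("final_label_demo.npz")
for name in archive.files:
    print(name, archive[name])  # video id and its label vector
```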
- ResNet and VGGish features:
python main.py --mode train_model --num_layers 6 --lr 8e-5 --refine_label refine_label/final_label.npz --save_model true --checkpoint LSLD.pt
- CLAP and CLIP features (recommended):
python main.py --mode train_model --num_layers 4 --lr 2e-4 --refine_label refine_label/final_label.npz --save_model true --checkpoint LSLD.pt
- We provide the pre-trained model at this Link.
python main.py --mode test_LSLD --checkpoint LSLD.pt
@article{fan2023revisit,
  title={Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective},
  author={Fan, Yingying and Wu, Yu and Du, Bo and Lin, Yutian},
  journal={arXiv preprint arXiv:2306.00595},
  year={2023}
}