This is the official implementation of ATST-RCT.
ATST is a self-supervised pretraining model designed for clip-level audio tasks. Please refer to the ATST official page for more information.
RCT is a semi-supervised learning scheme designed for sound event detection. Please refer to the RCT official page for more information.
The training/validation data comes from the DESED dataset [1] used in DCASE2022 Task 4. Downloading DESED is rather tedious, and not all of the data can still be accessed; you can ask the DCASE committee for help in obtaining the full dataset. Note that your test results may differ if your validation set is incomplete.
To train the model, first clone the DCASE2022 Task 4 baseline [2]:
git clone git@github.com:DCASE-REPO/DESED_task.git
Set up your environment according to the baseline's requirements and install any required packages. Remember to change the dataset paths to point to your own copy.
Then overwrite the files of the official DESED_task repository with the ATST-RCT code from this repository.
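One possible way to carry out the overwrite step is sketched below. It assumes that this repository is checked out as ATST-RCT and the baseline as DESED_task (both directory names are hypothetical), and that the ATST-RCT files belong at the same relative paths inside DESED_task; check the layout of both repositories before running it. The overlay_repo helper is illustrative and not part of either project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every file from the ATST-RCT checkout into the
# DESED_task checkout, overwriting files that share the same relative path.
# The .git directory is excluded so the DESED_task history stays intact.
overlay_repo() {
  local src="$1"  # path to the ATST-RCT repository (assumed layout)
  local dst="$2"  # path to the cloned DESED_task repository
  if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
    echo "usage: overlay_repo <atst_rct_dir> <desed_task_dir>" >&2
    return 1
  fi
  # Stream a tar archive of src (minus .git) and unpack it inside dst.
  # Files present in both are overwritten; files only in dst are kept.
  (cd "$src" && tar --exclude=.git -cf - .) | (cd "$dst" && tar -xf -)
}
```

Usage: overlay_repo ./ATST-RCT ./DESED_task. Streaming through tar rather than using a plain cp -r makes it easy to skip the .git directory; rsync -a --exclude .git would do the same if rsync is installed.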
The ATST pretrained model can be downloaded from the following link (password: 2022):
https://pan.baidu.com/s/1Nh6Na1azs6lNKPBstBiStw
Also set the path of the pretrained model in the configuration file. Then, to train your own model, run:
python train_fusion_rct.py
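The dataset paths and the pretrained model path are edited by hand in the configuration file before running the command above. If you prefer to script this, the sketch below rewrites a single key in a YAML file. It assumes the configuration is YAML with one key: value entry per line; the set_yaml_path helper is hypothetical, and the file and key names are placeholders, so look up the real ones in the ATST-RCT configuration first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY to VALUE in a line-oriented YAML file,
# keeping the original indentation. Every line whose key matches is updated,
# and a backup copy of the original file is written to FILE.bak.
# Limitations: VALUE must not contain '|', '&' or backslashes (they are
# special in the sed replacement below), and KEY is used as a regex.
set_yaml_path() {
  local file="$1" key="$2" value="$3"
  if ! grep -Eq "^[[:space:]]*${key}:" "$file"; then
    echo "key '${key}' not found in ${file}" >&2
    return 1
  fi
  # Keep "<indent>key:" and replace everything after it with the quoted value.
  sed -E -i.bak "s|^([[:space:]]*${key}:).*|\1 \"${value}\"|" "$file"
}
```

Usage (placeholders): set_yaml_path path/to/config.yaml some_model_path_key /path/to/atst_model. A line-based sed edit keeps comments and formatting in the rest of the file untouched, which a load-and-dump round trip through a YAML library would not.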
The challenge results have not been published yet; please refer to the official DCASE page.
[1] DESED Dataset: https://github.com/turpaultn/DESED
[2] DCASE2022 Task4 baseline: https://github.com/DCASE-REPO/DESED_task
[3] FilterAug: https://github.com/frednam93/FilterAugSED