Merge pull request #19 from nttcslab/2024t2/v3.x.x
Made DCASE2024 Task2 legacy
noboru2000 authored Aug 8, 2024
2 parents 68e8f1c + 9f98db6 commit ce6566a
Showing 23 changed files with 10,314 additions and 39 deletions.
68 changes: 48 additions & 20 deletions README.md
@@ -128,6 +128,16 @@ We will launch the datasets in three stages. Therefore, please download the data
+ section\_00\_test\_0000.wav
+ ...
+ section\_00\_test\_0199.wav
+ /test_rename (converted from the `test` directory using `tools/rename_eval_wav.py`)
+ /section\_00\_source\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
+ ...
+ /section\_00\_source\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
+ ...
+ /section\_00\_target\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
+ ...
+ /section\_00\_target\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
+ ...
+ attributes\_00.csv (attributes CSV for section 00)
+ \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)

### 4. Change parameters
@@ -238,7 +248,9 @@ After the evaluation dataset for the test is launched, download and unzip it. Mo
$ 02a_test_2024t2.sh -e
```

Anomaly scores are calculated using the evaluation dataset, i.e., `data/dcase2024t2/eval_data/raw/<machine_type>/test/`. The anomaly scores are stored as CSV files in the directory `results/`. You can submit the CSV files for the challenge. From the submitted CSV files, we will calculate AUC, pAUC, and your ranking.
Anomaly scores are calculated using the evaluation dataset, i.e., `data/dcase2024t2/eval_data/raw/<machine_type>/test/`. The anomaly scores are stored as CSV files in the directory `results/`. ~~You can submit the CSV files for the challenge. From the submitted CSV files, we will calculate AUC, pAUC, and your ranking.~~

If you use the [rename script](./tools/rename_eval_wav.py) to generate the `test_rename` directory, AUC and pAUC are also calculated.
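
For reference, the renaming can also be run by hand. A minimal sketch, assuming the dataset was unpacked under `data/` (the flags mirror the call at the end of `data_download_2024eval.sh`):

```
# Hedged sketch: run the rename script directly.
# --dataset_parent_dir=data is an assumption about your local layout;
# both flags mirror the call in data_download_2024eval.sh.
$ python tools/rename_eval_wav.py --dataset_parent_dir=data --dataset_type=DCASE2024T2
```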

### 9.2. Testing with the Selective Mahalanobis mode

@@ -248,7 +260,9 @@ After the evaluation dataset for the test is launched, download and unzip it. Mo
$ 02b_test_2024t2.sh -e
```

Anomaly scores are calculated using the evaluation dataset, i.e., `data/dcase2024t2/eval_data/raw/<machine_type>/test/`. The anomaly scores are stored as CSV files in the directory `results/`. You can submit the CSV files for the challenge. From the submitted CSV files, we will calculate AUC, pAUC, and your ranking.
Anomaly scores are calculated using the evaluation dataset, i.e., `data/dcase2024t2/eval_data/raw/<machine_type>/test/`. The anomaly scores are stored as CSV files in the directory `results/`. ~~You can submit the CSV files for the challenge. From the submitted CSV files, we will calculate AUC, pAUC, and your ranking.~~

If you use the [rename script](./tools/rename_eval_wav.py) to generate the `test_rename` directory, AUC and pAUC are also calculated.

### 10. Summarize results

@@ -268,7 +282,7 @@ If you want to change, summarize results directory or export directory, edit `03

## Legacy support

This version takes the legacy datasets provided in DCASE2020 task2, DCASE2021 task2, DCASE2022 task2, and DCASE2023 task2 dataset for inputs.
This version takes as inputs the legacy datasets provided in DCASE2020 task2, DCASE2021 task2, DCASE2022 task2, DCASE2023 task2, and DCASE2024 task2.
The legacy support scripts are similar to the main scripts and are located in the `tools` directory.

[learn more](README_legacy.md)
@@ -297,6 +311,18 @@ We developed and tested the source code on Ubuntu 20.04.4 LTS.
- fasteners == 0.18

## Change Log
### [3.3.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.3.0)

#### Made DCASE2024 Task2 legacy

- Added a link to the DCASE2024 task2 evaluator that calculates the official score.
- [dcase2024_task2_evaluator](https://github.com/nttcslab/dcase2024_task2_evaluator)
- Added DCASE2024 Task2 Ground Truth data.
- [DCASE2024 task2 Ground truth data](datasets/eval_data_list_2024.csv)
- Added DCASE2024 Task2 Ground truth attributes.
- [DCASE2024 task2 Ground truth Attributes](datasets/ground_truth_attributes)
- The legacy script has been updated to be compatible with DCASE2024 Task2.

### [3.2.3](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.2.3)

#### Updated citation in README
@@ -372,37 +398,39 @@ We developed and tested the source code on Ubuntu 20.04.4 LTS.

- Provides support for the legacy datasets used in DCASE2020, 2021, 2022, and 2023.

## Truth attribute of evaluation data

### Public ground truth
## Ground truth attribute

### Public ground truth of evaluation dataset

The following code was used to calculate the official score. These repositories also contain the ground truth for the evaluation datasets.

- [dcase2023_task2_evaluator](https://github.com/nttcslab/dcase2023_task2_evaluator)
- [dcase2024_task2_evaluator](https://github.com/nttcslab/dcase2024_task2_evaluator)

### In this repository
### Ground truth for evaluation datasets in this repository

This repository includes ground truth CSVs for the evaluation data. These CSVs are used to rename the evaluation dataset files.
Once the ground truth has been added to the file names, you can calculate AUC and other scores. *Usually, the rename function is executed along with the [download script](#description) and the [auto download function](#41-enable-auto-download-dataset).

- [DCASE2023 task2](datasets/eval_data_list_2023.csv)


## Truth attribute of evaluation data

### Public ground truth
- [DCASE2024 task2 ground truth](datasets/eval_data_list_2024.csv)

The following code was used to calculate the official score. These repositories also contain the ground truth for the evaluation datasets.

- [dcase2023_task2_evaluator](https://github.com/nttcslab/dcase2023_task2_evaluator)
### Ground truth attributes

### In this repository
Attribute information is hidden by default for the following machine types:

This repository includes ground truth CSVs for the evaluation data. These CSVs are used to rename the evaluation dataset files.
Once the ground truth has been added to the file names, you can calculate AUC and other scores. *Usually, the rename function is executed along with the [download script](#description) and the [auto download function](#41-enable-auto-download-dataset).
- dev data
- gearbox
- slider
- ToyTrain
- eval data
- AirCompressor
- BrushlessMotor
- HoveringDrone
- ToothBrush

- [DCASE2023 task2](datasets/eval_data_list_2023.csv)
You can view the hidden attributes in the following directory:

- [DCASE2024 task2 Ground truth Attributes](datasets/ground_truth_attributes)

## Citation

127 changes: 123 additions & 4 deletions README_legacy.md
@@ -1,6 +1,6 @@
# Legacy support

This version supports reading the datasets from DCASE2020 task2, DCASE2021 task2, DCASE2022 task2 and DCASE2023 task2 dataset for inputs.
This version supports reading the datasets from DCASE2020 task2, DCASE2021 task2, DCASE2022 task2, DCASE2023 task2, and DCASE2024 task2 as inputs.

## Description

@@ -20,6 +20,9 @@ Legacy-support scripts are similar to the main scripts. These are in `tools` dir
- tools/data\_download\_2023.sh
- This script downloads development data and evaluation data files and puts them into `data/dcase2023t2/dev_data/raw/` and `data/dcase2023t2/eval_data/raw/`.
- After downloading the dataset, this script renames the evaluation data so that AUC scores can be calculated. The renamed data is stored in `data/dcase2023t2/eval_data/raw/test_rename`
- tools/data\_download\_2024.sh
- This script downloads development data and evaluation data files and puts them into `data/dcase2024t2/dev_data/raw/` and `data/dcase2024t2/eval_data/raw/`.
- After downloading the dataset, this script renames the evaluation data so that AUC scores can be calculated. The renamed data is stored in `data/dcase2024t2/eval_data/raw/test_rename`


- tools/01\_train\_legacy.sh
@@ -43,6 +46,11 @@ Legacy-support scripts are similar to the main scripts. These are in `tools` dir
- This script trains a model for each machine type for each section ID by using the directory `data/dcase2023t2/dev_data/raw/<machine_type>/train/<section_id>`.
- "Evaluation" mode:
- This script trains a model for each machine type for each section ID by using the directory `data/dcase2023t2/eval_data/raw/<machine_type>/train/<section_id>`.
- DCASE2024 task2 mode:
- "Development" mode:
- This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/dev_data/raw/<machine_type>/train/<section_id>`.
- "Evaluation" mode:
- This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/eval_data/raw/<machine_type>/train/<section_id>`.


- tools/02a\_test\_legacy.sh (Use MSE as a score function for the Simple Autoencoder mode)
@@ -82,6 +90,15 @@ Legacy-support scripts are similar to the main scripts. These are in `tools` dir
- This script generates a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2023t2/eval_data/raw/<machine_type>/test/`. (These directories will be made available with the "evaluation dataset".)
- The generated CSV files are stored in the directory `results/`.
- If `test_rename` directory is available, this script generates a CSV file including AUC, pAUC, precision, recall, and F1-score for each section.
- DCASE2024 task2 mode:
- "Development" mode:
- This script generates a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`.
- The generated CSV files will be stored in the directory `results/`.
- It also generates a CSV file including AUC, pAUC, precision, recall, and F1-score for each section.
- "Evaluation" mode:
- This script generates a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2024t2/eval_data/raw/<machine_type>/test/`. (These directories will be made available with the "evaluation dataset".)
- The generated CSV files are stored in the directory `results/`.
- If `test_rename` directory is available, this script generates a CSV file including AUC, pAUC, precision, recall, and F1-score for each section.

- tools/02b\_test\_legacy.sh (Use Mahalanobis distance as a score function for the Selective Mahalanobis mode)
- "Development" mode:
@@ -110,6 +127,15 @@ Legacy-support scripts are similar to the main scripts. These are in `tools` dir
- This script generates a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2023t2/eval_data/raw/<machine_type>/test/`. (These directories will be made available with the "evaluation dataset".)
- The generated CSV files are stored in the directory `results/`.
- This script also generates a CSV file, containing AUC, pAUC, precision, recall, and F1-score for each section.
- DCASE2024 task2 mode:
- "Development" mode:
- This script generates a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`.
- The CSV files will be stored in the directory `results/`.
- It also generates a CSV file including AUC, pAUC, precision, recall, and F1-score for each section.
- "Evaluation" mode:
- This script generates a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2024t2/eval_data/raw/<machine_type>/test/`. (These directories will be made available with the "evaluation dataset".)
- The generated CSV files are stored in the directory `results/`.
- This script also generates a CSV file, containing AUC, pAUC, precision, recall, and F1-score for each section.
- 03_summarize_results.sh
- This script summarizes the results into a CSV file.
- Use it in the same way as when summarizing DCASE2023T2 and DCASE2024T2 results, as in the sketch below.
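
A usage sketch for reference; the `<dataset> <mode>` argument convention here is an assumption carried over from the train and test scripts:

```
# Hedged sketch: summarize development-mode results for DCASE2024T2.
# The argument convention is assumed to match the other legacy scripts.
$ bash 03_summarize_results.sh DCASE2024T2 -d
```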
@@ -147,6 +173,13 @@ Legacy scripts in `tools` directory can be executed regardless of the current di
+ Download "eval\_data_<machine_type>_train.zip" from [https://zenodo.org/record/7830345](https://zenodo.org/record/7830345).
+ "Evaluation Dataset", i.e., the evaluation dataset for test
+ Download "eval\_data_<machine_type>_test.zip" from [https://zenodo.org/record/7860847](https://zenodo.org/record/7860847).
+ DCASE2024T2
+ "Development Dataset"
+ Download "dev\_data_<machine_type>.zip" from [https://zenodo.org/records/10902294](https://zenodo.org/records/10902294).
+ "Additional Training Dataset", i.e., the evaluation dataset for training
+ Download "eval\_data_<machine_type>_train.zip" from [https://zenodo.org/records/11259435](https://zenodo.org/records/11259435).
+ "Evaluation Dataset", i.e., the evaluation dataset for test
+ Download "eval\_data_<machine_type>_test.zip" from [https://zenodo.org/records/11363076](https://zenodo.org/records/11363076).
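
Spelled out as commands, the DCASE2024T2 evaluation-set download might look like the following sketch, which mirrors the wget/unzip pattern used in `data_download_2024eval.sh`; the machine-type list is taken from the eval_data directory listing later in this README:

```
# Hedged sketch: download and unzip the DCASE2024T2 evaluation set.
for machine_type in 3DPrinter AirCompressor BrushlessMotor HairDryer \
    HoveringDrone RoboticArm Scanner ToothBrush ToyCircuit; do
    wget "https://zenodo.org/records/11363076/files/eval_data_${machine_type}_test.zip"
    unzip "eval_data_${machine_type}_test.zip"
done
```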


### 3. Unzip the downloaded files and make the directory structure as follows:
@@ -188,6 +221,7 @@ $ bash tools/01_train.sh DCASE2020T2 -d
- `DCASE2021T2`
- `DCASE2022T2`
- `DCASE2023T2`
- `DCASE2024T2`
- Second parameter
- `-d`
- `-e`
@@ -209,6 +243,7 @@ $ bash tools/02a_test_legacy.sh DCASE2020T2 -d
- `DCASE2021T2`
- `DCASE2022T2`
- `DCASE2023T2`
- `DCASE2024T2`
- Second parameter
- `-d`
- `-e`
@@ -228,6 +263,7 @@ $ bash tools/02b_test_legacy.sh DCASE2020T2 -d
- `DCASE2021T2`
- `DCASE2022T2`
- `DCASE2023T2`
- `DCASE2024T2`
- Second parameter
- `-d`
- `-e`
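
Putting these parameters together, a complete legacy run for DCASE2024 task2 might look like the following sketch (development mode shown; use `-e` for evaluation mode):

```
# Hedged sketch: train, then score with both score functions, combining the
# DCASE2024T2 first parameter with development mode (-d) as described above.
$ bash tools/01_train_legacy.sh DCASE2024T2 -d
$ bash tools/02a_test_legacy.sh DCASE2024T2 -d   # Simple Autoencoder (MSE)
$ bash tools/02b_test_legacy.sh DCASE2024T2 -d   # Selective Mahalanobis
```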
@@ -574,18 +610,82 @@ Note that the wav file's parent directory. At that time dataset directory is `de
- /ToyTank
- /Vacuum

## Truth attribute of evaluation data
### DCASE2024 task2
- dcase2023\_task2\_baseline\_ae
- /data/dcase2024t2/dev\_data/raw
- /bearing
- /train (only normal clips)
- /section\_00\_source\_train\_normal\_0001\_\<attribute\>.wav
- ...
- /section\_00\_source\_train\_normal\_0990\_\<attribute\>.wav
- /section\_00\_target\_train\_normal\_0001\_\<attribute\>.wav
- ...
- /section\_00\_target\_train\_normal\_0010\_\<attribute\>.wav
- /test
- /section\_00\_source\_test\_normal\_0001\_\<attribute\>.wav
- ...
- /section\_00\_source\_test\_normal\_0050\_\<attribute\>.wav
- /section\_00\_source\_test\_anomaly\_0001\_\<attribute\>.wav
- ...
- /section\_00\_source\_test\_anomaly\_0050\_\<attribute\>.wav
- /section\_00\_target\_test\_normal\_0001\_\<attribute\>.wav
- ...
- /section\_00\_target\_test\_normal\_0050\_\<attribute\>.wav
- /section\_00\_target\_test\_anomaly\_0001\_\<attribute\>.wav
- ...
- /section\_00\_target\_test\_anomaly\_0050\_\<attribute\>.wav
- attributes\_00.csv (attributes CSV for section 00)
- /fan (The other machine types have the same directory structure as fan.)
- /gearbox
- /slider (`slider` means "slide rail")
- /ToyCar
- /ToyTrain
- /valve
- /data/dcase2024t2/eval\_data/raw/
- /3DPrinter
- /train (after launch of the additional training dataset)
- /section\_00\_source\_train\_normal\_0001\_\<attribute\>.wav
- ...
- /section\_00\_source\_train\_normal\_0990\_\<attribute\>.wav
- /section\_00\_target\_train\_normal\_0001\_\<attribute\>.wav
- ...
- /section\_00\_target\_train\_normal\_0010\_\<attribute\>.wav
- /test (after launch of the evaluation dataset)
- /section\_00\_test\_0001.wav
- ...
- /section\_00\_test\_0200.wav
- /test_rename (converted from the `test` directory using `tools/rename_eval_wav.py`)
- /section\_00\_source\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
- ...
- /section\_00\_source\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
- ...
- /section\_00\_target\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
- ...
- /section\_00\_target\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
- ...
- attributes\_00.csv (attributes CSV for section 00)
- /AirCompressor
- /BrushlessMotor
- /HairDryer
- /HoveringDrone
- /RoboticArm
- /Scanner
- /ToothBrush
- /ToyCircuit

## Ground truth attribute

### Public ground truth
### Public ground truth of evaluation dataset

The following code was used to calculate the official score. These repositories also contain the ground truth for the evaluation datasets.

- [dcase2020_task2_evaluator](https://github.com/y-kawagu/dcase2020_task2_evaluator)
- [dcase2021_task2_evaluator](https://github.com/y-kawagu/dcase2021_task2_evaluator)
- [dcase2022_task2_evaluator](https://github.com/Kota-Dohi/dcase2022_evaluator)
- [dcase2023_task2_evaluator](https://github.com/nttcslab/dcase2023_task2_evaluator)
- [dcase2024_task2_evaluator](https://github.com/nttcslab/dcase2024_task2_evaluator)

### In this repository
### Ground truth for evaluation datasets in this repository

This repository includes ground truth CSVs for the evaluation data. These CSVs are used to rename the evaluation dataset files.
Once the ground truth has been added to the file names, you can calculate AUC and other scores. *Usually, the rename function is executed along with the [download script](#description) and the [auto download function](#41-enable-auto-download-dataset).
@@ -594,7 +694,26 @@ You can calculate AUC and other score if add ground truth to evaluation datasets
- [DCASE2021 task2](datasets/eval_data_list_2021.csv)
- [DCASE2022 task2](datasets/eval_data_list_2022.csv)
- [DCASE2023 task2](datasets/eval_data_list_2023.csv)
- [DCASE2024 task2](datasets/eval_data_list_2024.csv)
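
As a minimal illustration of why the renaming enables scoring: once a file has been renamed, its normal or anomaly label can be read straight from the file name. A sketch, where the machine type and path are illustrative examples taken from the listing above:

```
# Hedged sketch: recover ground-truth labels from renamed file names.
# 3DPrinter and the test_rename path are examples, not fixed choices.
for f in data/dcase2024t2/eval_data/raw/3DPrinter/test_rename/*.wav; do
    case "$(basename "$f")" in
        *_test_anomaly_*) echo "1,$(basename "$f")" ;;  # anomalous clip
        *_test_normal_*)  echo "0,$(basename "$f")" ;;  # normal clip
    esac
done
```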

### Ground truth attributes

Attribute information is hidden by default for the following machine types:

- DCASE2024 Task2
- dev data
- gearbox
- slider
- ToyTrain
- eval data
- AirCompressor
- BrushlessMotor
- HoveringDrone
- ToothBrush

You can view the hidden attributes in the following directory:

- [DCASE2024 task2 Ground truth Attributes](datasets/ground_truth_attributes)

## Citation

2 changes: 2 additions & 0 deletions data_download_2024eval.sh
@@ -19,3 +19,5 @@ wget "https://zenodo.org/records/11363076/files/eval_data_${machine_type}_test.z
unzip "eval_data_${machine_type}_test.zip"
done

# Adds reference labels to test data.
python ${ROOT_DIR}/tools/rename_eval_wav.py --dataset_parent_dir=${parent_dir} --dataset_type=DCASE2024T2
