Supported Additional Training Dataset (#11)
* Update README.md

* Update README_legacy.md

* supported additional training dataset

* Update README.md

---------

Co-authored-by: Noboru Harada <64912994+noboru2000@users.noreply.github.com>
YuriMusashijima and noboru2000 authored May 15, 2024
1 parent b4185b2 commit 887e67a
Showing 7 changed files with 150 additions and 17 deletions.
13 changes: 11 additions & 2 deletions 01_train_2024t2.sh
@@ -31,8 +31,17 @@ then
dataset_list="DCASE2024T2bearing DCASE2024T2fan DCASE2024T2gearbox DCASE2024T2slider DCASE2024T2ToyCar DCASE2024T2ToyTrain DCASE2024T2valve"
elif [ "${dev_eval}" = "-e" ] || [ "${dev_eval}" = "--eval" ]
then
echo eval data has not been published yet.
exit 1
dataset_list="\
DCASE2024T23DPrinter \
DCASE2024T2AirCompressor \
DCASE2024T2Scanner \
DCASE2024T2ToyCircuit \
DCASE2024T2HoveringDrone \
DCASE2024T2HairDryer \
DCASE2024T2ToothBrush \
DCASE2024T2RoboticArm \
DCASE2024T2BrushlessMotor \
"
fi

for dataset in $dataset_list; do
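In the eval branch above, every entry of `dataset_list` is the prefix `DCASE2024T2` followed by a machine type, so `DCASE2024T23DPrinter` is simply `DCASE2024T2` + `3DPrinter`. The following Python sketch restates that selection. It assumes a boolean `eval_mode` in place of the script's `-e`/`--eval` test, and the function and constant names are hypothetical.

```python
# Hypothetical restatement of the dataset_list selection in 01_train_2024t2.sh.
PREFIX = "DCASE2024T2"

# Machine types copied from the dev and eval branches of the script.
DEV_MACHINE_TYPES = ["bearing", "fan", "gearbox", "slider", "ToyCar", "ToyTrain", "valve"]
EVAL_MACHINE_TYPES = [
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
]


def dataset_list(eval_mode: bool) -> list[str]:
    """Return the dataset names the training loop iterates over."""
    machine_types = EVAL_MACHINE_TYPES if eval_mode else DEV_MACHINE_TYPES
    # Plain concatenation: "DCASE2024T2" + "3DPrinter" -> "DCASE2024T23DPrinter".
    return [PREFIX + m for m in machine_types]
```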
73 changes: 61 additions & 12 deletions README.md
@@ -19,21 +19,26 @@ This system consists of three main scripts (01_train.sh, 02a_test.sh, and 02b_te
- Helper scripts for DCASE2024T2
- data\_download\_2024dev.sh
- "Development dataset":
- This script downloads development data files and puts them into "data/dcase2024t2/dev\_data/raw/train/" and "data/dcase2024t2/dev\_data/raw/test/". **Newly added!!**
- This script downloads development data files and puts them into "data/dcase2024t2/dev\_data/raw/train/" and "data/dcase2024t2/dev\_data/raw/test/".
- data\_download\_2024add.sh **Newly added!!**
- "Additional training dataset for evaluation":
- This script downloads the additional training data files and puts them into "data/dcase2024t2/eval\_data/raw/train/". **Newly added!!**

- 01_train_2024t2.sh
- "Development" mode:
- This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/dev_data/raw/<machine_type>/train/<section_id>`. **Newly added!!**
- This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/dev_data/raw/<machine_type>/train/<section_id>`.
- "Evaluation" mode:
- This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/eval_data/raw/<machine_type>/train/<section_id>`. **Newly added!!**

- 02a_test_2024t2.sh (Use MSE as a score function for the Simple Autoencoder mode)
- "Development" mode:
- This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`. **Newly added!!**
- This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`.
- The CSV files will be stored in the directory `results/`.
- It also makes a csv file including AUC, pAUC, precision, recall, and F1-score for each section.

- 02b_test_2024t2.sh (Use Mahalanobis distance as a score function for the Selective Mahalanobis mode)
- "Development" mode:
- This script makes a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`. **Newly added!!**
- This script makes a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`.
- The CSV files will be stored in the directory `results/`.
- It also makes a csv file including AUC, pAUC, precision, recall, and F1-score for each section.
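The dev and eval directories named in these descriptions follow a single pattern. Below is a minimal sketch using `pathlib`, assuming only the `data/dcase2024t2/<dev_data|eval_data>/raw/<machine_type>/<train|test>` layout described in this README. The helper `raw_split_dir` is hypothetical. At this commit, only the `train` split is distributed for the evaluation machine types.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("data/dcase2024t2")


def raw_split_dir(machine_type: str, split: str, eval_mode: bool) -> PurePosixPath:
    """Build data/dcase2024t2/<dev_data|eval_data>/raw/<machine_type>/<split>."""
    if split not in ("train", "test"):
        raise ValueError(f"unknown split: {split}")
    mode_dir = "eval_data" if eval_mode else "dev_data"
    return ROOT / mode_dir / "raw" / machine_type / split
```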

@@ -55,8 +60,9 @@ We will launch the datasets in three stages. Therefore, please download the data

+ DCASE 2024 Challenge Task 2
+ "Development Dataset" **New! (2024/04/01)**
+ Download "dev\_data_<machine_type>.zip" from
[https://zenodo.org/records/10902294](https://zenodo.org/records/10902294).
+ Download "dev\_data_<machine_type>.zip" from [https://zenodo.org/records/10902294](https://zenodo.org/records/10902294).
+ "Additional Training Dataset", i.e., the evaluation dataset for training **New! (2024/05/15)**
+ Download "eval\_data_<machine_type>_train.zip" from [https://zenodo.org/records/11183284](https://zenodo.org/records/11183284).

+ For DCASE 2023 Challenge Task 2
(C.f., for DCASE2023T2, see [README_legacy](README_legacy.md))
@@ -96,6 +102,16 @@ We will launch the datasets in three stages. Therefore, please download the data
+ attributes\_00.csv (attributes CSV for section 00)
+ gearbox/ (The other machine types have the same directory structure as fan.)
+ data/dcase2024t2/eval\_data/raw/
+ \<machine\_type0\_of\_additional\_dataset\>/
+ train/ (after the launch of the additional training dataset)
+ section\_00\_source\_train\_normal\_0000\_.wav
+ ...
+ section\_00\_source\_train\_normal\_0989\_.wav
+ section\_00\_target\_train\_normal\_0000\_.wav
+ ...
+ section\_00\_target\_train\_normal\_0009\_.wav
+ attributes\_00.csv (attributes CSV for section 00)
+ \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)
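The tree above implies a fixed file-name pattern. The following sketch shows one way to parse it. It assumes two-digit section IDs and four-digit clip indices, and it treats anything between the final underscore and `.wav` (empty in these examples) as an opaque suffix. `parse_wav_name` and `WavInfo` are hypothetical names, not part of the repository.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from names such as "section_00_source_train_normal_0000_.wav".
WAV_NAME = re.compile(
    r"^section_(?P<section>\d{2})_(?P<domain>source|target)_"
    r"(?P<split>train|test)_(?P<label>[a-z]+)_(?P<index>\d{4})_(?P<extra>.*)\.wav$"
)


class WavInfo(NamedTuple):
    section: str
    domain: str
    split: str
    label: str
    index: int
    extra: str  # text after the index; empty in the tree above


def parse_wav_name(name: str) -> Optional[WavInfo]:
    """Split a clip file name into its fields, or return None if it does not match."""
    m = WAV_NAME.match(name)
    if m is None:
        return None
    return WavInfo(m["section"], m["domain"], m["split"], m["label"],
                   int(m["index"]), m["extra"])
```

Non-audio files such as `attributes_00.csv` simply fail to match, so the parser can be used to filter a directory listing.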

### 4. Change parameters

@@ -242,7 +258,7 @@ The Legacy support scripts are similar to the main scripts. These are in `tools`

## Dependency

We developed and tested the source code on Ubuntu 18.04.6 LTS.
We developed and tested the source code on Ubuntu 20.04.4 LTS.

### Software package

@@ -264,12 +280,33 @@ We developed and tested the source code on Ubuntu 18.04.6 LTS.
- fasteners == 0.18

## Change Log
### [3.1.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.1.0)

#### Added

- Provides support for the additional training datasets to be used in DCASE2024T2.

### [3.0.2](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.0.2)

#### Added

- Added information about ground truth and citations for each year's task in README.md and README_legacy.md.

### [3.0.1](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.0.1)

#### Added

- Provides support for the legacy datasets used in DCASE2020, 2021, 2022, and 2023.

#### Fixed

- Fixed a typo in README.md in the previous release, v3.0.0.

### [3.0.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.0.0)

#### Added

- provides support for the datasets used in DCASE2024.
- Provides support for the development datasets used in DCASE2024.

### [2.0.1](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v2.0.1)

@@ -282,7 +319,22 @@ We developed and tested the source code on Ubuntu 18.04.6 LTS.

#### Added

- provides support for the legacy datasets used in DCASE2020, 2021, 2022, and 2023.
- Provides support for the legacy datasets used in DCASE2020, 2021, 2022, and 2023.

## Truth attribute of evaluation data

### Public ground truth

The following code was used to calculate the official scores. It also includes the ground truth for the evaluation datasets.

- [dcase2023_task2_evaluator](https://github.com/nttcslab/dcase2023_task2_evaluator)

### In this repository

This repository includes the ground-truth CSV files for the evaluation data. These CSV files are used to rename the evaluation dataset files.
You can calculate AUC and other scores once the ground truth has been added to the evaluation dataset file names. Usually, the rename function is executed together with the [download script](#description) and the [auto download function](#41-enable-auto-download-dataset).

- [DCASE2023 task2](datasets/eval_data_list_2023.csv)
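Renaming with ground truth can be sketched as follows. The column layout of `eval_data_list_2023.csv` is not shown here, so this sketch assumes that a hypothetical mapping `{old_name: new_name}` has already been built from it, and it covers only the rename step. The function name `rename_with_ground_truth` is hypothetical.

```python
from pathlib import Path


def rename_with_ground_truth(wav_dir: Path, mapping: dict[str, str]) -> int:
    """Rename files in wav_dir according to mapping {old_name: new_name}.

    `mapping` is assumed to be built elsewhere from the ground-truth CSV.
    Returns the number of files renamed.
    """
    renamed = 0
    for old_name, new_name in mapping.items():
        src = wav_dir / old_name
        dst = wav_dir / new_name
        # Skip entries whose source is missing, and never overwrite an existing file.
        if src.is_file() and not dst.exists():
            src.rename(dst)
            renamed += 1
    return renamed
```

Because existing targets are never overwritten, running the function a second time on an already-renamed directory does nothing.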

## Truth attribute of evaluation data

@@ -309,6 +361,3 @@ If you use this system, please cite all the following four papers:
+ Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, Shoichiro Saito, "ToyADMOS2: Another Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection under Domain Shift Conditions," in Proc. DCASE 2021 Workshop, 2021. [URL](https://dcase.community/documents/workshop2021/proceedings/DCASE2021Workshop_Harada_6.pdf)
+ Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, and Yohei Kawaguchi, "MIMII DG: sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task," in Proc. DCASE 2022 Workshop, 2022. [URL](https://dcase.community/documents/workshop2022/proceedings/DCASE2022Workshop_Dohi_62.pdf)
+ Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi and Masahiro Yasuda, "First-Shot Anomaly Sound Detection for Machine Condition Monitoring: A Domain Generalization Baseline," 2023 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland, 2023, pp. 191-195, doi: 10.23919/EUSIPCO58844.2023.10289721. [URL](https://ieeexplore.ieee.org/document/10289721)



9 changes: 9 additions & 0 deletions data_download_2024add.sh
@@ -0,0 +1,9 @@
mkdir -p "data/dcase2024t2/eval_data/raw"

# download eval data
cd "data/dcase2024t2/eval_data/raw"
for machine_type in 3DPrinter AirCompressor Scanner ToyCircuit HoveringDrone HairDryer ToothBrush RoboticArm BrushlessMotor; do
wget "https://zenodo.org/records/11183284/files/eval_data_${machine_type}_train.zip"
unzip "eval_data_${machine_type}_train.zip"
done

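Below is a standard-library Python sketch of the same download loop, using `urllib.request.urlretrieve` and `zipfile`. It creates the directory and extracts into that same path, so the `mkdir -p` target and the `cd` target cannot drift apart. The URL pattern is copied from the script. The function names are hypothetical, and no retry or checksum handling is included.

```python
import urllib.request
import zipfile
from pathlib import Path

RECORD_URL = "https://zenodo.org/records/11183284/files"
EVAL_MACHINE_TYPES = [
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
]


def eval_train_url(machine_type: str) -> str:
    """URL of the additional training archive for one machine type."""
    return f"{RECORD_URL}/eval_data_{machine_type}_train.zip"


def download_and_extract(url: str, dest: Path) -> None:
    """Download one zip archive into dest and unpack it there."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]  # keep the archive, as the shell script does
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
```

For example, `download_and_extract(eval_train_url("Scanner"), Path("data/dcase2024t2/eval_data/raw"))` fetches and unpacks a single archive. Looping over `EVAL_MACHINE_TYPES` reproduces the whole script.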
9 changes: 9 additions & 0 deletions datasets/datasets.py
@@ -104,6 +104,15 @@ def __init__(self, args):

class Datasets:
DatasetsDic = {
'DCASE2024T23DPrinter':DCASE202XT2,
'DCASE2024T2AirCompressor':DCASE202XT2,
'DCASE2024T2Scanner':DCASE202XT2,
'DCASE2024T2ToyCircuit':DCASE202XT2,
'DCASE2024T2HoveringDrone':DCASE202XT2,
'DCASE2024T2HairDryer':DCASE202XT2,
'DCASE2024T2ToothBrush':DCASE202XT2,
'DCASE2024T2RoboticArm':DCASE202XT2,
'DCASE2024T2BrushlessMotor':DCASE202XT2,
'DCASE2024T2ToyCar':DCASE202XT2,
'DCASE2024T2ToyTrain':DCASE202XT2,
'DCASE2024T2bearing':DCASE202XT2,
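All nine new keys map to the same class, so the added entries could equally be generated from the machine-type list. The sketch below uses a placeholder `DCASE202XT2` because the repository's class is not reproduced here. `make_dataset` is a hypothetical lookup helper, not the repository's `Datasets` API.

```python
class DCASE202XT2:
    """Placeholder standing in for the repository's dataset class."""

    def __init__(self, args):
        self.args = args


EVAL_MACHINE_TYPES = [
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
]

# Equivalent to writing out each 'DCASE2024T2<machine_type>': DCASE202XT2 entry.
DatasetsDic = {f"DCASE2024T2{m}": DCASE202XT2 for m in EVAL_MACHINE_TYPES}


def make_dataset(name: str, args):
    """Look up the loader class registered under `name` and instantiate it."""
    try:
        cls = DatasetsDic[name]
    except KeyError:
        raise KeyError(f"unsupported dataset: {name}") from None
    return cls(args)
```

Generating the keys removes the chance of a typo in one of the nine hand-written names. The explicit listing in the commit, however, keeps every supported name searchable in the source.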
27 changes: 27 additions & 0 deletions datasets/download_path_2024.yaml
@@ -20,3 +20,30 @@ DCASE2024T2:
valve:
dev:
- https://zenodo.org/record/10902294/files/dev_valve.zip
3DPrinter:
eval:
- https://zenodo.org/records/11183284/files/eval_data_3DPrinter_train.zip
AirCompressor:
eval:
- https://zenodo.org/records/11183284/files/eval_data_AirCompressor_train.zip
Scanner:
eval:
- https://zenodo.org/records/11183284/files/eval_data_Scanner_train.zip
ToyCircuit:
eval:
- https://zenodo.org/records/11183284/files/eval_data_ToyCircuit_train.zip
HoveringDrone:
eval:
- https://zenodo.org/records/11183284/files/eval_data_HoveringDrone_train.zip
HairDryer:
eval:
- https://zenodo.org/records/11183284/files/eval_data_HairDryer_train.zip
ToothBrush:
eval:
- https://zenodo.org/records/11183284/files/eval_data_ToothBrush_train.zip
RoboticArm:
eval:
- https://zenodo.org/records/11183284/files/eval_data_RoboticArm_train.zip
BrushlessMotor:
eval:
- https://zenodo.org/records/11183284/files/eval_data_BrushlessMotor_train.zip
6 changes: 3 additions & 3 deletions datasets/loader_common.py
@@ -414,7 +414,7 @@ def download_raw_data(
for split_data_path in split_data_path_list:
shutil.copytree(split_data_path, test_data_path, dirs_exist_ok=True)

if data_type == "eval":
if data_type == "eval" and dataset != "DCASE2024T2":
rename_wav(
dataset_parent_dir=root,
dataset_type=dataset,
@@ -468,6 +468,7 @@ def is_enabled_pickle(pickle_path):
"DCASE2023T2_dev":"datasets/machine_type_2023_dev.yaml",
"DCASE2023T2_eval":"datasets/machine_type_2023_eval.yaml",
"DCASE2024T2_dev":"datasets/machine_type_2024_dev.yaml",
"DCASE2024T2_eval":"datasets/machine_type_2024_eval.yaml",
}

def get_machine_type_dict(dataset_name, mode=True):
@@ -480,8 +481,7 @@ def get_machine_type_dict(dataset_name, mode=True):
elif dataset_name == "DCASE2024T2" and mode:
yaml_path = YAML_PATH["DCASE2024T2_dev"]
elif dataset_name == "DCASE2024T2" and not mode:
raise ValueError("DCASE2024T2 eval data has not been published yet.")
# yaml_path = YAML_PATH["DCASE2024T2_eval"]
yaml_path = YAML_PATH["DCASE2024T2_eval"]
else:
raise KeyError()

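The two changes above can be restated as small pure functions. First, the eval YAML is now selected instead of raising `ValueError`. Second, the ground-truth rename is skipped for `DCASE2024T2`, presumably because no 2024 ground-truth list is available yet. The sketch covers only the `YAML_PATH` entries visible in this hunk and replaces the original `if`/`elif` chain with a dictionary lookup. The function names are hypothetical.

```python
# Only the YAML_PATH entries visible in this hunk.
YAML_PATH = {
    "DCASE2023T2_dev": "datasets/machine_type_2023_dev.yaml",
    "DCASE2023T2_eval": "datasets/machine_type_2023_eval.yaml",
    "DCASE2024T2_dev": "datasets/machine_type_2024_dev.yaml",
    "DCASE2024T2_eval": "datasets/machine_type_2024_eval.yaml",
}


def machine_type_yaml(dataset_name: str, mode: bool) -> str:
    """Pick the machine-type YAML; mode=True selects dev, as in get_machine_type_dict."""
    key = f"{dataset_name}_{'dev' if mode else 'eval'}"
    if key not in YAML_PATH:
        raise KeyError(key)
    return YAML_PATH[key]


def needs_rename(data_type: str, dataset: str) -> bool:
    """Mirror the new condition in download_raw_data for calling rename_wav."""
    return data_type == "eval" and dataset != "DCASE2024T2"
```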
30 changes: 30 additions & 0 deletions datasets/machine_type_2024_eval.yaml
@@ -0,0 +1,30 @@
DCASE2024T2:
machine_type:
3DPrinter:
eval:
- "00"
AirCompressor:
eval:
- "00"
Scanner:
eval:
- "00"
ToyCircuit:
eval:
- "00"
HoveringDrone:
eval:
- "00"
HairDryer:
eval:
- "00"
ToothBrush:
eval:
- "00"
RoboticArm:
eval:
- "00"
BrushlessMotor:
eval:
- "00"
section_keyword: section
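When loaded with a YAML parser such as PyYAML's `yaml.safe_load`, this file becomes nested dictionaries. The sketch writes that shape out as a literal so that it needs no third-party package. It assumes `section_keyword` sits directly under `DCASE2024T2`, since indentation is lost in this view. `eval_sections` is a hypothetical helper, and joining the keyword and ID as `section_00` is an assumption based on the file names above. The quotes around `"00"` matter: unquoted, PyYAML reads `00` as the integer `0`, and the leading zero is lost.

```python
EVAL_MACHINE_TYPES = [
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
]

# Assumed shape of machine_type_2024_eval.yaml after loading.
MACHINE_TYPE_2024_EVAL = {
    "DCASE2024T2": {
        "machine_type": {m: {"eval": ["00"]} for m in EVAL_MACHINE_TYPES},
        "section_keyword": "section",
    }
}


def eval_sections(config: dict, dataset: str = "DCASE2024T2") -> list[tuple[str, str]]:
    """List (machine_type, section name) pairs for eval mode."""
    entry = config[dataset]
    keyword = entry["section_keyword"]
    pairs = []
    for machine_type, splits in entry["machine_type"].items():
        for section_id in splits["eval"]:
            pairs.append((machine_type, f"{keyword}_{section_id}"))
    return pairs
```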
