Supported Additional Training Dataset (#11)

* Update README.md
* Update README_legacy.md
* supported additional training dataset
* Update README.md

---------

Co-authored-by: Noboru Harada <64912994+noboru2000@users.noreply.github.com>
nttcslab · May 15, 2024 · 887e67a
1 parent b4185b2
commit 887e67a

Showing 7 changed files with 150 additions and 17 deletions.
diff --git a/01_train_2024t2.sh b/01_train_2024t2.sh
@@ -31,8 +31,17 @@ then
     dataset_list="DCASE2024T2bearing DCASE2024T2fan DCASE2024T2gearbox DCASE2024T2slider DCASE2024T2ToyCar DCASE2024T2ToyTrain DCASE2024T2valve"
 elif [ "${dev_eval}" = "-e" ] || [ "${dev_eval}" = "--eval" ]
 then
-    echo eval data has not been published yet.
-    exit 1
+    dataset_list="\
+        DCASE2024T23DPrinter \
+        DCASE2024T2AirCompressor \
+        DCASE2024T2Scanner \
+        DCASE2024T2ToyCircuit \
+        DCASE2024T2HoveringDrone \
+        DCASE2024T2HairDryer \
+        DCASE2024T2ToothBrush \
+        DCASE2024T2RoboticArm \
+        DCASE2024T2BrushlessMotor \
+    "
 fi
 
 for dataset in $dataset_list; do
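
Note on the hunk above: the evaluation branch now builds its dataset names by prefixing each additional machine type with `DCASE2024T2`. The same naming rule can be sketched in Python as follows. The helper `build_dataset_list` and its `eval_mode` flag are hypothetical and only illustrate the rule; the repository does this in the shell script itself.

```python
# Machine types copied from the two branches of 01_train_2024t2.sh.
DEV_MACHINE_TYPES = [
    "bearing", "fan", "gearbox", "slider", "ToyCar", "ToyTrain", "valve",
]
EVAL_MACHINE_TYPES = [
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
]

DATASET_PREFIX = "DCASE2024T2"


def build_dataset_list(eval_mode):
    """Return dataset names for one mode (hypothetical helper, not repo code)."""
    machine_types = EVAL_MACHINE_TYPES if eval_mode else DEV_MACHINE_TYPES
    # Each dataset name is the fixed prefix followed by the machine type.
    return [DATASET_PREFIX + machine_type for machine_type in machine_types]


if __name__ == "__main__":
    for name in build_dataset_list(eval_mode=True):
        print(name)
```

Every name produced this way has to match a key in `Datasets.DatasetsDic`, which is why this commit also registers the nine new names in `datasets/datasets.py`.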

diff --git a/README.md b/README.md
@@ -19,21 +19,26 @@ This system consists of three main scripts (01_train.sh, 02a_test.sh, and 02b_te
 - Helper scripts for DCASE2024T2
   - data\_download\_2024dev.sh
     - "Development dataset":
-      - This script downloads development data files and puts them into "data/dcase2024t2/dev\_data/raw/train/" and "data/dcase2024t2/dev\_data/raw/test/". **Newly added!!**
+      - This script downloads development data files and puts them into "data/dcase2024t2/dev\_data/raw/train/" and "data/dcase2024t2/dev\_data/raw/test/".
+  - data\_download\_2024add.sh  **Newly added!!**
+    - "Additional training dataset for Evaluation":
+      - This script downloads the additional training data files and puts them into "data/dcase2024t2/eval\_data/raw/train/". **Newly added!!**
 
 - 01_train_2024t2.sh
   - "Development" mode:
-    - This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/dev_data/raw/<machine_type>/train/<section_id>`.  **Newly added!!**
+    - This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/dev_data/raw/<machine_type>/train/<section_id>`.
+  - "Evaluation" mode:
+    - This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/eval_data/raw/<machine_type>/train/<section_id>`.  **Newly added!!**
 
 - 02a_test_2024t2.sh (Use MSE as a score function for the Simple Autoencoder mode)
   - "Development" mode:
-    - This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`. **Newly added!!**
+    - This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`.
     - The CSV files will be stored in the directory `results/`.
     - It also makes a csv file including AUC, pAUC, precision, recall, and F1-score for each section.
 
 - 02b_test_2024t2.sh (Use Mahalanobis distance as a score function for the Selective Mahalanobis mode)
   - "Development" mode:
-    - This script makes a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`. **Newly added!!**
+    - This script makes a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`.
     - The CSV files will be stored in the directory `results/`.
     - It also makes a csv file including AUC, pAUC, precision, recall, and F1-score for each section.
 
@@ -55,8 +60,9 @@ We will launch the datasets in three stages. Therefore, please download the data
 
   + DCASE 2024 Challenge Task 2
     + "Development Dataset" **New! (2024/04/01)**
-      + Download "dev\_data_<machine_type>.zip" from 
-[https://zenodo.org/records/10902294](https://zenodo.org/records/10902294).
+      + Download "dev\_data_<machine_type>.zip" from [https://zenodo.org/records/10902294](https://zenodo.org/records/10902294).
+    + "Additional Training Dataset", i.e., the evaluation dataset for training  **New! (2024/05/15)**
+      + Download "eval\_data_<machine_type>_train.zip" from [https://zenodo.org/records/11183284](https://zenodo.org/records/11183284).
 
   + For DCASE 2023 Challenge Task 2
 	(C.f., for DCASE2023T2, see [README_legacy](README_legacy.md))
@@ -96,6 +102,16 @@ We will launch the datasets in three stages. Therefore, please download the data
         + attributes\_00.csv (attributes CSV for section 00)
      + gearbox/ (The other machine types have the same directory structure as fan.)
    + data/dcase2024t2/eval\_data/raw/
+     + \<machine\_type0\_of\_additional\_dataset\>/
+        + train/ (after launch of the additional training dataset)
+          + section\_00\_source\_train\_normal\_0000\_.wav
+          + ...
+          + section\_00\_source\_train\_normal\_0989\_.wav
+          + section\_00\_target\_train\_normal\_0000\_.wav
+          + ...
+          + section\_00\_target\_train\_normal\_0009\_.wav
+        + attributes\_00.csv (attributes CSV for section 00)
+     + \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)
 
 ### 4. Change parameters
 
@@ -242,7 +258,7 @@ The Legacy support scripts are similar to the main scripts. These are in `tools`
 
 ## Dependency
 
-We developed and tested the source code on Ubuntu 18.04.6 LTS.
+We developed and tested the source code on Ubuntu 20.04.4 LTS.
 
 ### Software package
 
@@ -264,12 +280,33 @@ We developed and tested the source code on Ubuntu 18.04.6 LTS.
 - fasteners == 0.18
 
 ## Change Log
+### [3.1.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.1.0)
+
+#### Added
+
+- Provides support for the additional training datasets to be used in DCASE2024T2.
+
+### [3.0.2](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.0.2)
+
+#### Added
+
+- Added information about ground truth and citations for each year's task in README.md and README_legacy.md.
+
+### [3.0.1](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.0.1)
+
+#### Added
+
+- Provides support for the legacy datasets used in DCASE2020, 2021, 2022, and 2023.
+
+#### Fixed
+
+- Fixed a typo in README.md in the previous release, v3.0.0.
 
 ### [3.0.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.0.0)
 
 #### Added
 
-- provides support for the datasets used in DCASE2024.
+- Provides support for the development datasets used in DCASE2024.
 
 ### [2.0.1](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v2.0.1)
 
@@ -282,7 +319,22 @@ We developed and tested the source code on Ubuntu 18.04.6 LTS.
 
 #### Added
 
-- provides support for the legacy datasets used in DCASE2020, 2021, 2022, and 2023.
+- Provides support for the legacy datasets used in DCASE2020, 2021, 2022, and 2023.
+
+## Truth attribute of evaluation data
+
+### Public ground truth
+
+The following code was used to calculate the official scores. It includes the ground truth for the evaluation datasets.
+
+- [dcase2023_task2_evaluator](https://github.com/nttcslab/dcase2023_task2_evaluator)
+
+### In this repository
+
+This repository contains a ground truth CSV for the evaluation data. This CSV is used to rename the evaluation dataset files.
+You can calculate AUC and other scores if you add the ground truth to the evaluation dataset file names. Note: the rename function is usually executed along with the [download script](#description) and the [auto download function](#41-enable-auto-download-dataset).
+
+- [DCASE2023 task2](datasets/eval_data_list_2023.csv)
 
 ## Truth attribute of evaluation data
 
@@ -309,6 +361,3 @@ If you use this system, please cite all the following four papers:
 + Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, Shoichiro Saito, "ToyADMOS2: Another Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection under Domain Shift Conditions," in Proc. DCASE 2021 Workshop, 2021. [URL](https://dcase.community/documents/workshop2021/proceedings/DCASE2021Workshop_Harada_6.pdf)
 + Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, and Yohei Kawaguchi, "MIMII DG: sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task," in Proc. DCASE 2022 Workshop, 2022. [URL](https://dcase.community/documents/workshop2022/proceedings/DCASE2022Workshop_Dohi_62.pdf)
 + Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi and Masahiro Yasuda, "First-Shot Anomaly Sound Detection for Machine Condition Monitoring: A Domain Generalization Baseline," 2023 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland, 2023, pp. 191-195, doi: 10.23919/EUSIPCO58844.2023.10289721. [URL](https://ieeexplore.ieee.org/document/10289721)
-
-
-
diff --git a/data_download_2024add.sh b/data_download_2024add.sh
@@ -0,0 +1,9 @@
+mkdir -p "data/dcase2024t2/eval_data/raw"
+
+# download eval data
+cd "data/dcase2024t2/eval_data/raw"
+for machine_type in 3DPrinter AirCompressor Scanner ToyCircuit HoveringDrone HairDryer ToothBrush RoboticArm BrushlessMotor; do
+    wget "https://zenodo.org/records/11183284/files/eval_data_${machine_type}_train.zip"
+    unzip "eval_data_${machine_type}_train.zip"
+done
+
diff --git a/datasets/datasets.py b/datasets/datasets.py
@@ -104,6 +104,15 @@ def __init__(self, args):
 
 class Datasets:
     DatasetsDic = {
+        'DCASE2024T23DPrinter':DCASE202XT2,
+        'DCASE2024T2AirCompressor':DCASE202XT2,
+        'DCASE2024T2Scanner':DCASE202XT2,
+        'DCASE2024T2ToyCircuit':DCASE202XT2,
+        'DCASE2024T2HoveringDrone':DCASE202XT2,
+        'DCASE2024T2HairDryer':DCASE202XT2,
+        'DCASE2024T2ToothBrush':DCASE202XT2,
+        'DCASE2024T2RoboticArm':DCASE202XT2,
+        'DCASE2024T2BrushlessMotor':DCASE202XT2,
         'DCASE2024T2ToyCar':DCASE202XT2,
         'DCASE2024T2ToyTrain':DCASE202XT2,
         'DCASE2024T2bearing':DCASE202XT2,

diff --git a/datasets/download_path_2024.yaml b/datasets/download_path_2024.yaml
@@ -20,3 +20,30 @@ DCASE2024T2:
   valve:
     dev:
       - https://zenodo.org/record/10902294/files/dev_valve.zip
+  3DPrinter:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_3DPrinter_train.zip
+  AirCompressor:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_AirCompressor_train.zip
+  Scanner:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_Scanner_train.zip
+  ToyCircuit:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_ToyCircuit_train.zip
+  HoveringDrone:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_HoveringDrone_train.zip
+  HairDryer:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_HairDryer_train.zip
+  ToothBrush:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_ToothBrush_train.zip
+  RoboticArm:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_RoboticArm_train.zip
+  BrushlessMotor:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_BrushlessMotor_train.zip
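
Note on the hunk above: all nine new entries follow one URL pattern, `eval_data_<machine_type>_train.zip` under Zenodo record 11183284. A minimal sketch of that pattern, assuming it holds for every additional machine type as the YAML suggests. The names `eval_train_url` and `build_download_paths` are hypothetical and not part of the repository, and the code does not download anything.

```python
# Zenodo record and machine types as they appear in download_path_2024.yaml.
ZENODO_RECORD = "https://zenodo.org/records/11183284"
EVAL_MACHINE_TYPES = [
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
]


def eval_train_url(machine_type):
    """Build the archive URL for one machine type (hypothetical helper)."""
    return f"{ZENODO_RECORD}/files/eval_data_{machine_type}_train.zip"


def build_download_paths():
    """Mirror the YAML layout: machine type -> {"eval": [url]}."""
    return {
        machine_type: {"eval": [eval_train_url(machine_type)]}
        for machine_type in EVAL_MACHINE_TYPES
    }


if __name__ == "__main__":
    for machine_type, entry in build_download_paths().items():
        print(machine_type, entry["eval"][0])
```

Generating the mapping from one list keeps the YAML, `data_download_2024add.sh`, and the training script in agreement, since all three enumerate the same nine machine types.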
diff --git a/datasets/loader_common.py b/datasets/loader_common.py
@@ -414,7 +414,7 @@ def download_raw_data(
         for split_data_path in split_data_path_list:
             shutil.copytree(split_data_path, test_data_path, dirs_exist_ok=True)
 
-    if data_type == "eval":
+    if data_type == "eval" and dataset != "DCASE2024T2":
         rename_wav(
             dataset_parent_dir=root,
             dataset_type=dataset,
@@ -468,6 +468,7 @@ def is_enabled_pickle(pickle_path):
     "DCASE2023T2_dev":"datasets/machine_type_2023_dev.yaml",
     "DCASE2023T2_eval":"datasets/machine_type_2023_eval.yaml",
     "DCASE2024T2_dev":"datasets/machine_type_2024_dev.yaml",
+    "DCASE2024T2_eval":"datasets/machine_type_2024_eval.yaml",
 }
 
 def get_machine_type_dict(dataset_name, mode=True):
@@ -480,8 +481,7 @@ def get_machine_type_dict(dataset_name, mode=True):
     elif dataset_name == "DCASE2024T2" and mode:
         yaml_path = YAML_PATH["DCASE2024T2_dev"]
     elif dataset_name == "DCASE2024T2" and not mode:
-        raise ValueError("DCASE2024T2 eval data has not been published yet.")
-        # yaml_path = YAML_PATH["DCASE2024T2_eval"]
+        yaml_path = YAML_PATH["DCASE2024T2_eval"]
     else: 
         raise KeyError()
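
Note on the hunk above: with the `DCASE2024T2_eval` key registered, the `if`/`elif` chain in `get_machine_type_dict` amounts to a lookup on the dataset name plus the mode. A table-driven sketch of that selection is shown below. This is an illustrative alternative, not the repository's implementation; `select_yaml_path` is a hypothetical name, and the table lists only the entries visible in this diff.

```python
# Entries visible in this diff; the real YAML_PATH table has more keys.
YAML_PATH = {
    "DCASE2023T2_dev": "datasets/machine_type_2023_dev.yaml",
    "DCASE2023T2_eval": "datasets/machine_type_2023_eval.yaml",
    "DCASE2024T2_dev": "datasets/machine_type_2024_dev.yaml",
    "DCASE2024T2_eval": "datasets/machine_type_2024_eval.yaml",
}


def select_yaml_path(dataset_name, mode=True):
    """Pick the machine-type YAML: mode=True means dev, mode=False means eval."""
    suffix = "dev" if mode else "eval"
    key = f"{dataset_name}_{suffix}"
    if key not in YAML_PATH:
        # Give the unknown key in the error instead of a bare KeyError().
        raise KeyError(f"no machine type YAML registered for {key}")
    return YAML_PATH[key]


if __name__ == "__main__":
    print(select_yaml_path("DCASE2024T2", mode=False))
```

Under this scheme, supporting a new dataset stage only requires a new table entry, which is the only change this commit needed once the commented-out branch was enabled.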
 

diff --git a/datasets/machine_type_2024_eval.yaml b/datasets/machine_type_2024_eval.yaml
@@ -0,0 +1,30 @@
+DCASE2024T2:
+  machine_type:
+    3DPrinter:
+      eval:
+        - "00"
+    AirCompressor:
+      eval:
+        - "00"
+    Scanner:
+      eval:
+        - "00"
+    ToyCircuit:
+      eval:
+        - "00"
+    HoveringDrone:
+      eval:
+        - "00"
+    HairDryer:
+      eval:
+        - "00"
+    ToothBrush:
+      eval:
+        - "00"
+    RoboticArm:
+      eval:
+        - "00"
+    BrushlessMotor:
+      eval:
+        - "00"
+  section_keyword: section
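
Note on the file above: each additional machine type has a single section, `"00"`, and the `section_keyword` gives the file name prefix. The sketch below shows how such a configuration could map to the training directories and file name prefixes described in the README. The dictionary mirrors the YAML by hand to avoid a YAML dependency, and `expected_train_dirs` and `section_prefixes` are hypothetical helpers, not functions from the repository.

```python
from pathlib import PurePosixPath

EVAL_MACHINE_TYPES = (
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
)

# Hand-written mirror of datasets/machine_type_2024_eval.yaml.
MACHINE_TYPE_2024_EVAL = {
    "DCASE2024T2": {
        "machine_type": {name: {"eval": ["00"]} for name in EVAL_MACHINE_TYPES},
        "section_keyword": "section",
    }
}


def expected_train_dirs(config, root="data/dcase2024t2/eval_data/raw"):
    """List <root>/<machine_type>/train for every configured machine type."""
    machine_types = config["DCASE2024T2"]["machine_type"]
    return [PurePosixPath(root) / name / "train" for name in machine_types]


def section_prefixes(config):
    """Map each machine type to file name prefixes such as 'section_00'."""
    dataset = config["DCASE2024T2"]
    keyword = dataset["section_keyword"]
    return {
        name: [f"{keyword}_{section_id}" for section_id in entry["eval"]]
        for name, entry in dataset["machine_type"].items()
    }


if __name__ == "__main__":
    for path in expected_train_dirs(MACHINE_TYPE_2024_EVAL):
        print(path)
```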