This repository hosts the code for our paper "Imputation-based Time-Series Anomaly Detection with Conditional Weight-Incremental Diffusion Models" (KDD 2023): https://dl.acm.org/doi/abs/10.1145/3580305.3599391
@inproceedings{xiao2023imputation,
title={Imputation-based Time-Series Anomaly Detection with Conditional Weight-Incremental Diffusion Models},
author={Xiao, Chunjing and Gou, Zehua and Tai, Wenxin and Zhang, Kunpeng and Zhou, Fan},
booktitle={Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages={2742--2751},
year={2023}
}
The experiments use the following five datasets:
- PSM (Pooled Server Metrics) is collected internally from multiple application server nodes at eBay. For details, see "Practical Approach to Asynchronous Multivariate Time Series Anomaly Detection and Localization".
- MSL (Mars Science Laboratory rover) is a public dataset from NASA. For details, see "Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding".
- SMAP (Soil Moisture Active Passive satellite) is also a public dataset from NASA, described in the same paper as MSL.
- SMD (Server Machine Dataset) is a 5-week-long dataset collected from a large Internet company. For details, see "Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network".
- SWaT (Secure Water Treatment) is collected from 51 sensors of a critical infrastructure system under continuous operation. For details, see "SWaT: a water treatment testbed for research and training on ICS security".
Install Python 3.8, then install the required packages:
pip install -r requirements.txt
By default, datasets are stored under the "tf_dataset" folder. To use a different dataset, edit the dataset path in the corresponding JSON file in the "config" folder. For example, to change the training dataset path:
"datasets": {
"train|test": {
"dataroot": "tf_dataset/smap/smap_train.csv",
//"dataroot": "tf_dataset/swat/swat_train.csv"
}
},
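The snippet above contains a line starting with "//", which Python's standard json module cannot parse. If you prefer to switch datasets from a script rather than by hand, here is a minimal sketch. It assumes comments occupy whole lines and that a trailing comma may be left behind once a comment line is dropped. The helpers load_commented_json and set_dataroot are hypothetical, not part of this repository, and the repository's own config loader may treat these cases differently.

```python
import json
import re

# Abridged config mirroring the README example (a real file has more keys).
EXAMPLE_CONFIG = """
{
    "datasets": {
        "train|test": {
            "dataroot": "tf_dataset/smap/smap_train.csv",
            //"dataroot": "tf_dataset/swat/swat_train.csv"
        }
    }
}
"""


def load_commented_json(text: str) -> dict:
    """Parse JSON containing whole-line // comments and trailing commas.

    Assumption: comments sit on their own lines, so a '//' inside a string
    value (e.g. a URL) is left untouched.
    """
    # Drop every line whose first non-blank characters are '//'.
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("//")]
    cleaned = "\n".join(kept)
    # Remove a comma that directly precedes a closing brace or bracket.
    cleaned = re.sub(r",(\s*[}\]])", r"\1", cleaned)
    return json.loads(cleaned)


def set_dataroot(config: dict, phase: str, path: str) -> dict:
    """Point one dataset phase (e.g. the README's "train|test" key) at a new CSV."""
    config["datasets"][phase]["dataroot"] = path
    return config


if __name__ == "__main__":
    cfg = load_commented_json(EXAMPLE_CONFIG)
    set_dataroot(cfg, "train|test", "tf_dataset/swat/swat_train.csv")
    print(json.dumps(cfg, indent=4))
```

Writing the result back with json.dump discards the commented-out alternatives, so keep a copy of the original file if you rely on them.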
We also provide JSON configuration files for two datasets (SMAP and PSM) for reference.
The following example uses the SMAP dataset.
# Train the model with time_train.py.
# Edit the JSON file to adjust the dataset path, network structure, and hyperparameters.
python time_train.py -c config/smap_time_train.json
By default, trained models are saved under "experiments/*/checkpoint/". To test a different checkpoint, change "resume_state" in the test configuration, as in "config/smap_time_test.json":
"path": {
"resume_state": "experiments/SMAP_TRAIN_128_2048_100/checkpoint/E100"
},
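Because each training run writes to its own "experiments/*/checkpoint/" folder, it can help to look up the newest checkpoint prefix before filling in "resume_state". The sketch below is hypothetical: it assumes checkpoint files are named "E<epoch>_<suffix>.pth", so that the prefix ".../checkpoint/E100" in the example corresponds to files such as "E100_gen.pth". Check the file names your run actually produces and adjust CKPT_PATTERN accordingly.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme: "E<epoch>_<anything>.pth", e.g. "E100_gen.pth".
CKPT_PATTERN = re.compile(r"^E(\d+)_.*\.pth$")


def latest_resume_state(checkpoint_dir: str) -> Optional[str]:
    """Return the "resume_state" prefix for the highest epoch in checkpoint_dir.

    Returns None if no file matches the assumed naming scheme.
    """
    best_epoch = -1
    for f in Path(checkpoint_dir).iterdir():
        match = CKPT_PATTERN.match(f.name)
        # Compare epochs as integers, not strings.
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
    if best_epoch < 0:
        return None
    # The prefix omits the suffix, mirroring ".../checkpoint/E100" in the README.
    return str(Path(checkpoint_dir) / f"E{best_epoch}")
```

Comparing epochs numerically matters here: sorted as strings, "E20" would come after "E100".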
# Edit the JSON file to adjust the pretrained model path and the dataset path.
python time_test.py -c config/smap_time_test.json
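To run training and testing back to back, a small driver can build the two commands above. This is a hypothetical convenience script (saved, say, as "run_all.py"), not part of the repository. It assumes other datasets follow the "config/<name>_time_train.json" / "config/<name>_time_test.json" naming used for SMAP, which you should verify against the files in "config".

```python
import subprocess
import sys
from typing import List


def build_commands(dataset: str, python: str = sys.executable) -> List[List[str]]:
    """Return the train and test commands for a dataset (assumed config naming)."""
    return [
        [python, "time_train.py", "-c", f"config/{dataset}_time_train.json"],
        [python, "time_test.py", "-c", f"config/{dataset}_time_test.json"],
    ]


def run_pipeline(dataset: str, dry_run: bool = True) -> None:
    """Print (dry_run=True) or execute the commands in order."""
    for cmd in build_commands(dataset):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails,
            # so testing is skipped when training did not finish.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Usage: python run_all.py smap [--run]   (prints only, unless --run is given)
    args = [a for a in sys.argv[1:] if a != "--run"]
    run_pipeline(args[0] if args else "smap", dry_run="--run" not in sys.argv)
```

The test step still loads whatever "resume_state" points to, so update that path in the test config to the new run's checkpoint before relying on the second command.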
We use an NVIDIA RTX 3090 (24 GB) GPU; training takes about 1 hour and testing about half an hour. The F1-score obtained on the SMAP dataset is shown below.
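For reference, the F1-score is the harmonic mean of precision and recall, computed here over per-timestamp binary anomaly labels. The sketch below shows only this basic definition. The evaluation protocol in time_test.py (for example, how anomaly scores are thresholded into predictions) may differ, so treat it as illustrative rather than a reproduction of the reported result.

```python
from typing import Sequence


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 = 2PR / (P + R) for binary labels (1 = anomaly, 0 = normal)."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 1, 0, 0, 1]
    y_pred = [0, 1, 1, 1, 0, 0, 0, 1]
    print(f1_score(y_true, y_pred))  # prints 0.75 (3 TP, 1 FP, 1 FN)
```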