DejaVu-Omni

Code

Install

All software requirements are pre-installed in the Docker image below. They are also listed in requirements.txt and requirements-dev.txt. Note that DGL 0.8 had not been released when I did this work, so I installed DGL 0.8 manually from source. The PyTorch version should be 1.11.0 or later.
```
docker pull lizytalk/dejavu
```
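
If you install from requirements.txt instead of using the Docker image, the PyTorch version constraint above can be checked with a small helper. This is a minimal sketch, not part of the repository: the function name meets_minimum is hypothetical, and it only compares the leading numeric part of the version string.

```python
import re


def meets_minimum(version: str, minimum=(1, 11, 0)) -> bool:
    """Return True if a version string such as '1.12.1+cu113' is >= minimum."""
    # Keep only the leading dotted-number part, dropping local tags like '+cu113'.
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions so that '2.0' compares as (2, 0, 0).
    parts += (0,) * (len(minimum) - len(parts))
    return parts >= minimum
```

With PyTorch installed, calling meets_minimum(torch.__version__) after import torch should tell you whether the installed version satisfies the requirement.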

Clone the code from GitHub

git clone https://github.com/NetManAIOps/DejaVu-Omni.git DejaVu-Omni

Download the datasets from the link in the GitHub repo and extract them into ./DejaVu-Omni/data
I use the command realpath in the example commands below, which is not bundled with macOS or Windows. On macOS, you can install it with brew install coreutils.

Start a Docker container with our image and enter its shell

docker run -it --rm -v $(realpath DejaVu-Omni):/workspace lizytalk/dejavu bash

Run direnv allow in the shell of the Docker container to set the environment variables.
Then run the experiments in the same shell, following the Usage section below.
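
On a machine without the realpath command, the absolute path for the volume mount can be computed in Python instead. The sketch below builds the same docker run invocation shown above; the function name docker_run_command is hypothetical and not part of the repository.

```python
import os


def docker_run_command(repo_dir="DejaVu-Omni", image="lizytalk/dejavu"):
    """Build the `docker run` argument list without relying on `realpath`."""
    # os.path.realpath resolves the directory to an absolute path,
    # which is what $(realpath DejaVu-Omni) does in the shell command.
    host_path = os.path.realpath(repo_dir)
    return [
        "docker", "run", "-it", "--rm",
        "-v", f"{host_path}:/workspace",
        image, "bash",
    ]
```

The returned list can be passed to subprocess.run, or joined with shlex.join to print a command line for pasting into a terminal.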

Usage

Root Cause Localization (RCL) & Failure Classification (FC)

For RCL and FC, we can use three datasets (A, B, and C). All scripts for DejaVu-Omni and baselines can be found in scripts/failure_diagnosis. Run bash scripts/failure_diagnosis/run_all.sh for all algorithms on three datasets.

Each command prints a one-line summary at the end, which contains the desired results in the following fields: RCL_A@1, RCL_A@2, RCL_A@3, RCL_A@5, RCL_MAR, FC_Precision, FC_Recall, FC_F1, Time, Epoch, Valid Epoch, output_dir, val_loss, val_MAR, val_A@1, command, git_commit_url.
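
This README does not define the RCL_A@k and RCL_MAR fields. The sketch below shows one common reading, top-k accuracy and mean average rank over ranked root-cause candidates. The definitions, the function names, and the input layout (one ranked list and one ground-truth set per failure) are assumptions, not the repository's implementation.

```python
def top_k_accuracy(rankings, truths, k):
    """Assumed A@k: share of true root causes found in the top k, averaged over failures."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        hits = sum(1 for node in ranked[:k] if node in truth)
        # At most min(k, |truth|) true root causes can appear in the top k.
        total += hits / min(k, len(truth))
    return total / len(rankings)


def mean_average_rank(rankings, truths):
    """Assumed MAR: average 1-based rank of the true root causes, averaged over failures."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        position = {node: i + 1 for i, node in enumerate(ranked)}
        # A true root cause missing from the ranking is placed just past the end.
        ranks = [position.get(node, len(ranked) + 1) for node in truth]
        total += sum(ranks) / len(ranks)
    return total / len(rankings)
```

Under these assumed definitions, a higher A@k and a lower MAR both mean the true root causes are ranked closer to the top.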

Unseen Failure Type Detection (UFTD)

For UFTD, we can use three datasets (A, B, and C). All scripts for DejaVu-Omni and baselines can be found in scripts/uftd. Run bash scripts/unseen_failure_type_detection/run_all.sh for all algorithms on three datasets.

Results are written to f1.1_score.txt in the output directory.

System Change Adaption (SCA)

For SCA, we use dataset D. First, identify the drift metrics using notebooks/system_change_adaption.ipynb, and then run RCL and FC for failures after system changes. All scripts for DejaVu-Omni and baselines can be found in scripts/system_change_adaption. Run bash scripts/system_change_adaption/run_all.sh for all algorithms.

Each command prints a one-line summary, which contains the desired results in the following fields: drift_RCL_A@1, drift_RCL_A@2, drift_RCL_A@3, drift_RCL_A@5, drift_RCL_MAR, drift_FC_Precision, drift_FC_Recall, drift_FC_F1, non_drift_RCL_A@1, non_drift_RCL_A@2, non_drift_RCL_A@3, non_drift_RCL_A@5, non_drift_RCL_MAR, non_drift_FC_Precision, non_drift_FC_Recall, non_drift_FC_F1, RCL_A@3_down, RCL_MAR_up, FC_F1_down, Time, Epoch, Valid Epoch, output_dir, val_loss, val_MAR, val_A@1, command, git_commit_url.

Datasets

The datasets A, B, C, and D are publicly available at:

https://www.dropbox.com/scl/fo/2jl3iem1dfuo3s7na7ebg/ALeZNJrcSg_jWvZsyPhBVMA?rlkey=ccfy5tnuwl18smrxt2m5lkgie&st=5q8yhby7&dl=0

In each dataset, graph.yml or graphs/*.yml are the failure dependency graphs (FDGs), metrics.csv and metrics.pkl are the metrics, metrics.norm.csv and metrics.norm.pkl are the normalized metrics, and faults.csv lists the failures (including ground truths). FDG.pkl is a pickle of the FDG object, which contains all of the above data. Note that the pickle files are not compatible across different Python and Pandas versions, so if you cannot load them, just delete them. They are only used to speed up data loading. In particular, dataset D, which is used for concept drift adaption after system changes, includes system_change.json, which contains a list of changes, each with its start time and end time.
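
The pickle-or-CSV behaviour described above can be sketched as a loader that tries metrics.pkl first and falls back to metrics.csv when the pickle is missing or cannot be read. This is an illustration only: load_metrics is a hypothetical name, and the sketch uses the standard csv module to stay dependency-free, whereas the repository's own loader most likely works with Pandas.

```python
import csv
import pickle
from pathlib import Path


def load_metrics(dataset_dir):
    """Load metrics from the pickle cache if possible, otherwise from the CSV file."""
    dataset_dir = Path(dataset_dir)
    pkl_path = dataset_dir / "metrics.pkl"
    if pkl_path.exists():
        try:
            # Only unpickle files you trust: unpickling can run arbitrary code.
            with pkl_path.open("rb") as f:
                return pickle.load(f)
        except Exception:
            # Incompatible or corrupt pickle: fall through to the CSV file.
            pass
    with (dataset_dir / "metrics.csv").open(newline="") as f:
        # Each row becomes a dict keyed by the CSV header; values stay strings.
        return list(csv.DictReader(f))
```

Note that in this sketch the two paths return different types (whatever object was pickled versus a list of dicts), so real code would normalize the result before use.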

Deployment and Failure Injection Scripts of Train-Ticket

https://github.com/lizeyan/train-ticket

