This repository contains the code for the paper *Temporal Label Smoothing for Early Prediction of Adverse Events*. It also includes code from both datasets' original repositories, M3B and HiB, which we extended to extract labels at multiple horizons and to add further components.
For all our experiments we assume a Linux installation; however, other platforms may also work:
- Install Conda (see the official installation instructions).
- Clone this repository and change into its directory.
- Run `conda env update` (creates the `tls-env` environment from the `environment.yml` file).
- Run `pip install -e .` (installs the `tls` package; the full setup sequence is sketched below).
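Put together, the setup amounts to the following commands. This is a minimal sketch: the directory name of the clone is an assumption.

```
# Assumes Conda is installed and this repository has been cloned.
cd tls                   # hypothetical name of the cloned repository directory
conda env update         # creates the tls-env environment from environment.yml
conda activate tls-env
pip install -e .         # installs the tls package in editable mode
```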
- Get access to the HiRID 1.1.1 dataset on PhysioNet. This entails:
  - getting a credentialed PhysioNet account
  - submitting a usage request to the data depositor
- Once access is granted, download the required files and unpack them into the `hirid-data-root` directory using e.g. `cat *.tar.gz | tar zxvf - -i` (see the sketch below).
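PhysioNet project pages typically offer a recursive `wget` download. The following is only a sketch; check the exact URL and archive layout on the HiRID 1.1.1 page:

```
# Sketch only: the URL and archive names must match the PhysioNet project page.
wget -r -N -c -np --user <physionet-username> --ask-password \
    https://physionet.org/files/hirid/1.1.1/
# Move the downloaded .tar.gz archives into hirid-data-root, then unpack them:
cd hirid-data-root
cat *.tar.gz | tar zxvf - -i
```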
- Get access to the MIMIC-III dataset on PhysioNet. This entails:
  - getting a credentialed PhysioNet account
  - completing the required training
  - signing the data use agreement
- Once access is granted, download all `CSV` files provided on the page and place them in a directory `mimic3-source`.
- Run all the steps described in the M3B repository to obtain the MIMIC-III Benchmark data (sketched below). You should place this data in the so-called `mimic3-data-root` folder.
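As a rough guide, the M3B pipeline involves commands along the following lines. The M3B repository's own README is authoritative; the paths below are placeholders matching the directory names used here.

```
# Run from a clone of the M3B (mimic3-benchmarks) repository; script names and
# steps may change upstream, so treat this as an illustrative sketch only.
python -m mimic3benchmark.scripts.extract_subjects mimic3-source/ mimic3-data-root/
python -m mimic3benchmark.scripts.validate_events mimic3-data-root/
python -m mimic3benchmark.scripts.extract_episodes_from_subjects mimic3-data-root/
python -m mimic3benchmark.scripts.split_train_and_test mimic3-data-root/
```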
Here we describe how to obtain the dataset in a format compatible with the deep learning models we use.
You can directly obtain our preprocessed version of the HiB dataset with the following steps:
- Activate the conda environment using `conda activate tls-env`.
- Complete the arguments in `run_script/preprocess/hirid.sh` for `--hirid-data-root` and `--work-dir`.
- Run pre-processing with `sh run_script/preprocess/hirid.sh`.

This last step wraps the following command, which you can adapt to your needs:
```
tls preprocess --dataset hirid \
    --hirid-data-root [path to source] #TODO User \
    --work-dir [path to output] #TODO User \
    --resource-path ./preprocessing/resources/ \
    --horizons 2 4 6 8 10 12 14 16 18 20 22 \
    --nr-worker 8
```
The above command requires about 10GB of RAM per core and, in total, approximately 40GB of disk space.
Similarly, you can directly obtain our preprocessed version of the M3B dataset with the following steps:
- Activate the conda environment using `conda activate tls-env`.
- Complete the arguments in `run_script/preprocess/mimic3.sh` for `--mimic3-data-root` and `--work-dir`.
- Run pre-processing with `sh run_script/preprocess/mimic3.sh`.

This last step wraps the following command, which you can adapt to your needs:
```
tls preprocess --dataset mimic3 \
    --mimic3-data-root [path to source] #TODO User \
    --work-dir [path to output] #TODO User \
    --resource-path ./preprocessing/resources/ \
    --horizons 4 8 12 16 20 24 28 32 36 40 44 \
    --mimic3-static-columns Height
```
The above command requires about 10GB of RAM per core and, in total, approximately 20GB of disk space.
The code is built around gin-config files. These files need to be modified with the source path to the data.
You should update the files in `./configs` wherever there is a `#TODO User` comment, as in the previous step.
For instance, in `./configs/hirid/GRU.gin` you should insert the correct path at line 36:
```
train_common.data_path = [path to output of pipe] #TODO User
```
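For example, one quick way to patch it in place. The output path below is hypothetical; use the `--work-dir` you passed to `tls preprocess`, note that the line number may differ in other configs, and that this assumes the value is a plain gin string.

```
# Replaces line 36 of the config with an example path; gin string values are quoted.
sed -i "36s|.*|train_common.data_path = '/data/tls/hirid_work_dir'|" ./configs/hirid/GRU.gin
```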
If you are interested in reproducing the experiments from the paper, you can directly use the pre-built scripts in `./run_scripts/`. For instance, you can run the following command to reproduce the GRU baseline on the Circulatory Failure task:

```
sh run_script/baseline/Circ/GRU.sh
```
This will create a new directory `[path to logdir]/[task name]/[seed number]/` containing:
- `val_metrics.pkl` and `test_metrics.pkl`: Pickle files with the model's performance on the validation and test sets, respectively (see the loading snippet after this list).
- `train_config.gin`: The so-called "operative" config, which saves the configuration used at training time.
- `model.torch`: The weights of the trained model.
- `tensorboard/`: (Optional) Directory with TensorBoard logs; run `tensorboard --logdir ./tensorboard` to visualize them.
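The metric files are ordinary pickles, so they can be inspected with a few lines of Python. The path below is a placeholder following the directory layout above.

```
# Print the saved test metrics for one run (replace the bracketed placeholders).
python - <<'EOF'
import pickle, pprint
with open('[path to logdir]/[task name]/[seed number]/test_metrics.pkl', 'rb') as f:
    pprint.pprint(pickle.load(f))
EOF
```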
The pre-built scripts are divided into two categories as follows:
- `baseline`: This folder contains scripts to reproduce the main benchmark experiment. Each of them will run a model with the best parameters we provide, for ten fixed seeds.
- `hp-search`: This folder contains the scripts we used to search hyperparameters for our method and the baselines.
You can evaluate any previously trained model using the `evaluate` command as follows:

```
tls evaluate -c [path to gin config] \
    -l [path to logdir] \
    -t [task name]
```

This command will evaluate the model at `[path to logdir]/[task name]/model.torch` on the test set of the dataset provided in the config. Results are saved to the `test_metrics.pkl` file.
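For example, to re-evaluate the GRU from the baseline run above (placeholder paths; point `-l` at the directory that contains `[task name]/model.torch`):

```
# Hypothetical invocation; adapt the config, log directory, and task name to your run.
tls evaluate -c ./configs/hirid/GRU.gin \
    -l [path to logdir] \
    -t [task name]
```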