Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting

This is the repository for the paper "Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting" accepted at MLHC 2024.

Paper: LINK to be done

Abstract

Machine learning for early syndrome diagnosis aims to solve the intricate task of predicting a ground truth label that most often is the outcome (effect) of a medical consensus definition applied to observed clinical measurements (causes), given clinical measurements observed several hours before.

Instead of focusing on the prediction of the future effect, we propose to directly predict the causes via time series forecasting (TSF) of clinical variables and determine the effect by applying the gold standard consensus definition to the forecasted values.

This method has the invaluable advantage of being straightforwardly interpretable to clinical practitioners, and because model training does not rely on a particular label anymore, the forecasted data can be used to predict any consensus-based label.

We exemplify our method by means of long-term TSF with Transformer models, with a focus on accurate prediction of sparse clinical variables involved in the SOFA-based Sepsis-3 definition and the new Simplified Acute Physiology Score (SAPS-II) definition.

Our experiments are conducted on two datasets and show that contrary to recent proposals which advocate set function encoders for time series and direct multi-step decoders, best results are achieved by a combination of standard dense encoders with iterative multi-step decoders.

The key for success of iterative multi-step decoding can be attributed to its ability to capture cross-variate dependencies and to a student forcing training strategy that teaches the model to rely on its own previous time step predictions for the next time step prediction.

Important Remarks

To create this repository, we used code from github.com/sindhura97/STraTS/ and github.com/cure-lab/LTSF-Linear.

However, the STraTS code got a major update between submission and acceptance of this paper. Whereas before it contained the code for STraTS in the framework Keras , it now contains a PyTorch implementation. In our work, we performed our own conversion of the STraTS code to PyTorch.

Conda Environment Setup

In theory, the following should be enough to reproduce our environments:

conda create --name strats_pytorch --file env.txt
conda activate strats_pytorch
pip install -r requirements.txt

Data Preprocessing

To use this repository, you need access to the MIMIC-III (Johnson et al., 2016) and eICU (Pollard et al., 2019) datasets.

For MIMIC-III, you need to run the following preprocessing scripts in the mimic_experiments folder:

mimic3_01_preprocess_icu.py
mimic3_02_preprocess_pickle.py
mimic3_03_susinf.py

For eICU, you need to run the following preprocessing scripts in the eicu_experiments folder:

eicu_01_preprocessing.ipynb
eicu_02_pickle2triplet.py
eicu_03_pickle2dense.py
eicu_04_susinf.py

Model Training

To train the models on the preprocessing MIMIC-III data, please run the following jupyter notebooks in the mimic_experiments folder:

sparse_train.ipynb
dense_train.ipynb

For eICU, run all models in eicu_experiments/model_files/

Additional Experiment

To reproduce our direct forecast of sepsis model, run

mimic_experiments/mimic_dense_regression.ipynb
eicu_experiments/eicu_dense_regression.py

Model Evaluation

To evaluate our models, run the following Jupyter notebooks for MIMIC-III in the mimic_experiments folder:

mimic_eval_sample.ipynb  # calculates MSE scores on the complete test data
mimic_eval_sepsis.ipynb  # calculates Sepsis scores on the filtered test data
mimic_eval_sapsii.ipynb  # calculates SAPS-II scores on the complete test data

Additional Experiment Analysis:
mimic_regression_eval_mse.ipynb    # MSE for the complete test data
mimic_regression_eval_sepsis.ipynb # Sepsis for the filtered test data

And the following ones for eICU in the eicu_experiments folder:

eicu_eval_sample.ipynb
eicu_eval_sapsii.ipynb
eicu_eval_sepsis.ipynb

Additional Experiment Analysis:
eicu_regression_eval_sepsis.ipynb
eicu_regression_eval_mse.ipynb

Citation

When using this paper or code, please use:

@article{staniekETAL24causes,
  title={Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting},
  author={Staniek, Michael and Fracarolli, Marius and Hagmann, Michael and Riezler, Stefan},
  journal={Proceedings of Machine Learning Research},
  volume={252},
  pages={1--29},
  year={2024},
  publisher={}
}

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
eicu_experiments		eicu_experiments
mimic_experiments		mimic_experiments
240409_predict_label.png		240409_predict_label.png
LICENSE		LICENSE
README.md		README.md
env.txt		env.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting

Table of Contents

Abstract

Important Remarks

Conda Environment Setup

Data Preprocessing

Model Training

Additional Experiment

Model Evaluation

Citation

About

Releases

Packages

Languages

License

StatNLP/mlhc_2024_prediction_of_causes

Folders and files

Latest commit

History

Repository files navigation

Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting

Table of Contents

Abstract

Important Remarks

Conda Environment Setup

Data Preprocessing

Model Training

Additional Experiment

Model Evaluation

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages