MIMIC-III preprocessing for Dynamic Measurement Scheduling for Event Forecasting using Deep RL (ICML 2019)

3 Steps to preprocess the data:

Run this repo by following the instructions below first (it's cloned around Jan. 2018). But when creating the dataset use mingjie_create_in_hospital_mortality.py to create.
Follow the official MIMIC repo to run the concept comorbidity with file elixhauser_ahrq (https://github.com/MIT-LCP/mimic-code/tree/master/concepts). You need to setup sql to run this part. After that export the table view to a csv file as elixhauser_ahrq.csv
Run the 2 notebooks in the notebooks/. Just change the file path pointing toward your generated files and folder.

Then you can run the code part in https://github.com/zzzace2000/autodiagnosis.

MIMIC-III Benchmarks

Python suite to construct benchmark machine learning datasets from the MIMIC-III clinical database. Currently, we are focused on building a multitask learning benchmark dataset that includes four key inpatient clinical prediction tasks that map onto core machine learning problems: prediction of mortality from early admission data (classification), real-time detection of decompensation (time series classification), forecasting length of stay (regression), and phenotype classification (multilabel sequence classification).

News

2017 March 23: We are pleased to announce the first official release of these benchmarks. We expect to release a revision within the coming months that will add at least ~50 additional input variables. We are likewise pleased to announce that the manuscript associated with these benchmarks is now available on arXiv.

Citation

If you use this code or these benchmarks in your research, please cite the following publication: Hrayr Harutyunyan, Hrant Khachatrian, David C. Kale, and Aram Galstyan. Multitask Learning and Benchmarking with Clinical Time Series Data. arXiv:1703.07771 which is now available on arXiv. This paper is currently under review for SIGKDD and if accepted, the citation will change. Please be sure also to cite the original MIMIC-III paper.

Motivation

Despite rapid growth in research that applies machine learning to clinical data, progress in the field appears far less dramatic than in other applications of machine learning. In image recognition, for example, the winning error rates in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) plummeted almost 90% from 2010 (0.2819) to 2016 (0.02991). There are many reasonable explanations for this discrepancy: clinical data sets are inherently noisy and uncertain and often small relative to their complexity, and for many problems of interest, ground truth labels for training and evaluation are unavailable.

However, there is another, simpler explanation: practical progress has been difficult to measure due to the absence of community benchmarks like ImageNet. Such benchmarks play an important role in accelerating progress in machine learning research. For one, they focus the community on specific problems and stoke ongoing debate about what those problems should be. They also reduce the startup overhead for researchers moving into a new area. Finally and perhaps most important, benchmarks facilitate reproducibility and direct comparison of competing ideas.

Here we present four public benchmarks for machine learning researchers interested in health care, built using data from the publicly available Medical Information Mart for Intensive Care (MIMIC-III) database (paper, website). Our four clinical prediction tasks are critical care variants of four opportunities to transform health care using in "big clinical data" as described in Bates, et al, 2014:

early triage and risk assessment, i.e., mortality prediction
prediction of physiologic decompensation
identification of high cost patients, i.e. length of stay forecasting
characterization of complex, multi-system diseases, i.e., acute care phenotyping

In Harutyunyan, Khachatrian, Kale, and Galstyan 2017, we propose a multitask RNN architecture to solve these four tasks simultaneously and show that this model generally outperforms strong single task baselines.

Requirements

We do not provide the MIMIC-III data itself. You must acquire the data yourself from https://mimic.physionet.org/. Specifically, download the CSVs. Otherwise, generally we make liberal use of the following packages:

numpy
pandas

For logistic regression baselines sklearn is required. LSTM models use Theano/Lasagne.

Building a benchmark

Here are the required steps to build the benchmark. It assumes that you already have MIMIC-III dataset (lots of CSV files) on the disk.

Clone the repo.

git clone https://github.com/YerevaNN/mimic3-benchmarks/
cd mimic3-benchmarks/

Add the path to the PYTHONPATH (sorry for this).

export PYTHONPATH=$PYTHONPATH:[PATH TO THIS REPOSITORY]

The following command takes MIMIC-III CSVs, generates one directory per SUBJECT_ID and writes ICU stay information to data/[SUBJECT_ID/stays.csv, diagnoses to data/[SUBJECT_ID]/diagnoses.csv, and events to data/[SUBJECT_ID]/events.csv. This step might take around an hour.
```
python scripts/extract_subjects.py [PATH TO MIMIC-III CSVs] data/root/
```
The following command attempts to fix some issues (ICU stay ID is missing) and removes the events that have missing information. 4741761 events (80%) remain after removing all suspicious rows.
```
python scripts/validate_events.py data/root/
```
The next command breaks up per-subject data into separate episodes (pertaining to ICU stays). Time series of events are stored in [SUBJECT_ID]/episode{#}_timeseries.csv (where # counts distinct episodes) while episode-level information (patient age, gender, ethnicity, height, weight) and outcomes (mortality, length of stay, diagnoses) are stores in [SUBJECT_ID]/episode{#}.csv. This script requires two files, one that maps event ITEMIDs to clinical variables and another that defines valid ranges for clinical variables (for detecting outliers, etc.).
```
python scripts/extract_episodes_from_subjects.py data/root/
```
The next command splits the whole dataset into training and testing sets. Note that all benchmarks use the same split:
```
python scripts/split_train_and_test.py data/root/
```

The following commands will generate task-specific datasets, which can later be used in models. These commands are independent, if you are going to work only on one benchmark task, you can run only the corresponding command.

python scripts/create_in_hospital_mortality.py data/root/ data/in-hospital-mortality/
python scripts/create_decompensation.py data/root/ data/decompensation/
python scripts/create_length_of_stay.py data/root/ data/length-of-stay/
python scripts/create_phenotyping.py data/root/ data/phenotyping/
python scripts/create_multitask.py data/root/ data/multitask/

I add my own datasets.

python scripts/mingjie_create_in_hospital_mortality.py data/root/ data/my-mortality/

Working with baseline models

For each of the 4 main tasks we provide logistic regression and LSTM baselines. Please note that running linear models can take hours because of extensive grid search. You can change the chunk_size parameter in codes and they will became faster (of course the performance will not be the same).

Train / validation split

Use the following command to extract validation set from the traning set. This step is required for running the baseline models.

   python mimic3models/split_train_val.py [TASK]

[TASK] is either in-hospital-mortality, decompensation, length-of-stay, phenotyping or multitask.

In-hospital mortality prediction

Run the following command to train the neural network which gives the best result. We got the best performance on validation set after 8 epochs.

   cd mimic3models/in_hospital_mortality/
   python -u main.py --network lstm --dim 256 --timestep 2.0 --mode train --batch_size 8 --log_every 30

To test the model use the following:

   python -u main.py --network lstm --dim 256 --timestep 2.0 --mode test --batch_size 8 --log_every 30 --load_state best_model.state

Use the following command to train logistic regression. The best model we got used L2 regularization with C=0.001:

   cd mimic3models/in_hospital_mortality/logistic/
   python -u main.py --l2 --C 0.001

Decompensation prediction

The best model we got for this task was trained for 110 chunks (that's less than one epoch; it overfits before reaching one epoch because there are many training samples for the same patient with different lengths).

   cd mimic3models/decompensation/
   python -u main.py --network lstm --dim 256 --mode train --batch_size 8 --log_every 30

Here is the command to test:

   python -u main.py --network lstm --dim 256 --mode test --batch_size 8 --log_every 30 --load_state best_model.state

Use the following command to train a logistic regression. It will do a grid search over a small space of hyperparameters and will report the scores for every case.

   cd mimic3models/decompensation/logistic/
   python -u main.py

Length of stay prediction

The best model we got for this task was trained for 15 chunks.

   cd mimic3models/length_of_stay/
   python -u main.py --network lstm_cf_custom --dim 256 --mode train --batch_size 8 --log_every 30

Run the following command to test the best pretrained neural network.

   python -u main.py --network lstm_cf_custom --dim 256 --mode test --batch_size 8 --log_every 30 --load_state best_model.state

Use the following command to train a logistic regression. It will do a grid search over a small space of hyperparameters and will report the scores for every case.

   cd mimic3models/length_of_stay/logistic/
   python -u main_cf.py

Phenotype classification

The best model we got for this task was trained for 30 epochs.

   cd mimic3models/phenotyping/
   python -u main.py --network lstm_2layer --dim 512 --mode train --batch_size 8 --log_every 30

Use the following command for testing:

   python -u main.py --network lstm_2layer --dim 512 --mode test --batch_size 8 --log_every 30 --load_state best_model.state

Use the following command for logistic regression. It will do a grid search over a small space of hyperparameters and will report the scores for every case.

   cd mimic3models/phenotyping/logistic/
   python -u main.py

Multitask learning

ihm_C, decomp_C, los_C and ph_C coefficients control the relative weight of the tasks in the multitask model. Default is 1.0. The best model we got was trained for 12 epochs.

   cd mimic3models/multitask/
   python -u main.py --network lstm --dim 1024 --mode train --batch_size 8 --log_every 30 --ihm_C 0.02 --decomp_C 0.1 --los_C 0.5

Use the following command for testing:

   python -u main.py --network lstm --dim 1024 --mode test --batch_size 8 --log_every 30 --load_state best_model.state

General todos:

Test and debug
Add comments and documentation
Refactor, where appropriate, to make code more generally useful
Expand coverage of variable map and variable range files.
Decide whether we are missing any other high-priority data (CPT codes, inputs, etc.)

More on validating results

Here are the problems identified by validate_events.py on randomly chosen 1000 subjects:

Type	Description	Number of rows
`n_events`	total number of events	5937206
`nohadminstay`	HADM_ID does not appear in `stays.csv`	836341
`emptyhadm`	HADM_ID is empty	126480
`icustaymissinginstays`	ICUSTAY_ID does not appear in `stays.csv`	232624
`noicustay`	ICUSTAY_ID is empty	347768
`recovered`	empty ICUSTAY_IDs are recovered according to `stays.csv` files (given `HADM_ID`)	347768
`couldnotrecover`	empty ICUSTAY_IDs that are not recovered. This should be zero, because the unrecoverable ones are counted in `icustaymissinginstays`	0

Name		Name	Last commit message	Last commit date
Latest commit History 139 Commits
mimic3benchmark		mimic3benchmark
mimic3models		mimic3models
notebooks		notebooks
resources		resources
scripts		scripts
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
run.sh		run.sh
statistics.md		statistics.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MIMIC-III preprocessing for Dynamic Measurement Scheduling for Event Forecasting using Deep RL (ICML 2019)

3 Steps to preprocess the data:

MIMIC-III Benchmarks

News

Citation

Motivation

Requirements

Building a benchmark

Working with baseline models

Train / validation split

In-hospital mortality prediction

Decompensation prediction

Length of stay prediction

Phenotype classification

Multitask learning

General todos:

More on validating results

About

Releases

Packages

Contributors 7

Languages

License

zzzace2000/mimic-preprocess

Folders and files

Latest commit

History

Repository files navigation

MIMIC-III preprocessing for Dynamic Measurement Scheduling for Event Forecasting using Deep RL (ICML 2019)

3 Steps to preprocess the data:

MIMIC-III Benchmarks

News

Citation

Motivation

Requirements

Building a benchmark

Working with baseline models

Train / validation split

In-hospital mortality prediction

Decompensation prediction

Length of stay prediction

Phenotype classification

Multitask learning

General todos:

More on validating results

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 7

Languages

Packages