GitHub - bafulton/readmission-nlp: Predict hospital readmissions using NLP on doctors' discharge notes.

Predicting Hospital Readmissions Using Natural Language Processing

Navigate to the "code" folder and use the command "sbt run". Four pretrained models have been included in our code submission, corresponding to the four models listed in Figure 3 of our report. The logistic regression is currently set to run. To run a different learner, simply use the command: sbt "run [model]", where [model] is "lr" (logistic regression), "gbt" (gradient boosted trees), "mlp" (multilayer perceptron), or "rf" (random forest). The resulting metrics will be printed to the console, and the predictions saved to "data/test_predictions.csv".
The scala code should have automatically generated charts in the "graphics" folder. If for some reason that failed, navigate to the "analysis" folder and run "./make_charts.sh". The resulting charts will be saved to the "graphics" folder. An "environment.yml" file has been included in the "analysis/chart_maker" directory.

Directory and file location set up under folder "CSE6250-NLP", create a folder called "data", put the following three MIMIC III csv files under this folder: NOTEEVENTS.csv, ADMISSIONS.csv, PATIENTS.dsv.
Run "python preprocessing/notes_preprocessing.py", and wait until the cleaned up csv file is generated.

combined.parquet -- parquet file which compiled information from NOTEEVENTS.csv and ADMISSIONS.csv
feature_importance.csv
feature_map.parquet -- feature map in parquest format
feature_map.csv -- feature map in csv format
train.parquet -- training dataset in parquet format
test.parquet -- test dataset in parquet format
predictions_train.csv -- predication results for training dataset
predictions_validate.csv -- prediction results for validating dataset
predictions_test.csv -- prediction results for testing dataset

A folder is generated named based on the learner name and balancing method. For example: RegressionLearnerWithCV.
Under the newly generated folder, there are four folders: bestModel, estimator, evaluator, metadata.
Under bestModel and estimator folder, two new folders are generated: metadata, stages.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
code		code
dictionaries		dictionaries
performance		performance
preprocess		preprocess
report		report
README.md		README.md