Comparing Modeling Techniques for Predicting Absence:
A Case Study from Regionshospital Gødstrup 🏥 ♥️ 🤖

This repository contains the code developed for an exam paper in the course Data Science, Predicting, and Forecasting, part of the Master's in Cognitive Science at Aarhus University, by Klara Fomsgaard and Laura Paaby.

Data availability

Due to privacy restrictions, the analyzed data is not included in this repository. Access may be granted upon request, with joint consent from Gødstrup Sygehus and the authors.

Setup

Step 1 Run setup.sh

To replicate the setup, we have included a bash script that automatically:

Creates a virtual environment for the project
Activates the virtual environment
Installs the correct versions of the packages required
Runs the script
Deactivates the virtual environment

Usage

Regression Models and Feature Importance

Step 2 Run data_prep_1.ipynb, data_prep_2.ipynb and descriptive_plots_and_data_split.ipynb
Running these notebooks will:

Preprocess and clean data
Generate additional features
Scale independent variables
Split the data into train (80%) and test (20%) subsets
Visualize the raw data
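
The scaling and splitting steps listed above can be sketched as follows. This is a minimal standard-library illustration, not the notebooks' code: the function names, the random shuffle, the seed, and the toy data are all assumptions, and the notebooks may split differently (for example chronologically). In this sketch the scaling statistics are computed on the training subset only, so no information from the test subset leaks into the features.

```python
import random
import statistics


def train_test_split_80_20(rows, seed=42):
    """Shuffle a copy of the rows and return (train, test) with an 80/20 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]


def fit_standard_scaler(train_features):
    """Return per-column (means, stds) computed on the training features only."""
    columns = list(zip(*train_features))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(features, means, stds):
    """Standardize each column using previously fitted means and stds."""
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in features
    ]


# Toy data (hypothetical): each row is ([feature_1, feature_2], target).
data = [([float(i), float(i % 7)], float(2 * i)) for i in range(100)]
train, test = train_test_split_80_20(data)

means, stds = fit_standard_scaler([x for x, _ in train])
X_train = apply_scaler([x for x, _ in train], means, stds)
X_test = apply_scaler([x for x, _ in test], means, stds)
```

After scaling, each training column has mean 0 and standard deviation 1; the test columns are transformed with the same statistics and will generally be close to, but not exactly, standardized.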

Step 3 Run regressors_GRID.py
This script runs a grid search for every regressor and stores the best-performing parameters for each model in RegMod_Performance.
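
To illustrate the mechanics of an exhaustive grid search, here is a minimal sketch using only the standard library. It is not the code in regressors_GRID.py: the one-dimensional ridge model, the parameter grid, the hold-out validation set, and the choice of MAE as the selection metric are all assumptions made to keep the example small.

```python
import itertools
import json


def fit_ridge_1d(xs, ys, alpha, fit_intercept):
    """Fit y ~ slope * x (+ intercept) with an L2 penalty on the slope."""
    x_mean = sum(xs) / len(xs) if fit_intercept else 0.0
    y_mean = sum(ys) / len(ys) if fit_intercept else 0.0
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search(param_grid, x_train, y_train, x_val, y_val):
    """Try every parameter combination; return the one with the lowest validation MAE."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        slope, intercept = fit_ridge_1d(x_train, y_train, **params)
        predictions = [slope * x + intercept for x in x_val]
        score = mean_absolute_error(y_val, predictions)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data (hypothetical) following y = 3x + 5.
x_train = [float(i) for i in range(0, 40)]
y_train = [3.0 * x + 5.0 for x in x_train]
x_val = [float(i) for i in range(40, 50)]
y_val = [3.0 * x + 5.0 for x in x_val]

param_grid = {"alpha": [0.0, 1.0, 10.0], "fit_intercept": [True, False]}
best_params, best_score = grid_search(param_grid, x_train, y_train, x_val, y_val)

# Serialize the winning combination so a later script can reload it.
best_params_json = json.dumps(best_params, sort_keys=True)
```

The number of fits grows with the product of the grid sizes (here 3 × 2 = 6), which is why grids for larger models are usually kept coarse.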

Step 4 Run fitting_best_params.py
This script fits each model with the optimal parameters found in Step 3, evaluates it on the test dataset, and records the performance metrics ($R^2$, $MAE$, $RMSE$).
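
For reference, the three metrics can be computed as in the sketch below. It uses the standard definitions with plain Python lists; the function names and the toy values are illustrative and not taken from fitting_best_params.py.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy values (hypothetical).
y_test = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]

scores = {
    "R2": r_squared(y_test, y_hat),
    "MAE": mae(y_test, y_hat),
    "RMSE": rmse(y_test, y_hat),
}
```

MAE and RMSE are in the units of the target, while $R^2$ is unitless; RMSE penalizes large errors more heavily than MAE.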

Step 5 Run baselinemodels.py
This script creates two baseline models:

A model that always predicts the mean of the target
A model that always predicts the value of the previous data point
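
The two baselines can be sketched as below. This is an illustration rather than the repository's script: it assumes that the observations are ordered in time, that the mean is taken over the training targets, and that the first test prediction of the previous-value baseline uses the last training observation. The function names and toy values are hypothetical.

```python
def mean_baseline(y_train, n_predictions):
    """Predict the training-set mean for every test observation."""
    mean_value = sum(y_train) / len(y_train)
    return [mean_value] * n_predictions


def previous_value_baseline(y_train, y_test):
    """Predict each observation as the value observed immediately before it."""
    return [y_train[-1]] + list(y_test[:-1])


# Toy values (hypothetical), ordered in time.
y_train = [4.0, 6.0, 5.0, 9.0]
y_test = [8.0, 7.0, 10.0]

mean_predictions = mean_baseline(y_train, len(y_test))
previous_predictions = previous_value_baseline(y_train, y_test)
```

A regression model is only useful if it beats both of these on the same test data.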

Step 6 Run feature_imp.py
This script calculates permutation feature importances, together with their standard deviations, for the two top-performing models, XGBoost and Random Forest, and stores the results.
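
Permutation feature importance measures how much a fitted model's error grows when the values of one feature are shuffled, which breaks that feature's link to the target. The sketch below shows the idea with the standard library only. It is not the code in feature_imp.py: the MAE-increase scoring, the number of repeats, and the toy model (a stand-in for a fitted XGBoost or Random Forest model) are assumptions.

```python
import random
import statistics


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Return {feature_index: (mean, std)} of the MAE increase after shuffling that feature."""
    rng = random.Random(seed)
    baseline_error = mean_absolute_error(y, predict(X))
    results = {}
    for feature in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled_column = [row[feature] for row in X]
            rng.shuffle(shuffled_column)
            # Copy the rows, replacing only the shuffled feature.
            X_permuted = [
                row[:feature] + [value] + row[feature + 1:]
                for row, value in zip(X, shuffled_column)
            ]
            permuted_error = mean_absolute_error(y, predict(X_permuted))
            increases.append(permuted_error - baseline_error)
        results[feature] = (statistics.fmean(increases), statistics.pstdev(increases))
    return results


def toy_model(rows):
    """Hypothetical fitted model: depends on feature 0 only and ignores feature 1."""
    return [2.0 * row[0] for row in rows]


# Toy data (hypothetical): the target is driven entirely by feature 0.
X = [[float(i), float((i * 7) % 5)] for i in range(30)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(toy_model, X, y)
```

In this toy setup, feature 1 gets an importance of exactly zero because the model never uses it, while feature 0 gets a positive importance.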

Step 7 Run plot_script.R
This R script generates visualizations of the feature importances and the models’ predictions in comparison to the actual data values. The visualizations are stored in ./plots.

Time Series Prediction and Forecasting

Step 1 Run forecast_subset.py
This script fits a Prophet forecasting model for selected groups in the emergency department:

Medical staff
Nursing staff
Administrative staff

The script generates plots for both the entire time series and a subset containing data and predictions from 2024 onwards, and stores them in 'forecasting_plots'.

Enjoy! 😉

Repository Overview

.
├── data_prep/                                  <--- folder containing scripts related to data prep and data visualization
│   ├── data_prep_1.ipynb
│   ├── data_prep_2.ipynb
│   └── descriptive_plots_and_data_split.ipynb
│
├── plots/                                      <--- folder containing plots from feature importance analysis
├── RegMod_Performance/                         <--- folder with results from model comparison and feature importance
│   └── BestParameters/                         <--- folder containing the best parameters
│
├── time_series_prophet/                        <--- folder containing timeseries analysis and forecasting using Prophet
│   ├── forecasting_plots/
│   ├── create_plot_grids.py
│   ├── forecast_subset.py
│   └── helper_functions_forecasting.py
│
├── .gitignore
├── README.md
├── baselinemodels.py
├── feature_imp.py                                      
├── fitting_best_params.py   
├── plot_script.R
├── regressors_GRID.py
├── requirements.txt            
└── setup.sh
