This repository was used for my thesis. The goal was to find a biased dataset, and mitigate its bias. That is done under the patients directory. Check the README file for more.

steve12512/thesis

The purpose of this project was to find a biased dataset and try different methods of mitigating its bias. That work is done under the patients directory, in the patients.ipynb file. The dataset used is available here: https://www.kaggle.com/datasets/majdmustafa/diabetes-hospital-readmission-dataset

We first train a Random Forest classifier on the dataset to predict whether or not a patient will be readmitted to a hospital. We then observe disparities between genders, and especially between races, both in the predicted outcomes and in the true positive, true negative, false positive, and false negative results.

We then try to mitigate these unfair, biased predictions using different algorithms. First, we use preprocessing techniques, such as reweighting and resampling. Next, we use in-processing techniques, such as fairness constraints (demographic parity and equalized odds) and an adversarial debiasing model. Lastly, we use post-processing techniques, such as a threshold optimizer. We then compare the results and the trade-offs each method induces between accuracy and fairness.
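To make two of these steps concrete, here is a minimal sketch of how per-group error rates and reweighting weights could be computed. It is not the code from patients.ipynb: the function names `group_rates` and `reweighing_weights`, the toy labels, and the group names `"a"` and `"b"` are all hypothetical, and the weighting formula is the standard Kamiran-Calders reweighing scheme, which may differ from what the notebook actually uses.

```python
from collections import Counter, defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: {"tpr": ..., "fpr": ...}} for binary labels (hypothetical helper)."""
    counts = defaultdict(Counter)
    for t, p, g in zip(y_true, y_pred, groups):
        # "tp", "tn", "fp" or "fn": correct/incorrect + predicted class
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]  # actual positives in this group
        neg = c["fp"] + c["tn"]  # actual negatives in this group
        rates[g] = {
            "tpr": c["tp"] / pos if pos else float("nan"),
            "fpr": c["fp"] / neg if neg else float("nan"),
        }
    return rates


def reweighing_weights(y_true, groups):
    """Per-sample weights w(g, y) = P(g) * P(y) / P(g, y) (Kamiran-Calders style)."""
    n = len(y_true)
    g_count = Counter(groups)
    y_count = Counter(y_true)
    gy_count = Counter(zip(groups, y_true))
    return [
        (g_count[g] * y_count[y]) / (n * gy_count[(g, y)])
        for g, y in zip(groups, y_true)
    ]


# Toy data (made up for illustration, not taken from the readmission dataset).
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]

rates = group_rates(y_true, y_pred, groups)
weights = reweighing_weights(y_true, groups)

for g in sorted(rates):
    print(g, rates[g])
print([round(w, 3) for w in weights])
```

On this toy data, group "a" has a higher true positive rate and a higher false positive rate than group "b", which is the kind of disparity described above. The weights make the weighted positive-label rate equal across groups; with scikit-learn they could be passed as `sample_weight` when fitting a classifier, assuming that is how the reweighting is applied.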

The rest of the directories contain other attempts to find bias. The greek directory contains the training of an NLP model (word2vec) on a corpus of classical Greek literature.
