Authors: Mykyta Alekseiev, Elizaveta Barysheva, Joao Melo, Thomas Schneider, Harshit Shangari and Maria Stoelben
The goal of this project is to predict a binary variable using white and black box models. Subsequently, the performance and fairness of the models with respect to certain protected features will be analysed. The protected attributes that will be focused on here are gender and race. Moreover, the models' predictions will be analysed with methods for interpretability.
For this project a dataset of traffic violations in Maryland, USA, was selected. You can download the data here. The `.arff` file should be placed in a `data/` folder in the root of your repository.
The processed data contains 65,203 instances with 15 columns, of which 5 are categorical and the rest binary or numeric. The target column is `Citation`, which equals 1 when the officer issued a citation and 0 when only a warning was given.
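As a quick sanity check on the target, one can inspect the class balance of `Citation`. The sketch below uses a tiny hypothetical stand-in frame (the real processed data has 65,203 rows); the column values shown are illustrative only:

```python
import pandas as pd

# Hypothetical stand-in for the processed data.csv; in practice you would
# read the real file, e.g. pd.read_csv("data/data.csv").
df = pd.DataFrame({
    "Citation": [1, 0, 0, 1, 0],   # 1 = citation issued, 0 = warning only
    "Gender":   ["M", "F", "F", "M", "F"],
})

# Share of stops that ended in a citation (class balance of the target)
citation_rate = df["Citation"].mean()
print(f"Citation rate: {citation_rate:.2f}")
```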
Create a virtual environment and install the requirements:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .
pre-commit install
```
Check out the Jupyter notebooks to understand the data and the preprocessing decisions.
To run the data preprocessing and generate the `data.csv` output used in the following parts, run:

```bash
python -m spacy download en_core_web_sm
python src/data_preprocessing/data_preprocessor.py
```
The parameters can be changed in `config/config_modeling.py`. By default, the data is separated into 60% training, 20% validation, and 20% test.
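The default 60/20/20 split can be sketched with two consecutive `train_test_split` calls from scikit-learn. The toy arrays below are stand-ins for the real features and target, and the `random_state` values are illustrative, not taken from the project config:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature matrix and binary target standing in for the processed data
X = np.arange(100).reshape(-1, 1)
y = np.random.RandomState(0).randint(0, 2, size=100)

# First hold out 40% of the data, then halve that remainder into
# validation and test sets (20% of the total each).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42
)
print(len(X_train), len(X_val), len(X_test))
```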
Run the training with MLflow tracking using the following command:

```bash
python src/modeling/main.py
```
Model selection was performed on the validation data. The results for the white box and black box models are displayed below.
Model | Train AUC | Val AUC | Test AUC | Test Accuracy | Test F1 Score |
---|---|---|---|---|---|
XGB | 0.898 | 0.866 | 0.860 | 0.778 | 0.748 |
Random Forest | 0.873 | 0.849 | 0.843 | 0.764 | 0.728 |
Decision Tree | 0.825 | 0.818 | 0.818 | 0.742 | 0.703 |
GAM | 0.805 | 0.814 | 0.805 | 0.730 | 0.705 |
Logistic Regression | 0.645 | 0.652 | 0.641 | 0.600 | 0.559 |
ANN | 0.641 | 0.649 | 0.637 | 0.537 | 0.097 |
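The metrics reported above (AUC, accuracy, F1 score) can all be computed with scikit-learn. A minimal sketch on hypothetical labels and predicted probabilities, not the project's actual predictions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score

# Hypothetical ground truth and predicted citation probabilities
y_true = np.array([1, 0, 1, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.6])

# AUC is threshold-free and uses the probabilities directly;
# accuracy and F1 need hard labels, here thresholded at 0.5.
y_pred = (y_prob >= 0.5).astype(int)

auc = roc_auc_score(y_true, y_prob)
acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"AUC={auc:.3f} Acc={acc:.3f} F1={f1:.3f}")
```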
If you are interested in our conclusions on how the model works and whether it is fair with respect to the protected attributes, check the explanation and fairness subfolders within the notebooks folder, respectively.
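One common fairness check for protected attributes such as gender is demographic parity: comparing positive-prediction rates across groups. The sketch below is a generic illustration on made-up predictions, not the project's actual fairness methodology:

```python
import pandas as pd

# Hypothetical model predictions alongside a protected attribute
df = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "F", "M"],
    "pred":   [1,   0,   1,   0,   0,   1],   # 1 = predicted citation
})

# Positive-prediction rate per group; a large gap suggests the model
# flags one group for citations more often than the other.
rates = df.groupby("gender")["pred"].mean()
parity_gap = abs(rates["M"] - rates["F"])
print(rates.to_dict(), f"gap={parity_gap:.3f}")
```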