Predicting the probability of an applicant paying back a loan. This repository analyzes data from different types of personal loans and applies machine learning algorithms to build a credit risk predictor.
The goal is to predict whether a loan application should be approved, based on the probability of credit default. We use the following models:
- Random Forest
- XGBoost with Incremental Learning
To install all required Python packages, run the following command in a Linux terminal:

```bash
pip install -r requirements.txt
```
The dataset contains over 300,000 personal home loans.
- Each row represents one loan.
Data preprocessing is an essential step in preparing the data for analysis and modeling: it transforms the raw data into a format suitable for machine learning algorithms. In this project, we followed the preprocessing steps below:
- Handling outliers: we used the Z-score to identify outliers in the numerical features. The corresponding module can be imported with:

```python
from feature_engineering import outliers
```
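The repository's `outliers` module is project-specific, so the snippet below is only a minimal sketch of what Z-score filtering typically looks like; the function name, threshold, and column are illustrative, not the actual implementation:

```python
import numpy as np
import pandas as pd
from scipy import stats

def flag_outliers_zscore(df: pd.DataFrame, column: str, threshold: float = 3.0) -> pd.Series:
    """Mark rows whose absolute z-score in `column` exceeds the threshold."""
    z = np.abs(stats.zscore(df[column], nan_policy="omit"))
    return pd.Series(z > threshold, index=df.index)

# Toy usage: rows with |z| > 3 would be dropped before modeling.
loans = pd.DataFrame({"annual_income": [40_000, 55_000, 62_000, 1_000_000]})
loans_clean = loans.loc[~flag_outliers_zscore(loans, "annual_income")]
```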
- Handling missing values: data can be missing for a reason, so it is important to understand the properties of the missing values.
We used the missingno Python library to analyze and visualize the missing values. For some features with missing values, we created an extra column indicating whether a value is missing.
We then compared model performance across different imputation techniques (a sketch follows the list below). Imputation techniques used:
- Median and Mode Imputation
- MICE (Multivariate Imputation by Chained Equation)
- Median and Mode Imputation combined with Mean Imputation
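Below is a minimal sketch of these techniques on a toy frame; the column names are made up, and scikit-learn's `IterativeImputer` stands in here for MICE:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

# Toy frame; the real loan columns differ.
df = pd.DataFrame({
    "loan_amount": [10_000, 25_000, np.nan, 15_000],
    "employment_length": [2.0, np.nan, 7.0, 4.0],
    "home_ownership": ["RENT", "OWN", np.nan, "RENT"],
})

# Indicator column: keep the fact of "missingness" as its own feature.
df["employment_length_missing"] = df["employment_length"].isna().astype(int)

num_cols = ["loan_amount", "employment_length"]
cat_cols = ["home_ownership"]

# Option A: median for numerical features, mode for categorical features.
df_simple = df.copy()
df_simple[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
df_simple[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])

# Option B: MICE-style imputation; IterativeImputer models each incomplete
# numerical feature as a function of the others in round-robin fashion.
df_mice = df.copy()
df_mice[num_cols] = IterativeImputer(max_iter=10, random_state=0).fit_transform(df[num_cols])
```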
We checked the target variable for class imbalance (a quick check is sketched below).
You can find all the steps we implemented in this section in `feature-engeering.ipynb`.
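A quick way to inspect the class balance (the target name below is made up):

```python
import pandas as pd

# "default" stands in for the actual target column.
y = pd.Series([0, 0, 0, 0, 1, 0, 0, 1], name="default")
print(y.value_counts(normalize=True))  # fraction of each class
```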
First, we used One-Hot Encoding for categorical data that has no hierarchical structure. For categorical data with a hierarchy, we used Ordinal Encoding.
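For example, a sketch with made-up columns, where `home_ownership` is nominal and `grade` has a natural order:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT"],
    "grade": ["B", "A", "C", "A"],
})

# One-hot encoding for nominal categories (no implied order).
df = pd.get_dummies(df, columns=["home_ownership"])

# Ordinal encoding with an explicit category order (A < B < C).
encoder = OrdinalEncoder(categories=[["A", "B", "C"]])
df["grade"] = encoder.fit_transform(df[["grade"]]).ravel()
```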
We applied the following algorithms separately to select important features (an example follows the list):
- Recursive Feature Elimination (RFE)
- Univariate Feature Selection: ANOVA F-value
- Information Value (IV) and Weight of Evidence (WoE)
- Correlation
- Variance Threshold
- Boruta
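As an example, here is a sketch of the first two methods on synthetic data; the remaining methods follow the same fit-then-filter pattern:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# RFE: repeatedly fit the estimator and drop the weakest features
# until only n_features_to_select remain.
rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=8).fit(X, y)
print(rfe.support_)  # boolean mask of retained features

# Univariate selection scored by the ANOVA F-value.
anova = SelectKBest(score_func=f_classif, k=8).fit(X, y)
print(anova.get_support())
```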
We used the ROC AUC score as the main metric to evaluate model performance.
To find the best classification threshold, we calculated the threshold that maximizes Youden's J statistic (J = TPR - FPR).
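A small sketch of both steps with scikit-learn (the labels and probabilities are made up):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.10, 0.30, 0.35, 0.40, 0.60, 0.65, 0.20, 0.80])

print("ROC AUC:", roc_auc_score(y_true, y_prob))

# Youden's J statistic: J = TPR - FPR, maximized over all thresholds.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print("Best threshold:", thresholds[np.argmax(tpr - fpr)])
```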
We used three different methods to evaluate feature importance (a sketch follows the list):
- Scikit-learn's feature importance: averaging the decrease in impurity over trees
- Permutation Feature Importance: based on how randomly re-shuffling each predictor influences model performance
- SHAP
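A sketch of the first two methods on synthetic data; SHAP values work analogously through the `shap` package:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Impurity-based importance: mean decrease in impurity, averaged over trees.
print(model.feature_importances_)

# Permutation importance: the drop in ROC AUC when each feature is shuffled.
result = permutation_importance(model, X_test, y_test,
                                scoring="roc_auc", n_repeats=10, random_state=0)
print(result.importances_mean)
```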
- The best model achieved a ROC AUC score of 0.69.
- The PR AUC score is 140% better than the baseline model (because of the class imbalance in the target variable, we also computed the Precision-Recall curve; the baseline PR AUC for a random classifier equals the positive-class prevalence).