This project implements and compares several models for predicting loan default and loss at default. We use data from the U.S. SBA 504 loan program, consisting of 150,000 loans issued between 1990 and 2014. We augment the data with several macroeconomic factors, including the Consumer Price Index and yearly S&P 500 returns.
The data_processed_final folder contains our final processed data, created with data_processed_hujia.ipynb. data_exploration.ipynb contains code for preliminary analysis and generating exploratory graphs.
The logistic model.ipynb notebook in the logistic_model folder contains code for tuning and analyzing our logistic model. logistic_roc.csv contains the validation ROC curve.
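For readers unfamiliar with how a validation ROC curve is built, the following is a minimal sketch in plain Python. It is not the code from the notebook: the function names `roc_curve_points` and `auc` and the toy labels and scores are assumptions made for illustration, and the layout of logistic_roc.csv is not assumed here.

```python
def roc_curve_points(y_true, scores):
    """Return (false positive rate, true positive rate) points.

    y_true holds 0/1 labels (1 = default) and scores holds predicted
    default probabilities. The threshold is swept from high to low.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Visit observations from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    prev_score = None
    for i in order:
        # Emit a point only when the score changes, so ties share a threshold.
        if prev_score is not None and scores[i] != prev_score:
            points.append((fp / neg, tp / pos))
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        prev_score = scores[i]
    points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical, not drawn from the SBA loans).
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
curve = roc_curve_points(labels, predicted)
area = auc(curve)
```

Each point corresponds to one classification threshold; an area close to 1 indicates that defaulted loans tend to receive higher scores than non-defaulted ones.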
The neural_network folder contains our attempts at implementing a binary classification neural network. NNprocessing.py contains neural network-specific preprocessing. static_net.py and dynamic_net.py are first attempts, exploring PyTorch's support for dynamic computational graphs. default_net.py contains our final implementation, which uses batch normalization, dropout, and the Adam optimizer. nn_eval.py analyzes the model's parameters and evaluates its validation performance. Unfortunately, we were unable to implement a fully functioning neural network.
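As a rough illustration of how batch normalization, dropout, and Adam fit together in a PyTorch binary classifier, here is a minimal sketch. It is not the contents of default_net.py: the class name `DefaultNetSketch`, the layer sizes, the dropout rate, the learning rate, and the random toy data are all assumptions chosen for illustration.

```python
import torch
from torch import nn


class DefaultNetSketch(nn.Module):
    """Hypothetical one-hidden-layer classifier; sizes are illustrative."""

    def __init__(self, n_features, hidden=32, p_drop=0.5):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.BatchNorm1d(hidden),  # normalize hidden activations per batch
            nn.ReLU(),
            nn.Dropout(p_drop),      # randomly zero units during training
            nn.Linear(hidden, 1),    # one logit per loan
        )

    def forward(self, x):
        return self.layers(x).squeeze(1)


torch.manual_seed(0)

# Random stand-in data: 64 "loans" with 10 features each (not SBA data).
X = torch.randn(64, 10)
y = (X[:, 0] > 0).float()

model = DefaultNetSketch(n_features=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

model.train()
for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# Switch to eval mode so dropout is disabled and batch norm uses running stats.
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(X))
```

Forgetting to call `model.eval()` before validation is a common source of unstable results with this combination of layers, since dropout and batch normalization behave differently in training and evaluation modes.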
The hazard model is in the hazard_lifelines_michelle.ipynb notebook in the data_processed_final folder.
The loss model is in the loss folder, in loss_model_michelle.ipynb. The 1_and_5_year_loss_michelle.ipynb notebook contains the tranche loss simulation code. Screenshots of the graphs generated in that notebook are in the graphs folder.
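To make the idea of a tranche loss simulation concrete, here is a minimal sketch in plain Python. It is not the method used in the notebook: it assumes independent defaults with a single default probability, a fixed loss given default, and equal loan sizes, and the names `tranche_loss` and `simulate_tranche_losses` and all parameter values are hypothetical.

```python
import random


def tranche_loss(portfolio_loss, attach, detach):
    """Fraction of a tranche wiped out by a given portfolio loss fraction.

    The tranche absorbs losses between the attachment point `attach` and
    the detachment point `detach`, both expressed as portfolio fractions.
    """
    if not 0.0 <= attach < detach <= 1.0:
        raise ValueError("require 0 <= attach < detach <= 1")
    absorbed = min(max(portfolio_loss - attach, 0.0), detach - attach)
    return absorbed / (detach - attach)


def simulate_tranche_losses(n_loans, p_default, lgd, attach, detach,
                            n_sims=1000, seed=0):
    """Simulate tranche loss fractions under simplifying assumptions.

    Assumptions (illustrative only): equal-sized loans, independent
    defaults with probability `p_default`, and a fixed loss given
    default `lgd`.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        defaults = sum(1 for _ in range(n_loans) if rng.random() < p_default)
        portfolio_loss = defaults * lgd / n_loans
        results.append(tranche_loss(portfolio_loss, attach, detach))
    return results


# Hypothetical parameters: 500 loans, 5% default probability, 40% loss
# given default, and a tranche covering portfolio losses from 1% to 3%.
losses = simulate_tranche_losses(500, 0.05, 0.4, attach=0.01, detach=0.03)
mean_loss = sum(losses) / len(losses)
```

In practice the default probabilities and loss severities would come from the fitted default and loss models rather than fixed constants, and defaults would typically be correlated through the macroeconomic factors.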