You can find the analysis notebooks here: credit_risk_resampling.ipynb | credit_risk_ensemble.ipynb
In this project, we build and analyse several machine learning models in Python to predict credit risk. The following techniques were used:
- Using the RandomOverSampler and SMOTE algorithms, oversample the data.
- Using the ClusterCentroids technique, undersample the data.
- Using the SMOTEENN method, combine over- and undersampling.
- Using the BalancedRandomForestClassifier and EasyEnsembleClassifier ensemble models, reduce the bias caused by class imbalance.
We'll assess these models' performance and offer a recommendation on whether or not they should be utilised to predict credit risk.
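
As a rough illustration of the first technique listed above, random oversampling can be sketched in plain Python: rows from the minority class are duplicated at random until the classes are the same size. This is a simplified stand-in, not the `RandomOverSampler` implementation itself; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (hypothetical): 8 low-risk loans and 2 high-risk loans.
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

new_rows, new_labels = random_oversample(rows, labels)
balanced_counts = Counter(new_labels)
print(balanced_counts)
```

Only the training split should be resampled in this way; the test split keeps its original class balance so that the reported metrics reflect real-world proportions.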
The results for each model, in the order listed above, are:
- RandomOverSampler: The balanced accuracy score is 64%. High-risk precision is only about 1% with a sensitivity of 62%, resulting in an F1 of about 2%. Because low-risk loans vastly outnumber high-risk ones, low-risk precision is almost 100%, with a sensitivity of 68%.
- SMOTE: The outcomes are very similar to those of the previous model. The balanced accuracy score is 63%. High-risk precision is only about 1% with a sensitivity of 60%, resulting in an F1 of about 2%. Because of the large number of low-risk loans, low-risk precision is almost 100%, with a sensitivity of 68%.
- ClusterCentroids: The balanced accuracy score drops to about 51%. High-risk precision is still 1% with a sensitivity of 60%, resulting in an F1 of 1%. Low-risk sensitivity is only 43% because of the large number of false positives.
- SMOTEENN: The balanced accuracy score is around 62%. High-risk precision is still 1% with a sensitivity of 70%, resulting in an F1 of only 2%. Low-risk sensitivity is 55% because of the large number of false positives.
- BalancedRandomForestClassifier: The balanced accuracy score rises to around 79%. High-risk precision is still poor at 4%, with a sensitivity of only 67%, resulting in an F1 of only 7%. Low-risk sensitivity is now 91% with 100% precision, thanks to fewer false positives.
- EasyEnsembleClassifier: The balanced accuracy score rises to over 92%. High-risk precision is still poor at 7%, with a sensitivity of 91%, resulting in an F1 of 14%. Low-risk sensitivity is now 94% with 100% precision, thanks to fewer false positives.

All of the credit risk models have low precision when identifying high-risk credit. The ensemble models brought significant improvements, particularly in sensitivity for high-risk loans. With a recall of over 90 percent, the EasyEnsembleClassifier model can detect virtually all high-risk credit.
On the other hand, because of the poor precision, many low-risk credits are still misclassified as high-risk, which would undermine the bank's credit strategy and cause it to miss out on income. As a result, I would advise the bank against using any of these models to predict credit risk.
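
The figures above are linked by simple arithmetic, which the sketch below tries to make explicit: F1 is the harmonic mean of precision and recall, and balanced accuracy for two classes is the mean of the two sensitivities. It uses the rounded values from the last set of results (7% precision, 91% high-risk sensitivity, 94% low-risk sensitivity), so its outputs match the reported scores only approximately. The class sizes (100 high-risk and 17,000 low-risk loans) are hypothetical and chosen only to show how a heavy imbalance drags precision down; the helper names are illustrative as well.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(recall_high, recall_low):
    """Mean of the per-class recalls for a two-class problem."""
    return (recall_high + recall_low) / 2


def high_risk_precision(recall_high, recall_low, n_high, n_low):
    """High-risk precision implied by the two recalls and the class sizes."""
    true_pos = recall_high * n_high        # high-risk loans correctly flagged
    false_pos = (1 - recall_low) * n_low   # low-risk loans flagged as high-risk
    return true_pos / (true_pos + false_pos)


# Rounded figures from the last set of results above.
f1 = f1_score(0.07, 0.91)
bal_acc = balanced_accuracy(0.91, 0.94)

# Hypothetical class sizes: 100 high-risk vs 17,000 low-risk loans.
prec = high_risk_precision(0.91, 0.94, n_high=100, n_low=17_000)

print(f"F1 = {f1:.2f}, balanced accuracy = {bal_acc:.3f}, precision = {prec:.3f}")
```

Under these assumed class sizes, misclassifying 6% of low-risk loans produces about 1,020 false alarms against 91 correctly flagged high-risk loans, so precision stays near 8% even though both sensitivities are above 90%.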
- Email: neda.ahmadi.jesh@gmail.com
- LinkedIn: www.linkedin.com/in/neda-ahmadi-j