This project uses ensemble models to predict bank customer churn. We observed that customers approaching retirement age had higher churn rates, and that customers with either a very low or a high balance were also likely to churn.
We used Logistic Regression as the baseline model and compared the following ensemble methods against it:
- Bagging
- Boosting
- Stacking
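
As a rough illustration of this comparison, the sketch below fits a Logistic Regression baseline and one model per ensemble family with scikit-learn. The synthetic data from `make_classification`, the choice of base learners, and all hyperparameters are assumptions made for the example; they are not the project's actual dataset or configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption: the real project
# would load the bank dataset and use 'Exited' as the target).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# One illustrative model per family; base learners are hypothetical choices.
models = {
    "baseline_logreg": LogisticRegression(max_iter=1000),
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0
    ),
    "boosting": GradientBoostingClassifier(random_state=0),
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
            ("logreg", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

# Fit each model and record its accuracy on the held-out split.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Accuracy is used here only for brevity; with an imbalanced churn label, a metric such as recall or ROC AUC may be a better basis for comparing the ensembles with the baseline.
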
The ensemble models were also tested with different sets of features to obtain the best results. Features were selected with a correlation-based strategy: we computed the correlation of each attribute with the column ‘Exited’ (the class variable that indicates customer churn) and set a minimum threshold value. An attribute is used to train the model only if its correlation with ‘Exited’ is greater than the threshold.
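
The threshold rule described above can be sketched with pandas as follows. The helper name `select_features`, the toy columns, the threshold of 0.1, and the `use_abs` switch (which compares the absolute correlation so that strongly negative predictors are kept) are all assumptions for illustration; the text only states that the correlation must exceed the threshold.

```python
import pandas as pd


def select_features(df, target="Exited", threshold=0.1, use_abs=True):
    """Return numeric columns whose correlation with `target` exceeds `threshold`.

    `use_abs` is a hypothetical option: when True, the absolute Pearson
    correlation is compared; when False, the signed value is compared.
    """
    # Non-numeric columns are skipped because Pearson correlation needs numbers.
    corr = df.select_dtypes("number").corr()[target].drop(target)
    if use_abs:
        corr = corr.abs()
    return corr[corr > threshold].index.tolist()


# Toy data for illustration only, not the project's dataset.
df = pd.DataFrame(
    {
        "Geography": ["FR", "DE", "ES", "FR", "DE", "ES"],
        "Age": [25, 40, 58, 61, 33, 59],
        "HasCrCard": [1, 0, 1, 0, 0, 0],
        "IsActiveMember": [1, 1, 0, 0, 1, 0],
        "Exited": [0, 0, 1, 1, 0, 1],
    }
)

selected_abs = select_features(df, threshold=0.1)
selected_signed = select_features(df, threshold=0.1, use_abs=False)
print(selected_abs)
print(selected_signed)
```

Running the ensembles on the feature sets produced by several threshold values is one way to carry out the comparison across different sets of features.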