The aim of this project is to help a fictitious charity organization identify the people most likely to donate, using sklearn and supervised learning techniques on data collected for the U.S. census. First, the factors that affect the likelihood of a donation are investigated. Then, a training and prediction pipeline is created to evaluate the accuracy and speed of three supervised machine learning algorithms (GaussianNB, SVC, AdaBoost). Next, the parameters of the algorithm that provides the highest donation yield (while reducing mailing effort and cost) are fine-tuned. Finally, the impact of reducing the number of features in the data is analysed.
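The training and prediction pipeline described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `train_predict`, the synthetic data from `make_classification` (a stand-in for the census data), and the split and `random_state` values are all assumptions made for this example.

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def train_predict(learner, X_train, y_train, X_test, y_test):
    """Fit a learner, predict on the test set, and record timing and accuracy.

    Hypothetical helper written for this sketch.
    """
    results = {}

    # Time the training step.
    start = perf_counter()
    learner.fit(X_train, y_train)
    results["train_time"] = perf_counter() - start

    # Time the prediction step.
    start = perf_counter()
    predictions = learner.predict(X_test)
    results["pred_time"] = perf_counter() - start

    results["accuracy"] = accuracy_score(y_test, predictions)
    return results


# Synthetic binary-classification data, used here only as a placeholder
# for the census features and the donor/non-donor label.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three algorithms named in the project description, with default settings.
learners = {
    "GaussianNB": GaussianNB(),
    "SVC": SVC(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

all_results = {
    name: train_predict(model, X_train, y_train, X_test, y_test)
    for name, model in learners.items()
}

for name, res in all_results.items():
    print(
        f"{name}: accuracy={res['accuracy']:.3f}, "
        f"train={res['train_time']:.4f}s, predict={res['pred_time']:.4f}s"
    )
```

Collecting the timings next to the accuracy makes it easy to compare the speed and quality trade-off between the three models before choosing one to tune.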
- finding_donors.ipynb: main code for this project
- visuals.py: additional supporting code for visualizing the necessary graphs
In a terminal or command window, run the following command:
jupyter notebook finding_donors.ipynb
This project was completed as part of the Udacity Data Scientist Nanodegree.