Customer Segmentation Report Using Machine Learning for Arvato Financial Services
The goal of this project is to help a mail-order sales company in Germany identify the segments of the general population it should target with its marketing in order to grow. The company has provided demographic data on its current customers and on the general population. Our task is to build a customer-segmentation machine learning model that correctly assigns customers to groups and identifies the people the company should target. The project is divided into three parts:
- Data Exploration and Cleaning
- Unsupervised learning: grouping the population into clusters with the K-means clustering algorithm
- Supervised learning: predicting how individual customers respond to the marketing campaign with classification algorithms, and making a submission to the Kaggle competition
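The clustering step in the second part can be illustrated with the minimal NumPy sketch below of K-means (Lloyd's algorithm with k-means++ seeding). It assumes the features have already been cleaned, imputed and scaled; in practice a library implementation such as scikit-learn's `KMeans` would normally be used. The helper names (`kmeans`, `assign`, `cluster_share_ratio`) are hypothetical. The comparison step is one common way to find target segments and is an assumption here, not necessarily the project's exact method: fit the clusters on the general population, assign customers to the same clusters, and compare each cluster's share among customers with its share in the population.

```python
import numpy as np


def init_kmeans_pp(X, k, rng):
    """k-means++ seeding: spread the initial centroids over the data."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from every row to its closest centroid chosen so far.
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else None  # None -> uniform choice
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids, dtype=float)


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = init_kmeans_pp(X, k, rng)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(X, centroids)


def cluster_share_ratio(pop_labels, cust_labels, k):
    """Customer share / population share per cluster (>1 means over-represented)."""
    pop_share = np.bincount(pop_labels, minlength=k) / len(pop_labels)
    cust_share = np.bincount(cust_labels, minlength=k) / len(cust_labels)
    return np.divide(cust_share, pop_share, out=np.full(k, np.nan), where=pop_share > 0)


# Toy usage: synthetic data standing in for the scaled AZDIAS and CUSTOMERS features.
rng = np.random.default_rng(42)
population = np.vstack([rng.normal(0, 1, (900, 2)), rng.normal(6, 1, (100, 2))])
customers = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
centroids, pop_labels = kmeans(population, k=2)
cust_labels = assign(customers, centroids)
print(cluster_share_ratio(pop_labels, cust_labels, k=2))
```

A ratio well above 1 marks a cluster that is over-represented among existing customers and is therefore a candidate target segment; a ratio well below 1 marks a segment the company currently reaches poorly. The broadcasting in `assign` builds an n x k x d array, so for the full 891 211-row file a chunked or library implementation would be preferable.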
There are four data files associated with this project:
- Udacity_AZDIAS_052018.csv: Demographic data for the general population of Germany; 891 211 persons (rows) x 366 features (columns).
- Udacity_CUSTOMERS_052018.csv: Demographic data for customers of a mail-order company; 191 652 persons (rows) x 369 features (columns).
- Udacity_MAILOUT_052018_TRAIN.csv: Demographic data for individuals who were targets of a marketing campaign; 42 982 persons (rows) x 367 features (columns), one more than the test file because it also contains each person's response to the campaign.
- Udacity_MAILOUT_052018_TEST.csv: Demographic data for individuals who were targets of a marketing campaign; 42 833 persons (rows) x 366 features (columns).
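Loading the files and preparing them for the cleaning step might look like the pandas sketch below. It assumes the CSV files are semicolon-separated. `unknown_codes` is a hypothetical mapping from column name to the codes that mean "unknown", and the 30% missing-value threshold is an arbitrary example, not a value taken from the project. `align_to` drops whatever columns the customers file has beyond the general-population file, so the two can share one cleaning and clustering pipeline.

```python
import io

import pandas as pd


def load_demographics(path_or_buffer):
    """Read one demographic file; the ';' separator is an assumption about the files."""
    return pd.read_csv(path_or_buffer, sep=";", low_memory=False)


def replace_unknown(df, unknown_codes):
    """Turn per-column 'unknown' codes into NaN.

    `unknown_codes` is a hypothetical {column: [codes]} mapping, e.g. built
    from a data dictionary describing the attributes.
    """
    out = df.copy()
    for col, codes in unknown_codes.items():
        if col in out.columns:
            out[col] = out[col].mask(out[col].isin(codes))
    return out


def drop_sparse_columns(df, max_missing=0.3):
    """Drop columns whose share of missing values exceeds `max_missing` (example threshold)."""
    return df.loc[:, df.isna().mean() <= max_missing]


def align_to(reference, other):
    """Drop columns of `other` that `reference` lacks; return (aligned frame, dropped names)."""
    extra = list(other.columns.difference(reference.columns))
    return other.drop(columns=extra), extra


# Toy usage on in-memory text; with the real files pass the CSV paths instead, e.g.
# azdias = load_demographics("Udacity_AZDIAS_052018.csv")
toy_pop = load_demographics(io.StringIO("ID;A;B\n1;1;-1\n2;-1;-1\n3;3;-1\n4;4;2\n"))
toy_cust = load_demographics(io.StringIO("ID;A;B;EXTRA\n5;1;2;x\n"))
clean_pop = drop_sparse_columns(replace_unknown(toy_pop, {"A": [-1], "B": [-1]}))
aligned_cust, dropped = align_to(toy_pop, toy_cust)
print(list(clean_pop.columns), dropped)
```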
The data for the project was provided by Bertelsmann Arvato Analytics for the Udacity Machine Learning Nanodegree capstone project and can be accessed only through the Udacity platform.
Besides the Anaconda Python 3.7 distribution, the following libraries need to be installed:
- LightGBM (LGBM)
- XGBoost
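A quick way to confirm that the extra libraries are importable before running the notebooks is sketched below. It only checks the import names `lightgbm` and `xgboost` (assumed to correspond to the two packages listed above) and does not verify versions.

```python
import importlib.util

# Import names assumed to correspond to the LightGBM and XGBoost packages.
REQUIRED_MODULES = ("lightgbm", "xgboost")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("LightGBM and XGBoost are available.")
```

If anything is reported missing, the packages can be added to the Anaconda environment, for example with `pip install lightgbm xgboost` or `conda install -c conda-forge lightgbm xgboost`.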