Arvato-Customer-Segmentation

Customer segmentation report using machine learning for Arvato Financial Solutions

Blog Post: Medium
Kaggle Leaderboard: Kaggle

Problem Statement

The goal of this project is to help a mail-order sales company in Germany identify the segments of the general population it should target with its marketing in order to grow. The company has provided demographic data for its current customers and for the general population. We need to build a customer-segmentation machine learning model that groups customers into segments and identifies the people the company should target. The project is divided into three parts:

Data exploration and cleaning
Unsupervised learning: Grouping the population into clusters using the K-means clustering algorithm
Supervised learning: Predicting the response of individual customers to the marketing campaign using classification algorithms, and submitting the predictions to the Kaggle competition
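The unsupervised and supervised parts can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the function names, the number of PCA components and clusters, the random stand-in data, and the use of scikit-learn's GradientBoostingClassifier in place of the LightGBM and XGBoost models listed under Installation. ROC AUC is used as the validation score only as a reasonable choice for an imbalanced response; it is not taken from the notebooks.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fit_segmenter(population, n_components=2, n_clusters=3, seed=0):
    """Scale the features, reduce them with PCA, then cluster with K-means."""
    segmenter = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=n_components, random_state=seed)),
        ("kmeans", KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)),
    ])
    return segmenter.fit(population)


def cluster_shares(segmenter, X):
    """Return the fraction of rows in X assigned to each cluster."""
    labels = segmenter.predict(X)
    n_clusters = segmenter.named_steps["kmeans"].n_clusters
    counts = np.bincount(labels, minlength=n_clusters)
    return counts / counts.sum()


def fit_response_model(X, y, seed=0):
    """Train a classifier and score it on a held-out, stratified split."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = GradientBoostingClassifier(random_state=seed).fit(X_train, y_train)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    return model, auc


# Synthetic stand-ins for the cleaned population, customer, and mailout data.
rng = np.random.default_rng(0)
population = rng.normal(size=(600, 5))
customers = rng.normal(loc=1.0, size=(200, 5))

segmenter = fit_segmenter(population)
population_share = cluster_shares(segmenter, population)
customer_share = cluster_shares(segmenter, customers)
# Ratios above 1 mark clusters where customers are over-represented.
over_representation = customer_share / population_share

# Synthetic campaign response driven by the first feature plus noise.
response = (population[:, 0] + 0.5 * rng.normal(size=600) > 1.0).astype(int)
model, validation_auc = fit_response_model(population, response)
```

Comparing the customer share of each cluster with the population share is one simple way to decide which segments to target: clusters with a ratio well above 1 contain proportionally more existing customers than the general population does.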

Datasets and Inputs

There are four data files associated with this project:

Udacity_AZDIAS_052018.csv: Demographics data for the general population of Germany; 891 211 persons (rows) x 366 features (columns).
Udacity_CUSTOMERS_052018.csv: Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features (columns).
Udacity_MAILOUT_052018_TRAIN.csv: Demographics data for individuals who were targets of a marketing campaign; 42 982 persons (rows) x 367 features (columns).
Udacity_MAILOUT_052018_TEST.csv: Demographics data for individuals who were targets of a marketing campaign; 42 833 persons (rows) x 366 features (columns).
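A first look at any of these files might be sketched as follows. The helper names are hypothetical, and the semicolon separator is an assumption about how the files are delimited; adjust it if the files differ. Because the real data is only available through Udacity, the example reads a tiny in-memory stand-in instead of an actual file.

```python
import io

import pandas as pd


def load_demographics(source, sep=";"):
    """Read one demographics file (path or file-like object).

    The ';' separator is an assumption about the file format.
    """
    return pd.read_csv(source, sep=sep, low_memory=False)


def summarize(df):
    """Report the shape and the overall fraction of missing values."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_fraction": float(df.isna().mean().mean()),
    }


# Tiny made-up stand-in; with the real data, pass a path such as
# "Udacity_AZDIAS_052018.csv" instead of the StringIO object.
sample = io.StringIO("ID;FEATURE_A;FEATURE_B\n1;2;\n2;;3\n")
summary = summarize(load_demographics(sample))
```

Running summarize on each of the four files is a quick way to check the row and column counts listed above and to see how much cleaning the missing values will require.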

The data for this project was provided by Bertelsmann Arvato Analytics for the Machine Learning Nanodegree capstone project. It can be accessed only through the Udacity platform.

Installation

Besides the Anaconda Python 3.7 distribution, the following libraries need to be installed:

LightGBM (LGBM)
XGBoost
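Before opening the notebooks, a short check like the one below can confirm that both libraries are importable. The helper name is hypothetical, and it assumes the libraries are installed under their usual import names, lightgbm and xgboost.

```python
from importlib.util import find_spec


def missing_packages(names=("lightgbm", "xgboost")):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Please install:", ", ".join(missing))
```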

Authors

Aakriti Sharma

License

MIT

References

Dirk Van den Poel, 2003, Predicting Mail-Order Repeat Buying: Which Variables Matter?
Selva Prabhakaran, Principal Component Analysis [PCA] – better explained
Analytics Vidhya
LightGBM, Wikipedia
Scikit-Learn Docs

