Customer Segmentation Analysis

A project submission to Nanyang Technological University for the course SC1015 (Introduction to Data Science & Artificial Intelligence).

Video presentation link: https://youtu.be/k0Ayb1S2R-o

The dataset was obtained from Kaggle and is titled "Customer Personality Analysis".

Problem Motivation

Supermarkets attract a diverse range of customers with differing preferences and needs. Effective advertising requires an approach tailored to the needs and wants of each customer.
Our chosen problem statement: How can supermarkets leverage machine learning to identify customer segments based on customer attributes?
With many different clustering algorithms available, how can we identify the model that best segments the supermarket's customers?

Approach Taken

Exploratory Data Analysis was performed to identify required preprocessing steps.
Preprocessing was done to clean and prepare the dataset for the models. Steps such as the replacement of null values, removal of outliers, scaling and one-hot encoding were performed.
Data Visualisation was used to understand subtle relationships and distributions within the dataset.
Dimensionality Reduction was achieved through Principal Component Analysis.
The optimal number of clusters was identified using the Elbow Method, a Hierarchical Graph and the Gap Statistic.
6 Clustering Algorithms across 5 Clustering methods were employed on the dataset.
Evaluation was performed on these 6 Clustering Algorithms using the Silhouette Score, Calinski-Harabasz Index and Davies-Bouldin Index.
Profiling of the identified clusters was performed based on their demographic and behavioural characteristics.
Recommendations for the supermarkets were drawn from the results.
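
The preprocessing, dimensionality reduction and Elbow Method steps listed above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the notebook's actual code: the synthetic table, its column names, the 3-standard-deviation outlier rule, the median imputation and the choice of 3 principal components are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the customer table (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 300
customers = pd.DataFrame({
    "Income": rng.normal(50_000, 15_000, n),
    "Age": rng.integers(20, 80, n).astype(float),
    "Spending": rng.gamma(2.0, 300.0, n),
    "Education": rng.choice(["Basic", "Graduation", "PhD"], n),
})
customers.loc[customers.index % 25 == 0, "Income"] = np.nan  # simulate nulls

numeric_cols = ["Income", "Age", "Spending"]
categorical_cols = ["Education"]

# Outlier removal (assumed rule): drop rows whose income lies more than
# 3 standard deviations from the mean. Rows with missing income are kept.
z = (customers["Income"] - customers["Income"].mean()) / customers["Income"].std()
customers = customers[z.abs().fillna(0) <= 3].reset_index(drop=True)

# Null replacement and scaling for numeric columns, one-hot encoding for
# categorical columns.
preprocess = ColumnTransformer(
    [
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array for PCA
)
features = preprocess.fit_transform(customers)

# Dimensionality reduction with Principal Component Analysis.
reduced = PCA(n_components=3, random_state=0).fit_transform(features)

# Elbow Method: record the within-cluster sum of squares for each k and
# look for the point where the decrease levels off.
inertias = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(reduced)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against `k` gives the usual elbow curve; the Gap Statistic and a dendrogram would be computed separately and are not shown here.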

Clustering Algorithms Used

Connectivity / Hierarchical Clustering

Agglomerative Clustering Model

Centroid / Partition Clustering

K-Means
Mean Shift

Distribution Model

Gaussian Mixture Model

Density Model

Ordering Points To Identify the Clustering Structure (OPTICS)

Graph-based Model

Spectral Clustering
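
As a rough illustration of how the six algorithms above could be fitted and compared with the three evaluation metrics, here is a sketch using scikit-learn. The blob data, the choice of 4 clusters for the synthetic example and every hyperparameter shown are assumptions for demonstration only; the notebook's own settings may differ.

```python
import numpy as np
from sklearn.cluster import (AgglomerativeClustering, KMeans, MeanShift,
                             OPTICS, SpectralClustering)
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.mixture import GaussianMixture

# Synthetic data standing in for the PCA-reduced customer features.
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.0, random_state=42)

k = 4  # assumed cluster count for this toy data
models = {
    "Agglomerative": AgglomerativeClustering(n_clusters=k),
    "K-Means": KMeans(n_clusters=k, n_init=10, random_state=42),
    "Mean Shift": MeanShift(),  # infers the number of clusters itself
    "Gaussian Mixture": GaussianMixture(n_components=k, random_state=42),
    "OPTICS": OPTICS(min_samples=10),  # infers clusters, may label noise
    "Spectral": SpectralClustering(n_clusters=k, affinity="nearest_neighbors",
                                   random_state=42),
}

scores = {}
for name, model in models.items():
    labels = np.asarray(model.fit_predict(X))
    keep = labels != -1  # OPTICS marks noise points with -1
    if len(set(labels[keep])) < 2:
        continue  # the metrics need at least two clusters
    scores[name] = {
        "silhouette": silhouette_score(X[keep], labels[keep]),                # higher is better
        "calinski_harabasz": calinski_harabasz_score(X[keep], labels[keep]),  # higher is better
        "davies_bouldin": davies_bouldin_score(X[keep], labels[keep]),        # lower is better
    }

for name, result in scores.items():
    print(name, {metric: round(value, 3) for metric, value in result.items()})
```

Excluding noise points before scoring is one possible design choice; it makes OPTICS look better than it would if noise were counted as its own cluster, so the comparison should be read with that in mind.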

Conclusions

K-Means Clustering yielded the best results overall.
Customers can be segmented into 4 clusters, each with their own demographic traits: Income, Age, Number of Children, Education (specifically Third Cycle).
These clusters have differing spending behaviours: Receptivity to Campaigns, Complaint Tendencies, Highest Expenditure Product Categories and Preferred Purchase Avenue.
More detailed conclusions can be found in the notebook.
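
Cluster profiles like the ones summarised above are typically built by aggregating each attribute per cluster label. The following is a minimal pandas sketch of that idea; the table, its column names and its values are hypothetical and are not results from this project.

```python
import pandas as pd

# Hypothetical customers with cluster labels already assigned.
labelled = pd.DataFrame({
    "Cluster": [0, 0, 1, 1, 2, 2],
    "Income": [30_000, 34_000, 58_000, 62_000, 85_000, 91_000],
    "Age": [28, 35, 44, 51, 57, 63],
    "Children": [1, 2, 1, 1, 0, 0],
})

# One row per cluster: its size and the mean of each demographic attribute.
profile = labelled.groupby("Cluster").agg(
    customers=("Income", "size"),
    mean_income=("Income", "mean"),
    mean_age=("Age", "mean"),
    mean_children=("Children", "mean"),
)
print(profile)
```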

Ranchu2000/Customer-Segmentation-Analysis

Folders and files

Latest commit

History

Repository files navigation

Customer Segmentation Analysis

Problem Motivation

Approach Taken

Clustering Algorithms Used

Conclusions

Contributors

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages