Customer-Segmentation

This project applies the K-Means clustering algorithm to segment customers. Segmenting customers lets us identify customer behavior and design a marketing campaign for each segment.

Understanding the Problem Statement and Business Case

Marketing is crucial for the growth and sustainability of any business. Marketers can help build the company's brand, engage customers, grow revenue, and increase sales. One of the key pain points for marketers is knowing their customers and identifying their needs. By understanding the customer, marketers can launch a targeted marketing campaign tailored to specific needs.

In this project, we perform customer segmentation based on customer data. Segmentation lets us identify customer needs and behavior, so that we can create effective marketing campaigns.

Importing Datasets

We used the Customer Personality Analysis dataset from Kaggle, which contains information on 2,240 customers. The first two rows of the dataset are shown below:

ID	Year_Birth	Education	Marital_Status	Income	Kidhome	Teenhome	Dt_Customer	Recency	MntWines	...	NumWebVisitsMonth	AcceptedCmp3	AcceptedCmp4	AcceptedCmp5	AcceptedCmp1	AcceptedCmp2	Complain	Z_CostContact	Z_Revenue	Response
5524	1957	Graduation	Single	58138.0	0	0	2012-04-09	58	635	...	7	0	0	0	0	0	0	3	11	1
2174	1954	Graduation	Single	46344.0	1	1	2012-08-03	38	11	...	5	0	0	0	0	0	0	3	11	0

Attributes

People

ID: Customer's unique identifier
Year_Birth: Customer's birth year
Education: Customer's education level
Marital_Status: Customer's marital status
Income: Customer's yearly household income
Kidhome: Number of children in customer's household
Teenhome: Number of teenagers in customer's household
Dt_Customer: Date of customer's enrollment with the company
Recency: Number of days since customer's last purchase
Complain: 1 if the customer complained in the last 2 years, 0 otherwise

Products

MntWines: Amount spent on wine in last 2 years
MntFruits: Amount spent on fruits in last 2 years
MntMeatProducts: Amount spent on meat in last 2 years
MntFishProducts: Amount spent on fish in last 2 years
MntSweetProducts: Amount spent on sweets in last 2 years
MntGoldProds: Amount spent on gold in last 2 years

Promotion

NumDealsPurchases: Number of purchases made with a discount
AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise
AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise
AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise
Response: 1 if customer accepted the offer in the last campaign, 0 otherwise

Place

NumWebPurchases: Number of purchases made through the company’s website
NumCatalogPurchases: Number of purchases made using a catalogue
NumStorePurchases: Number of purchases made directly in stores
NumWebVisitsMonth: Number of visits to company’s website in the last month

After importing the dataset, we check it for missing values. The Income column has 24 missing values, which we fill with the mean of that column.
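The mean imputation described above can be sketched in pandas as follows. This is a minimal illustration on a small made-up frame, not the project's actual code; only the Income column name comes from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the customer data; the missing pattern is made up.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [58138.0, np.nan, 46344.0, np.nan],
})

# Count missing values per column before imputing.
missing_before = df.isna().sum()

# Replace missing incomes with the mean of the observed incomes.
df["Income"] = df["Income"].fillna(df["Income"].mean())
```

Mean imputation keeps every row, but it is sensitive to extreme incomes, so the median is a common alternative when outliers are present.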

Before visualizing the customer profile, we add a new column containing each customer's age. The dataset with the Age column looks like this:

ID	Year_Birth	Education	Marital_Status	Income	Kidhome	Teenhome	Dt_Customer	Recency	MntWines	...	NumWebVisitsMonth	AcceptedCmp3	AcceptedCmp4	AcceptedCmp5	AcceptedCmp1	AcceptedCmp2	Complain	Z_CostContact	Z_Revenue	Response	Age
5524	1957	Graduation	Single	58138.0	0	0	2012-04-09	58	635	...	7	0	0	0	0	0	0	3	11	1	65
2174	1954	Graduation	Single	46344.0	1	1	2012-08-03	38	11	...	5	0	0	0	0	0	0	3	11	0	68
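A minimal sketch of deriving the Age column. The reference year 2022 is an assumption inferred from the sample rows (1957 gives 65, 1954 gives 68); the text does not state it.

```python
import pandas as pd

REFERENCE_YEAR = 2022  # assumption inferred from the sample rows above

df = pd.DataFrame({"ID": [5524, 2174], "Year_Birth": [1957, 1954]})

# Age is the difference between the reference year and the birth year.
df["Age"] = REFERENCE_YEAR - df["Year_Birth"]
```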

Customers Profile

Customers Age

The boxplot above shows large outliers in age, which we should drop later.

We can divide customer age into 4 categories: Young, Adult, Mature, and Senior.

The customers break down as follows: 54.1% fall into the Mature category, 25.8% into Adult, 18.8% into Senior, and 1.25% into Young.
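One way to build such age categories is `pandas.cut`. The bin edges below are hypothetical, since the text does not say where one category ends and the next begins; only the four labels and the Age_group column name come from this document.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges: (0, 25], (25, 40], (40, 60], (60, inf).
AGE_BINS = [0, 25, 40, 60, np.inf]
AGE_LABELS = ["Young", "Adult", "Mature", "Senior"]

df = pd.DataFrame({"Age": [22, 35, 50, 68]})

# Assign each age to its category.
df["Age_group"] = pd.cut(df["Age"], bins=AGE_BINS, labels=AGE_LABELS)

# Share of customers in each category, as a fraction of all rows.
age_share = df["Age_group"].value_counts(normalize=True)
```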

Customers Education

Based on the chart above, most customers hold a graduation (bachelor's) degree, followed by master's and PhD degrees. For the model, we group education into two categories: undergraduate and postgraduate.

After grouping, 88.5% of customers have a postgraduate education and 11.5% have an undergraduate education.

Customers Marital Status

Based on the chart above, the dataset contains many marital status values, including unusual ones such as Absurd and YOLO. We group them into two categories: single and in-relationship.

After grouping, 64.5% of customers are in a relationship and 35.5% are single.
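The grouping of marital status can be sketched as below. The raw labels "Married" and "Together", and the decision to treat every other label as single, are assumptions made for illustration; the text only names Absurd and YOLO.

```python
import pandas as pd

# Assumed raw labels that count as being in a relationship.
IN_RELATIONSHIP = {"Married", "Together"}


def group_marital_status(status: str) -> str:
    """Collapse a raw marital status into one of two categories."""
    return "In-Relationship" if status in IN_RELATIONSHIP else "Single"


df = pd.DataFrame(
    {"Marital_Status": ["Single", "Married", "Together", "Absurd", "YOLO"]}
)
df["Marital_Status"] = df["Marital_Status"].map(group_marital_status)
```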

Customers Income

The plot above shows a huge outlier in the income boxplot, with an income of 666,666. We should remove it later.

We can divide customer income into 4 categories: low income, low to medium income, medium to high income, and high income.

Distribution of Customers by Children in Their Home

Now let's combine Kidhome and Teenhome into one column called 'Number of Child', which holds the number of children in the customer's household.

Most customers have 1 child, followed by 0 children and 2 children.
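Combining the two columns is a simple element-wise sum. A minimal sketch on made-up rows:

```python
import pandas as pd

df = pd.DataFrame({"Kidhome": [0, 1, 1], "Teenhome": [0, 1, 0]})

# Total children in the household = young kids + teenagers.
df["Number of Child"] = df["Kidhome"] + df["Teenhome"]

# Share of customers for each child count.
child_share = df["Number of Child"].value_counts(normalize=True)
```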

Customers Complaints

The pie chart above shows that 99.1% of customers never filed a complaint, while 0.938% have filed one.

Having looked at the customer profile, we now create new columns for total monthly spend, total number of campaigns accepted, and total purchases.
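The aggregate columns can be sketched as row-wise sums. The new column names (Total_Spend, Total_Accepted_Campaigns, Total_Purchases) are hypothetical, and so is the choice of which source columns go into each sum (for example, counting Response as a campaign and leaving NumDealsPurchases out). The text also does not say how the Mnt* amounts, which cover the last 2 years, are turned into a monthly figure, so this sketch stops at the plain total.

```python
import pandas as pd

SPEND_COLS = ["MntWines", "MntFruits", "MntMeatProducts",
              "MntFishProducts", "MntSweetProducts", "MntGoldProds"]
CAMPAIGN_COLS = ["AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3",
                 "AcceptedCmp4", "AcceptedCmp5", "Response"]
PURCHASE_COLS = ["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]

# One made-up customer row; every unspecified column defaults to 0.
row = {col: 0 for col in SPEND_COLS + CAMPAIGN_COLS + PURCHASE_COLS}
row.update({"MntWines": 635, "MntFruits": 88, "Response": 1,
            "NumWebPurchases": 8, "NumStorePurchases": 4})
df = pd.DataFrame([row])

# Row-wise sums over each group of columns.
df["Total_Spend"] = df[SPEND_COLS].sum(axis=1)
df["Total_Accepted_Campaigns"] = df[CAMPAIGN_COLS].sum(axis=1)
df["Total_Purchases"] = df[PURCHASE_COLS].sum(axis=1)
```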

Data Pre-processing

We drop the Dt_Customer, Kidhome, Teenhome, Recency, ID, Year_Birth, Income, Age, Z_CostContact, and Z_Revenue columns because they are unnecessary for clustering.

Data Encoding

The model data contains categorical columns. We apply ordinal encoding to label these columns:

Age_group = {'Young': 1, 'Adult': 2 , 'Mature': 3, 'Senior': 4}
Education = {'Undergraduate': 1, 'Postgraduate': 2}
Marital_Status = {'Single': 1, 'In-Relationship': 2}
Income_group = {'Low income': 1, 'Low to medium income': 2, 'Medium to high income': 3, 'High income': 4}
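The mappings above can be applied column by column with `Series.map`. This is a minimal sketch on two made-up rows; the dictionaries are the ones listed above.

```python
import pandas as pd

ENCODINGS = {
    "Age_group": {"Young": 1, "Adult": 2, "Mature": 3, "Senior": 4},
    "Education": {"Undergraduate": 1, "Postgraduate": 2},
    "Marital_Status": {"Single": 1, "In-Relationship": 2},
    "Income_group": {"Low income": 1, "Low to medium income": 2,
                     "Medium to high income": 3, "High income": 4},
}

df = pd.DataFrame({
    "Age_group": ["Mature", "Young"],
    "Education": ["Postgraduate", "Undergraduate"],
    "Marital_Status": ["Single", "In-Relationship"],
    "Income_group": ["High income", "Low to medium income"],
})

# Replace each category label with its ordinal code.
for column, mapping in ENCODINGS.items():
    df[column] = df[column].map(mapping)
```

Any label missing from a mapping becomes NaN under `map`, which makes unexpected categories easy to spot before clustering.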

Data Normalization

Before we put the data into our model, we should normalize it. Normalization gives each variable equal weight, so that no single variable dominates the model simply because its values are larger. We use a Power Transformer to normalize the data.
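As an illustration, scikit-learn's `PowerTransformer` is one implementation of this kind of transform; that the project uses this exact class is an assumption. The data below is synthetic and skewed, standing in for the real feature matrix.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Synthetic right-skewed features as a stand-in for the customer data.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=3.0, sigma=1.0, size=(500, 2))

# Defaults: Yeo-Johnson transform followed by standardization
# to zero mean and unit variance.
transformer = PowerTransformer()
X_scaled = transformer.fit_transform(X)
```

After the transform, each column has mean close to 0 and standard deviation close to 1, and the skew of the original distribution is reduced.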

Before Normalization:

After Normalization:

Clustering

Before clustering the data, we need to find the ideal number of clusters. We use the Elbow Method, with the silhouette score for validation.

Elbow Method

In cluster analysis, the elbow method is a heuristic for determining the number of clusters in a data set. The vertical line in the graph above indicates that the ideal number of clusters for our data is 4.
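The elbow heuristic can be sketched by fitting K-Means for a range of cluster counts and recording the inertia (within-cluster sum of squares) of each fit. The data here is synthetic (`make_blobs`), and the range of k and the `n_init` and `random_state` settings are arbitrary choices for the sketch.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 well-separated groups, standing in for the real features.
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.8, random_state=42)

# Fit K-Means for k = 1..8 and record the inertia of each model.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_
```

Plotting `inertias` against k gives the elbow curve; the "elbow" is the point after which adding clusters reduces inertia only slightly.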

Silhouette Score

The silhouette coefficient, or silhouette score, is a metric for evaluating the quality of a clustering. Its value ranges from -1 to 1. A score close to 1 means the clusters are well separated and clearly distinguished.

Our silhouette score is poor because it is close to 0.

To improve it, we can apply principal component analysis (PCA) before clustering.
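A minimal sketch of computing the score with scikit-learn's `silhouette_score`, again on synthetic data rather than the project's features:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the normalized customer features.
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.8, random_state=42)

# Cluster with k = 4, then score how well separated the clusters are.
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)
score = silhouette_score(X, labels)
```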

Principal Component Analysis

PCA is an unsupervised technique that reduces the dimensionality of the data while trying to preserve as much of the original information as possible.

We apply PCA to our data, cluster the result, and visualize it for customer segmentation.
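One possible arrangement of these steps is to reduce the features with PCA and then run K-Means on the reduced data. The number of components (2) is an assumption chosen to make plotting easy, and the input is random data standing in for the normalized features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Random stand-in for the normalized feature matrix (300 customers, 10 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))

# Project onto the first two principal components (assumed count).
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the original variance kept by the two components.
explained = pca.explained_variance_ratio_.sum()

# Cluster in the reduced space with k = 4, as chosen by the elbow method.
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_reduced)
```

The two columns of `X_reduced` can be used as scatter-plot axes, with `labels` as the color, to visualize the segments.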

Customer Segmentation

Summary

Based on the visualization of the customer clusters, we can describe the characteristics of each cluster.
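Alongside the plots, a per-cluster summary table is one way to read off such characteristics. The sketch below uses made-up rows; the Cluster and Total_Spend column names are hypothetical.

```python
import pandas as pd

# Made-up customers with an assigned cluster label.
df = pd.DataFrame({
    "Cluster": [0, 0, 1, 1],
    "Total_Spend": [50, 70, 900, 1100],
    "Number of Child": [1, 2, 0, 0],
})

# Average value of every feature within each cluster.
profile = df.groupby("Cluster").mean()

# Number of customers per cluster.
sizes = df["Cluster"].value_counts().sort_index()
```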

Cluster 0

Majority of customers belong to this cluster
Have low to low-medium income
Majority have 1-2 children at home
Spend less money on products
Buy fewer products
Complain a lot
Don't like to buy products via catalog
Give negative responses to marketing campaigns

Cluster 1

Have medium-high to high income
Have 0-1 children at home
Cluster with the second-highest total purchases
Spend more money on fruit and fish products
Give negative responses to marketing campaigns
Like to buy products in store
Like discounts

Cluster 2

Have low-medium to medium-high income
Consist of customers in the adult to senior age groups
Have 1-2 children at home
Spend more money on wine and gold
Really like discounts
Give positive responses to marketing campaigns

Cluster 3

Cluster with the fewest customers
Have high income
Majority are child-free
Cluster with the highest total purchases
Spend more money on all kinds of products
Less likely to complain
Like to buy products via catalog
Don't like discounts
Give positive responses to marketing campaigns

Conclusions

After clustering, we have 4 customer segments. Cluster 3 is our best cluster because its customers spend more money on all kinds of products, are less likely to complain, don't like discounts, and respond positively to marketing campaigns. Cluster 0 is our worst cluster because its customers spend less money on products, buy fewer products, complain a lot, and respond negatively to marketing campaigns.

For the next campaign, we can target cluster 2 and cluster 3 because they respond positively to marketing campaigns. For cluster 3, we can use the catalog as the campaign medium because these customers like to buy products via catalog. For cluster 2, we can offer discounts with a minimum spend on wine and gold products because these customers like discounts and spend more money on gold and wine.
