Customer-Insight-Clustering

Customer segmentation using K-Means clustering to identify distinct customer groups based on their purchasing behavior. Analyzing retail transaction data reveals key customer segments, which enables targeted marketing strategies and deeper business insight.

Tools and Technologies

Pandas: For data manipulation and analysis.

Matplotlib: For creating visualizations.

Scikit-learn: For data normalization and clustering algorithms.

i) Data Preprocessing:

Converting the InvoiceDate column to datetime format and handling missing values by dropping rows without a CustomerID.
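A minimal sketch of this preprocessing step with pandas. InvoiceDate and CustomerID come from this README. The other column names (InvoiceNo, Quantity, UnitPrice) and the small input table are hypothetical, assumed to resemble a typical retail transaction export.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Parse InvoiceDate as datetime and drop rows without a CustomerID."""
    df = df.copy()
    # Convert InvoiceDate strings into pandas datetime values
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    # Transactions with no CustomerID cannot be attributed to a customer
    df = df.dropna(subset=["CustomerID"])
    return df

# Hypothetical toy transactions (assumed column layout)
raw = pd.DataFrame({
    "InvoiceNo": ["A1", "A2", "A3"],
    "InvoiceDate": ["2011-01-05 09:15", "2011-01-05 10:02", "2011-01-06 14:30"],
    "CustomerID": [1.0, None, 2.0],
    "Quantity": [6, 6, 32],
    "UnitPrice": [2.5, 1.0, 1.25],
})
clean = preprocess(raw)
print(clean)
```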

ii) Data Aggregation:

Aggregating data at the customer level to derive purchase frequency, total quantity purchased, and average unit price.
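The customer-level aggregation can be sketched with a pandas `groupby` and named aggregations. The data and column names here are hypothetical. Defining purchase frequency as the number of distinct invoices per customer is also an assumption, because the README does not specify how frequency is counted.

```python
import pandas as pd

# Hypothetical cleaned transactions (assumed column names)
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "Quantity": [2, 3, 1, 10],
    "UnitPrice": [4.0, 2.0, 6.0, 1.5],
})

customers = tx.groupby("CustomerID").agg(
    PurchaseFrequency=("InvoiceNo", "nunique"),  # distinct invoices per customer
    TotalQuantity=("Quantity", "sum"),           # total units bought
    AvgUnitPrice=("UnitPrice", "mean"),          # mean price across line items
)
print(customers)
```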

iii) Feature Engineering:

Calculating total spend and average purchase value per customer.
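One plausible way to derive these two features is sketched below. It assumes that total spend is the sum of Quantity × UnitPrice over a customer's line items, and that average purchase value is total spend divided by the number of distinct invoices. The column names and data are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned transactions (assumed column names)
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "Quantity": [2, 3, 1, 10],
    "UnitPrice": [4.0, 2.0, 6.0, 1.5],
})

# Spend for each individual line item
tx["LineTotal"] = tx["Quantity"] * tx["UnitPrice"]

features = tx.groupby("CustomerID").agg(
    PurchaseFrequency=("InvoiceNo", "nunique"),
    TotalSpend=("LineTotal", "sum"),
)
# Average value of a single purchase (invoice) per customer
features["AvgPurchaseValue"] = features["TotalSpend"] / features["PurchaseFrequency"]
print(features)
```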

iv) Feature Selection:

Selecting key features such as purchase frequency, total spend, and average purchase value for clustering.

v) Data Normalization:

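Normalizing the selected features with StandardScaler, which rescales each feature to zero mean and unit variance. This prevents features measured on larger scales (such as total spend) from dominating the distance calculations used in clustering.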
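A short sketch of feature selection followed by scikit-learn's `StandardScaler`. The per-customer feature table and its column names are hypothetical stand-ins for the output of the earlier steps.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer feature table
features = pd.DataFrame(
    {
        "PurchaseFrequency": [1, 2, 5, 12],
        "TotalSpend": [15.0, 20.0, 310.0, 2400.0],
        "AvgPurchaseValue": [15.0, 10.0, 62.0, 200.0],
    },
    index=pd.Index([101, 102, 103, 104], name="CustomerID"),
)

# Select the clustering features, then standardize each column
selected = ["PurchaseFrequency", "TotalSpend", "AvgPurchaseValue"]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(features[selected])

# Each column now has mean 0 and (population) standard deviation 1
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```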

vi) Optimal Cluster Determination:

Utilizing the Elbow Method to determine the optimal number of clusters by plotting the sum of squared errors (SSE) for different cluster counts.
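The Elbow Method can be sketched as follows. Fit K-Means for a range of k values and record scikit-learn's `inertia_`, the within-cluster sum of squared distances to each centroid (the SSE). Then plot SSE against k and look for the bend. In this sketch, synthetic `make_blobs` data stands in for the scaled customer features, and the k range of 1 to 10 is an assumption. The plot is saved to a file so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to show plots on screen
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled 3-feature customer matrix
X_scaled, _ = make_blobs(n_samples=300, n_features=3, centers=4, random_state=0)

k_values = range(1, 11)
sse = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X_scaled)
    sse.append(km.inertia_)  # within-cluster sum of squared distances

fig, ax = plt.subplots()
ax.plot(list(k_values), sse, marker="o")
ax.set_xlabel("Number of clusters (k)")
ax.set_ylabel("SSE (inertia)")
ax.set_title("Elbow Method")
fig.savefig("elbow.png")
```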

vii) K-Means Clustering:

Applying K-Means clustering with the optimal number of clusters to segment customers into distinct groups.
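A minimal sketch of the final clustering step with K=4, the value reported in the Conclusions. Synthetic data again stands in for the real customer features. The closing `groupby` profiles each segment in the original units, which is one common way to interpret the clusters.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer feature table (hypothetical data)
X, _ = make_blobs(n_samples=200, n_features=3, centers=4, random_state=1)
features = pd.DataFrame(X, columns=["PurchaseFrequency", "TotalSpend", "AvgPurchaseValue"])

X_scaled = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
features["Cluster"] = kmeans.fit_predict(X_scaled)

# Profile each segment in the original (unscaled) units
print(features.groupby("Cluster").mean())
print(features["Cluster"].value_counts())
```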

viii) Visualization:

Visualizing the customer segments using a 3D scatter plot to illustrate the distribution of clusters based on purchase frequency, total spend, and average purchase value.
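A 3D scatter plot of the segments can be drawn with Matplotlib's `projection="3d"` axes, colouring each point by its cluster label. The synthetic data, colour map, and output filename are illustrative assumptions. As in the elbow sketch, the Agg backend and `savefig` let the code run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to show plots on screen
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for (frequency, total spend, average purchase value)
X, _ = make_blobs(n_samples=200, n_features=3, centers=4, random_state=1)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
# One point per customer, coloured by assigned cluster
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=labels, cmap="viridis", s=20)
ax.set_xlabel("Purchase frequency")
ax.set_ylabel("Total spend")
ax.set_zlabel("Average purchase value")
ax.set_title("Customer segments")
fig.savefig("segments_3d.png")
```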

1: Clone the repository: git clone https://github.com/Kshitij-Shresth/Customer-Insight-Clustering.git

2: Navigate to the project directory: cd Customer-Insight-Clustering

3: Run the script: python customer_segmentation.py

Conclusions:

Marketing Strategy: The distinct segments allow for targeted marketing strategies, such as focusing loyalty programs on high-value customers and re-engagement campaigns on low-value customers.

Optimal Cluster Identification: The Elbow Method plot suggests K=4 as the optimal number of clusters. The sum of squared errors (SSE) curve shows a pronounced elbow at this point, indicating diminishing returns from adding more clusters.

Low Variance within Clusters: SSE is the total squared distance of each customer from its cluster centroid. Its sharp decline up to K=4 therefore shows that each additional cluster in this range substantially reduces intra-cluster variance, making the resulting clusters more homogeneous.

Dominance of Low-Frequency Purchasers: The 3D scatter plot reveals a dominant cluster with low purchase frequency and low total spend, indicating a significant portion of customers with infrequent and low-value transactions.

Kshitij-Shresth/Customer-Insight-Clustering

Folders and files

Latest commit

History

Repository files navigation

Customer-Insight-Clustering

Tools and Technologies

i) Data Preprocessing:

ii) Data Aggregation:

iii) Feature Engineering:

iv) Feature Selection:

v) Data Normalization:

vi) Optimal Cluster Determination:

vii) K-Means Clustering:

ix) Visualization:

Conclusions:

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages