This project performs customer segmentation using K-Means clustering on the "Mall Customers" dataset. It covers data preprocessing, visualization, choosing the number of clusters with the Elbow Method, and deployment with Streamlit for interactive analysis.
- Load and preprocess customer data.
- Perform K-Means clustering on Annual Income and Spending Score.
- Use the Elbow Method to determine the optimal number of clusters.
- Visualize the clusters using Seaborn and Matplotlib.
- Deploy an interactive Streamlit app for user-friendly exploration.
- Save clustered data to Google Drive for further analysis.
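
The Elbow Method and the final clustering step listed above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual script: the data is synthetic (five blobs standing in for the income and spending-score columns), and the choices `n_init=10`, `random_state=42`, and the range of `k` from 1 to 10 are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the (Annual Income, Spending Score) columns:
# five loose blobs, generated with a fixed seed so runs are repeatable.
rng = np.random.default_rng(0)
centers = np.array(
    [[25, 20], [25, 80], [55, 50], [85, 20], [85, 80]], dtype=float
)
X = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])

# Elbow Method: fit K-Means for a range of k and record the inertia
# (within-cluster sum of squared distances) of each fit.
k_values = list(range(1, 11))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"k={k:2d}  inertia={inertia:10.1f}")

# After reading k off the "elbow" of the inertia curve, fit the final model.
final_model = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = final_model.fit_predict(X)
```

Plotting `inertias` against `k_values` (for example with Matplotlib) gives the elbow curve; the point where the drop in inertia flattens out is the usual candidate for the number of clusters.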
The dataset contains the following attributes:
- Customer ID: Unique identifier for each customer.
- Gender: Gender of the customer.
- Age: Age of the customer.
- Annual Income (k$): Annual income in thousands of dollars.
- Spending Score (1-100): A measure of spending habits.
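
As a minimal sketch of loading these attributes with pandas and selecting the two clustering features: the rows below are made-up placeholders, not real records, and the header names (in particular `CustomerID`) are assumed to match the CSV. The `dropna` step is likewise an assumed preprocessing choice.

```python
import io

import pandas as pd

# Made-up rows for illustration only. In the project you would read the
# real file instead, e.g. pd.read_csv("Mall_Customers.csv").
SAMPLE_CSV = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,25,20,40
2,Female,31,35,75
3,Female,47,60,50
4,Male,52,90,15
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Assumed preprocessing: drop rows with missing values in the two
# features used for clustering.
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]
df = df.dropna(subset=FEATURES)

# Feature matrix with one row per customer and one column per feature.
X = df[FEATURES].to_numpy(dtype=float)
print(X.shape)
```

If your copy of the dataset uses different column names, adjust `FEATURES` accordingly.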
Ensure you have the following libraries installed:
pip install pandas numpy matplotlib seaborn scikit-learn streamlit
- Mount Google Drive (for Google Colab users):
    from google.colab import drive
    drive.mount('/content/drive')
- Run the Python script:
    python clustering_script.py
- Run the Streamlit app:
    streamlit run app.py
├── clustering_script.py # Main script for clustering
├── app.py # Streamlit app for visualization
├── Mall_Customers.csv # Input dataset
├── clustered_customers.csv # Output file with cluster labels
└── README.md # Project documentation
- The dataset is clustered into 5 groups based on annual income and spending score.
- The resulting clusters can be explored visually in the interactive Streamlit app.
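
One way to inspect the resulting groups is a per-cluster summary. The sketch below uses made-up rows and assumes the output file has a label column named `Cluster`; both are hypothetical, so check the actual header of `clustered_customers.csv` before reusing it.

```python
import pandas as pd

# Hypothetical clustered output: made-up rows with an assumed "Cluster"
# column, standing in for the contents of clustered_customers.csv.
clustered = pd.DataFrame(
    {
        "Annual Income (k$)": [20, 22, 60, 62, 90, 95],
        "Spending Score (1-100)": [80, 75, 50, 48, 15, 20],
        "Cluster": [0, 0, 1, 1, 2, 2],
    }
)

# Per-cluster size and mean of each clustering feature.
summary = clustered.groupby("Cluster").agg(
    customers=("Annual Income (k$)", "size"),
    mean_income=("Annual Income (k$)", "mean"),
    mean_score=("Spending Score (1-100)", "mean"),
)
print(summary)
```

A table like this makes it easier to describe each segment (for example, high income with low spending) before exploring it in the app.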
Feel free to fork this repository, make changes, and submit pull requests. Contributions are welcome!
This project is open-source under the MIT License.