This project performs Market Basket Analysis on a retail grocery dataset. It uses association rule mining to uncover relationships between items and to build a recommendation system.
Market Basket Analysis is a data mining technique for finding out which products customers tend to buy together. This project applies the Apriori algorithm to the transactions to identify frequent itemsets, derives association rules from them, and uses those rules to recommend products.
- Data Cleaning: Handles missing values and removes duplicates.
- Exploratory Data Analysis: Visualizes top purchased items.
- Transaction Encoding: Transforms data into a binary format suitable for association rule mining.
- Frequent Itemsets Mining: Identifies combinations of items frequently purchased together.
- Association Rule Generation: Creates rules with metrics such as support, confidence, and lift.
- Product Recommendation System: Recommends items based on user-specified products.
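The three rule metrics named above can be made concrete with a few lines of plain Python. This is a minimal sketch rather than the project's code: the baskets are invented for illustration, and the helper names `support`, `confidence`, and `lift` are hypothetical.

```python
# Toy baskets, invented for illustration only (not taken from the dataset).
baskets = [
    {"whole milk", "rolls/buns"},
    {"whole milk", "yogurt"},
    {"whole milk", "rolls/buns", "yogurt"},
    {"rolls/buns"},
    {"soda"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is on its own."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


print(support({"whole milk", "rolls/buns"}, baskets))
print(confidence({"whole milk"}, {"rolls/buns"}, baskets))
print(lift({"whole milk"}, {"rolls/buns"}, baskets))
```

A lift above 1 means the two sides of the rule appear together more often than they would if they were independent, which is why lift is a common way to rank rules.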
- Name: Groceries Dataset
- Description: A collection of transaction records from a retail grocery store.
- Format: CSV file with fields like `Member_number` (customer ID) and `itemDescription` (products purchased).
- Source: Provided with the project.
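To show how a file in this format can become a set of baskets, here is a small standard-library sketch. It assumes the file has only the two columns named above and that one basket is built per `Member_number`; the sample rows and the `load_baskets` helper are made up for illustration, and the project itself may do this step with pandas instead.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; only the two column names come from the dataset description.
SAMPLE_CSV = """Member_number,itemDescription
1001,whole milk
1001,rolls/buns
1001,whole milk
1002,yogurt
1002,
1003,soda
"""


def load_baskets(csv_file):
    """Group purchased items per customer, skipping missing values and duplicates."""
    baskets = defaultdict(set)
    for row in csv.DictReader(csv_file):
        member = (row.get("Member_number") or "").strip()
        item = (row.get("itemDescription") or "").strip()
        if not member or not item:
            continue  # drop rows with a missing field
        baskets[member].add(item)  # a set ignores repeated (member, item) rows
    return dict(baskets)


baskets = load_baskets(io.StringIO(SAMPLE_CSV))
for member, items in sorted(baskets.items()):
    print(member, sorted(items))
```

To read a real file, pass an open file object instead of the `io.StringIO` wrapper, for example `open(path, newline="")` with your own path.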
- Import Libraries: Load essential Python libraries for data analysis and mining.
- Load and Clean Data:
  - Check and handle missing values.
  - Identify and remove duplicate records.
- Exploratory Data Analysis:
  - Visualize the top purchased items.
- Data Transformation:
  - Convert transaction data into a binary matrix (purchased: 1, not purchased: 0).
- Frequent Itemsets Mining:
  - Apply the Apriori algorithm to find itemsets with a minimum support threshold.
- Generate Association Rules:
  - Derive rules with metrics like confidence and lift.
- Build a Recommendation System:
  - Recommend items based on antecedents and lift scores.
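The transformation, mining, and rule-generation steps above can be sketched without any third-party library. The project relies on `mlxtend` for these steps; the code below is not that library's implementation, only a compact illustration of the same ideas on invented transactions. The function names `encode`, `apriori`, and `rules`, and the threshold `0.4`, are assumptions chosen for the example.

```python
from itertools import combinations

# Toy transactions, invented for illustration.
transactions = [
    ["whole milk", "rolls/buns"],
    ["whole milk", "yogurt"],
    ["whole milk", "rolls/buns", "yogurt"],
    ["rolls/buns"],
    ["soda"],
]


def encode(transactions):
    """Return (items, matrix); matrix[i][j] is 1 if transaction i has items[j]."""
    items = sorted({item for t in transactions for item in t})
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


def apriori(transactions, min_support):
    """Level-wise search: grow candidates only from itemsets already frequent."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    frequent = {}
    level = [frozenset([i]) for i in sorted({i for b in baskets for i in b})]
    k = 1
    while level:
        survivors = {c: support(c) for c in level}
        survivors = {c: s for c, s in survivors.items() if s >= min_support}
        frequent.update(survivors)
        k += 1
        # Join step: unions of surviving itemsets that have exactly k items.
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in survivors for s in combinations(c, k - 1))]
    return frequent


def rules(frequent, min_lift=1.0):
    """Split each frequent itemset into antecedent -> consequent and score it."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = supp / frequent[antecedent]
                lift = conf / frequent[consequent]
                if lift >= min_lift:
                    found.append((antecedent, consequent, supp, conf, lift))
    return found


items, matrix = encode(transactions)
frequent = apriori(transactions, min_support=0.4)
for antecedent, consequent, supp, conf, lift in rules(frequent):
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2), round(lift, 2))
```

The prune step is what makes the search tractable: an itemset can only be frequent if all of its subsets are, so most combinations are never counted.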
- Top Purchased Items: Visual representation of popular products.
- Frequent Itemsets: Insights into commonly bought item combinations.
- Association Rules: Actionable rules for marketing strategies.
- Recommendations: A list of suggested products for a given item.
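One way the recommendation output could be produced from a set of rules is sketched below. The matching policy is an assumption, since the description only says recommendations use antecedents and lift: here a rule applies when all of its antecedents are among the given products, and suggestions are ranked by their best lift. The rules, their lift values, and the `recommend` helper are hypothetical.

```python
# Hypothetical rules as (antecedents, consequents, lift); values are invented.
RULES = [
    ({"whole milk"}, {"yogurt"}, 1.67),
    ({"whole milk"}, {"rolls/buns"}, 1.11),
    ({"yogurt"}, {"whole milk"}, 1.67),
    ({"whole milk", "yogurt"}, {"sausage"}, 2.10),
]


def recommend(products, rules, top_n=3):
    """Suggest consequents of rules whose antecedents are covered by `products`."""
    products = set(products)
    best = {}
    for antecedent, consequent, lift in rules:
        if antecedent <= products:
            # Skip items the user already has; keep the highest lift per item.
            for item in consequent - products:
                best[item] = max(lift, best.get(item, 0.0))
    # Highest lift first; ties broken alphabetically for stable output.
    ranked = sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend(["whole milk"], RULES))
print(recommend(["whole milk", "yogurt"], RULES))
```

Products with no matching rule simply yield an empty list, so a caller may want a fallback such as the top purchased items.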
- Python: For data manipulation, analysis, and visualization.
- Libraries:
  - `pandas` and `numpy` for data processing.
  - `mlxtend` for Apriori and association rules.
  - `matplotlib` for visualizations.