This repo contains a set of Jupyter Notebooks that I used to tackle the Instacart Market Basket Analysis challenge. The dataset for this competition is a relational set of files describing customers' orders over time, and the goal of the competition is to predict which products will be in a user's next order. The dataset is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, Instacart provides between 4 and 100 of their orders, with the sequence of products purchased in each order. Instacart also provides the day of the week and hour of the day each order was placed, and a relative measure of time between orders.
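To make the relational layout more concrete, here is a minimal sketch on a few hypothetical in-memory rows. It assumes an orders table with `order_id`, `user_id` and `order_number` columns, plus an order-products table linking each `order_id` to a `product_id`; these column names and rows are illustrative assumptions, not code from the notebooks. Joining the two tables on `order_id` gives each user's chronological sequence of baskets, which is the shape of data the "next order" prediction works from.

```python
from collections import defaultdict

# Hypothetical sample rows; column names are assumed for illustration.
orders = [
    # (order_id, user_id, order_number)
    (11, 1, 2),
    (10, 1, 1),
    (20, 2, 1),
]
order_products = [
    # (order_id, product_id)
    (10, "banana"),
    (10, "milk"),
    (11, "banana"),
    (20, "bread"),
]


def baskets_by_user(orders, order_products):
    """Join the two tables on order_id and return, for each user,
    a list of product sets sorted by order_number."""
    # Index the products of each order by the join key (order_id).
    products_in_order = defaultdict(set)
    for order_id, product_id in order_products:
        products_in_order[order_id].add(product_id)

    # Collect (order_number, basket) pairs per user.
    history = defaultdict(list)
    for order_id, user_id, order_number in orders:
        history[user_id].append((order_number, products_in_order[order_id]))

    # Sort each user's orders chronologically, then drop the order numbers.
    return {
        user_id: [basket for _, basket in sorted(pairs, key=lambda p: p[0])]
        for user_id, pairs in history.items()
    }


print(baskets_by_user(orders, order_products))
```

In the real dataset the same join would typically be done with a dataframe library; plain dictionaries are used here only to keep the sketch dependency-free.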
Here are the different notebooks:
- Data Exploration: Exploring the raw datasets.
- Customer Segmentation: Segmenting the customers with Principal Component Analysis and K-Means Clustering.
- Association Rule Mining: Applying the Apriori algorithm to mine association rules between products that are purchased together in orders.
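The Apriori step can be illustrated with a small, self-contained sketch. This is a generic pure-Python version of the algorithm run on a hypothetical toy list of baskets, not the notebook's actual code; the `min_support` and `min_confidence` thresholds are arbitrary assumptions. It grows frequent itemsets one size at a time, pruning any candidate that has an infrequent subset, then derives rules A => B scored by support, confidence and lift.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of baskets containing it) is at least min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence, lift)."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        # Split the itemset into every non-empty antecedent/consequent pair.
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    # lift > 1: A and B co-occur more often than if independent.
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, supp, confidence, lift))
    return rules


# Hypothetical toy baskets (not taken from the Instacart data).
baskets = [
    {"banana", "milk", "bread"},
    {"banana", "milk"},
    {"banana", "bread"},
    {"milk", "eggs"},
    {"banana", "milk", "eggs"},
]
frequent = apriori(baskets, min_support=0.4)
for a, c, s, conf, lift in association_rules(frequent, min_confidence=0.6):
    print(sorted(a), "=>", sorted(c),
          f"support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Because every subset of a frequent itemset is also frequent, the antecedent and consequent supports are always present in `frequent`, so the rule step never needs to rescan the baskets. Counting support by scanning every basket is fine for a toy example, but at the scale of millions of orders a library implementation or a sparse encoding would be the more practical choice.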
A 3-part series of accompanying Medium blog posts can be read here:
- Part 1: Which Grocery Items Are Popular?
- Part 2: Which Groups of Customers Are Similar?
- Part 3: Which Sets of Products Should Be Recommended To Shoppers?
Choose the latest versions of any of the dependencies below:
This project is released under the MIT License. See the LICENSE file for the copyright notice.