Optimise offer targeting for customers based on transaction, demographic and offer data
Detailed report HERE
- Python 3.7
- packages as specified in requirements.txt
Starbucks is a coffee company and coffeehouse chain that serves hot and cold drinks and various kinds of coffee and tea. Once every few days, Starbucks sends out an offer to users of its mobile app as a way to stimulate customer spending. Starbucks is looking to optimise its offering strategy so that the right offer is sent to the right customer.
With millions of customers and various types of offers, it is impossible to allocate personnel to manually decide the offer for each customer. Each person in the simulation has some hidden traits that influence their purchasing patterns and are associated with their observable traits. People produce various events, including receiving offers, opening offers, and making purchases.
Therefore, it is necessary to build an automated decision process that allocates the right offer to the right customer.
Starbucks has provided a dataset that contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be an advertisement for a drink or an actual offer. Some users might not receive any offer during certain weeks.
The data is contained in three files:
- portfolio.json - containing offer ids and metadata about the 10 available offers (duration, type, etc.) sent during the 30-day test period
- id (string) - offer id
- offer_type (string) - type of offer, i.e. BOGO, discount, or informational. In detail:
- Buy-one-get-one (BOGO): A user needs to spend a certain amount to get a reward equal to that threshold amount.
- Discount: A user gains a reward equal to a fraction of the amount spent
- Informational: There is no reward, but neither is there a requisite amount that the user is expected to spend.
- difficulty (int) - minimum required spend to complete an offer
- reward (int) - reward (in USD) given for completing an offer
- duration (int) - time for offer to be open, in days
- channels (list of strings) - web, email, mobile, social
- profile.json - demographic data for approximately 15,000 customers (roughly 6,000 female, 8,000 male, and 200 other), with ages from 18 to 118 and incomes ranging from 30,000 USD to 120,000 USD per annum
- age (int) - age of the customer; missing values are encoded as 118
- became_member_on (int) - date when customer created an app account, format YYYYMMDD
- gender (str) - gender of the customer (M, F, O, or null)
- id (str) - customer id
- income (float) - customer's income
- transcript.json - records for approximately 300,000 events such as offers received, offers viewed, and offers completed. This includes purchases made on the app (with the timestamp and amount spent), a record for each offer a user receives, a record for when a user views an offer, and a record for when a user completes an offer. Note that a user can receive an offer, never actually view it, and still complete it. For example, a user might receive a "spend 10 dollars, get 2 dollars off" offer but never open it during its 10-day validity period; if the customer spends 15 dollars during those ten days, an offer-completion record will appear in the data set even though the customer was not influenced by the offer, because they never viewed it.
- event (str) - record description (i.e. transaction, offer received, offer viewed, etc.)
- person (str) - customer id
- time (int) - time in hours since start of test. The data begins at time t=0
- value - (dict of strings) - either an offer id or transaction amount depending on the record
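As a quick orientation, the three files can be read into pandas DataFrames. A minimal sketch, assuming the files sit in a data/ directory and are stored as line-delimited JSON records (both assumptions about the repository layout):

```python
import pandas as pd

# Load the three data files; lines=True assumes one JSON record per line.
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

# Quick sanity checks on sizes and event types.
print(portfolio.shape, profile.shape, transcript.shape)
print(transcript['event'].value_counts())
```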
The proposed solution has 2 parts, outlined as follows:
The first part aims to predict how a customer would spend with and without the influence of an offer, based on their profile (age, gender, income, time, etc.).
In this part, information on offers, customer profiles and historical spending is combined into one data table with the following information:
Source | Information | Note |
---|---|---|
portfolio.json | Offer type | Label encoded or one-hot encoded |
portfolio.json | channels | Label encoded or one-hot encoded |
profile.json | age | Might need to be binned depending on model type |
profile.json | income | Might need to be binned depending on model type |
profile.json | id | Might be included/removed depending on data availability |
transcript.json | Spending | Spending during the period when the offer is valid, which is also the target variable |
Then, relevant features will be engineered, depending on the findings from the exploratory data analysis part.
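A minimal sketch of how the table above could be assembled, assuming the column names from the data description and that portfolio and profile have already been loaded as shown earlier; the spending target would be aggregated from transcript.json separately:

```python
import numpy as np
import pandas as pd

# One-hot encode offer_type (BOGO, discount, informational).
offer_type_dummies = pd.get_dummies(portfolio['offer_type'], prefix='offer')

# channels is a list of strings per offer; turn it into one indicator
# column per channel (web, email, mobile, social).
channel_dummies = portfolio['channels'].str.join('|').str.get_dummies()

offers = pd.concat(
    [portfolio[['id', 'difficulty', 'reward', 'duration']],
     offer_type_dummies, channel_dummies],
    axis=1,
).rename(columns={'id': 'offer_id'})

# Treat the sentinel age of 118 as missing before any binning.
customers = profile.rename(columns={'id': 'person'}).copy()
customers.loc[customers['age'] == 118, 'age'] = np.nan

# Each row of the final table would pair a customer with an offer they
# received, plus their spending during the offer's validity window
# (the target), aggregated from transcript.json.
```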
Data is then split into training and validation sets, with 80% of customers in the training set and 20% in the validation set. This mimics the situation where we need to predict on new customers.
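A minimal sketch of the customer-level split, assuming a combined table named data with a person column identifying the customer (both names are illustrative):

```python
from sklearn.model_selection import GroupShuffleSplit

# Group by customer id so every row for a given customer lands in the same
# split, mimicking prediction on entirely new customers.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, valid_idx = next(splitter.split(data, groups=data['person']))

train_set = data.iloc[train_idx]
valid_set = data.iloc[valid_idx]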
A machine learning algorithm is then chosen to learn to predict customer spending given an offer.
Since the above step generates a simulation of customer spending for each possible offer, the decision rule simply chooses the offer that maximises the increase in spending.
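A minimal sketch of that decision rule, assuming a fitted spending model with a scikit-learn-style predict method and plain feature lists for the customer and each candidate offer (all names here are illustrative):

```python
import numpy as np

def allocate_offer(model, customer_features, offer_features, no_offer_features):
    """Return the index of the offer with the largest predicted spending uplift.

    Each argument except `model` is a plain list of numeric features; the
    customer and offer features are concatenated into one model input row.
    """
    # Predicted spending if the customer receives no offer.
    baseline = model.predict([customer_features + no_offer_features])[0]

    # Predicted uplift for each candidate offer.
    uplifts = [
        model.predict([customer_features + offer])[0] - baseline
        for offer in offer_features
    ]

    best = int(np.argmax(uplifts))
    return best, uplifts[best]
```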
Linear regression was selected for simplicity. This will be used to predict spending from customer and offer data with minimal feature engineering.
For this task, RMSE was selected, which is a standard metric for regression tasks. However, it is not the direct metric we would like to optimise, because the goal is to maximise the spending increase caused by an offer. RMSE is therefore only used to assess goodness of fit of the spending prediction models.
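A minimal sketch of fitting the linear-regression model and computing its RMSE, assuming X_train/X_valid and spending targets y_train/y_valid come from the split described above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Ordinary least squares on the combined feature table.
linreg = LinearRegression()
linreg.fit(X_train, y_train)

# RMSE measures goodness of fit of the spending predictions, in USD.
rmse = np.sqrt(mean_squared_error(y_valid, linreg.predict(X_valid)))
print(f'Validation RMSE: {rmse:.2f} USD')
```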
Businesses may seek to optimise for maximum income (i.e. customer spending) or profit. Therefore, the following metric was selected:
Spending increased by allocating an offer to a customer (in USD).
spending_increased = spending_with_offer - spending_without_offer
where spending_with_offer and spending_without_offer are the amounts a customer would spend if they did or did not receive the offer, both derived from the transaction records in transcript.json.
The workflow is as follows:
- Data transformation: Transform data from json to table format to facilitate later analysis
- Exploratory data analysis ("EDA"): Apply statistical/visualisation method to obtain further understanding of customer profiles and transactions
- Perform data ETL and data cleaning as informed by the EDA step
- Feature engineering, informed by the EDA step: Notebook
- Setup benchmark model Notebook
- Create a machine learning model to predict customer spending. The choice of algorithm will be informed by the exploratory analysis phase; tentative candidates are LightGBM and LinearRegressor (a minimal sketch of a LightGBM candidate is shown after this list). Notebook
- Evaluate and compare results obtained from the benchmark models and the machine learning model according to the metrics defined in Section 6. Notebook
- Use the above trained model to simulate how customers would react to each type of offer. Notebook
- Create a decision rule to allocate offers to customers that maximises increased spending. Since the previous step generates a simulation of customer spending for each possible offer, the decision rule will choose the offer that maximises the increase in spending. Notebook
- Critical reflection and assessment of the solution's business impact Report
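A minimal sketch of the LightGBM candidate mentioned in the workflow, under the same X_train/X_valid and y_train/y_valid assumptions as the linear-regression sketch above:

```python
import numpy as np
import lightgbm as lgb
from sklearn.metrics import mean_squared_error

# Gradient-boosted trees handle the mixed numeric/categorical features
# without the manual binning a linear model might need.
gbm = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05, num_leaves=31)
gbm.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], eval_metric='rmse')

# Compare against the benchmark on the same validation RMSE.
rmse = np.sqrt(mean_squared_error(y_valid, gbm.predict(X_valid)))
print(f'LightGBM validation RMSE: {rmse:.2f} USD')
```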