- Introduction
- Problem Statement
- Entity Relationship Diagram
- Case Study Questions & Solutions
- Bonus Questions & Solutions
- Key Insights
In early 2021, Danny follows his passion for Japanese food and opens "Danny's Diner," a charming restaurant offering sushi, curry, and ramen. However, lacking data analysis expertise, the restaurant struggles to leverage the basic data collected during its initial months to make informed business decisions. Danny's Diner seeks assistance in using this data effectively to keep the restaurant thriving.
Danny aims to utilize customer data to gain valuable insights into their visiting patterns, spending habits, and favorite menu items. By establishing a deeper connection with his customers, he can provide a more personalized experience for his loyal patrons.
He plans to use these insights to make informed decisions about expanding the existing customer loyalty program. Additionally, Danny seeks assistance in generating basic datasets for his team to inspect the data conveniently, without requiring SQL expertise.
Due to privacy concerns, he has shared a sample of his overall customer data, hoping it will be sufficient for you to create fully functional SQL queries to address his questions.
The case study revolves around three key datasets:
- Sales
- Menu
- Members
- The SQL query retrieves the
customer_id
and calculates the total amount spent (total_amnt
) by each customer at the restaurant. - It combines data from the
sales
andmenu
tables based on matchingproduct_id
. - The results are grouped by
customer_id
. - The query then calculates the total sum of
price
for each group of sales records with the samecustomer_id
. - Finally, the results are sorted in ascending order based on the
customer_id
. - The SQL query selects the
customer_id
and counts the number of distinct order dates (No_Days
) for each customer. - It retrieves data from the
sales
table. - The results are grouped by
customer_id
. - The
COUNT(DISTINCT order_date)
function calculates the number of unique order dates for each customer. - Finally, the query presents the total number of unique order dates as
No_Days
for each customer. - The SQL query uses a Common Table Expression (CTE) named
CTE
to generate a temporary result set. - Within the CTE, it selects the
customer_id
, assigns a dense rank to each row based on the order date for each customer, and retrieves the correspondingproduct_name
from themenu
table. - The
sales
table is joined with themenu
table on matchingproduct_id
. - The DENSE_RANK() function assigns a rank to each row within the partition of each
customer_id
based on theorder_date
in ascending order. - Each
customer_id
has its own partition and separate ranks based on the order dates of their purchases. - Next, the main query selects the
customer_id
and correspondingproduct_name
from the CTE. - It filters the results and only includes rows where the rank
rn
is equal to 1, which means the earliest purchase for eachcustomer_id
. - As a result, the query returns the first purchased product for each customer.
- The SQL query selects the
product_name
from themenu
table and counts the number of times each product was ordered (most_ordered
). - It retrieves data from the
Sales
table and joins it with themenu
table based on matchingproduct_id
. - The results are grouped by
product_name
. - The
COUNT(S.product_id)
function calculates the number of occurrences of eachproduct_id
in theSales
table. - The query then presents the
product_name
and its corresponding count asmost_ordered
for each product. - Next, the results are sorted in descending order based on the
most_ordered
column, so the most ordered product appears first. - The
LIMIT 1
clause is used to restrict the result to only one row, effectively showing the most ordered product. - The SQL query uses a Common Table Expression (CTE) named
CTE
to generate a temporary result set. - Within the CTE, it selects the
customer_id
,product_name
, and counts the number of times each product was ordered (order_count
) for each customer. - It retrieves data from the
sales
table and joins it with themenu
table based on matchingproduct_id
. - The results are grouped by
customer_id
andproduct_name
to get the count of orders for each product of each customer. - The
COUNT(M.product_id)
function calculates the number of occurrences of eachproduct_id
in themenu
table. - The DENSE_RANK() function assigns a rank to each row within the partition of each
customer_id
based on the order count of products in descending order. - Each
customer_id
has its own partition and separate ranks based on the number of product orders. - Next, the main query selects the
customer_id
,product_name
, andorder_count
from the CTE. - It filters the results and only includes rows where the rank
rnk
is equal to 1, which means the most ordered product for eachcustomer_id
. - As a result, the query returns the customer's ID, the most ordered product, and the number of times it was ordered by that customer.
- The SQL query retrieves distinct rows for each unique
customer_id
with their correspondingproduct_name
from thesales
andmenu
tables. - It filters the data based on the condition that the
order_date
in thesales
table is greater than thejoin_date
of the customer in themembers
table. - The
sales
table is aliased ass
, themembers
table is aliased asmbr
, and themenu
table is aliased asm
. - The query performs inner joins between
sales
andmembers
tables on matchingcustomer_id
and betweensales
andmenu
tables on matchingproduct_id
. - Only rows that meet the join condition and the
order_date > join_date
condition are considered in the result set. - The query selects the
customer_id
and correspondingproduct_name
for each customer who has placed an order after theirjoin_date
. - The results are sorted in ascending order based on the
customer_id
. - The
DISTINCT ON (s.customer_id)
clause ensures that only the first occurrence of eachcustomer_id
is included in the result set. - As a result, the query returns a unique list of
customer_id
along with the firstproduct_name
they ordered after joining as a member. - The SQL query retrieves distinct rows for each unique
customer_id
with their correspondingproduct_name
from thesales
andmenu
tables. - It filters the data based on the condition that the
order_date
in thesales
table is less than thejoin_date
of the customer in themembers
table. - The
sales
table is aliased ass
, themembers
table is aliased asmbr
, and themenu
table is aliased asm
. - The query performs inner joins between
sales
andmembers
tables on matchingcustomer_id
and betweensales
andmenu
tables on matchingproduct_id
. - Only rows that meet the join condition and the
order_date < join_date
condition are considered in the result set. - The query selects the
customer_id
and correspondingproduct_name
for each customer who has placed an order before theirjoin_date
. - The results are sorted in ascending order based on the
customer_id
. - The
DISTINCT ON (s.customer_id)
clause ensures that only the first occurrence of eachcustomer_id
is included in the result set. - As a result, the query returns a unique list of
customer_id
along with the firstproduct_name
they ordered before joining as a member. - The SQL query retrieves the
customer_id
along with the total count of items ordered (total_item
) and the total amount spent (total_amont
) by each customer. - It retrieves data from the
sales
table and joins it with themenu
table based on matchingproduct_id
. - It also joins the
sales
table with themembers
table based on matchingcustomer_id
. - The results are filtered based on the condition that the
order_date
in thesales
table is less than thejoin_date
of the customer in themembers
table. - The
COUNT(S.product_id)
function calculates the number of occurrences of eachproduct_id
in thesales
table, giving the total number of items ordered by each customer. - The
SUM(M.price)
function calculates the sum of theprice
from themenu
table, providing the total amount spent by each customer. - Results are grouped by
customer_id
to get the totals for each customer. - The query then presents the
customer_id
,total_item
, andtotal_amont
for each customer who placed orders before joining as a member. - Finally, the results are sorted in ascending order based on the
customer_id
. - The SQL query retrieves the
customer_id
and calculates the total points (total_points
) earned by each customer based on their purchases from thesales
andmenu
tables. - It retrieves data from the
sales
table and joins it with themenu
table based on matchingproduct_id
. - The query uses a
CASE
statement to differentiate between 'sushi' and other products. - If the product name is 'sushi', the price is multiplied by 2 to give double points.
- Otherwise, the regular price is considered.
- The
SUM
function calculates the total points for each customer by adding up the points earned from their purchases. - The total points are then multiplied by 10 to give a scaled value.
- Results are grouped by
customer_id
to get the total points for each customer. - The query then presents the
customer_id
and the scaledtotal_points
for each customer based on their purchases. - Finally, the results are sorted in ascending order based on the
customer_id
.
SELECT S.customer_id, SUM(M.price) AS total_amnt
FROM sales S
JOIN menu M ON S.product_id = M.product_id
GROUP BY S.customer_id
ORDER BY customer_id
SELECT customer_id, COUNT(DISTINCT order_date) AS No_Days
FROM sales
GROUP BY customer_id
WITH CTE AS
(SELECT S.customer_id,DENSE_RANK() OVER(PARTITION BY S.customer_id ORDER BY S.order_date)AS rn,M.product_name
FROM sales S
JOIN menu M ON S.product_id=M.product_id)
SELECT customer_id,product_name
FROM CTE
WHERE rn=1
SELECT M.product_name,COUNT(S.product_id)AS most_ordered
FROM Sales S
JOIN menu M ON S.product_id=M.product_id
GROUP BY M.product_name
ORDER BY most_ordered DESC
LIMIT 1
WITH CTE AS
(SELECT S.customer_id,M.product_name,COUNT(M.product_id)AS order_count,DENSE_RANK() OVER(PARTITION BY S.customer_id ORDER BY COUNT(S.product_id)DESC) AS rnk
FROM sales S
JOIN menu M ON S.product_id=M.product_id
GROUP BY S.customer_id,M.product_name)
SELECT customer_id,product_name,order_count
FROM CTE
WHERE rnk=1
SELECT DISTINCT ON (s.customer_id)
s.customer_id,
m.product_name
FROM sales s
JOIN members mbr ON s.customer_id = mbr.customer_id
JOIN menu m ON s.product_id = m.product_id
WHERE s.order_date > mbr.join_date
ORDER BY s.customer_id;
SELECT DISTINCT ON (s.customer_id)
s.customer_id,
m.product_name
FROM sales s
JOIN members mbr ON s.customer_id = mbr.customer_id
JOIN menu m ON s.product_id = m.product_id
WHERE s.order_date < mbr.join_date
ORDER BY s.customer_id;
SELECT S.customer_id,
COUNT(S.product_id) AS total_item,
SUM(M.price) AS total_amont
FROM sales S
JOIN menu M ON S.product_id=M.product_id
JOIN members ME ON S.customer_id=ME.customer_id
WHERE S.order_date<ME.join_date
GROUP BY S.customer_id
ORDER BY S.customer_id
SELECT s.customer_id,
SUM(CASE
WHEN m.product_name = 'sushi' THEN price * 2
ELSE price
END) * 10 AS total_points
FROM sales s
JOIN menu m ON s.product_id = m.product_id
GROUP BY s.customer_id
ORDER BY s.customer_id;
WITH dates_cte AS (
SELECT
customer_id,
join_date,
join_date + INTERVAL '6 days' AS valid_date,
DATE_TRUNC('month', '2021-01-31'::DATE) + INTERVAL '1 month' - INTERVAL '1 day' AS last_date
FROM members
)
SELECT
s.customer_id,
SUM(CASE
WHEN m.product_name = 'sushi' OR (s.order_date BETWEEN dates.join_date AND dates.valid_date) THEN 2 * 10 * m.price
ELSE 10 * m.price END) AS points
FROM sales s
INNER JOIN dates_cte AS dates
ON s.customer_id = dates.customer_id
AND dates.join_date <= s.order_date
AND s.order_date <= dates.last_date
INNER JOIN menu m
ON s.product_id = m.product_id
GROUP BY s.customer_id
ORDER BY s.customer_id;
- The SQL query starts by creating a Common Table Expression (CTE) named
dates_cte
. - Within the CTE, it selects
customer_id
,join_date
,join_date + INTERVAL '6 days'
asvalid_date
, and the last day of the month for the date '2021-01-31' aslast_date
. - The CTE is used to generate date ranges for each customer, from their
join_date
to 6 days later, and the last day of the month for January 2021. - Next, the main query selects the
customer_id
and calculates the total points (points
) earned by each customer based on their purchases from thesales
andmenu
tables. - It retrieves data from the
sales
table and joins it with thedates_cte
CTE using a combination ofJOIN
andON
clauses. - The query uses a
CASE
statement to differentiate between 'sushi' purchases and other products. - If the product name is 'sushi' or the order date falls within the range of
join_date
tovalid_date
, the points are calculated as 2 times 10 times the price of the product. - Otherwise, for other products, the points are calculated as 10 times the price of the product.
- The
SUM
function calculates the total points for each customer by adding up the points earned from their purchases. - Results are grouped by
customer_id
to get the total points for each customer. - The query then presents the
customer_id
and the calculatedpoints
for each customer based on their purchases. - Finally, the results are sorted in ascending order based on the
customer_id
.
WITH customer_member_status AS (
SELECT
s.customer_id,
s.order_date,
m.product_name,
m.price,
CASE
WHEN mbr.join_date <= s.order_date THEN 'Y'
ELSE 'N'
END AS member
FROM sales s
INNER JOIN menu m ON s.product_id = m.product_id
LEFT JOIN members mbr ON s.customer_id = mbr.customer_id
)
SELECT
customer_id,
order_date,
product_name,
price,
member
FROM customer_member_status
ORDER BY
customer_id,
member DESC,
order_date;
- The SQL query starts by creating a Common Table Expression (CTE) named
customer_member_status
. - Within the CTE, it selects
customer_id
,order_date
,product_name
,price
, and uses aCASE
statement to determine whether the customer is a member ('Y') or not ('N') based on their join date in themembers
table. - The
sales
table is aliased ass
, themenu
table asm
, and themembers
table asmbr
. - The query performs inner joins between
sales
andmenu
tables on matchingproduct_id
and left join betweensales
andmembers
tables on matchingcustomer_id
. - For each row, the
CASE
statement checks if thejoin_date
from themembers
table is less than or equal to theorder_date
from thesales
table. - If true, it assigns 'Y' (member) to the
member
column, otherwise 'N' (non-member). - Next, the main query selects the
customer_id
,order_date
,product_name
,price
, andmember
from thecustomer_member_status
CTE. - Results are ordered first by
customer_id
in ascending order, then bymember
in descending order (so members appear first), and finally byorder_date
in ascending order. - The query presents the final results with the selected columns for each customer, showing whether they are a member or not for each order and sorted accordingly.
Danny needs additional information about the ranking of customer products. However, he specifically requires null ranking values for non-member purchases, as he is not interested in ranking customers who are not yet part of the loyalty program.
WITH customers_data AS (
SELECT
sales.customer_id,
sales.order_date,
menu.product_name,
menu.price,
CASE
WHEN members.join_date > sales.order_date THEN 'N'
WHEN members.join_date <= sales.order_date THEN 'Y'
ELSE 'N'
END AS member_status
FROM sales
LEFT JOIN members ON sales.customer_id = members.customer_id
INNER JOIN menu ON sales.product_id = menu.product_id
)
SELECT
customer_id,
order_date,
product_name,
price,
member_status AS member,
CASE
WHEN member_status = 'N' THEN NULL
ELSE RANK() OVER (PARTITION BY customer_id, member_status ORDER BY order_date)
END AS ranking
FROM customers_data;
- The SQL query starts by creating a Common Table Expression (CTE) named
customers_data
. - Within the CTE, it selects
customer_id
,order_date
,product_name
,price
, and uses aCASE
statement to determine themember_status
based on whether thejoin_date
in themembers
table is greater, equal, or less than theorder_date
in thesales
table. - The
sales
table is aliased assales
, themembers
table asmembers
, and themenu
table asmenu
. - The query performs left join between
sales
andmembers
tables on matchingcustomer_id
and inner join betweensales
andmenu
tables on matchingproduct_id
. - For each row, the
CASE
statement checks thejoin_date
from themembers
table and compares it to theorder_date
from thesales
table, assigning 'Y' if thejoin_date
is less than or equal to theorder_date
(customer is a member) and 'N' otherwise (non-member). - Next, the main query selects the
customer_id
,order_date
,product_name
,price
, andmember_status
from thecustomers_data
CTE. - It also uses a
CASE
statement to calculate theranking
for each customer's orders if they are a member ('Y'). TheRANK()
function is used withPARTITION BY
to partition the ranking within each customer and their membership status and ordered byorder_date
. - If the customer is not a member ('N'), the
ranking
is set toNULL
. - Results are presented with the selected columns and
ranking
for each customer's orders, showing whether they are a member or not and the order of their orders.
- Customer Spending: The total amount spent by each customer at Danny's Diner varies widely. Some customers have spent significantly more than others, indicating potential high-value customers or loyal patrons.
- Customer Visits: The number of days each customer visited the restaurant also varies, showing different patterns of customer engagement. Some customers visit frequently, while others may visit less often.
- First Purchases: Understanding the first items purchased by each customer can help Danny identify popular entry items and potentially attract new customers.
- Most Popular Item: The most purchased item on the menu is valuable information for Danny. He can use this insight to optimize inventory management and capitalize on the popularity of the item.
- Personalized Recommendations: Knowing the most popular item for each customer allows Danny to make personalized menu suggestions, enhancing the dining experience for his customers.
- Customer Loyalty: The data about purchases before and after joining the loyalty program helps Danny evaluate the effectiveness of the loyalty program and its impact on customer behavior.
- Bonus Points for New Members: By offering 2x points to new members during their first week, Danny incentivizes more spending, encouraging members to engage more actively with the program.
- Member Points: The points earned by each member can be used to assess their loyalty and potentially offer targeted rewards and promotions.
- Data Visualization: Creating visualizations based on the data and insights can further aid Danny in understanding trends and making data-driven decisions.
- Customer Segmentation: By analyzing customer spending habits, Danny can segment his customer base and tailor marketing strategies accordingly.
- Expanding Membership: Danny can use the insights from this data to refine his loyalty program and attract new members, leveraging the success of the program.
- Inventory Management: Knowing the most popular and least popular items can help Danny optimize his inventory, reduce wastage, and maximize profits.
- Menu Optimization: Danny can use the data to evaluate the performance of different menu items and consider introducing new dishes based on customer preferences.
- Customer Engagement: Analyzing customer behavior can help Danny understand what keeps customers coming back and help him create more engaging experiences.
- Long-Term Growth: By leveraging data analysis, Danny can make informed decisions that contribute to the long-term growth and success of Danny's Diner.