A retail company wants to create a roadmap for its sales and marketing activities. To plan for the medium to long term, the company needs to predict the potential value that existing customers will bring in the future.
The dataset consists of information about customers who made their last purchases as OmniChannel (both online and offline) shoppers between 2020 and 2021. The dataset includes the following columns:
master_id
: Unique customer identifierorder_channel
: The channel through which the purchase was made (Android, iOS, Desktop, Mobile, Offline)last_order_channel
: The channel of the last purchasefirst_order_date
: The date of the customer's first purchaselast_order_date
: The date of the customer's last purchaselast_order_date_online
: The date of the customer's last online purchaselast_order_date_offline
: The date of the customer's last offline purchaseorder_num_total_ever_online
: Total number of purchases made by the customer onlineorder_num_total_ever_offline
: Total number of purchases made by the customer offlinecustomer_value_total_ever_offline
: Total amount spent by the customer offlinecustomer_value_total_ever_online
: Total amount spent by the customer onlineinterested_in_categories_12
: List of categories in which the customer made purchases in the last 12 months
- The dataset is read, and data exploration is conducted to understand its structure.
- Outliers in certain columns are identified and replaced with threshold values.
- New variables for the total number of purchases and total customer value for OmniChannel customers are created.
- Date columns are converted to the "date" data type.
- The analysis date is set as 2 days after the last purchase in the dataset.
- A new DataFrame (cltv_df) is created to store customer_id, recency_cltv_weekly, T_weekly, frequency, and monetary_cltv_avg.
- Recency_cltv_weekly, T_weekly, frequency, and monetary_cltv_avg are calculated for each customer.
- The BG/NBD (Beta Geometric/Negative Binomial Distribution) model is fitted to predict expected customer sales within 3 months, 6 months, 9 months, and 12 months.
- The Gamma-Gamma model is fitted to predict the expected average customer value.
- The 6-month CLTV is calculated, and the CLTV values are standardized.
- The top 20 customers with the highest CLTV are identified.
To use the provided code, follow these steps:
- Ensure you have the required libraries installed, including pandas, datetime, lifetimes, and sklearn.
- Load the dataset "data.csv" into the appropriate directory.
- Run the code, which includes data preparation, CLTV modeling, segmentation, and optional functionalization.
The analysis provides insights into customer segments, predicted sales, and customer value. Specific customer segments and top-performing customers are highlighted. The resulting CLTV segments can help the retail company make data-driven decisions and prioritize marketing efforts.