This repository contains the code for reproducing the Corporación Favorita and Walmart sales forecasting challenges.
The challenge is presented on this Kaggle page. The codebase is described below, organized by stage of the analytics lifecycle.
Requirements:
- Python 3.x (tested with 3.7)
- pandas
- numpy
- scipy
- scikit-learn
- lightgbm
- tqdm
- matplotlib
- squarify
- tensorflow 2.x

Obtaining environmental data.
Weather data was collected from the World Weather Online API to enrich the Kaggle dataset. The script that queries the API is in favorita/Get_Temperature_Data.ipynb. The acquired weather data can be downloaded from this Google Drive link.
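To be useful as a feature, the weather data has to be joined onto the sales records. The following is a minimal sketch, assuming a weather table with one row per (city, date) and a hypothetical avg_temp_c column; the actual export from the notebook may use different column names. It maps each store to its city through the store_nbr and city columns of the Kaggle stores.csv file.

```python
import pandas as pd


def add_weather(sales: pd.DataFrame, stores: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Attach daily weather to each sales row via the store's city.

    Assumes `weather` has one row per (city, date); its value columns
    (e.g. the hypothetical `avg_temp_c`) are copied onto every matching row.
    """
    # Map each store to its city (stores.csv has store_nbr and city columns).
    with_city = sales.merge(stores[["store_nbr", "city"]], on="store_nbr", how="left")
    # Left join keeps every sales row; days without weather data get NaN.
    return with_city.merge(weather, on=["city", "date"], how="left")
```

A left join is used so that missing weather days show up as NaN instead of silently dropping sales rows.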

Data preparation.
Basic pre-processing steps are in the first half of the Jupyter notebook favorita/1_EDA_Cleaning.ipynb.
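A few cleaning steps are commonly applied to the Favorita training data. The sketch below shows them under the assumption that the notebook does something similar; it is not taken from the notebook. In the Kaggle data, negative unit_sales values represent returns and onpromotion has missing values. The log1p target mirrors the "Log_Scaled" naming of the prototype notebooks.

```python
import numpy as np
import pandas as pd


def basic_clean(train: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a Favorita-style train table."""
    out = train.copy()
    # Negative unit_sales are returns; clip them to zero.
    out["unit_sales"] = out["unit_sales"].clip(lower=0)
    # Missing onpromotion values are treated as "not on promotion".
    out["onpromotion"] = out["onpromotion"].astype("boolean").fillna(False).astype(bool)
    # Log-scaled target; log1p keeps zero sales finite.
    out["log_sales"] = np.log1p(out["unit_sales"])
    out["date"] = pd.to_datetime(out["date"])
    return out
```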

Data exploration.
Exploratory data analysis (EDA) is covered in the second half of the same notebook, favorita/1_EDA_Cleaning.ipynb.
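Since squarify is listed as a requirement, the EDA likely includes treemaps of sales by category. The following sketch computes the share of total sales per product family, which is the kind of input such a treemap needs. It assumes the Kaggle items.csv columns item_nbr and family; the grouping the notebook actually uses is not documented here.

```python
import pandas as pd


def sales_share_by(train: pd.DataFrame, items: pd.DataFrame, column: str = "family") -> pd.Series:
    """Fraction of total unit_sales per value of `column` (largest first)."""
    merged = train.merge(items[["item_nbr", column]], on="item_nbr", how="left")
    totals = merged.groupby(column)["unit_sales"].sum().sort_values(ascending=False)
    return totals / totals.sum()
```

The result can be passed to squarify.plot(sizes=share.values, label=share.index) to draw the treemap with matplotlib.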

Modeling prototypes.
Before the main model development, LGBM and DNN prototypes were built in Google Colab. The notebooks are favorita/2_Modeling_LGBM_Log_Scaled_Prototype.ipynb and favorita/3_Modeling_NN_Log_Scaled_Prototype.ipynb. Base code for LightGBM and XGBoost is in favorita/base_lgb_model.py and favorita/base_xgb_model.py.
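The prototype notebooks train on a log-scaled target. A minimal sketch of that idea is a wrapper that fits any fit/predict regressor (for example lightgbm.LGBMRegressor) on log1p(y) and maps predictions back with expm1. The LeastSquares class is a hypothetical NumPy stand-in so the sketch runs without LightGBM installed; the wrapper itself is an illustration, not the notebooks' code.

```python
import numpy as np


class LogTargetRegressor:
    """Fit a regressor on log1p(y) and invert predictions with expm1."""

    def __init__(self, base):
        self.base = base

    def fit(self, X, y):
        self.base.fit(X, np.log1p(y))
        return self

    def predict(self, X):
        # Clip at zero because cleaned unit sales are non-negative.
        return np.clip(np.expm1(self.base.predict(X)), 0, None)


class LeastSquares:
    """Hypothetical stand-in for a real model: linear least squares with intercept."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_
```

Training in log space makes the squared-error loss behave like the squared log error used to score the competition.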

Utility scripts.
- Data loading: favorita/load_data.py
- Feature engineering: favorita/feature_extractor.py
- Evaluation: favorita/evaluation.py

Important: create a config.py file in your environment that specifies the root folder of the dataset.
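A config.py could look like the sketch below. The variable names DATA_ROOT and TRAIN_CSV are hypothetical; check which name favorita/load_data.py imports and use that instead. The file names are those of the Kaggle Favorita download.

```python
# config.py (hypothetical example; the variable names the scripts expect may differ)
from pathlib import Path

# Root folder holding the Kaggle CSVs (train.csv, items.csv, stores.csv, ...).
DATA_ROOT = Path("/path/to/favorita")

# Convenience paths derived from the root.
TRAIN_CSV = DATA_ROOT / "train.csv"
ITEMS_CSV = DATA_ROOT / "items.csv"
STORES_CSV = DATA_ROOT / "stores.csv"
```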

Hyper-parameter search.
Random search and grid search scripts for LGBM are in favorita/base_lgb_model_random_search.py.
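Random search samples a fixed number of configurations from a search space and keeps the best one. The sketch below uses real LightGBM parameter names, but the candidate values and the score_fn callback (for example, validation error of a model trained with the sampled parameters) are assumptions, not the values used in the script.

```python
import numpy as np

# Hypothetical search space; the keys are real LightGBM parameter names.
SPACE = {
    "num_leaves": [31, 63, 127, 255],
    "learning_rate": [0.01, 0.02, 0.05, 0.1],
    "feature_fraction": [0.6, 0.8, 1.0],
    "min_data_in_leaf": [20, 100, 300],
}


def random_search(score_fn, space, n_iter=10, seed=0):
    """Sample n_iter configurations uniformly; return the one with the lowest score."""
    rng = np.random.default_rng(seed)
    best_params, best_score = None, np.inf
    for _ in range(n_iter):
        # Pick one candidate value per parameter.
        params = {name: values[rng.integers(len(values))] for name, values in space.items()}
        score = score_fn(params)  # lower is better, e.g. a validation error
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Grid search differs only in the loop: it would evaluate every combination from itertools.product(*space.values()) instead of sampling.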

Predictive models (general model for all stores).
- LGBM: favorita/model_lgbm.py
- DNN: favorita/model_nn.py
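A single model shared by all stores is usually trained on features computed per (store, item) series. The following sketch shows one common feature family, rolling means of past log sales; whether feature_extractor.py builds exactly these features is an assumption. It expects one row per day per series, so days with no sales must already be filled with zeros.

```python
import pandas as pd


def rolling_mean_features(df: pd.DataFrame, windows=(7, 14, 28)) -> pd.DataFrame:
    """Add mean_{w}d columns: rolling mean of log_sales over the previous w days.

    Expects columns store_nbr, item_nbr, date, log_sales with daily rows per series.
    """
    df = df.sort_values(["store_nbr", "item_nbr", "date"]).copy()
    grouped = df.groupby(["store_nbr", "item_nbr"])["log_sales"]
    for w in windows:
        # shift(1) so the feature for day t only uses days up to t-1 (no target leakage).
        df[f"mean_{w}d"] = grouped.transform(lambda s: s.shift(1).rolling(w, min_periods=1).mean())
    return df
```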

Predictive models (per store model).
- LGBM: favorita/model_lgbm_per_store.py
- DNN: favorita/model_nn_per_store.py
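The per-store variant trains a separate model for each store instead of one shared model. A minimal sketch of that routing is shown below; make_model is a hypothetical factory returning a fresh regressor whose fit returns self (true of lightgbm.LGBMRegressor, but not of Keras, whose fit returns a History object).

```python
import numpy as np
import pandas as pd


def fit_per_store(df, feature_cols, target_col, make_model):
    """Train one model per store_nbr; returns {store_nbr: fitted model}."""
    return {
        store: make_model().fit(g[feature_cols].to_numpy(), g[target_col].to_numpy())
        for store, g in df.groupby("store_nbr")
    }


def predict_per_store(models, df, feature_cols):
    """Route each row to its store's model; predictions are aligned with df.index."""
    preds = pd.Series(np.nan, index=df.index)
    for store, g in df.groupby("store_nbr"):
        if store in models:  # stores unseen during training are left as NaN
            preds.loc[g.index] = models[store].predict(g[feature_cols].to_numpy())
    return preds
```

The trade-off is that each per-store model sees less data, while the general model can share patterns across stores.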

Ensemble.
A prototype ensemble is available in favorita/4_Modeling_LGBM_Ensemble.ipynb.
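One simple way to ensemble the LGBM and DNN forecasts is a weighted average, with the weight chosen on a validation set using the competition metric, NWRMSLE (normalized weighted root mean squared logarithmic error, with weight 1.25 for perishable items and 1.0 otherwise). The sketch below blends in log1p space; that choice, and the weight grid, are assumptions rather than what the notebook does.

```python
import numpy as np


def nwrmsle(y_true, y_pred, perishable):
    """Normalized weighted RMSLE; perishable items get weight 1.25, others 1.0."""
    w = np.where(np.asarray(perishable, dtype=bool), 1.25, 1.0)
    err = np.log1p(y_pred) - np.log1p(y_true)
    return float(np.sqrt(np.sum(w * err**2) / np.sum(w)))


def blend(preds_lgbm, preds_nn, weight_lgbm=0.5):
    """Weighted average of the two forecasts in log1p space, mapped back to unit sales."""
    mixed = weight_lgbm * np.log1p(preds_lgbm) + (1 - weight_lgbm) * np.log1p(preds_nn)
    return np.expm1(mixed)


def best_weight(y_true, preds_lgbm, preds_nn, perishable, grid=None):
    """Return the LGBM weight from `grid` with the lowest validation NWRMSLE."""
    grid = np.linspace(0.0, 1.0, 11) if grid is None else np.asarray(grid)
    scores = [nwrmsle(y_true, blend(preds_lgbm, preds_nn, a), perishable) for a in grid]
    return float(grid[int(np.argmin(scores))])
```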