- link: https://www.kaggle.com/datasets/ethon0426/lending-club-20072020q1/data
- Loan records from the largest P2P lending platform, covering 2007 through 2020Q3: about 2.9M rows and 141 columns
- etl_pandas.ipynb: performs ETL on the dataset using pandas
- etl_pyspark.ipynb: performs the same ETL on the dataset using PySpark
- ml_training.py: trains an ML model to predict the probability of default
- oos_evaluate.ipynb: evaluates the trained model on out-of-sample loan data using ROI and ROC AUC
- LCLoanAnalysis (Power BI).pdf: loan analysis report built in Power BI
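
The out-of-sample evaluation by ROI and ROC AUC can be illustrated with a minimal sketch. This is not the code from oos_evaluate.ipynb: the `Loan` fields, the selection threshold, and the five toy loans are all hypothetical, and ROI is assumed here to mean (payments received - amount funded) / amount funded over the loans the model would select.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    # Hypothetical fields for illustration; the dataset's real column names may differ.
    funded: float      # amount invested in the loan
    received: float    # total payments received back
    defaulted: bool    # observed out-of-sample outcome
    p_default: float   # model's predicted probability of default


def roc_auc(labels, scores):
    """ROC AUC as the chance that a random defaulter is scored above a random
    non-defaulter, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def portfolio_roi(loans, max_p_default):
    """ROI of funding only loans whose predicted default probability
    is at most max_p_default (an assumed selection rule)."""
    picked = [loan for loan in loans if loan.p_default <= max_p_default]
    invested = sum(loan.funded for loan in picked)
    if invested == 0:
        return 0.0
    return (sum(loan.received for loan in picked) - invested) / invested


# Made-up toy data, not taken from the Lending Club dataset.
loans = [
    Loan(1000.0, 1150.0, False, 0.05),
    Loan(1000.0, 1100.0, False, 0.20),
    Loan(1000.0, 400.0, True, 0.60),
    Loan(1000.0, 1200.0, False, 0.30),
    Loan(1000.0, 250.0, True, 0.25),
]

auc = roc_auc([loan.defaulted for loan in loans],
              [loan.p_default for loan in loans])
roi_all = portfolio_roi(loans, max_p_default=1.0)        # fund every loan
roi_selected = portfolio_roi(loans, max_p_default=0.22)  # fund only low-risk loans

print(f"ROC AUC: {auc:.3f}")
print(f"ROI, all loans: {roi_all:.3f}")
print(f"ROI, selected loans: {roi_selected:.3f}")
```

The pairwise AUC above is quadratic in the number of loans and is only meant to show the definition; on millions of rows a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.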