This repository contains all of my code for this case study.
-> Performed exploratory data analysis on 47 million rows of data from the "Synthetic Financial Datasets For Fraud Detection" dataset on Kaggle.
-> Profiled the dataset and found issues with it. Created visualizations on the full dataset to confirm these issues and to answer other questions that guided data cleaning.
-> Cleaned the data by removing unwanted information and outliers, and by balancing the dataset.
-> Performed feature engineering by normalizing the "amount" column and encoding categorical variables.
-> Ran predictions on the pre-processed dataset using Logistic Regression and Random Forest models. Visualized the results before and after data cleaning for both models.
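
The cleaning and feature engineering steps above can be sketched roughly as follows. This is only an illustration, not the code in this repo: it assumes balancing was done by random undersampling and normalization by min-max scaling (the list above does not say which methods were used), it uses plain Python instead of a data frame library, and the "type" and "isFraud" column names and the sample rows are made up for the example.

```python
import random
from collections import defaultdict


def undersample(rows, label_key, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest one."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


def min_max_normalize(rows, key):
    """Rescale a numeric column to the [0, 1] range (assumed normalization method)."""
    values = [row[key] for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [{**row, key: (row[key] - lo) / span} for row in rows]


def one_hot_encode(rows, key):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[key] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != key}
        for category in categories:
            new_row[f"{key}_{category}"] = int(row[key] == category)
        encoded.append(new_row)
    return encoded


# Tiny made-up sample; column names other than "amount" are assumptions.
sample = [
    {"type": "TRANSFER", "amount": 100.0, "isFraud": 1},
    {"type": "CASH_OUT", "amount": 500.0, "isFraud": 1},
    {"type": "PAYMENT", "amount": 20.0, "isFraud": 0},
    {"type": "PAYMENT", "amount": 35.0, "isFraud": 0},
    {"type": "CASH_OUT", "amount": 250.0, "isFraud": 0},
    {"type": "TRANSFER", "amount": 900.0, "isFraud": 0},
]

balanced = undersample(sample, "isFraud")
prepared = one_hot_encode(min_max_normalize(balanced, "amount"), "type")
```

Undersampling throws away majority-class rows, which is cheap on a large dataset but discards information. Oversampling the minority class is a common alternative.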
Evaluation Metrics for Logistic Regression
Accuracy     0.9034356576539947
Precision    0.8523560209424084
Recall       0.9791499599037691
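
For reference, here is a minimal sketch of how these three metrics are defined in terms of true/false positives and negatives. It assumes class 1 (fraud) is the positive class; the function name and the toy labels are made up for illustration and are not the data behind the numbers above.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall for 0/1 labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels: every fraud case is caught, but two legitimate rows are flagged too.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

The toy example shows the same pattern as the numbers above: recall higher than precision, meaning most fraud is caught at the cost of some false alarms.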
Classification Report for Random Forest
                Precision    Recall    F1-score    Support
0                    1.00      0.99        0.99       2425
1                    0.99      1.00        0.99       2494
Accuracy                                   0.99       4919
Macro Avg            0.99      0.99        0.99       4919
Weighted Avg         0.99      0.99        0.99       4919
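
The two average rows can be reproduced from the per-class rows. The sketch below does this for the precision column, using the rounded values from the table, so the results only approximately match what the original unrounded report would give. The helper names are made up for illustration.

```python
def macro_average(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean weighted by support, i.e. the number of true samples in each class."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total


# Per-class precision and support, copied from the table above (rounded values).
precision = [1.00, 0.99]
supports = [2425, 2494]

macro = macro_average(precision)
weighted = weighted_average(precision, supports)
```

Because the two classes have almost the same support here, the macro and weighted averages are nearly identical. They would diverge on an unbalanced test set.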