This project builds a predictive model that detects fraudulent credit card transactions with machine learning. The dataset contains transaction details, and the task is binary classification: label each transaction as fraud or non-fraud.
Data Loading and Preprocessing
- Load the dataset and clean the data.
- Convert date and time fields to datetime types (for example, to extract the transaction hour).
- Encode categorical variables with Label Encoding and Weight of Evidence (WOE) encoding.
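A minimal preprocessing sketch with pandas and scikit-learn, assuming hypothetical column names (trans_datetime, gender, category, is_fraud) because the README does not give the schema. It parses the timestamp, keeps the hour, label-encodes gender, and computes Weight of Evidence by hand as WOE = ln(P(category | fraud) / P(category | non-fraud)), with an assumed smoothing constant to avoid log(0). The category_encoders package in the dependency list provides a WOEEncoder for the same job; this manual version only illustrates the idea and does not reproduce that library's exact formula.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def woe_mapping(feature: pd.Series, target: pd.Series, smoothing: float = 0.5) -> pd.Series:
    """Map each category to WOE = ln(P(category | fraud) / P(category | non-fraud)).

    `smoothing` is added to every count so a category with no fraud (or no
    non-fraud) rows does not produce log(0); the value 0.5 is an assumption.
    """
    stats = pd.DataFrame({"cat": feature, "y": target}).groupby("cat")["y"].agg(["sum", "count"])
    fraud = stats["sum"] + smoothing                    # fraud rows per category
    legit = stats["count"] - stats["sum"] + smoothing   # non-fraud rows per category
    return np.log((fraud / fraud.sum()) / (legit / legit.sum()))


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Parse timestamps, derive the hour, and encode the categorical columns."""
    out = df.copy()
    # Convert the raw timestamp string to datetime and keep the hour as a feature.
    out["hour"] = pd.to_datetime(out["trans_datetime"]).dt.hour
    # Label encoding: each distinct gender string becomes an integer code.
    out["gender"] = LabelEncoder().fit_transform(out["gender"])
    # WOE encoding: positive values mean the category is over-represented among fraud cases.
    out["category"] = out["category"].map(woe_mapping(out["category"], out["is_fraud"]))
    return out.drop(columns=["trans_datetime"])


# Hypothetical rows; the real dataset's columns may differ.
raw = pd.DataFrame({
    "trans_datetime": ["2020-06-21 12:14:25", "2020-06-21 23:05:00",
                       "2020-06-22 02:30:10", "2020-06-22 13:45:00"],
    "gender": ["F", "M", "M", "F"],
    "category": ["grocery", "online", "online", "grocery"],
    "is_fraud": [0, 1, 1, 0],
})
processed = preprocess(raw)
print(processed)
```

Because WOE uses the target, the mapping is best fitted on the training split and then applied to the test split; fitting it on all rows, as in this small demo, leaks label information.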
Data Balancing
- Address class imbalance by downsampling the majority (non-fraud) class.
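One way to downsample with pandas, assuming a binary is_fraud column (a hypothetical name): keep every minority row and draw an equal-sized random sample, without replacement, from the majority class. sklearn.utils.resample with replace=False and n_samples set to the minority count is an equivalent alternative.

```python
import pandas as pd


def downsample_majority(df: pd.DataFrame, target: str = "is_fraud", random_state: int = 42) -> pd.DataFrame:
    """Randomly drop majority-class rows until both classes have the same count."""
    counts = df[target].value_counts()
    minority = df[df[target] == counts.idxmin()]
    # Sample without replacement so each kept majority row appears only once.
    majority = df[df[target] == counts.idxmax()].sample(n=len(minority), random_state=random_state)
    # Concatenate, then shuffle so the two classes are interleaved.
    return pd.concat([minority, majority]).sample(frac=1, random_state=random_state).reset_index(drop=True)


# Hypothetical imbalanced data: 5 fraud rows, 95 non-fraud rows.
df = pd.DataFrame({"amount": range(100), "is_fraud": [1] * 5 + [0] * 95})
balanced = downsample_majority(df)
print(df["is_fraud"].value_counts().to_dict())        # original distribution
print(balanced["is_fraud"].value_counts().to_dict())  # downsampled distribution
```

Downsampling discards data, so it is typically applied to the training split only; the test split then still reflects the real class ratio.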
Feature Scaling
- Standardize the features with scikit-learn's StandardScaler.
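StandardScaler transforms each feature to z = (x - mean) / std. The sketch below uses a random, hypothetical feature matrix, fits the scaler on the training split only, and reuses the learned statistics on the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))  # hypothetical feature matrix
y = rng.integers(0, 2, size=200)                      # hypothetical binary labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

scaler = StandardScaler()
# Learn the per-feature mean and standard deviation from the training split only...
X_train_scaled = scaler.fit_transform(X_train)
# ...and apply the same statistics to the test split so no test data leaks into training.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```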
Model Training
- Train multiple machine learning models:
  - Logistic Regression
  - Decision Tree
  - Random Forest
  - Gaussian Naive Bayes
  - Support Vector Machine (SVC)
  - XGBoost Classifier
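A compact way to train the listed models through scikit-learn's shared fit/predict interface, shown on synthetic data from make_classification as a stand-in for the preprocessed, balanced dataset. Hyperparameters are defaults or assumptions, not the project's settings. XGBoost is left out to keep the sketch to scikit-learn; if xgboost is installed, xgboost.XGBClassifier() can be added to the models dict because it follows the same interface.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features (hypothetical data).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
    "SVC": SVC(),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # every estimator here shares the same fit/predict API
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Print models from best to worst test accuracy.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {acc:.3f}")
```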
Evaluation
- Compare the models by accuracy.
- Visualize the results with bar charts:
  - Gender distribution of fraud and non-fraud cases.
  - Fraud distribution across transaction hours.
  - Original vs. downsampled class distributions.
  - Model accuracy comparison.
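The bar charts listed above can be produced with pandas and matplotlib along these lines. The column names and rows are hypothetical, the helper names are illustrative, and the Agg backend is selected only so the sketch runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def fraud_count_by_hour(df: pd.DataFrame, hour_col: str = "hour", target: str = "is_fraud") -> pd.Series:
    """Number of fraud cases in each transaction hour, ordered by hour."""
    return df.loc[df[target] == 1, hour_col].value_counts().sort_index()


def save_bar_chart(values: pd.Series, title: str, ylabel: str, path: str) -> None:
    """Render `values` as a bar chart (one bar per index label) and save it to `path`."""
    fig, ax = plt.subplots(figsize=(8, 4))
    values.plot.bar(ax=ax)
    ax.set_title(title)
    ax.set_ylabel(ylabel)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Hypothetical transactions: hour of day and fraud label.
transactions = pd.DataFrame({"hour": [1, 1, 2, 13, 22, 22, 22, 9],
                             "is_fraud": [1, 0, 1, 0, 1, 1, 0, 0]})
by_hour = fraud_count_by_hour(transactions)
out_dir = tempfile.mkdtemp()
save_bar_chart(by_hour, "Fraud cases by transaction hour", "Fraud cases",
               os.path.join(out_dir, "fraud_by_hour.png"))
```

The same helper can draw the model comparison from a dict of accuracy scores, for example save_bar_chart(pd.Series(scores).sort_values(), "Model accuracy", "Accuracy", path).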
Dependencies
- pandas
- numpy
- seaborn
- matplotlib
- category_encoders
- scikit-learn
- xgboost
Usage
- Install the required dependencies:
  pip install pandas numpy seaborn matplotlib category_encoders scikit-learn xgboost
- Load the dataset and follow the workflow outlined in the code.
Developed by Tonmoy Day Sarkar as part of a machine learning project on fraud detection.