Features
- Null values detection
- Duplicated values detection
- Column removal
- Discretization
- Creating new features
- Feature extraction
  - Datetime feature extraction
  - Credit card feature extraction
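A minimal, dependency-free sketch of datetime feature extraction followed by discretization. The timestamp format, the feature names (`hour`, `day_of_week`, `month`, `is_weekend`, `time_of_day`), and the hour bins are all hypothetical, since the outline does not say which calendar features or bins were used. Credit card feature extraction is not sketched because its details are not given.

```python
from datetime import datetime

# Hypothetical bin edges for discretizing the hour of a transaction: [low, high).
HOUR_BINS = [(0, 6, "night"), (6, 12, "morning"), (12, 18, "afternoon"), (18, 24, "evening")]


def extract_datetime_features(timestamp: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> dict:
    """Split one timestamp string into simple calendar features."""
    ts = datetime.strptime(timestamp, fmt)
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        "month": ts.month,
        "is_weekend": ts.weekday() >= 5,
    }


def discretize_hour(hour: int) -> str:
    """Map an hour (0-23) to a coarse time-of-day bucket."""
    for low, high, label in HOUR_BINS:
        if low <= hour < high:
            return label
    raise ValueError(f"hour out of range: {hour}")


features = extract_datetime_features("2020-06-21 23:15:00")
features["time_of_day"] = discretize_hour(features["hour"])
print(features)
```

In a pandas workflow the same features would usually come from the `.dt` accessor on a datetime column, and the bins from `pd.cut`.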
Univariate Analysis
- Target
- Categorical features
- Numerical features
Bivariate Analysis
- Target analysis
- Amount of activity analysis
- Time analysis
- Correlation matrix
- Association matrix
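A correlation matrix (for example Pearson) only applies to numerical pairs; for categorical pairs an association matrix needs a different measure. The outline does not name the measure used, so this sketch assumes Cramér's V, V = sqrt(chi2 / (n * (min(r, c) - 1))), computed from the contingency table of two columns. The helper names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(x: list, y: list) -> float:
    """Cramér's V between two categorical sequences (0 = no association, 1 = perfect)."""
    n = len(x)
    joint = Counter(zip(x, y))
    x_counts, y_counts = Counter(x), Counter(y)
    chi2 = 0.0
    for a, n_a in x_counts.items():
        for b, n_b in y_counts.items():
            expected = n_a * n_b / n  # expected cell count under independence
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return 0.0  # a constant column carries no association information
    return sqrt(chi2 / (n * (k - 1)))


def association_matrix(columns: dict) -> dict:
    """Pairwise Cramér's V for every pair of named categorical columns."""
    names = list(columns)
    return {(a, b): cramers_v(columns[a], columns[b]) for a in names for b in names}


cols = {
    "category": ["a", "a", "b", "b"],
    "same_info": ["u", "u", "v", "v"],
    "unrelated": ["u", "v", "u", "v"],
}
matrix = association_matrix(cols)
```

The uncorrected form above is biased upward for small samples; a bias-corrected variant exists if that matters for the dataset size.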
Column removal
Log transform
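The outline does not state which log variant was applied; this sketch assumes log1p, log(1 + x), a common choice for non-negative, right-skewed columns such as amounts because it is defined at 0. The sample values are made up and only illustrate that the transform reduces skewness.

```python
from math import log1p


def skewness(values: list[float]) -> float:
    """Skewness from population moments (no small-sample bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values: list[float]) -> list[float]:
    """Apply log(1 + x) elementwise; requires every x > -1."""
    return [log1p(v) for v in values]


amounts = [1.0, 2.0, 3.0, 5.0, 8.0, 20.0, 250.0, 1200.0]  # made-up, right-skewed sample
before, after = skewness(amounts), skewness(log_transform(amounts))
print(f"skewness before: {before:.2f}, after: {after:.2f}")
```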
Categorical encoding
- Binary encoding
- Weight of evidence encoding
- Ordinal encoding
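Weight of evidence replaces each category with the log ratio of how the two target classes are distributed across it. Sign conventions differ between sources; this sketch assumes WoE = ln(share of non-events / share of events), with 1 marking the event (e.g. fraud), and adds a hypothetical `smoothing` constant to every count so categories with no events or no non-events still get a finite value. The mapping should be fitted on the training split only, to avoid leaking the test target into the features.

```python
from collections import defaultdict
from math import log


def woe_mapping(categories: list, target: list[int], smoothing: float = 0.5) -> dict:
    """Weight of evidence per category: ln(share of non-events / share of events)."""
    events = defaultdict(float)
    non_events = defaultdict(float)
    for cat, t in zip(categories, target):
        if t == 1:
            events[cat] += 1
        else:
            non_events[cat] += 1
    levels = set(categories)
    # Smoothed totals, so the per-category shares still sum to 1.
    total_events = sum(events[c] + smoothing for c in levels)
    total_non_events = sum(non_events[c] + smoothing for c in levels)
    return {
        c: log(((non_events[c] + smoothing) / total_non_events)
               / ((events[c] + smoothing) / total_events))
        for c in levels
    }


train_cats = ["a", "a", "a", "b", "b", "b"]
train_target = [0, 0, 1, 1, 1, 0]
woe = woe_mapping(train_cats, train_target)
encoded = [woe[c] for c in train_cats]  # the encoded column
```

Here category "a" gets a positive value (relatively more non-events) and "b" a negative one.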
Train-test split
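Because the target is imbalanced, a plain random split can leave the test set with very few (or no) minority samples. A minimal sketch of a stratified split, which keeps each class's share roughly equal in both parts, is shown below; the function name and the 80/20 ratio are assumptions. With scikit-learn the equivalent is `train_test_split(..., stratify=y)`. Any resampling should then be applied to the training part only.

```python
import random
from collections import defaultdict


def stratified_split(y: list, test_size: float = 0.2, seed: int = 42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


y = [0] * 95 + [1] * 5  # made-up labels with a 5% minority class
train_idx, test_idx = stratified_split(y, test_size=0.2)
```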
Imbalanced target
Methods compared:
- No changes
- Random undersampling
- Random oversampling
- SMOTE-Tomek links
- Class weights
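The simpler balancing methods in the list can be sketched without libraries: random undersampling drops majority rows, random oversampling duplicates minority rows with replacement, and class weights leave the data unchanged but scale each class's contribution to the loss. The weight formula below is the "balanced" heuristic that scikit-learn uses for `class_weight="balanced"`; whether this project used that heuristic is an assumption. SMOTE-Tomek links is not reimplemented here; imbalanced-learn provides `RandomUnderSampler`, `RandomOverSampler` and `SMOTETomek` for these methods.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, equal to the minority-class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sampling without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Duplicate rows of smaller classes (with replacement) up to the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        extra = rng.choices(idx, k=target - n)
        X_out.extend(X[i] for i in extra)
        y_out.extend(label for _ in extra)
    return X_out, y_out


def balanced_class_weights(y):
    """weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


X = [[float(i)] for i in range(10)]  # made-up single-feature rows
y = [0] * 8 + [1] * 2
X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
weights = balanced_class_weights(y)
```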
Classifiers
- Random Forest Classifier
- Logistic Regression Classifier
- Naive Bayes Classifier
- Decision Tree Classifier
- Support Vector Machine (SVM) Classifier
- K-nearest neighbors (KNN) Classifier
Evaluation
- Confusion matrix
- ROC curve and AUC
- Classification metrics
- Decision boundary
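On an imbalanced target, accuracy can look high while most minority cases are missed, which is why the confusion matrix, precision/recall/F1 and ROC AUC are the more informative metrics. A minimal sketch for the binary case, assuming 1 marks the positive (minority) class and a hypothetical 0.5 threshold; AUC is computed as the probability that a random positive is scored above a random negative, which equals the area under the ROC curve. The pairwise loop is quadratic and meant only for illustration; scikit-learn's `roc_auc_score` is the usual tool.

```python
def confusion_matrix(y_true, y_pred):
    """Binary confusion counts with 1 = positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def classification_metrics(cm):
    """Precision, recall and F1 for the positive class (0.0 when undefined)."""
    precision = cm["tp"] / (cm["tp"] + cm["fp"]) if cm["tp"] + cm["fp"] else 0.0
    recall = cm["tp"] / (cm["tp"] + cm["fn"]) if cm["tp"] + cm["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]  # made-up predicted probabilities
y_pred = [int(s >= 0.5) for s in scores]
cm = confusion_matrix(y_true, y_pred)
metrics = classification_metrics(cm)
auc = roc_auc(y_true, scores)
```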
Results of the Random Forest Classifier on the test data