This project implements machine learning models to predict the likelihood of stroke occurrence from various risk factors. It covers data exploration, pre-processing, model development, and hyperparameter tuning.
Implements binning for numerical variables and feature encoding for categorical data.
Visualizes data distributions using kernel density estimation (KDE).
Applies data transformations such as MinMaxScaler and log scaling for improved modeling.
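A minimal sketch of these pre-processing steps, assuming pandas/seaborn/scikit-learn and illustrative column names (`age`, `avg_glucose_level`, `bmi`, etc.) in place of the project's exact code:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the stroke dataset; the real project loads it from CSV.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "age": rng.uniform(1, 90, 200),
    "avg_glucose_level": rng.lognormal(4.6, 0.4, 200),
    "bmi": rng.normal(28, 6, 200),
    "gender": rng.choice(["Male", "Female"], 200),
    "smoking_status": rng.choice(["never smoked", "smokes", "Unknown"], 200),
})

# Binning a numerical variable into labelled intervals
df["age_group"] = pd.cut(df["age"], bins=[0, 18, 40, 60, 120],
                         labels=["child", "young adult", "middle-aged", "senior"])

# One-hot encoding of categorical features
df = pd.get_dummies(df, columns=["gender", "smoking_status", "age_group"],
                    drop_first=True)

# KDE plot to inspect skew before transforming
sns.kdeplot(df["avg_glucose_level"], fill=True)
plt.show()

# Log scaling for the skewed feature, then MinMax scaling to [0, 1]
df["avg_glucose_level"] = np.log1p(df["avg_glucose_level"])
num_cols = ["age", "avg_glucose_level", "bmi"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
```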
Addresses class imbalance with resampling techniques (random under-sampling, random over-sampling, and SMOTE) to improve model performance.
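A hedged sketch of the three resampling strategies compared in the results table, using imbalanced-learn; the synthetic data below stands in for the pre-processed stroke features:

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import RandomOverSampler, SMOTE

# Synthetic imbalanced data in place of the pre-processed stroke features.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

samplers = {
    "RUS": RandomUnderSampler(random_state=42),
    "ROS": RandomOverSampler(random_state=42),
    "SMOTE": SMOTE(random_state=42),
}

# Resample only the training split, then check the new class balance.
for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    print(name, Counter(y_res))
```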
Implements and compares various classification models, including:
- Logistic Regression
- Random Forest Classifier
- Decision Tree Classifier
- KNeighbors Classifier
- XGBoost Classifier
- Gaussian Naive Bayes

Utilizes GridSearchCV for hyperparameter tuning to optimize model performance.
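An illustrative GridSearchCV setup for one of the listed models; the parameter grid and scoring metric are assumptions, not the project's exact configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data in place of the resampled training set.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1",   # F1 is more informative than accuracy on imbalanced classes
    cv=5,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```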
Implements a Voting Classifier to leverage the strengths of multiple models for more robust predictions.
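A minimal sketch of a soft-voting ensemble over three of the listed models; the component choices and hyperparameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Synthetic imbalanced data in place of the pre-processed stroke features.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
        ("xgb", XGBClassifier(random_state=42)),
    ],
    voting="soft",  # average predicted probabilities across models
)
print(cross_val_score(voter, X, y, cv=5, scoring="f1").mean())
```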
Results by resampling strategy (RUS = random under-sampling, ROS = random over-sampling):

| Model | RUS | ROS | SMOTE | Original |
|---|---|---|---|---|
| Logistic Regression | 75% | 74% | 74% | - |
| KNeighbors Classifier | 66% | 84% | 78% | - |
| XGBoost Classifier | 75% | 91% | 92% | - |
| ANN | 72% | - | 88% | 93.9% |