Diabetes is among the most prevalent chronic diseases in the world, affecting millions of people each year and placing a significant financial burden on the economy. It is a serious chronic disease in which individuals lose the ability to regulate blood glucose levels effectively, and it can reduce both quality of life and life expectancy. The goal of this project is to classify people into two groups: Diabetes or No Diabetes.
This project comes from the Data Science Bootcamp (T5) and uses classification to predict whether or not a person has diabetes. The data, provided by Kaggle, contains the main health information about each patient along with a label stating whether that patient has diabetes. We first cleaned and visualized the data to understand and interpret it better, and we highlighted the features most strongly related to the target.
The dataset contains about 254,000 observations (253,680 rows, the sum of the two class counts below) and 22 columns.
The target column (Diabetes) has two classes:
Class | Count |
---|---|
0 (No Diabetes) | 213,703 |
1 (Diabetes) | 39,977 |
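The table shows a strong class imbalance. The short Python sketch below turns the two counts into class shares, the negative-to-positive ratio, the accuracy of a model that always predicts No Diabetes, and the per-class weights given by the `n_samples / (n_classes * count)` formula that scikit-learn documents for `class_weight="balanced"`. It is plain arithmetic on the table, not code from the project, and the variable names are illustrative.

```python
# Class counts of the target column, copied from the class table.
counts = {0: 213_703, 1: 39_977}  # 0 = No Diabetes, 1 = Diabetes

n_samples = sum(counts.values())
n_classes = len(counts)

# Fraction of rows in each class.
shares = {label: n / n_samples for label, n in counts.items()}

# Number of No Diabetes rows per Diabetes row.
imbalance_ratio = counts[0] / counts[1]

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline_accuracy = max(counts.values()) / n_samples

# Weights from the formula scikit-learn uses for class_weight="balanced":
# n_samples / (n_classes * count_of_that_class).
balanced_weights = {label: n_samples / (n_classes * n) for label, n in counts.items()}

print("rows:", n_samples)
print("shares:", {label: round(s, 3) for label, s in shares.items()})
print("imbalance ratio:", round(imbalance_ratio, 2))
print("majority baseline accuracy:", round(majority_baseline_accuracy, 3))
print("balanced weights:", {label: round(w, 3) for label, w in balanced_weights.items()})
```

Because always predicting No Diabetes already reaches about 84% accuracy, accuracy alone says little about how well the Diabetes class is detected; this is the imbalance that the resampling and class-weight variants of Logistic Regression are meant to address.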
The classification algorithms used in this project:
- Logistic Regression:
  - Logistic Regression (Random Over Sampler)
  - Logistic Regression (Random Under Sampler)
  - Logistic Regression (class weight: balanced)
  - Logistic Regression (SMOTE)
- K-Nearest Neighbors
- Decision Tree
- Random Forest
- Bagging
- Naive Bayes
- Ensembling with Voting (Hard, Soft, Average)

The tools used in this project:
- Python and Jupyter Notebook
- Pandas and dataprep for data manipulation
- Seaborn and Plotly for visualizations
- Scikit-learn (sklearn) for ML algorithms
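The Logistic Regression variants in the algorithm list differ only in how they counter the class imbalance. The README does not show the project's code; in practice these options usually come from scikit-learn (`class_weight="balanced"`) and the separate imbalanced-learn package (`RandomOverSampler`, `RandomUnderSampler`, `SMOTE`), which is an assumption here. The sketch below reimplements the three resampling ideas in plain Python so the mechanics are visible. It is a simplified illustration, not the library implementations and not the project's code; the function names and toy data are hypothetical.

```python
import random


def _group_by_class(X, y):
    """Map each label to the list of rows that carry it."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen rows (with replacement) of every smaller
    class until each class is as large as the biggest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_under_sample(X, y, seed=0):
    """Keep a random subset (without replacement) of every class so that
    each class shrinks to the size of the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote_like(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows in the spirit of SMOTE: pick a
    minority row, pick one of its k nearest minority neighbours, and place
    a new point at a random position on the segment between the two."""
    if len(X_minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_minority))
        base = X_minority[i]
        # Other minority rows, nearest first.
        others = sorted(
            (j for j in range(len(X_minority)) if j != i),
            key=lambda j: _sq_dist(base, X_minority[j]),
        )
        neighbour = X_minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Toy data: class 1 is the minority, as in the diabetes dataset.
X = [[0.0, 1.0], [0.1, 0.9], [0.2, 1.1],
     [5.0, 5.0], [5.2, 4.8], [4.9, 5.1], [5.1, 5.0], [4.8, 4.9]]
y = [1, 1, 1, 0, 0, 0, 0, 0]

X_ros, y_ros = random_over_sample(X, y)
X_rus, y_rus = random_under_sample(X, y)
minority = [row for row, label in zip(X, y) if label == 1]
new_rows = smote_like(minority, n_new=2, k=2)

print(y_ros.count(0), y_ros.count(1))  # 5 5
print(y_rus.count(0), y_rus.count(1))  # 3 3
```

Whichever sampler is used, it should be applied to the training split only; resampling before the train/test split lets duplicated or synthetic rows leak into the evaluation data. The `class_weight="balanced"` variant needs no resampling at all, since it reweights each class in the loss instead of changing the rows.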
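The voting ensemble combines several of the listed classifiers. The plain-Python sketch below shows how hard voting (majority of predicted labels) and soft voting (average of predicted probabilities) can disagree on the same patient. It mirrors the idea behind scikit-learn's `VotingClassifier` with `voting="hard"` or `voting="soft"` but is not that class, and the probabilities are made-up numbers. The README does not say what its Average variant computes, so that variant is not reproduced here.

```python
def hard_vote(label_preds):
    """Majority vote over the 0/1 labels predicted by each model.

    label_preds holds one list of labels per model, all the same length.
    A tie (only possible with an even number of models) goes to class 0.
    """
    n_models = len(label_preds)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*label_preds)]


def soft_vote(proba_preds):
    """Average each model's predicted P(class = 1) and pick the likelier class.

    For two classes this equals taking the argmax of the averaged class
    probabilities; an average of exactly 0.5 goes to class 0.
    Returns (labels, averaged_probabilities).
    """
    n_models = len(proba_preds)
    averaged = [sum(ps) / n_models for ps in zip(*proba_preds)]
    return [1 if p > 0.5 else 0 for p in averaged], averaged


# Hypothetical P(Diabetes) from three models for three patients.
probas = [
    [0.90, 0.51, 0.20],  # model A
    [0.80, 0.52, 0.30],  # model B
    [0.30, 0.05, 0.95],  # model C
]
# Each model's own label: Diabetes if its probability is above 0.5.
labels = [[1 if p > 0.5 else 0 for p in model] for model in probas]

print(hard_vote(labels))     # [1, 1, 0]
print(soft_vote(probas)[0])  # [1, 0, 0]
```

For the second patient, two models that are barely above 0.5 win the hard vote, while model C's confident 0.05 pulls the averaged probability down to 0.36, so soft voting predicts No Diabetes. Soft voting therefore rewards models whose probabilities are well calibrated.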
The project process and results are presented in the slides. To see the presentation slides, click HERE.