- About the Project
- Inferences Derived from Exploratory Data Analysis
- Results
- Contributing
- License
- Contact
- Acknowledgements
Photo by jesse orrico on Unsplash
- It is a well-known fact that heart diseases are currently the leading cause of death worldwide.
- A computational system that can predict the presence of heart disease could significantly reduce mortality rates and substantially lower health care costs.
- Such predictions, made well in advance, can provide important insights to doctors.

For more information regarding this project, click here.
Source: UCI Heart Disease Dataset
- Each dataset (Cleveland, Hungary, Switzerland, and Long Beach) consists of 76 attributes.
- However, it is recommended to use only a subset of 14 of them for the analysis.
- Moreover, as shown later, only 6 attributes have a significant effect.
- In this notebook, the classifiers were built on one combined dataset, and their performance was evaluated using cross-validation.
- Heart disease prediction is carried out using logistic regression, random forest, neural network, and decision tree classifiers.
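The cross-validated evaluation described above can be sketched in plain Python. This is a minimal illustration, assuming 10 shuffled folds; the notebook's actual fold count, shuffling, and library are not stated here, and `k_fold_indices`, `cross_val_accuracy`, and the majority-class baseline are hypothetical helpers.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs.

    Each index lands in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    # Shuffle once so that folds are not ordered by source (Cleveland, Hungary, ...).
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_val_accuracy(fit: Callable, predict: Callable, X: Sequence, y: Sequence, k: int = 10) -> float:
    """Mean test-fold accuracy; fit(X_train, y_train) -> model, predict(model, x) -> label."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical majority-class baseline, only to show how the helpers plug together.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model
```

Any of the four classifiers could be wrapped as a `fit`/`predict` pair in the same way; in practice a library routine such as scikit-learn's `cross_val_score` would usually take the place of these helpers.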
Total number of observations of healthy people and of people suffering from heart disease.
It can be observed that heart disease is spread uniformly across age. In addition, the estimated median age of patients was 56, with the youngest and oldest being 29 and 77, respectively.
It can be observed from the plots that the median age of people exhibiting heart disease is lower than that of healthy people. Moreover, the age distribution of patients exhibiting heart disease is slightly skewed. Hence, age can be used as a predictive feature.
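Comparing median age between healthy people and patients amounts to grouping rows by the label and taking a median per group. A minimal sketch, assuming each record is a dict with an `age` column and a binary `target` column (0 = healthy, 1 = heart disease); both names and the helper `median_by_group` are assumptions, not taken from the notebook.

```python
from statistics import median
from typing import Dict, List, Mapping, Sequence


def median_by_group(rows: Sequence[Mapping[str, float]], value: str, group: str = "target") -> Dict[object, float]:
    """Median of `value` for each distinct value of `group` (e.g. healthy vs. diseased)."""
    buckets: Dict[object, List[float]] = {}
    for row in rows:
        # Collect the numeric value under its group label.
        buckets.setdefault(row[group], []).append(row[value])
    return {key: median(values) for key, values in buckets.items()}
```

The same helper applies to any other numeric attribute, for example maximum heart rate, by passing a different `value` column.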
It can be observed that there is no major difference between the Rest ECG distributions of healthy people and those exhibiting heart disease.
Moreover, the majority of people exhibiting heart disease have serum cholesterol in the range of 200-300 mg/dl.
People exhibiting heart disease generally have higher maximum heart rates than healthy people.
The majority of patients with heart disease have an ST depression of 0.1.
Most people who have 0 major vessels are suffering from heart disease.
Heart disease is more prevalent among females than among males.
The majority of people suffering from heart disease have chest pain of type 1 or 2.
There is no major difference in fasting blood sugar between the two groups.
People with Rest ECG 1 have a higher probability of suffering from heart disease.
People with no exercise-induced angina have a higher probability of having heart disease.
People with peak exercise slope 2 have a higher probability of exhibiting heart disease.
People with fixed-defect thalassemia have a higher probability of suffering from heart disease.
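Statements such as "a higher probability of suffering from heart disease" compare the empirical disease rate within each category of a feature. A minimal sketch of that calculation, assuming dict records with an integer-coded categorical column (here the UCI abbreviation `exang` for exercise-induced angina) and a binary `target` label; the column names and the helper are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Mapping, Sequence


def disease_rate_by_category(rows: Sequence[Mapping[str, int]], feature: str, target: str = "target") -> Dict[int, float]:
    """Fraction of rows with target == 1 for each value of a categorical feature."""
    totals: Dict[int, int] = defaultdict(int)
    positives: Dict[int, int] = defaultdict(int)
    for row in rows:
        totals[row[feature]] += 1
        positives[row[feature]] += int(row[target] == 1)
    # Empirical P(heart disease | feature == value), keyed by category value.
    return {value: positives[value] / totals[value] for value in sorted(totals)}
```

A category whose rate is clearly above the overall disease rate is what the observations above describe as having a "higher probability".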
It can be observed that only a few parameters, such as chest pain type, gender, exercise-induced angina, number of major vessels, and ST depression, have a significant effect. Hence, the other parameters were dropped.
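Dropping the remaining parameters is a column selection. A minimal sketch, assuming the UCI attribute abbreviations (`cp`, `sex`, `exang`, `ca`, `oldpeak`) for the features named above and a binary `target` label; the exact columns the notebook keeps are not listed in full here, so `SELECTED_FEATURES` is an assumption.

```python
from typing import Dict, List, Mapping, Sequence

# Assumed column names: UCI abbreviations for chest pain type, gender,
# exercise-induced angina, number of major vessels, and ST depression.
SELECTED_FEATURES = ["cp", "sex", "exang", "ca", "oldpeak"]


def select_columns(
    rows: Sequence[Mapping[str, float]],
    keep: Sequence[str] = SELECTED_FEATURES,
    target: str = "target",
) -> List[Dict[str, float]]:
    """Return copies of the rows containing only the kept features plus the label."""
    columns = list(keep) + [target]
    return [{name: row[name] for name in columns} for row in rows]
```

With a pandas DataFrame `df`, the equivalent step is `df[SELECTED_FEATURES + ["target"]]`.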
- Logistic Regression
  - Accuracy: 78.54%
  - Sensitivity: 84.77%
  - Specificity: 72%
- Random Forest
  - Accuracy: 89.76%
  - Sensitivity: 85.59%
  - Specificity: 95.4%
- Neural Network
  - Accuracy: 81.95%
  - Sensitivity: 76.98%
  - Specificity: 89.87%
- Decision Tree
  - Accuracy: 86.65%
  - Sensitivity: 81.4%
  - Specificity: 87.56%
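Accuracy, sensitivity, and specificity all come from the four cells of a binary confusion matrix. A minimal sketch, assuming labels are coded 1 for heart disease (the positive class) and 0 for healthy; `binary_metrics` is a hypothetical helper, not the notebook's code.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity, treating 1 (heart disease) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # diseased, predicted diseased
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # healthy, predicted healthy
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # healthy, predicted diseased
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # diseased, predicted healthy
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

High sensitivity means few missed patients, while high specificity means few healthy people flagged; for example, the random forest figures above trade a little sensitivity for much higher specificity than logistic regression. The sketch assumes both classes are present, otherwise a denominator is zero.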
Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/amazing-feature`)
- Commit your Changes (`git commit -m 'feat: some amazing feature'`)
- Push to the Branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Maharsh Suryawala - Portfolio
Project Link: https://github.com/MaharshSuryawala/Heart-Disease-Risk-Prediction