diogit/Data-Analysis-Project

Diabetes in Pima Indian Women: understanding the problem and searching for answers through Data Analysis

Data Analysis Project on Pima Indian Women for the Data Analysis and Mining course, 2018/2019. The purpose of the project is to try to discover new information, through data exploration and data analysis, about the high rate of diabetes that occurred 50 years ago in the population of Pima Indian women. The study focuses on three main data analysis techniques: Linear Regression analysis, Principal Component Analysis, and Fuzzy Clustering with Anomalous Patterns analysis.

Data set source: https://www.kaggle.com/uciml/pima-indians-diabetes-database

Prerequisites

The code requires the 'scikit-fuzzy' package to run, available at: https://scikit-fuzzy.readthedocs.io/en/latest/install.html
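
Before opening the notebooks, it can help to confirm that the package is visible to the Python environment Jupyter uses. The snippet below is a minimal sketch and not part of the repository. The helper name `is_installed` is hypothetical, and the sketch assumes the usual setup in which the 'scikit-fuzzy' distribution is installed with pip and imported under the module name `skfuzzy`.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


# 'scikit-fuzzy' is the distribution name; 'skfuzzy' is the import name.
if not is_installed("skfuzzy"):
    print("scikit-fuzzy is missing; try: pip install scikit-fuzzy")
```

Running this in a notebook cell checks the same kernel that will later execute the analysis, which is not always the interpreter found on the shell's PATH.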

All of the code is available as Jupyter Notebooks and can be run with Jupyter. There is also an auxiliary file, 'anomalous_cluster.py', that contains the implementation of the Anomalous Patterns algorithm.
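
For readers unfamiliar with anomalous pattern clustering, the following is a minimal pure-Python sketch of the general idea, written for illustration only. It is not the code in 'anomalous_cluster.py', and the function names (`sq_dist`, `mean_point`, `anomalous_pattern`) are hypothetical. It assumes the common formulation in which the reference point defaults to the grand mean of the data, the cluster is seeded at the point farthest from that reference, and points are then reassigned until the membership stops changing. The toy coordinates in the usage lines are made up and are not taken from the Pima data set.

```python
from math import fsum


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [fsum(col) / n for col in zip(*points)]


def anomalous_pattern(points, reference=None, max_iter=100):
    """Extract one anomalous cluster; returns (centroid, member_indices)."""
    if reference is None:
        reference = mean_point(points)  # assumption: grand mean as reference
    # Seed the centroid at the point farthest from the reference.
    centroid = list(max(points, key=lambda p: sq_dist(p, reference)))
    members = []
    for _ in range(max_iter):
        # Keep the points that are closer to the centroid than to the reference.
        new_members = [
            i for i, p in enumerate(points)
            if sq_dist(p, centroid) < sq_dist(p, reference)
        ]
        if not new_members or new_members == members:
            break  # nothing to assign, or membership has stabilised
        members = new_members
        centroid = mean_point([points[i] for i in members])
    return centroid, members


# Toy data: four points near the origin and two far away.
toy = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [5, 5], [5.1, 5]]
centroid, members = anomalous_pattern(toy)
print(members)  # indices of the points in the extracted cluster
```

In this sketch, removing the extracted members and calling the function again on the remaining points yields further clusters one at a time. The resulting centroids are one possible way to initialise a fuzzy clustering run.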