
Data-driven analysis to predict birthweight, leveraging computational analytics and machine learning techniques to identify critical factors, evaluate correlations, and implement actionable public health insights.


stefagnone/-Birthweight-Prediction-Modeling-Case-Study


Project Overview

This project uses computational analytics and machine learning to analyze and predict birthweight, a crucial metric for neonatal health. The analysis identifies the key factors influencing birthweight, assesses how strongly each correlates with it, and applies machine learning models to derive actionable insights for improving public health outcomes.

The study employs data-driven methods to address the following:

  • Identifying correlations with birthweight, including strong positive and negative relationships.
  • Transforming birthweight data to evaluate improvements in correlation metrics.
  • Implementing classification models to predict low birthweight and evaluate their performance.
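
As a rough illustration of the first two points, the sketch below computes a Pearson correlation between a predictor and birthweight, then repeats it after log-transforming birthweight. It uses only the standard library so it runs without the project's dependencies. The variable names and toy values are hypothetical and are not taken from birthweight.csv, and the log transform is an assumed example of a transformation, not necessarily the one used in the notebook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data (NOT from birthweight.csv).
cigs_per_day = [0, 0, 5, 10, 20, 30]
bwght_grams = [3600, 3400, 3200, 3000, 2600, 2100]

# Assumed transformation: natural log of birthweight.
log_bwght = [math.log(w) for w in bwght_grams]

r_raw = pearson(cigs_per_day, bwght_grams)
r_log = pearson(cigs_per_day, log_bwght)
print(f"raw: {r_raw:.3f}  log: {r_log:.3f}")
```

Comparing `r_raw` with `r_log` feature by feature is one way to judge whether a transformation changes the correlation picture.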

Key deliverables include exploratory data analysis (EDA), feature engineering, multiple model implementations, and the final predictions submitted to Kaggle.


Key Features

  • Exploratory Data Analysis (EDA): Insights into the dataset through descriptive statistics, histograms, and correlation matrices.
  • Feature Engineering: Designed and transformed features to enhance predictive power and address data challenges.
  • Modeling Techniques: Developed and evaluated multiple classification models, including Logistic Regression, Ridge Classification, and Random Forest.
  • Confusion Matrix Analysis: Analyzed model performance and errors to prioritize correct predictions for low birthweight cases.
  • Actionable Insights: Provided recommendations based on model results to inform public health strategies.
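
The confusion matrix analysis above can be sketched as follows. This is a minimal standard-library illustration, assuming low birthweight is encoded as the positive class `1`; the labels are hypothetical toy values, not model output from this project. In the notebook, scikit-learn's `confusion_matrix` and `recall_score` would give the same quantities.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def recall(counts):
    """Share of actual low-birthweight cases the model caught."""
    actual_pos = counts["tp"] + counts["fn"]
    return counts["tp"] / actual_pos if actual_pos else 0.0


def precision(counts):
    """Share of predicted low-birthweight cases that were correct."""
    predicted_pos = counts["tp"] + counts["fp"]
    return counts["tp"] / predicted_pos if predicted_pos else 0.0


# Hypothetical labels: 1 = low birthweight, 0 = normal birthweight.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts, recall(counts), precision(counts))
```

When missing a low-birthweight case is costlier than a false alarm, recall on the positive class is the number to watch, which is the sense in which the analysis prioritizes correct low-birthweight predictions.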

Repository Structure

|-- Compagnone_Stefano_A2.ipynb       # Jupyter Notebook containing analysis and model development
|-- Compagnone_Stefano_A2.html        # HTML version of the notebook for easy viewing
|-- birthweight.csv                   # Dataset used for training and analysis
|-- submission.csv                    # Final predictions submitted to Kaggle
|-- Images/
    |-- correlation_matrix.png        # Heatmap of feature correlations
    |-- confusion_matrix.png          # Confusion matrix of the final model
    |-- feature_histograms.png        # Histogram visualizations for continuous variables

Key Insights

  • Correlation Analysis: Strong correlations were identified between birthweight and factors such as parental age, education level, and health-related behaviors (e.g., smoking and drinking).
  • Threshold Analysis: Explored birthweight thresholds distinguishing healthy and non-healthy categories, supported by public health research.
  • Model Interpretability: Highlighted impactful features, such as maternal education and prenatal visits, to provide actionable insights for healthcare professionals.
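
To make the threshold idea concrete, the sketch below turns a continuous birthweight into a binary label. It assumes the World Health Organization definition of low birthweight (below 2500 g); this README does not state which threshold the notebook settled on, so treat the constant and the function name as illustrative.

```python
# Assumed cutoff: WHO defines low birthweight as below 2500 grams.
# The threshold actually used in the notebook is not stated in this README.
LOW_BIRTHWEIGHT_GRAMS = 2500


def is_low_birthweight(weight_grams, threshold=LOW_BIRTHWEIGHT_GRAMS):
    """Return 1 if the weight is strictly below the threshold, else 0."""
    return int(weight_grams < threshold)


# Hypothetical weights in grams (NOT from birthweight.csv).
weights = [3400, 2499, 2500, 1800]
labels = [is_low_birthweight(w) for w in weights]
print(labels)
```

Keeping the threshold as a parameter makes it easy to compare how different cutoffs change the class balance.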

Technologies Used

  • Programming Language: Python
  • Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, PHIK
  • Models: Logistic Regression, Ridge Classifier, K-Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machine (GBM)


Contact

For any inquiries or further collaboration, please contact Stefano Compagnone (email: stefanocompagnone98@gmail.com).
