
benitomartin/mlops-aws-stroke


MLOPS STROKE PREDICTION ⚱️


This is a personal MLOps project based on a Kaggle dataset for stroke prediction.

Feel free to ⭐ and clone this repo 😉

Tech Stack

Visual Studio Code, Jupyter Notebook, Python, Pandas, NumPy, Matplotlib, scikit-learn, Flask, Docker, Anaconda, Linux, AWS, Git

Project Structure

The project has been structured with the following folders and files:

  • data: raw and clean data
  • src: source code. It is divided into:
    • Notebooks with EDA, Baseline Model and AWS Pipelines incl. unit testing
    • code_scripts: processing, training, evaluation, docker container, serving and lambda
  • requirements.txt: project requirements

Project Description

The dataset was obtained from Kaggle and contains 5110 rows and 10 columns for stroke prediction. To prepare the data for modelling, an Exploratory Data Analysis was conducted, which showed that the dataset is highly imbalanced (95% no stroke, 5% stroke). For modelling, the categorical features were encoded and XGBoost was used as the model. Because of the imbalance, threshold-moving was applied: the best threshold from the ROC curve was selected and used for the predictions. The learning rate was tuned to find the best value for the deployed model.
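The threshold-moving step can be illustrated with a minimal sketch. It is not the code from this repo: the criterion (Youden's J, i.e. TPR minus FPR), the function names `best_threshold_youden` and `predict_with_threshold`, and the toy scores are all assumptions made for illustration, and plain Python is used instead of XGBoost or scikit-learn so the snippet runs on its own.

```python
def best_threshold_youden(y_true, y_score):
    """Return (threshold, J) maximising Youden's J = TPR - FPR.

    Assumption: Youden's J is used as the ROC-based criterion; the
    project may have chosen its threshold differently.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_j, best_t = -1.0, None
    # Every distinct predicted score is a candidate cut-off.
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:  # strict comparison keeps the highest tied threshold
            best_j, best_t = j, t
    return best_t, best_j


def predict_with_threshold(y_score, threshold):
    """Turn probabilities into 0/1 labels using a custom cut-off."""
    return [1 if s >= threshold else 0 for s in y_score]


# Hypothetical, imbalanced toy data (8 negatives, 2 positives).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.35, 0.30, 0.45]

threshold, j = best_threshold_youden(y_true, y_score)
labels = predict_with_threshold(y_score, threshold)
default_labels = predict_with_threshold(y_score, 0.5)
```

On this toy data the default cut-off of 0.5 predicts no strokes at all, while the moved threshold recovers both positive cases at the cost of one false positive. This is the usual trade-off when the positive class is rare.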

Pipeline Deployment

All pipelines were deployed on AWS SageMaker, as well as the Model Registry and Endpoints. The following pipelines were created:

  • ✅ Preprocessing
  • ✅ Training
  • ✅ Tuning
  • ✅ Evaluation
  • ✅ Model Registry
  • ✅ Model Conditional Registry
  • ✅ Deployment

Additionally, the experiments were tracked on Comet ML.