Machine learning lifecycle lab

A collection of notebooks that walk through a typical ML lifecycle, from data cleaning to model hosting, using Amazon SageMaker.

A typical ML lifecycle looks something like this:

1. Identify a business problem or question that ML can answer.
2. Identify the data sources available to describe the problem space.
3. Acquire and cleanse the data, or a sample of the data.
4. Engineer a feature set from the data or data sample so that every feature has meaning and relevance.
5. Apply the same cleansing and feature engineering logic to the full data set.
6. Spot check multiple ML algorithms against a sample of the feature set to assess which algorithm is likely to give the best result.
7. Select one or more algorithms and perform hyperparameter optimization on a sample of the feature set to determine the best configuration parameters.
8. Train a model on the full training feature set using the best performing algorithm and hyperparameters.
9. Test the model on a control or test feature set to produce a performance baseline.
10. Deploy the model for consumption by the business (Lambda, mobile device, container, etc.).
11. Consider how future observations will be engineered in preparation for inference.
12. Monitor the model for concept drift.
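
The spot-check step in the list above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the approach taken in the notebooks: the three candidate "algorithms" are deliberately trivial stand-ins, the data is synthetic, and names such as `spot_check` and `cross_val_rmse` are hypothetical. It shows the general idea of scoring several candidates with k-fold cross-validation on a small sample before committing to a full training run.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, float]            # (feature, target); a single feature keeps the sketch short
Model = Callable[[float], float]     # maps a feature value to a prediction
Trainer = Callable[[Sequence[Row]], Model]


def train_mean(rows: Sequence[Row]) -> Model:
    """Baseline: always predict the mean target seen in training."""
    mean_y = statistics.fmean(y for _, y in rows)
    return lambda x: mean_y


def train_linear(rows: Sequence[Row]) -> Model:
    """Ordinary least squares for one feature."""
    mean_x = statistics.fmean(x for x, _ in rows)
    mean_y = statistics.fmean(y for _, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def train_nearest(rows: Sequence[Row]) -> Model:
    """1-nearest-neighbour: predict the target of the closest training point."""
    points = list(rows)
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def cross_val_rmse(trainer: Trainer, rows: Sequence[Row], k: int = 5) -> float:
    """Average RMSE over k folds; each fold is held out exactly once."""
    folds = [rows[i::k] for i in range(k)]
    scores: List[float] = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = trainer(train)
        mse = statistics.fmean((model(x) - y) ** 2 for x, y in held_out)
        scores.append(mse ** 0.5)
    return statistics.fmean(scores)


def spot_check(candidates: Dict[str, Trainer], rows: Sequence[Row], k: int = 5) -> Dict[str, float]:
    """Score every candidate on the same sample with the same folds."""
    return {name: cross_val_rmse(trainer, rows, k) for name, trainer in candidates.items()}


# Synthetic "full" feature set (assumption): y = 3x + 2 plus Gaussian noise.
rng = random.Random(0)
full_feature_set: List[Row] = []
for _ in range(2000):
    x = rng.uniform(0.0, 10.0)
    full_feature_set.append((x, 3.0 * x + 2.0 + rng.gauss(0.0, 1.0)))

# Spot check on a 10% sample rather than the full feature set.
sample = rng.sample(full_feature_set, 200)

candidates: Dict[str, Trainer] = {
    "mean_baseline": train_mean,
    "linear_regression": train_linear,
    "nearest_neighbour": train_nearest,
}
scores = spot_check(candidates, sample)
best = min(scores, key=scores.get)

for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE {rmse:.3f}")
print("Best candidate on the sample:", best)
```

Only the winning candidate would then go forward to hyperparameter optimization and a full training run, which keeps the expensive steps focused on the most promising algorithm.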

In this collection of labs, we start by defining a business problem and then work through each stage up to model deployment.

1. [Feature engineering](./01 Feature engineering.ipynb): This notebook walks through acquiring the data, cleaning it, and engineering a base feature set that can then be prepared for ML training.
2. [ML algorithm spot check](./02 Algorithm spot check.ipynb): This notebook walks through transforming the cleansed data and using it to assess the performance of many ML algorithms.
3. [Hyperparameter optimization](./03 Hyperparameter tuning.ipynb): This notebook walks through performing HPO on an algorithm using a subset of the feature set, before performing a full-scale training job.
4. [Training your model](./04 Training.ipynb): This notebook walks through performing a full-scale training job for your model.
5. [Hosting and usage](./05 Host and infer.ipynb): This notebook walks through how to host a trained model and use it to make predictions.
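
As a rough sketch of what using a hosted model can look like, the snippet below sends rows to an endpoint through the SageMaker runtime `invoke_endpoint` call. It is not taken from the notebooks: the helper names `to_csv_payload` and `predict`, the endpoint name in the comment, and the assumption that the model accepts `text/csv` input and returns UTF-8 text are all illustrative. The runtime client is passed in as an argument so the helpers can be exercised without AWS credentials.

```python
import csv
import io
from typing import Any, Iterable, Sequence


def to_csv_payload(rows: Iterable[Sequence[Any]]) -> bytes:
    """Serialize feature rows to CSV bytes, one observation per line, no header."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().rstrip("\n").encode("utf-8")


def predict(runtime_client: Any, endpoint_name: str, rows: Iterable[Sequence[Any]]) -> str:
    """Send rows to a hosted endpoint and return the raw response body as text.

    Assumes the deployed model accepts text/csv; other containers may expect
    JSON or another content type.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_payload(rows),
    )
    # The response body is a stream, so it has to be read before decoding.
    return response["Body"].read().decode("utf-8")


# Usage with a real endpoint (requires boto3 and AWS credentials; the
# endpoint name below is hypothetical):
#
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   print(predict(runtime, "my-lab-endpoint", [[5.1, 3.5, 1.4, 0.2]]))
```

New observations must go through the same cleansing and feature engineering as the training data before they are serialized, otherwise the endpoint receives features the model was never trained on.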

Resources