We will walk through the different steps involved in an MLOps pipeline.
Machine Learning Model Operationalization Management (MLOps), an extension of DevOps, establishes effective practices and processes around designing, building, and deploying ML models into production.
In the paper titled Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology, the authors introduce a process model for the development of ML applications: the CRoss-Industry Standard Process model for the development of Machine Learning applications with Quality assurance methodology (CRISP-ML(Q)). CRISP-ML(Q) offers the ML community a standard process to streamline ML and data science projects and make their results reproducible. It is designed for the development of ML applications in which an ML model is deployed and maintained as part of a product or service.
The CRISP-ML(Q) process model consists of six phases:
- Business & Data Understanding
- Data Preparation
- Modelling
- Evaluation
- Deployment
- Monitoring and Maintenance
For each phase, the flow chart below explains the quality assurance approach in CRISP-ML(Q). First, clear objectives for the current phase are defined; next, steps are taken to initiate the task; then the risks that might negatively impact the efficiency and success of the ML application are identified (e.g., bias, overfitting, lack of reproducibility); finally, quality assurance methods are applied to mitigate those risks where they need to be diminished (e.g., cross-validation, documenting the process and results).
ML model deployment includes the following tasks:
- Define inference hardware and optimize the ML model for the target hardware
- Evaluate the model under production conditions
- Assure user acceptance and usability
- Minimize the risk of unforeseen errors
- Choose a deployment strategy
A wise person on the Internet once said: deploying is easy if you ignore all the hard parts. If you want to deploy a model for your friends to play with, all you have to do is create an endpoint for your prediction function, push your model to AWS, and create an app with Streamlit or Dash. The hard parts include making your model available to millions of users with a latency of milliseconds and 99% uptime, setting up the infrastructure so that the right person can be immediately notified when something goes wrong, figuring out what went wrong, and seamlessly deploying the updates to fix what's wrong. Source: Chip Huyen
Source: https://ml-ops.org/content/three-levels-of-ml-software
Model serving is a way to integrate the ML model into a software system. There are two aspects to deploying an ML system in a production environment: first, deploying a pipeline for automated retraining, and second, providing an endpoint that ingests input data and returns predictions from the ML model. A minimal sketch of such a prediction endpoint is shown below.
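The sketch below uses FastAPI, in the spirit of the Model-as-Service pattern described next. The model file name and the feature layout are placeholders, not taken from any specific exercise in this project.

```python
# A minimal sketch of a prediction endpoint (Model-as-Service style).
# "model.pkl" and the flat feature vector are placeholders for illustration.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("model.pkl", "rb") as f:  # hypothetical pre-trained sklearn model
    model = pickle.load(f)

class PredictionRequest(BaseModel):
    features: list[float]  # one flat feature vector per request

@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    # sklearn estimators expect a 2D array: one row per sample
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}
```

Running `uvicorn main:app` would expose the endpoint at /predict; a POST request with a JSON body such as `{"features": [5.1, 3.5, 1.4, 0.2]}` returns the model's prediction.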
There are five popular model serving patterns for putting an ML model into production:
- Model-as-Service
- Model-as-Dependency
- Precompute
- Model-on-Demand
- Hybrid-Serving
There are two popular deployment strategies:
- Deploying ML models as Docker Containers
- Deploying ML Models as Serverless Functions
References:
- CRISP-ML(Q) introduction blog by ml-ops.org
- Chapter 7: Model Deployment by Chip Huyen
- Three Levels of ML Software
- Bringing ML to Production (Slides)
- Serving and Case Studies
In this project, our focus will be on the different approaches we can use to serve an ML model. MLOps.toys provides a comprehensive survey of the different frameworks that exist for model serving. The focus of this project is to explore these 10+ frameworks, and more, along with cloud services for serving a deployed ML model and testing its endpoint.
We will start with a simple exercise on how to make use of GitHub Actions for CI/CD. As we go along, we will integrate various technologies such as GitHub Actions, Docker, Pytest, and linting while testing different ML model serving frameworks and visiting best practices.
- Makefile: In this exercise, we will automate the tasks of installing packages, linting, formatting, and testing using a Makefile.
  Technologies: Pytest, Make
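For illustration, here is the kind of small pytest module a `make test` target might run; the normalize() helper is hypothetical and defined inline so the file is self-contained.

```python
# tests/test_utils.py — a minimal pytest example of the kind `make test` might run.
# The normalize() helper is hypothetical, defined inline to keep the test self-contained.

def normalize(values):
    """Scale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize_bounds():
    result = normalize([2.0, 4.0, 6.0])
    assert min(result) == 0.0
    assert max(result) == 1.0

def test_normalize_preserves_order():
    result = normalize([3.0, 1.0, 2.0])
    # Input order was 3, 1, 2, so the middle element must remain the smallest
    assert result[1] < result[2] < result[0]
```

The Makefile's test target would then simply invoke `python -m pytest`.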
- GitHub Actions Makefile: In this exercise, we will automate the tasks of installing packages, linting, formatting, and testing using GitHub Actions.
  Technologies: Pytest, Make, GitHub Actions
- GitHub Actions Docker: In this exercise, we will implement the following:
  - Containerize a GitHub project by integrating a Dockerfile and automatically registering new containers to a container registry.
  - Create a simple load test for the application using a load-testing framework such as Locust or Loader.io, and automatically run this test when changes are pushed to a staging branch (see the Locust sketch below).
  Technologies: Docker, GitHub Actions, Locust
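A minimal Locust load test might look like the sketch below; the /predict route and payload shape are assumptions and would need to match the actual application.

```python
# locustfile.py — a minimal load-test sketch with Locust.
# The /predict route and payload shape are assumptions; adjust to the real app.
from locust import HttpUser, between, task

class PredictionUser(HttpUser):
    wait_time = between(1, 3)  # each simulated user pauses 1-3 seconds between tasks

    @task
    def predict(self):
        self.client.post("/predict", json={"features": [5.1, 3.5, 1.4, 0.2]})
```

Running `locust -f locustfile.py --host http://localhost:8000` starts the load test against a locally running service; the same command can run headless inside a CI job.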
- FastAPI Azure: In this exercise, we will build a FastAPI ML application and deploy it with continuous delivery on Azure using Azure App Service and Azure DevOps Pipelines.
  Technologies: Docker, FastAPI, Continuous Delivery using Azure App Service, Azure DevOps Pipelines
- FastAPI GCP: In this exercise, we will build a FastAPI ML application and deploy it with continuous delivery on GCP using Cloud Run and Cloud Build.
  Technologies: Docker, FastAPI, Continuous Delivery using GCP Cloud Run and Cloud Build
- FastAPI AWS: In this exercise, we will build a FastAPI ML application and deploy it with continuous delivery on AWS using Elastic Beanstalk and CodePipeline.
  Technologies: Docker, FastAPI, sklearn, Continuous Delivery using Elastic Beanstalk and CodePipeline
- AWS Terraform Deploy: To be implemented
- FastAPI GKE: In this project, we will deploy a sentiment analysis model with FastAPI on GCP using GKE. The project covers:
  - Containerizing the different components of the project
  - Writing tests and testing individual modules using pytest
  - Using Trunk for automatic code checking, formatting, and linting
  - Deploying the application on GKE
  Technologies: Docker, FastAPI, HuggingFace Transformer model, Pytest, Trunk, GKE
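A minimal sketch of such a sentiment service is shown below; the default Hugging Face pipeline model and the /predict route are illustrative choices, not necessarily the ones used in the actual project.

```python
# main.py — a minimal sketch of a sentiment-analysis FastAPI service.
# The default Hugging Face pipeline model and the /predict route are illustrative.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = pipeline("sentiment-analysis")  # downloads a default model on first run

class TextRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(request: TextRequest) -> dict:
    # Returns e.g. {"label": "POSITIVE", "score": 0.99}
    return classifier(request.text)[0]
```

This app is then containerized with Docker and deployed to a GKE cluster behind a Kubernetes Service.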
- FastAPI Kubernetes Monitoring: In this exercise, we will introduce Kubernetes, deploy a FastAPI application on it, and monitor the application using Prometheus and Grafana, following best practices for writing tests and triggering a CI workflow using GitHub Actions.
  Technologies: Docker, Docker Compose, Pytest, FastAPI, HuggingFace Transformer model, Continuous Integration using GitHub Actions, Kubernetes, Prometheus, Grafana
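One common way to expose application metrics to Prometheus from a FastAPI app is the prometheus_client library, as in the sketch below; the metric names and the placeholder prediction are illustrative, and whether this exercise uses prometheus_client directly is an assumption.

```python
# A sketch of exposing Prometheus metrics from a FastAPI app with prometheus_client.
# Metric names are illustrative; Prometheus scrapes the /metrics endpoint mounted below.
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()

REQUESTS = Counter("predict_requests_total", "Total prediction requests")
LATENCY = Histogram("predict_latency_seconds", "Prediction latency in seconds")

@app.post("/predict")
def predict(payload: dict) -> dict:
    REQUESTS.inc()
    with LATENCY.time():
        # placeholder for the actual model call
        return {"label": "POSITIVE", "score": 0.99}

# Serve metrics in Prometheus text format at /metrics
app.mount("/metrics", make_asgi_app())
```

Grafana then visualizes these series by querying Prometheus as a data source.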
- BentoML Deploy: In this exercise, we will use the BentoML library to deploy the sentiment classification model from Hugging Face 🤗 to the following services.
  Technologies: Docker, Pytest, FastAPI, HuggingFace Transformer model, AWS Lambda, Azure Functions, Kubernetes, BentoML
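A minimal BentoML service definition might look like the sketch below (the BentoML 1.x runner-based API is assumed); the model tag "sentiment_model:latest" is hypothetical and assumes the Transformers pipeline was previously saved with bentoml.transformers.save_model.

```python
# service.py — a minimal BentoML service sketch (1.x runner-based API assumed).
# The "sentiment_model:latest" tag is hypothetical; it assumes the Transformers
# pipeline was saved earlier with bentoml.transformers.save_model("sentiment_model", ...).
import bentoml
from bentoml.io import JSON, Text

runner = bentoml.transformers.get("sentiment_model:latest").to_runner()

svc = bentoml.Service("sentiment_classifier", runners=[runner])

@svc.api(input=Text(), output=JSON())
async def classify(text: str) -> dict:
    # The runner wraps the saved pipeline; it returns a list of {label, score} dicts
    result = await runner.async_run(text)
    return result[0]
```

`bentoml serve service.py:svc` runs the service locally, and `bentoml build` followed by `bentoml containerize` produces a Docker image that can then be deployed to the target services.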
- Cortex Deploy: In this exercise, a Transformers sentiment classifier FastAPI application is deployed using two different Cortex APIs.
  Technologies: Docker, Cortex, FastAPI, HuggingFace Transformer model, Continuous Integration using GitHub Actions, Trunk.io linter
- Serverless Deploy: In this exercise, a Hugging Face Transformers sentiment classifier FastAPI application is deployed using the Serverless Framework.
  Technologies: Docker, Serverless Framework, FastAPI, HuggingFace Transformer model, Continuous Integration using GitHub Actions, Trunk.io linter
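One common way to run a FastAPI app behind the Serverless Framework on AWS Lambda is to wrap it with Mangum, as sketched below; whether this exercise actually uses Mangum is an assumption.

```python
# app.py — wrapping a FastAPI app so it can run as an AWS Lambda handler behind
# the Serverless Framework. Using Mangum here is an assumption; it is one common
# way to run ASGI applications on Lambda.
from fastapi import FastAPI
from mangum import Mangum

app = FastAPI()

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}

# serverless.yml would point the function's handler at app.handler
handler = Mangum(app)
```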
- Bodywork Train and Deploy: This exercise contains a Bodywork project that demonstrates how to run an ML pipeline on Kubernetes with Bodywork. The example ML pipeline has two stages:
  - Run a batch job to train a model.
  - Deploy the trained model as a service with a REST API (a minimal sketch of such a service follows below).
  Technologies: Bodywork, Sklearn, Flask, Kubernetes, Cronjob
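As an illustration of the second stage, a minimal Flask scoring service might look like the sketch below; the "model.joblib" artefact name and the /predict route are placeholders rather than the project's actual file names.

```python
# serve_model.py — a minimal sketch of the deploy stage: load the model produced
# by the training stage and expose it over a Flask REST API. The "model.joblib"
# file name and the /predict route are placeholders.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # artefact written by the batch training job

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    prediction = model.predict([features])  # sklearn expects a 2D array
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```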
- KServe Deploy: In this exercise, we will deploy the sentiment analysis Hugging Face Transformer model. Since MLServer does not provide out-of-the-box support for PyTorch or Transformer models, we will write a custom inference runtime to deploy this model and test the endpoints.
  Technologies: Docker, KServe, HuggingFace Transformer model, Pytest, Kubernetes, Istio, Knative, Kind, TorchServe
- TorchServe: Deploying the Hugging Face Transformer model using TorchServe.
- MLServer Deploy: In this exercise, we will deploy the sentiment analysis Hugging Face Transformer model. Since MLServer does not provide out-of-the-box support for PyTorch or Transformer models, we will write a custom inference runtime to deploy this model and test the endpoints.
  Technologies: Docker, MLServer, HuggingFace Transformer model
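A rough sketch of such a custom runtime is shown below. It follows MLServer's custom-runtime pattern (subclassing MLModel and overriding load and predict), but the class name, codec usage, and output naming are illustrative and should be checked against the MLServer documentation.

```python
# runtime.py — a rough sketch of a custom MLServer inference runtime for the
# sentiment model. The class name and the "labels" output name are illustrative.
from mlserver import MLModel
from mlserver.codecs import StringCodec
from mlserver.types import InferenceRequest, InferenceResponse
from transformers import pipeline

class SentimentRuntime(MLModel):
    async def load(self) -> bool:
        # Load the Hugging Face pipeline once, when the server starts
        self._classifier = pipeline("sentiment-analysis")
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode the request's first input tensor into a list of strings
        texts = StringCodec.decode_input(payload.inputs[0])
        labels = [pred["label"] for pred in self._classifier(texts)]
        return InferenceResponse(
            model_name=self.name,
            outputs=[StringCodec.encode_output(name="labels", payload=labels)],
        )
```

MLServer is then pointed at this class, typically via the implementation field in model-settings.json.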
- Ray Serve Deploy: In this exercise, we will deploy the sentiment analysis Hugging Face Transformer model using Ray Serve so it can be scaled up and queried over HTTP, using two approaches:
  - Ray Serve default approach
  - Ray Serve with FastAPI
  Technologies: Docker, Ray Serve, FastAPI, HuggingFace Transformer model
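The FastAPI-integrated approach might look roughly like the sketch below; the deployment name, route, and query-parameter interface are illustrative.

```python
# serve_app.py — a minimal sketch of the "Ray Serve with FastAPI" approach.
# The deployment name, route, and query-parameter interface are illustrative.
from fastapi import FastAPI
from ray import serve
from transformers import pipeline

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class SentimentDeployment:
    def __init__(self):
        # Each replica loads its own copy of the pipeline
        self._classifier = pipeline("sentiment-analysis")

    @app.get("/predict")
    def predict(self, text: str) -> dict:
        return self._classifier(text)[0]

# Bound application object, started with e.g. `serve run serve_app:sentiment_app`
sentiment_app = SentimentDeployment.bind()
```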
- Seldon Core Deploy: In this exercise, we will deploy a simple sklearn Iris model using Seldon Core. We will deploy using two approaches and test the endpoints:
  - Seldon Core default approach
  - V2 Inference Protocol
  Technologies: Docker, Seldon Core, Sklearn model, Kubernetes, Istio, Helm, Kind
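For testing, the two protocols can be queried with plain HTTP requests, roughly as sketched below; the ingress host, namespace, deployment name, and model name are placeholders, and the exact paths should be checked against the Seldon Core docs.

```python
# test_endpoints.py — a rough sketch of querying the two Seldon Core endpoints.
# Host, namespace, deployment and model names are placeholders.
import requests

HOST = "http://localhost:8080"   # e.g. a port-forwarded Istio ingress gateway
NAMESPACE = "seldon"
DEPLOYMENT = "iris-model"
MODEL = "classifier"

# Default Seldon protocol: ndarray payload to /api/v1.0/predictions
resp = requests.post(
    f"{HOST}/seldon/{NAMESPACE}/{DEPLOYMENT}/api/v1.0/predictions",
    json={"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}},
)
print(resp.json())

# V2 Inference Protocol: named tensors to /v2/models/<model>/infer
resp = requests.post(
    f"{HOST}/seldon/{NAMESPACE}/{DEPLOYMENT}/v2/models/{MODEL}/infer",
    json={
        "inputs": [
            {
                "name": "predict",
                "shape": [1, 4],
                "datatype": "FP32",
                "data": [5.1, 3.5, 1.4, 0.2],
            }
        ]
    },
)
print(resp.json())
```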
- Nvidia Triton Deploy: Coming Soon