This project provides an example for deploying machine learning models using FastAPI, containerized with Docker and ready for Kubernetes deployment. The setup includes a load testing script to verify the API's performance under stress (parallel inference requests).
The API endpoint is built using FastAPI and serves your machine learning model. Key features:
- Loads a pre-trained model for inference
- Handles data preprocessing and model inference
- Exposes a POST endpoint that accepts input data
- Returns model predictions
See below for how to adapt this setup to your own models.
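For illustration, a minimal sketch of such an endpoint is shown below. It assumes a PyTorch model; the /NVG route matches the URL used later in this README, while the weights filename and input schema are placeholders, so the actual api-endpoint.py may differ.

```python
# Sketch of api-endpoint.py: load a model once per worker process and serve
# predictions from a POST endpoint. The model framework, weights file and
# input schema are placeholders to be replaced with your own.
import torch                     # assumption: a PyTorch model
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InputData(BaseModel):
    features: list[float]        # example input format; adjust to your data

# Load the pre-trained model once, when the worker process starts.
model = torch.jit.load("model_weights.pt")   # placeholder weights file
model.eval()

@app.post("/NVG")
async def predict(data: InputData):
    # Preprocess the request payload into the shape the model expects.
    x = torch.tensor(data.features).unsqueeze(0)
    with torch.no_grad():
        prediction = model(x)
    # Return predictions as plain Python types so FastAPI can serialize them.
    return {"prediction": prediction.squeeze(0).tolist()}
```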
The Dockerfile provides containerization for the API endpoint:
- Python 3.10 base image
- Installs dependencies from requirements.txt
- Exposes port 8000
- Runs the FastAPI app with 4 Uvicorn worker processes to serve the model
To build and run the Docker container locally, navigate to the model/docker directory:
# Build the Docker image
docker build -t api-tester-nvg .
# Run the container
docker run -p 8000:8000 api-tester-nvg
The API will be available at http://localhost:8000/NVG. The endpoint route can be changed in the api-endpoint.py file.
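Once the container is running, you can send a quick test request from Python. The payload below is illustrative; use whatever input format your endpoint expects.

```python
import requests

# One test request against the locally running container.
# The payload format is an assumption; match it to your endpoint's input schema.
response = requests.post(
    "http://localhost:8000/NVG",
    json={"features": [0.1, 0.2, 0.3]},
    timeout=10,
)
print(response.status_code, response.json())
```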
To adapt this setup for your own model:
- Model Preparation
  - Place your trained model weights in the endpoint directory
  - Modify the endpoint name and path in api-endpoint.py
  - Update the model loading code in api-endpoint.py (a sketch follows this list):
    - Add your own model class
    - Load the model as you normally would
  - In the async endpoint function, adjust any data preprocessing after the request is received
  - Make predictions using the received data
  - Return the predictions
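As a rough illustration of these steps, the snippet below shows a custom model class and loading code that would replace the model loading in the endpoint sketch above. The class name, weights file, layer sizes, and preprocessing are all placeholders for your own code.

```python
# Sketch of the Model Preparation steps: define your own model class, load its
# weights, and wrap preprocessing + inference in a helper used by the endpoint.
import torch
from torch import nn

class MyModel(nn.Module):
    # Your own model class, defined exactly as it was during training.
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(3, 1)   # placeholder architecture

    def forward(self, x):
        return self.layer(x)

# Load the model as you normally would, once per worker process.
model = MyModel()
model.load_state_dict(torch.load("my_weights.pt", map_location="cpu"))
model.eval()

def run_inference(features: list[float]) -> list[float]:
    # Adjust any preprocessing here, after the request payload has been parsed,
    # then run the model and return predictions as plain Python types.
    x = torch.tensor(features).unsqueeze(0)
    with torch.no_grad():
        return model(x).squeeze(0).tolist()
```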
- Deployment
  - Update the requirements.txt file with the correct dependencies for your model
- Testing
  - Adjust loadtesting.py to load your data and send a sample as a POST request
  - Update the URL to point to your deployed endpoint
The load testing script verifies the API's performance and reliability:
- Tests endpoint with parallel requests using ThreadPoolExecutor
- Measures response times and validates predictions
- Configurable number of parallel requests
Before running it, raise the open files limit to a higher value, for example 500000 (if using Linux):
ulimit -n 500000
To use the load tester:
- Update the url, ip, and port variables to point to your deployed endpoint
- Configure parallel_requests based on your testing needs
- Prepare your test data in the required format
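The core of such a script might look like the sketch below. The variable names mirror the ones mentioned above, while the payload, timing, and result handling are assumptions; the actual loadtesting.py may be structured differently.

```python
# Sketch of a load tester: fire parallel POST requests at the endpoint and
# measure response times. The payload is a placeholder for your test data.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ip = "127.0.0.1"          # IP of your deployed endpoint (or a cluster node)
port = 8000               # port the API is exposed on
url = f"http://{ip}:{port}/NVG"
parallel_requests = 100   # number of requests sent in parallel

payload = {"features": [0.1, 0.2, 0.3]}   # replace with your test data

def send_request(_):
    start = time.perf_counter()
    response = requests.post(url, json=payload, timeout=30)
    elapsed = time.perf_counter() - start
    return response.status_code, elapsed

with ThreadPoolExecutor(max_workers=parallel_requests) as executor:
    results = list(executor.map(send_request, range(parallel_requests)))

times = [elapsed for _, elapsed in results]
ok = sum(1 for status, _ in results if status == 200)
print(f"{ok}/{parallel_requests} successful, "
      f"avg {sum(times) / len(times):.3f}s, max {max(times):.3f}s")
```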
The project includes a Kubernetes configuration file (MaaS_on_kubernetes.yaml) that enables easy deployment and scaling of your model API. The configuration provides:
- Deployment Setup: Controls how your model API pods are deployed and scaled
  - Configurable number of replicas
  - Supports deployment on ARM devices (e.g., Raspberry Pi) via nodeSelector
- Service Configuration: Manages how your API is exposed
  - NodePort service type for external access
  - Automatic, even load balancing across pods
  - Configurable ports for service access on the nodes in the cluster
To deploy your model on Kubernetes:
- Configure your deployment:
  - Modify the deployment name and labels (ensure the labels match between the deployment selector and the pod template)
  - Set the appropriate number of replicas
  - Push your Docker image to a registry
  - Update the image name to match your Docker image (local or in a Docker registry)
  - Adjust the nodeSelector if you want to deploy the model on edge/ARM nodes
- Apply the configuration:
  kubectl apply -f MaaS_on_kubernetes.yaml
  The service exposes your API through a NodePort for external access.
- Configure load testing:
  - Update the cluster IP and port in your load testing script
This setup enables scalable deployment of your model API endpoint, with built-in load balancing and management through Kubernetes. The API endpoint is reachable on every node in the cluster through the configured port, and Kubernetes balances requests evenly across all pods (the containers deployed on Kubernetes).
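For example, once the deployment is running, a request can be sent to any node in the cluster and the NodePort service will forward it to one of the pods. The node IP and port below are placeholders; use the values from your cluster and MaaS_on_kubernetes.yaml.

```python
import requests

# Any node's IP works: the NodePort service routes the request to one of the pods.
node_ip = "192.168.1.10"   # placeholder: IP of any node in your cluster
node_port = 30080          # placeholder: the nodePort set in MaaS_on_kubernetes.yaml
response = requests.post(
    f"http://{node_ip}:{node_port}/NVG",
    json={"features": [0.1, 0.2, 0.3]},   # illustrative payload
    timeout=10,
)
print(response.status_code, response.json())
```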
GNU General Public License v3.0; see the LICENSE file for details.