A high-throughput and memory-efficient inference and serving engine for LLMs
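For a sense of how such an engine is typically driven, here is a minimal offline-inference sketch using vLLM's Python API; the model name and sampling settings are placeholder choices, not recommendations.

```python
# Minimal offline-inference sketch with vLLM (model name is a placeholder).
from vllm import LLM, SamplingParams

prompts = ["Explain KV-cache paging in one sentence."]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# Loads the model and runs batched generation on the available GPU(s).
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```

The same engine can also be exposed as an OpenAI-compatible HTTP server (via `vllm serve <model>` in recent releases) for online serving.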
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
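As a rough illustration, a BentoML service can be defined with the decorator-based API introduced in the 1.2+ releases; this is only a sketch, and the class name, method name, and toy logic below are placeholders rather than anything from the repo.

```python
# Minimal BentoML service sketch (assumes the 1.2+ decorator-based API;
# class, method, and logic are placeholders for a real model).
import bentoml


@bentoml.service()
class TextUppercaser:
    @bentoml.api
    def predict(self, text: str) -> str:
        # Stand-in for real model inference; only demonstrates the API shape.
        return text.upper()
```

A service like this would typically be started locally with `bentoml serve`, which exposes `predict` as an HTTP endpoint.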
Standardized Serverless ML Inference Platform on Kubernetes
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI job on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
The simplest way to serve AI/ML models in production
Community-maintained hardware plugin for vLLM on Ascend
A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
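Dynamic batching is the key idea behind frameworks like this: requests arriving within a short window are grouped into one batch so the model runs fewer, larger forward passes. The sketch below illustrates the technique generically in plain Python; it is not this framework's actual API, and the queue, timeout, and batch-size values are arbitrary.

```python
# Generic dynamic-batching sketch (illustrative only, not any framework's real API).
import queue
import threading
import time

MAX_BATCH_SIZE = 8       # upper bound on requests per forward pass (arbitrary)
MAX_WAIT_SECONDS = 0.01  # how long to wait for more requests to arrive (arbitrary)

request_queue: "queue.Queue[tuple[str, queue.Queue]]" = queue.Queue()


def model_forward(batch):
    # Stand-in for a real batched model call.
    return [item.upper() for item in batch]


def batching_loop():
    while True:
        # Block for the first request, then greedily collect more until the
        # batch is full or the wait budget is spent.
        first = request_queue.get()
        batch = [first]
        deadline = time.monotonic() + MAX_WAIT_SECONDS
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        inputs = [inp for inp, _ in batch]
        for (_, reply), result in zip(batch, model_forward(inputs)):
            reply.put(result)


threading.Thread(target=batching_loop, daemon=True).start()


def handle_request(text: str) -> str:
    # Called once per incoming request; blocks until its batched result is ready.
    reply: queue.Queue = queue.Queue(maxsize=1)
    request_queue.put((text, reply))
    return reply.get()


if __name__ == "__main__":
    print(handle_request("hello"))
```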
Python + Inference: a model deployment library in Python, and the simplest model inference server ever.
Serverless LLM Serving for Everyone.
Take control of your context. Orchestrate LLMs through APIs or private deployments with context automation using your data. Run anywhere - local, cloud, or bare metal.
A FastAPI skeleton app for serving machine learning models, production-ready.
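For a rough idea of what such a skeleton looks like, here is a minimal FastAPI prediction endpoint; the route, request/response schema, and toy "model" are placeholder assumptions, not this repo's actual layout.

```python
# Minimal FastAPI model-serving sketch (endpoint, schema, and "model" are placeholders).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="model-serving-skeleton")


class PredictRequest(BaseModel):
    features: list[float]


class PredictResponse(BaseModel):
    prediction: float


@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest) -> PredictResponse:
    # Stand-in for a real model: return the mean of the input features.
    value = sum(request.features) / max(len(request.features), 1)
    return PredictResponse(prediction=value)
```

Served locally with an ASGI server such as `uvicorn main:app`, the endpoint accepts JSON and returns a typed, validated response.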
Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load balancing, orchestration, pre-provisioning, dynamic batching, GPU inference, and micro-services working together via the Lightning Apps framework.
BentoDiffusion: A collection of diffusion models served with BentoML
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance in only 2k lines of code (2% of vLLM).
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.