NeuroServe is an AI inference server built on FastAPI, designed to run seamlessly on GPU (CUDA/ROCm), CPU, and macOS MPS. It provides ready-to-use REST APIs, a modular plugin system, runtime utilities, and a consistent unified response format, making it a solid foundation for AI-powered services.
Virtualenv quick guide: see docs/README_venv.md.
Detailed API reference and usage examples are available here: ➡️ API Documentation
- REST APIs out of the box with Swagger UI (`/docs`) & ReDoc (`/redoc`).
- PyTorch integration with automatic device selection (`cuda`, `cpu`, `mps`, `rocm`); a sketch of how such selection can work follows this list.
- Plugin system to extend functionality with custom AI models or services.
- Runtime tools for GPU info, warm-up routines, and environment inspection.
- Built-in utilities like a toy model and model size calculator.
- Unified JSON responses for predictable API behavior.
- Cross-platform CI/CD (Ubuntu, Windows, macOS, self-hosted GPU).
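The automatic device selection mentioned above can be sketched as follows. `pick_device` is an illustrative helper, not the project's actual API; ROCm builds of PyTorch report availability through `torch.cuda`:

```python
import torch


def pick_device(preferred: str | None = None) -> torch.device:
    """Return the best available torch device (illustrative helper, not the project's API)."""
    if preferred is not None:
        return torch.device(preferred)
    if torch.cuda.is_available():
        # ROCm builds of PyTorch also report availability through torch.cuda.
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        # Apple Silicon (Metal Performance Shaders)
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
model = torch.nn.Linear(4, 2).to(device)  # toy model moved to the selected device
```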
```
repo-fastapi/
├─ app/                        # application package
│  ├─ core/                    # settings & configuration
│  │  └─ config.py             # app settings (Pydantic v2)
│  ├─ routes/                  # HTTP API routes
│  ├─ plugins/                 # extensions / integrations
│  ├─ workflows/               # workflow definitions & orchestrators
│  └─ templates/               # Jinja templates (if used)
├─ docs/                       # documentation & generated diagrams
│  ├─ ARCHITECTURE.md          # main architecture report
│  ├─ architecture.mmd         # Mermaid source (no fences)
│  ├─ architecture.html        # browser-friendly diagram
│  ├─ architecture.png         # exported PNG (if mmdc installed)
│  ├─ runtime.mmd              # runtime/infra diagram
│  ├─ imports.mmd              # Python import graph (if generated)
│  ├─ endpoints.md             # discovered API endpoints (if generated)
│  └─ README_venv.md           # virtualenv quick guide
├─ tools/                      # project tooling & scripts
│  └─ build_workflows_index.py # builds docs/workflows-overview.md
├─ tests/                      # test suite
│  └─ test_run.py              # smoke test for app startup
├─ gen_arch.py                 # architecture generator script
├─ requirements.txt            # runtime dependencies
├─ requirements-dev.txt        # dev dependencies (ruff, pre-commit, pytest, ...)
├─ .pre-commit-config.yaml     # pre-commit hooks configuration
├─ README.md                   # project overview & usage
└─ LICENSE                     # project license
```
For a deeper look into the internal design, modules, and flow of the system, see: ➡️ Architecture Guide
```bash
git clone https://github.com/USERNAME/gpu-server.git
cd gpu-server

python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows
.venv\Scripts\activate

pip install -r requirements.txt
python -m scripts.install_torch --gpu   # or --cpu / --rocm

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
Available endpoints:
- Home → http://localhost:8000/
- Health → http://localhost:8000/health
- Swagger UI → http://localhost:8000/docs
- ReDoc → http://localhost:8000/redoc
- Env Summary → http://localhost:8000/env
- Plugins → http://localhost:8000/plugins
Quick test:

```bash
curl http://localhost:8000/health
# {"status": "ok"}
```
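For context, a health check like the one above needs nothing more than a plain FastAPI route. This is a minimal sketch, not the project's actual `app/main.py`:

```python
from fastapi import FastAPI

app = FastAPI(title="NeuroServe")


@app.get("/health")
def health() -> dict:
    # Matches the quick-test response above; the real wiring lives in app/main.py.
    return {"status": "ok"}
```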
Each plugin lives in `app/plugins/<name>/` and typically includes:

```
manifest.json
plugin.py     # Defines Plugin class inheriting AIPlugin
README.md     # Documentation
```

API Endpoints:
- `GET /plugins` → list all plugins with metadata.
- `POST /plugins/{name}/{task}` → execute a task inside a plugin.
Example:

```python
from app.plugins.base import AIPlugin


class Plugin(AIPlugin):
    name = "my_plugin"
    tasks = ["infer"]

    def load(self):
        # Load models/resources once
        ...

    def infer(self, payload: dict) -> dict:
        return {"message": "ok", "payload": payload}
```
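With the server running, such a plugin could then be exercised through the generic plugin endpoint. A hypothetical call; the plugin name, the payload shape, and the assumption that the request body is forwarded to the task as `payload` are all illustrative:

```python
import requests

# Hypothetical call: assumes the server from the Quick Start is running locally
# and that a plugin named "my_plugin" exposing an "infer" task is installed.
resp = requests.post(
    "http://localhost:8000/plugins/my_plugin/infer",
    json={"text": "hello"},
    timeout=10,
)
print(resp.json())  # e.g. {"message": "ok", "payload": {"text": "hello"}}
```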
A lightweight orchestration layer to chain plugins into reproducible pipelines (each step → plugin + task + payload). All endpoints are exposed under `/workflow`.

- Endpoints: `GET /workflow/ping`, `GET /workflow/presets`, `POST /workflow/run` (a sketch of a run request follows this list)
- System Guide (EN): app/workflows/README.md
- Workflows Index: docs/workflows-overview.md
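As a rough illustration of `POST /workflow/run`, the request below chains two steps. The field names and payloads are assumptions, not the documented schema; see the System Guide for the real one:

```python
import requests

# Illustrative request body: each step names a plugin, a task, and a payload.
workflow = {
    "steps": [
        {"plugin": "my_plugin", "task": "infer", "payload": {"text": "hello"}},
        {"plugin": "my_plugin", "task": "infer", "payload": {"text": "world"}},
    ]
}

resp = requests.post("http://localhost:8000/workflow/run", json=workflow, timeout=30)
print(resp.json())
```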
A full list of available workflows with their versions, tags, and step counts is maintained in the Workflows Index.
➡️ View Workflows Index
A full list of available plugins with their providers, tasks, and source files is maintained in the Plugins Index.
➡️ View Plugins Index
Install dev dependencies:

```bash
pip install -r requirements-dev.txt
pre-commit install
```

Run tests:

```bash
pytest
```
Ruff (lint + format check) runs automatically via pre-commit hooks.
We enforce a clean and consistent code style using Ruff (linter, import sorter, and formatter). For full details on configuration, commands, helper scripts, and CI integration, see:
➡️ Code Style & Linting Guide
Download models in advance:

```bash
python -m scripts.prefetch_models
```

Models are cached in `models_cache/` (see `docs/LICENSES.md` for licenses).
- Use `uvicorn`/`hypercorn` behind a reverse proxy (e.g., Nginx).
- Configure the environment with `APP_*` variables instead of hardcoding values (see the sketch after this list).
- Enable HTTPS and configure CORS carefully in production.
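The `APP_*` convention typically maps onto a Pydantic v2 settings class such as the one in `app/core/config.py`. A minimal sketch with illustrative field names, not the project's actual settings:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Illustrative settings; values are read from APP_-prefixed environment variables."""

    model_config = SettingsConfigDict(env_prefix="APP_")

    host: str = "0.0.0.0"
    port: int = 8000
    cors_origins: list[str] = []


settings = Settings()  # e.g. APP_PORT=9000 overrides the default port
```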
A complete history of changes and improvements: ➡️ CHANGELOG
Details about the initial release v0.1.0: ➡️ Release Notes v0.1.0
- Add `/cuda` endpoint → return detailed CUDA info.
- Add `/warmup` endpoint for GPU readiness.
- Provide a plugin generator CLI.
- Implement API Key / JWT authentication.
- Example plugins: translation, summarization, image classification.
- Docker support for one-click deployment.
- Benchmark suite for model inference speed.
Contributions are welcome!
- Open Issues for bugs or ideas.
- Submit Pull Requests for improvements.
- Follow style guidelines (Ruff + pre-commit).
Licensed under the MIT License; see LICENSE.
Some AI/ML models are licensed separately; see Model Licenses.