🚀 NeuroServe – GPU-Ready FastAPI AI Server

📊 Project Status

Badges shown on the repository page:

  • Languages: Python, HTML5, CSS3
  • Framework: FastAPI
  • ML / GPU: PyTorch, CUDA Ready
  • CI: Ubuntu, Windows, Windows GPU, macOS
  • Code Style: Ruff
  • Tests / Docs: status badges
  • OS: Ubuntu, Windows, macOS
  • Version / License: GitHub release, MIT
  • Support: Sponsor, Stars, Forks

📖 Overview

NeuroServe is an AI Inference Server built on FastAPI, designed to run seamlessly on GPU (CUDA/ROCm), CPU, and macOS MPS. It provides ready-to-use REST APIs, a modular plugin system, runtime utilities, and a consistent unified response format – making it the perfect foundation for AI-powered services.


Quick Setup

🔧 Virtualenv quick guide: see docs/README_venv.md.


📚 API Documentation

Detailed API reference and usage examples are available here: ➑️ API Documentation


✨ Key Features

  • 🌐 REST APIs out of the box with Swagger UI (/docs) & ReDoc (/redoc).
  • ⚡ PyTorch integration with automatic device selection (cuda, cpu, mps, rocm); see the sketch after this list.
  • 🔌 Plugin system to extend functionality with custom AI models or services.
  • 📊 Runtime tools for GPU info, warm-up routines, and environment inspection.
  • 🧠 Built-in utilities like a toy model and model size calculator.
  • 🧱 Unified JSON responses for predictable API behavior.
  • 🧪 Cross-platform CI/CD (Ubuntu, Windows, macOS, self-hosted GPU).
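
The snippet below is a minimal sketch of how automatic device selection typically works with PyTorch; the helper name pick_device is illustrative and not the project's actual API.

import torch

def pick_device() -> torch.device:
    """Return the best available device: CUDA (also covers ROCm builds), Apple MPS, or CPU."""
    if torch.cuda.is_available():          # NVIDIA CUDA, or AMD ROCm builds (exposed as "cuda")
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Silicon GPUs (Metal Performance Shaders)
        return torch.device("mps")
    return torch.device("cpu")             # portable fallback

device = pick_device()
model = torch.nn.Linear(8, 2).to(device)   # any model can be moved to the selected device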

📂 Project Structure

repo-fastapi/
├─ app/                             # application package
│  ├─ core/                         # settings & configuration
│  │  └─ config.py                  # app settings (Pydantic v2)
│  ├─ routes/                       # HTTP API routes
│  ├─ plugins/                      # extensions / integrations
│  ├─ workflows/                    # workflow definitions & orchestrators
│  └─ templates/                    # Jinja templates (if used)
├─ docs/                            # documentation & generated diagrams
│  ├─ ARCHITECTURE.md               # main architecture report
│  ├─ architecture.mmd              # Mermaid source (no fences)
│  ├─ architecture.html             # browser-friendly diagram
│  ├─ architecture.png              # exported PNG (if mmdc installed)
│  ├─ runtime.mmd                   # runtime/infra diagram
│  ├─ imports.mmd                   # Python import graph (if generated)
│  ├─ endpoints.md                  # discovered API endpoints (if generated)
│  └─ README_venv.md                # virtualenv quick guide
├─ tools/                           # project tooling & scripts
│  └─ build_workflows_index.py      # builds docs/workflows-overview.md
├─ tests/                           # test suite
│  └─ test_run.py                   # smoke test for app startup
├─ gen_arch.py                      # architecture generator script
├─ requirements.txt                 # runtime dependencies
├─ requirements-dev.txt             # dev dependencies (ruff, pre-commit, pytest, ...)
├─ .pre-commit-config.yaml          # pre-commit hooks configuration
├─ README.md                        # project overview & usage
└─ LICENSE                          # project license
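
As a rough illustration of what app/core/config.py could look like with Pydantic v2 settings (this assumes the pydantic-settings package; the field names and the APP_ prefix are assumptions taken from the Deployment Notes, not the project's actual schema):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Environment variables with the APP_ prefix override these defaults,
    # e.g. APP_PORT=9000 (prefix and field names are illustrative).
    model_config = SettingsConfigDict(env_prefix="APP_", env_file=".env")

    host: str = "0.0.0.0"
    port: int = 8000
    device: str = "auto"   # "auto", "cuda", "mps", or "cpu"

settings = Settings()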


🏗️ Architecture

For a deeper look into the internal design, modules, and flow of the system, see: ➑️ Architecture Guide


⚙️ Installation

1. Clone the repository

git clone https://github.com/USERNAME/repo-fastapi.git
cd repo-fastapi

2. Create a virtual environment

python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows
.venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. (Optional) Auto-install PyTorch

python -m scripts.install_torch --gpu    # or --cpu / --rocm

🚀 Running the Server

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Available endpoints:

  • GET /health – health check.
  • GET /docs – interactive Swagger UI.
  • GET /redoc – ReDoc documentation.
  • GET /plugins – list all plugins with metadata.
  • POST /plugins/{name}/{task} – execute a plugin task.
  • /workflow – workflow orchestration endpoints (see Workflow System below).

Quick test:

curl http://localhost:8000/health
# {"status": "ok"}

🔌 Plugin System

Each plugin lives in app/plugins/<name>/ and typically includes:

manifest.json
plugin.py        # Defines Plugin class inheriting AIPlugin
README.md        # Documentation

API Endpoints:

  • GET /plugins – list all plugins with metadata.
  • POST /plugins/{name}/{task} – execute a task inside a plugin.

Example:

from app.plugins.base import AIPlugin

class Plugin(AIPlugin):
    name = "my_plugin"
    tasks = ["infer"]

    def load(self):
        # Load models/resources once
        ...

    def infer(self, payload: dict) -> dict:
        return {"message": "ok", "payload": payload}

Workflow System

A lightweight orchestration layer to chain plugins into reproducible pipelines (steps → plugin + task + payload). All endpoints are exposed under /workflow.
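
As an illustration only (the exact request schema and sub-path under /workflow are assumptions; see the Workflows Index for real definitions), a chained pipeline could be submitted like this:

import requests

# Hypothetical pipeline: each step names a plugin, a task, and a payload.
workflow = {
    "steps": [
        {"plugin": "my_plugin", "task": "infer", "payload": {"text": "hello"}},
        {"plugin": "my_plugin", "task": "infer", "payload": {"text": "world"}},
    ]
}

# The "/workflow/run" sub-path is illustrative, not the documented endpoint.
resp = requests.post("http://localhost:8000/workflow/run", json=workflow, timeout=60)
print(resp.json())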


🔄 Available Workflows

A full list of available workflows with their versions, tags, and step counts is maintained in the Workflows Index.

➑️ View Workflows Index


🧩 Available Plugins

A full list of available plugins with their providers, tasks, and source files is maintained in the Plugins Index.

➑️ View Plugins Index


🧪 Development

Install dev dependencies:

pip install -r requirements-dev.txt
pre-commit install

Run tests:

pytest

Ruff (lint + format check) runs automatically via pre-commit hooks.


🧹 Code Style

We enforce a clean and consistent code style using Ruff (linter, import sorter, and formatter). For full details on configuration, commands, helper scripts, and CI integration, see:

➑️ Code Style & Linting Guide


📦 Model Management

Download models in advance:

python -m scripts.prefetch_models

Models are cached in models_cache/ (see docs/LICENSES.md for licenses).


🏭 Deployment Notes

  • Use uvicorn/hypercorn behind a reverse proxy (e.g., Nginx).
  • Configure environment with APP_* variables instead of hardcoding.
  • Enable HTTPS and configure CORS carefully in production (see the sketch below).
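
A minimal sketch of restricting CORS with FastAPI's standard middleware; the allowed origin is a placeholder, and in this project the middleware would be attached to the existing app in app.main rather than a fresh instance:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()  # placeholder; use the app defined in app/main.py

# Allow only trusted origins instead of a permissive "*" wildcard.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://example.com"],        # placeholder origin
    allow_methods=["GET", "POST"],
    allow_headers=["Authorization", "Content-Type"],
)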

📝 Changelog

A complete history of changes and improvements: ➑️ CHANGELOG

📦 Release Notes

Details about the initial release v0.1.0: ➑️ Release Notes v0.1.0


🗺️ Roadmap

  • Add /cuda endpoint → return detailed CUDA info.
  • Add /warmup endpoint for GPU readiness.
  • Provide a plugin generator CLI.
  • Implement API Key / JWT authentication.
  • Example plugins: translation, summarization, image classification.
  • Docker support for one-click deployment.
  • Benchmark suite for model inference speed.

🀝 Contributing

Contributions are welcome!

  • Open Issues for bugs or ideas.
  • Submit Pull Requests for improvements.
  • Follow style guidelines (Ruff + pre-commit).

📜 License

Licensed under the MIT License – see LICENSE.

📜 Model Licenses

Some AI/ML models are licensed separately – see Model Licenses.

