
OrangeLab

Private infrastructure for cloud natives.


Core components

Principles and goals

  • decentralized - runs on your own physical machines, potentially spread across geographical locations, minimising dependency on external services and cloud providers
  • private by default - uses Tailscale/WireGuard for end-to-end encrypted communication; services are only exposed publicly when explicitly configured
  • OSS - prefer open source components that can be run locally
  • automation - use Pulumi and Helm to automate most tasks and configuration
  • easy to use - no deep Kubernetes knowledge required, sensible defaults
  • offline mode - keep working (with some limitations) over the local network when the internet connection is lost
  • lightweight - can be run on a single laptop using default configuration
  • scalable - distribute workloads across multiple machines as they become available, optional use of cloud instances for autoscaling
  • self-healing - in case of problems, the system should recover with no user intervention
  • immutable - no snowflakes; as long as at least one Longhorn replica is available, components can be destroyed and easily recreated

Applications

System module (required):

  • longhorn - replicated storage
  • nvidia-gpu-operator - NVIDIA GPU support
  • tailscale-operator - ingress support with Tailscale authentication

Monitoring module:

  • beszel - Beszel lightweight monitoring
  • prometheus - Prometheus/Grafana monitoring

IoT module:

  • home-assistant - sensor and home automation platform

AI module:

  • automatic1111 - Automatic1111 Stable Diffusion WebUI
  • kubeai - Ollama and vLLM models over OpenAI-compatible API
  • invokeai - generative AI platform, community edition
  • ollama - local large language models
  • open-webui - Open WebUI frontend
  • sdnext - SD.Next Stable Diffusion WebUI

Platforms and limitations

Installation instructions assume your machines are running Bluefin (Developer edition, https://projectbluefin.io/), based on Fedora Silverblue, unless otherwise noted. OrangeLab should run on any modern Linux distribution with Linux kernel 6.11.6+, including Raspberry Pi.

Windows and macOS support is limited. K3s requires Linux to run workloads using containerd directly; however, you may have some luck with https://k3d.io/, which uses a Docker wrapper to run containers, as long as they do not use persistent storage. This is not a tested configuration, but feedback is welcome. The blocker is Longhorn, which only runs on Linux. More info at https://github.com/k3d-io/k3d/blob/main/docs/faq/faq.md#longhorn-in-k3d
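
A minimal, untested sketch of such a setup, assuming Docker is already installed (the cluster name is arbitrary):

# create a K3s-in-Docker cluster; avoid workloads that need persistent storage
k3d cluster create orangelab --agents 1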

Steps to disable Longhorn and switch to local-path-provisioner are described in install-system.md.
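
As a rough sketch, assuming the Longhorn component follows the same <component>:enabled convention used for applications (install-system.md has the authoritative steps):

# disable the Longhorn component before deploying system components
pulumi config set longhorn:enabled false
pulumi up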

Currently only NVIDIA GPUs are supported.
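
Once the NVIDIA GPU operator is running, GPU nodes should advertise the nvidia.com/gpu resource; a quick, non-authoritative way to check is:

# list GPU capacity/allocatable entries reported by each node
kubectl describe nodes | grep -i "nvidia.com/gpu"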

Installation

Initial cluster setup

The first time you configure the cluster, it's best to run pulumi up after each component. Make sure all pods are running before moving to the next step.
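
For example, a minimal loop for each component might look like this (assuming kubectl is configured against the cluster):

# deploy the next component
pulumi up

# confirm all pods are Running or Completed before continuing
kubectl get pods -A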

Click on the links for detailed instructions:

  1. configure Pulumi and Tailscale on management node docs/install.md
  2. (optional) configure SSH on nodes for easier access docs/install-ssh.md
  3. install K3s and label nodes docs/install-k3s.md
  4. deploy system components components/system/SYSTEM.md
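
After completing these steps, one way to confirm that all nodes joined the cluster and carry the expected labels (assuming kubectl is configured on the management node):

kubectl get nodes --show-labels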

Deploying applications

After the system components have been deployed, you can add any of the optional applications listed in the Applications section above.

All available settings can be found in Pulumi.yaml. Override defaults with pulumi config or by directly modifying Pulumi.<stack>.yaml.

# enable app
pulumi config set <app>:enabled true

# configure app-specific settings from Pulumi.yaml if needed
pulumi config set ollama:hostname ollama-api
pulumi config set ollama:storageSize 100Gi

# deploy
pulumi up
# or
pulumi up -r # --refresh Pulumi state if out of sync

# Make a request to provision the HTTPS certificate and activate the endpoint
curl https://<app>.<tsnet>.ts.net/
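
For reference, the same settings written directly into Pulumi.<stack>.yaml would look roughly like this (illustrative values; the full list of keys is in Pulumi.yaml):

config:
  ollama:enabled: "true"
  ollama:hostname: ollama-api
  ollama:storageSize: 100Gi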

Enable/disable applications

To remove an application, set the enabled flag to false. This will remove all resources associated with the app.

To keep storage around (for example, downloaded Ollama models) but remove all other resources, use storageOnly:

# Remove application including storage
pulumi config set <app>:enabled false
pulumi up

# Remove application resources but keep related storage
pulumi config set <app>:enabled true
pulumi config set <app>:storageOnly true
pulumi up

Documentation

Troubleshooting - docs/troubleshooting.md