My home infrastructure written as code, aligned to GitOps practices

varuntirumala1/home-ops

🏠 Home Kubernetes cluster using Talos Linux, backed by Flux CD

The base cluster configs are derived from the onedr0p cluster template, then customized with my own modifications and additions to fit my home infrastructure and needs.

✨ Features

This cluster is based on Talos Linux, with configuration managed in Git and applied by Flux CD. It includes external-dns, which updates external and internal DNS records, and ingress-nginx for SSL with Cloudflare. Cloudflare Tunnel provides external access to certain applications deployed in the cluster. PostgreSQL is deployed with cloudnative-pg, and a Redis-compatible database (DragonflyDB) is deployed with dragonfly-operator.

Other features include:

Renovate is a tool that automates dependency management. It is designed to scan your repository around the clock and open PRs for out-of-date dependencies it finds. Common dependencies it can discover are Helm charts, container images, GitHub Actions, Ansible roles... even Flux itself!

Merging a PR will cause Flux to apply the update to your cluster.

The base Renovate configuration in your repository can be viewed at .github/renovate.json5.

GitHub Actions with helpful workflows.

System requirements

Note: All nodes are able to run workloads, including the controller nodes. No workers are deployed in my cluster at the moment. All nodes are deployed on vSphere 8.0.

Talos

  1. I started with the Talos Linux VMware deploy script and customized it to deploy the required VMs using the configuration files generated in the steps below. govc must be installed on the system before this step. I chose to deploy 5 control plane VMs, each with 8 cores / 32 GB RAM / 500 GB (system) / 300 GB (Ceph/Rook).

  2. Continue on to 🚀 Getting Started
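
Because step 1 depends on govc, a quick preflight can catch a missing install or missing credentials before the deploy script runs. This is a minimal sketch, assuming govc reads its connection settings from the standard `GOVC_URL`, `GOVC_USERNAME`, and `GOVC_PASSWORD` environment variables; the helper names `check_govc_env` and `govc_preflight` are made up for illustration and are not part of this repo.

```shell
#!/usr/bin/env bash
# Preflight sketch (hypothetical helpers, not part of this repo).

# Returns 0 only when every connection variable govc needs is non-empty.
check_govc_env() {
  local var value missing=0
  for var in GOVC_URL GOVC_USERNAME GOVC_PASSWORD; do
    eval "value=\${${var}:-}"
    if [ -z "${value}" ]; then
      echo "missing environment variable: ${var}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Checks that govc is on PATH, the variables are set, and the login works.
govc_preflight() {
  if ! command -v govc >/dev/null 2>&1; then
    echo "govc is not installed or not in PATH" >&2
    return 1
  fi
  check_govc_env && govc about >/dev/null
}

# Usage:
#   govc_preflight && ./vmware.sh upload_ova
```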

🚀 Getting Started (Notes for future me)

🌱 Stage 1: Setup your local workstation

A dev container provides an environment with all the necessary tools.

  • The devcontainer requires Docker and VS Code to be installed.
  1. Start Docker and open the repository in VS Code. A pop-up will ask whether to use the devcontainer; click the button to start using it.

⛵ Stage 2: Install Kubernetes

Talos

  1. Generate the Talos secrets and machine configs

    task talos:bootstrap-gensecret
    task talos:bootstrap-genconfig
  2. Deploy the Talos VMs with the configs generated in step 1.

    ./vmware.sh upload_ova
    ./vmware.sh create
  3. Bootstrap Talos and fetch the kubeconfig

    task talos:bootstrap-install
    task talos:fetch-kubeconfig
  4. Install cilium and kubelet-csr-approver into the cluster

    task talos:bootstrap-apps
  5. Apply the GPU patch to the GPU node(s).

    cd kubernetes/talos/clusterconfig
    talosctl -n <node-ip> patch mc --patch @gpu-patch.yaml
  6. Upgrade Talos to the correct schematic from the Talos Image Factory, since the OVA doesn't include the extensions this repo requires. The GPU node has a different schematic ID from the regular nodes because of the added extensions siderolabs/nonfree-kmod-nvidia and siderolabs/nvidia-container-toolkit.

    talosctl -n <node-ip> upgrade --image factory.talos.dev/installer/<schematic-id>:<talos-ver>

🩹 Note: I upgraded the GPU node(s) with the GPU-specific schematic ID, and the regular nodes with the regular schematic ID, which includes only the intel-ucode and iscsi extensions.
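
Since the GPU and regular nodes use different schematic IDs, it can help to print each upgrade command before running it. The sketch below only assembles and prints the commands in the form used by the upgrade command above. The helper names `installer_image` and `upgrade_command` are hypothetical, and the node IPs, schematic IDs, and Talos version are placeholders to be replaced.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the upgrade command for each node type.
# All angle-bracket values are placeholders, not real IDs or addresses.

# Builds the installer image reference for a schematic ID and Talos version.
installer_image() {
  local schematic_id="$1" talos_version="$2"
  printf 'factory.talos.dev/installer/%s:%s\n' "${schematic_id}" "${talos_version}"
}

# Prints (does not run) the upgrade command for one node.
upgrade_command() {
  local node_ip="$1" schematic_id="$2" talos_version="$3"
  printf 'talosctl -n %s upgrade --image %s\n' \
    "${node_ip}" "$(installer_image "${schematic_id}" "${talos_version}")"
}

TALOS_VERSION="<talos-ver>"
upgrade_command "<gpu-node-ip>" "<gpu-schematic-id>" "${TALOS_VERSION}"
upgrade_command "<regular-node-ip>" "<regular-schematic-id>" "${TALOS_VERSION}"
```

Printing first makes it easy to confirm that each node is paired with the intended schematic before anything is upgraded.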

  1. Verify NVIDIA kernel modules and extensions are loaded

    talosctl -n <node-ip> read /proc/modules
    
    #nvidia_uvm 1146880 - - Live 0xffffffffc2733000 (PO)
    #nvidia_drm 69632 - - Live 0xffffffffc2721000 (PO)
    #nvidia_modeset 1142784 - - Live 0xffffffffc25ea000 (PO)
    #nvidia 39047168 - - Live 0xffffffffc00ac000 (PO)
    talosctl -n <node-ip> get extensions
    
    #NODE NAMESPACE TYPE ID VERSION NAME VERSION
    
    #172.31.41.27 runtime ExtensionStatus 000.ghcr.io-frezbo-nvidia-container-toolkit-510.60.02-v1.9.0 nvidia-container-toolkit 510.02-v1
    talosctl -n <node-ip> read /proc/driver/nvidia/version
    
    #NVRM version: NVIDIA UNIX x86_64 Kernel Module  510.60.02  Wed Mar 16 11:24:05 UTC 2022
    #GCC version:  gcc version 11.2.0 (GCC)
  2. Create nvidia runtime class

    kubectl apply -f nvidia-runtime.yaml
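
The two GPU steps above can be chained so the runtime class is only created once the kernel modules are confirmed. This is a sketch under assumptions: the function names are hypothetical, and the manifest is an assumed minimal stand-in for nvidia-runtime.yaml (a RuntimeClass named `nvidia` with handler `nvidia`), since the file's actual contents are not shown here.

```shell
#!/usr/bin/env bash
# Sketch: gate the RuntimeClass on the NVIDIA modules being loaded.
# Function names are hypothetical; the manifest is an assumption.

# Reads /proc/modules text on stdin and succeeds only if the four modules
# from the sample output above are all present.
nvidia_modules_loaded() {
  local modules mod
  modules="$(cat)"
  for mod in nvidia nvidia_uvm nvidia_drm nvidia_modeset; do
    printf '%s\n' "${modules}" | grep "^${mod} " >/dev/null || return 1
  done
}

# Prints a minimal RuntimeClass mapping the name "nvidia" to the
# "nvidia" container runtime handler.
nvidia_runtime_manifest() {
  cat <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF
}

# Usage (replace <node-ip> first):
#   talosctl -n <node-ip> read /proc/modules | nvidia_modules_loaded \
#     && nvidia_runtime_manifest | kubectl apply -f -
```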

🛰️ Stage 3: Install vmware tools

Talos vmtools daemonset

  1. Create secret for vmtools daemonset

    # create new talos API credentials
    talosctl -n <cp-node-ip> config new vmtoolsd-secret.yaml --roles os:admin
    
    # import API credentials into K8s
    kubectl -n kube-system create secret generic talos-vmtoolsd-config --from-file=talosconfig=./vmtoolsd-secret.yaml
    
    # delete temporary credentials file
    rm vmtoolsd-secret.yaml
  2. Install vmtools daemonset from manifest

    kubectl apply -f https://raw.githubusercontent.com/siderolabs/talos-vmtoolsd/master/deploy/latest.yaml

🏗️ Stage 4: Install flux in the cluster

  1. Verify flux can be installed

    flux check --pre
    # ► checking prerequisites
    # ✔ kubectl 1.27.3 >=1.18.0-0
    # ✔ Kubernetes 1.27.3+k3s1 >=1.16.0-0
    # ✔ prerequisites checks passed
  2. Install flux and sync the cluster to the Git repository

    task flux:bootstrap
    # namespace/flux-system configured
    # customresourcedefinition.apiextensions.k8s.io/alerts.notification.toolkit.fluxcd.io created
    # ...
  3. Verify flux components are running in the cluster

    kubectl -n flux-system get pods -o wide
    # NAME                                       READY   STATUS    RESTARTS   AGE
    # helm-controller-5bbd94c75-89sb4            1/1     Running   0          1h
    # kustomize-controller-7b67b6b77d-nqc67      1/1     Running   0          1h
    # notification-controller-7c46575844-k4bvr   1/1     Running   0          1h
    # source-controller-7d6875bcb4-zqw9f         1/1     Running   0          1h
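
Rather than reading the pod table by eye, the check in step 3 can be reduced to a list of pods that are not yet running. A minimal sketch, assuming the default `kubectl get pods` table layout with a header row and STATUS in the third column; `pods_not_running` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list pods whose STATUS column is not "Running".
# Expects the `kubectl get pods` table on stdin (header row first).
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Usage:
#   kubectl -n flux-system get pods | pods_not_running
# No output means every pod reported Running.
```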

📣 Flux w/ Cloudflare post installation

🌐 DNS

The external-dns application created in the network namespace handles creating both public and private DNS records.

📜 Certificates

cert-manager is configured with Cloudflare for DNS validation. ACME certificates are issued by Google Public CA (GTS, Google Trust Services), which is attached to a Google Cloud project. Let's Encrypt is also defined as a backup cluster issuer.

🪝 GitHub Webhook

By default, Flux periodically checks your Git repository for changes. To have Flux reconcile on git push, configure GitHub to send push events to Flux.

  1. Obtain the webhook path

    📍 Hook id and path should look like /hook/12ebd1e363c641dc3c2e430ecf3cee2b3c7a5ac9e1234506f6f5f3ce1230e123

    kubectl -n flux-system get receiver github-receiver -o jsonpath='{.status.webhookPath}'
  2. Piece together the full URL with the webhook path appended

    https://flux-webhook.${bootstrap_cloudflare_domain}/hook/12ebd1e363c641dc3c2e430ecf3cee2b3c7a5ac9e1234506f6f5f3ce1230e123
    
  3. Navigate to your repository's settings on GitHub and, under "Settings/Webhooks", press the "Add webhook" button. Fill in the webhook URL and the bootstrap_flux_github_webhook_token secret, then save.
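
Steps 1 and 2 can be combined so the URL is assembled rather than pasted together by hand. A minimal sketch, assuming the `flux-webhook` host prefix from the URL pattern in step 2; `flux_webhook_url` is a hypothetical helper name and the domain is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: assemble the full webhook URL from the domain and receiver path.
# flux_webhook_url is a hypothetical helper, not part of this repo.
flux_webhook_url() {
  local domain="$1" hook_path="$2"
  printf 'https://flux-webhook.%s%s\n' "${domain}" "${hook_path}"
}

# Usage (needs cluster access; replace <your-domain>):
#   hook_path="$(kubectl -n flux-system get receiver github-receiver \
#     -o jsonpath='{.status.webhookPath}')"
#   flux_webhook_url "<your-domain>" "${hook_path}"
```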

💥 Nuke

A situation may arise that necessitates starting from scratch. The command below completely destroys the cluster and the VMs. This cluster's databases and volumes are synchronized to S3-based repositories and can be bootstrapped from those backups to restore state.

    # Nuke cluster
    ./vmware.sh destroy
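
Because the destroy command is irreversible, a typed confirmation in front of it is cheap insurance. This is a sketch only; `confirm_destroy` is a hypothetical helper and is not part of vmware.sh.

```shell
#!/usr/bin/env bash
# Sketch: require a typed confirmation before the destructive step.

# Reads one line from stdin; succeeds only if it is exactly "destroy".
confirm_destroy() {
  local answer
  printf 'Type "destroy" to delete the cluster and its VMs: ' >&2
  IFS= read -r answer || return 1
  [ "${answer}" = "destroy" ]
}

# Usage:
#   confirm_destroy && ./vmware.sh destroy
```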

🙌 Related Projects

Inspiration for this repo came from the repos below and the open-source community.

  • onedr0p/home-ops - This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Ansible, Terraform, Kubernetes, Flux, Renovate, and GitHub Actions.
  • bjw-s/home-ops - 👋 Welcome to my Home Operations repository. This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using the tools like Ansible, Terraform, Kubernetes, Flux, Renovate and GitHub Actions.