This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Ansible, Terraform, Kubernetes, Flux, Renovate and GitHub Actions.
There is a template over at onedr0p/flux-cluster-template if you want to follow along with some of the practices I use here.
My cluster is k3s provisioned on top of bare-metal Fedora Server using the Ansible Galaxy role ansible-role-k3s. This is a semi hyper-converged cluster: workloads and block storage share the same available resources on my nodes, while a separate server provides (NFS) file storage.
🔸 Click here to see my Ansible playbooks and roles.
- projectcalico/calico: Internal Kubernetes networking plugin.
- rook/rook: Distributed block storage for persistent storage.
- mozilla/sops: Manages secrets for Kubernetes, Ansible and Terraform.
- kubernetes-sigs/external-dns: Automatically manages DNS records from my cluster in a cloud DNS provider.
- jetstack/cert-manager: Creates SSL certificates for services in my Kubernetes cluster.
- kubernetes/ingress-nginx: Ingress controller to expose HTTP traffic to pods over DNS.
Flux watches my cluster folder (see Directories below) and makes the changes to my cluster based on the YAML manifests.
Renovate watches my entire repository looking for dependency updates. When they are found, a PR is automatically created. When some PRs are merged, Flux applies the changes to my cluster.
This Git repository contains the following directories (kustomizations) under cluster.
📁 cluster # k8s cluster defined as code
├─📁 flux # flux components which are loaded before everything
└─📁 apps # workloads in a categorized directory structure
Name | CIDR |
---|---|
Management VLAN | 192.168.1.0/24 |
Kubernetes Nodes VLAN | 192.168.42.0/24 |
Kubernetes external services (Calico w/ BGP) | 192.168.69.0/24 |
Kubernetes pods | 10.42.0.0/16 |
Kubernetes services | 10.43.0.0/16 |
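
As a quick sanity check on an address plan like the one in the table above, the ranges can be tested for overlaps with Python's standard `ipaddress` module. This is only a sketch: the CIDRs are copied from the table, while the helper names `find_overlaps` and `network_for` are made up for illustration and are not part of this repository.

```python
import ipaddress
from itertools import combinations

# CIDRs copied from the table above.
NETWORKS = {
    "Management VLAN": "192.168.1.0/24",
    "Kubernetes Nodes VLAN": "192.168.42.0/24",
    "Kubernetes external services": "192.168.69.0/24",
    "Kubernetes pods": "10.42.0.0/16",
    "Kubernetes services": "10.43.0.0/16",
}


def find_overlaps(networks):
    """Return every pair of network names whose CIDR ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(parsed, 2)
        if parsed[a].overlaps(parsed[b])
    ]


def network_for(address, networks):
    """Return the name of the network containing `address`, or None."""
    addr = ipaddress.ip_address(address)
    for name, cidr in networks.items():
        if addr in ipaddress.ip_network(cidr):
            return name
    return None


if __name__ == "__main__":
    print(find_overlaps(NETWORKS))
    print(network_for("10.42.3.7", NETWORKS))
```

An empty list from `find_overlaps` means no two ranges collide, which matters here because pod, service and externally advertised addresses all need to be routable without ambiguity.
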
- HAProxy configured on my `Opnsense` router for the Kubernetes Control Plane Load Balancer.
- Calico configured with `externalIPs` to expose Kubernetes services with their own IP over BGP (w/ECMP) which is configured on my router.
Rook does not have built-in support for backing up PVC data, so I am currently using a DIY (or more specifically a "Poor Man's Backup") solution that leverages Kyverno, Kopia and native Kubernetes `CronJob` and `Job` resources.
At a high level the way this operates is that:
- Kyverno creates a `CronJob` for each `PersistentVolumeClaim` resource that contains a label of `snapshot.home.arpa/enabled: "true"`.
- Every day the `CronJob` creates a `Job` that uses Kopia to connect to a Kopia repository on my NAS over NFS and then snapshots the contents of the app data mount into the Kopia repository.
- The snapshots made by Kopia are incremental, which makes the `Job` run very quickly.
- The app data mount is frozen during backup to prevent writes and unfrozen when the snapshot is complete.
- Recovery is a manual process. By using a different `Job`, a temporary pod is created and the fresh PVC and existing NFS mount are attached to it. The data is then copied over to the fresh PVC and the temporary pod is deleted.
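
The label-driven selection in the steps above can be sketched in plain Python. This is not the Kyverno policy used in this repository, only an illustration of the idea: the function names, the `0 3 * * *` schedule, the `/data` mount path and the `example.invalid/kopia:latest` image are all placeholders I made up. Only the label key and the `batch/v1` `CronJob` shape come from the text and the Kubernetes API.

```python
# Label key taken from the description above.
SNAPSHOT_LABEL = "snapshot.home.arpa/enabled"


def wants_snapshot(pvc):
    """True when the PVC manifest carries the opt-in snapshot label."""
    labels = pvc.get("metadata", {}).get("labels") or {}
    return labels.get(SNAPSHOT_LABEL) == "true"


def cronjob_for(pvc, schedule="0 3 * * *"):
    """Build a CronJob manifest (as a dict) that mounts the given PVC.

    The schedule, image and mount path are hypothetical placeholders.
    """
    name = pvc["metadata"]["name"]
    namespace = pvc["metadata"].get("namespace", "default")
    pod_spec = {
        "restartPolicy": "OnFailure",
        "containers": [
            {
                "name": "snapshot",
                "image": "example.invalid/kopia:latest",  # placeholder image
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }
        ],
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": name}}
        ],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": f"{name}-snapshot", "namespace": namespace},
        "spec": {
            "schedule": schedule,
            "jobTemplate": {"spec": {"template": {"spec": pod_spec}}},
        },
    }


def plan_cronjobs(pvcs):
    """Return one CronJob manifest per PVC that opted in via the label."""
    return [cronjob_for(pvc) for pvc in pvcs if wants_snapshot(pvc)]
```

Feeding `plan_cronjobs` a list of PVC manifests returns manifests only for the labelled claims, which mirrors the opt-in behaviour described above. Freezing the mount and running Kopia itself are left out of this sketch.
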
🔸 Velero, Benji, Gemini, Kasten K10 by Veeam, Stash by AppsCode are some alternatives but have limitations.
Over WAN, I have port forwarded ports `80` and `443` to the load balancer IP of my ingress controller that's running in my Kubernetes cluster.
Cloudflare works as a proxy to hide my home's WAN IP and also as a firewall. When not on my home network, all the traffic coming into my ingress controller on ports `80` and `443` comes from Cloudflare. In `Opnsense` I block all IPs not originating from Cloudflare's list of IP ranges.
🔸 Cloudflare is also configured to GeoIP block all countries except a few I have whitelisted
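
The allow-list rule described above boils down to a membership test of the source address against a set of CIDR ranges. A minimal sketch in Python follows. The ranges here are RFC 5737 documentation blocks used as stand-ins, not Cloudflare's real list, and `is_allowed` is a hypothetical helper, not something the firewall actually runs.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT Cloudflare's real list.
ALLOWED_RANGES = ["198.51.100.0/24", "203.0.113.0/24"]


def is_allowed(source_ip, allowed_ranges=ALLOWED_RANGES):
    """True when source_ip falls inside at least one allowed CIDR range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


if __name__ == "__main__":
    for ip in ("203.0.113.10", "192.0.2.1"):
        print(ip, "allow" if is_allowed(ip) else "block")
```

In practice the list would be refreshed from the provider's published ranges rather than hard-coded, since those ranges can change over time.
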
`coredns` is deployed on my `Opnsense` router and all DNS queries for my domains are forwarded to k8s_gateway that is running in my cluster. With this setup `k8s_gateway` has direct access to my cluster's ingresses and services and serves DNS for them in my internal network.
AdGuard Home is deployed on my `Opnsense` router and has an upstream server pointing to the `coredns` instance I mentioned above. `AdGuard Home` listens on my `MANAGEMENT`, `SERVER`, `IOT` and `GUEST` networks on port `53`, while `coredns` only listens on `127.0.0.1:53`. In my firewall rules I have NAT port redirection forcing all the networks to use the `AdGuard Home` DNS server.
external-dns is deployed in my cluster and configured to sync DNS records to Cloudflare. The only ingresses `external-dns` looks at to gather DNS records to put in `Cloudflare` are ones on which I explicitly set an annotation of `external-dns.home.arpa/enabled: "true"`.
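
To make the opt-in behaviour concrete, here is a small Python sketch that picks the hostnames of annotated ingresses out of a list of manifests. It only models the filtering idea. It is not how external-dns is implemented or configured, and the helper names and the example hostnames are invented for illustration.

```python
# Annotation key taken from the description above.
SYNC_ANNOTATION = "external-dns.home.arpa/enabled"


def is_opted_in(ingress):
    """True when the ingress manifest carries the opt-in annotation."""
    annotations = ingress.get("metadata", {}).get("annotations") or {}
    return annotations.get(SYNC_ANNOTATION) == "true"


def hosts_to_sync(ingresses):
    """Collect the rule hostnames of every opted-in ingress, sorted and de-duplicated."""
    hosts = set()
    for ingress in ingresses:
        if not is_opted_in(ingress):
            continue
        for rule in ingress.get("spec", {}).get("rules") or []:
            host = rule.get("host")
            if host:
                hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    # Hypothetical manifests; the hostnames are placeholders.
    example = [
        {
            "metadata": {"annotations": {SYNC_ANNOTATION: "true"}},
            "spec": {"rules": [{"host": "public.domain.tld"}]},
        },
        {
            "metadata": {"annotations": {}},
            "spec": {"rules": [{"host": "internal.domain.tld"}]},
        },
    ]
    print(hosts_to_sync(example))
```

Ingresses without the annotation never contribute a hostname, so internal-only services stay out of the public DNS zone by default.
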
🔸 Click here to see how else I manage Cloudflare with Terraform.
My home IP can change at any given time, so in order to keep my WAN IP address up to date on Cloudflare I have deployed a CronJob in my cluster that periodically checks and updates the `A` record `ipv4.domain.tld`.
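
The check-then-update loop described above can be sketched as a small reconcile function. This is an assumption-laden illustration, not the script the CronJob actually runs: `needs_update` and `reconcile` are hypothetical names, and the `update` callback stands in for whatever call pushes the new address to the DNS provider.

```python
import ipaddress


def needs_update(current_wan_ip, record_ip):
    """True when the A record is missing or differs from the current WAN address."""
    current = ipaddress.IPv4Address(current_wan_ip)  # raises ValueError on bad input
    if record_ip is None:
        return True
    return current != ipaddress.IPv4Address(record_ip)


def reconcile(current_wan_ip, record_ip, update):
    """Call update(new_ip) only when the record is stale; return whether it ran."""
    if needs_update(current_wan_ip, record_ip):
        update(current_wan_ip)
        return True
    return False


if __name__ == "__main__":
    # Stand-in for the real provider call: just record what would be sent.
    sent = []
    reconcile("203.0.113.7", "203.0.113.5", sent.append)
    print(sent)
```

Comparing before writing keeps the job idempotent, so running it on a schedule does not generate an API call when the address has not changed.
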
Device | Count | OS Disk Size | Data Disk Size | RAM | Operating System | Purpose |
---|---|---|---|---|---|---|
Protectli FW6D | 1 | 500GB mSATA | - | 16GB | Opnsense 22 | Router |
Intel NUC8i3BEK | 3 | 256GB NVMe | - | 32GB | Fedora 36 | Kubernetes Masters |
Intel NUC8i5BEH | 3 | 240GB SSD | 1TB NVMe (rook-ceph) | 64GB | Fedora 36 | Kubernetes Workers |
PowerEdge T340 | 1 | 2TB SSD | 8x12TB ZFS (mirrored vdevs) | 64GB | Ubuntu 22.04 | NFS + Backup Server |
Lenovo SA120 | 1 | - | 6x12TB (+2 hot spares) | - | - | DAS |
Raspberry Pi | 1 | 32GB (SD) | - | 4GB | PiKVM | Network KVM |
TESmart 8 Port KVM Switch | 1 | - | - | - | - | Network KVM (PiKVM) |
APC SMT1500RM2U w/ NIC | 1 | - | - | - | - | UPS |
Unifi USP PDU Pro | 1 | - | - | - | - | PDU |
Thanks to all the people who donate their time to the Kubernetes @Home community. A lot of inspiration for my cluster comes from the people who have shared their clusters under the k8s-at-home GitHub topic.
See commit history
See LICENSE