We have two clusters: integration and production. The end goal is to leverage Flux and Kustomize to manage both clusters while minimizing duplicated declarations.
Flux is configured to install, test, and upgrade Rucio using `HelmRepository` and `HelmRelease` custom resources.
Flux monitors the Helm repository and this Git repository, and it will automatically
upgrade the Helm releases to their latest chart version based on semver ranges.
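For illustration, the Helm repository that Flux watches is declared as a `HelmRepository` resource. A minimal sketch (the name, namespace, and interval here are assumptions; the URL is the public Rucio Helm chart repository):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: rucio            # illustrative name
  namespace: flux-system
spec:
  interval: 30m          # how often Flux re-fetches the repository index
  url: https://rucio.github.io/helm-charts
```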
You will need a Kubernetes cluster version 1.22 or newer and kubectl version 1.18 or newer.
The NGINX ingress controller MUST be configured to allow SSL passthrough. On a CERN instance, check the daemonset called `cern-magnum-ingress-nginx-controller` in the kube-system namespace for the presence of the `--enable-ssl-passthrough` flag. If it is missing, this can be rectified with `kubectl -n kube-system edit ds cern-magnum-ingress-nginx-controller`.
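If you just want to check for the flag without opening an editor, a one-liner such as this (assuming the flag is passed as a container argument, as is standard for ingress-nginx) will do:

```sh
kubectl -n kube-system get ds cern-magnum-ingress-nginx-controller \
  -o jsonpath='{.spec.template.spec.containers[0].args}' | grep -- --enable-ssl-passthrough
```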
CERN Kubernetes cluster templates may include a Prometheus node exporter that conflicts with the one provided here. You can remove it by running `kubectl -n kube-system delete service cern-magnum-prometheus-node-exporter` followed by `kubectl -n kube-system delete daemonset cern-magnum-prometheus-node-exporter`. Better still, request the cluster with monitoring disabled (it is a creation flag).
For a quick local test, you can use Kubernetes kind. Any other Kubernetes setup will work as well though.
In order to follow the guide you'll need a GitHub account and a personal access token that can create repositories (check all permissions under `repo`).
Install the Flux CLI by downloading precompiled binaries using a Bash script:
curl -s https://fluxcd.io/install.sh | sudo bash
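Afterwards, confirm the CLI is installed and on your PATH:

```sh
flux --version
```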
(OPTIONAL) If OIDC authentication is enabled in the rucio-server configuration, you'll have to follow these preparatory steps.
The Git repository contains the following top directories:
- apps dir contains Helm releases with a custom configuration per cluster
- infrastructure dir contains common infra tools such as NGINX ingress controller and Helm repository definitions
- clusters dir contains the Flux configuration per cluster
├── apps
│ ├── base
│ ├── integration
│ ├── options
│ └── production
├── clusters
│ ├── integration
│ └── production
├── infrastructure
│ ├── base
│ │ ├── fluentbit
│ │ ├── prometheus
│ │ ├── etc..
│ ├── integration
│ └── production
The apps configuration is structured as follows:
- apps/base/ dir contains namespaces, Helm release definitions, and the Helm config files applicable to all CMS Rucio clusters. The Helm files are converted into Kubernetes ConfigMaps by `kustomizeconfig.yaml` in each directory (see the sketch after this list).
- apps/production/ dir contains the production Helm release values, all grouped in a single directory. Its `kustomization.yaml` shows which components are running for the production server and generates ConfigMaps from the relevant YAML files.
- apps/integration/ dir contains the integration values, similarly grouped.
- apps/options/ dir contains namespaces and Helm release definitions for optional components which may not run on every server.
- infrastructure/base/ contains the common definitions of Helm repositories and the releases for the 3rd party products we install.
- infrastructure/production/ and infrastructure/integration/ dirs contain the cluster-specific configuration changes for those products.
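As a sketch of how the ConfigMap generation mentioned above fits together (directory, file, and resource names here are illustrative, not copied from the repo), a per-cluster kustomization.yaml pairs a `configMapGenerator` with the base `kustomizeconfig.yaml`:

```yaml
# Illustrative apps/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base/rucio-server              # hypothetical base component
configMapGenerator:
  - name: rucio-server-values         # becomes a ConfigMap referenced via valuesFrom
    files:
      - values.yaml=rucio-server-helm.yaml
configurations:
  - ../base/rucio-server/kustomizeconfig.yaml   # propagates generated names into the HelmRelease
```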
Changes are applied in a cascading way, which you can see from apps/base/PRODUCT/PRODUCT-helm: settings from later in the `valuesFrom` list take precedence over those from earlier in the list.
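A minimal sketch of such a HelmRelease (the chart name, version range, and ConfigMap names are assumptions for illustration):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: rucio-server
spec:
  interval: 10m
  chart:
    spec:
      chart: rucio-server
      version: ">=1.30.0 <2.0.0"      # semver range within which Flux auto-upgrades
      sourceRef:
        kind: HelmRepository
        name: rucio
        namespace: flux-system
  valuesFrom:
    - kind: ConfigMap
      name: rucio-server-base-values     # shared defaults from apps/base
    - kind: ConfigMap
      name: rucio-server-cluster-values  # later entries override earlier ones
```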
Note that with `path: ./apps/production` we configure Flux to sync the production Kustomize overlay, and with `dependsOn` we tell Flux to create the infrastructure items before deploying the apps.
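A sketch of the corresponding Flux Kustomization (names and intervals are illustrative):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure    # reconcile infrastructure before the apps
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./apps/production
  prune: true
```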
To install this in a Kubernetes cluster, fork this repository to your personal GitHub account and export your GitHub access token, username, and repo name:
export GITHUB_TOKEN=<your-token>
export GITHUB_USER=<your-username>
export GITHUB_REPO=<repository-name>
The Rucio setup relies on a number of secrets being created before Flux is bootstrapped. Run the `create_flux_secrets.sh` script.
This relies on three pieces of information not supplied by any repository:
- `$HOSTP12`: the certificate for a node in the Rucio cluster, which also has entries for the node aliases like `cms-rucio.cern.ch`
- `$ROBOTP12`: the robot certificate used for all FTS/gfal operations. This also gets used to authenticate as `root` to Rucio.
- `${INSTANCE}-secrets.yaml` (not actually a YAML file): a file providing the true secrets of the Rucio install (database connection strings, passwords, and tokens for various services)
The format of this file is:

```sh
# This is an ENV secret file
db_string="oracle://..."
kronos_password="..."    # Used to connect to the message broker
trace_password="..."     # Used to connect to the message broker
monit_token="..."        # Used to connect to FacOps MONIT pages for site status
gitlab_token="..."       # Token for SITECONF gitlab repository
globus_client="..."      # Not currently used
globus_refresh="..."     # Not currently used
geoip_licence_key="..."  # Used to connect to the MaxMind GeoIP database
```
You will need to get these files or values from someone who has them for the server you are looking to setup.
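Assuming the script reads `HOSTP12`, `ROBOTP12`, and the instance name from the environment (the exact variable handling is defined by the script itself), an initial run might look like:

```sh
# Illustrative invocation; paths and the INSTANCE value are placeholders
HOSTP12=/path/to/host.p12 \
ROBOTP12=/path/to/robot.p12 \
INSTANCE=production \
./scripts/create_flux_secrets.sh
```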
Verify that your cluster satisfies the Flux prerequisites with:
flux check --pre
Set the kubectl context to your cluster and bootstrap Flux:
```sh
flux bootstrap github \
  --token-auth \
  --owner=${GITHUB_USER} \
  --repository=${GITHUB_REPO} \
  --branch=main \
  --personal \
  --path=clusters/integration   # or clusters/production
```

The `--token-auth` flag is new as of Flux 2.4.
The actual clusters are bootstrapped WITHOUT the `--personal` flag, with `GITHUB_USER=dmwm`, and with a GitHub personal access token which has commit rights to the dmwm/rucio-flux repository.
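Put together, a bootstrap for one of the real clusters might look like this sketch (the token value is a placeholder):

```sh
export GITHUB_TOKEN=<dmwm-scoped-token>   # needs commit rights to dmwm/rucio-flux
export GITHUB_USER=dmwm
export GITHUB_REPO=rucio-flux
flux bootstrap github \
  --token-auth \
  --owner=${GITHUB_USER} \
  --repository=${GITHUB_REPO} \
  --branch=main \
  --path=clusters/production
```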
The bootstrap command commits the manifests for the Flux components in the clusters/CLUSTERNAME/flux-system dir and creates a deploy key with read-only access on GitHub, so it can pull changes inside the cluster.
Watch for the Helm releases being installed:
$ watch flux get helmreleases --all-namespaces
NAMESPACE NAME REVISION SUSPENDED READY MESSAGE
nginx nginx 5.6.14 False True release reconciliation succeeded
podinfo podinfo 5.0.3 False True release reconciliation succeeded
redis redis 11.3.4 False True release reconciliation succeeded
Watch the kustomizations being reconciled:
$ watch flux get kustomizations
NAME REVISION READY
apps main/797cd90cc8e81feb30cfe471a5186b86daf2758d True
flux-system main/797cd90cc8e81feb30cfe471a5186b86daf2758d True
infrastructure main/797cd90cc8e81feb30cfe471a5186b86daf2758d True
Or get an overview of everything Flux has control over with:
$ flux get all -A
...
Once you have verified changes working in your own cluster, make a PR against dmwm/rucio-flux
to have the changes deployed in production (or the integration server).
Sometimes you want to test out a new development without accepting a PR (maybe you aren't sure it will work). Of course, this is only appropriate on a development server, not in production:
- Check out your branch in git
- Update clusters/CLUSTERNAME/flux-system/gotk-sync.yaml to set the value of `branch` to `MY_TEST_BRANCH`, then commit and push it upstream
- At the shell, with KUBECONFIG set to your cluster:
  - `flux suspend source git flux-system`
  - `kubectl edit GitRepository flux-system -n flux-system`, change the value of `branch` to `MY_TEST_BRANCH`, and exit the editor
  - `flux resume source git flux-system`
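You can confirm which branch and revision Flux is now tracking with:

```sh
flux get sources git -A
```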
Once testing is complete, repeat the above process but setting the branch back to its original value.
To update the robot (FTS) or host certificates later, rerun the script with the appropriate variables set:

```sh
ROBOTP12=<PATH TO FTS P12 HERE> UPDATE_FTS_CERTS=1 ./scripts/create_flux_secrets.sh
HOSTP12=<PATH TO HOST P12 HERE> UPDATE_HOST_CERTS=1 ./scripts/create_flux_secrets.sh
```