docs(deploying-at-scale): enrich info about multi-node setup #214

docs/administration/deployment/deploying-at-scale/index.md (151 additions & 46 deletions)

# Deploying at scale

REANA can be easily deployed on large Kubernetes clusters consisting of many
nodes. This is useful for production instances with many users and many
concurrent jobs.

## Pre-requisites

- A Kubernetes cluster with version greater than v1.21;
- Helm v3;
- A shared POSIX file system volume (such as CephFS or NFS) to host the REANA
infrastructure volumes and the user runtime workspaces. A shared file system
is necessary for any multi-node deployment. See [Configuring storage
volumes](../../configuration/configuring-storage-volumes).

!!! note
    If you do not have any particular distributed file system in your Kubernetes cluster, you can easily [deploy an NFS network file system following our documentation](../../configuration/configuring-storage-volumes#nfs).
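
You can quickly check which Kubernetes and Helm versions you have at hand, for
example:

```bash
# verify the Kubernetes server version and the Helm client version
kubectl version
helm version --short
```
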
## Multi-node setup

For a scalable multi-user deployment of REANA, it is essential to use a
Kubernetes cluster consisting of several nodes.

We shall separate the various REANA services onto dedicated nodes in order to
ensure that the user runtime workloads do not interfere with the REANA
infrastructure services that are critical for the platform to operate.

We recommend starting with at least six worker nodes:

```console
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-0 Ready master 97m v1.18.2
node-0 Ready <none> 97m v1.18.2
node-1 Ready <none> 97m v1.18.2
node-2 Ready <none> 97m v1.18.2
node-3 Ready <none> 97m v1.18.2
node-4 Ready <none> 97m v1.18.2
node-5 Ready <none> 97m v1.18.2
```

The worker node roles are assigned by labelling the nodes:

- 1 node labelled `reana.io/system=infrastructure` that will run the REANA
infrastructure services such as the web interface application, the REST API
server, and the workflow orchestration controller;

- 1 node labelled `reana.io/system=infrastructuredb` that will run the
PostgreSQL database service (unless you already have a database service running
outside of the cluster that can be reused instead of hosting the database
yourself, which is even preferable);

- 1 node labelled `reana.io/system=infrastructuremq` that will run the RabbitMQ
messaging service;

- 1 node labelled `reana.io/system=runtimebatch` that will run the user runtime
batch workflow orchestration pods (such as CWL, Snakemake or Yadage processes);

- 1 node labelled `reana.io/system=runtimejobs` that will run the user runtime
job workload pods (generated by the above workflow batch orchestration pods);

- 1 node labelled `reana.io/system=runtimesessions` that will run the user
interactive notebook sessions.

For example, you would label the above cluster nodes as follows:

```bash
kubectl label node node-0 reana.io/system=infrastructure
kubectl label node node-1 reana.io/system=infrastructuredb
kubectl label node node-2 reana.io/system=infrastructuremq
kubectl label node node-3 reana.io/system=runtimebatch
kubectl label node node-4 reana.io/system=runtimejobs
kubectl label node node-5 reana.io/system=runtimesessions
```
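
You can then double-check the role assignments by listing the label values on
all nodes, for example:

```bash
# show the reana.io/system label value assigned to each node
kubectl get nodes -L reana.io/system
```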

You would then configure your REANA deployment by means of the Helm
`myvalues.yaml` file as follows:

```yaml
node_label_infrastructure: reana.io/system=infrastructure
node_label_infrastructuredb: reana.io/system=infrastructuredb
node_label_infrastructuremq: reana.io/system=infrastructuremq
node_label_runtimebatch: reana.io/system=runtimebatch
node_label_runtimejobs: reana.io/system=runtimejobs
node_label_runtimesessions: reana.io/system=runtimesessions
```

## Deployment

You would deploy REANA as usual. Start by adding the REANA chart repository:

```console
$ helm repo add reanahub https://reanahub.github.io/reana
"reanahub" has been added to your repositories
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...
Update Complete. ⎈ Happy Helming!⎈
```

Continue by deploying REANA using your `myvalues.yaml` Helm values file (see
the list of [supported Helm
values](https://github.com/reanahub/reana/blob/master/helm/reana/README.md)):

```console
$ vim myvalues.yaml # customise your desired Helm values
$ helm install reana reanahub/reana -f myvalues.yaml --wait
NAME: reana
LAST DEPLOYED: Wed Mar 18 10:27:06 2020
NAMESPACE: default
...
Thanks for flying REANA 🚀
```

!!! warning

    Note that the above `helm install` command used `reana` as the Helm release
    name. You can choose any other name provided that it is less than 13 characters
    long. (This is due to a current limitation on the length of generated pod names.)

!!! note
    Note that you can deploy REANA in different namespaces by passing `--namespace`
    to `helm install`. Remember to pass `--create-namespace` if the namespace you
    want to use does not exist yet. For more information on how to work with
    namespaces, please see the [Kubernetes namespace
    documentation](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
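
For example, a deployment into a dedicated `reana` namespace (the namespace
name here is an arbitrary choice) could look as follows:

```bash
# deploy REANA into its own namespace, creating the namespace if needed
helm install reana reanahub/reana -f myvalues.yaml --namespace reana --create-namespace --wait
```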

## Scaling up

With the above multi-node deployment scenario, it is easy to scale the cluster
up for running heavier workloads or for welcoming more concurrent users, should
the service evolve in that direction. You would keep the three infrastructure
nodes and scale the runtime nodes (batch, jobs, sessions) as your needs grow.

For example, you could add 50 new nodes to the cluster, 10 for batch and 40 for
jobs, label these new nodes with `reana.io/system=runtimebatch` and
`reana.io/system=runtimejobs` respectively, and REANA would automatically
recognise and use the new nodes for executing user workloads without any
further change.
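
As an illustration, labelling such a batch of newly added nodes could be
scripted as follows (the node names `node-6` to `node-55` are hypothetical):

```bash
# label 10 new batch nodes and 40 new job nodes (hypothetical node names)
for n in $(seq 6 15); do kubectl label node "node-$n" reana.io/system=runtimebatch; done
for n in $(seq 16 55); do kubectl label node "node-$n" reana.io/system=runtimejobs; done
```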

Similarly, if you see that users prefer to run numerous Jupyter notebook
sessions, you could add new nodes labelled `reana.io/system=runtimesessions`,
and REANA would automatically use them to run Jupyter notebooks for users.

A typical production deployment could therefore look like:

- 1 infrastructure app node (labelled `reana.io/system=infrastructure`)
- 1 infrastructure DB node (labelled `reana.io/system=infrastructuredb`)
- 1 infrastructure RabbitMQ node (labelled `reana.io/system=infrastructuremq`)
- 5 runtime interactive session nodes (labelled `reana.io/system=runtimesessions`)
- 10 runtime batch nodes (labelled `reana.io/system=runtimebatch`)
- 40 runtime job nodes (labelled `reana.io/system=runtimejobs`)

Here, the nodes carrying the first three infrastructure roles should be kept
stable, whilst the nodes carrying the last three runtime roles can be added and
removed at will, based on increasing or decreasing user workload.
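
For example, taking one runtime job node out of service could be done as
follows (a sketch; `node-42` is a hypothetical node name):

```bash
# cordon and drain the node, then remove it from the cluster
kubectl drain node-42 --ignore-daemonsets
kubectl delete node node-42
```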

We have been operating REANA deployments on clusters of the above setup
consisting typically of 50-100 nodes and 500-1000 cores, with occasional tests
using up to 5000 cores.

## Designing cluster node roles

The optimal number of cluster nodes to reserve for runtime batch workflows, for
runtime job workloads, or for runtime notebook sessions depends on your users
and the typical research workflows that the cluster is running.

For example, assuming a cluster node of `m2.large` flavour, i.e. about 8 CPU
cores and 16 GB of memory per node, one such runtime job node can comfortably
run 8 concurrent user jobs at full speed (since 1 node has 8 CPU cores),
provided that the memory suffices. (The batch workflow orchestration pods do
not require a full CPU core each, since they mostly launch user jobs and then
wait for their execution.) If the workflows are not CPU-bound but memory-bound,
then using node flavours with more RAM would be necessary.
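
You can inspect the allocatable CPU and memory of a given runtime job node like
this (`node-4` is one of the example node names used above):

```bash
# print the allocatable CPU cores and memory of one runtime job node
kubectl get node node-4 -o jsonpath='{.status.allocatable.cpu}{"\n"}{.status.allocatable.memory}{"\n"}'
```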

Another important consideration is the typical parallelism of the user
workflows. For example, if the physics workflows that are run the most on the
system typically generate 4 lengthy parallel n-tupling jobs that run for hours,
followed by relatively quicker statistical analysis jobs, then the overall job
throughput would most likely be determined by the n-tupling jobs, and we may
expect 1 runtime job node to serve only about 2 such workflows at a time.
Hence, if we would like to run 80 such workflows concurrently, we would need
about 40 runtime job nodes in order to run the user workloads at a sustainable
full speed.
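
As a back-of-the-envelope check of this sizing:

```bash
# 80 workflows x 4 parallel jobs each, with 8 jobs fitting on one 8-core node
echo $(( 80 * 4 / 8 ))   # => 40 runtime job nodes
```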