This repository has been archived by the owner on Sep 19, 2023. It is now read-only.

Add extensive Ansible-based deployment tooling
matejpavlovic committed Nov 18, 2022
1 parent 601eb1d commit 37ca556
Showing 25 changed files with 594 additions and 28 deletions.
186 changes: 186 additions & 0 deletions deployment/README.md
@@ -0,0 +1,186 @@
# Deploying Spacenet

We use [Ansible](https://www.ansible.com/) to deploy Spacenet nodes (and whole networks).
The set of machines on which to deploy Spacenet must be defined in an Ansible inventory file
(this document uses `hosts` as the example inventory file name, but any other file name works as well).
The inventory file must contain two host groups: `bootstrap` (with one host) and `validators` (with all validator hosts).
An example inventory file looks as follows.
```
[bootstrap]
198.51.100.0
[validators]
198.51.100.1
198.51.100.2
198.51.100.3
198.51.100.4
```
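
As a rough illustration, an inventory with this layout can be written and sanity-checked from the shell.
The sketch below is not part of the provided tooling: the scratch file name `hosts.example`
and the helper `count_group_hosts` are made up for this example.

```shell
# Write an example inventory (same layout as above) to a scratch file.
# "hosts.example" is a hypothetical file name used only in this sketch.
inventory="${TMPDIR:-/tmp}/hosts.example"
cat > "$inventory" <<'EOF'
[bootstrap]
198.51.100.0
[validators]
198.51.100.1
198.51.100.2
198.51.100.3
198.51.100.4
EOF

# Hypothetical helper: count the non-empty, non-comment lines under one group header.
count_group_hosts() {
    awk -v group="[$2]" '
        /^\[/ { in_group = ($0 == group); next }
        in_group && NF && $0 !~ /^[#;]/ { count++ }
        END { print count + 0 }
    ' "$1"
}

bootstrap_count=$(count_group_hosts "$inventory" bootstrap)
validator_count=$(count_group_hosts "$inventory" validators)
echo "bootstrap hosts: $bootstrap_count, validator hosts: $validator_count"

# This document asks for one host in the bootstrap group.
if [ "$bootstrap_count" -ne 1 ]; then
    echo "expected exactly 1 bootstrap host" >&2
fi
```

If Ansible is installed, `ansible-inventory -i hosts --graph` prints the groups as Ansible itself parses them, which is the more authoritative check.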

The Spacenet deployment can be managed using the provided Ansible playbooks.
To run a playbook, install Ansible and execute the following command
```shell
ansible-playbook -i hosts <playbook.yaml> ...
```
with `hosts` being the Ansible inventory and `<playbook.yaml>` one of the provided playbooks.
Additional playbooks can be specified in the same command and are executed in the given order.
A reference for the provided deployment playbooks can be found at the end of this document.

Running the command above applies the playbooks to their default targets,
assuming all nodes in the inventory are part of Spacenet.
To target specific nodes from the inventory, set the `nodes` Ansible variable
by passing the additional parameter `--extra-vars "nodes='<alternative_targets>'"`.
For example, the following two commands, respectively,
set up only the bootstrap node and kill only the validators `198.51.100.3` and `198.51.100.4`.
```shell
ansible-playbook -i hosts setup.yaml --extra-vars "nodes=bootstrap"
ansible-playbook -i hosts kill.yaml --extra-vars "nodes='198.51.100.3 198.51.100.4'"
```
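
Before running a destructive playbook against a subset of nodes, it can help to preview which hosts a given `nodes` value resolves to.
The sketch below builds the same kind of command line and adds `--list-hosts`,
a standard `ansible-playbook` option that lists the matched hosts instead of executing tasks.
The helper name `preview_targets` and the `ANSIBLE_PLAYBOOK` override are made up for this example; they are not part of Ansible or of this repository.

```shell
# Hypothetical helper: show which hosts a playbook would target for a given
# value of the `nodes` variable, without running any tasks.
# ANSIBLE_PLAYBOOK is an override invented for this sketch (e.g. set it to
# `echo` to only print the arguments).
preview_targets() {
    playbook="$1"
    nodes="$2"
    "${ANSIBLE_PLAYBOOK:-ansible-playbook}" -i hosts "$playbook" \
        --extra-vars "nodes='$nodes'" --list-hosts
}
```

For instance, `preview_targets kill.yaml "198.51.100.3 198.51.100.4"` should list only those two validators if the variable is picked up as intended.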

## System requirements and configuration

- Ansible installed on the local machine.
- Ubuntu 22.04 on all remote machines (other systems may well work, but only this one was tested).
- Sudo access without password on remote machines.
- SSH access to remote machines without password.

The file [group_vars/all.yaml](group_vars/all.yaml) contains some configuration parameters
(e.g. the location of the SSH key to use for accessing remote machines) documented therein.
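
The last two requirements can be checked by hand before running any playbook.
The following is a minimal sketch, assuming plain `ssh` is used for the check;
the helper name `check_host` and the `SSH` override are made up for this example,
and the default user `ubuntu` is only the example value of `ansible_user` from `group_vars/all.yaml`.

```shell
# Hypothetical helper: succeed only if the host accepts a non-interactive SSH
# login and allows sudo without a password.
# SSH is an override invented for this sketch (e.g. set it to `echo` to only
# print the arguments).
check_host() {
    host="$1"
    user="${2:-ubuntu}"
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # `sudo -n` makes sudo fail instead of prompting.
    "${SSH:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" sudo -n true
}
```

A call such as `check_host 198.51.100.1 && echo ok` uses the default SSH identity; if a dedicated key is configured in `group_vars/all.yaml`, the same key would have to be passed to `ssh` with `-i`.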

### Potential issue on Ubuntu 22.04

While testing the deployment on Amazon EC2 virtual machines,
we noticed that installing dependencies on remote machines (performed by the `setup.yaml` playbook) sometimes failed.
The issue and its solution have been
[described here](https://askubuntu.com/questions/1431786/grub-efi-amd64-signed-dependency-issue-in-ubuntu-22-04lts).
To apply the work-around, the `custom-script.yaml` playbook can be used.
If necessary, copy the following line
```shell
sudo apt --only-upgrade install grub-efi-amd64-signed
```
into the [scripts/custom.sh](scripts/custom.sh) file and run
```shell
ansible-playbook -i hosts custom-script.yaml
```
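
The copy step can also be scripted. The sketch below appends the work-around line to a script file only if it is not already present,
so it can be repeated safely. The helper name `add_workaround` is made up for this example.

```shell
# Hypothetical helper: append the work-around to the given script file unless
# the exact line is already there.
add_workaround() {
    script="$1"
    line='sudo apt --only-upgrade install grub-efi-amd64-signed'
    grep -qxF "$line" "$script" 2>/dev/null || printf '%s\n' "$line" >> "$script"
}
```

Typical use would be `add_workaround scripts/custom.sh` followed by the `custom-script.yaml` command above.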

## Deploying a fresh instance of Spacenet

To deploy an instance of Spacenet, first create an inventory file (called `hosts` in this example)
and populate it, as described above, with the IP addresses of the machines that should run Spacenet.

The following steps must be executed to deploy Spacenet:
1. Install the necessary packages, clone the Spacenet client (Lotus) code, and compile it on the remote machines (`setup.yaml`).
2. Generate the genesis block for the network and distribute it to all nodes (`generate-genesis.yaml`).
3. Start the bootstrap node (`start-bootstrap.yaml`).
4. Start the Lotus daemons on validator nodes (`start-daemons.yaml`).
5. Start the Mir validator processes on validator nodes (`start-validators.yaml`).

These steps are automated for convenience in the `deploy-new.yaml` playbook.
Thus, to deploy a fresh instance of Spacenet, simply run
```shell
ansible-playbook -i hosts deploy-new.yaml
```
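
When debugging a failed deployment, it can be useful to run the individual steps one at a time instead.
The sketch below does this for the five playbooks listed above and stops at the first failure.
Note that `deploy-new.yaml` additionally runs `deep-clean.yaml` first, which this sketch leaves out.
The helper name `deploy_step_by_step` and the `ANSIBLE_PLAYBOOK` override are made up for this example.

```shell
# The five deployment steps, in the order given in this document.
DEPLOY_STEPS="setup.yaml generate-genesis.yaml start-bootstrap.yaml start-daemons.yaml start-validators.yaml"

# Hypothetical helper: run the steps one by one and stop at the first failure.
# ANSIBLE_PLAYBOOK is an override invented for this sketch (e.g. set it to
# `echo` for a dry run); it is not an Ansible feature.
deploy_step_by_step() {
    inventory="${1:-hosts}"
    for playbook in $DEPLOY_STEPS; do
        echo "==> $playbook"
        "${ANSIBLE_PLAYBOOK:-ansible-playbook}" -i "$inventory" "$playbook" || return 1
    done
}
```

Calling `deploy_step_by_step hosts` runs the steps against the `hosts` inventory; with `ANSIBLE_PLAYBOOK=echo` it only prints what would be run.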


## Provided deployment playbooks

### `clean.yaml`

Kills the running Lotus daemon and Mir validator and deletes their associated state.
Does not touch the code and binaries.

Applies to all hosts by default, unless other nodes are specified using `--extra-vars "nodes=..."`.

### `connect-daemons.yaml`

Connects all Lotus daemons to each other. This is required for the nodes to be able to sync their state.
It assumes the daemons and the bootstrap are up and running (but not necessarily the validators).

Applies to all hosts (including bootstrap) by default, unless other nodes are specified using `--extra-vars "nodes=..."`.

### `custom-script.yaml`

Runs the `scripts/custom.sh` script. This is meant as a convenience tool for executing ad-hoc scripts.

Applies to all hosts by default, unless other nodes are specified using `--extra-vars "nodes=..."`.

### `deep-clean.yaml`

Performs deep cleaning of the host machines.
Runs `clean.yaml` and, in addition, deletes the cloned repository with the Lotus code and binaries.

Applies to all hosts by default, unless other nodes are specified using `--extra-vars "nodes=..."`.

### `deploy-current.yaml`

Deploys the bootstrap and the validators using existing binaries.
This playbook still cleans the Lotus daemon state, but neither updates nor recompiles the code.
It performs the state cleanup by running `clean.yaml`,
which may produce (and ignore) some errors if nothing is running on the hosts. This is normal.

The `nodes` variable must not be set, as this playbook must distinguish between different kinds of nodes
(such as bootstrap and validators).

### `deploy-new.yaml`

Deploys the whole system from scratch.
Performs a deep clean by running `deep-clean.yaml`
(which may produce and ignore some errors if nothing is running on the hosts; this is normal)
and sets up a new Spacenet deployment.

The `nodes` variable must not be set, as this playbook must distinguish between different kinds of nodes
(such as bootstrap and validators).

### `generate-genesis.yaml`

Generates a new genesis block (at the bootstrap node) and copies it to all validators.
Assumes the hosts have already been set up (using `setup.yaml`).

An alternative set of nodes to copy the genesis block to can be specified using `--extra-vars "nodes=..."`.

### `kill.yaml`

Kills the running Lotus daemon and Mir validator.
Does not touch their persisted state or the code and binaries.
Reports but ignores errors, so it can be used even if the processes to be killed are not running.

Applies to all hosts by default, unless other nodes are specified using `--extra-vars "nodes=..."`.

### `restart-validators.yaml`

Restarts a given set of validators.
For safety, it does NOT default to restarting all validators;
the set of hosts to restart must be given explicitly using `--extra-vars "nodes=..."`.

Note that this playbook always affects all hosts, regardless of the value of the `nodes` variable,
because all daemons need to be reconnected to the restarted one.
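
Because the set of validators must always be given explicitly, a small wrapper can guard against forgetting it.
This is only a sketch: the helper name `restart_validators` and the `ANSIBLE_PLAYBOOK` override are made up for this example,
and the host list is passed in the same space-separated form used earlier in this document.

```shell
# Hypothetical helper: restart the given validators, refusing to run when no
# host is named.
# ANSIBLE_PLAYBOOK is an override invented for this sketch (e.g. set it to
# `echo` to only print the arguments).
restart_validators() {
    if [ "$#" -eq 0 ]; then
        echo "usage: restart_validators <host> [<host> ...]" >&2
        return 2
    fi
    # "$*" joins all arguments with spaces into a single `nodes` value.
    "${ANSIBLE_PLAYBOOK:-ansible-playbook}" -i hosts restart-validators.yaml \
        --extra-vars "nodes='$*'"
}
```

For example, `restart_validators 198.51.100.3 198.51.100.4` restarts those two validators.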

### `setup.yaml`

Sets up the environment for running the Lotus daemon and validator.
This includes installing the necessary packages, fetching the Lotus code, and compiling it.
It does not start any nodes. See the `start-*.yaml` and `deploy-*.yaml` playbooks for starting the nodes.

Applies to all hosts by default, unless other nodes are specified using `--extra-vars "nodes=..."`.

### `start-bootstrap.yaml`

Starts the bootstrap node and downloads its identity to localhost.
Assumes the host has been set up and the genesis block has been generated
(using `setup.yaml` and `generate-genesis.yaml`, respectively).

Applies to the bootstrap host by default, unless other nodes are specified using `--extra-vars "nodes=..."`.

### `start-daemons.yaml`

Starts the Lotus daemons and creates connections among them and to the bootstrap node.
Assumes that the bootstrap node is up and running (see `start-bootstrap.yaml`).

Applies to the validator hosts by default, unless other nodes are specified using `--extra-vars "nodes=..."`.

### `start-validators.yaml`

Starts the Mir validators.
Assumes that the Lotus daemons are up and running (see `start-daemons.yaml`).

Applies to the validator hosts by default, unless other nodes are specified using `--extra-vars "nodes=..."`.
18 changes: 18 additions & 0 deletions deployment/clean.yaml
@@ -0,0 +1,18 @@
# Kills running Lotus daemon and Mir validator and deletes their associated state.
# Does not touch the code and binaries.
#
# Applies to all hosts by default, unless other nodes are specified using --extra-vars "nodes=..."

---
- import_playbook: kill.yaml

- name: Delete the whole lotus state
  hosts: "{{nodes | default('all')}}"
  gather_facts: False
  become: False
  tasks:
    - name: "Delete the .lotus repo directory"
      file:
        state: absent
        path: ~/.lotus
...
34 changes: 34 additions & 0 deletions deployment/connect-daemons.yaml
@@ -0,0 +1,34 @@
# Connects all Lotus daemons to each other. This is required for the nodes to be able to sync their state.
# It assumes the daemons and the bootstrap are up and running (but not necessarily the validators)
#
# Applies to all hosts (including bootstrap) by default, unless other nodes are specified using --extra-vars "nodes=..."

---
- name: Connect lotus daemons to each other
  hosts: "{{nodes | default('all')}}"
  gather_facts: False
  become: False
  tasks:

    - name: Collect Lotus daemon addresses
      ansible.builtin.fetch:
        src: .lotus/lotus-addr
        dest: tmp-lotus-addrs


    - name: Combine Lotus daemon addresses into a single file
      run_once: True
      delegate_to: localhost
      shell: 'rm -f lotus-addrs && cat tmp-lotus-addrs/*/.lotus/lotus-addr >> lotus-addrs && rm -r tmp-lotus-addrs'


    - name: Copy Lotus daemon address file to all nodes
      ansible.builtin.copy:
        src: lotus-addrs
        dest: .lotus/


    - name: Connect all Lotus daemons to each other
      ansible.builtin.script:
        cmd: scripts/connect-daemon.sh
...
6 changes: 5 additions & 1 deletion deployment/custom-script.yaml
@@ -1,6 +1,10 @@
# Runs the scripts/custom.sh script. This is meant as a convenience tool for executing ad-hoc scripts.
#
# Applies to all hosts by default, unless other nodes are specified using --extra-vars "nodes=..."

---
 - name: Run custom script
-  hosts: all
+  hosts: "{{nodes | default('all')}}"
   gather_facts: False
   become: False
   tasks:
17 changes: 17 additions & 0 deletions deployment/deep-clean.yaml
@@ -0,0 +1,17 @@
# Performs deep cleaning of the host machines.
# Runs clean.yaml and, in addition, deletes the cloned repository with the lotus code and binaries.
#
# Applies to all hosts by default, unless other nodes are specified using --extra-vars "nodes=..."

---
- import_playbook: clean.yaml
- name: Delete the whole lotus state
  hosts: "{{nodes | default('all')}}"
  gather_facts: False
  become: False
  tasks:
    - name: Remove the whole repository with Lotus code
      file:
        state: absent
        path: ~/lotus
...
22 changes: 22 additions & 0 deletions deployment/deploy-current.yaml
@@ -0,0 +1,22 @@
# Deploys the bootstrap and the validators using existing binaries.
# This playbook still cleans the lotus daemon state, but neither updates nor recompiles the code.
# Performs the state cleanup by running clean.yaml,
# potentially producing and ignoring some errors, if nothing is running on the hosts - this is normal.
#
# The nodes variable must not be set, as this playbook must distinguish between different kinds of nodes
# (such as bootstrap and validators).

---
- hosts: all
  gather_facts: False
  tasks:
    - name: Verify that nodes variable is not defined
      fail: msg="Variable nodes must not be defined (nodes set to '{{ nodes }}')"
      when: nodes is defined

- import_playbook: clean.yaml
- import_playbook: generate-genesis.yaml
- import_playbook: start-bootstrap.yaml
- import_playbook: start-daemons.yaml
- import_playbook: start-validators.yaml
...
23 changes: 23 additions & 0 deletions deployment/deploy-new.yaml
@@ -0,0 +1,23 @@
# Deploys the whole system from scratch.
# Performs a deep clean by running deep-clean.yaml
# (potentially producing and ignoring some errors, if nothing is running on the hosts - this is normal)
# and sets up a new Spacenet deployment.
#
# The nodes variable must not be set, as this playbook must distinguish between different kinds of nodes
# (such as bootstrap and validators).

---
- hosts: all
  gather_facts: False
  tasks:
    - name: Verify that nodes variable is not defined
      fail: msg="Variable nodes must not be defined (nodes set to '{{ nodes }}')"
      when: nodes is defined

- import_playbook: deep-clean.yaml
- import_playbook: setup.yaml
- import_playbook: generate-genesis.yaml
- import_playbook: start-bootstrap.yaml
- import_playbook: start-daemons.yaml
- import_playbook: start-validators.yaml
...
39 changes: 39 additions & 0 deletions deployment/generate-genesis.yaml
@@ -0,0 +1,39 @@
# Generates a new genesis block (at the bootstrap node) and copies it to all validators.
# Assumes the hosts to have already been set up (using setup.yaml).
#
# An alternative set of nodes to copy the genesis block to can be specified using --extra-vars "nodes=..."

---
- name: Generate new Lotus genesis block
  hosts: bootstrap[0]
  gather_facts: False
  become: False
  tasks:

    - name: Copy genesis template to the node
      ansible.builtin.copy:
        src: spacenet_template.json
        dest: lotus/spacenet_template.json

    - name: Run genesis generation script
      ansible.builtin.script:
        cmd: scripts/generate-genesis.sh spacenet-genesis.car spacenet_template.json

    - name: Fetch genesis file
      ansible.builtin.fetch:
        flat: True
        src: lotus/spacenet-genesis.car
        dest: spacenet-genesis.car


- name: Distribute genesis block to all Lotus nodes
  hosts: "{{nodes | default('all')}}"
  gather_facts: False
  become: False
  tasks:

    - name: Copy genesis file to all nodes
      ansible.builtin.copy:
        src: spacenet-genesis.car
        dest: lotus/spacenet-genesis.car
...
2 changes: 1 addition & 1 deletion deployment/group_vars/all.yaml
@@ -11,5 +11,5 @@ git_version: "adlrocha/mir-sync" # Meaningless example value. Set to desired cod
ansible_user: ubuntu # Example value (default on EC2 Ubuntu virtual machines)

# Other variables Ansible might use, probably no need to touch those...
-ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
+ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ServerAliveInterval=60'
host_count: "{{ groups['all'] | length }}"
18 changes: 18 additions & 0 deletions deployment/kill.yaml
@@ -0,0 +1,18 @@
# Kills running Lotus daemon and Mir validator.
# Does not touch their persisted state or the code and binaries.
# Reports but ignores errors, so it can be used even if the processes to be killed are not running.
#
# Applies to all hosts by default, unless other nodes are specified using --extra-vars "nodes=..."

---
- name: Stop all nodes and delete their state
  hosts: "{{nodes | default('all')}}"
  gather_facts: False
  become: False
  tasks:

    - name: "Execute cleanup script"
      ansible.builtin.script:
        cmd: scripts/kill.sh
      ignore_errors: True
...