Fix and cleanup #303

Merged 31 commits on Aug 10, 2020
31 commits
7d831b4
Removed dependency that is no longer used.
pneerincx Aug 3, 2020
9de4ede
Updated README for static inventories relocated to subdir.
pneerincx Aug 4, 2020
611c664
Removed unnecessary slurm-management host group: use sys-admin-interf…
pneerincx Aug 4, 2020
0ba59ac
Updated main cluster.yml playbook: Only deploy grafany-proxy on airlo…
pneerincx Aug 4, 2020
3129a7a
Removed dependency that is no longer used.
pneerincx Aug 3, 2020
6e26743
Updated README for static inventories relocated to subdir.
pneerincx Aug 4, 2020
0a9563f
Removed unnecessary slurm-management host group: use sys-admin-interf…
pneerincx Aug 4, 2020
4c053c0
Updated main cluster.yml playbook: Only deploy grafany-proxy on airlo…
pneerincx Aug 4, 2020
2c5c552
Fixed typos / syntax issues.
pneerincx Aug 4, 2020
08597d2
Fix: made spacewalk_client role idempotent.
pneerincx Aug 4, 2020
c585406
Fixed merge conflict.
pneerincx Aug 4, 2020
6561ec6
Added dummy play at beginning to ping jumphost and establish a persis…
pneerincx Aug 4, 2020
f86d02d
Consistent use of ldap_* variable names, ldap_uri now includes the pr…
pneerincx Aug 6, 2020
b158389
Fixed permissions for config files created by openldap role.
pneerincx Aug 6, 2020
ff1229d
Fixed syntax error.
pneerincx Aug 6, 2020
9d70621
Fixed some ansible linter errors.
pneerincx Aug 6, 2020
751f53c
Lowered allowed number of ansible-linter errors 18 -> 5 and disable c…
pneerincx Aug 7, 2020
16afbf2
Made build_lustre_client role idempotent.
pneerincx Aug 7, 2020
95e0a78
Fixed linter errors: added pipefail option to shell tasks.
pneerincx Aug 7, 2020
e7cc0f9
Fixed linter error.
pneerincx Aug 7, 2020
99215c3
Fixed various linter errors in openldap role.
pneerincx Aug 7, 2020
f55c171
Fixed various linter errors in slurm-management role.
pneerincx Aug 7, 2020
811c840
Wrapped long line in spacewalk_client role.
pneerincx Aug 7, 2020
0d23336
Fixed linter error in ssh_host_signer role and added symlink in singl…
pneerincx Aug 7, 2020
c56809d
Fixed various linter errors in create_subgroup_directories role.
pneerincx Aug 7, 2020
952ca6e
Fixed linter errors, improved idempotency and resolved issue #302.
pneerincx Aug 7, 2020
96f2e3e
Commented tasks only required for debugging.
pneerincx Aug 7, 2020
c6ae22d
Reduced allowed linter errors further 5 -> 2.
pneerincx Aug 7, 2020
a9a8eba
Silly format fix.
pneerincx Aug 7, 2020
1f67196
Fixed linter issue in ssh_host_signer role and reduced allowed linter…
pneerincx Aug 7, 2020
b5f37d3
Wrapped long line.
pneerincx Aug 7, 2020
3 changes: 2 additions & 1 deletion .ansible-lint
@@ -3,5 +3,6 @@ exclude_paths:
- '~/.ansible' # Exclude external playbooks.
skip_list:
# We explicitly use latest combined with other tech to pin versions (e.g. Spacewalk).
-- '403' # "Package installs should not use latest".
+- '403' # "Package installs should not use latest."
+- '701' # "No 'galaxy_info' found in meta/main.yml of a role."
...
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -27,7 +27,7 @@ jobs:
cat lint_results
errors=$(grep -c '^[0-9]* [A-Z].*' lint_results)
echo '###############################################'
-printf 'Counted %d ansible-lint errors.' ${errors:-0}
+printf 'Counted %d ansible-lint errors.\n' ${errors:-0}
echo '###############################################'
-if (( errors > 18 )); then /bin/false; fi
+if (( errors > 1 )); then /bin/false; fi
...
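
The gate in this hunk can be sketched as a standalone function. This is a minimal sketch, not the project's CI script: the function name `lint_gate` and its two arguments are hypothetical, and only the `grep -c` pattern, the `printf` message and the threshold comparison are taken from the hunk above. Like the original, it treats an empty count as zero.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): count ansible-lint findings in a
# results file and fail when the count exceeds the allowed maximum.
lint_gate() {
    local results_file="$1"
    local max_errors="$2"
    local errors
    # Same pattern as in the CI config. grep -c exits non-zero when it counts
    # zero matches, so '|| true' keeps that case from looking like a failure.
    errors=$(grep -c '^[0-9]* [A-Z].*' "${results_file}" || true)
    printf 'Counted %d ansible-lint errors.\n' "${errors:-0}"
    if [ "${errors:-0}" -gt "${max_errors}" ]; then
        return 1
    fi
}

# Usage: fail the job when more than one finding is left.
# lint_gate lint_results 1 || exit 1
```

Lowering the second argument step by step (18, 5, 2, 1 in this PR's commits) tightens the gate without changing the counting logic.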
8 changes: 4 additions & 4 deletions README.md
@@ -123,9 +123,9 @@ Deploying a fully functional virtual cluster from scratch involves the following

3. Configure Ansible settings including the vault.

-To create a new virtual cluster you will need ```group_vars``` and an inventory for that HPC cluster:
+To create a new virtual cluster you will need ```group_vars``` and an static inventory for that HPC cluster:

-* See the ```*_hosts.ini``` files for existing clusters for examples to create a new ```[name-of-the-cluster]*_hosts.ini```.
+* See the ```static_inventories/*_hosts.ini``` files for existing clusters for examples to create a new ```[name-of-the-cluster]*_hosts.ini```.
* Create a ```group_vars/[name-of-the-cluster]/``` folder with a ```vars.yml```.
You'll find and example ```vars.yml``` file in ```group_vars/template/```.
To generate a new ```secrets.yml``` with new random passwords for the various daemons/components and encrypt this new ```secrets.yml``` file:
@@ -196,7 +196,7 @@ Deploying a fully functional virtual cluster from scratch involves the following
Some examples for the *Talos* development cluster:
* Configure the dynamic inventory and jumphost for the *Talos* test cluster:
```bash
-export AI_INVENTORY='talos_hosts.ini'
+export AI_INVENTORY='static_inventories/talos_hosts.ini'
export AI_PROXY='reception'
export ANSIBLE_VAULT_IDENTITY_LIST='all@.vault/vault_pass.txt.all, talos@.vault/vault_pass.txt.talos'
```
@@ -206,7 +206,7 @@ Deploying a fully functional virtual cluster from scratch involves the following
. ./lor-init
lof-config talos
```
-* Firstly
+* Firstly,
* Create local admin accounts, which can then be used to deploy the rest of the playbook.
* Deploy the signed hosts keys.
Without local admin accounts we'll need to use either a ```root``` account for direct login or the default user account of the image used to create the VMs.
118 changes: 55 additions & 63 deletions cluster.yml
@@ -2,9 +2,8 @@
# Order of deployment required to prevent chicken versus the egg issues:
# 0. For all deployment phases:
# export AI_PROXY="${jumphost_name}"
# export AI_INVENTORY="${cluster_name}_hosts.ini"
# export AI_INVENTORY="static_inventories/${cluster_name}_hosts.ini"
# ANSIBLE_VAULT_PASSWORD_FILE=".vault_pass.txt.${cluster_name}"
#
# 1. Use standard CentOS cloud image user 'centos' or 'root' user and without host key checking:
# export ANSIBLE_HOST_KEY_CHECKING=False
# ansible-playbook -i inventory.py -u centos -l 'jumphost,cluster' single_role_playbooks/admin-users.yml
@@ -17,14 +16,29 @@
# ansible-playbook -i inventory.py -u [admin_account] cluster.yml
# This will configure:
# A. Jumphost first as it is required to access the other machines.
# B. SAI as it is required to
# * configure layout on shared storage devices used by other machines.
# * configure Slurm control and Slurm database.
# C. DAI
# D. UI
# E. Compute nodes
# F. Documentation server
# B. Basic roles for all cluster machines part 1:
# * Roles that do NOT require regular accounts or groups to be present.
# C. An LDAP with regular user accounts, which may be required for additional roles.
# (E.g. a chmod or chgrp for a file/folder requires the corresponding user or group to be present.)
# D. Basic roles for all cluster machines part 2:
# * Roles that DO depend on regular accounts and groups.
# E. SAI as it is required to:
# * Configure layout on shared storage devices used by other machines.
# * Configure Slurm control and Slurm database.
# F. DAI
# G. UI
# H. Compute nodes
# I. Documentation server
#

#
# Dummy play to ping jumphosts and establish a persisting SSH connection
# before trying to connect to the machines behind the jumphost,
# which may otherwise fail when SSH connection multiplexing is used.
#
- name: 'Dummy play to ping jumphosts and establish a persistent SSH connection.'
hosts: jumphost

- name: 'Sanity checks before we start.'
hosts: all
pre_tasks:
@@ -47,7 +61,7 @@
- sshd
- node_exporter
- {role: geerlingguy.security, become: true}
- grafana_proxy
- {role: grafana_proxy, when: ansible_hostname == 'airlock'}
- regular-users
tasks:
- name: 'Install cron job to reboot jumphost regularly to activate kernel updates.'
@@ -61,28 +75,45 @@
cron_file: reboot
become: true

- name: 'B. Roles for SAIs.'
- name: 'B. Basic roles for all cluster machines part 1.'
hosts:
- sys-admin-interface
- cluster
roles:
- admin-users
- ssh_host_signer
- ssh_known_hosts
- spacewalk_client
- logins
- figlet_motd
- mount-volume
- ldap
- node_exporter
- static-hostname-lookup
- cluster
- sshd
- resolver
- shared_storage
- coredumps

- name: 'C. Create LDAP account server.'
hosts:
- ldap-server
roles:
- role: openldap
when:
- use_ldap | default(true, true) | bool
- create_ldap | default(false, true) | bool

- name: 'D. Basic roles for all cluster machines part 2.'
hosts:
- cluster
roles:
- ldap # client
- sshd
- regular-users
- shared_storage

- hosts: slurm-management
- name: 'E. Roles for SAIs.'
hosts:
- sys-admin-interface
roles:
- mount-volume
- slurm-management
- prom_server
- grafana
@@ -94,70 +125,31 @@
hostname_node0: "{{ ansible_hostname }}"
ip_node0: "{{ ansible_default_ipv4['address'] }}"

- name: 'C. Roles for DAIs.'
- name: 'F. Roles for DAIs.'
hosts: deploy-admin-interface
roles:
- admin-users
- ssh_host_signer
- ssh_known_hosts
- spacewalk_client
- logins
- figlet_motd
- mount-volume
- build-environment
- ldap
- node_exporter
- static-hostname-lookup
- cluster
- sshd
- resolver
- shared_storage
- regular-users
- envsync

- name: 'D. Roles for UIs.'
- name: 'G. Roles for UIs.'
hosts: user-interface
roles:
- admin-users
- ssh_host_signer
- ssh_known_hosts
- spacewalk_client
- logins
- figlet_motd
- build-environment
- ldap
- node_exporter
- static-hostname-lookup
- cluster
- sshd
- resolver
- shared_storage
- slurm_exporter
- slurm-client
- regular-users
- sudoers
- subgroup_directories
- role: fuse-layer
when: fuse_mountpoint is defined and fuse_mountpoint | length >= 1

- name: 'E. Roles for compute nodes.'
- name: 'H. Roles for compute nodes.'
hosts: compute-vm
roles:
- admin-users
- ssh_host_signer
- ssh_known_hosts
- spacewalk_client
- logins
- figlet_motd
- mount-volume
- ldap
- node_exporter
- static-hostname-lookup
- cluster
- sshd
- resolver
- shared_storage
- slurm-client
- regular-users

- name: 'F. Roles for documentation servers.'
- name: 'I. Roles for documentation servers.'
hosts:
- docs
roles:
1 change: 0 additions & 1 deletion galaxy-requirements.yml
@@ -1,7 +1,6 @@
---
- src: geerlingguy.firewall
version: 2.4.0
- src: geerlingguy.postfix
- src: geerlingguy.repo-epel
- src: geerlingguy.security
...
2 changes: 1 addition & 1 deletion group_vars/boxy-cluster/vars.yml
@@ -2,7 +2,7 @@
slurm_cluster_name: 'boxy'
slurm_cluster_domain: 'hpc.rug.nl'
stack_prefix: 'bx'
-uri_ldap: 172.23.40.249
+ldap_uri: ldap://172.23.40.249
ldap_base: ou=umcg,o=asds
ldap_binddn: cn=clusteradminumcg,o=asds
regular_groups:
3 changes: 1 addition & 2 deletions group_vars/fender-cluster/vars.yml
@@ -30,8 +30,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/ca-key-production-ebi"
use_ldap: yes
create_ldap: yes
-uri_ldap: fd-dai
-uri_ldaps: fd-dai
+ldap_uri: ldap://fd-dai
ldap_port: 389
ldaps_port: 636
ldap_base: dc=hpc,dc=rug,dc=nl
3 changes: 1 addition & 2 deletions group_vars/gearshift-cluster/vars.yml
@@ -59,8 +59,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/umcg-hpc-ca"
use_ldap: yes
create_ldap: no
-uri_ldap: '172.23.40.249'
-uri_ldaps: 'comanage-in.id.rug.nl'
+ldap_uri: 'ldap://172.23.40.249'
ldap_port: '389'
ldaps_port: '636'
ldap_base: 'ou=research,o=asds'
9 changes: 1 addition & 8 deletions group_vars/hyperchicken-cluster/vars.yml
@@ -30,8 +30,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/umcg-hpc-development-ca"
use_ldap: yes
create_ldap: yes
-uri_ldap: hc-dai
-uri_ldaps: hc-dai
+ldap_uri: ldap://hc-dai
ldap_port: 389
ldaps_port: 636
ldap_base: dc=hpc,dc=rug,dc=nl
@@ -65,9 +64,6 @@ nameservers: [
local_admin_groups:
- 'admin'
- 'docker'
-- 'solve-rd'
-- 'umcg-atd'
-- 'depad'
local_admin_users:
- 'centos'
- 'egon'
@@ -77,9 +73,6 @@ local_admin_users:
- 'morris'
- 'pieter'
- 'wim'
-- 'umcg-atd-dm'
-- 'solve-rd-dm'
-- 'envsync'
envsync_user: 'envsync'
envsync_group: 'depad'
hpc_env_prefix: '/apps'
3 changes: 1 addition & 2 deletions group_vars/marvin-cluster/vars.yml
@@ -30,8 +30,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/ca-key-production-ebi"
use_ldap: yes
create_ldap: yes
-uri_ldap: mv-dai
-uri_ldaps: mv-dai
+ldap_uri: ldap://mv-dai
ldap_port: 389
ldaps_port: 636
ldap_base: dc=ejp,dc=rd,dc=nl
3 changes: 1 addition & 2 deletions group_vars/nibbler-cluster/vars.yml
@@ -27,8 +27,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/umcg-hpc-development-ca"
use_ldap: yes
create_ldap: no
-uri_ldap: ldap.pilot.scz.lab.surf.nl
-uri_ldaps: ldap.pilot.scz.lab.surf.nl
+ldap_uri: ldap://ldap.pilot.scz.lab.surf.nl
ldap_port: 636
ldaps_port: 636
ldap_base: o=ElixirNL,dc=pilot-clients,dc=scz,dc=lab,dc=surf,dc=nl
3 changes: 1 addition & 2 deletions group_vars/talos-cluster/vars.yml
@@ -45,8 +45,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/umcg-hpc-development-ca"
use_ldap: yes
create_ldap: no
-uri_ldap: '172.23.40.249'
-uri_ldaps: 'comanage-in.id.rug.nl'
+ldap_uri: 'ldap://172.23.40.249'
ldap_port: '389'
ldaps_port: '636'
ldap_base: 'ou=umcg,o=asds'
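
With `ldap_uri` now carrying the protocol in all `group_vars` files above, anything that still needs the bare host has to split the value. A minimal sketch in shell, assuming only the `scheme://host[:port]` forms seen in these files; the helper names `ldap_uri_scheme` and `ldap_uri_host` are hypothetical and do not come from the roles in this PR. IPv6 literals and values without `://` are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of this PR.

# Print everything before '://', e.g. 'ldap' or 'ldaps'.
ldap_uri_scheme() {
    printf '%s\n' "${1%%://*}"
}

# Print the host part without scheme, path or port.
ldap_uri_host() {
    local rest="${1#*://}"        # Drop the scheme.
    rest="${rest%%/*}"            # Drop any path suffix.
    printf '%s\n' "${rest%%:*}"   # Drop an optional ':port'.
}

ldap_uri_scheme 'ldap://fd-dai'
ldap_uri_host 'ldap://172.23.40.249'
```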
4 changes: 3 additions & 1 deletion roles/cluster/tasks/build_lustre_client.yml
@@ -5,5 +5,7 @@
dest: '/tmp/lustre-client-dkms-2.11.0-1.el7.src.rpm'

- name: 'Build the Lustre client.'
-command: rpmbuild --rebuild --without servers /tmp/lustre-client-dkms-2.11.0-1.el7.src.rpm
+command:
+  cmd: 'rpmbuild --rebuild --without servers /tmp/lustre-client-dkms-2.11.0-1.el7.src.rpm'
+  creates: '/tmp/lustre-client-dkms-2.11.0-1.el7.src.rpm.rebuild'
...
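
The `creates` argument tells Ansible's `command` module to skip the task when the named file already exists, which is what makes the build task idempotent. The same guard can be sketched in plain shell. The helper `run_once` and its marker path are hypothetical, and unlike the Ansible task this sketch creates the marker itself after a successful run, since `creates` only checks for the file and never writes it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): run a command only if a marker
# file does not exist yet.
run_once() {
    local marker="$1"
    shift
    if [ -e "${marker}" ]; then
        echo "skipped: ${marker} exists"
        return 0
    fi
    # Run the command and record success by creating the marker.
    "$@" && touch "${marker}"
}

# Usage (hypothetical marker path):
# run_once /tmp/build.done rpmbuild --rebuild --without servers /tmp/some.src.rpm
```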
12 changes: 8 additions & 4 deletions roles/cluster/tasks/main.yml
@@ -50,8 +50,10 @@
become: true

- name: Check if rsync >= 3.1.2 is installed on the managed hosts.
-shell: |
-  rsync --version 2>&1 | head -n 1 | sed 's|^rsync *version *\([0-9\.]*\).*$|\1|' | tr -d '\n'
+shell:
+  cmd: |
+    set -o pipefail
+    rsync --version 2>&1 | head -n 1 | sed 's|^rsync *version *\([0-9\.]*\).*$|\1|' | tr -d '\n'
args:
warn: no
changed_when: false
@@ -66,8 +68,10 @@
failed_when: 'rsync_version_managed_host is failed or (rsync_version_managed_host.stdout is version_compare("3.1.2", operator="<"))'

- name: Check if rsync >= 3.1.2 is installed on the control host.
-shell: |
-  rsync --version 2>&1 | head -n 1 | sed 's|^rsync *version *\([0-9\.]*\).*$|\1|' | tr -d '\n'
+shell:
+  cmd: |
+    set -o pipefail
+    rsync --version 2>&1 | head -n 1 | sed 's|^rsync *version *\([0-9\.]*\).*$|\1|' | tr -d '\n'
args:
warn: no
changed_when: false
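
Two things happen in these rsync tasks: `set -o pipefail` makes the pipeline's exit status reflect a failure in any stage, and the `sed` expression reduces the first line of `rsync --version` to the bare version number. A minimal sketch of both follows. The function names are hypothetical, and `version_ge` relies on `sort -V` from GNU coreutils as a stand-in for the `version_compare` test the role uses, so treat it as an approximation rather than the role's logic.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration, not part of this PR.

# Print the exit status of 'false | true' with pipefail 'on' or 'off'.
# Without pipefail only the last command in the pipeline counts.
pipeline_status() {
    if [ "$1" = 'on' ]; then
        bash -c 'set -o pipefail; false | true; echo $?'
    else
        bash -c 'false | true; echo $?'
    fi
}

# Same sed expression as in the tasks above, applied to a line on stdin.
extract_rsync_version() {
    sed 's|^rsync *version *\([0-9\.]*\).*$|\1|' | tr -d '\n'
}

# Succeed when version $1 >= version $2 (assumes GNU 'sort -V').
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
# rsync --version 2>&1 | head -n 1 | extract_rsync_version
# version_ge "${installed_version}" '3.1.2' || echo 'rsync is too old'
```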