Feature: new data transfer servers #451

Merged
merged 37 commits on Aug 30, 2021
37 commits
5f2e8e9
Added data staging server to inventory for Nibbler and added code to …
pneerincx Aug 23, 2021
c7b8944
Improved readability of long variable.
pneerincx Aug 23, 2021
d5ab514
Fixed comment in dynamic inventory.
pneerincx Aug 23, 2021
9d293d7
Added data staging server to inventory for Nibbler and added code to …
pneerincx Aug 23, 2021
a4461f7
Added data_staging inventory group to several single role playbooks.
pneerincx Aug 23, 2021
c081ee4
Removed very long lists of hostnames out of templates and replaced th…
pneerincx Aug 23, 2021
3574a25
Moved filtering of users/groups that should not be allowed acces from…
pneerincx Aug 23, 2021
64ac585
Added data_staging inventory group to roles/regular_users/tasks/main.…
pneerincx Aug 23, 2021
c406faf
Removed home module from rsyncd configuration: use module for groups …
pneerincx Aug 23, 2021
c571bc4
Added "connection: local" to make tasks work with static inventories …
pneerincx Aug 23, 2021
68f5d46
Added data staging server to inventory for Nibbler and added code to …
pneerincx Aug 23, 2021
74bf220
Improved readability of long variable.
pneerincx Aug 23, 2021
0ecbee2
Fixed comment in dynamic inventory.
pneerincx Aug 23, 2021
306d497
Added data staging server to inventory for Nibbler and added code to …
pneerincx Aug 23, 2021
369c3bd
Added data_staging inventory group to several single role playbooks.
pneerincx Aug 23, 2021
c8bc453
Removed very long lists of hostnames out of templates and replaced th…
pneerincx Aug 23, 2021
229a731
Moved filtering of users/groups that should not be allowed acces from…
pneerincx Aug 23, 2021
5a7e9b4
Added data_staging inventory group to roles/regular_users/tasks/main.…
pneerincx Aug 23, 2021
2bac6f9
Removed home module from rsyncd configuration: use module for groups …
pneerincx Aug 23, 2021
f48c0ee
Added "connection: local" to make tasks work with static inventories …
pneerincx Aug 23, 2021
519692e
Updated mount_volume role: added support for multiple volumes as oppo…
pneerincx Aug 24, 2021
f0211d0
Fixed merge confict.
pneerincx Aug 24, 2021
16abedd
Create group folders in /groups on data staging servers.
pneerincx Aug 24, 2021
0083eee
Added mount_volume role to single role playbook for data staging serv…
pneerincx Aug 25, 2021
8e8fc13
Added data_staging group to single rol playbook for logins role.
pneerincx Aug 25, 2021
aa93ba0
Added login check script to create chrooted fake homes for users in t…
pneerincx Aug 25, 2021
b9f0c8f
Updated rsyncd role: added separate configs for regular users and use…
pneerincx Aug 25, 2021
6f8a66c
Added data_transfer_only_group variable for Gearshift, Talos and Nibb…
pneerincx Aug 25, 2021
8a6555c
Update sshd role to use data_transfer_only_group variable.
pneerincx Aug 25, 2021
edda842
Renamed data_staging inventory group to data_transfer group to preven…
pneerincx Aug 25, 2021
5795b1f
Renamed data_staging inventory group to data_transfer group to preven…
pneerincx Aug 25, 2021
2258244
Renamed nb-ds -> nb-dt.
pneerincx Aug 26, 2021
18374fa
Fixed formatting / linter issues.
pneerincx Aug 26, 2021
79f0b51
Renamed group_vars/data_staging.yml -> group_vars/data_transfer.yml.
pneerincx Aug 26, 2021
ca0916e
Fixed linter issue.
pneerincx Aug 26, 2021
be0e937
wrapped long variable for readability.
pneerincx Aug 26, 2021
b7c56ae
Fixed permissions for data_transfer_only_group.
pneerincx Aug 26, 2021
177 changes: 175 additions & 2 deletions deploy-os_servers.yml
@@ -108,6 +108,9 @@
# portip: 10.10.1.1
wait: true
timeout: "{{ openstack_api_timeout }}"
#
# Jumphosts security group.
#
- name: "Create security group for {{ slurm_cluster_name }} jumphosts."
openstack.cloud.security_group:
state: present
@@ -159,6 +162,63 @@
remote_ip_prefix: 0.0.0.0/0
wait: true
timeout: "{{ openstack_api_timeout }}"
#
# Data staging security group.
#
- name: "Create security group for {{ slurm_cluster_name }} data staging servers."
openstack.cloud.security_group:
state: present
name: "{{ slurm_cluster_name }}_ds"
description: |
Security group for data staging servers without access to the internal network.
Allows SSH inbound on ports 22 and 443.
Allows ICMP inbound.
Allows all outbound traffic.
wait: true
timeout: "{{ openstack_api_timeout }}"
- name: "Add rule to {{ slurm_cluster_name }}_ds security group: allow SSH inbound on port 22."
openstack.cloud.security_group_rule:
security_group: "{{ slurm_cluster_name }}_ds"
direction: ingress
protocol: tcp
port_range_min: 22
port_range_max: 22
remote_ip_prefix: 0.0.0.0/0
wait: true
timeout: "{{ openstack_api_timeout }}"
- name: "Add rule to {{ slurm_cluster_name }}_ds security group: allow SSH inbound on port 443."
openstack.cloud.security_group_rule:
security_group: "{{ slurm_cluster_name }}_ds"
direction: ingress
protocol: tcp
port_range_min: 443
port_range_max: 443
remote_ip_prefix: 0.0.0.0/0
wait: true
timeout: "{{ openstack_api_timeout }}"
- name: "Add rule to {{ slurm_cluster_name }}_ds security group: allow LDAPS inbound on port 636."
openstack.cloud.security_group_rule:
security_group: "{{ slurm_cluster_name }}_ds"
direction: ingress
protocol: tcp
port_range_min: 636
port_range_max: 636
remote_ip_prefix: 0.0.0.0/0 # ToDo restrict to {{ ldap_uri }}
wait: true
timeout: "{{ openstack_api_timeout }}"
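#
# One possible way to address the ToDo above (a sketch only, untested here):
# resolve the LDAP server's address from ldap_uri and restrict the rule, e.g.
#   remote_ip_prefix: "{{ lookup('community.general.dig', ldap_uri | urlsplit('hostname')) }}/32"
# (urlsplit is a built-in filter; the dig lookup requires the dnspython library.)
#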
- name: "Add rule to {{ slurm_cluster_name }}_ds security group: allow ICMP inbound."
openstack.cloud.security_group_rule:
security_group: "{{ slurm_cluster_name }}_ds"
direction: ingress
protocol: icmp
port_range_min: -1 # ICMP protocol does not have any ports.
port_range_max: -1 # ICMP protocol does not have any ports.
remote_ip_prefix: 0.0.0.0/0
wait: true
timeout: "{{ openstack_api_timeout }}"
#
# Cluster security group.
#
- name: Create security group for {{ slurm_cluster_name }} cluster machines behind jumphost.
openstack.cloud.security_group:
state: present
@@ -289,6 +349,94 @@
when: (jumphost_vm.server.addresses | dict2items(key_name='vlan', value_name='specs') | json_query(query)) != network_private_management_id
vars:
query: '[?specs[?"OS-EXT-IPS:type"==`floating`]].vlan | [0]'
- name: Make sure jumphosts are started.
openstack.cloud.server_action:
action: start
server: "{{ inventory_hostname }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
##############################################################################
# Configure data staging servers from inventory using Openstack API.
##############################################################################
- name: Create data staging servers.
hosts: data_transfer
connection: local
gather_facts: false
vars:
#
# Disable Ansible's interpreter detection logic,
# which fails to use the interpreter from an activated virtual environment
# and hence fails to find the OpenStackSDK if it was installed in a virtual environment.
#
- ansible_python_interpreter: python
tasks:
- name: Create persistent data volume for data staging servers.
openstack.cloud.volume:
display_name: "{{ inventory_hostname }}-volume"
size: "{{ local_volume_size_ds }}"
state: present
availability_zone: "{{ storage_availability_zone | default('nova') }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
- name: Create data staging servers.
openstack.cloud.server:
state: present
name: "{{ inventory_hostname }}"
image: "{{ cloud_image }}"
volume_size: "{{ local_volume_size_root }}"
terminate_volume: true
boot_from_volume: true
flavor: "{{ flavor_ds }}"
security_groups: "{{ slurm_cluster_name }}_ds"
auto_floating_ip: false
nics:
- net-name: "{{ network_private_management_id }}"
userdata: |
#cloud-config
password: "{{ cloud_console_pass }}"
chpasswd: { expire: False }
#
# Add each entry to ~/.ssh/authorized_keys for the configured user
# or the first user defined in the user definition directive.
#
ssh_authorized_keys:
{% for public_key in public_keys_of_local_admins %} - {{ public_key }}
{% endfor %}
availability_zone: "{{ availability_zone }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
register: ds_vm
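#
# For illustration: with invented values the userdata template above renders
# to cloud-init config along these lines:
#   #cloud-config
#   password: "some-console-password"
#   chpasswd: { expire: False }
#   ssh_authorized_keys:
#    - ssh-ed25519 AAAAC3Nza...key1 admin1
#    - ssh-ed25519 AAAAC3Nza...key2 admin2
#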
- name: Assign floating IPs to data staging servers.
openstack.cloud.floating_ip:
server: "{{ inventory_hostname }}"
state: present
reuse: true
network: "{{ network_public_external_id }}"
nat_destination: "{{ network_private_management_id }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
#
# Known bug https://github.com/ansible/ansible/issues/57451
# openstack.cloud.floating_ip is not idempotent:
# it succeeds only the first time and throws an error on any subsequent calls.
# Therefore we use a "when" with a silly complex Jinja filter including a JMESPath query
# to check if the VM already has a floating IP linked to an interface in the correct VXLAN.
#
when: (ds_vm.server.addresses | dict2items(key_name='vlan', value_name='specs') | json_query(query)) != network_private_management_id
vars:
query: '[?specs[?"OS-EXT-IPS:type"==`floating`]].vlan | [0]'
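#
# Illustration of the query above with invented values: given
#   addresses:
#     internal_management:
#       - { addr: 10.10.1.28, "OS-EXT-IPS:type": fixed }
#       - { addr: 195.169.22.166, "OS-EXT-IPS:type": floating }
# dict2items yields [{vlan: internal_management, specs: [...]}] and the
# JMESPath query returns 'internal_management': the VLAN that holds the
# floating IP. The "when" comparison is then false and the task is skipped.
# Before a floating IP exists the query returns null, so the task runs.
#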
- name: Attach local storage volume to data staging servers.
openstack.cloud.server_volume:
server: "{{ inventory_hostname }}"
volume: "{{ inventory_hostname }}-volume"
wait: true
timeout: "{{ openstack_api_timeout }}"
- name: Make sure data staging servers are started.
openstack.cloud.server_action:
action: start
server: "{{ inventory_hostname }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
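#
# With the play above in place, the data transfer servers can be deployed on
# their own by limiting a run to the new inventory group, e.g.
# (invocation assumed, not prescribed by this repo):
#   ansible-playbook -i inventory.py -l data_transfer deploy-os_servers.yml
#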
##############################################################################
# Configure repo servers from inventory using Openstack API.
##############################################################################
@@ -331,7 +479,12 @@
availability_zone: "{{ availability_zone }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
register: repo_vm
- name: Make sure repo servers are started.
openstack.cloud.server_action:
action: start
server: "{{ inventory_hostname }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
##############################################################################
# Configure UIs from inventory using Openstack API.
##############################################################################
@@ -375,6 +528,12 @@
availability_zone: "{{ availability_zone }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
- name: Make sure UIs are started.
openstack.cloud.server_action:
action: start
server: "{{ inventory_hostname }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
##############################################################################
# Configure compute nodes from inventory using Openstack API.
##############################################################################
@@ -433,6 +592,12 @@
volume: "{{ inventory_hostname }}-volume"
wait: true
timeout: "{{ openstack_api_timeout }}"
- name: Make sure compute nodes are started.
openstack.cloud.server_action:
action: start
server: "{{ inventory_hostname }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
#############################################################################
# Configure SAI from inventory using Openstack API.
#############################################################################
@@ -550,6 +715,12 @@
volume: "{{ inventory_hostname }}-volume"
wait: true
timeout: "{{ openstack_api_timeout }}"
- name: Make sure admin interface servers are started.
openstack.cloud.server_action:
action: start
server: "{{ inventory_hostname }}"
wait: true
timeout: "{{ openstack_api_timeout }}"
##############################################################################
# Get IPs addresses from API for static hostname lookup with /etc/hosts.
##############################################################################
@@ -578,7 +749,9 @@
dest: "{{ playbook_dir }}/group_vars/{{ slurm_cluster_name }}_cluster/ip_addresses.yml.new"
mode: '0644'
vars:
relevant_servers_list: "{{ groups['jumphost'] | default([]) + groups['repo'] | default([]) + groups['cluster'] | default([]) }}"
relevant_servers_list: "{{ groups['jumphost'] | default([]) + \
groups['data_transfer'] | default([]) + \
groups['repo'] | default([]) + groups['cluster'] | default([]) }}"
relevant_servers_info: "{{ api_server_info.openstack_servers | selectattr('name', 'in', relevant_servers_list) | list }}"
- name: "ToDo"
debug:
3 changes: 0 additions & 3 deletions group_vars/administration.yml

This file was deleted.

17 changes: 17 additions & 0 deletions group_vars/data_transfer.yml
@@ -0,0 +1,17 @@
---
firewall_allowed_tcp_ports:
- 22 # SSH.
- 443 # SSH fallback when 22 is blocked.
ssh_host_signer_hostnames: "{{ ansible_hostname }}\
{% if slurm_cluster_domain | length %},{{ ansible_hostname }}.{{ slurm_cluster_domain }}{% endif %}\
{% if public_ip_addresses is defined and public_ip_addresses[ansible_hostname] | length %},{{ public_ip_addresses[ansible_hostname] }}{% endif %}\
{% for host in groups['jumphost'] %},{{ host }}+{{ ansible_hostname }}{% endfor %}"
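# Example rendering with invented values: for ansible_hostname 'nb-dt', an
# empty slurm_cluster_domain, public IP '195.169.22.166' and a single
# jumphost named 'nb-porch' (name invented), this evaluates to:
#   "nb-dt,195.169.22.166,nb-porch+nb-dt"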
volumes:
- mount_point: '/groups'
device: '/dev/vdb'
mounted_owner: root
mounted_group: root
mounted_mode: '0755'
mount_options: 'rw,relatime'
type: ext4
...
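The volumes list above is consumed by the mount_volume role, which commit 519692e extended to support multiple volumes. The role's tasks are not part of this diff; a minimal sketch of such a loop, assuming the ansible.posix.mount module, could look like:

    - name: Mount each volume from the volumes list.
      ansible.posix.mount:
        path: "{{ item.mount_point }}"
        src: "{{ item.device }}"
        fstype: "{{ item.type }}"
        opts: "{{ item.mount_options }}"
        state: mounted
      loop: "{{ volumes }}"

The mounted_owner, mounted_group and mounted_mode keys would then feed a follow-up task that sets ownership and permissions on the mount point.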
10 changes: 10 additions & 0 deletions group_vars/deploy_admin_interface.yml
@@ -0,0 +1,10 @@
---
volumes:
- mount_point: '/apps'
device: '/dev/vdb'
mounted_owner: root
mounted_group: "{{ envsync_group }}"
mounted_mode: '2755'
mount_options: 'rw,relatime'
type: ext4
...
8 changes: 6 additions & 2 deletions group_vars/docs.yml
@@ -13,6 +13,10 @@ extra_jumphosts_for_docs_server:
- 'corridor' # Fender
- 'dockingport' # Marvin
ssh_host_signer_hostnames: "{{ ansible_fqdn }},{{ ansible_hostname }}\
{% for jumphost_for_this_cluster in groups['jumphost'] %},{{ jumphost_for_this_cluster }}+{{ ansible_hostname }}{% endfor %}\
{% for extra_jumphost in extra_jumphosts_for_docs_server %},{{ extra_jumphost }}+{{ ansible_hostname }}{% endfor %}"
{% for jumphost_for_this_cluster in groups['jumphost'] %}\
,{{ jumphost_for_this_cluster }}+{{ ansible_hostname }}\
{% endfor %}\
{% for extra_jumphost in extra_jumphosts_for_docs_server %}\
,{{ extra_jumphost }}+{{ ansible_hostname }}\
{% endfor %}"
...
1 change: 1 addition & 0 deletions group_vars/gearshift_cluster/vars.yml
@@ -63,6 +63,7 @@ ldap_group_quota_hard_limit_template: 'ruggroupumcgquotaLFS'
filter_passwd: '(|(rugpersonentitlementvalue=scz)(rugpersonentitlementvalue=umcg))'
filter_shadow: '(|(rugpersonentitlementvalue=scz)(rugpersonentitlementvalue=umcg))'
pam_authz_search: '(|(&(objectClass=posixGroup)(cn=co_bbmri_g-GRP_Gearshift)(memberUid=$username))(&(cn=$username)(rugpersonentitlementvalue=umcg)))'
data_transfer_only_group: umcg-sftp-only
nameservers: [
'172.23.40.244', # Order is important: local DNS for Isilon storage first!
'8.8.4.4', # Google DNS.
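The data_transfer_only_group added here is consumed by the sshd role (commit 8a6555c), and commit aa93ba0 adds a login check script that creates chrooted fake homes for its members. The role's template is not shown in this diff, but the intended effect is presumably an sshd_config Match block along these lines (a sketch only, not the actual template output):

    Match Group umcg-sftp-only
        ChrootDirectory %h
        ForceCommand internal-sftp
        AllowTcpForwarding no
        X11Forwarding no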
14 changes: 10 additions & 4 deletions group_vars/jumphost.yml
@@ -1,9 +1,15 @@
---
firewall_allowed_tcp_ports:
- "22" # SSH.
- "443" # SSH fallback when 22 is blocked.
- "3000" # Grafana server.
- 22 # SSH.
- 443 # SSH fallback when 22 is blocked.
- 3000 # Grafana server.
firewall_additional_rules:
- "iptables -A INPUT -i eth1 -p tcp -s 129.125.2.233,129.125.2.225,129.125.2.226 --dport 9090 -j ACCEPT -m comment --comment 'prometheus server'"
ssh_host_signer_hostnames: "{{ ansible_hostname }}{% if slurm_cluster_domain | length %}.{{ slurm_cluster_domain }}{% endif %},{{ ansible_hostname }}{% if public_ip_addresses is defined and public_ip_addresses[ansible_hostname] | length %},{{ public_ip_addresses[ansible_hostname] }}{% endif %}"
ssh_host_signer_hostnames: "{{ ansible_hostname }}\
{% if slurm_cluster_domain | length %}\
,{{ ansible_hostname }}.{{ slurm_cluster_domain }}\
{% endif %}\
{% if public_ip_addresses[ansible_hostname] is defined and public_ip_addresses[ansible_hostname] | length %}\
,{{ public_ip_addresses[ansible_hostname] }}\
{% endif %}"
...
5 changes: 5 additions & 0 deletions group_vars/nibbler_cluster/ip_addresses.yml
@@ -1,5 +1,10 @@
---
ip_addresses:
nb-dt:
addr: 10.10.1.28
mask: /32
vlan: internal_management
fqdn:
nb-repo:
addr: 10.10.1.56
mask: /32
6 changes: 5 additions & 1 deletion group_vars/nibbler_cluster/vars.yml
@@ -58,9 +58,11 @@ ldap_binddn: 'cn=clusteradminumcg,o=asds'
ldap_group_object_class: 'groupofnames'
ldap_group_quota_soft_limit_template: 'ruggroupumcgquotaLFSsoft'
ldap_group_quota_hard_limit_template: 'ruggroupumcgquotaLFS'
data_transfer_only_group: umcg-sftp-only
cloud_image: CentOS 7
cloud_user: centos
flavor_jumphost: m1.small
flavor_ds: m1.small
flavor_ui: m1.xlarge
#flavor_vcompute: htc-node # Quota currently too small for 40 core compute nodes.
flavor_vcompute: m1.xlarge
@@ -73,12 +75,13 @@ network_private_storage_id: internal_storage
network_private_storage_cidr: '10.10.2.0/24'
public_ip_addresses:
tunnel: '195.169.22.136'
nb-ds: '195.169.22.166'
availability_zone: nova
local_volume_size_ds: 10000
local_volume_size_repo: 20
local_volume_size_vcompute: 1
local_volume_size_dai: 200 # used for /apps, but test server shouldn't take too much space
local_volume_size_sai: 1

nameservers: [
'8.8.4.4', # Google DNS.
'8.8.8.8', # Google DNS.
@@ -109,6 +112,7 @@ regular_groups:
- 'umcg-depad'
- 'umcg-gcc'
- 'umcg-lifelines'
- "{{ data_transfer_only_group }}"
regular_users:
- user: 'umcg-atd-ateambot'
groups: ['umcg-atd']
1 change: 1 addition & 0 deletions group_vars/talos_cluster/vars.yml
@@ -61,6 +61,7 @@ ldap_binddn: 'cn=clusteradminumcg,o=asds'
ldap_group_object_class: 'groupofnames'
ldap_group_quota_soft_limit_template: 'ruggroupumcgquotaLFSsoft'
ldap_group_quota_hard_limit_template: 'ruggroupumcgquotaLFS'
data_transfer_only_group: umcg-sftp-only
nameservers: [
'/gcc-storage001.stor.hpc.local/172.23.40.244', # Local DNS lookups for shared storage.
'8.8.4.4', # Google DNS.
2 changes: 1 addition & 1 deletion inventory.py
@@ -16,7 +16,7 @@
with the hostname of one of our proxy/jumphost servers.
Note we only use hostnames and not FQDN nor IP addresses as those are managed
together with usernames and other connection settings in
our ~/.ssh/config files like this:
our ~/.ssh/conf.d/ files like this:

########################################################################################################
#