20210218 zrq infraops #381

Merged 27 commits on Mar 3, 2021

Commits
ffe2137
Removed gateway node
Zarquan Feb 11, 2021
5f1cb10
Fix to catch all the keys created by create-all
Zarquan Feb 11, 2021
d0906a1
Finish notes on cherry picking
Zarquan Feb 11, 2021
2432401
Moved Hadoop, Spark and Zeppelin vars into hosts.yml
Zarquan Feb 11, 2021
090ba72
Fix a problem with Fedora updates
Zarquan Feb 11, 2021
287a164
Volume mounts for temp space
Zarquan Feb 11, 2021
ebdb26d
Notes on git branches
Zarquan Feb 13, 2021
fb30bea
Configuring Hadoop and HDFS directories
Zarquan Feb 13, 2021
7ba885d
Performance optimizations
Zarquan Feb 13, 2021
56082d9
....
Zarquan Feb 13, 2021
8626e08
....
Zarquan Feb 13, 2021
7f8a589
....
Zarquan Feb 13, 2021
bc2da47
....
Zarquan Feb 15, 2021
d2ee384
Added jq|sed parser for notebook timings
Zarquan Feb 15, 2021
a8a25c9
Performance testing
Zarquan Feb 15, 2021
63271a2
....
Zarquan Feb 15, 2021
7ddae33
....
Zarquan Feb 16, 2021
58c0a30
Renaming notes
Zarquan Feb 16, 2021
6447eb3
Notes on ML meeting and OS resources
Zarquan Feb 18, 2021
b0fdaee
Made a start on Infra-Ops work
Zarquan Feb 19, 2021
a18c7da
Added Ansible role for DNSmasq service
Zarquan Feb 21, 2021
cbefdfc
First pass at configuring DNSmasq server
Zarquan Feb 22, 2021
e2e6368
Generating DNSmasq config from hosts.yml
Zarquan Feb 23, 2021
9a40bf9
Added monitoring tools
Zarquan Feb 23, 2021
5532983
Added check for empty list and updated records in LCN
Zarquan Feb 23, 2021
9901dfe
Notes on updating public DNS records
Zarquan Feb 23, 2021
d3f45fd
Notes on DNS service failure
Zarquan Feb 25, 2021
4 changes: 2 additions & 2 deletions experiments/hadoop-yarn/ansible/03-create-masters.yml
@@ -35,7 +35,7 @@
  register:
  mastersec

- - name: "Add a rule to allow SSH from our gateway"
+ - name: "Add a rule to allow SSH from zeppelin"
  os_security_group_rule:
  cloud: "{{ cloudname }}"
  state: present
@@ -44,7 +44,7 @@
  protocol: 'tcp'
  port_range_min: 22
  port_range_max: 22
- remote_group: "{{ security['gateway'] }}"
+ remote_group: "{{ security['zeppelin'] }}"

  - name: "Create our masters"
  os_server:
4 changes: 2 additions & 2 deletions experiments/hadoop-yarn/ansible/04-create-workers.yml
@@ -35,7 +35,7 @@
  register:
  secgroup

- - name: "Add a rule to allow ssh from the gateway"
+ - name: "Add a rule to allow ssh from zeppelin"
  os_security_group_rule:
  cloud: "{{ cloudname }}"
  state: present
@@ -44,7 +44,7 @@
  protocol: 'tcp'
  port_range_min: 22
  port_range_max: 22
- remote_group: "{{ security['gateway'] }}"
+ remote_group: "{{ security['zeppelin'] }}"

  - name: "Create our workers"
  os_server:
50 changes: 50 additions & 0 deletions experiments/hadoop-yarn/ansible/04-update-fedora.yml
@@ -0,0 +1,50 @@
#
# <meta:header>
# <meta:licence>
# Copyright (c) 2020, ROE (http://www.roe.ac.uk/)
#
# This information is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This information is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# </meta:licence>
# </meta:header>
#
#

# ignore_errors
# https://docs.ansible.com/ansible/latest/user_guide/playbooks_error_handling.html#ignoring-failed-commands

- name: "DNF update"
gather_facts: false
hosts: masters:workers:zeppelin
vars_files:
- /tmp/ansible-vars.yml
tasks:

# This is a noop to force a cache-refresh.
- name: "Update the DNF cache"
become: true
ignore_errors: yes
dnf:
name: 'kernel'
state: present
update_cache: yes


- name: "Install monitoring tools"
become: true
dnf:
name:
- 'atop'
- 'htop'
state: present

6 changes: 3 additions & 3 deletions experiments/hadoop-yarn/ansible/05-config-ssh.yml
@@ -35,12 +35,12 @@
  mode: 'u=rwx,g=rx,o=rx'
  state: directory

- - name: "Discover our gateway nodes"
+ - name: "Discover our zeppelin node"
  os_server_info:
  cloud: "{{ cloudname }}"
- server: "{{ deployname }}-gateway"
+ server: "{{ deployname }}-zeppelin"
  register:
- gatewaynodes
+ zeppelinnodes

  - name: "Generate Ansible SSH config"
  template:
26 changes: 4 additions & 22 deletions experiments/hadoop-yarn/ansible/06-config-dns.yml
@@ -26,12 +26,12 @@
  - /tmp/ansible-vars.yml
  tasks:

- - name: "Discover our gateway nodes"
+ - name: "Discover our Zeppelin node"
  os_server_info:
  cloud: "{{ cloudname }}"
- server: "{{ deployname }}-gateway*"
+ server: "{{ deployname }}-zeppelin"
  register:
- gatewaynodes
+ zeppelinnode

  - name: "Discover our master nodes"
  os_server_info:
@@ -47,34 +47,16 @@
  register:
  workernodes

- - name: "Discover our Zeppelin nodes"
- os_server_info:
- cloud: "{{ cloudname }}"
- server: "{{ deployname }}-zeppelin"
- register:
- zeppelinnode
-
  - name: "Generate our DNS hosts file"
  template:
  src: 'templates/dns-hosts.j2'
  dest: "/tmp/aglais-dns-hosts"

- - hosts: gateway
- gather_facts: false
- tasks:
- - name: "Deploy [/etc/hosts] to our gateway"
- become: true
- copy:
- src: /tmp/aglais-dns-hosts
- dest: /etc/hosts
- owner: root
- group: root
- mode: u=rw,g=r,o=r

  - hosts: zeppelin
  gather_facts: false
  tasks:
- - name: "Deploy [/etc/hosts] to our Zeppelin"
+ - name: "Deploy [/etc/hosts] to our Zeppelin node"
  become: true
  copy:
  src: /tmp/aglais-dns-hosts
4 changes: 2 additions & 2 deletions experiments/hadoop-yarn/ansible/07-host-keys.yml
@@ -22,7 +22,7 @@
  # https://everythingshouldbevirtual.com/automation/ansible-ssh-known-host-keys/
  #

- - hosts: gateway
+ - hosts: zeppelin
  gather_facts: false
  tasks:

@@ -50,7 +50,7 @@
  dest: "/tmp/aglais-ssh-hosts"


- - hosts: gateway:masters:workers:zeppelin
+ - hosts: masters:workers:zeppelin
  gather_facts: false
  tasks:
  - name: "Deploy the known hosts file to [/etc/ssh/ssh_known_hosts]"
2 changes: 1 addition & 1 deletion experiments/hadoop-yarn/ansible/08-ping-test.yml
@@ -22,7 +22,7 @@

  ---
  - name: "Ping tests"
- hosts: gateway:masters:workers:zeppelin
+ hosts: zeppelin:masters:workers
  gather_facts: false
  tasks:

5 changes: 3 additions & 2 deletions experiments/hadoop-yarn/ansible/09-worker-volumes.yml
@@ -45,12 +45,13 @@
  become: true
  dnf:
  name: btrfs-progs
- state: latest
+ state: present

- - name: "Mount data volumes for {{ inventory_hostname }}"
+ - name: "Call the mount-volumes task"
  include_tasks: tasks/mount-volumes.yml
  loop: "{{ hostvars[ inventory_hostname ].discs }}"
  loop_control:
  loop_var: disc
+ when: ((disc.type == 'cinder') or (disc.type == 'local'))
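
The `when` clause added above is evaluated once per loop item, so `tasks/mount-volumes.yml` is only included for discs whose `type` is `cinder` or `local`. As a minimal sketch of the kind of inventory data this loop might iterate over: the host name, the group layout, and the `boot` type below are assumptions for illustration, and only the `type` key and its two matched values come from the diff.

```yaml
# Hypothetical hosts.yml excerpt; not taken from this PR.
# Other keys that mount-volumes.yml may need are omitted.
all:
  children:
    workers:
      hosts:
        worker01:
          discs:
            - type: 'local'    # matches the 'when' condition, include runs
            - type: 'cinder'   # matches the 'when' condition, include runs
            - type: 'boot'     # hypothetical type, include is skipped
```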


45 changes: 16 additions & 29 deletions experiments/hadoop-yarn/ansible/11-install-hadoop.yml
@@ -29,13 +29,6 @@
  - name: "Install Hadoop"
  hosts: masters:workers:zeppelin
  gather_facts: false
- vars:
- hdname: "hadoop-3.1.3"
- hdbase: "/opt"
- hdhome: "/opt/hadoop"
- hddata: "/var/local/hadoop"
- hdhost: "{{groups['masters'][0]}}"
- hduser: "{{hostvars[inventory_hostname].login}}"

  tasks:

@@ -46,32 +39,26 @@
  dest: "{{hdbase}}"
  remote_src: yes

- - name: "Create a symbolic link"
+ - name: "Create a symlink for the Hadoop version"
  become: true
  file:
  src: "{{hdname}}"
  path: "{{hdhome}}"
  state: link

- - name: "Create '{{hddata}}'"
- become: true
- file:
- path: "{{hddata}}"
- mode: 'u=rwx,g=rwxs,o=rx'
- state: directory
- recurse: yes
- owner: "{{hduser}}"
- group: "{{hduser}}"
+ - name: "Create Hadoop data directory"
+ include_tasks: "tasks/create-linked.yml"
+ vars:
+ linkdest: "{{hddatadest}}"
+ linkpath: "{{hddatalink}}"
+ linkuser: "{{hduser}}"

- - name: "Create [{{hddata}}/logs]"
- become: true
- file:
- path: "{{hddata}}/logs"
- mode: 'u=rwx,g=rwxs,o=rx'
- state: directory
- recurse: yes
- owner: "{{hduser}}"
- group: "{{hduser}}"
+ - name: "Create Hadoop logs directory"
+ include_tasks: "tasks/create-linked.yml"
+ vars:
+ linkdest: "{{hdlogsdest}}"
+ linkpath: "{{hdlogslink}}"
+ linkuser: "{{hduser}}"

  # https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_Environment_of_Hadoop_Daemons
  - name: "Create [/etc/profile.d/hadoop.sh]"
@@ -89,8 +76,8 @@
  export PATH=${PATH}:{{hdhome}}/bin:{{hdhome}}/sbin
  #export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:{{hdhome}}/lib/native
  export HADOOP_HOME={{hdhome}}
- export HADOOP_DATA={{hddata}}
- export HADOOP_CONF_DIR={{hdhome}}/etc/hadoop
- export HADOOP_LOG_DIR=${HADOOP_DATA}/logs
+ export HADOOP_DATA={{hddatalink}}
+ export HADOOP_CONF_DIR={{hdconf}}
+ export HADOOP_LOG_DIR={{hdlogslink}}
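
The plays in this file now delegate directory creation to `tasks/create-linked.yml`, which is not shown in this section. The following is only a guess at what such a task file could look like, assuming `linkdest` is the real directory (for example on a mounted volume), `linkpath` is a symlink pointing at it, and `linkuser` owns the directory. The mode string is copied from the tasks removed above; everything else is an assumption rather than the file from this PR.

```yaml
# Hypothetical sketch of tasks/create-linked.yml; not the actual file.
# Expects three variables from the caller: linkdest, linkpath, linkuser.

- name: "Create the target directory [{{ linkdest }}]"
  become: true
  file:
    path: "{{ linkdest }}"
    state: directory
    mode: 'u=rwx,g=rwxs,o=rx'
    owner: "{{ linkuser }}"
    group: "{{ linkuser }}"

- name: "Create the symlink [{{ linkpath }}] -> [{{ linkdest }}]"
  become: true
  file:
    src: "{{ linkdest }}"
    path: "{{ linkpath }}"
    state: link
```

A sketch like this would fail if `linkpath` and `linkdest` were the same path, so a real implementation would presumably need to handle that case.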


20 changes: 13 additions & 7 deletions experiments/hadoop-yarn/ansible/12-config-hadoop-core.yml
@@ -23,16 +23,16 @@
  - name: "Configure Hadoop [core-site.xml]"
  hosts: masters:workers:zeppelin
  gather_facts: false
- vars:
- hdname: "hadoop-3.1.3"
- hdbase: "/opt"
- hdhome: "/opt/hadoop"
- hddata: "/var/local/hadoop"
- hdhost: "{{groups['masters'][0]}}"
- hduser: "{{hostvars[inventory_hostname].login}}"

  tasks:

+ - name: "Create Hadoop temp directory"
+ include_tasks: "tasks/create-linked.yml"
+ vars:
+ linkdest: "{{hdtempdest}}"
+ linkpath: "{{hdtemplink}}"
+ linkuser: "{{hduser}}"
+
  # https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_the_Hadoop_Daemons
  # https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/core-default.html
  - name: "Configure [{{hdhome}}/etc/hadoop/core-site.xml]"
@@ -58,4 +58,10 @@
  <value>hdfs://{{hdhost}}:9000</value>
  </property>

+ <property>
+ <name>hadoop.tmp.dir</name>
+ <value>{{hdtemplink}}</value>
+ </property>
+
+

7 changes: 0 additions & 7 deletions experiments/hadoop-yarn/ansible/12-config-ssh-access.yml
@@ -102,13 +102,6 @@
  - name: "Configure Hadoop [workers] on master nodes"
  hosts: masters:zeppelin
  gather_facts: false
- vars:
- hdname: "hadoop-3.1.3"
- hdbase: "/opt"
- hdhome: "/opt/hadoop"
- hddata: "/var/local/hadoop"
- hdhost: "{{groups['masters'][0]}}"
- hduser: "{{hostvars[inventory_hostname].login}}"

  tasks:
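
The per-play `vars:` blocks removed in these files correspond to the commit "Moved Hadoop, Spark and Zeppelin vars into hosts.yml", but `hosts.yml` itself is not part of this section. As a rough sketch of how the shared variables might be declared once in a YAML inventory: the first group of values is copied from the removed blocks, while the `all: vars:` placement and every value in the second group are assumptions.

```yaml
# Hypothetical hosts.yml excerpt; not taken from this PR.
all:
  vars:
    # Values copied from the vars blocks removed in the diffs above.
    hdname: "hadoop-3.1.3"
    hdbase: "/opt"
    hdhome: "/opt/hadoop"
    hdhost: "{{ groups['masters'][0] }}"
    hduser: "{{ hostvars[inventory_hostname].login }}"

    # Variable names used by the updated plays; these values are guesses.
    hdconf: "{{ hdhome }}/etc/hadoop"
    hddatalink: "/var/local/hadoop/data"
    hddatadest: "/mnt/example/hadoop/data"   # hypothetical path
```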

32 changes: 17 additions & 15 deletions experiments/hadoop-yarn/ansible/13-config-hdfs-namenode.yml
@@ -21,27 +21,29 @@
  #

  - name: "Configure HDFS namenode"
- hosts: master01:zeppelin
+ hosts: master01
  gather_facts: false
  vars:
- hdname: "hadoop-3.1.3"
- hdbase: "/opt"
- hdhome: "/opt/hadoop"
- hddata: "/var/local/hadoop"
- hdhost: "{{groups['masters'][0]}}"
- hduser: "{{hostvars[inventory_hostname].login}}"
+ hdfsimage: "{{hdfsmetalink}}/namenode/fsimage"

  tasks:

- - name: "Create [{{hddata}}/namenode/fsimage]"
+ - name: "Create HDFS metadata directory"
+ include_tasks: "tasks/create-linked.yml"
+ vars:
+ linkdest: "{{hdfsmetadest}}"
+ linkpath: "{{hdfsmetalink}}"
+ linkuser: "{{hdfsuser}}"
+
+ - name: "Create [{{hdfsimage}}]"
  become: true
  file:
- path: "{{hddata}}/namenode/fsimage"
+ path: "{{hdfsimage}}"
  mode: 'u=rwx,g=rwxs,o=rx'
  state: directory
  recurse: yes
- owner: "{{hduser}}"
- group: "{{hduser}}"
+ owner: "{{hdfsuser}}"
+ group: "{{hdfsuser}}"

  # https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_the_Hadoop_Daemons
  # https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
@@ -54,12 +56,12 @@
  block: |
  <!--+
  | Determines where on the local filesystem the DFS name node should store the name table(fsimage).
- | If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
+ | If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
  | https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
  +-->
  <property>
  <name>dfs.namenode.name.dir</name>
- <value>{{hddata}}/namenode/fsimage</value>
+ <value>{{hdfsimage}}</value>
  </property>

  <!--+
@@ -85,7 +87,7 @@
  <!--+
  | Default block replication.
  | The actual number of replications can be specified when the file is created.
- | The default is used if replication is not specified in create time.
+ | The default is used if replication is not specified in create time.
  | https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
  +-->
  <property>
@@ -115,7 +117,7 @@
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  </property>
-
+
  <property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>