Upgrade Spark / Hadoop / Zeppelin (Issue #535) #590

Merged: 13 commits, Dec 3, 2021
589 changes: 276 additions & 313 deletions deployments/common/zeppelin/interpreter.json

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions deployments/hadoop-yarn/ansible/27-install-zeppelin.yml
@@ -200,6 +200,18 @@
<description>Enable directory listings on server.</description>
</property>

<property>
<name>zeppelin.interpreter.exclude</name>
<value>angular,livy,alluxio,file,psql,flink,ignite,lens,cassandra,geode,kylin,elasticsearch,scalding,jdbc,hbase,bigquery,beam,groovy,flink-cmd,hazelcastjet,influxdb,java,jupyter,kotlin,ksql,mongodb,neo4j,pig,r,sap,spark-submit,sparql,submarine</value>
<description>All the interpreters that you would like to exclude. You can only specify either 'zeppelin.interpreter.include' or 'zeppelin.interpreter.exclude'. Specifying them together is not allowed.</description>
</property>
Comment on lines +203 to +207
Collaborator

Why are these excluded?
It might be better to explicitly list the interpreters we do want in zeppelin.interpreter.include rather than excluding an arbitrary list in zeppelin.interpreter.exclude.

I've created an issue to follow this up: #593.
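
To illustrate the reviewer's suggestion, here is a minimal Python sketch that turns the exclude list from this diff into an equivalent include list. The `available` list is a hypothetical sample of installed interpreter names, not something taken from the PR, and `include_value` is an illustrative helper rather than part of Zeppelin.

```python
# Exclude value copied from the zeppelin.interpreter.exclude property in this diff.
EXCLUDED = (
    "angular,livy,alluxio,file,psql,flink,ignite,lens,cassandra,geode,kylin,"
    "elasticsearch,scalding,jdbc,hbase,bigquery,beam,groovy,flink-cmd,"
    "hazelcastjet,influxdb,java,jupyter,kotlin,ksql,mongodb,neo4j,pig,r,sap,"
    "spark-submit,sparql,submarine"
)


def include_value(available, excluded_csv):
    """Return a comma-separated include list equivalent to the exclude list."""
    excluded = {name.strip() for name in excluded_csv.split(",") if name.strip()}
    # Keep the order of `available` so the result is stable and easy to review.
    return ",".join(name for name in available if name not in excluded)


# Hypothetical sample: a few interpreter names assumed to be installed.
available = ["spark", "md", "python", "sh", "livy", "jdbc", "flink"]
include = include_value(available, EXCLUDED)
print(include)  # spark,md,python,sh
```

A short include list like this is easier to audit than a long exclude list, and it does not need updating when a new Zeppelin release ships additional interpreters.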


<property>
<name>zeppelin.jobmanager.enable</name>
<value>true</value>
<description>The Job tab in the Zeppelin page is not very useful; it costs a lot of memory and affects performance. Disabling it can save a lot of memory.</description>
</property>

</configuration>

zeppelinshiro: |
2 changes: 2 additions & 0 deletions deployments/hadoop-yarn/ansible/34-setup-shuffler.yml
@@ -22,6 +22,7 @@

- name: "Fetch the spark-yarn-shuffle jar from one of the master nodes and store it in our /tmp directory"
hosts: master01
become: yes
tasks:
- fetch:
src: /opt/spark/yarn/{{spname}}-yarn-shuffle.jar
@@ -30,6 +31,7 @@

- name: "Copy Shuffle jar to Hadoop directory on worker & master nodes"
hosts: workers:masters
become: yes
tasks:
- copy:
src: /tmp/{{spname}}-yarn-shuffle.jar
83 changes: 83 additions & 0 deletions deployments/hadoop-yarn/ansible/36-run-benchmark.yml
@@ -0,0 +1,83 @@
#
# <meta:header>
# <meta:licence>
# Copyright (c) 2020, ROE (http://www.roe.ac.uk/)
#
# This information is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This information is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# </meta:licence>
# </meta:header>
#

- name: "Get Zeppelin IP Address"
hosts: localhost
vars_files:
- config/ansible.yml
- /tmp/ansible-vars.yml
- config/openstack.yml

tasks:

- name: "Discover our Zeppelin node and store IP address in temp file"
os_server_info:
cloud: "{{ cloudname }}"
server: "{{ deployname }}-zeppelin"
register:
zeppelinnode

- local_action: copy content={{ zeppelinnode.openstack_servers[0].accessIPv4 }} dest=/tmp/zeppelin_ip.txt


- name: "Install and run Python benchmark suite"
hosts: localhost
gather_facts: yes
become: yes
become_method: sudo
vars_files:
- config/ansible.yml
- /tmp/ansible-vars.yml
vars:
zepipaddress: "{{ lookup('file', '/tmp/zeppelin_ip.txt') | trim }}"

tasks:

- name: "Creating our Zeppelin config file"
copy:
dest: "/tmp/user.yml"
content: |
zeppelin_url: http://{{ zepipaddress }}:8080
zeppelin_auth: true
zeppelin_user: gaiauser
zeppelin_password: gaiapass
- name: "Install git"
yum:
name: git
update_cache: yes
state: present

- pip:
name: 'git+https://github.com/wfau/aglais-testing@v0.1.2'
executable: pip

- name: "Creating our Benchmarking script"
copy:
dest: "/tmp/run-test.py"
content: |
import sys
from aglais_benchmark import AglaisBenchmarker
AglaisBenchmarker("/deployments/zeppelin/test/config/notebooks.json", "/tmp/").run(concurrent=False, users=1)
- name: "Run benchmarker"
command: python3 /tmp/run-test.py

10 changes: 5 additions & 5 deletions deployments/hadoop-yarn/ansible/config/cclake-large-06.yml
@@ -26,7 +26,7 @@ all:

# Hadoop vars

-hdname: "hadoop-3.1.3"
+hdname: "hadoop-3.2.1"
hdbase: "/opt"
hdhome: "/opt/hadoop"

@@ -41,8 +41,8 @@ all:

# Spark vars

-spname: "spark-2.4.7"
-spfull: "spark-2.4.7-bin-hadoop2.7"
+spname: "spark-3.1.2"
+spfull: "spark-3.1.2-bin-hadoop3.2"
spbase: "/opt"
sphome: "/opt/spark"
sphost: "master01"
@@ -176,9 +176,9 @@ all:
#mapreduce.reduce.memory.mb = (multiple of yarn.scheduler.minimum-allocation-mb)

# Zeppelin vars
-zepname: "zeppelin-0.8.2"
+zepname: "zeppelin-0.10.0"
zepbase: "/home/fedora"
-zephome: "/home/fedora/zeppelin-0.8.2-bin-all"
+zephome: "/home/fedora/zeppelin-0.10.0-bin-all"
zephost: "zeppelin"
zepuser: "fedora"
zepmavendest: "/var/local/zeppelin/maven"
13 changes: 6 additions & 7 deletions deployments/hadoop-yarn/ansible/config/cclake-medium-04.yml
@@ -26,7 +26,7 @@ all:

# Hadoop vars

-hdname: "hadoop-3.1.3"
+hdname: "hadoop-3.2.1"
hdbase: "/opt"
hdhome: "/opt/hadoop"

@@ -39,10 +39,9 @@ all:
hdfsconf: "/var/hdfs/conf"
hdfsuser: "fedora"

-# Spark vars
-
-spname: "spark-2.4.7"
-spfull: "spark-2.4.7-bin-hadoop2.7"
+# Spark vars
+spname: "spark-3.1.2"
+spfull: "spark-3.1.2-bin-hadoop3.2"
spbase: "/opt"
sphome: "/opt/spark"
sphost: "master01"
@@ -174,9 +173,9 @@ all:
#mapreduce.reduce.memory.mb = (multiple of yarn.scheduler.minimum-allocation-mb)

# Zeppelin vars
-zepname: "zeppelin-0.8.2"
+zepname: "zeppelin-0.10.0"
zepbase: "/home/fedora"
-zephome: "/home/fedora/zeppelin-0.8.2-bin-all"
+zephome: "/home/fedora/zeppelin-0.10.0-bin-all"
zephost: "zeppelin"
zepuser: "fedora"
zepmavendest: "/var/local/zeppelin/maven"
10 changes: 5 additions & 5 deletions deployments/hadoop-yarn/ansible/config/medium-04.yml
@@ -26,7 +26,7 @@ all:

# Hadoop vars

-hdname: "hadoop-3.1.3"
+hdname: "hadoop-3.2.1"
hdbase: "/opt"
hdhome: "/opt/hadoop"

@@ -41,8 +41,8 @@ all:

# Spark vars

-spname: "spark-2.4.7"
-spfull: "spark-2.4.7-bin-hadoop2.7"
+spname: "spark-3.1.2"
+spfull: "spark-3.1.2-bin-hadoop3.2"
spbase: "/opt"
sphome: "/opt/spark"
sphost: "master01"
@@ -174,9 +174,9 @@ all:
#mapreduce.reduce.memory.mb = (multiple of yarn.scheduler.minimum-allocation-mb)

# Zeppelin vars
-zepname: "zeppelin-0.8.2"
+zepname: "zeppelin-0.10.0"
zepbase: "/home/fedora"
-zephome: "/home/fedora/zeppelin-0.8.2-bin-all"
+zephome: "/home/fedora/zeppelin-0.10.0-bin-all"
zephost: "zeppelin"
zepuser: "fedora"
zepmavendest: "/var/local/zeppelin/maven"
18 changes: 18 additions & 0 deletions deployments/hadoop-yarn/bin/create-all.sh
@@ -45,6 +45,8 @@
deployname="${cloudname:?}-$(date '+%Y%m%d')"
deploydate=$(date '+%Y%m%dT%H%M%S')

deploytype="${3:-prod}"

configyml='/tmp/aglais-config.yml'
statusyml='/tmp/aglais-status.yml'
touch "${statusyml:?}"
@@ -305,3 +307,19 @@

done

# -----------------------------------------------------
# Run Benchmarks

if [[ "$deploytype" == "test" ]]
then

pushd "/deployments/hadoop-yarn/ansible"

ansible-playbook \
--verbose \
--inventory "${inventory:?}" \
"36-run-benchmark.yml"

popd

fi
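
The gating logic added to create-all.sh can be sketched in Python as follows. This is only an illustration of how `deploytype="${3:-prod}"` and the `test` check behave; the function names and the placeholder values for the first two arguments are assumptions, since the diff does not show what those arguments are.

```python
def deploy_type(argv):
    """Mirror deploytype="${3:-prod}": third positional argument, else 'prod'."""
    # argv[0] is the script name, so the third positional argument is argv[3].
    # Bash's ${3:-prod} also falls back when the argument is set but empty.
    if len(argv) > 3 and argv[3]:
        return argv[3]
    return "prod"


def should_run_benchmarks(argv):
    """True only for an explicit 'test' deployment, as in the if-block above."""
    return deploy_type(argv) == "test"


# "arg1" and "arg2" are placeholders for the first two script arguments.
print(should_run_benchmarks(["create-all.sh", "arg1", "arg2"]))          # False
print(should_run_benchmarks(["create-all.sh", "arg1", "arg2", "test"]))  # True
```

Because the default is `prod`, existing invocations that pass only two arguments keep their behaviour and never trigger the benchmark playbook.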
2 changes: 1 addition & 1 deletion deployments/hadoop-yarn/bin/start-zeppelin.sh
@@ -20,6 +20,6 @@

ssh zeppelin \
'
-/home/fedora/zeppelin-0.8.2-bin-all/bin/zeppelin-daemon.sh start
+/home/fedora/zeppelin-0.10.0-bin-all/bin/zeppelin-daemon.sh start
'
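
The path hard-coded in start-zeppelin.sh repeats the `zephome` value from the Ansible config files, so every version bump has to touch both places. As a hedged sketch, the relationship between the values can be written out in Python. The `zeppelin_paths` helper is hypothetical, and the `-bin-all` suffix is inferred from the two versions that appear in this diff.

```python
def zeppelin_paths(zepname, zepbase):
    """Derive the Zeppelin home and daemon script path from name and base dir."""
    # Assumption: the install directory is always "<zepname>-bin-all",
    # as it is for both zeppelin-0.8.2 and zeppelin-0.10.0 in this PR.
    zephome = f"{zepbase}/{zepname}-bin-all"
    return {
        "zephome": zephome,
        "daemon": f"{zephome}/bin/zeppelin-daemon.sh",
    }


paths = zeppelin_paths("zeppelin-0.10.0", "/home/fedora")
print(paths["zephome"])  # /home/fedora/zeppelin-0.10.0-bin-all
print(paths["daemon"])   # /home/fedora/zeppelin-0.10.0-bin-all/bin/zeppelin-daemon.sh
```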

41 changes: 41 additions & 0 deletions deployments/zeppelin/test/config/notebooks.json
@@ -0,0 +1,41 @@
{
"notebooks" : [
{
"name" : "SetUp",
"filepath" : "https://raw.githubusercontent.com/wfau/aglais-testing/main/notebooks/public_examples/SetUp.json",
"totaltime" : 45,
"results" : []
},
{
"name" : "Mean_proper_motions_over_the_sky",
"filepath" : "https://raw.githubusercontent.com/wfau/aglais-testing/main/notebooks/public_examples/Mean_proper_motions_over_the_sky.json",
"totaltime" : 55,
"results" : []
},
{
"name" : "Source_counts_over_the_sky.json",
"filepath" : "https://raw.githubusercontent.com/wfau/aglais-testing/main/notebooks/public_examples/Source_counts_over_the_sky.json",
"totaltime" : 22,
"results" : []
},
{
"name" : "Good_astrometric_solutions_via_ML_Random_Forrest_classifier",
"filepath" : "https://raw.githubusercontent.com/wfau/aglais-testing/main/notebooks/public_examples/Good_astrometric_solutions_via_ML_Random_Forrest_classifier.json",
"totaltime" : 500,
"results" : []
},
{
"name" : "QC_cuts_dev.json",
"filepath" : "https://raw.githubusercontent.com/wfau/aglais-testing/main/notebooks/public_examples/QC_cuts_dev.json",
"totaltime" : 4700,
"results" : []
},
{
"name" : "WD_detection_dev.json",
"filepath" : "https://raw.githubusercontent.com/wfau/aglais-testing/main/notebooks/public_examples/WD_detection_dev.json",
"totaltime" : 3750,
"results" : []
}

]
}
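
A quick sanity check on a config of this shape can be sketched in Python. The entries below copy the `name` and `totaltime` values from notebooks.json (the `filepath` fields are omitted for brevity). The `check_config` helper is hypothetical, and the unit of `totaltime` is not stated in the diff, so the total is reported as a bare number.

```python
import json

# Names and totaltime values copied from notebooks.json; filepath omitted.
CONFIG = json.loads("""
{"notebooks": [
  {"name": "SetUp", "totaltime": 45, "results": []},
  {"name": "Mean_proper_motions_over_the_sky", "totaltime": 55, "results": []},
  {"name": "Source_counts_over_the_sky.json", "totaltime": 22, "results": []},
  {"name": "Good_astrometric_solutions_via_ML_Random_Forrest_classifier",
   "totaltime": 500, "results": []},
  {"name": "QC_cuts_dev.json", "totaltime": 4700, "results": []},
  {"name": "WD_detection_dev.json", "totaltime": 3750, "results": []}
]}
""")


def check_config(config):
    """Validate each notebook entry and return the summed totaltime."""
    total = 0
    for entry in config["notebooks"]:
        for key in ("name", "totaltime", "results"):
            if key not in entry:
                raise ValueError(f"notebook entry is missing {key!r}: {entry}")
        if not isinstance(entry["results"], list):
            raise ValueError(f"'results' must be a list in {entry['name']!r}")
        total += entry["totaltime"]
    return total


total = check_config(CONFIG)
print(total)  # 9072
```

Most of that total comes from the two `_dev` notebooks, which together account for 8450 of the 9072.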
12 changes: 12 additions & 0 deletions deployments/zeppelin/test/config/notebooks_pi.json
@@ -0,0 +1,12 @@
{
"notebooks" : [
{
"name" : "pi_calculation",
"filepath" : "https://raw.githubusercontent.com/wfau/aglais-testing/main/notebooks/pi_calculation.json",
"totaltime" : 160,
"results" : [ "Pi is roughly 3.141854"]

}

]
}
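
How the benchmarker compares a notebook's output against the `results` list is not shown in this diff. Purely as an assumption, the Python sketch below shows two possible checks: an exact match, and a numeric comparison with a tolerance for the case where the estimate varies between runs. The helper names and the sample output `3.140912` are hypothetical.

```python
import re


def extract_pi(line):
    """Pull the numeric estimate out of a 'Pi is roughly ...' line, if any."""
    match = re.search(r"Pi is roughly (\d+\.\d+)", line)
    return float(match.group(1)) if match else None


def results_match(expected, actual, tolerance=None):
    """Compare output lines exactly, or numerically when a tolerance is given."""
    if tolerance is None:
        return expected == actual
    if len(expected) != len(actual):
        return False
    for exp_line, act_line in zip(expected, actual):
        exp_value, act_value = extract_pi(exp_line), extract_pi(act_line)
        if exp_value is None or act_value is None:
            return False
        if abs(exp_value - act_value) > tolerance:
            return False
    return True


expected = ["Pi is roughly 3.141854"]  # from notebooks_pi.json
print(results_match(expected, ["Pi is roughly 3.141854"]))                  # True
print(results_match(expected, ["Pi is roughly 3.140912"]))                  # False
print(results_match(expected, ["Pi is roughly 3.140912"], tolerance=0.01))  # True
```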