20211011 zrq hdbscan config #586

Closed
wants to merge 64 commits into from
Changes from 7 commits
64 commits
5d73750
Added some notes on testing with Benchmarking tool
stvoutsin Sep 21, 2021
4ebb9fd
Add some more notes on testing with benchmarking tool
stvoutsin Sep 21, 2021
16fdc28
Added playbook for running a benchmark
stvoutsin Oct 4, 2021
df06564
Add test to deployment script
stvoutsin Oct 4, 2021
946260d
Update create all script, add optional deploytype as param (set as te…
stvoutsin Oct 9, 2021
f5f80f2
Added notes on running a test deploy with benchmarker
stvoutsin Oct 12, 2021
97af358
Merge branch 'wfau:master' into issue-benchmarking
stvoutsin Oct 12, 2021
ae3d5a1
Change repository location of testing tool
stvoutsin Oct 13, 2021
8a3fa33
Fixed Typo
stvoutsin Oct 13, 2021
899754c
Merge branch 'wfau:master' into issue-benchmarking
stvoutsin Oct 13, 2021
4cfc742
Add notes on running a deploy with the automated benchmarking
stvoutsin Oct 14, 2021
a7c4eed
Notes on running a test deploy on large cluster
stvoutsin Oct 14, 2021
18753d7
Update medium test notes
stvoutsin Oct 14, 2021
823c1ad
Update large test notes
stvoutsin Oct 14, 2021
f4b4a50
Notes and tests for HDBSCAN config
Zarquan Oct 15, 2021
70b15aa
Notes and tests for HDBSCAN config
Zarquan Oct 15, 2021
6e2998a
Merge branch '20211011-zrq-hdbscan-config' of github.com:Zarquan/agla…
Zarquan Oct 15, 2021
c5cec9e
Remove deprecated config settings (issue #584)
Zarquan Oct 15, 2021
97ba06d
Upgrade Hadoop / Zeppelin & Spark versions / Fix issue with pip lib i…
stvoutsin Oct 18, 2021
9d1bbc6
Added exclude interpreter setting in Zeppelin (Needed in Zeppelin 0.1…
stvoutsin Oct 18, 2021
42e1e47
Added Benchmark running Ansible script
stvoutsin Oct 18, 2021
c43f0ed
Added notes on testing upgraded version
stvoutsin Oct 18, 2021
5cd5755
Set default interpreter as spark / Write to hadoop dir as root / Excl…
stvoutsin Oct 19, 2021
926facf
Added notes for another test with upgraded versions
stvoutsin Oct 19, 2021
1bc51f9
Change medium-04 config to use upgraded versions
stvoutsin Oct 20, 2021
c190938
Fix Job Manager UI issue in new Zeppelin
stvoutsin Oct 20, 2021
eeeae66
Changes to benchmark setup config based on review
stvoutsin Oct 20, 2021
0d91081
Fixing issue with missing libraries
stvoutsin Oct 20, 2021
2142d65
Added benchmark config to this repo
stvoutsin Oct 20, 2021
412b613
Change location of notebook config for benchmarking to this repo
stvoutsin Oct 20, 2021
58f24e0
Change repo location from stvoutsin to wfau
stvoutsin Oct 20, 2021
8369bc8
Change location of notebook config for benchmarking to this repo
stvoutsin Oct 20, 2021
32d406d
Notes on debugging
Zarquan Oct 20, 2021
bcbf53f
Notes on resource requirements
Zarquan Oct 20, 2021
93f143e
New config naming scheme zeppelin-{cpu}.{mem}-spark-{n}.{cpu}.{mem}.yml
Zarquan Oct 20, 2021
02c0af4
Added missing dependencies / notes on issue
stvoutsin Oct 21, 2021
62aea82
Merge pull request #588 from stvoutsin/issue-lib-dependencies
Zarquan Oct 21, 2021
eff6a4f
Changes to run benchmarker from ansibler client
stvoutsin Oct 22, 2021
f79fcc0
Notes on testing most recent version
stvoutsin Oct 22, 2021
3104073
Merge branch 'wfau:master' into issue-benchmarking
stvoutsin Oct 22, 2021
3531abe
Change pip libs installation to match upstream changes
stvoutsin Oct 22, 2021
8c8a42c
Merge branch 'wfau:master' into issue-upgrade-spark-3
stvoutsin Oct 22, 2021
4108665
Bring in changes from benchmarking branch
stvoutsin Oct 22, 2021
7a3354b
Merge branch 'issue-upgrade-spark-3' of https://github.com/stvoutsin/…
stvoutsin Oct 22, 2021
cc05970
Merge pull request #583 from stvoutsin/issue-benchmarking
Zarquan Dec 1, 2021
f358293
Disable iPython in zeppelin interpreter
stvoutsin Dec 2, 2021
8c90c72
Merge pull request #590 from stvoutsin/issue-upgrade-spark-3
Zarquan Dec 3, 2021
ab159eb
....
Zarquan Dec 3, 2021
68b2732
Notes and tests for HDBSCAN config
Zarquan Oct 15, 2021
a48ae90
Remove deprecated config settings (issue #584)
Zarquan Oct 15, 2021
d6f1629
Notes on debugging
Zarquan Oct 20, 2021
55da716
Notes on resource requirements
Zarquan Oct 20, 2021
81680d1
New config naming scheme zeppelin-{cpu}.{mem}-spark-{n}.{cpu}.{mem}.yml
Zarquan Oct 20, 2021
45e52bb
Merge branch '20211011-zrq-hdbscan-config' of github.com:Zarquan/agla…
Zarquan Dec 3, 2021
3d451f3
Removed accidental commit from master
Zarquan Dec 3, 2021
4234f0b
Notes and tests for HDBSCAN config
Zarquan Oct 15, 2021
dc7ba43
Remove deprecated config settings (issue #584)
Zarquan Oct 15, 2021
8238493
Notes on debugging
Zarquan Oct 20, 2021
5990da4
Notes on resource requirements
Zarquan Oct 20, 2021
03222f6
New config naming scheme zeppelin-{cpu}.{mem}-spark-{n}.{cpu}.{mem}.yml
Zarquan Oct 20, 2021
de428f3
Notes and tests for HDBSCAN config
Zarquan Oct 15, 2021
6209d29
Remove deprecated config settings (issue #584)
Zarquan Oct 15, 2021
f34df2e
New config naming scheme zeppelin-{cpu}.{mem}-spark-{n}.{cpu}.{mem}.yml
Zarquan Oct 20, 2021
d353805
Merge branch '20211011-zrq-hdbscan-config' of github.com:Zarquan/agla…
Zarquan Dec 3, 2021
@@ -0,0 +1,280 @@
#
# <meta:header>
# <meta:licence>
# Copyright (c) 2020, ROE (http://www.roe.ac.uk/)
#
# This information is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This information is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# </meta:licence>
# </meta:header>
#
#

all:

    vars:

        # Hadoop vars

        hdname: "hadoop-3.1.3"
        hdbase: "/opt"
        hdhome: "/opt/hadoop"

        hdconf: "{{hdhome}}/etc/hadoop"
        hdhost: "master01"
        hduser: "fedora"

        # HDFS vars

        hdfsconf: "/var/hdfs/conf"
        hdfsuser: "fedora"

        # Spark vars

        spname: "spark-2.4.7"
        spfull: "spark-2.4.7-bin-hadoop2.7"
        spbase: "/opt"
        sphome: "/opt/spark"
        sphost: "master01"
        spuser: "fedora"

        # Flavor sizes

        zeppelinflavor: 'gaia.cclake.55vcpu'
        masterflavor: 'gaia.cclake.2vcpu'
        workerflavor: 'gaia.cclake.27vcpu'

        # Flavour values

        zeppelinmemory: 92160
        zeppelincores: 55

        workermemory: 46080
        workercores: 27
        workercount: 6

        # Calculated limits

        spminmem: 1024
        spmaxmem: "{{workermemory - 1024}}"

        spmincores: 1
        spmaxcores: "{{workercores}}"


        sparkconfig: |

            # https://spark.apache.org/docs/latest/configuration.html
            # https://spark.apache.org/docs/latest/running-on-yarn.html
            # https://stackoverflow.com/questions/37871194/how-to-tune-spark-executor-number-cores-and-executor-memory

            spark.master yarn

            # Spark config settings calculated using Cheatsheet.xlsx
            # https://www.c2fo.io/img/apache-spark-config-cheatsheet/C2FO-Spark-Config-Cheatsheet.xlsx

            # https://www.c2fo.io/c2fo/spark/aws/emr/2016/07/06/apache-spark-config-cheatsheet/
            # https://github.com/AndresNamm/SparkDebugging/tree/master/ExecutorSizing

            # Calculated using Cheatsheet.xlsx
            spark.driver.memory 58982m
            spark.driver.memoryOverhead 9216
            spark.driver.cores 5
            spark.driver.maxResultSize 40960m

            spark.executor.memory 7168m
            spark.executor.memoryOverhead 1024
            spark.executor.cores 5
            #spark.executor.instances 30

            spark.default.parallelism 300
            #spark.sql.shuffle.partitions 300

            # YARN Application Master settings
            spark.yarn.am.memory 2048m
            spark.yarn.am.cores 1

            spark.dynamicAllocation.enabled true
            spark.shuffle.service.enabled true
            spark.dynamicAllocation.minExecutors 1
            # spark.executor.instances from Cheatsheet
            spark.dynamicAllocation.maxExecutors 30
            # maxExecutors / 2
            spark.dynamicAllocation.initialExecutors 15
            spark.dynamicAllocation.cachedExecutorIdleTimeout 60s
            spark.dynamicAllocation.executorIdleTimeout 60s

        yarnconfig: |
            <!--+
                | Maximum limit of memory to allocate to each container request at the Resource Manager.
                +-->
            <property>
                <name>yarn.scheduler.maximum-allocation-mb</name>
                <value>{{spmaxmem}}</value>
            </property>

            <!--+
                | Minimum limit of memory to allocate to each container request at the Resource Manager.
                +-->
            <property>
                <name>yarn.scheduler.minimum-allocation-mb</name>
                <value>{{spminmem}}</value>
            </property>

            <property>
                <name>yarn.scheduler.minimum-allocation-vcores</name>
                <value>{{spmincores}}</value>
            </property>

            <property>
                <name>yarn.scheduler.maximum-allocation-vcores</name>
                <value>{{spmaxcores}}</value>
            </property>

            <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>{{spmaxmem}}</value>
            </property>

            <!--+
                | 1:1 -> 1:4 * {{spmaxcores}} based on IO wait
                +-->
            <property>
                <name>yarn.nodemanager.resource.cpu-vcores</name>
                <value>{{spmaxcores}}</value>
            </property>

            <!--+
                | https://stackoverflow.com/questions/38988941/running-yarn-with-spark-not-working-with-java-8
                | https://stackoverflow.com/a/39456782
                | https://issues.apache.org/jira/browse/YARN-4714
                +-->
            <property>
                <name>yarn.nodemanager.pmem-check-enabled</name>
                <value>false</value>
            </property>

            <property>
                <name>yarn.nodemanager.vmem-check-enabled</name>
                <value>false</value>
            </property>

        #yarn.app.mapreduce.am.resource.mb = (yarn.scheduler.minimum-allocation-mb)
        #mapreduce.map.memory.mb = (multiple of yarn.scheduler.minimum-allocation-mb)
        #mapreduce.reduce.memory.mb = (multiple of yarn.scheduler.minimum-allocation-mb)

        # Zeppelin vars
        zepname: "zeppelin-0.8.2"
        zepbase: "/home/fedora"
        zephome: "/home/fedora/zeppelin-0.8.2-bin-all"
        zephost: "zeppelin"
        zepuser: "fedora"
        zepmavendest: "/var/local/zeppelin/maven"

    hosts:

        zeppelin:
            login: 'fedora'
            image: 'Fedora-30-1.2'
            flavor: "{{zeppelinflavor}}"
            discs:
                - type: 'local'
                  format: 'ext4'
                  mntpath: "/mnt/local/vdb"
                  devname: 'vdb'
                - type: 'cinder'
                  size: 1024
                  format: 'btrfs'
                  mntpath: "/mnt/cinder/vdc"
                  devname: 'vdc'
            paths:
                # Empty on Zeppelin
                hddatalink: "/var/hadoop/data"
                hddatadest: "/mnt/local/vdb/hadoop/data"
                # Empty on Zeppelin
                hdtemplink: "/var/hadoop/temp"
                hdtempdest: "/mnt/local/vdb/hadoop/temp"
                # Empty on Zeppelin
                hdlogslink: "/var/hadoop/logs"
                hdlogsdest: "/mnt/local/vdb/hadoop/logs"
                # Used on Zeppelin
                sptemplink: "/var/spark/temp"
                sptempdest: "/mnt/cinder/vdc/spark/temp"

        monitor:
            login: 'fedora'
            image: 'Fedora-30-1.2'
            flavor: 'gaia.cclake.2vcpu'
            discs: []

    children:

        masters:
            hosts:
                master[01:01]:
            vars:
                login: 'fedora'
                image: 'Fedora-30-1.2'
                flavor: "{{masterflavor}}"
                discs: []
                paths:
                    # Empty on master
                    hddatalink: "/var/hadoop/data"
                    hddatadest: "/mnt/local/vda/hadoop/data"
                    # Used on master
                    # /var/hadoop/temp/dfs/namesecondary/current/
                    hdtemplink: "/var/hadoop/temp"
                    hdtempdest: "/mnt/local/vda/hadoop/temp"
                    # Used on master
                    hdlogslink: "/var/hadoop/logs"
                    hdlogsdest: "/mnt/local/vda/hadoop/logs"
                    # Used on master
                    # /var/hdfs/meta/namenode/fsimage/current/
                    hdfsmetalink: "/var/hdfs/meta"
                    hdfsmetadest: "/mnt/local/vda/hadoop/meta"

        workers:
            hosts:
                worker[01:06]:
            vars:
                login: 'fedora'
                image: 'Fedora-30-1.2'
                flavor: "{{workerflavor}}"
                discs:
                    - type: 'local'
                      format: 'ext4'
                      mntpath: "/mnt/local/vdb"
                      devname: 'vdb'
                    - type: 'cinder'
                      size: 1024
                      format: 'btrfs'
                      mntpath: "/mnt/cinder/vdc"
                      devname: 'vdc'
                paths:
                    # Used on workers
                    hddatalink: "/var/hadoop/data"
                    hddatadest: "/mnt/local/vdb/hadoop/data"
                    # Used on workers
                    # /var/hadoop/temp/nm-local-dir/
                    hdtemplink: "/var/hadoop/temp"
                    hdtempdest: "/mnt/local/vdb/hadoop/temp"
                    # Used on workers
                    hdlogslink: "/var/hadoop/logs"
                    hdlogsdest: "/mnt/local/vdb/hadoop/logs"
                    # Empty on workers
                    hdfslogslink: "/var/hdfs/logs"
                    hdfslogsdest: "/mnt/local/vdb/hdfs/logs"
                    # Empty on workers
                    hdfsdatalink: "/var/hdfs/data"
                    hdfsdatadest: "/mnt/cinder/vdc/hdfs/data"
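
Reviewer-style sanity check of the numbers in the config above. This is only a rough sketch in Python, not part of the PR: it re-derives the YARN node memory from the `spmaxmem` expression and checks how many executor containers fit on each worker. The function names (`yarn_node_memory`, `executors_per_worker`, `summarise`) are made up for illustration. It assumes the driver runs on the Zeppelin node (YARN client mode), which the file does not state, and it ignores the YARN Application Master container and any container rounding done by the scheduler.

```python
# Values copied from the inventory above; all memory figures are in MB.
WORKER_MEMORY = 46080
WORKER_CORES = 27
WORKER_COUNT = 6
ZEPPELIN_MEMORY = 92160

EXECUTOR_MEMORY = 7168       # spark.executor.memory
EXECUTOR_OVERHEAD = 1024     # spark.executor.memoryOverhead
EXECUTOR_CORES = 5           # spark.executor.cores
DRIVER_MEMORY = 58982        # spark.driver.memory
DRIVER_OVERHEAD = 9216       # spark.driver.memoryOverhead
MAX_EXECUTORS = 30           # spark.dynamicAllocation.maxExecutors
DEFAULT_PARALLELISM = 300    # spark.default.parallelism


def yarn_node_memory(worker_memory, reserved=1024):
    """Mirror of spmaxmem: "{{workermemory - 1024}}"."""
    return worker_memory - reserved


def executors_per_worker(node_memory, node_cores):
    """Executor containers that fit on one worker, limited by memory and cores."""
    container = EXECUTOR_MEMORY + EXECUTOR_OVERHEAD
    by_memory = node_memory // container
    by_cores = node_cores // EXECUTOR_CORES
    return min(by_memory, by_cores)


def summarise():
    """Collect the derived figures in one dictionary."""
    node_memory = yarn_node_memory(WORKER_MEMORY)
    per_worker = executors_per_worker(node_memory, WORKER_CORES)
    driver_container = DRIVER_MEMORY + DRIVER_OVERHEAD
    return {
        "yarn_node_memory_mb": node_memory,
        "executor_container_mb": EXECUTOR_MEMORY + EXECUTOR_OVERHEAD,
        "executors_per_worker": per_worker,
        "cluster_executors": per_worker * WORKER_COUNT,
        "driver_container_mb": driver_container,
        # Assumption: the driver lives on the Zeppelin node (client mode).
        "driver_fits_on_zeppelin": driver_container <= ZEPPELIN_MEMORY,
        "tasks_per_core": DEFAULT_PARALLELISM / (MAX_EXECUTORS * EXECUTOR_CORES),
    }


if __name__ == "__main__":
    for key, value in summarise().items():
        print(f"{key}: {value}")
```

Under those assumptions the six workers hold five executor containers each, which agrees with the `maxExecutors` value of 30 in the Spark config, and `spark.default.parallelism` works out to two tasks per executor core.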

7 changes: 5 additions & 2 deletions notes/zrq/20211007-02-slack-export.txt
@@ -26,8 +26,11 @@
#zrq-notes-zeppelin
#

https://github.com/ErikKalkoken/slackchannel2pdf
How to export our data out of Slack ...

https://github.com/ErikKalkoken/slackchannel2pdf

https://webapps.stackexchange.com/questions/130485/how-to-export-slack-conversation-thread-without-admin-account

https://webapps.stackexchange.com/questions/130485/how-to-export-slack-conversation-thread-without-admin-account

