
Upgrade Spark / Hadoop / Zeppelin (Issue #535) #590

Merged: 13 commits merged into wfau:master on Dec 3, 2021

Conversation

stvoutsin (Collaborator) commented:

Description
This PR upgrades the versions of our components as described below:

  • Zeppelin 0.10.0
  • Hadoop 3.2.1
  • Spark 3.1.2

What Issue is this related to:
#535

What type of PR is it?
Upgrade

Has this been tested:
Yes. This was tested using the Benchmarking suite that was introduced here: #583
The benchmarking changes probably shouldn't have been included in this PR as well; they were included because they were used to test this branch, and the notes are based on a version that includes the benchmarker.

Comment on lines +203 to +207
<property>
  <name>zeppelin.interpreter.exclude</name>
  <value>angular,livy,alluxio,file,psql,flink,ignite,lens,cassandra,geode,kylin,elasticsearch,scalding,jdbc,hbase,bigquery,beam,groovy,flink-cmd,hazelcastjet,influxdb,java,jupyter,kotlin,ksql,mongodb,neo4j,pig,r,sap,spark-submit,sparql,submarine</value>
  <description>All the interpreters that you would like to exclude. You can only specify either 'zeppelin.interpreter.include' or 'zeppelin.interpreter.exclude'. Specifying them together is not allowed.</description>
</property>
Collaborator commented:
Why are these excluded?
It might be better to explicitly list the interpreters we do want in zeppelin.interpreter.include rather than excluding an arbitrary list in zeppelin.interpreter.exclude.

I've created an issue to follow this up #593.
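As a sketch of what the include-based approach might look like (the interpreter names below are illustrative only; the actual set we want to ship is to be decided under #593):

<property>
  <name>zeppelin.interpreter.include</name>
  <value>spark,md,python,sh</value>
  <description>Only the interpreters listed here are loaded. Only one of 'zeppelin.interpreter.include' and 'zeppelin.interpreter.exclude' may be set.</description>
</property>

This keeps the whitelist short and self-documenting, rather than having to chase every new interpreter that upstream adds to the distribution.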

Comment on lines +249 to +250
# After some investigation, it looks like the new Zeppelin runs Spark jobs as the logged in Zeppelin user, and fails because it lacks permission.
# Turn this off for now, so that everything is sent as the main Zeppelin user (After this change Spark notebooks work)
Collaborator commented:
How do we turn this on/off?

Collaborator replied:
We will need to revisit this. I've created a new issue to follow this up: #594.
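For reference, and not something this PR changes: re-enabling per-user execution would normally also require the Zeppelin service account to be allowed to proxy other users on the Hadoop side. A minimal core-site.xml sketch, assuming the service account is called zeppelin (the account name and the open '*' values are assumptions and should be tightened in practice):

<property>
  <name>hadoop.proxyuser.zeppelin.hosts</name>
  <value>*</value>
  <description>Hosts from which the zeppelin account may impersonate other users.</description>
</property>
<property>
  <name>hadoop.proxyuser.zeppelin.groups</name>
  <value>*</value>
  <description>Groups whose members the zeppelin account may impersonate.</description>
</property>

As far as I know, the Zeppelin side of the switch is the 'User Impersonate' option on the interpreter setting, which for Spark ends up passing --proxy-user to spark-submit; #594 can pin down the exact combination we need.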

@Zarquan (Collaborator) left a comment:
Looks good, tests pass, go for it.

@Zarquan merged commit 8c90c72 into wfau:master on Dec 3, 2021
@Zarquan mentioned this pull request on Dec 6, 2021
@stvoutsin deleted the issue-upgrade-spark-3 branch on June 3, 2022