[query] Upgrade spark to 3.3.0 and dataproc to 2.1 #12701
Force-pushed from 15d1aef to d8c3a23
Force-pushed from 5cb38e9 to ba6e1d8
@@ -20,25 +20,22 @@ def init_parser(parser):

async def async_main(args):
Why make these changes? Do we report the username in the underlying exception?
The linter complained about using too generic an error class; it wants us to subclass `Exception`. I removed the try/except because I felt it wasn't actually any more helpful than the stack trace underneath, but I didn't think about the username not being in there. I can add that back in if you want.
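For illustration, here is a minimal sketch of the pattern such a linter rule (e.g. pylint's `broad-exception-raised`) pushes toward. The names `DataprocError`, `start_cluster`, and `connect` are hypothetical, not Hail's actual code:

```python
class DataprocError(Exception):
    """Hypothetical named error class; raising bare Exception is what the linter flags."""


def start_cluster(username: str) -> None:
    # Stand-in for the real underlying call; raises to exercise the error path.
    raise RuntimeError('cluster backend unavailable')


def connect(username: str) -> None:
    try:
        start_cluster(username)
    except RuntimeError as e:
        # Chaining with `from e` keeps the original stack trace, and putting the
        # username in the message addresses the reviewer's question above.
        raise DataprocError(f'failed to start cluster as {username!r}') from e
```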
hail/python/requirements.txt
@@ -15,6 +15,6 @@ parsimonious<0.9
 plotly>=5.5.0,<5.11
 protobuf==3.20.2
 PyJWT
-pyspark>=3.1.1,<3.2.0
+pyspark==3.3.0
Seems reasonable to do `pyspark>=3.3,<3.4`?
I guess I was distrustful of pyspark and wanted exactly the same version that Dataproc ships, but maybe that is unfounded.
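For reference, the two pinning styles under discussion would look like this in requirements.txt (the exact pin tracks the Spark build in the Dataproc 2.1 image; the range pin would also pick up 3.3.x patch releases):

```
# Exact pin: same Spark version as the Dataproc 2.1 image.
pyspark==3.3.0

# Range pin (the suggestion above): accepts any 3.3.x release.
pyspark>=3.3,<3.4
```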
CHANGELOG: Query on Spark now officially supports Spark 3.3.0 and Dataproc 2.1.x
Tested on Dataproc via `make -C hail test-dataproc-37`. Updating the dependencies introduced a few new linting checks, which I fixed here. Updating pyspark necessitated a couple of changes: a different py4j jar, and `SparkSession._wrapped` was removed (but maybe we didn't need that anyway? not sure). Most importantly, the newer Spark version brings with it a newer Jackson version, which is sufficient for the azure-storage-blob dependency, meaning we no longer need to build against two different Spark versions for Spark and Batch.
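A minimal sketch of the `SparkSession._wrapped` change mentioned above: in PySpark releases before 3.3, `_wrapped` was an internal `SQLContext` commonly passed when wrapping JVM DataFrame handles; in 3.3 it is gone, and `DataFrame` accepts the `SparkSession` directly. The `wrap_jdf` helper here is hypothetical, not Hail's actual code:

```python
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()


def wrap_jdf(jdf):
    # Hypothetical helper: wrap a JVM DataFrame handle as a Python DataFrame.
    # PySpark < 3.3:  return DataFrame(jdf, spark._wrapped)  # internal SQLContext
    # PySpark >= 3.3: _wrapped is removed; pass the SparkSession directly.
    return DataFrame(jdf, spark)
```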