Add PySpark with RasterFrames Support profile #54
base: main
Conversation
Commits:
- Cleanup Spark cpu pools
- Expose SparkUI in the spawned notebooks, add more details into the PySpark profile description
- Add the Spark env init script
- Add Spark executors template to support nodes tolerations
- Allocate NoteBooks per namespace
- Add hub and proxy ClusterRoles
- Make AKS RBAC enabled
- Rename ensure_service_account function
- Change the spark executor image
- Expose the Spark UI in the pyspark profile case
- Cleanup hub and autohttps ClusterRoles
@@ -85,10 +85,17 @@ daskhub:
      c.KubeSpawner.extra_labels = {}
    kubespawner: |
      c.KubeSpawner.start_timeout = 15 * 60  # 15 minutes
      # pass the parent namespace through; needed for pre_spawn_hook to copy resources
      c.KubeSpawner.environment['NAMESPACE_PARENT'] = c.KubeSpawner.namespace
TODO: try the template replacement instead of setting the parent namespace variable.
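One possible reading of this TODO, as a sketch: kubespawner expands template variables such as `{username}` and `{hubnamespace}` in many of its string traits, so the parent namespace could be injected via templating instead of being read from `c.KubeSpawner.namespace`. Whether `environment` values are template-expanded depends on the pinned kubespawner version, so treat this as an assumption to verify:

```python
# Hedged sketch -- assumes the pinned kubespawner version expands
# template variables (e.g. {hubnamespace}) inside environment values.
c.KubeSpawner.environment['NAMESPACE_PARENT'] = '{hubnamespace}'
```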
@@ -147,13 +243,31 @@ daskhub:

      c.KubeSpawner.pre_spawn_hook = pre_spawn_hook

      # it is the spawner post-stop hook, not related to the notebook lifecycle;
      # we don't need it
    post_stop_hook: |
We don't need it here; I will remove it unless we want to keep it for fun. cc @TomAugspurger
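For context, a minimal sketch of what a `pre_spawn_hook` that copies resources from the parent namespace could look like, assuming the `kubernetes` Python client is available on the hub image; the secret name `user-credentials` is purely illustrative:

```python
import os
from kubernetes import client, config

def pre_spawn_hook(spawner):
    # read the parent namespace forwarded via NAMESPACE_PARENT above
    parent = spawner.environment.get('NAMESPACE_PARENT', 'default')
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    # copy a secret from the hub's namespace into the per-user namespace
    secret = v1.read_namespaced_secret('user-credentials', parent)  # illustrative name
    # drop server-assigned metadata (resourceVersion etc.) before re-creating
    secret.metadata = client.V1ObjectMeta(name=secret.metadata.name)
    v1.create_namespaced_secret(spawner.namespace, secret)

c.KubeSpawner.pre_spawn_hook = pre_spawn_hook
```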
@@ -23,6 +23,7 @@ module "resources" {
  jupyterhub_singleuser_image_name = "pcccr.azurecr.io/public/planetary-computer/python"
  jupyterhub_singleuser_image_tag  = "2022.01.17.0"
  python_image  = "pcccr.azurecr.io/public/planetary-computer/python:2022.01.17.0"
  pyspark_image = "daunnc/planetary-computer-pyspark:2021.11.29.0-gdal3.4-3.1-rf"
TODO: make PRs against MSFTPC repos with containers.
    'spark.driver.memory': '1g',
    'spark.executor.cores': '3',
    'spark.kubernetes.namespace': namespace_user,
    'spark.kubernetes.container.image': 'quay.io/daunnc/spark-k8s-py-3.8.8-gdal32-msftpc:3.1.2',
Should be replaced by the MSFTPC containers.
namespace_user = os.environ.get('NAMESPACE_USER', '')
spark_config = {
    'spark.master': 'k8s://https://kubernetes.default.svc.cluster.local',
    'spark.app.name': 'STAC API with RF in K8S',
The Spark app name should probably be picked up from the notebook name, or we should use a default one (more Microsofty).
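For reference, a hedged sketch of how the `spark_config` dict above might be turned into a live session inside the notebook; the `withRasterFrames()` call assumes the `pyrasterframes` package is shipped in the image:

```python
from pyspark.sql import SparkSession
import pyrasterframes  # on import, adds SparkSession.withRasterFrames()

# feed every key from the spark_config dict shown in the diff above
builder = SparkSession.builder
for key, value in spark_config.items():
    builder = builder.config(key, value)

# start the session against the in-cluster Kubernetes master and
# enable the RasterFrames extensions
spark = builder.getOrCreate().withRasterFrames()
```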
default: "True" | ||
description: '4 cores, 32 GiB of memory. <a href="https://github.com/pangeo-data/pangeo-docker-images" target="_blank">Pangeo Notebook</a> environment powered by <a href="https://rasterframes.io/">Raster Frames</a>, <a href="http://geotrellis.io/">GeoTrellis</a> and <a href="https://spark.apache.org/">Apache Spark</a>.' | ||
kubespawner_override: | ||
image: "${pyspark_image}" |
Note: the only difference between the PySpark and Python images is the set of underlying dependencies.
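For orientation, a sketch of the full profile entry the fragment above belongs to, written as a KubeSpawner `profile_list` item rather than Helm YAML; the display name, slug, and resource overrides are assumptions, and `${pyspark_image}` is the Terraform-templated value from the diff above:

```python
c.KubeSpawner.profile_list = [
    {
        'display_name': 'PySpark with RasterFrames',  # assumed name
        'slug': 'pyspark',
        'default': True,
        'description': ('4 cores, 32 GiB of memory. Pangeo Notebook environment '
                        'powered by RasterFrames, GeoTrellis and Apache Spark.'),
        'kubespawner_override': {
            'image': '${pyspark_image}',  # substituted by Terraform templating
            'cpu_guarantee': 4,           # assumed, matching the description
            'mem_guarantee': '32G',
        },
    },
]
```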
Force-pushed from 8da4d5f to c4905a4
This PR adjusts the Terraform deployment as well as the Helm charts to support the PySpark profile.
Some significant improvements were achieved along the way:
TODO: