-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skypilot with Kubernetes #3033
Skypilot with Kubernetes #3033
Conversation
This commit adds the Skypilot Kubernetes integration to enable remote orchestration of ZenML pipelines on VMs. The integration includes the necessary initialization, configuration, and flavor classes. It also adds the Skypilot Kubernetes orchestrator and its settings. This integration provides an alternative to the local orchestrator for running pipelines on Kubernetes.
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughWalkthroughThe changes introduce a new Kubernetes orchestrator for the SkyPilot integration within the ZenML framework. This includes comprehensive documentation updates, the addition of new classes and constants, and enhancements to existing orchestrator functionalities. The new orchestrator facilitates the execution of ZenML pipelines on virtual machines in Kubernetes environments, with specific configurations and settings tailored for this deployment model. Changes
Possibly related PRs
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
…otKubernetesOrchestrator prepare_environment_variable method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to my comments, we also need to add docs :-)
src/zenml/integrations/skypilot/orchestrators/skypilot_base_vm_orchestrator.py
Outdated
Show resolved
Hide resolved
"DOCKER_USERNAME": docker_username, | ||
"DOCKER_PASSWORD": docker_password, | ||
} | ||
task_envs["DOCKER_USERNAME"] = (docker_username,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this a tuple now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not I haven't pushed the latest stuff
|
||
# Run the entire pipeline | ||
|
||
# Set the service connector AWS profile ENV variable | ||
self.prepare_environment_variable(set=True) | ||
|
||
try: | ||
if isinstance(self.cloud, sky.clouds.Kubernetes): | ||
image = image | ||
run_command = f"/opt/venv/bin/{entrypoint_str} {arguments_str}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a weird entrypoint.. have you tested it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this is a tricky one, Not sure what would be the exact solution long term but if we give skypilot the entrypoint
it would use the default Python which is Miniconada but we want to use the one we have already set when building docker image in pipeline run and that python version is in /opt/venv/bin/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then i would at lest make it configurable from the outside in case something weird changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not just something weird, but anyone using a custom image which does not magically use the same venv.
What is skypilot doing exactly? Because just calling python
should just use the venv/system python that's active, or do they do some magic to prevent that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They actually do some magic which basically forces us to either figure out which env to use or stop building the image as it is and use skypilot image to build on top of it and consider it like lightning studio where we only run command afterward, @htahir1 @michael-zenml which means stop building docker image for this integration and have pip install and zenml install a commands that get's passed to skypilot setup argument
src/zenml/integrations/skypilot/orchestrators/skypilot_base_vm_orchestrator.py
Show resolved
Hide resolved
src/zenml/integrations/skypilot/orchestrators/skypilot_base_vm_orchestrator.py
Outdated
Show resolved
Hide resolved
src/zenml/integrations/skypilot_kubernetes/orchestrators/skypilot_kubernetes_vm_orchestrator.py
Outdated
Show resolved
Hide resolved
...zenml/integrations/skypilot_kubernetes/flavors/skypilot_orchestrator_kubernetes_vm_flavor.py
Show resolved
Hide resolved
…otKubernetesOrchestrator prepare_environment_variable method
src/zenml/integrations/skypilot/orchestrators/skypilot_base_vm_orchestrator.py
Outdated
Show resolved
Hide resolved
|
||
# Run the entire pipeline | ||
|
||
# Set the service connector AWS profile ENV variable | ||
self.prepare_environment_variable(set=True) | ||
|
||
try: | ||
if isinstance(self.cloud, sky.clouds.Kubernetes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might seem mostly cosmetic, but I'll say it anyway: you're adding implementation specific conditions in the base class, which is supposed to be implementation agnostic. The correct way to do this is through static attributes, properties or methods that the implementations can override.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well i think i want to re-work the whole Skypilot integration if possible so instead of have a base abstraction and many sub integration which blocks one of the most important features of skypilot cross-cloud/env failover
, tho I totally agree with your comment
src/zenml/integrations/skypilot_kubernetes/orchestrators/skypilot_kubernetes_vm_orchestrator.py
Outdated
Show resolved
Hide resolved
…_orchestrator.py Co-authored-by: Stefan Nica <stefan@zenml.io>
…l-io/zenml into feature/skypilot-kubernetes
…l-io/zenml into feature/skypilot-kubernetes
…ent path for Kubernetes
|
||
# Run the entire pipeline | ||
|
||
# Set the service connector AWS profile ENV variable | ||
self.prepare_environment_variable(set=True) | ||
|
||
try: | ||
if isinstance(self.cloud, sky.clouds.Kubernetes): | ||
run_command = f"$VIRTUAL_ENV{entrypoint_str} {arguments_str}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this doesn't work right, $VIRTUAL_ENV
is for example /opt/venv
and the python is in /opt/venv/bin/python
. So this needs the bin
directory to work, and then also a fallback in case $VIRTUAL_ENV
is not set.
And we need to make sure this is actually working
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this is important @safoinme
Raises: | ||
ValueError: If no service connector is found. | ||
""" | ||
connector = self.get_connector() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a misuse of the method, this isn't actually preparing any environment variables.
Maybe this should be in SkypilotKubernetesOrchestrator.prepare_pipeline_deployment(...)
? This method gets called for each stack component before a pipeline run
@coderabbitai summary |
Actions performedSummary regeneration triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved barring michaels comment
|
||
# Run the entire pipeline | ||
|
||
# Set the service connector AWS profile ENV variable | ||
self.prepare_environment_variable(set=True) | ||
|
||
try: | ||
if isinstance(self.cloud, sky.clouds.Kubernetes): | ||
run_command = f"${{VIRTUAL_ENV:+$VIRTUAL_ENV/bin/}}{entrypoint_str} {arguments_str}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which shell should this run in? Doesn't work for me somehow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah it's the double brackets because of the f-string, my bad. If you tested this and it works, feel free to merge.
Describe changes
I implemented/fixed _ to achieve _.
Pre-requisites
Please ensure you have done the following:
develop
and the open PR is targetingdevelop
. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.Types of changes
Summary by CodeRabbit
Release Notes
New Features
SkypilotKubernetesIntegration
to enhance ZenML's integration capabilities.SkypilotKubernetesOrchestrator
for managing pipeline execution in Kubernetes environments.Documentation
Bug Fixes