Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client-side computation of cached steps #3068

Merged
merged 57 commits into from
Oct 17, 2024

Conversation

schustmi
Copy link
Contributor

@schustmi schustmi commented Oct 9, 2024

Describe changes

This PR adds functionality to compute cached steps client-side. This reduces the time/money spent waiting for remote orchestrators, which can often take a long time to spin up a step container only for it to immediately shut down because the step was cached.

Some things were removed as part of this PR to make the model linking more consistent for cached and non-cached steps:

  • The deprecated functionality of using an external artifact to load an artifact from a model version
  • The ability to specify a model version for a specific step output in the ArtifactConfig

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.
  • If my changes require changes to the dashboard, these changes are communicated/requested.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

@github-actions github-actions bot added internal To filter out internal PRs and issues enhancement New feature or request labels Oct 9, 2024
src/zenml/orchestrators/step_runner.py Outdated Show resolved Hide resolved
src/zenml/orchestrators/step_run_utils.py Outdated Show resolved Hide resolved
src/zenml/orchestrators/step_run_utils.py Outdated Show resolved Hide resolved
src/zenml/orchestrators/step_run_utils.py Outdated Show resolved Hide resolved
@schustmi schustmi force-pushed the feature/PRD-657-precompute-cached-steps branch from 06341ec to 661869f Compare October 10, 2024 12:17
Copy link
Contributor

E2E template updates in examples/e2e have been pushed.

Copy link
Contributor

Classification template updates in examples/mlops_starter have been pushed.

Copy link
Contributor

@AlexejPenner AlexejPenner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦭

Copy link
Contributor

@avishniakov avishniakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, but nothing critical - well done!

src/zenml/artifacts/artifact_config.py Show resolved Hide resolved
Comment on lines +47 to +50
data.pop("name", None)
data.pop("version", None)
data.pop("model", None)
return data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like older pipelines might fail silently downstream because of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case would they fail?

I'm guessing this would only be possible if upgrading the server to the new release while still having an old scheduled pipeline running that uses a Docker image of an older version? The server would then remove those attributes before sending the response.

Would that case be fixed if I just don't remove this data here and add extra=allow in the pydantic config?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it would be better just to give some instructions on how to fix this and write that deprecation clearly in the new release notes. I don't ask for any technical solution here - these guys should have been gone for a while already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that these attributes will already be removed on the server side, which means the warning/instructions will be very hidden. I instead added some instructions on the runtime error below that happens if the ID is missing.

src/zenml/model/utils.py Outdated Show resolved Hide resolved
src/zenml/model/utils.py Outdated Show resolved Hide resolved
src/zenml/orchestrators/step_run_utils.py Show resolved Hide resolved
src/zenml/orchestrators/step_runner.py Outdated Show resolved Hide resolved
src/zenml/steps/step_context.py Outdated Show resolved Hide resolved
schustmi and others added 2 commits October 16, 2024 16:50
Co-authored-by: Andrei Vishniakov <31008759+avishniakov@users.noreply.github.com>
Copy link
Contributor

@avishniakov avishniakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦭

@schustmi schustmi merged commit 3f0699f into develop Oct 17, 2024
73 of 75 checks passed
@schustmi schustmi deleted the feature/PRD-657-precompute-cached-steps branch October 17, 2024 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change enhancement New feature or request internal To filter out internal PRs and issues run-slow-ci
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants