feat(Backend + SDK): Update kfp backend and kubernetes sdk to allow enabling shared memory #10704

Open
hsteude wants to merge 2 commits into master from feature/support_shm_in_backend_and_sdk


hsteude
Contributor

@hsteude hsteude commented Apr 17, 2024

This PR adds a method called 'enable_shared_memory' to the kubernetes_platform Python SDK.

Why?
Without this, we can't train PyTorch models using dataloaders with multiple workers (see #9893).
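
For context, here is a minimal sketch of the usual Kubernetes pattern for giving a pod more shared memory: a memory-backed `emptyDir` volume mounted at `/dev/shm`. PyTorch dataloader workers exchange tensors through shared memory, and the container default for `/dev/shm` is often too small for that. The helper name `add_shared_memory_volume`, its parameters, and the placeholder image are hypothetical and only illustrate the pod-spec shape. Whether this PR's backend produces exactly this spec is an assumption, and this is not the signature of 'enable_shared_memory'.

```python
import copy
from typing import Optional


def add_shared_memory_volume(
    pod_spec: dict,
    size_limit: Optional[str] = None,
    volume_name: str = "shm",
) -> dict:
    """Hypothetical helper: return a copy of pod_spec with a memory-backed
    emptyDir volume mounted at /dev/shm in every container."""
    spec = copy.deepcopy(pod_spec)

    # emptyDir with medium "Memory" is backed by tmpfs (RAM).
    empty_dir = {"medium": "Memory"}
    if size_limit is not None:
        # Optional cap on the volume size, e.g. "1Gi".
        empty_dir["sizeLimit"] = size_limit

    spec.setdefault("volumes", []).append(
        {"name": volume_name, "emptyDir": empty_dir}
    )

    # Mount the volume over /dev/shm in each container of the pod.
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": volume_name, "mountPath": "/dev/shm"}
        )
    return spec


# Placeholder pod spec; the image name is illustrative only.
pod_spec = {"containers": [{"name": "main", "image": "example/trainer:latest"}]}
patched = add_shared_memory_volume(pod_spec, size_limit="1Gi")
```

The helper returns a modified copy instead of mutating its input, so the original spec can be reused for tasks that do not need the extra shared memory.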

Next Steps:
This PR has to be rebased once #10703 is merged. We'll also need to update the go.mod, go.sum, and the license CSV files.


Hi @hsteude. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hsteude hsteude force-pushed the feature/support_shm_in_backend_and_sdk branch from 95fb2ae to 6e158ce Compare April 17, 2024 19:06
@hsteude hsteude changed the title from "feat(Backend + SDK): Update kfp backend and kubernetes sdk to allow enabling shared memory, fixes #9893" to "feat(Backend + SDK): Update kfp backend and kubernetes sdk to allow enabling shared memory" Apr 17, 2024
@hsteude hsteude marked this pull request as draft April 17, 2024 19:51
@rimolive
Member

@hsteude Can you please add this PR as a topic to the next Pipelines WG meeting?

@hsteude hsteude force-pushed the feature/support_shm_in_backend_and_sdk branch 4 times, most recently from 1916727 to 9268be2 Compare April 24, 2024 13:17
@hsteude hsteude force-pushed the feature/support_shm_in_backend_and_sdk branch 2 times, most recently from b741f97 to 0549ca3 Compare May 10, 2024 11:18
@hsteude hsteude marked this pull request as ready for review May 10, 2024 11:38
@hsteude
Contributor Author

hsteude commented May 10, 2024

@chensun: Thanks for merging #10703. Could you also take a look at this one, please?

@qadeem-qureshi

Any update on this? Would be really nice to have.

@rimolive
Member

cc @chensun @zijianjoy @james-jwu

@hbelmiro
Contributor

/ok-to-test


@hsteude: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| kubeflow-pipeline-upgrade-test | 0549ca3 | link | false | /test kubeflow-pipeline-upgrade-test |
| kubeflow-pipelines-samples-v2 | 0549ca3 | link | false | /test kubeflow-pipelines-samples-v2 |
| kfp-kubernetes-execution-tests | 0549ca3 | link | false | /test kfp-kubernetes-execution-tests |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hsteude
Contributor Author

hsteude commented Jul 11, 2024

@rimolive @chensun @zijianjoy @james-jwu Would really appreciate a review on this.

@sanchesoon

Hi all, any plans to merge it?

@hbelmiro
Contributor

hbelmiro commented Aug 9, 2024

@hsteude can you please rebase so you get the latest changes on tests?

@hsteude hsteude force-pushed the feature/support_shm_in_backend_and_sdk branch from 0549ca3 to 960a965 Compare August 9, 2024 14:27
Signed-off-by: hsteude <henrik.steude@prokube.ai>
Signed-off-by: hsteude <henrik.steude@prokube.ai>
@hsteude hsteude force-pushed the feature/support_shm_in_backend_and_sdk branch from 960a965 to 9213ec7 Compare September 20, 2024 15:02

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chensun for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@hbelmiro hbelmiro left a comment

/lgtm

@hbelmiro
Contributor

@HumairAK do you have this on your plate?

@hbelmiro
Contributor

@HumairAK @chensun
Can we merge this if it makes sense to you? Otherwise, could you please share your feedback so the author can work on it in time for KFP 2.4?

6 participants