Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix default suspension of Deployment/StatefulSet with Kueue-managed owners #3803

Merged
merged 8 commits into from
Dec 11, 2024

Conversation

dgrove-oss
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adjusts default suspension logic in Deployment/StatefulSet webhooks to correctly handle the case where they
are children of another Kueue-managed Kind.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Based on a suggestion in abandoned PR to allow us to avoid needing a full-blown replicaset integration.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. release-note-none Denotes a PR that doesn't merit a release note. labels Dec 10, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 10, 2024
Copy link

netlify bot commented Dec 10, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 47a8aaf
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6759034d12e34200085a64ed

@dgrove-oss
Copy link
Contributor Author

dgrove-oss commented Dec 10, 2024

/hold
I did initial testing with some e2e scenarios with appwrappers already. I want to work through a few more cases to make sure there aren't any surprises. Except to finish that today.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 10, 2024
@dgrove-oss
Copy link
Contributor Author

/unhold I've tested various combinations of deployments, statefulsets, and appwrappers in both managed and unmanaged namespaces. Everything appears to be working reasonably.

@dgrove-oss
Copy link
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 11, 2024
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, the main improvement could be to eliminate the hard-coding of special cases.

@@ -263,6 +263,10 @@ func GetIntegration(name string) (IntegrationCallbacks, bool) {
return manager.get(name)
}

func IsIntegrationEnabled(name string) bool {
return manager.enabledIntegrations.Has(name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this requires a read lock, because integrations can be added during runtime. We may also be useful to return the GVK (renaming to e.g. GetEnabledIntegrationGVK) to avoid hard-coding (see another comment).

@@ -47,6 +49,7 @@ const (
ManagedLabelKey = constants.ManagedByKueueLabel
ManagedLabelValue = "true"
PodFinalizer = ManagedLabelKey
SuspendedByParentLabelKey = "kueue.x-k8s.io/pod-suspending-parent"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be an annotation? If so I would prefer annotation as labels are more user-facing (they can be used for filtering), while annotations are more for internal framework use, which is the case here.

Also, please add the label / annotation to the Pod Group KEP in the serving section: https://github.com/kubernetes-sigs/kueue/tree/main/keps/976-plain-pods#serving-workload.

}
owner = metav1.GetControllerOf(rs)
if owner != nil {
if owner.Kind == "Deployment" && owner.APIVersion == "apps/v1" && jobframework.IsIntegrationEnabled("deployment") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this out to the same level where StatefulSet is considered, for symmetry. Then move the comment about the Special case for Deployment down just above the if. I also believe we can eliminate the special cases alltogether by the lookup as suggested in the other comment.

}
// Special case for StatefulSet integration which intentionally does not register an IsManagingObjectsOwner
// function so that it can use the standaloneJob path in the GenericReconciler for its Pods.
if owner.Kind == "StatefulSet" && owner.APIVersion == "apps/v1" && jobframework.IsIntegrationEnabled("statefulset") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I would prefer less hard-coding here.

  • the "statefulset" also does not need to be hard-coded it is the value of pod.pod.GetLabels()[SuspendedByParentLabelKey].
  • the "StatefulSet" and "APIVersion", we can read them from GVK of the integration. Example here. For that we need to change the function IsIntegrationEnabled to enabledIntegrationGVK() which would return the GVK if enabled.

@mimowo mimowo mentioned this pull request Dec 11, 2024
34 tasks
@mimowo
Copy link
Contributor

mimowo commented Dec 11, 2024

/lgtm
/approve
Since this PR was extensively tested manually per #3803 (comment) I'm good to go with it. The main concern is the hard-coded lookup of the integrations. I believe this is something we can clean up in a follow up @dgrove-oss.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 11, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 9468205bb09cc19bffa05a64cbb3fde6c293a7d5

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrove-oss, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 11, 2024
@k8s-ci-robot k8s-ci-robot merged commit 36396e7 into kubernetes-sigs:main Dec 11, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.10 milestone Dec 11, 2024
@dgrove-oss dgrove-oss deleted the 3589-pod-owner branch December 11, 2024 18:44
kannon92 added a commit to openshift/kubernetes-sigs-kueue that referenced this pull request Dec 13, 2024
* chore: use variadic function to set levels in TopologyWrapper (kubernetes-sigs#3744)

* Use QueueNameForObject in StatefulSet Webhook. (kubernetes-sigs#3752)

* TAS: Support rank-based ordering for StatefulSet. (kubernetes-sigs#3751)

* TAS: Support rank-based ordering for StatefulSet.

* Review remarks.

* Sort labels and annotations. (kubernetes-sigs#3753)

* document manageJobsNamespaceSelector (kubernetes-sigs#3748)

* document manageJobsNamespaceSelector

* Apply suggestions from code review

Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>

* use full name for queue-name label

---------

Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>

* performance optimization: only copy back when changes are made (kubernetes-sigs#3766)

* managedJobsNamespaceSelector for Deployment, StatefulSets, and Pods (kubernetes-sigs#3765)

* managedJobsNamespaceSelector for deployment and statefulset

* normalize manageJobsWithoutQueueName logic in pod webhook

* refactor to use common helper function

* unit tests for WorkloadShouldBeSuspended

* linter fixes

* add test for job owned by kueue managed parent

* bug fix in deployment/statefulset webhooks

* Bump github.com/mikefarah/yq/v4 in /hack/internal/tools (kubernetes-sigs#3768)

Bumps [github.com/mikefarah/yq/v4](https://github.com/mikefarah/yq) from 4.44.5 to 4.44.6.
- [Release notes](https://github.com/mikefarah/yq/releases)
- [Changelog](https://github.com/mikefarah/yq/blob/master/release_notes.txt)
- [Commits](mikefarah/yq@v4.44.5...v4.44.6)

---
updated-dependencies:
- dependency-name: github.com/mikefarah/yq/v4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove Denkensk from Kueue reviewers (kubernetes-sigs#3771)

* Replace time.Now() with clock.Now() in pkg/core (kubernetes-sigs#3773)

* add clock field to ClusterQueueReconciler
and propagate clock into it

* replace time.Now with Now() in ClusterQueueReconciler

* fix lint errors

* Implement default LocalQueue (kubernetes-sigs#3610)

* Make sure test does not flake. (kubernetes-sigs#3734)

* Refresh Kueue roadmap on the main page for 2025 (kubernetes-sigs#3763)

* Refresh Kueue roadmap on the main page

* Graduate the API to v1

* Review remarks

* use node 22 instead of node 16 (kubernetes-sigs#3777)

* README.md: Fix some minor typos (kubernetes-sigs#3781)

Fix some minor typos in README.md.

Signed-off-by: David Weinehall <david.weinehall@intel.com>

* Add documentation for ProvReqRetry mechanism (kubernetes-sigs#3774)

* Add doc for ProvReqRetry mechanism

* Fix formula

* Delete feature gate note as it's deprecated

* Configure dependabot for kueue-viz. (kubernetes-sigs#3780)

* Bump node from 22 to 23 in /cmd/experimental/kueue-viz/frontend (kubernetes-sigs#3784)

Bumps node from 22 to 23.

---
updated-dependencies:
- dependency-name: node
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/onsi/gomega from 1.36.0 to 1.36.1 (kubernetes-sigs#3786)

Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.36.0 to 1.36.1.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.36.0...v1.36.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump react-toastify in /cmd/experimental/kueue-viz/frontend (kubernetes-sigs#3787)

Bumps [react-toastify](https://github.com/fkhadra/react-toastify) from 9.1.3 to 10.0.6.
- [Release notes](https://github.com/fkhadra/react-toastify/releases)
- [Commits](fkhadra/react-toastify@v9.1.3...v10.0.6)

---
updated-dependencies:
- dependency-name: react-toastify
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump react-chartjs-2 in /cmd/experimental/kueue-viz/frontend (kubernetes-sigs#3788)

Bumps [react-chartjs-2](https://github.com/reactchartjs/react-chartjs-2) from 4.3.1 to 5.0.0.
- [Release notes](https://github.com/reactchartjs/react-chartjs-2/releases)
- [Changelog](https://github.com/reactchartjs/react-chartjs-2/blob/master/CHANGELOG.md)
- [Commits](reactchartjs/react-chartjs-2@v4.3.1...v5.0.0)

---
updated-dependencies:
- dependency-name: react-chartjs-2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/gohugoio/hugo in /hack/internal/tools (kubernetes-sigs#3789)

Bumps [github.com/gohugoio/hugo](https://github.com/gohugoio/hugo) from 0.139.3 to 0.139.4.
- [Release notes](https://github.com/gohugoio/hugo/releases)
- [Changelog](https://github.com/gohugoio/hugo/blob/master/hugoreleaser.toml)
- [Commits](gohugoio/hugo@v0.139.3...v0.139.4)

---
updated-dependencies:
- dependency-name: github.com/gohugoio/hugo
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump axios from 0.27.2 to 1.7.9 in /cmd/experimental/kueue-viz/frontend (kubernetes-sigs#3790)

Bumps [axios](https://github.com/axios/axios) from 0.27.2 to 1.7.9.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md)
- [Commits](axios/axios@v0.27.2...v1.7.9)

---
updated-dependencies:
- dependency-name: axios
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump react from 18.3.1 to 19.0.0 in /cmd/experimental/kueue-viz/frontend (kubernetes-sigs#3791)

Bumps [react](https://github.com/facebook/react/tree/HEAD/packages/react) from 18.3.1 to 19.0.0.
- [Release notes](https://github.com/facebook/react/releases)
- [Changelog](https://github.com/facebook/react/blob/main/CHANGELOG.md)
- [Commits](https://github.com/facebook/react/commits/v19.0.0/packages/react)

---
updated-dependencies:
- dependency-name: react
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump babel/plugin-proposal-private-property-in-object (kubernetes-sigs#3793)

Bumps the all group in /cmd/experimental/kueue-viz/frontend with 1 update: [babel/plugin-proposal-private-property-in-object](https://github.com/babel/babel/tree/HEAD/packages/babel-plugin-proposal-private-property-in-object).

Updates `babel/plugin-proposal-private-property-in-object` from 7.18.6 to 7.21.11
- [Release notes](https://github.com/babel/babel/releases)
- [Commits](https://github.com/babel/babel/commits/HEAD/packages/babel-plugin-proposal-private-property-in-object)

---
updated-dependencies:
- dependency-name: "babel/plugin-proposal-private-property-in-object"
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: all
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [website] Add Innovatrics adopter (kubernetes-sigs#3800)

* Migrate to React 19 (kubernetes-sigs#3799)

* Update resource group example (kubernetes-sigs#3806)

Signed-off-by: Troy Chiu <y.troychiu@gmail.com>

* Do not default the queue-name for jobs with kueue managed owners (kubernetes-sigs#3795)

* chore: add default topologies for tests (kubernetes-sigs#3769)

* Bump actions/setup-go from 5.1.0 to 5.2.0 in the all group (kubernetes-sigs#3808)

Bumps the all group with 1 update: [actions/setup-go](https://github.com/actions/setup-go).


Updates `actions/setup-go` from 5.1.0 to 5.2.0
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](actions/setup-go@41dfa10...3041bf5)

---
updated-dependencies:
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: all
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add configuration for setting feature gates (kubernetes-sigs#3805)

* generate should generate kueuectl and apiref rather than verify (kubernetes-sigs#3810)

* update kuberbac proxy defaults (kubernetes-sigs#3811)

* fix default suspension of Deployment/StatefulSet with Kueue-managed owners (kubernetes-sigs#3803)

* fix deployment and statefulset setup so ownership works

* enable deployment/statefulset to be recognized as child jobs

* fix typo in rbacs

* adjust podwebhook warning for forced suspension

* redesign special pod/deployment/statefulset ownership logic

* revert trivial change

* linter fix

* nit: grammar

* Deactivate ClusterQueue if there is no Topology (kubernetes-sigs#3770)

* Deactivate ClusterQueue if there is no Topology

* Review remarks

* Use lock when accessing IsIntegrationEnabled (kubernetes-sigs#3814)

* Add kueue.x-k8s.io/pod-suspending-parent to the documentation page. (kubernetes-sigs#3816)

* Use annotation rather than label for serving workloads (kubernetes-sigs#3815)

* Documentation for LocalQueueDefaulting (kubernetes-sigs#3783)

Update site/content/en/docs/reference/labels-and-annotations.md

Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>

* avoid hardcoding framework names in pod wehook  (kubernetes-sigs#3817)

* remove hardcoding in pod webhook using GVK to look up framework

* add unit tests

* typo: match function name in comment

* update Pod integration docs to prefer managedJobsNamespaceSelector (kubernetes-sigs#3828)

* add section to metrics with (alpha) LQ metrics (kubernetes-sigs#3826)

Signed-off-by: Kevin <kpostlet@redhat.com>

* Added Log output for LocalQueue and ClusterQueue (kubernetes-sigs#3605)

* Added Log output for LocalQueue and ClusterQueue

Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>

* Added changes

Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>

* added println

Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>

* updated the log output using default error

Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>

* Added test case

Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>

* Added changes

Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>

* Adding return statements for ci-lint

Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>

* removed comments

Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>

---------

Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>

* update golang crypto to 0.31.0 (kubernetes-sigs#3825)

* drop pod taints toleration from kueue repo (kubernetes-sigs#3834)

* Check if unexpected errors did not occur in queue unit testings (kubernetes-sigs#3838)

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: David Weinehall <david.weinehall@intel.com>
Signed-off-by: Troy Chiu <y.troychiu@gmail.com>
Signed-off-by: Kevin <kpostlet@redhat.com>
Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Co-authored-by: Mykyta Derhunov <ndergunov@gmail.com>
Co-authored-by: Mykhailo Bobrovskyi <mikhail.bobrovsky@gmail.com>
Co-authored-by: David Grove <dgrove-oss@users.noreply.github.com>
Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Patryk Bundyra <pbundyra@google.com>
Co-authored-by: Tushar Mohapatra <137442734+TusharMohapatra07@users.noreply.github.com>
Co-authored-by: Yaroslava Serdiuk <yaroslava@google.com>
Co-authored-by: Michał Szadkowski <michal_szadkowski@epam.com>
Co-authored-by: David Weinehall <david.weinehall@intel.com>
Co-authored-by: mmolisch <141235459+mmolisch@users.noreply.github.com>
Co-authored-by: Akram Ben Aissi <akram.benaissi@gmail.com>
Co-authored-by: Troy Chiu <114708546+troychiu@users.noreply.github.com>
Co-authored-by: Kevin Postlethwait <kpostlet@redhat.com>
Co-authored-by: 7h3-3mp7y-m4n <115151332+7h3-3mp7y-m4n@users.noreply.github.com>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants