feat: expose nodeclaim disruption through new disruption condition, improves pod eviction event message #1370

Merged

Conversation

cnmcavoy
Contributor

@cnmcavoy cnmcavoy commented Jun 28, 2024

Fixes #N/A

Description

Adds a new nodeclaim condition, DisruptionCandidate, which is set when a nodeclaim is being disrupted and is applied after the disruption taint is set. The DisruptionCandidate condition contains the reason why the nodeclaim is being terminated (e.g., node worker-mgn6n/ip-10-115-200-242.us-east-2.compute.internal was single node consolidated).

The motivation for this condition is that, when evicting pods, we can look it up and use its message in the pod event.

Example of what the pod events look like now from testing in our clusters:

keda                                          85s         Normal    Evicted                           pod/keda-admission-webhooks-5bd6b554ff-tn25h                                                Evicted pod: node worker-qa-czn28/ip-10-115-195-50.us-east-2.compute.internal drifted
keda                                          86s         Normal    Evicted                           pod/keda-operator-688bc9b887-7t4gd                                                          Evicted pod: node worker-qa-czn28/ip-10-115-195-50.us-east-2.compute.internal drifted
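
A minimal sketch of the lookup described above, in Go. The types `Condition` and `NodeClaim`, the helpers `setDisruptionCandidate` and `evictionMessage`, and the fallback text `Evicted pod` are hypothetical stand-ins for illustration, not Karpenter's actual API. Only the condition name `DisruptionCandidate` and the `Evicted pod: <message>` format are taken from this description.

```go
package main

import "fmt"

// conditionDisruptionCandidate is the condition type named in this PR.
const conditionDisruptionCandidate = "DisruptionCandidate"

// Condition is a simplified, hypothetical status condition.
type Condition struct {
	Type    string
	Status  bool
	Message string
}

// NodeClaim is a simplified, hypothetical stand-in for a nodeclaim.
type NodeClaim struct {
	Name       string
	Conditions []Condition
}

// setDisruptionCandidate records why the nodeclaim is being disrupted,
// replacing any existing condition of the same type.
func setDisruptionCandidate(nc *NodeClaim, message string) {
	cond := Condition{Type: conditionDisruptionCandidate, Status: true, Message: message}
	for i := range nc.Conditions {
		if nc.Conditions[i].Type == conditionDisruptionCandidate {
			nc.Conditions[i] = cond
			return
		}
	}
	nc.Conditions = append(nc.Conditions, cond)
}

// evictionMessage builds the pod event text from the condition's message.
// The generic fallback used when the condition is absent is an assumption.
func evictionMessage(nc *NodeClaim) string {
	if nc != nil {
		for _, c := range nc.Conditions {
			if c.Type == conditionDisruptionCandidate && c.Status && c.Message != "" {
				return fmt.Sprintf("Evicted pod: %s", c.Message)
			}
		}
	}
	return "Evicted pod"
}
```

In this sketch the disruption side would call `setDisruptionCandidate` after tainting the node, and the eviction side would call `evictionMessage` when emitting the pod event, so the two sides share nothing but the condition itself.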

How was this change tested?

Built Karpenter with this change locally and tested in our clusters. Also ran make presubmit.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 28, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 28, 2024
@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch from 83de6f8 to a7f19d9 Compare June 28, 2024 21:30

coveralls commented Jun 28, 2024

Pull Request Test Coverage Report for Build 9718717778

Details

  • 32 of 54 (59.26%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.1%) to 78.704%

Changes Missing Coverage                                 Covered Lines  Changed/Added Lines  %
pkg/operator/operator.go                                 0              3                    0.0%
pkg/controllers/disruption/controller.go                 14             23                   60.87%
pkg/controllers/node/termination/terminator/eviction.go  16             26                   61.54%

Totals Coverage Status
Change from base Build 9718332603: -0.1%
Covered Lines: 8622
Relevant Lines: 10955

💛 - Coveralls

@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch from a7f19d9 to 3e04007 Compare July 1, 2024 20:15

coveralls commented Jul 1, 2024

Pull Request Test Coverage Report for Build 9750492374

Details

  • 32 of 54 (59.26%) changed or added relevant lines in 4 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.1%) to 78.684%

Changes Missing Coverage                                 Covered Lines  Changed/Added Lines  %
pkg/operator/operator.go                                 0              3                    0.0%
pkg/controllers/disruption/controller.go                 14             23                   60.87%
pkg/controllers/node/termination/terminator/eviction.go  16             26                   61.54%

Files with Coverage Reduction                            New Missed Lines  %
pkg/test/expectations/expectations.go                    2                 93.69%
pkg/utils/atomic/lazy.go                                 2                 87.88%

Totals Coverage Status
Change from base Build 9748897421: -0.1%
Covered Lines: 8619
Relevant Lines: 10954


@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch 5 times, most recently from 9819849 to 3041c3e Compare July 3, 2024 16:14

coveralls commented Jul 3, 2024

Pull Request Test Coverage Report for Build 9781364600

Details

  • 39 of 57 (68.42%) changed or added relevant lines in 9 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.05%) to 78.695%

Changes Missing Coverage                                 Covered Lines  Changed/Added Lines  %
pkg/operator/operator.go                                 0              3                    0.0%
pkg/controllers/disruption/controller.go                 6              11                   54.55%
pkg/controllers/node/termination/terminator/eviction.go  16             26                   61.54%

Files with Coverage Reduction                            New Missed Lines  %
pkg/controllers/provisioning/scheduling/nodeclaim.go     2                 89.13%

Totals Coverage Status
Change from base Build 9772878439: -0.05%
Covered Lines: 8636
Relevant Lines: 10974


@cnmcavoy cnmcavoy requested a review from Bryce-Soghigian July 3, 2024 16:41
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 4, 2024
@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch from 3041c3e to 0d31c2d Compare July 8, 2024 18:32

coveralls commented Jul 8, 2024

Pull Request Test Coverage Report for Build 12399190527

Details

  • 66 of 82 (80.49%) changed or added relevant lines in 6 files are covered.
  • 7 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.05%) to 81.247%

Changes Missing Coverage                                 Covered Lines  Changed/Added Lines  %
pkg/controllers/node/termination/terminator/eviction.go  20             22                   90.91%
pkg/controllers/state/statenode.go                       18             22                   81.82%
pkg/controllers/disruption/controller.go                 24             34                   70.59%

Files with Coverage Reduction                            New Missed Lines  %
pkg/controllers/disruption/controller.go                 1                 71.13%
pkg/utils/termination/termination.go                     2                 92.31%
pkg/controllers/disruption/consolidation.go              4                 88.55%

Totals Coverage Status
Change from base Build 12303032942: -0.05%
Covered Lines: 9068
Relevant Lines: 11161


@cnmcavoy
Contributor Author

cnmcavoy commented Jul 8, 2024

/remove-needs-rebase

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 9, 2024
@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch from 0d31c2d to ef86002 Compare July 12, 2024 17:52
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2024
@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch from ef86002 to 2b235b5 Compare July 12, 2024 18:03
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 26, 2024
@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch from 2b235b5 to 0807939 Compare July 26, 2024 19:23
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 26, 2024
@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch 2 times, most recently from ab5dbb4 to 22ed1fe Compare October 2, 2024 20:48
@cnmcavoy cnmcavoy requested a review from njtran October 2, 2024 20:48
@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch 2 times, most recently from 5743ceb to 5381d84 Compare October 8, 2024 20:04
Contributor

@njtran njtran left a comment

Nice work, think we're getting close!

@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch 2 times, most recently from 8f0d122 to 23471fa Compare November 6, 2024 20:55
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2024
@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch from 23471fa to 001c05c Compare November 19, 2024 18:59
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 19, 2024
@cnmcavoy cnmcavoy requested a review from njtran November 19, 2024 20:36
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2024
@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch from 001c05c to ed1453b Compare November 25, 2024 15:41
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 25, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2024
@engedaam
Contributor

/assign @njtran

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 13, 2024
…dd eviction message from condition

Signed-off-by: Cameron McAvoy <cmcavoy@indeed.com>
@cnmcavoy cnmcavoy force-pushed the cmcavoy/eviction-reason branch from ed1453b to 4dd4e79 Compare December 18, 2024 18:36

github-actions bot commented Jan 2, 2025

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2025
Contributor

@njtran njtran left a comment

/lgtm
/approve

thanks for the contribution!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 2, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cnmcavoy, njtran

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 2, 2025
@k8s-ci-robot k8s-ci-robot merged commit cfda355 into kubernetes-sigs:main Jan 2, 2025
12 checks passed
Labels
  • approved Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
  • lgtm "Looks good to me", indicates that a PR is ready to be merged.
  • lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
  • size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants