scheduler: improve NodeNUMAResource handling node cpu bind policy #1892


eahydra
Member

@eahydra eahydra commented Feb 4, 2024

Ⅰ. Describe what this PR does

If a node carries the label node.koordinator.sh/cpu-bind-policy with the value FullPCPUsOnly or SpreadByPCPUs, the scheduler should ensure that the allocation result is consistent with that policy, as it would be for the corresponding ResourceSpec.RequiredCPUBindPolicy.

The node policy may also conflict with the ResourceSpec.RequiredCPUBindPolicy defined in the Pod. For example, if the node defines the policy FullPCPUsOnly and the Pod requires SpreadByPCPUs, the two conflict and the node should be filtered out.
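A minimal sketch of the conflict check described above. The type, constants, and the `conflicts` helper are local stand-ins written for illustration and are not the identifiers used in the koordinator code base; the sketch also assumes that an empty value means "no policy set".

```go
package main

import "fmt"

// CPUBindPolicy is a local stand-in for the policy names mentioned in this PR.
// It is not the koordinator API type.
type CPUBindPolicy string

const (
	PolicyNone          CPUBindPolicy = "" // assumption: empty means "not set"
	PolicyFullPCPUsOnly CPUBindPolicy = "FullPCPUsOnly"
	PolicySpreadByPCPUs CPUBindPolicy = "SpreadByPCPUs"
)

// conflicts (hypothetical helper) reports whether the node-level policy and
// the Pod's required policy cannot both be honored. If either side is unset,
// there is nothing to conflict with.
func conflicts(nodePolicy, podRequired CPUBindPolicy) bool {
	if nodePolicy == PolicyNone || podRequired == PolicyNone {
		return false
	}
	return nodePolicy != podRequired
}

func main() {
	// Node says FullPCPUsOnly, Pod requires SpreadByPCPUs: filter the node.
	fmt.Println(conflicts(PolicyFullPCPUsOnly, PolicySpreadByPCPUs))
	// Same policy on both sides: no conflict.
	fmt.Println(conflicts(PolicyFullPCPUsOnly, PolicyFullPCPUsOnly))
	// Pod declares no required policy: no conflict.
	fmt.Println(conflicts(PolicyFullPCPUsOnly, PolicyNone))
}
```

In a scheduler plugin, a check of this kind would presumably run in the Filter phase, so that a conflicting node is rejected before any CPUs are allocated.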

If the Pod does not declare a CPUBindPolicy in its ResourceSpec (either required or preferred) and the node declares one, the scheduler should allocate CPUs according to the node's CPUBindPolicy, even if the Pod's QoSClass is LS. The previous implementation worked, but it was incomplete.
Whether koordlet or kubelet then sets the cgroup on the LS Pod according to the scheduler's allocation result is, I think, a policy question.
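The fallback described above could be sketched roughly as follows. All names here (`podPolicy`, `effectivePolicy`, the constants) are hypothetical and do not come from the koordinator code. The sketch only covers the case stated in this description, where the Pod declares neither a required nor a preferred policy. Letting a Pod-declared policy take precedence otherwise is an assumption, and any conflict with the node policy is assumed to have been filtered out beforehand.

```go
package main

import "fmt"

// CPUBindPolicy is a local stand-in, not the koordinator API type.
type CPUBindPolicy string

const (
	PolicyNone          CPUBindPolicy = "" // assumption: empty means "not set"
	PolicyFullPCPUsOnly CPUBindPolicy = "FullPCPUsOnly"
	PolicySpreadByPCPUs CPUBindPolicy = "SpreadByPCPUs"
)

// podPolicy (hypothetical) holds the two policy fields a Pod may declare.
type podPolicy struct {
	Required  CPUBindPolicy
	Preferred CPUBindPolicy
}

// effectivePolicy (hypothetical) picks the policy used for CPU allocation.
// When the Pod declares nothing, the node's policy applies, regardless of the
// Pod's QoS class. The Pod-first ordering in the first two cases is an
// assumption made for this sketch.
func effectivePolicy(node CPUBindPolicy, pod podPolicy) CPUBindPolicy {
	switch {
	case pod.Required != PolicyNone:
		return pod.Required
	case pod.Preferred != PolicyNone:
		return pod.Preferred
	default:
		return node
	}
}

func main() {
	// An LS Pod with no declared policy on a FullPCPUsOnly node.
	fmt.Println(effectivePolicy(PolicyFullPCPUsOnly, podPolicy{}))
	// A Pod that requires SpreadByPCPUs on a node without a policy.
	fmt.Println(effectivePolicy(PolicyNone, podPolicy{Required: PolicySpreadByPCPUs}))
}
```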

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@koordinator-bot koordinator-bot bot requested review from buptcozy and FillZpp February 4, 2024 10:14
@eahydra eahydra requested review from hormes and ZiMengSheng and removed request for FillZpp February 4, 2024 10:14
@eahydra
Member Author

eahydra commented Feb 4, 2024

/hold


codecov bot commented Feb 4, 2024

Codecov Report

Attention: Patch coverage is 75.52448%, with 35 lines in your changes missing coverage. Please review.

Project coverage is 67.53%. Comparing base (93f2bc2) to head (d465bf3).
Report is 17 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| pkg/scheduler/plugins/nodenumaresource/plugin.go | 74.07% | 13 Missing and 8 partials ⚠️ |
| pkg/scheduler/plugins/nodenumaresource/scoring.go | 60.00% | 4 Missing and 2 partials ⚠️ |
| ...cheduler/plugins/nodenumaresource/topology_hint.go | 53.84% | 4 Missing and 2 partials ⚠️ |
| ...duler/plugins/nodenumaresource/resource_manager.go | 50.00% | 0 Missing and 1 partial ⚠️ |
| pkg/scheduler/plugins/nodenumaresource/service.go | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1892   +/-   ##
=======================================
  Coverage   67.52%   67.53%           
=======================================
  Files         413      413           
  Lines       46072    46115   +43     
=======================================
+ Hits        31111    31143   +32     
- Misses      12705    12710    +5     
- Partials     2256     2262    +6     
Flag Coverage Δ
unittests 67.53% <75.52%> (+<0.01%) ⬆️


@eahydra eahydra force-pushed the imporve_nodenumaresource_nodecpubindpolicy branch from 90308dd to a2f2107 Compare February 4, 2024 16:25
@koordinator-bot koordinator-bot bot added size/XL and removed size/L labels Feb 4, 2024
Signed-off-by: Joseph <joseph.t.lee@outlook.com>
@eahydra eahydra force-pushed the imporve_nodenumaresource_nodecpubindpolicy branch from a2f2107 to d465bf3 Compare February 5, 2024 03:02
@eahydra
Member Author

eahydra commented Feb 5, 2024

/hold cancel

PTAL

@hormes
Member

hormes commented Feb 28, 2024

/lgtm
/approve

@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes


@koordinator-bot koordinator-bot bot merged commit a6159e4 into koordinator-sh:main Feb 28, 2024
20 checks passed