scheduler: improve NodeNUMAResource handling node cpu bind policy #1892
Conversation
/hold
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
##             main    #1892   +/-   ##
=======================================
  Coverage   67.52%   67.53%
=======================================
  Files         413      413
  Lines       46072    46115     +43
=======================================
+ Hits        31111    31143     +32
- Misses      12705    12710      +5
- Partials     2256     2262      +6

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Force-pushed from 90308dd to a2f2107
Signed-off-by: Joseph <joseph.t.lee@outlook.com>
Force-pushed from a2f2107 to d465bf3
/hold cancel
PTAL

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
Ⅰ. Describe what this PR does
If a node carries the label `node.koordinator.sh/cpu-bind-policy` with the value `FullPCPUsOnly` or `SpreadByPCPUs`, the scheduler should ensure that the allocation result is consistent with the corresponding `ResourceSpec.RequiredCPUBindPolicy`.

The node's policy may conflict with the `ResourceSpec.RequiredCPUBindPolicy` defined in the Pod. For example, if the node declares `FullPCPUsOnly` while the Pod requires `SpreadByPCPUs`, the policies conflict and the node should be filtered out.

If the Pod does not declare a ResourceSpec with a CPUBindPolicy (either required or preferred) but the node declares one, the scheduler should allocate CPUs according to the node's CPUBindPolicy, even if the Pod's QoSClass is LS. The previous implementation could handle this, but it was not complete enough.

As for whether koordlet or kubelet will set the cgroup on an LS Pod according to the scheduler's allocation result, I think that is a policy decision.
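The conflict-filtering rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual NodeNUMAResource plugin code: the function `policiesConflict` and the way the policy strings are passed in are hypothetical stand-ins, though the label key and the policy value names come from the PR description.

```go
package main

import "fmt"

// CPU bind policy values as used with the
// node.koordinator.sh/cpu-bind-policy label.
const (
	FullPCPUsOnly = "FullPCPUsOnly"
	SpreadByPCPUs = "SpreadByPCPUs"
)

// policiesConflict reports whether a Pod's required CPU bind policy
// conflicts with the policy declared on the node. An empty node policy
// means the node imposes no constraint; an empty Pod policy means the
// Pod inherits the node's policy, so neither case is a conflict.
func policiesConflict(nodePolicy, podRequiredPolicy string) bool {
	if nodePolicy == "" || podRequiredPolicy == "" {
		return false
	}
	return nodePolicy != podRequiredPolicy
}

func main() {
	// Node declares FullPCPUsOnly, Pod requires SpreadByPCPUs:
	// conflicting, so the node would be filtered out.
	fmt.Println(policiesConflict(FullPCPUsOnly, SpreadByPCPUs))

	// Pod declares no policy: it follows the node's policy instead,
	// even if the Pod's QoSClass is LS.
	fmt.Println(policiesConflict(FullPCPUsOnly, ""))
}
```

In the real plugin the check would run during the scheduler's Filter phase, rejecting nodes whose declared policy cannot satisfy the Pod's `ResourceSpec.RequiredCPUBindPolicy`.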
Ⅱ. Does this pull request fix one issue?
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
Ⅴ. Checklist
make test