koordlet: fix prodReclaimablePredictor result to avoid influence of o… #2325

lijunxin559 · 2025-01-20T10:29:52Z

…versold

Ⅰ. Describe what this PR does

When Allocatable[mid] is calculated on a node that may be oversold, ProdReclaimableMetric can exceed NodeAllocatable * thresholdRatio, so the calculated Allocatable[mid] value unintentionally includes the oversold part. Our earlier attempt to change the computational model in PR #2291 was not sufficient: it erased the role of the prodPod estimation model, so mid resources lost their more stable characteristics after the change. Further modifications to the prodPod estimation are therefore needed.

Ⅱ. Does this pull request fix one issue?

Therefore, I optimized the behavior of ProdReclaimablePredictor: when it returns the prediction results, it now adjusts the values based on the node's runtime information, which in turn affects the collectMetric results. I also added the related tests needed to show that the modified calculations are reasonable.
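To make the oversold effect concrete, here is a minimal sketch and not the code from this PR. It assumes that prod reclaimable is estimated as prod requests minus the predicted prod peak, and that the adjustment amounts to capping the request side at the node allocatable. The function names `rawProdReclaimable` and `adjustedProdReclaimable` and all numbers are hypothetical; the actual logic lives in pkg/koordlet/prediction/peak_predictor.go and may differ.

```go
package main

import "fmt"

// All quantities are CPU milli-cores. Every name here is hypothetical and
// chosen for illustration; none is taken from the koordlet source.

// rawProdReclaimable is the unadjusted estimate (an assumption of this
// sketch): prod requests minus the predicted prod peak, floored at zero.
func rawProdReclaimable(prodRequest, prodPeak int64) int64 {
	if prodRequest <= prodPeak {
		return 0
	}
	return prodRequest - prodPeak
}

// adjustedProdReclaimable caps the request side at the node allocatable
// before subtracting the peak, so requests beyond the node's capacity
// (the oversold part) are not counted as reclaimable.
func adjustedProdReclaimable(prodRequest, prodPeak, nodeAllocatable int64) int64 {
	if prodRequest > nodeAllocatable {
		prodRequest = nodeAllocatable
	}
	return rawProdReclaimable(prodRequest, prodPeak)
}

func main() {
	// A node with 10 cores allocatable, oversold to 14 cores of prod requests.
	const nodeAllocatable, prodRequest, prodPeak = 10000, 14000, 6000

	fmt.Println(rawProdReclaimable(prodRequest, prodPeak))                       // 8000
	fmt.Println(adjustedProdReclaimable(prodRequest, prodPeak, nodeAllocatable)) // 4000
}
```

In this made-up case the unadjusted estimate reports 8000 milli-cores as reclaimable, although only 4000 remain on the node once the predicted peak is subtracted; the other 4000 exist only because the node is oversold.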

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

Ⅴ. Checklist

I have written necessary docs and comments
I have added necessary unit tests and integration tests
All checks passed in make test

pkg/koordlet/prediction/peak_predictor.go

pkg/koordlet/statesinformer/impl/states_nodemetric.go

codecov · 2025-01-21T03:25:56Z

Codecov Report

Attention: Patch coverage is 66.66667% with 31 lines in your changes missing coverage. Please review.

Project coverage is 66.08%. Comparing base (79036cf) to head (7c9ac0b).
Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/koordlet/prediction/peak_predictor.go	74.32%	13 Missing and 6 partials ⚠️
pkg/koordlet/metrics/resource_summary.go	0.00%	7 Missing ⚠️
.../koordlet/statesinformer/impl/states_nodemetric.go	58.33%	3 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2325      +/-   ##
==========================================
- Coverage   66.09%   66.08%   -0.02%     
==========================================
  Files         458      458              
  Lines       54200    54270      +70     
==========================================
+ Hits        35823    35862      +39     
- Misses      15803    15828      +25     
- Partials     2574     2580       +6

Flag	Coverage Δ
unittests	`66.08% <66.66%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown.

pkg/koordlet/metrics/resource_summary.go

pkg/koordlet/prediction/peak_predictor.go

…versold Signed-off-by: lijunxin <lijunxin.ljx@alibaba-inc.com>

saintube

/lgtm

saintube · 2025-01-21T13:35:42Z

PTAL /cc @zwzhang0107 @hormes @jasonliu747

saintube reviewed Jan 20, 2025

pkg/koordlet/prediction/peak_predictor.go (outdated)

pkg/koordlet/prediction/peak_predictor.go (outdated)

pkg/koordlet/statesinformer/impl/states_nodemetric.go (outdated)

lijunxin559 force-pushed the fix-prod-reclaimable-predictor-result-to-avoid-oversold branch from f068dab to a13d624 Compare January 21, 2025 03:20

saintube reviewed Jan 21, 2025

lijunxin559 force-pushed the fix-prod-reclaimable-predictor-result-to-avoid-oversold branch 2 times, most recently from 4f05190 to 6cfc773 Compare January 21, 2025 06:41

saintube reviewed Jan 21, 2025

pkg/koordlet/metrics/resource_summary.go

pkg/koordlet/prediction/peak_predictor.go

koordlet: fix prodReclaimablePredictor result to avoid influence of o…

7c9ac0b

…versold Signed-off-by: lijunxin <lijunxin.ljx@alibaba-inc.com>

lijunxin559 force-pushed the fix-prod-reclaimable-predictor-result-to-avoid-oversold branch from 6cfc773 to 7c9ac0b Compare January 21, 2025 09:44

saintube reviewed Jan 21, 2025

saintube added the lgtm label Jan 21, 2025

saintube added the approved label Jan 23, 2025

koordinator-bot bot merged commit 98057ae into koordinator-sh:main Jan 23, 2025
22 checks passed
