Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable fixing balloon instances to hardcoded CPUs and memories for benchmarking #1006

Closed
wants to merge 6 commits into from

Conversation

askervin
Copy link
Contributor

These patches add support for BalloonInstances configuration. It enables setting constraints and overrides on individual balloon instances regarding CPU allocation and memory pinning.

This feature is useful for benchmarking well-known hardware under different loads. However, using this feature makes the balloons policy configuration totally inapplicable to heterogenous clusters and very prone to configuration errors. These issues are likely to manifest themselves by not being able to run pods that have been scheduled on a node. Therefore this feature is not available by default, but only if cri-resmgr is built with benchmarking build tag (or make BENCHMARKING=1).

@askervin askervin temporarily deployed to dev April 24, 2023 13:53 — with GitHub Actions Inactive
@askervin askervin temporarily deployed to dev April 24, 2023 13:53 — with GitHub Actions Inactive
Copy link
Contributor

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM. Just some nits about the commit messages.

As a general note, it would be nice to see some text in the commit messages answering "what" this is about and "why" this is needed.

Do we want to port this to nri balloons plugin?

@askervin askervin temporarily deployed to dev April 25, 2023 05:12 — with GitHub Actions Inactive
@askervin askervin temporarily deployed to dev April 25, 2023 05:12 — with GitHub Actions Inactive
@codecov-commenter
Copy link

Codecov Report

Merging #1006 (b17ed04) into master (49ad8ca) will decrease coverage by 0.01%.
The diff coverage is 24.00%.

@@            Coverage Diff             @@
##           master    #1006      +/-   ##
==========================================
- Coverage   31.86%   31.85%   -0.01%     
==========================================
  Files          65       66       +1     
  Lines        9835     9872      +37     
==========================================
+ Hits         3134     3145      +11     
- Misses       6411     6437      +26     
  Partials      290      290              
Impacted Files Coverage Δ
...manager/policy/builtin/balloons/balloons-policy.go 1.67% <0.00%> (-0.04%) ⬇️
.../resource-manager/policy/builtin/balloons/flags.go 46.42% <ø> (ø)
...er/policy/builtin/balloons/instances-production.go 0.00% <0.00%> (ø)
...esource-manager/policy/builtin/balloons/cputree.go 74.37% <66.66%> (+1.49%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

BalloonInstances configuration sets constraints and overrides on CPU
allocation and memory pinning of individual balloon instances.

This feature is useful for benchmarking well-known hardware topology
with well-known sequences of workload deployments. However, using this
feature under other circumstances is prone to errors and likely to
lead into situations where pods that have been scheduled on a node
cannot be run. Therefore enabling this feature requires building
cri-resmgr with "benchmarking" build tag (make BENCHMARKING=1).
@askervin askervin force-pushed the 5OV_balloon_instances branch from b17ed04 to 4c62b5d Compare April 25, 2023 05:36
@askervin askervin temporarily deployed to dev April 25, 2023 05:36 — with GitHub Actions Inactive
@askervin askervin temporarily deployed to dev April 25, 2023 05:36 — with GitHub Actions Inactive
@askervin
Copy link
Contributor Author

Thanks @jukkar, good point. I added both "why" and "why not" reasoning on the commit that introduces BalloonInstances to the configuration.

@askervin
Copy link
Contributor Author

askervin commented Jan 5, 2024

For real use (instead of manually tuned benchmarking) we should do something similar to preferCloseToDevices like in containers/nri-plugins#203.

@askervin
Copy link
Contributor Author

askervin commented Jan 5, 2024

Let's close this PR for now.

@askervin askervin closed this Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants