Get correct cpu cores in k8s pod #6430

Merged
merged 22 commits into from
Dec 8, 2022

Conversation

xzhangxian1008
Contributor

@xzhangxian1008 xzhangxian1008 commented Dec 6, 2022

What problem does this PR solve?

Issue Number: close #6434

Problem Summary:

The max_streams of xxxxInputStream is determined by Context's max_threads, which is initialized from getNumberofPhysicalCPUCores(). In the x86 virtual environment this returns 48, the number of physical CPU cores, but in the ARM virtual environment it returns 128, the number of logical CPU cores. Because 128 is much larger than 48, TiFlash uses more threads on ARM and exceeds the limit more easily.

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Dec 6, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Lloyd-Pottiger
  • zanmato1984

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 6, 2022
@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 6, 2022
@xzhangxian1008
Contributor Author

/run-build-arm64-release

@xzhangxian1008
Contributor Author

/run-build-arm64-release

@sre-bot
Collaborator

sre-bot commented Dec 6, 2022

@xzhangxian1008
Contributor Author

/rebuild

@xzhangxian1008
Contributor Author

/cc @windtalker @SeaRise

Contributor

@yibin87 yibin87 left a comment

Add test_infra tests to check?

/// Let's limit ourself to the number of physical cores.
/// But if the number of logical cores is small - maybe it is a small machine
/// or very limited cloud instance and it is reasonable to use all the cores.
if (cpu_count >= 32)

This is a little weird: with cpu_count = 32, max_streams will be 16, which is smaller than with cpu_count = 24. In other words, a machine with more CPUs gets a smaller max_streams.
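
The non-monotonic behaviour described in this comment can be reproduced with a minimal sketch. It assumes the heuristic halves the logical core count once it reaches the threshold of 32 shown in the snippet; the halving step is inferred from the numbers in the comment (32 gives 16), and `estimatePhysicalCores` is a hypothetical name, not the function in the PR.

```cpp
#include <cassert>

// Hypothetical stand-in for the heuristic under review (an assumption, not the PR's code):
// treat machines with >= 32 logical cores as 2-way SMT and halve the count,
// otherwise use every logical core.
constexpr unsigned estimatePhysicalCores(unsigned logical_cores)
{
    return logical_cores >= 32 ? logical_cores / 2 : logical_cores;
}

// Reproduces the reviewer's observation: a 32-core machine ends up with
// a smaller value than a 24-core one.
inline void checkReviewerExample()
{
    assert(estimatePhysicalCores(24) == 24);
    assert(estimatePhysicalCores(32) == 16);
    assert(estimatePhysicalCores(32) < estimatePhysicalCores(24));
}
```

Any hard threshold of this kind produces a drop at the boundary; whether that matters depends on how common machines just above the threshold are.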

Contributor

@zanmato1984 zanmato1984 left a comment

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Dec 6, 2022
@Lloyd-Pottiger
Contributor

can use the value in server info https://github.com/pingcap/tiflash/blob/master/dbms/src/Server/ServerInfo.h which is more accurate and compatible.

@xzhangxian1008
Contributor Author

can use the value in server info https://github.com/pingcap/tiflash/blob/master/dbms/src/Server/ServerInfo.h which is more accurate and compatible.

The values in ServerInfo are calculated by this function, so ServerInfo can't help us.

@Lloyd-Pottiger
Contributor

Lloyd-Pottiger commented Dec 6, 2022

can use the value in server info https://github.com/pingcap/tiflash/blob/master/dbms/src/Server/ServerInfo.h which is more accurate and compatible.

The values in ServerInfo are calculated by this function, so ServerInfo can't help us.

No, it actually reuses the logic from TiKV (https://github.com/tikv/tikv/blob/master/components/tikv_util/src/sys/cgroup.rs):

if (tiflash_instance_wrap.proxy_helper)
{
    diagnosticspb::ServerInfoRequest request;
    request.set_tp(static_cast<diagnosticspb::ServerInfoType>(1));
    diagnosticspb::ServerInfoResponse response;
    std::string req = request.SerializeAsString();
    auto * helper = tiflash_instance_wrap.proxy_helper;
    helper->fn_server_info(helper->proxy_ptr, strIntoView(&req), &response);
    server_info.parseSysInfo(response);
    LOG_INFO(log, "ServerInfo: {}", server_info.debugString());
}
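
For readers unfamiliar with cgroup-based CPU detection, the idea behind taking the core count from the container limit can be sketched as follows. This is not TiKV's or TiFlash's implementation; it only handles the cgroup v2 `cpu.max` format ("<quota> <period>" or "max <period>"), whereas cgroup v1 exposes the same values through `cpu.cfs_quota_us` and `cpu.cfs_period_us`. The names `parseCpuMax` and `effectiveCores` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <exception>
#include <optional>
#include <sstream>
#include <string>

// Parse the contents of a cgroup v2 `cpu.max` file.
// Returns the CPU limit in cores, or std::nullopt when there is no limit
// ("max") or the text is malformed.
std::optional<double> parseCpuMax(const std::string & text)
{
    std::istringstream in(text);
    std::string quota;
    long long period = 0;
    if (!(in >> quota >> period) || period <= 0 || quota == "max")
        return std::nullopt;
    try
    {
        const long long q = std::stoll(quota);
        if (q <= 0)
            return std::nullopt;
        return static_cast<double>(q) / static_cast<double>(period);
    }
    catch (const std::exception &)
    {
        return std::nullopt; // quota was not a number
    }
}

// Clamp the host's core count to the container limit, rounding the limit up
// so that a fractional quota such as 1.5 cores still yields 2 threads.
unsigned effectiveCores(unsigned host_cores, const std::optional<double> & limit)
{
    assert(host_cores > 0);
    if (!limit)
        return host_cores;
    const unsigned limited = static_cast<unsigned>(std::ceil(*limit));
    return std::max(1u, std::min(host_cores, limited));
}
```

With this approach a pod limited to 4 CPUs on a 128-core host would size its thread pool from 4 rather than 128, regardless of whether the host reports physical or logical cores.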

@ti-chi-bot ti-chi-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 6, 2022
@JaySon-Huang
Contributor

JaySon-Huang commented Dec 6, 2022

  1. We can get the server_info from TiKV as @Lloyd-Pottiger said, which is more portable than the ClickHouse (CH) one. See "feat: add a function to get number of logical cpu cores" (#4879).
  2. I did not get why the code changed in this PR causes the error in "TiFlash get wrong cpu cores in k8s pod" (#6434). Can you elaborate?

@xzhangxian1008
Contributor Author

  1. We can get the server_info from TiKV as @Lloyd-Pottiger said, which is more portable than the ClickHouse (CH) one. See "feat: add a function to get number of logical cpu cores" (#4879).
  2. I did not get why the code changed in this PR causes the error in "TiFlash get wrong cpu cores in k8s pod" (#6434). Can you elaborate?

The max_streams of xxxxInputStream is determined by Context's max_threads, which is initialized from getNumberofPhysicalCPUCores(). In the x86 virtual environment this returns 48, the number of physical CPU cores, but in the ARM virtual environment it returns 128, the number of logical CPU cores. Because 128 is much larger than 48, TiFlash uses more threads on ARM and exceeds the limit more easily.

@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 7, 2022
@ti-chi-bot
Member

@Lloyd-Pottiger: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: c2b73ff

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 8, 2022
@ti-chi-bot
Member

@xzhangxian1008: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

@xzhangxian1008
Contributor Author

/run-all-tests

@ti-chi-bot ti-chi-bot merged commit 966e7e2 into pingcap:master Dec 8, 2022
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #6449.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Dec 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #6450.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #6451.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Dec 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #6452.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Dec 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #6453.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Dec 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #6454.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Dec 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Dec 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #6455.

Labels
  • needs-cherry-pick-release-5.0: PR which needs to be cherry-picked to release-5.0
  • needs-cherry-pick-release-5.1: PR which needs to be cherry-picked to release-5.1
  • needs-cherry-pick-release-5.2: PR which needs to be cherry-picked to release-5.2
  • needs-cherry-pick-release-5.3: Type: Need cherry pick to release-5.3
  • needs-cherry-pick-release-5.4: Should cherry pick this PR to release-5.4 branch.
  • needs-cherry-pick-release-6.1: Should cherry pick this PR to release-6.1 branch.
  • needs-cherry-pick-release-6.5: Should cherry pick this PR to release-6.5 branch.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TiFlash get wrong cpu cores in k8s pod
7 participants