Get correct cpu cores in k8s pod #6430

xzhangxian1008 · 2022-12-06T06:03:11Z

What problem does this PR solve?

Issue Number: close #6434

Problem Summary:

The max_streams of each xxxxInputStream is determined by the Context's max_threads, which is initialized by getNumberOfPhysicalCPUCores(). In an x86 virtual environment, max_threads is 48, which is the number of physical CPU cores, whereas in an ARM virtual environment it is 128, which is the number of logical CPU cores. Since 128 is much larger than 48, TiFlash uses more threads on ARM and exceeds the limit more easily.

What is changed and how it works?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

None

ti-chi-bot · 2022-12-06T06:03:13Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

Lloyd-Pottiger
zanmato1984

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

xzhangxian1008 · 2022-12-06T06:07:57Z

/run-build-arm64-release

xzhangxian1008 · 2022-12-06T06:39:53Z

/run-build-arm64-release

sre-bot · 2022-12-06T07:29:53Z

download tiflash binary(linux arm64) at http://fileserver.pingcap.net/download/builds/pingcap/test/tiflash/master/3608ae5daad64281f376563444418be8ec1c64b5/centos7/tiflash-linux-arm64.tar.gz

xzhangxian1008 · 2022-12-06T08:23:41Z

/rebuild

xzhangxian1008 · 2022-12-06T08:24:14Z

/cc @windtalker @SeaRise

yibin87

Add test_infra tests to check?

yibin87 · 2022-12-06T08:41:58Z

dbms/src/Common/getNumberOfPhysicalCPUCores.cpp

+    /// Let's limit ourself to the number of physical cores.
+    /// But if the number of logical cores is small - maybe it is a small machine
+    /// or very limited cloud instance and it is reasonable to use all the cores.
+    if (cpu_count >= 32)


This is a little weird: with cpu_count = 32, max_streams will be 16, which is smaller than with cpu_count = 24. That is, a machine with more CPUs ends up with a smaller max_streams.
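
The discontinuity can be reproduced with a minimal sketch. It assumes that the branch in the diff excerpt halves cpu_count once it reaches 32 (the halving statement itself is not shown in the excerpt; it is inferred from the numbers in this comment) and that max_streams follows the returned value. `effectiveCores` is a hypothetical name used only for illustration, not a function in TiFlash.

```cpp
// Hypothetical stand-in for the heuristic under review (assumption:
// the count is halved once it reaches the threshold of 32).
constexpr unsigned effectiveCores(unsigned cpu_count)
{
    return cpu_count >= 32 ? cpu_count / 2 : cpu_count;
}

// Just below the threshold every logical core is kept.
static_assert(effectiveCores(24) == 24, "24 logical cores stay 24");
static_assert(effectiveCores(31) == 31, "31 logical cores stay 31");

// At the threshold the result drops, so the mapping is not monotonic.
static_assert(effectiveCores(32) == 16, "32 logical cores become 16");
static_assert(effectiveCores(32) < effectiveCores(24), "bigger machine, fewer streams");
```

Under these assumptions, any machine with 32 to 47 logical cores ends up with fewer streams than a machine with 24 logical cores.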

zanmato1984

LGTM

Lloyd-Pottiger · 2022-12-06T09:30:16Z

can use the value in server info https://github.com/pingcap/tiflash/blob/master/dbms/src/Server/ServerInfo.h which is more accurate and compatible.

xzhangxian1008 · 2022-12-06T09:45:35Z

> can use the value in server info https://github.com/pingcap/tiflash/blob/master/dbms/src/Server/ServerInfo.h which is more accurate and compatible.

The values in ServerInfo are calculated by this function, so ServerInfo can't help us.

Lloyd-Pottiger · 2022-12-06T10:09:06Z

> > can use the value in server info https://github.com/pingcap/tiflash/blob/master/dbms/src/Server/ServerInfo.h which is more accurate and compatible.
>
> values in ServerInfo is calculated by this function, so ServerInfo can't help us.

No, actually it reuses the logic from TiKV (https://github.com/tikv/tikv/blob/master/components/tikv_util/src/sys/cgroup.rs):

tiflash/dbms/src/Server/Server.cpp

Lines 876 to 886 in 110bda2

    
    if (tiflash_instance_wrap.proxy_helper)
    {
        diagnosticspb::ServerInfoRequest request;
        request.set_tp(static_cast<diagnosticspb::ServerInfoType>(1));
        diagnosticspb::ServerInfoResponse response;
        std::string req = request.SerializeAsString();
        auto * helper = tiflash_instance_wrap.proxy_helper;
        helper->fn_server_info(helper->proxy_ptr, strIntoView(&req), &response);
        server_info.parseSysInfo(response);
        LOG_INFO(log, "ServerInfo: {}", server_info.debugString());
    }
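
For readers unfamiliar with cgroup-based core detection, the following is a minimal sketch of the general idea and not the TiKV or TiFlash implementation. It assumes the cgroup v2 `cpu.max` format (`<quota> <period>`, or `max <period>` when unlimited) and takes the file contents as a string so the parsing can be checked in isolation. `coresFromCpuMax` is a hypothetical helper, and rounding a fractional quota up is a choice made for this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <exception>
#include <sstream>
#include <string>

// Hypothetical helper: derive the usable core count from the contents of a
// cgroup v2 cpu.max file, falling back to the host core count when there is
// no limit or the contents cannot be parsed.
unsigned coresFromCpuMax(const std::string & cpu_max, unsigned host_cores)
{
    std::istringstream in(cpu_max);
    std::string quota;
    double period_us = 0;
    if (!(in >> quota >> period_us) || quota == "max" || period_us <= 0)
        return host_cores; // unlimited or malformed: keep the host value

    double quota_us = 0;
    try
    {
        quota_us = std::stod(quota);
    }
    catch (const std::exception &)
    {
        return host_cores; // quota field is not a number
    }
    if (quota_us <= 0)
        return host_cores;

    // Round a fractional quota (e.g. 1.5 CPUs) up, then keep the result
    // within [1, host_cores].
    unsigned limit = static_cast<unsigned>(std::ceil(quota_us / period_us));
    return std::max(1u, std::min(limit, host_cores));
}
```

With this sketch, a pod limited to 4 CPUs on a 128-core host (`"400000 100000"`) yields 4, while an unlimited pod (`"max 100000"`) yields the host value of 128.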

JaySon-Huang · 2022-12-06T12:18:53Z

We can get the server_info from TiKV as @Lloyd-Pottiger said, which is more portable than the CH (ClickHouse) one. See "feat: add a function to get number of logical cpu cores" (#4879).
I did not get why this PR causes the error in "TiFlash get wrong cpu cores in k8s pod" (#6434). Can you elaborate on it?

xzhangxian1008 · 2022-12-07T01:33:00Z

> We can get the server_info from tikv as @Lloyd-Pottiger said, which is more portable than the CH's one. feat: add a function to get number of logical cpu cores #4879
>
> I did not get why this PR cause the error in TiFlash get wrong cpu cores in k8s pod #6434, can you elaborate more about it?

The max_streams of each xxxxInputStream is determined by the Context's max_threads, which is initialized by getNumberOfPhysicalCPUCores(). In an x86 virtual environment, max_threads is 48, which is the number of physical CPU cores, whereas in an ARM virtual environment it is 128, which is the number of logical CPU cores. Since 128 is much larger than 48, TiFlash uses more threads on ARM and exceeds the limit more easily.

ti-chi-bot · 2022-12-08T05:58:40Z

@Lloyd-Pottiger: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2022-12-08T05:58:41Z

This pull request has been accepted and is ready to merge.

Commit hash: c2b73ff

ti-chi-bot · 2022-12-08T05:58:54Z

@xzhangxian1008: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.


xzhangxian1008 · 2022-12-08T06:57:38Z

/run-all-tests

ti-chi-bot · 2022-12-08T07:30:43Z

In response to a cherrypick label: new pull request created: #6449.