Calculate the state of the cluster using the CPU usages reported by TiKV #1875

Merged
merged 19 commits into from
Dec 2, 2019

Conversation

shafreeck
Contributor

What problem does this PR solve?

By collecting the CPU usage reported in TiKV heartbeats, PD gets a global view of the load across the whole cluster.

What is changed and how it works?

Keep a 5-minute history of heartbeats and calculate the cluster state from the reported CPU usage.

Check List

Tests

  • Unit test

@shafreeck shafreeck self-assigned this Oct 31, 2019
@shafreeck shafreeck added DNM type/enhancement The issue or PR belongs to an enhancement. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Oct 31, 2019
@shafreeck shafreeck changed the title WIP: Calculate the state of the cluster using the CPU usages reported by TiKV Calculate the state of the cluster using the CPU usages reported by TiKV Nov 6, 2019
@shafreeck shafreeck removed DNM do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Nov 6, 2019
@shafreeck
Contributor Author

/run-all-tests

@shafreeck
Contributor Author

/run-all-tests

@rleungx
Member

rleungx commented Nov 6, 2019

Does this PR conflict with #1903?

Contributor

@Luffbee Luffbee left a comment

I think this can become a separate package which provides store statistics information.

@Luffbee
Contributor

Luffbee commented Nov 11, 2019

I've thought about this PR and #1903. There is a question: do we really need to call the CPU() or Keys() methods with different durations?
If not, this PR can be implemented on top of PR #1903.

@shafreeck
Contributor Author

shafreeck commented Nov 12, 2019

> I've thought about this PR and #1903. There is a question: do we really need to call the CPU() or Keys() methods with different durations?
> If not, this PR can be implemented on top of PR #1903.

I am going to implement this using the rolling stats. The store CPU usage in #1903 is not what I want.
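
For readers unfamiliar with the term, a rolling statistic keeps only the most recent N values and updates its summary as each new value arrives. The following is a minimal sketch of a fixed-size rolling mean over a ring buffer; the RollingMean type and its methods are hypothetical and are not the rolling stats from #1903 or from this PR, which may use a different statistic.

```go
package main

import "fmt"

// RollingMean is a hypothetical fixed-size rolling average backed by a
// ring buffer. It is an illustration only, not PD's implementation.
type RollingMean struct {
	buf   []float64 // storage for the most recent values
	next  int       // index where the next value will be written
	count int       // number of valid values currently stored
	sum   float64   // running sum of the stored values
}

// NewRollingMean creates a rolling mean over the last size values.
// size must be greater than zero.
func NewRollingMean(size int) *RollingMean {
	return &RollingMean{buf: make([]float64, size)}
}

// Add stores v, evicting the oldest value once the buffer is full.
func (r *RollingMean) Add(v float64) {
	if r.count == len(r.buf) {
		r.sum -= r.buf[r.next] // drop the value being overwritten
	} else {
		r.count++
	}
	r.buf[r.next] = v
	r.sum += v
	r.next = (r.next + 1) % len(r.buf)
}

// Mean returns the average of the stored values, or 0 if there are none.
func (r *RollingMean) Mean() float64 {
	if r.count == 0 {
		return 0
	}
	return r.sum / float64(r.count)
}

func main() {
	r := NewRollingMean(3)
	for _, v := range []float64{10, 20, 30, 40} {
		r.Add(v)
	}
	// At this point the window holds 20, 30 and 40.
	fmt.Println(r.Mean())
}
```

Each update is constant time because the running sum is adjusted in place instead of being recomputed over the whole window.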

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>
@codecov-io

codecov-io commented Nov 20, 2019

Codecov Report

Merging #1875 into master will increase coverage by 0.07%.
The diff coverage is 66.66%.


@@            Coverage Diff             @@
##           master    #1875      +/-   ##
==========================================
+ Coverage   77.79%   77.87%   +0.07%     
==========================================
  Files         174      175       +1     
  Lines       17626    17710      +84     
==========================================
+ Hits        13713    13791      +78     
- Misses       2844     2853       +9     
+ Partials     1069     1066       -3

| Impacted Files | Coverage | Δ |
|---|---|---|
| server/cluster_stat.go | 66.66% <66.66%> | (ø) |
| server/schedulers/random_merge.go | 61.81% <0%> | (-3.64%) ⬇️ |
| client/client.go | 68.76% <0%> | (-1.89%) ⬇️ |
| server/grpc_service.go | 57.32% <0%> | (-1.73%) ⬇️ |
| server/schedule/operator_controller.go | 84.6% <0%> | (+0.18%) ⬆️ |
| server/handler.go | 52.6% <0%> | (+0.47%) ⬆️ |
| server/config/option.go | 93.08% <0%> | (+0.92%) ⬆️ |
| server/server.go | 83.43% <0%> | (+0.97%) ⬆️ |
| server/heartbeat_streams.go | 70% <0%> | (+1%) ⬆️ |

... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9923a25...a10746b.

@nolouch nolouch added this to the v4.0.0-beta milestone Nov 26, 2019
@nolouch
Contributor

nolouch commented Nov 26, 2019

PTAL @disksing @rleungx

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>
@shafreeck
Contributor Author

@nolouch @rleungx PTAL

Contributor

@nolouch nolouch left a comment

LGTM.

Member

@rleungx rleungx left a comment

LGTM except for the LoadStateIdle state. That problem will be addressed in the next PR.

@nolouch
Contributor

nolouch commented Dec 2, 2019

/merge

@sre-bot sre-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 2, 2019
@sre-bot
Contributor

sre-bot commented Dec 2, 2019

/run-all-tests

@sre-bot sre-bot merged commit 895ccdb into tikv:master Dec 2, 2019
Labels
status/can-merge Indicates a PR has been approved by a committer. type/enhancement The issue or PR belongs to an enhancement.
6 participants