tikv: avoid switch peer when batchRequest be cancelled #10822

Merged
merged 4 commits into from
Jun 19, 2019

@lysu lysu commented Jun 17, 2019

Separate the cancel and deadline errors in sendBatchRequest.

What problem does this PR solve?

[2019/06/17 13:44:25.675 +08:00] [ERROR] [region_cache.go:372] ["switch region peer to next due to send request fail"] [conn=27] [current="region ID: 16, meta: id:16 start_key:\"t\\200\\000\\000\\000\\000\\000\\000'_i\\200\\000\\000\\000\\000\\000\\000\\002\\004\\000\\000\\000\\000\\000\\005\\340Y\" end_key:\"t\\200\\000\\000\\000\\000\\000\\000'_r\\200\\000\\000\\000\\000\\000\\006\\033\" region_epoch:<conf_ver:1 version:4 > peers:<id:9 store_id:1 > , peer: id:9 store_id:1 , addr: 127.0.0.1:9191, idx: 0"] [needReload=true] [error="rpc error: code = DeadlineExceeded desc = Canceled or timeout"] [errorVerbose="rpc error: code = DeadlineExceeded desc = Canceled or timeout\ngithub.com/pingcap/errors.AddStack\n\t/home/robi/code/go/pkg/mod/github.com/pingcap/errors@v0.11.4/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/home/robi/code/go/pkg/mod/github.com/pingcap/errors@v0.11.4/juju_adaptor.go:15\ngithub.com/pingcap/tidb/store/tikv.sendBatchRequest\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/client.go:634\ngithub.com/pingcap/tidb/store/tikv.(*rpcClient).SendRequest\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/client.go:658\ngithub.com/pingcap/tidb/store/tikv.(*RegionRequestSender).sendReqToRegion\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:145\ngithub.com/pingcap/tidb/store/tikv.(*RegionRequestSender).SendReqCtx\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:116\ngithub.com/pingcap/tidb/store/tikv.(*RegionRequestSender).SendReq\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:72\ngithub.com/pingcap/tidb/store/tikv.(*tikvStore).SendReq\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/kv.go:367\ngithub.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).prewriteSingleBatch\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/2pc.go:497\ngithub.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).doActionOnBatches.func1\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/2pc.go:423\nruntime.goexit\n\t/home/robi/runtime/go/src/runtime/asm_amd64.s:1337"] [stack="github.com/pingcap/tidb/store/tikv.(*RegionCache).OnSendFail\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:372\ngithub.com/pingcap/tidb/store/tikv.(*RegionRequestSender).onSendFail\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:175\ngithub.com/pingcap/tidb/store/tikv.(*RegionRequestSender).sendReqToRegion\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:148\ngithub.com/pingcap/tidb/store/tikv.(*RegionRequestSender).SendReqCtx\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:116\ngithub.com/pingcap/tidb/store/tikv.(*RegionRequestSender).SendReq\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:72\ngithub.com/pingcap/tidb/store/tikv.(*tikvStore).SendReq\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/kv.go:367\ngithub.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).prewriteSingleBatch\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/2pc.go:497\ngithub.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).doActionOnBatches.func1\n\t/home/robi/code/go/src/github.com/pingcap/tidb/store/tikv/2pc.go:423"]

What is changed and how it works?

It is better to return ctx.Err(), which yields a different error for "cancel" (context.Canceled) than for "timeout" (context.DeadlineExceeded). The region cache can then switch peers only on "timeout" and not on "cancel".

Check List

Tests

  • Manual test (add detailed scripts or steps below)
Run unionstore with -region-size=1000000.

Then use the test script at:
https://gist.github.com/coocood/c8250a07b724d57b313f8ce26abd69d4

Code changes

  • Change the returned error

Side effects

  • N/A

Related changes

  • Need to cherry-pick to the release 3.0

@lysu lysu added type/bugfix This PR fixes a bug. component/tikv labels Jun 17, 2019
@lysu lysu requested a review from hicqu June 17, 2019 06:07

lysu commented Jun 17, 2019

/run-all-tests

lysu commented Jun 17, 2019

/run-unit-test

codecov bot commented Jun 17, 2019

Codecov Report

Merging #10822 into master will increase coverage by 0.0399%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##             master    #10822        +/-   ##
===============================================
+ Coverage   80.8281%   80.868%   +0.0399%     
===============================================
  Files           419       419                
  Lines         88635     88637         +2     
===============================================
+ Hits          71642     71679        +37     
+ Misses        11760     11732        -28     
+ Partials       5233      5226         -7

lysu commented Jun 18, 2019

/run-all-tests

@lysu lysu requested a review from coocood June 18, 2019 11:15

coocood commented Jun 18, 2019

@lysu
I think we can remove the second test; it has a race issue, and the first test is enough.

coocood commented Jun 19, 2019

LGTM

@lysu lysu requested review from jackysp and tiancaiamao June 19, 2019 03:00
@lysu lysu added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 19, 2019

lysu commented Jun 19, 2019

/rebuild

@jackysp jackysp left a comment

LGTM

@jackysp jackysp merged commit 8b16948 into pingcap:master Jun 19, 2019
@lysu lysu deleted the fix-cancel-switch-node branch June 19, 2019 05:49
Labels
component/tikv status/LGT1 Indicates that a PR has LGTM 1. type/bugfix This PR fixes a bug.