-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store : add store limit to restrain bad store from occupying too much token limit. #12779
Conversation
hi @AilinKid, Please refine the description. |
|
0d23ba0
to
d4c95e5
Compare
Codecov Report
@@ Coverage Diff @@
## master #12779 +/- ##
===========================================
Coverage 80.2157% 80.2157%
===========================================
Files 471 471
Lines 114864 114864
===========================================
Hits 92139 92139
Misses 15579 15579
Partials 7146 7146 |
2895d1e
to
e81f939
Compare
/rebuild |
config/config.go
Outdated
@@ -339,6 +340,7 @@ var defaultConf = Config{ | |||
SplitTable: true, | |||
Lease: "45s", | |||
TokenLimit: 1000, | |||
StoreLimit: -1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to set the default value to 0
and make compatible with the old configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
store/tikv/region_request.go
Outdated
@@ -15,6 +15,7 @@ package tikv | |||
|
|||
import ( | |||
"context" | |||
"github.com/pingcap/tidb/config" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The github.com/**
should be placed in an individual group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed,Done.
store/tikv/region_request.go
Outdated
@@ -178,6 +192,27 @@ func (s *RegionRequestSender) sendReqToRegion(bo *Backoffer, ctx *RPCContext, re | |||
return | |||
} | |||
|
|||
func (s *RegionRequestSender) getStoreToken(storeID uint64) error { | |||
s.storeLimit.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the lock is too coarse. Maybe we can define a guard field in the store struct and use atomic to reduce contention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have tried to reconstruct the code to decrease granularity of the lock
store/tikv/region_request.go
Outdated
@@ -167,7 +170,18 @@ func (s *RegionRequestSender) sendReqToRegion(bo *Backoffer, ctx *RPCContext, re | |||
if e := tikvrpc.SetContext(req, ctx.Meta, ctx.Peer); e != nil { | |||
return nil, false, errors.Trace(e) | |||
} | |||
resp, err = s.client.SendRequest(bo.ctx, ctx.Addr, req, timeout) | |||
// judge the store limit switch. | |||
if atomic.LoadInt32(&config.GetGlobalConfig().StoreLimit) != -1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "> 0" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed!
store/tikv/region_request.go
Outdated
return nil | ||
} | ||
s.storeLimit.Unlock() | ||
return errors.New("store token is up to the limit") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an error number and refined message are needed, such as 'maybe TiKV ...(IP) is unavailable'. Otherwise users don't know what to do in this emergency.
store/tikv/region_request.go
Outdated
s.storeLimit.Lock() | ||
limit, ok := s.storeLimit.limit[storeID] | ||
if !ok { | ||
limit = atomic.LoadInt32(&config.GetGlobalConfig().StoreLimit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if it's up to StoreLimit
, and the user wants to raise StoreLimit
, it won't work, is it right?
I think it's better in this way:s.storeLimit.limit[storeID]
is initialized to 0, and increases each time a request comes in, then compares it to s.storeLimit.limit[storeID]
.
In this way, StoreLimit
can be hot reloaded and work immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
It's still empty in "What is changed and how it works?" |
good, done |
|
||
} | ||
|
||
func (s *RegionRequestSender) releaseStoreToken(st *Store) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think st.tokenCount
cannot be less than or equal to 0, even if it is really less than 0, it doesn't matter.
So prefer just st.tokenCount.Sub(1)
. (But it's not a strong opinion)
@djshow832 @bb7133
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok here. @.@
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -0,0 +1,8 @@ | |||
package storeutil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a license.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, Addressed.
# store-limit is used to restrain TiDB from sending request to some stores which is up to the limit. | ||
# If a store has been up to the limit, it will return error for the successive request in same store. | ||
# default 0 means shutting off store limit. | ||
store-limit = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, addressed.
@@ -348,6 +348,8 @@ func ValidateSetSystemVar(vars *SessionVars, name string, value string) (string, | |||
return value, ErrWrongValueForVar.GenWithStackByArgs(name, value) | |||
case SQLSelectLimit: | |||
return checkUInt64SystemVar(name, value, 0, math.MaxUint64, vars) | |||
case TiDBStoreLimit: | |||
return checkInt64SystemVar(name, value, 0, math.MaxInt64, vars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add an upper limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hard to define a valid upper limit (big table scanning can cause store limit increase). now is math.MaxInt64.
/run-all-tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/run-all-tests |
@AilinKid merge failed. |
/run-all-tests |
cherry pick to release-2.1 failed |
cherry pick to release-3.0 failed |
cherry pick to release-3.1 failed |
It seems that, not for sure, we failed to cherry-pick this commit to release-2.1. Please comment '/run-cherry-picker' to try to trigger the cherry-picker if we did fail to cherry-pick this commit before. @AilinKid PTAL. |
What problem does this PR solve?
In some cases, when a store is broken in tikv cluster, the store will hang all write requests sent to it, which will cause token limit exhausted in tidb server, consequently leading QPS to decrease sharply.
What is changed and how it works?
Add storeLimit in
Store
structure, we should do the count on it before sending the request to each store, eventually, control the traffic of each store.Check List
Tests
Code changes
Related changes
Release note