Improve degradation mechanism of RT-based circuit breaking #2123

Open
wants to merge 5 commits into
base: master

@single-wolf single-wolf commented Apr 11, 2021

Describe what this PR does / why we need it

Improve the degradation mechanism of RT-based circuit breaking so that the breaker can trip sooner in some bad situations.
For example, if a fault drives RT above 10 s (about 100 ms normally), the current RT-based breaker cuts off traffic only after 10 s, since a request's RT is only recorded once the request completes.

Does this pull request fix one issue?

Fixes #1405

Describe how you did it

  1. Construct a SlowRequestLeapArray whose windowLengthInMs is fixed to statIntervalMs and whose intervalInMs is just greater than maxAllowedRt.
  2. Each SlowRequestCounter in the SlowRequestLeapArray counts in-flight entries: the count is incremented when an entry tries to pass and decremented when the entry exits. When a SlowRequestCounter is reset, it keeps track of its predecessor, the deprecated bucket.
  3. When an entry tries to pass, get the current window from the entry's create timestamp, then check the previous SlowRequestCounters in the deprecated buckets exactly once.
  4. When an entry exits, get its window from the entry's create timestamp. If that window is not deprecated and rt > maxAllowedRt, check the current SlowRequestCounter.
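
The steps above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `SlowRequestWindowSizing` and `InflightBuckets` are hypothetical names, the sizing formula is one reading of "intervalInMs is just greater than maxAllowedRt" (the smallest multiple of statIntervalMs that is strictly greater than maxAllowedRt), and a plain map stands in for the circular SlowRequestLeapArray, so bucket reuse, reset, and the deprecated-bucket check are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for step 1: bucket sizing derived from statIntervalMs and maxAllowedRt. */
final class SlowRequestWindowSizing {
    private SlowRequestWindowSizing() {}

    /** Assumption: smallest bucket count whose total span is strictly greater than maxAllowedRtMs. */
    static int sampleCount(int statIntervalMs, int maxAllowedRtMs) {
        return maxAllowedRtMs / statIntervalMs + 1;
    }

    /** intervalInMs = sampleCount * windowLengthInMs, with windowLengthInMs == statIntervalMs. */
    static int intervalInMs(int statIntervalMs, int maxAllowedRtMs) {
        return sampleCount(statIntervalMs, maxAllowedRtMs) * statIntervalMs;
    }
}

/** Hypothetical stand-in for the in-flight counting of steps 2 to 4 (no bucket reuse or reset). */
final class InflightBuckets {
    private final int windowLengthInMs;
    private final Map<Long, AtomicInteger> buckets = new ConcurrentHashMap<>();

    InflightBuckets(int windowLengthInMs) {
        this.windowLengthInMs = windowLengthInMs;
    }

    /** Start timestamp of the window that contains timeMs. */
    long windowStart(long timeMs) {
        return timeMs - timeMs % windowLengthInMs;
    }

    /** An entry tries to pass: count it in the bucket of its create timestamp. */
    void onEntry(long createTimeMs) {
        buckets.computeIfAbsent(windowStart(createTimeMs), k -> new AtomicInteger())
               .incrementAndGet();
    }

    /** An entry exits: decrement the same bucket, located again by the create timestamp. */
    void onExit(long createTimeMs) {
        AtomicInteger counter = buckets.get(windowStart(createTimeMs));
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    /** Number of entries created in the given window that have not exited yet. */
    int inflight(long windowStartMs) {
        AtomicInteger counter = buckets.get(windowStartMs);
        return counter == null ? 0 : counter.get();
    }
}
```

Under these assumptions, statIntervalMs = 1000 and maxAllowedRt = 2500 give 3 buckets and an intervalInMs of 3000. Locating the bucket by the create timestamp on both entry and exit keeps the increment and the decrement in the same counter, so a non-zero count in an old bucket means some of its entries are still in flight.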

Status of the SlowRequestCounter

  • UNCHECKED: the previous counts have not been checked yet. This is the status after creation and after reset.
  • CHECKED_BY_ENTRY: the previous count has been checked by a successor entry.
  • CHECKED_BY_SLOW: the current count has been checked by a slow exit or by an entry that has no successor.

Normal status flow: UNCHECKED -> CHECKED_BY_ENTRY -> UNCHECKED -> CHECKED_BY_ENTRY
Slow RT status flow: UNCHECKED -> CHECKED_BY_ENTRY -> CHECKED_BY_SLOW
Simplify the status control (2021-04-19)
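
The two status flows can be modeled as a small state machine. The sketch below is illustrative only and is not the PR's implementation: `CounterStatus` and `CounterStatusHolder` are hypothetical names, and only the transitions listed in the two flows are allowed. It assumes compare-and-set is an acceptable way to make sure a counter is checked by an entry only once.

```java
import java.util.concurrent.atomic.AtomicReference;

/** The three statuses named in the description. */
enum CounterStatus { UNCHECKED, CHECKED_BY_ENTRY, CHECKED_BY_SLOW }

/** Hypothetical holder that only permits the transitions listed in the two flows. */
final class CounterStatusHolder {
    // A counter starts as UNCHECKED (status after new and after reset).
    private final AtomicReference<CounterStatus> status =
            new AtomicReference<>(CounterStatus.UNCHECKED);

    /** UNCHECKED -> CHECKED_BY_ENTRY; returns false if some caller already did the check. */
    boolean markCheckedByEntry() {
        return status.compareAndSet(CounterStatus.UNCHECKED, CounterStatus.CHECKED_BY_ENTRY);
    }

    /** CHECKED_BY_ENTRY -> CHECKED_BY_SLOW, as in the slow RT flow. */
    boolean markCheckedBySlow() {
        return status.compareAndSet(CounterStatus.CHECKED_BY_ENTRY, CounterStatus.CHECKED_BY_SLOW);
    }

    /** Bucket reset: back to UNCHECKED, as in the normal flow. */
    void reset() {
        status.set(CounterStatus.UNCHECKED);
    }

    CounterStatus get() {
        return status.get();
    }
}
```

Because `compareAndSet` succeeds for exactly one caller, concurrent entries that race on the same counter would perform the check once in this sketch; whether the PR uses the same primitive is not stated above.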

Describe how to verify it

Run the test cases.

Special notes for reviews

Reviewers should consider whether this works correctly in concurrency scenarios that have not been considered yet.

…ing previous buckets when the entry tries to pass (alibaba#1405)

Signed-off-by: Jerry.Zhong <15951609026@163.com>
Signed-off-by: Jerry.Zhong <15951609026@163.com>

CLAassistant commented Apr 11, 2021

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

…edRt is too lesser than statIntervalMs.

Signed-off-by: Jerry.Zhong <15951609026@163.com>
@sczyh30 sczyh30 added the area/circuit-breaking label (Issues or PRs related to circuit breaking) Apr 12, 2021
Signed-off-by: Jerry.Zhong <15951609026@163.com>
Signed-off-by: Jerry.Zhong <15951609026@163.com>
@single-wolf
Author

Any suggestions? @sczyh30
The Travis CI build seems to have failed because of some weird test cases that are not related to this PR ;-(

@sczyh30 sczyh30 added the kind/enhancement (Category issues or prs related to enhancement.) and to-review (To review) labels Apr 27, 2021
Member

sczyh30 commented Apr 27, 2021

Any suggestions? @sczyh30
The Travis CI build seems to have failed because of some weird test cases that are not related to this PR ;-(

I've re-triggered the CI. I'll review this PR in the next few days. Also cc @cdfive @jasonjoo2010

@sczyh30 sczyh30 self-assigned this Apr 27, 2021
Development

Successfully merging this pull request may close these issues.

Degrade by RT should consider inflight request
3 participants