etcdutil: add dial keep alive params to switch connect as soon as possible #6059
Codecov Report
Patch coverage:
Coverage diff, master vs. #6059:
              master    #6059    +/-
  Coverage    73.96%   73.97%
  Files          385      385
  Lines        37973    37982     +9
+ Hits         28087    28096     +9
- Misses        7397     7399     +2
+ Partials      2489     2487     -2
pkg/utils/etcdutil/etcdutil.go
	AutoSyncInterval:  autoSyncInterval,
	TLS:               tlsConfig,
	LogConfig:         &lgc,
	DialKeepAliveTime: 10 * time.Second,
Did this config item exist before?
These configs are added so that the client can switch away from a failed endpoint; see etcd-io/etcd#7941 (comment).
The values are taken from https://github.com/tikv/pd/blob/master/server/region_syncer/client.go#L38-L39
How about adding a unit test for the case where the leader's network is isolated?
@@ -41,6 +41,13 @@ const (
	// defaultAutoSyncInterval is the interval to sync etcd cluster.
	defaultAutoSyncInterval = 60 * time.Second

	// defaultDialKeepAliveTime is the time after which client pings the server to see if transport is alive.
	defaultDialKeepAliveTime = 10 * time.Second
What is the default value if we don't set them?
zero
Added a test that injects delay through a TCP reverse proxy.
…sible Signed-off-by: lhy1024 <admin@liudos.us>
Left a few comments.
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
left a critical comment
	require.NoError(t, failpoint.Disable("github.com/tikv/pd/pkg/utils/etcdutil/autoSyncInterval"))
func ioCopy(dst io.Writer, src io.Reader, enableDiscard *atomic.Bool) (err error) {
	buffer := make([]byte, 32*1024)
	for {
This becomes an infinite loop when src.Read(buffer) returns (n > 0, io.EOF) and the next Read then returns (0, io.EOF).
Just return directly.
Signed-off-by: lhy1024 <admin@liudos.us>
There seems to be a file permission problem in CI.
I ran the tests in my dev environment and they always passed.
Signed-off-by: lhy1024 <admin@liudos.us>
/merge
This pull request has been accepted and is ready to merge. Commit hash: 975a400
What problem does this PR solve?
Issue Number: Close #6053
What is changed and how does it work?
After #6046, the client supports multiple endpoints, but it cannot switch connections promptly when an endpoint hangs. This PR therefore adds dial keep-alive timeout parameters.
Check List
Tests
In the same network isolation test
Before #6046
After #6046
After #6059
Release note