VIP unbound after more than 10s when previous leader network recovered #266

Closed
XiuhuaRuan opened this issue Oct 24, 2024 · 3 comments · Fixed by #267

@XiuhuaRuan
Contributor

When the previous leader's network went down, Patroni failed over to a new leader and vip-manager bound the VIP to the new leader. But when the previous leader's network recovered, the VIP was still bound on the previous leader with desired state true, and it was only unbound more than 10s later, once the desired state became false. During this window, new connections to the VIP may reach the previous leader, which is no longer the leader.
Is there any way to eliminate the VIP conflict when the previous leader's network recovers? Thanks.

@XiuhuaRuan
Contributor Author

My initial investigation suggests this issue may be related to the etcd clientv3 keepalive parameters. The defaults set in vip-manager are 5 seconds for DialKeepAliveTime and 5 seconds for DialKeepAliveTimeout. So when the network goes down and recovers, the etcd client may take 10 seconds to re-establish its connection with the etcd endpoints. Could we decrease the default values so the connection is re-established more quickly? For example, 2 seconds for DialKeepAliveTime and 1 second for DialKeepAliveTimeout. Looking forward to your opinion.

	DialKeepAliveTimeout: 5 * time.Second,
	DialKeepAliveTime:    5 * time.Second,
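
To make the arithmetic behind the "10 seconds" figure concrete, here is a minimal sketch. It assumes a dead connection is noticed after at most one keepalive interval plus one keepalive timeout. The `keepAlive` struct and the `worstCaseDetection` helper are hypothetical names used only for illustration. They mirror the two fields shown above but are not vip-manager's or etcd's actual types, and the result is an estimate, not a measurement.

```go
package main

import (
	"fmt"
	"time"
)

// keepAlive is a hypothetical stand-in for the two keepalive fields shown
// above; it is not the real etcd clientv3 Config type.
type keepAlive struct {
	DialKeepAliveTime    time.Duration // interval between keepalive pings
	DialKeepAliveTimeout time.Duration // how long to wait for a ping reply
}

// worstCaseDetection assumes a broken connection is detected after at most
// one ping interval plus one ping timeout (an assumption, not measured).
func worstCaseDetection(k keepAlive) time.Duration {
	return k.DialKeepAliveTime + k.DialKeepAliveTimeout
}

var (
	// Values currently set in vip-manager, per the snippet above.
	currentDefaults = keepAlive{
		DialKeepAliveTime:    5 * time.Second,
		DialKeepAliveTimeout: 5 * time.Second,
	}
	// Values proposed in this comment.
	proposed = keepAlive{
		DialKeepAliveTime:    2 * time.Second,
		DialKeepAliveTimeout: 1 * time.Second,
	}
)

func main() {
	fmt.Println("current defaults:", worstCaseDetection(currentDefaults))
	fmt.Println("proposed values: ", worstCaseDetection(proposed))
}
```

Under this assumption the estimate drops from 10s to 3s with the proposed values. Shorter timers also mean a briefly flaky link is declared dead sooner, so the real effect would need to be observed in a test cluster.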

@pashagolub
Collaborator

Well, that will certainly affect unstable connections, meaning vipm will try to remove the VIP more frequently. But I'm OK with such aggressive settings. Would you mind creating a pull request?

@XiuhuaRuan
Contributor Author

Thanks for your confirmation. I'm not sure about the optimal timer values; I will monitor for any effects after this change.
