Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vault: recover from standby losing etcd lease (#3031) #3511

Merged
merged 2 commits into from
Nov 3, 2017

Conversation

bhiggins
Copy link
Contributor

This change makes these errors transient instead of permanent:

[ERROR] core: failed to acquire lock: error=etcdserver: requested lease not found

After this change, there can still be one of these errors when a
standby vault that lost its lease tries to become leader, but on the
next lock acquisition attempt a new session will be created. With this
new session, the standby will be able to become the leader.

@bhiggins
Copy link
Contributor Author

Fixes #3031

}

if needNewSession {
session, err := concurrency.NewSession(c.etcd, concurrency.WithTTL(etcd3LockTimeoutInSeconds))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the code to line 273 directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@xiang90
Copy link
Contributor

xiang90 commented Oct 31, 2017

@bhiggins the fix looks right. can you simplify the code? have you tested it manually? if yes, then it looks good to me.

@jefferai jefferai added this to the 0.8.4 milestone Oct 31, 2017
@xiang90
Copy link
Contributor

xiang90 commented Oct 31, 2017

@bhiggins thanks!

This change makes these errors transient instead of permanent:

[ERROR] core: failed to acquire lock: error=etcdserver: requested lease not found

After this change, there can still be one of these errors when a
standby vault that lost its lease tries to become leader, but on the
next lock acquisition attempt a new session will be created. With this
new session, the standby will be able to become the leader.
@bhiggins
Copy link
Contributor Author

Simplified code as requested.

Yes, I tested this patch against 0.7.3 by having three vaults running in HA config. I manually deleted the standby's leases out of etcd, and then killed the leader. After that, as described in the commit message, the "lease not found" error occurs once but then on the next lock attempt a new session is created and the standby becomes the leader.

@xiang90
Copy link
Contributor

xiang90 commented Oct 31, 2017

lgtm. defer to @jefferai

@jefferai
Copy link
Member

jefferai commented Nov 3, 2017

Hah, it's the other way, when it comes to etcd3 I defer to @xiang90 :-D

Merging.

@jefferai jefferai merged commit 3d51b92 into hashicorp:master Nov 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants