Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add POC for discussion around CAS method on backends #5356

Closed
wants to merge 1 commit into from
Closed

Add POC for discussion around CAS method on backends #5356

wants to merge 1 commit into from

Conversation

sethvargo
Copy link
Contributor

This is more of a POC than a PR, but I find it easiest to discuss in code.

One of the challenges with initialization currently is that if two Vault servers simultaneously try to initialize the same storage backend, there's a race and they might both initialize, overwriting the other one. Then you have Vault servers with in-memory keys that they think are correct, but have actually been overwritten by another Vault server that tried to initialize at the same time.

As such, I propose added a PutCAS method on physical.Backend (in the form of an optionally implemented interface). If implemented, operations like init would use this CAS operation instead of the regular put operation. This would prevent overwriting and would at least return an error (that we could retry on).

While the init function does first check if the backend is initialized before proceeding, it's possible that another backend initializes between that check and when the actual initialization call is made a few lines down. Adding a CAS method that backends could implement would make some backends safer to start a collection of Vault servers in parallel.

/cc @jefferai @chrishoffman

@chrishoffman
Copy link
Contributor

I'd like to understand this use case a little more. Considering that this would need to be implemented at every part of the physical layer (cache, WAL, etc) and the current proposed use case could be solved with a sleep or some other simple orchestration, I'm not sure it makes sense. I would also be curious on what physical backends actually support a CAS Put operation.

@sethvargo
Copy link
Contributor Author

Hey @chrishoffman

current proposed use case could be solved with a sleep or some other simple orchestration

I don't think it's fair to call it "simple". Each Vault implementer must use some kind of globally available locking mechanism to accomplish this. Even with Consul, it requires a somewhat nasty script with combining consul lock and init checks. It would be better for Vault to support this first hand, especially given HashiCorp's increased investment in the K8S world (where it's feasible that multiple vault pods will be started/restarted simultaneously).

I would also be curious on what physical backends actually support a CAS Put operation.

For now, we'd just add the functionality to the interface. Any HashiCorp backends would likely be updated by you all. We'd submit PRs to update the Google ones, same as the other community backends (maybe a post in the Vault mailing list noting the availability of this functionality). I view this the same as "HA"; some backends won't support it and that's okay. We just document it.

The ones I know for sure are:

  • Consul
  • GCS
  • Spanner
  • DynamoDB

I'm sure there are more, but those I know support CAS (or CAS-like) operations today.

@sethvargo
Copy link
Contributor Author

This isn't really relevant anymore

@sethvargo sethvargo closed this Mar 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants