Continuously attempt to unseal if sealed keys are supported #6039
Conversation
@@ -741,7 +741,7 @@ CLUSTER_SYNTHESIS_COMPLETE:
	// Initialize the core
	core, newCoreError := vault.NewCore(coreConfig)
	if newCoreError != nil {
		if !errwrap.ContainsType(newCoreError, new(vault.NonFatalError)) {
These changes are somewhat unrelated, but the "if not contains a non-fatal error" got really confusing.
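For comparison, one way to make the inverted condition readable is to name it. A sketch only; `isFatalError` is a hypothetical helper for illustration, not necessarily what this change introduces:

```go
// import "github.com/hashicorp/errwrap"

// isFatalError reports whether err is anything other than a
// vault.NonFatalError, i.e. the server should abort startup.
// Hypothetical helper name.
func isFatalError(err error) bool {
	return !errwrap.ContainsType(err, new(vault.NonFatalError))
}
```

With that, the call site becomes `if isFatalError(newCoreError) { ... }`, which reads positively.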
command/server.go
@@ -937,13 +937,40 @@ CLUSTER_SYNTHESIS_COMPLETE:
		return 1
	}

	// Attempt an initial unseal.
	if err := core.UnsealWithStoredKeys(context.Background()); err != nil {
Not sure what to do here. We already have code that attempted an unseal with stored keys. The previous method would return nil if unseal keys aren't supported. Now it returns a fatal error, so I think we need to wrap these calls with a check to see if unseal keys are supported. Additionally, I'm not sure if we need this block anyway since the goroutine will do it later. The only difference is that the goroutine won't force the CLI to exit on error (it'll just log).
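A sketch of the guard described here (hedged; whether the support check is exposed as `core.SealAccess().StoredKeysSupported()` is my assumption about the seal interface):

```go
// Only attempt the initial stored-keys unseal when the seal supports
// stored keys; otherwise the (now fatal) "unsupported" error would
// force the CLI to exit on configurations without stored keys.
if core.SealAccess().StoredKeysSupported() {
	if err := core.UnsealWithStoredKeys(context.Background()); err != nil {
		c.UI.Error(fmt.Sprintf("Error unsealing with stored keys: %s", err))
		return 1
	}
}
```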
@briankassouf @jefferai here's a fun one for ya 😄. This isn't merge-ready yet because I'm looking for feedback from you all around the best paths here.
I think this looks good except for that one comment about suspected log spamming.
The double-double negative was really confusing, and this pattern is used in a few places in Vault. This negates the double negative, making the devx a bit easier to follow.
Prior to this commit, "nil" could have meant unsupported auto-unseal, a transient error, or success. This updates the function to return the correct error type, signaling to the caller whether they should retry or fail.
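A hedged sketch of what the caller-side distinction could look like under the new return semantics (helper names such as `vault.IsFatalError` are assumptions):

```go
if err := core.UnsealWithStoredKeys(ctx); err != nil {
	if vault.IsFatalError(err) {
		// Stored keys unsupported or misconfigured: retrying won't help.
		return err
	}
	// Transient failure (e.g. storage not initialized yet): retry later.
	logger.Warn("failed to unseal core with stored keys", "error", err)
}
```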
Just one question, otherwise looks great!
This fixes a bug that occurs on bootstrapping an initial cluster. Given a collection of Vault nodes and an initialized storage backend, they will all go into standby waiting for initialization. After one node is initialized, the other nodes had no mechanism by which they "re-check" to see if unseal keys are present. This adds a goroutine to the server command which continually waits for unseal keys to exist. It exits in the following conditions:

- the node is unsealed
- the node does not support stored keys
- a fatal error occurs (as defined by Vault)
- the server is shutting down

In all other situations, the routine wakes up at the specified interval and attempts to unseal with the stored keys.
Mergin'
Thanks Seth!
This fixes a bug / adds a missing feature that occurs on bootstrapping an initial cluster. Given a collection of Vault nodes and an initialized storage backend, they will all go into standby waiting for initialization. After one node is initialized, the other nodes had no mechanism by which they "re-check" to see if unseal keys are present. This adds a goroutine to the server command which continually waits for unseal keys to exist. It exits in the following conditions:

- the node is unsealed
- the node does not support stored keys
- a fatal error occurs (as defined by Vault)
- the server is shutting down

In all other situations, the routine wakes up at the specified interval and attempts to unseal with the stored keys.
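A minimal sketch of that goroutine, mapping to the exit conditions above (the interval, channel, logger, and `vault.IsFatalError` helper are illustrative assumptions, not the exact implementation):

```go
go func() {
	for {
		select {
		case <-c.ShutdownCh:
			return // the server is shutting down
		case <-time.After(5 * time.Second):
		}

		if !core.Sealed() {
			return // the node is unsealed
		}

		if err := core.UnsealWithStoredKeys(context.Background()); err != nil {
			if vault.IsFatalError(err) {
				// Fatal (as defined by Vault) or stored keys unsupported.
				c.logger.Error("error unsealing core", "error", err)
				return
			}
			// Transient: wake up on the next tick and try again.
			c.logger.Warn("failed to unseal core", "error", err)
		}
	}
}()
```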
Why is this important?
When running on a scheduler for the first time with auto-unseal configured, Vault will start all the pods/services, but then the Vault nodes all immediately enter a "deadlock" mode in which they are waiting for unseal keys (because the config says they are supported), but no keys exist because the storage backend is uninitialized. Without this change, upon initializing the storage backend, the only way to get the pods to unseal themselves is to fully restart each pod. This isn't a great experience and is incredibly difficult to automate.
Other things considered
Have a sidecar container that runs `vault init ...`

This works great in theory, but Vault's storage backends don't support CAS operations. If multiple Vault servers start at the same time (as they would when scheduled by something like Kubernetes or Nomad), the sidecar container would run `vault init ...` on each of them. Depending on the storage backend, there's a race condition where one init process will overwrite another between the time of the init call and the time of the data persistence. This sounds rare, but in practice I've been able to reproduce it reliably 88% of the time. See also #5356.

Add / update API
I considered adding a new API like `sys/unseal-check` and considered updating the existing `sys/unseal` endpoint to not require a `key` parameter and instead try to unseal with stored keys. I lied, I didn't consider it, I actually wrote it first. It worked. I deployed and tested it. It just didn't "feel" correct. It's difficult to orchestrate an API call to all the Vault nodes after their deployment, and it honestly felt like an unnecessary extra step. I kept asking why Vault wouldn't just do this for me since it's already pointed at the shared storage backend. I'm not opposed to going back this route, but it felt wrong.

Check on HUP
This is probably the simplest option in terms of code changes. We simply update the reload handlers to attempt an auto-unseal. The drawbacks, however, are not as simple. Just like the new API endpoint, it's difficult to orchestrate sending signals across multiple pods. Additionally, it creates a weird edge case where someone manually sealed a Vault node, but another process was able to HUP it, effectively unsealing it. It felt a little priv-esc to me, and I think there's probably a security hole there if you look hard enough.
Do nothing
This is my least favorite option, as it pushes far too much orchestration back onto the user.