
Backend failure during the re-election process causes Vault cluster to become unusable. #4491

Closed
vespian opened this issue Apr 30, 2018 · 11 comments

Comments

@vespian
Contributor

vespian commented Apr 30, 2018

Environment:

  • 3 Vault instances backed by 3-node ZK quorum
  • Vault Version: Vault v0.8.3 ('84ec52693502eef900cc531c59f0670bd3f5cff3')
  • Operating System/Architecture: x86_64

Vault Config File:

disable_mlock = true

backend "zookeeper" {
  address = "zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181,zk-4.zk:2181,zk-5.zk:2181"
  advertise_addr = "https://172.16.0.1:8200"
  path = "dcos/vault/default"
  znode_owner = "digest:<some-user>:<some-pass>"
  auth_info = "digest:<some-user>:<some-pass>"
}

listener "tcp" {
  address = "127.0.0.1:8200"
  cluster_address = "172.16.0.1:8201"
  tls_disable = 1
}

Startup Log Output:

Apr 27 16:16:28 master1 vault[22918]: ==> Vault server configuration:
Apr 27 16:16:28 master1 vault[22918]: Cgo: disabled
Apr 27 16:16:28 master1 vault[22918]: Cluster Address: https://172.16.0.1:8201
Apr 27 16:16:28 master1 vault[22918]: Listener 1: tcp (addr: "127.0.0.1:8200", cluster address: "172.16.0.1:8201", tls: "disabled")
Apr 27 16:16:28 master1 vault[22918]: Log Level: trace
Apr 27 16:16:28 master1 vault[22918]: Mlock: supported: true, enabled: false
Apr 27 16:16:28 master1 vault[22918]: Redirect Address: https://172.16.0.1:8200
Apr 27 16:16:28 master1 vault[22918]: Storage: zookeeper (HA available)
Apr 27 16:16:28 master1 vault[22918]: Version: Vault v0.8.3
Apr 27 16:16:28 master1 vault[22918]: Version Sha: 84ec52693502eef900cc531c59f0670bd3f5cff3
Apr 27 16:16:28 master1 vault[22918]: ==> Vault server started! Log data will stream in below:
Apr 27 16:16:28 master1 vault[22918]: 2018/04/27 16:16:28.169802 [TRACE] physical/cache: creating LRU cache: size=32768
Apr 27 16:16:28 master1 vault[22918]: 2018/04/27 16:16:28.169980 [TRACE] cluster listener addresses synthesized: cluster_addresses=[172.16.0.1:8201]

Expected Behavior:
ZooKeeper/backend issues during leader re-election do not cause the Vault cluster to become unreachable.

Actual Behavior:
Please see the logs from the moment of the failure and immediately after:

Apr 27 16:22:00 master1 vault[22918]: 2018/04/27 16:22:00.228100 [TRACE] forwarding: successful heartbeat
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.648177 [WARN ] transport: http2Client.notifyError got notified that the client transport was broken EOF.
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.648284 [TRACE] core: creating rpc dialer: host=fw-b34dfde0-ca7c-7bd2-dfdf-9e2ce8e08425
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.649161 [WARN ] grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 172.16.0.2:8201: getsockopt: connection refused"; Reconnecting to
{172.16.0.2:8201 0  <nil>}
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.662772 [INFO ] core: acquired lock, enabling active operation
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.664012 [TRACE] core: generating cluster private key
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.674921 [TRACE] core: generating local cluster certificate
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.704861 [INFO ] core: post-unseal setup starting
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.704886 [TRACE] core: clearing forwarding clients
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.704907 [TRACE] core: done clearing forwarding clients
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.704919 [TRACE] forwarding: stopping heartbeating
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.705047 [WARN ] get error from resetTransport context canceled, transportMonitor returning
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.705066 [WARN ] grpc: addrConn.transportMonitor exits due to: context canceled
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.705354 [INFO ] core: loaded wrapping token key
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.705378 [INFO ] core: successfully setup plugin catalog: plugin-directory=
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.712728 [INFO ] core: successfully mounted backend: type=kv path=secret/
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.712957 [INFO ] core: successfully mounted backend: type=system path=sys/
Apr 27 16:22:13 master1 vault[22918]: 2018/04/27 16:22:13.712994 [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14 Recv loop terminated: err=EOF
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14 Send loop terminated: err=<nil>
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14.068094 [INFO ] core: pre-seal teardown starting
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14.068114 [INFO ] core: cluster listeners not running
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14.068203 [INFO ] core: pre-seal teardown complete
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14.068237 [ERROR] core: post-unseal setup failed: error=error fetching response-wrapping policy from store: failed to read policy: zk: connection closed
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14 Failed to connect to 172.16.0.2:2181: dial tcp 172.16.0.2:2181: getsockopt: connection refused
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14 Connected to 172.16.0.3:2181
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14 Authentication failed: EOF
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14 Connected to 172.16.0.1:2181
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14 Authentication failed: EOF
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14 Connected to 172.16.0.3:2181
Apr 27 16:22:14 master1 vault[22918]: 2018/04/27 16:22:14 Authentication failed: EOF
Apr 27 16:22:15 master1 vault[22918]: 2018/04/27 16:22:15 Connected to 172.16.0.1:2181
Apr 27 16:22:15 master1 vault[22918]: 2018/04/27 16:22:15 Authentication failed: EOF
Apr 27 16:22:15 master1 vault[22918]: 2018/04/27 16:22:15 Failed to connect to 172.16.0.2:2181: dial tcp 172.16.0.2:2181: getsockopt: connection refused
Apr 27 16:22:15 master1 vault[22918]: 2018/04/27 16:22:15 Connected to 172.16.0.3:2181
Apr 27 16:22:15 master1 vault[22918]: 2018/04/27 16:22:15 Authentication failed: EOF
Apr 27 16:22:15 master1 vault[22918]: 2018/04/27 16:22:15 Connected to 172.16.0.1:2181
Apr 27 16:22:15 master1 vault[22918]: 2018/04/27 16:22:15 Authentication failed: EOF
Apr 27 16:22:15 master1 vault[22918]: 2018/04/27 16:22:15 Connected to 172.16.0.3:2181
Apr 27 16:22:15 master1 vault[22918]: 2018/04/27 16:22:15 Authentication failed: EOF
Apr 27 16:22:15 master1 vault[22918]: 2018/04/27 16:22:15.154421 [ERROR] core: failed to acquire lock: error=zk: could not connect to a server
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16 Connected to 172.16.0.1:2181
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16 Authenticated: id=99932284807544887, timeout=4000
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16 Re-submitting `1` credentials after reconnect
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16.159659 [TRACE] core: found new active node information, refreshing
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16.160189 [TRACE] core: parsing information for new active node: active_cluster_addr=https://172.16.0.1:8201 active_redirect_addr=https://172.16.0.1:8200
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16.160435 [TRACE] core: refreshing forwarding connection
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16.160448 [TRACE] core: clearing forwarding clients
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16.160461 [TRACE] core: done clearing forwarding clients
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16.160508 [TRACE] core: done refreshing forwarding connection
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16.160601 [TRACE] core: creating rpc dialer: host=fw-7def624e-a79c-8092-d6a2-02d95289b43c
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16.160823 [WARN ] grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 172.16.0.1:8201: getsockopt: connection refused"; Reconnecting to {:
172.16.0.1:8201 0  <nil>}
Apr 27 16:22:16 master1 vault[22918]: 2018/04/27 16:22:16.160851 [DEBUG] forwarding: error sending echo request to active node: error=rpc error: code = Unavailable desc = grpc: the connection is unavailable
Apr 27 16:22:17 master1 vault[22918]: 2018/04/27 16:22:17.160742 [TRACE] core: creating rpc dialer: host=fw-7def624e-a79c-8092-d6a2-02d95289b43c
Apr 27 16:22:17 master1 vault[22918]: 2018/04/27 16:22:17.160953 [WARN ] grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 172.16.0.1:8201: getsockopt: connection refused"; Reconnecting to {172.16.0.1:8201 0  <nil>}
Apr 27 16:22:18 master1 vault[22918]: 2018/04/27 16:22:18.562972 [TRACE] core: creating rpc dialer: host=fw-7def624e-a79c-8092-d6a2-02d95289b43c
Apr 27 16:22:18 master1 vault[22918]: 2018/04/27 16:22:18.563191 [WARN ] grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 172.16.0.1:8201: getsockopt: connection refused"; Reconnecting to {172.16.0.1:8201 0  <nil>}
Apr 27 16:22:21 master1 vault[22918]: 2018/04/27 16:22:21.516748 [TRACE] core: creating rpc dialer: host=fw-7def624e-a79c-8092-d6a2-02d95289b43c
Apr 27 16:22:21 master1 vault[22918]: 2018/04/27 16:22:21.516996 [WARN ] grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 172.16.0.1:8201: getsockopt: connection refused"; Reconnecting to {172.16.0.1:8201 0  <nil>}

Steps to Reproduce:
I believe the problem occurs when a ZK failure happens right after the new leader has been advertised to the Vault cluster, but before the leader-acquisition process has finished [1].

[1] https://github.com/hashicorp/vault/blob/v0.8.3/vault/core.go#L1583-L1605
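
For illustration, here is a minimal sketch of the leadership-acquisition pattern that [1] points at (hypothetical types and helper functions, not Vault's actual code); the window between the lock being acquired and postUnseal() completing is where a backend outage does the damage:

package main

// Minimal sketch of the HA leadership loop (hypothetical, not Vault's code).
type Lock interface {
  Lock(stopCh <-chan struct{}) (<-chan struct{}, error)
  Unlock() error
}

type HABackend interface {
  LockWith(key, value string) (Lock, error)
}

func leadershipLoop(ha HABackend, postUnseal, preSeal func() error, stopCh <-chan struct{}) {
  for {
    select {
    case <-stopCh:
      return
    default:
    }

    lock, err := ha.LockWith("core/lock", "node-uuid")
    if err != nil {
      continue
    }
    if _, err := lock.Lock(stopCh); err != nil {
      continue // lost the election, try again
    }

    // A backend failure in this window is the problem being reported:
    if err := postUnseal(); err != nil {
      preSeal()
      lock.Unlock() // error ignored; the ZK ephemeral lock node may outlive this attempt
      continue
    }

    // ... serve as the active node until stepped down, then Unlock() ...
    return
  }
}

func main() {}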

Possible root cause:

My theory is that the ZK outage lasted long enough to prevent the subsequent lock.Unlock() call in the c.postUnseal() error-handling block from executing, but not long enough to expire the ephemeral nodes created by the ZK lock. As a result, the other Vault instances could not perform re-election because the leader lock was still present.

Is this a known issue? If not, could you please advise whether this theory about the root cause is correct, and whether this is something that could/should be fixed in the vault/core.go module itself?

Thanks in advance.

@jefferai
Member

You may want to try more recent versions of Vault that have updated zk libraries to see if any of those fixes help this issue.

@vespian
Contributor Author

vespian commented Apr 30, 2018

@jefferai Do you mean #3896?

If so, we are already running those fixes; they were submitted by one of my colleagues.

@jefferai
Member

jefferai commented May 2, 2018

If the error was ephemeral, it seems like the zk library or the zk physical implementation should be handling retries there. Additionally, the logic in core has changed somewhat in more recent versions of Vault, so I don't know if that would make a difference. Vault is ignoring any error coming from the Unlock call, and while I don't know the genesis of that, my guess is that it's unclear what the right path is if an error is returned.

@mhrabovcin
Contributor

Vault is ignoring any error coming from the Unlock call, and while I don't know the genesis of that, my guess is that it's unclear what the right path is if an error is returned.

This can definitely happen. Vault is accepting here that the lock was never released (so no instance in the cluster can take a new lock), while the instance that was the leader steps down anyway. This causes a deadlock, and only a restart of Vault resolves the situation.

@jefferai
Member

jefferai commented May 4, 2018

If you have a proposal for addressing this I'm happy to hear it, but I don't think having Vault spin on the Unlock call is the right answer. Maybe for a few retries, but if you're trying to shut down Vault, having it hang for potentially long periods of time isn't great.

@mhrabovcin
Contributor

mhrabovcin commented May 7, 2018

@jefferai I agree that retrying the unlock multiple times wouldn't be acceptable. I'd propose letting each backend handle potential retries, and having Vault end its operation with os.Exit() and a non-zero code when the backend unlock fails. Vault is usually launched by systemd or some scheduling system, so it would probably get restarted. This case can only happen in HA mode, so there should be other instances ready to step in and resume correct operation.

To be clearer, I'd like to replace ignoring the lock.Unlock() result with:

if err := lock.Unlock(); err != nil {
  os.Exit(1)
}

@jefferai
Member

jefferai commented May 7, 2018

It's more complicated than that -- we can't simply os.Exit because there are important cleanup operations that can happen during normal shutdown. I think instead it would have to trigger shutdown and pass along an appropriate error.

However, I also don't quite understand how this would help. If the lock is stuck, why does restarting Vault matter, instead of simply trying to create a new one from the host in a running Vault? It sounds like in your scenario the lock is able to be grabbed again by the same host, but then it shouldn't require a restart of Vault to do so.
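
As a rough illustration of what "trigger shutdown and pass along an appropriate error" could look like (hypothetical names, not Vault's actual implementation), the failed unlock could be reported over a shutdown channel that the main run loop drains after performing its normal cleanup:

package main

import (
  "errors"
  "fmt"
  "log"
)

var errLockNotReleased = errors.New("HA lock could not be released")

type core struct {
  // shutdownCh is drained by the main run loop, which performs the normal
  // pre-seal cleanup before the process exits with a non-zero status.
  shutdownCh chan error
}

func (c *core) stepDown(unlock func() error) {
  if err := unlock(); err != nil {
    // Request an orderly shutdown instead of calling os.Exit() directly,
    // so cleanup still runs and the reason is surfaced to the supervisor.
    c.shutdownCh <- fmt.Errorf("%w: %v", errLockNotReleased, err)
  }
}

func main() {
  c := &core{shutdownCh: make(chan error, 1)}
  c.stepDown(func() error { return errors.New("zk: connection closed") })
  log.Fatalf("shutting down: %v", <-c.shutdownCh) // non-zero exit lets systemd restart Vault
}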

@mhrabovcin
Contributor

However, I also don't quite understand how this would help.

I wasn't clear about that in my comment. This would help in the ZK backend case; I am not familiar with other backends' implementations. A ZK lock is implemented by creating a ZK ephemeral node. While a ZK client is connected, ZK keeps the node around until the client disconnects. However, there is a SessionTimeout setting that keeps the ephemeral node around across short client disconnects.

This is what is happening in the case that @vespian described. Vault's ZK client loses its connection to ZK and tries to remove the lock. Lock removal fails. Vault ignores the error and steps down as leader. However, if the underlying ZK client in Vault reconnects to ZK within the SessionTimeout, the node does not get removed.

Here, restarting Vault would give the ZK client a new session; the old one would expire and the leader-election node would be removed.
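
To make the ZooKeeper behaviour concrete, here is a small illustration using the github.com/samuel/go-zookeeper/zk client that Vault's zookeeper backend builds on (the path and timeout are illustrative; the point is that the ephemeral node is tied to the session, not to the TCP connection):

package main

import (
  "fmt"
  "time"

  "github.com/samuel/go-zookeeper/zk"
)

func main() {
  // sessionTimeout mirrors the "timeout=4000" visible in the logs above.
  conn, _, err := zk.Connect([]string{"zk-1.zk:2181"}, 4*time.Second)
  if err != nil {
    panic(err)
  }
  defer conn.Close()

  // The leader lock is an ephemeral znode tied to this session.
  path, err := conn.Create("/vault-ha-lock-demo", []byte("leader"),
    zk.FlagEphemeral, zk.WorldACL(zk.PermAll))
  if err != nil {
    panic(err)
  }
  fmt.Println("created lock node:", path)

  // If the TCP connection breaks and the client reconnects within the
  // session timeout, the node above survives. Only Close() (or session
  // expiry) removes it, which is why restarting Vault -- a brand-new
  // session -- frees the lock while a mere reconnect does not.
}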

@jefferai
Member

jefferai commented May 7, 2018

It feels like the better option is to explicitly close the connection after a step down (perhaps only in the case that the lock fails). Forcing a restart of Vault also means forcing a re-unseal of Vault...it's a very big hammer.
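
One possible shape of "explicitly close the connection after a step down" in the ZK backend, sketched against the same go-zookeeper client (a hypothetical wrapper; the actual change landed in #4569 and may differ):

package main

import (
  "time"

  "github.com/samuel/go-zookeeper/zk"
)

type zkHALock struct {
  conn     *zk.Conn
  servers  []string
  lockPath string
}

func (l *zkHALock) Unlock() error {
  err := l.conn.Delete(l.lockPath, -1)
  if err != nil {
    // Deleting the ephemeral lock node failed (e.g. the quorum is
    // unreachable). Abandon the session so ZooKeeper expires the node
    // once the session timeout elapses, instead of a reconnect silently
    // keeping the phantom leader lock alive.
    l.conn.Close()
    if newConn, _, cerr := zk.Connect(l.servers, 4*time.Second); cerr == nil {
      l.conn = newConn
    }
  }
  return err
}

func main() {}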

@mhrabovcin
Contributor

Proposed solution in #4569

@jefferai
Member

Closed by #4569

victorr added a commit that referenced this issue Aug 24, 2023
* Seal HA: Use new SealWrappedValue type to abstract seal wrapped values

Introduce SealWrappedValue to abstract seal wrapped values.

Make SealWrappedValue capable of marshalling into a BlobInfo, when there is
plaintext or a single encryption, or to a custom serialization consisting of a
header, length and a marshalled MultiWrapValue protobuf.

* Vault-13769: Support configuring and using multiple seals for unsealing

* Make sealWrapBackend start using multiple seals

* Make seal.Access no longer implement wrapping.Wrapper.

Instead, add the Encrypt and Decrypt methods to the Access interface.

* Make raft snapshot system use funcs SealWrapValue + UnsealWrapValue.

Move the snapshot.Sealer implementation to the vault package to
avoid circular imports.

* Update sealWrapBackend to use multiple seals for encryption.

Use all the encryption wrappers when storing seal wrapped values.

Try to decrypt using the highest-priority wrapper, but try all
combinations of encrypted values and wrappers if necessary.
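
As a rough sketch of the decryption fallback described here (hypothetical types, not the actual sealWrapBackend code), the idea is to prefer the highest-priority wrapper and only then try every remaining (ciphertext, wrapper) combination:

package main

import "errors"

// wrapper is a stand-in for a seal wrapping implementation.
type wrapper interface {
  Decrypt(ciphertext []byte) ([]byte, error)
}

// decryptSealWrapped assumes wrappers are ordered highest priority first and
// ciphertexts holds the per-wrapper encryptions stored for one value, with
// index 0 matching the highest-priority wrapper (an illustrative assumption).
func decryptSealWrapped(wrappers []wrapper, ciphertexts [][]byte) ([]byte, error) {
  if len(wrappers) > 0 && len(ciphertexts) > 0 {
    if pt, err := wrappers[0].Decrypt(ciphertexts[0]); err == nil {
      return pt, nil
    }
  }
  // Fall back to every combination of stored ciphertext and wrapper.
  for _, ct := range ciphertexts {
    for _, w := range wrappers {
      if pt, err := w.Decrypt(ct); err == nil {
        return pt, nil
      }
    }
  }
  return nil, errors.New("no configured wrapper could decrypt the value")
}

func main() {}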

* Allow the use of multiple seals for entropy augmentation

Add seal_name variable in entropy stanza
Add new MultiSourcer to accommodate the new entropy augmentation behavior.

* Individually health check each wrapper, and add a sys/seal-backend-status endpoint.

* Address a race, and also a failed test mock that I didn't catch

* Track partial wrapping failures...

... where one or more but not all access.Encrypts fail for a given write.
Note these failures by adding a time ordered UUID storage entry containing
the path in a special subdirectory of root storage. Adds a callback
pattern to accomplish this, with certain high value writes like initial
barrier key storage not allowing a partial failure. The followup work
would be to detect return to health and iterate through these storage
entries, rewrapping.

* Add new data structure to track seal config generation (#4492)

* Add new data structure to track seal config generation

* Remove import cycle

* Fix undefined variable errors

* update comment

* Update setSeal response

* Fix setSealResponse in operator_diagnose

* Scope the wrapper health check locks individually (#4491)

* Refactor setSeal function in server.go. (#4505)

Refactor setSeal function in server.go.

* Decouple CreateSecureRandomReaderFunc from seal package.

Instead of using a list of seal.SealInfo structs, make
CreateSecureRandomReaderFunc use a list of new EntropySourcerInfo structs. This
breaks the dependency of package configutil on the seal package.

* Move SealGenerationInfo tracking to the seal Access.

* Move SealGenerationInfo tracking to the seal Access.

The SealGenerationInfo is now tracked by a Seal's Access instead of by the
Config object. The access implementation now records the correct generation
number on seal wrapped values.

* Only store and read SealGenerationInfo if VAULT_ENABLE_SEAL_HA_BETA is true.

* Add MultiWrapValue protobuf message

MultiWrapValue can be used to keep track of different encryptions of a value.
---------

Co-authored-by: Victor Rodriguez <vrizo@hashicorp.com>

* Use generation to determine if a seal wrapped value is up-to-date. (#4542)

* Add logging to seal Access implementation.

* Seal HA buf format run (#4561)

* Run buf format.

* Add buf.lock to ensure go-kms-wrapping module is imported.

* Vault-18958: Add unit tests for config checks

* Add safety logic for seal configuration changes

* Revert "Add safety logic for seal configuration changes"

This reverts commit 7fec48035a5cf274e5a4d98901716d08d766ce90.

* changes and tests for checking seal config

* add ent tests

* remove check for empty name and add type into test cases

* add error message for empty name

* fix no seals test

---------

Co-authored-by: divyapola5 <divya@hashicorp.com>

* Handle migrations between single-wrapper and multi-wrapper autoSeals

* Extract method SetPhysicalSealConfig.

* Extract function physicalSealConfig.

The extracted function is the only code now reading SealConfig entries from
storage.

* Extract function setPhysicalSealConfig.

The extracted function is the only code now writing SealConfig entries to
storage (except for migration from the old recovery config path).

* Move SealConfig to new file vault/seal_config.go.

* Add SealConfigType quasi-enumeration.

SealConfigType is to serve as the typed values for field SealConfig.Type.

* Rename Seal.RecoveryType to RecoverySealConfigType.

Make RecoverySealConfigType return a SealConfigType instead of a string.

* Rename Seal.BarrierType to BarrierSealConfigType.

Make BarrierSealConfigType return a SealConfigType.

Remove seal.SealType (really a two-step rename to SealConfigType).

* Add Seal methods ClearBarrierConfig and ClearRecoveryConfig.

* Handle autoseal <-> multiseal migrations.

While transitions between single-wrapper and multiple-wrapper autoseals are not
migrations that require an unwrap seal (such as going from shamir to autoseal),
the stored "barrier" SealConfig needs to be updated in these cases.

Specifically, the value of SealConfig.Type is "multiseal" for autoSeals that have
more than one wrapper; on the other hand, for autoseals with a single wrapper,
SealConfig.Type is the type of the wrapper.
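
A hedged illustration of the rule just described (a hypothetical helper, not the actual Vault code): the stored barrier SealConfig.Type is "multiseal" only when the autoSeal has more than one wrapper, otherwise it is the single wrapper's own type.

package main

import "fmt"

// barrierSealConfigType mirrors the rule described above for autoSeals:
// more than one wrapper => "multiseal", exactly one => that wrapper's type.
// (Hypothetical helper for illustration only.)
func barrierSealConfigType(wrapperTypes []string) string {
  if len(wrapperTypes) > 1 {
    return "multiseal"
  }
  if len(wrapperTypes) == 1 {
    return wrapperTypes[0] // e.g. "awskms" or "transit"
  }
  return "" // no wrappers: not an autoSeal, outside the scope of this rule
}

func main() {
  fmt.Println(barrierSealConfigType([]string{"awskms", "transit"})) // multiseal
  fmt.Println(barrierSealConfigType([]string{"awskms"}))            // awskms
}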

* Remove error return value from NewAutoSeal constructor.

* Automatically rewrap partially seal wrapped values on an interval

* Add in rewrapping of partially wrapped values on an interval, regardless of seal health/status.

* Don't set SealGenerationInfo Rewrapped flag in the partial rewrap call.

* Unexport the SealGenerationInfo's Rewrapped field, add a mutex to it for thread safe access, and add accessor methods for it.

* Add a success callback to the manual seal rewrap process that updates the SealGenerationInfo's rewrapped field. This is done via a callback to avoid an import cycle in the SealRewrap code.

* Fix a failing seal wrap backend test which was broken by the unexporting of SealGenerationInfo's Rewrapped field.

* Nil check the seal rewrap success callback before calling it.

* Change SealGenerationInfo rewrapped parameter to an atomic.Bool rather than a sync.RWMutex for simplicity and performance.

* Add nil check for SealAccess before updating SealGenerationInfo rewrapped status during seal rewrap call.

* Update partial rewrap check interval from 10 seconds to 1 minute.

* Update a reference to SealGenerationInfo Rewrapped field to use new getter method.

* Fix up some data raciness in partial rewrapping.

* Account for possibly nil storage entry when retrieving partially wrapped value.

* Allow multi-wrapper autoSeals to include disabled seal wrappers.

* Restore propagation of wrapper configuration errors by setSeal.

Function setSeal is meant to propagate non KeyNotFound errors returned by calls
to configutil.ConfigureWrapper.

* Remove unused Access methods SetConfig and Type.

* Allow multi-wrapper autoSeals to include disabled seal wrappers.

Make it possible for an autoSeal that uses multiple wrappers to include disabled
wrappers that can be used to decrypt entries, but are skipped for encryption.
e an unwrapSeal when there are disabled seals.

* Fix bug with not providing name (#4580)

* add suffix to name defaults

* add comment

* only change name for disabled seal

* Only attempt to rewrap partial values when all seals are healthy.

* Only attempt to rewrap partial values when all seals are healthy.

* Change logging level from info to debug for notice about rewrap skipping based on seal health.

* Remove stale TODOs and commented out code.

---------

Co-authored-by: rculpepper <rculpepper@hashicorp.com>
Co-authored-by: Larroyo <95649169+DeLuci@users.noreply.github.com>
Co-authored-by: Scott G. Miller <smiller@hashicorp.com>
Co-authored-by: Divya Pola <87338962+divyapola5@users.noreply.github.com>
Co-authored-by: Matt Schultz <matt.schultz@hashicorp.com>
Co-authored-by: divyapola5 <divya@hashicorp.com>
Co-authored-by: Rachel Culpepper <84159930+rculpepper@users.noreply.github.com>