Support for exponential backoff logic on retries #65

dagen · 2024-05-16T19:54:37Z

Is your feature request related to a problem? Please describe.
We operate a large Vault Enterprise cluster and see significant spikes in requests/sec when some teams refresh their infrastructure (applying a highstate to ~10k virtual machines). These spikes reach a few thousand requests/sec and degrade Vault's performance for other clients.

Describe the solution you'd like
HashiCorp Vault Enterprise supports a rate-limiting feature called resource quotas. When a quota is configured, requests to a resource that exceed the limit receive an HTTP 429 response. Example:

Code: 429. Errors:* request path "transit/encrypt/test": rate limit quota exceeded

I'm requesting support for retry logic on reads (and writes) from minions when they receive a 429 response, so that the state apply can gracefully handle situations where the Vault cluster is under too much load. There needs to be an exit condition in case the 429 responses persist. I recommend retrying 5 times with increasing delays between HTTP requests to the Vault server (0.2s, 0.3s, 0.5s, 0.8s, 1.3s, then exit).
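To illustrate the requested behavior, here is a rough sketch in Python. It is not taken from the project's code: `send`, `request_with_retries`, and `RateLimitExceeded` are hypothetical names, and `send` stands in for whatever performs the actual HTTP call to Vault.

```python
import time

# Delays proposed in this request, in seconds. After the last one, give up.
DELAYS = (0.2, 0.3, 0.5, 0.8, 1.3)


class RateLimitExceeded(Exception):
    """Raised when 429 responses persist after all retries (hypothetical name)."""


def request_with_retries(send, delays=DELAYS, sleep=time.sleep):
    """Call send() until it returns a non-429 status or the delays run out.

    `send` is a hypothetical zero-argument callable that returns a
    (status_code, body) tuple. `sleep` is injectable so tests need not wait.
    """
    status, body = send()
    for delay in delays:
        if status != 429:
            return status, body
        sleep(delay)  # back off before the next attempt
        status, body = send()
    if status == 429:
        # Exit condition: still rate limited after the final retry.
        raise RateLimitExceeded(f"still rate limited after {len(delays)} retries")
    return status, body
```

With these defaults a request is attempted at most six times (one initial call plus five retries), and the total added wait is about 3.1 seconds before the error is surfaced to the state run.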

Describe alternatives you've considered
Additional solutions involve asking teams with large infrastructure to 'stage' their rollouts to a few hundred instances at a time. We've also considered providing a dedicated Vault cluster for this one infrastructure team, but that means additional operating costs (cloud spend, observability, alerting, configuration management, etc.).

Additional context

lkubb · 2024-05-16T22:35:29Z

Agreed, this (= retrying retryable errors with exponential backoff) should be supported.

A nice-to-have would be to respect the Retry-After header, which can be enabled in Vault: https://developer.hashicorp.com/vault/api-docs/system/quotas-config#enable_rate_limit_response_headers
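As a sketch of how the two ideas could combine (an illustration only, not the merged implementation): prefer the server's `Retry-After` value when it is present and parseable, otherwise fall back to a computed exponential delay. The function name `next_delay` and the parameters `base`, `factor`, and `max_delay` are assumptions, and the header is assumed to carry a number of seconds.

```python
def next_delay(attempt, headers, base=0.2, factor=2.0, max_delay=10.0):
    """Return the number of seconds to wait before retry number `attempt` (0-based).

    `headers` is assumed to be a mapping that contains the key "Retry-After"
    exactly as spelled here when the server sent it. All defaults are
    illustrative values, not settings from the project.
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Assume a delta in seconds; clamp to the range [0, max_delay].
            return min(max(float(retry_after), 0.0), max_delay)
        except ValueError:
            pass  # not a number (e.g. an HTTP-date): use the computed backoff
    # Exponential backoff: base * factor**attempt, capped at max_delay.
    return min(base * factor ** attempt, max_delay)
```

Capping the server-provided value as well keeps a single misconfigured quota from stalling a state run for an unbounded time.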

Fixes #65

lkubb self-assigned this May 16, 2024

lkubb mentioned this issue May 18, 2024

Support retries and connection settings #66

Merged

3 tasks

lkubb closed this as completed in #66 May 19, 2024

lkubb added a commit that referenced this issue May 19, 2024

Merge pull request #66 from lkubb/rate-retry

9794f02

Fixes #65
