You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
We operate a large Vault Enterprise cluster and we see significant spikes in requests/sec when some teams refresh their infrastructure (apply a high state to ~10k virtual machines). This results in spikes of a few thousand requests/sec which impacts Vault's performance for other clients.
Describe the solution you'd like
HashiCorp Vault Enterprise supports a rate limiting feature - resource quotas. When implemented, requests to a resource that exceed the limit receive a HTTP Response Code of 429. Example:
I'm requesting to support retry logic on reads (and writes) from minions when receiving a 429 response code so that the state apply can gracefully handle situations where the Vault cluster is under too much load. Obviously, you need an 'exit' condition if the 429 response codes persist. I recommend retrying 5 times with increasing delays in http requests to the vault server. (0.2s, 0.3s, 0.5s, 0.8s, 1.3s, exit).
Describe alternatives you've considered
Additional solutions involve asking teams with large infrastructure to 'stage' their rollouts to a few hundred instances at a time. We've also considered providing a dedicated Vault cluster for this one infrastructure team, but that means additional operating costs (cloud spend, observability, alerting, configuration management, etc.).
Additional context
The text was updated successfully, but these errors were encountered:
Is your feature request related to a problem? Please describe.
We operate a large Vault Enterprise cluster and we see significant spikes in requests/sec when some teams refresh their infrastructure (apply a high state to ~10k virtual machines). This results in spikes of a few thousand requests/sec which impacts Vault's performance for other clients.
Describe the solution you'd like
HashiCorp Vault Enterprise supports a rate limiting feature - resource quotas. When implemented, requests to a resource that exceed the limit receive a HTTP Response Code of 429. Example:
Code: 429. Errors:* request path "transit/encrypt/test": rate limit quota exceeded
I'm requesting to support retry logic on reads (and writes) from minions when receiving a 429 response code so that the state apply can gracefully handle situations where the Vault cluster is under too much load. Obviously, you need an 'exit' condition if the 429 response codes persist. I recommend retrying 5 times with increasing delays in http requests to the vault server. (0.2s, 0.3s, 0.5s, 0.8s, 1.3s, exit).
Describe alternatives you've considered
Additional solutions involve asking teams with large infrastructure to 'stage' their rollouts to a few hundred instances at a time. We've also considered providing a dedicated Vault cluster for this one infrastructure team, but that means additional operating costs (cloud spend, observability, alerting, configuration management, etc.).
Additional context
The text was updated successfully, but these errors were encountered: