Resource-based circuit breaking #3332
Comments
IMO this is best done as a dedicated filter, since the circuit breaking is a bit different from what we currently do and is pretty self-contained. I would make this a general resource-based ingress circuit breaking filter that could eventually be extended to memory and other resources. As long as we have the right platform abstractions for getting the information we need, this sounds like a very useful feature to add.
I think we may also want to tie this in to the centralized system for #373. I can imagine hitting some threshold (event loop time?) at which we simply stop accepting new requests so we can make forward progress on existing ones.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
I came across this issue and wondered about the implementation in the context of containers, where Envoy runs as a sidecar in an ECS task or a k8s pod. One way to do this would be to share the process namespace, so that Envoy can track the CPU the other container is using. But this solution puts too many requirements on the end user: sharing the process namespace, running Envoy as the root user, and allowing Envoy to access the entire disk of the main container. Is there any other easy and secure way to track system resources in a container-based system?
Description
Envoy should have the ability to circuit break on system resources like CPU.
Circuit breakers at ingress are used to protect our hosts from resource exhaustion. To determine circuit breaker thresholds, we run a "redline" test, which increasingly ramps traffic on a single host until it degrades. We note rq_active, then set the threshold less some buffer. Over time this ends up being a poor approximation for the real bottleneck for most of our services, CPU:
Working out the platform-dependent implementation and the algorithm will be the fun part. I'd like to get a first impression from other users before getting into that.
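To make the idea concrete, here is a rough sketch of what a CPU-based breaker could look like on Linux. It is not Envoy code and not a proposed design: it assumes CPU utilization is derived from two samples of the aggregate `cpu` line in `/proc/stat`, and the class name, the thresholds, and the hysteresis band (open above one value, close below a lower one) are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    busy: int   # jiffies spent on non-idle work
    total: int  # all jiffies, idle included


def parse_proc_stat_cpu_line(line: str) -> CpuSample:
    """Parse the aggregate "cpu ..." line from Linux /proc/stat."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user/nice, so stop after steal.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    total = sum(values)
    return CpuSample(busy=total - idle, total=total)


def cpu_utilization(prev: CpuSample, curr: CpuSample) -> float:
    """Fraction of time spent busy between two samples, in [0, 1]."""
    elapsed = curr.total - prev.total
    if elapsed <= 0:
        return 0.0
    return (curr.busy - prev.busy) / elapsed


class CpuCircuitBreaker:
    """Hypothetical breaker: opens above one threshold, closes below another."""

    def __init__(self, open_above: float, close_below: float):
        if close_below > open_above:
            raise ValueError("close_below must not exceed open_above")
        self.open_above = open_above
        self.close_below = close_below
        self.is_open = False

    def update(self, utilization: float) -> bool:
        """Feed a new utilization reading; returns True while requests should be rejected."""
        if self.is_open and utilization < self.close_below:
            self.is_open = False
        elif not self.is_open and utilization > self.open_above:
            self.is_open = True
        return self.is_open


if __name__ == "__main__":
    # Two hand-written samples standing in for reads of /proc/stat.
    prev = parse_proc_stat_cpu_line("cpu 100 0 100 800 0 0 0 0 0 0")
    curr = parse_proc_stat_cpu_line("cpu 190 0 190 820 0 0 0 0 0 0")
    breaker = CpuCircuitBreaker(open_above=0.85, close_below=0.70)
    print(breaker.update(cpu_utilization(prev, curr)))
```

The two thresholds are there so the breaker does not flap when utilization hovers around a single cutoff. In a container, host-wide `/proc/stat` numbers may not reflect the limits applied to the service, so the sampling source would need to be swapped for whatever the platform abstraction ends up exposing.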