-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a circuit breaker for http and tchannel client calls #539
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking good, left comments on a few minor things that need to be fixed.
A couple of more important things:
- the client call wrapper is not free, we should know the performance implications. Let's benchmark this change and compare with master, read more at https://github.com/uber/zanzibar/blob/master/benchmarks/README.md.
- the circuit breaker should be configurable, i.e., users can decide whether to have circuit breaker or not. That said, we can make it opt-in by default.
@@ -5,6 +5,7 @@ package {{$instance.PackageInfo.PackageName}} | |||
import ( | |||
"context" | |||
"fmt" | |||
"github.com/afex/hystrix-go/hystrix" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first import group should be stdlib, third party libraries should go into the following group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, while doing codegen we are running goimports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct, but consistency is good.
runtime/gateway.go
Outdated
@@ -23,6 +23,8 @@ package zanzibar | |||
import ( | |||
"context" | |||
"fmt" | |||
"github.com/afex/hystrix-go/hystrix/metric_collector" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import grouping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
@@ -74,7 +74,7 @@ func TestCallMetrics(t *testing.T) { | |||
) | |||
assert.NoError(t, err) | |||
|
|||
numMetrics := 10 | |||
numMetrics := 14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have assertions for the new metrics, not just the count.
runtime/plugins/m3_aggregator.go
Outdated
package plugins | ||
|
||
import ( | ||
"github.com/uber-go/tally" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import grouping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
codegen/templates/http_client.tmpl
Outdated
@@ -167,7 +176,11 @@ func (c *{{$clientName}}) {{$methodName}}( | |||
} | |||
{{- end}} | |||
|
|||
res, err := req.Do() | |||
var res *zanzibar.ClientHTTPResponse | |||
err = hystrix.Do("{{$clientID}}", func() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should call hystrix.DoC
with the proper context here, so the context is respected.
@@ -5,6 +5,7 @@ package {{$instance.PackageInfo.PackageName}} | |||
import ( | |||
"context" | |||
"errors" | |||
"github.com/afex/hystrix-go/hystrix" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import grouping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
Copyright change is landed on master, rebase would get rid of the noises. |
################################################# benchmark output circuit_breaker################################################# Running 30s test @ http://localhost:8093/contacts/foo/contacts ################################################# benchmark output master################################################# Running 30s test @ http://localhost:8093/contacts/foo/contacts |
…ting downstream from failures/outages/high latency
dfc9e9a
to
e98e2a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a short passage in the readme doc to describe the circuit breaker feature and how to enable/disable and config?
runtime/gateway.go
Outdated
"github.com/opentracing/opentracing-go" | ||
metricCollector "github.com/afex/hystrix-go/hystrix/metric_collector" | ||
"github.com/uber/zanzibar/runtime/plugins" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary blank line
"clients.multi.ip": "127.0.0.1", | ||
"clients.multi.port": 4003, | ||
"clients.multi.timeout": 10000, | ||
"clients.multi.maxConcurrentRequests": 1000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a setting where circuit breaker is disabled as a configuration example.
@@ -95,7 +95,7 @@ func TestCallMetrics(t *testing.T) { | |||
metrics := cg.M3Service.GetMetrics() | |||
// we don't care about jaeger emitted metrics | |||
for key := range metrics { | |||
if strings.HasPrefix(key, "jaeger") { | |||
if strings.HasPrefix(key, "jaeger") || strings.Contains(key, "circuitbreaker") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think circuitbreaker metrics are critical (in the sense we will create dashboard and monitor/alert).
And ideally we should have tests that covers failure scenarios to verify our integration with the library is as expected.
README.md
Outdated
@@ -140,6 +140,23 @@ func (c *quux) Echo(s string) string { return s } | |||
``` | |||
Note the type of `deps` parameter passed to `NewClient` constructor function is generated by Zanzibar, as indicated by the import path. Zanzibar takes care of initializing and passing in the acutal `deps` argument, as mentioned in [Dependency Injection](#dependency-injection). | |||
|
|||
###### Circuit Breaker | |||
For increasing overall system resiliency, zanzibar uses a [Circuit Breaker](https://msdn.microsoft.com/en-us/library/dn589784.aspx) to avoid calling clients when there is a increase in failure rate beyond a set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: an increase
README.md
Outdated
@@ -140,6 +140,23 @@ func (c *quux) Echo(s string) string { return s } | |||
``` | |||
Note the type of `deps` parameter passed to `NewClient` constructor function is generated by Zanzibar, as indicated by the import path. Zanzibar takes care of initializing and passing in the acutal `deps` argument, as mentioned in [Dependency Injection](#dependency-injection). | |||
|
|||
###### Circuit Breaker | |||
For increasing overall system resiliency, zanzibar uses a [Circuit Breaker](https://msdn.microsoft.com/en-us/library/dn589784.aspx) to avoid calling clients when there is a increase in failure rate beyond a set | |||
threshold. It recovers it after elasped time to again fo in half-open and close state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It recovers it after elasped time to again fo in half-open and close state.
Can you rework this sentence? a few typos and the information it tries to convey doesn't seem clear to me.
README.md
Outdated
``` | ||
"clients.<clientID>.errorPercentThreshold": 20 | ||
``` | ||
By default circuit breaker is enabled, if you want to disable it , set : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant white space before last ,
clients.google-now.ip: 127.0.0.1 | ||
clients.google-now.port: 14120 | ||
clients.google-now.timeout: 10000 | ||
clients.google-now.maxConcurrentRequests: 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put the disabled config here as well, to be consistent with the json config. (yaml is prefered, but zanzibar is backward compatible with json config)
@@ -3,27 +3,41 @@ clients.bar.defaultHeaders: | |||
clients.bar.ip: 127.0.0.1 | |||
clients.bar.port: 4001 | |||
clients.bar.timeout: 10000 | |||
clients.bar.maxConcurrentRequests: 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
README.md
Outdated
For increasing overall system resiliency, zanzibar uses a [Circuit Breaker](https://msdn.microsoft.com/en-us/library/dn589784.aspx) which avoids calling client when there is an increase in failure rate beyond a set | ||
threshold. After a sleepWindowInMilliseconds, client calls are attempted recovery by going in half-open and then close state. | ||
|
||
circuitBreakerDisabled: Default false. Disables the circuit-breaker: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disables -> To disable
README.md
Outdated
"clients.<clientID>.circuitBreakerDisabled" : true | ||
``` | ||
|
||
maxConcurrentRequests: Default 50. Sets how many requests can be run at the same time, beyond which requests are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sets -> To set
Same for the following 3 configurations
Added a circuit breaker for http and tchannel client calls for preventing downstream from failures/outages/high latency