Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a circuit breaker for http and tchannel client calls #539

Merged
merged 12 commits into from
Jan 17, 2019

Conversation

abhishekparwal
Copy link
Contributor

Added a circuit breaker for http and tchannel client calls for preventing downstream from failures/outages/high latency

@CLAassistant
Copy link

CLAassistant commented Jan 8, 2019

CLA assistant check
All committers have signed the CLA.

@coveralls
Copy link

coveralls commented Jan 8, 2019

Coverage Status

Coverage increased (+0.4%) to 68.629% when pulling fd04ba9 on circuit_breaker into 6f0903f on master.

Copy link
Contributor

@ChuntaoLu ChuntaoLu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good, left comments on a few minor things that need to be fixed.

A couple of more important things:

  • the client call wrapper is not free, we should know the performance implications. Let's benchmark this change and compare with master, read more at https://github.com/uber/zanzibar/blob/master/benchmarks/README.md.
  • the circuit breaker should be configurable, i.e., users can decide whether to have circuit breaker or not. That said, we can make it opt-in by default.

@@ -5,6 +5,7 @@ package {{$instance.PackageInfo.PackageName}}
import (
"context"
"fmt"
"github.com/afex/hystrix-go/hystrix"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first import group should be stdlib, third party libraries should go into the following group.

Copy link
Contributor Author

@abhishekparwal abhishekparwal Jan 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, while doing codegen we are running goimports

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, but consistency is good.

@@ -23,6 +23,8 @@ package zanzibar
import (
"context"
"fmt"
"github.com/afex/hystrix-go/hystrix/metric_collector"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import grouping

Copy link
Contributor Author

@abhishekparwal abhishekparwal Jan 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -74,7 +74,7 @@ func TestCallMetrics(t *testing.T) {
)
assert.NoError(t, err)

numMetrics := 10
numMetrics := 14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have assertions for the new metrics, not just the count.

package plugins

import (
"github.com/uber-go/tally"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import grouping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -167,7 +176,11 @@ func (c *{{$clientName}}) {{$methodName}}(
}
{{- end}}

res, err := req.Do()
var res *zanzibar.ClientHTTPResponse
err = hystrix.Do("{{$clientID}}", func() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call hystrix.DoC with the proper context here, so the context is respected.

@@ -5,6 +5,7 @@ package {{$instance.PackageInfo.PackageName}}
import (
"context"
"errors"
"github.com/afex/hystrix-go/hystrix"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import grouping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@ChuntaoLu
Copy link
Contributor

Copyright change is landed on master, rebase would get rid of the noises.

@abhishekparwal
Copy link
Contributor Author

#################################################

benchmark output circuit_breaker

#################################################
#################################################

Running 30s test @ http://localhost:8093/contacts/foo/contacts
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 32.00ms 33.25ms 1.08s 88.01%
Req/Sec 1.20k 203.56 2.12k 70.97%
429092 requests in 30.04s, 47.06MB read
Socket errors: connect 0, read 235, write 0, timeout 0
Non-2xx or 3xx responses: 14
Requests/sec: 14284.97
Transfer/sec: 1.57MB

#################################################

benchmark output master

#################################################
#################################################

Running 30s test @ http://localhost:8093/contacts/foo/contacts
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 35.75ms 32.25ms 392.37ms 65.52%
Req/Sec 1.05k 152.10 2.02k 77.17%
376117 requests in 30.04s, 41.25MB read
Socket errors: connect 0, read 253, write 0, timeout 0
Requests/sec: 12520.04
Transfer/sec: 1.37MB
Switched to branch 'circuit_breaker'

Copy link
Contributor

@ChuntaoLu ChuntaoLu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a short passage in the readme doc to describe the circuit breaker feature and how to enable/disable and config?

"github.com/opentracing/opentracing-go"
metricCollector "github.com/afex/hystrix-go/hystrix/metric_collector"
"github.com/uber/zanzibar/runtime/plugins"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary blank line

"clients.multi.ip": "127.0.0.1",
"clients.multi.port": 4003,
"clients.multi.timeout": 10000,
"clients.multi.maxConcurrentRequests": 1000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a setting where circuit breaker is disabled as a configuration example.

@@ -95,7 +95,7 @@ func TestCallMetrics(t *testing.T) {
metrics := cg.M3Service.GetMetrics()
// we don't care about jaeger emitted metrics
for key := range metrics {
if strings.HasPrefix(key, "jaeger") {
if strings.HasPrefix(key, "jaeger") || strings.Contains(key, "circuitbreaker") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think circuitbreaker metrics are critical (in the sense we will create dashboard and monitor/alert).

And ideally we should have tests that covers failure scenarios to verify our integration with the library is as expected.

README.md Outdated
@@ -140,6 +140,23 @@ func (c *quux) Echo(s string) string { return s }
```
Note the type of `deps` parameter passed to `NewClient` constructor function is generated by Zanzibar, as indicated by the import path. Zanzibar takes care of initializing and passing in the acutal `deps` argument, as mentioned in [Dependency Injection](#dependency-injection).

###### Circuit Breaker
For increasing overall system resiliency, zanzibar uses a [Circuit Breaker](https://msdn.microsoft.com/en-us/library/dn589784.aspx) to avoid calling clients when there is a increase in failure rate beyond a set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: an increase

README.md Outdated
@@ -140,6 +140,23 @@ func (c *quux) Echo(s string) string { return s }
```
Note the type of `deps` parameter passed to `NewClient` constructor function is generated by Zanzibar, as indicated by the import path. Zanzibar takes care of initializing and passing in the acutal `deps` argument, as mentioned in [Dependency Injection](#dependency-injection).

###### Circuit Breaker
For increasing overall system resiliency, zanzibar uses a [Circuit Breaker](https://msdn.microsoft.com/en-us/library/dn589784.aspx) to avoid calling clients when there is a increase in failure rate beyond a set
threshold. It recovers it after elasped time to again fo in half-open and close state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It recovers it after elasped time to again fo in half-open and close state.

Can you rework this sentence? a few typos and the information it tries to convey doesn't seem clear to me.

README.md Outdated
```
"clients.<clientID>.errorPercentThreshold": 20
```
By default circuit breaker is enabled, if you want to disable it , set :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant white space before last ,

clients.google-now.ip: 127.0.0.1
clients.google-now.port: 14120
clients.google-now.timeout: 10000
clients.google-now.maxConcurrentRequests: 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put the disabled config here as well, to be consistent with the json config. (yaml is prefered, but zanzibar is backward compatible with json config)

@@ -3,27 +3,41 @@ clients.bar.defaultHeaders:
clients.bar.ip: 127.0.0.1
clients.bar.port: 4001
clients.bar.timeout: 10000
clients.bar.maxConcurrentRequests: 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

README.md Outdated
For increasing overall system resiliency, zanzibar uses a [Circuit Breaker](https://msdn.microsoft.com/en-us/library/dn589784.aspx) which avoids calling client when there is an increase in failure rate beyond a set
threshold. After a sleepWindowInMilliseconds, client calls are attempted recovery by going in half-open and then close state.

circuitBreakerDisabled: Default false. Disables the circuit-breaker:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disables -> To disable

README.md Outdated
"clients.<clientID>.circuitBreakerDisabled" : true
```

maxConcurrentRequests: Default 50. Sets how many requests can be run at the same time, beyond which requests are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sets -> To set

Same for the following 3 configurations

@ChuntaoLu ChuntaoLu merged commit 9a4028e into master Jan 17, 2019
@ChuntaoLu ChuntaoLu deleted the circuit_breaker branch January 17, 2019 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants