-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for retries #2038
Add support for retries #2038
Conversation
How I tested
|
routes = append(routes, pbRoute) | ||
} | ||
budget := DefaultRetryBudget | ||
if profile.RetryBudget != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if this is addressed elsewhere and I couldn't find it: would this be an appropriate place to validate the budget arguments? Would an error here be more visible to a user?
The proxy will already disregard invalid budgets and log at WARN level, which should be visible, but only if the user is looking at the proxy logs...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Probably a good idea to validate here too. Unfortunately, the suffers from the same problem that validation errors will only be visible if you're looking at the logs.
We may be able to some limited validation with CRD schema validation too...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for linkerd inject
or other related command to do the validation then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be a great addition to the inject
checks.
@adleong I can't get |
@grampelberg uh oh. can you share a screenshot of the stat command and output? |
$ bin/go-run cli -n booksapp stat deploy
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TLS
authors 1/1 - - - - - -
books 1/1 - - - - - -
traffic 1/1 - - - - - -
webapp 3/3 - - - - - - |
I've replicated my stat bug on stable, so it isn't your code! I'm going to go figure out what the deal is there now ..... |
is it feasible to read the We could use something like The reason I ask this is that our openapi spec is auto generated based on the routes in the controllers. It would be annoying to maintain a separate list of routes and their linkerd settings to then patch the json before passing into the linkerd cli |
What do you think about reporting effective by default and adding a
|
@jon-walton I love this idea! It should be pretty simple to add this to the |
@grampelberg I dig it. ⛏ |
Signed-off-by: Alex Leong <alex@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
534aeda
to
d87b6cc
Compare
@grampelberg I implemented your |
Oh yeah, I like that! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great. thanks for the test setup instructions.
cli/cmd/root.go
Outdated
@@ -162,17 +162,17 @@ func newStatOptionsBase() *statOptionsBase { | |||
|
|||
func (o *statOptionsBase) validateOutputFormat() error { | |||
switch o.outputFormat { | |||
case "table", "json", "": | |||
case "table", "json", "wide", "": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to implement the EFFECTIVE/ACTUAL columns for the linkerd stat
command? if not, it's a bit confusing that stat -o wide
works but is a no-op.
related, if routes -o wide
only works with --to
, consider validating that, or maybe documenting it in the help.
cli/cmd/root.go
Outdated
proxyOutboundCapacity: map[string]uint{}, | ||
proxyCPURequest: "", | ||
proxyMemoryRequest: "", | ||
tls: "", | ||
disableExternalProfiles: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is a go-fmt issue? when i save this file in vscode it re-aligns the field assignments.
cli/cmd/routes.go
Outdated
"SUCCESS", | ||
"RPS", | ||
} | ||
if options.toResource == "" || options.outputFormat != "wide" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider:
outputActual := options.toResource != "" && options.outputFormat == "wide"
... and then use outputActual
in the 3 subsequent if statements.
@@ -13,6 +13,7 @@ import ( | |||
|
|||
const ( | |||
routeReqQuery = "sum(increase(route_response_total%s[%s])) by (%s, dst, classification)" | |||
actualRouteReqQuery = "sum(increase(route_actual_response_total%s[%s])) by (%s, dst, classification)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the prometheus metrics level and BasicStats
level, we're re-using the existing response keys for "effective" stats, and adding new keys for "actual". higher up in the abstractions, around jsonRouteStats
and cli table headers
, we're instead adding both new "effective" and "actual" keys, and leaving the existing keys empty. what do you think about modifying the higher-level abstractions to match these lower-level ones, such that we'd remove "effective" altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only place that we have "effective" keys and leave the existing keys empty is in the struct that will be serialized to json. I do this so that we can have different field names: effective_rps
in the outbound case and just plain rps
in the inbound case.
Signed-off-by: Alex Leong <alex@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 🚢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⭐️ 🚀 Awesome! So cool to see these stats wired up.
$ bin/linkerd routes deploy/books --to svc/authors -owide
ROUTE SERVICE EFFECTIVE_SUCCESS EFFECTIVE_RPS ACTUAL_SUCCESS ACTUAL_RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
HEAD /authors/{id}.json authors 100.00% 2.2rps 51.16% 4.3rps 12ms 30ms 38ms
cli/cmd/routes.go
Outdated
func buildTopRoutesRequest(resource string, options *routesOptions) (*pb.TopRoutesRequest, error) { | ||
err := options.validateOutputFormat() | ||
err := validateOutputFormat(options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIOLI -- you could still use *routesOptions
as the receiver for this method, even though it embeds statOptionsBase
, which has its own definition of validateOutputFormat
. Something like this would work:
diff --git a/cli/cmd/routes.go b/cli/cmd/routes.go
index bab9fd36..9c3dceb4 100644
--- a/cli/cmd/routes.go
+++ b/cli/cmd/routes.go
@@ -268,12 +268,12 @@ func printRouteJSON(stats []*routeRowStats, w *tabwriter.Writer, options *routes
fmt.Fprintf(w, "%s\n", b)
}
-func validateOutputFormat(options *routesOptions) error {
- switch options.outputFormat {
+func (o *routesOptions) validateOutputFormat() error {
+ switch o.outputFormat {
case "table", "json", "":
return nil
case "wide":
- if options.toResource == "" {
+ if o.toResource == "" {
return errors.New("wide output is only available when --to is specified")
}
return nil
@@ -283,7 +283,7 @@ func validateOutputFormat(options *routesOptions) error {
}
func buildTopRoutesRequest(resource string, options *routesOptions) (*pb.TopRoutesRequest, error) {
- err := validateOutputFormat(options)
+ err := options.validateOutputFormat()
if err != nil {
return nil, err
}
Signed-off-by: Alex Leong <alex@buoyant.io>
This change adds the
isRetryable
property to routes in a service profile. This causes the proxy to retry failed requests to retryable routes whenever it is able to. A retry budget may also be added to the service profile which can limit the number of retries. Thelinkerd profile --template
command has been updated to describe these new properties.The
linkerd routes
command has been updated to differentiate between actual and effective RPS and success rate. This data is only available for outgoing queries i.e. when the--to
flag is used. Effective metrics are the metrics as observed by the client whereas actual metrics are the metrics which describe the requests and responses which are actually sent on the wire. Adding retries will usually increase the actual RPS and increase the effective success rate while leaving the actual success rate and effective RPS unchanged.