Add retries in activator and envoy timeout to avoid 503's #1226

Merged
merged 2 commits into from
Jun 15, 2018
@akyyy (Contributor) commented Jun 15, 2018

Fixes #
Because the activator sends requests to the revision's service name, which doesn't always have the pod IP behind it yet, we saw 503s and 504s.

Proposed Changes

  • Add retries in the activator
  • Specify the Envoy timeout in a request header

@akyyy akyyy self-assigned this Jun 15, 2018
@google-prow-robot google-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 15, 2018
"github.com/knative/serving/pkg/controller"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const sixtySecondsInMs = "60000"

Contributor:

Change to a generic name like requestTimeoutMs?

Contributor (author):

Done.

Weight: 0,
}},
}, getActivatorDestinationWeight(0),
{

Contributor:

If you move this to the previous line, gofmt will indent it better.

Contributor (author):

I see, thanks!

@josephburnett (Contributor) left a comment

Looks good.

/lgtm
/approve

@google-prow-robot google-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 15, 2018
@tcnghia (Contributor) commented Jun 15, 2018

/approve

}
ret = append(ret, activatorRoute)
}
activatorRoute := RevisionRoute{

Contributor:

Please sync with the latest changes; I did the same thing in the master branch.

Contributor (author):

I copied your change over to avoid merge. :)

@@ -339,7 +339,7 @@ func TestCreateRouteCreatesStuff(t *testing.T) {
Namespace: testNamespace,
},
Weight: 100,
}},
}, getActivatorDestinationWeight(0)},

Contributor:

Please sync this file with the latest master as well; some of these changes are already there.

@@ -29,10 +30,37 @@ import (
"k8s.io/client-go/rest"
)

const (
	maxRetry = 60

Contributor:

Retrying up to 60 times per request seems quite aggressive. We should eventually move this to exponential backoff. We should probably open a GitHub issue to tackle this later and check this one in as is.

Contributor:

Ah, Mustafa wondered the same thing as me. I didn't see this before I commented.


type retryRoundTripper struct{}

func (rrt retryRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {
	transport := http.DefaultTransport.(*http.Transport)

Contributor:

Why is there a need for this cast?

@nikkithurmond (Contributor):

/lgtm
/approve

Awesome job :) Just a question (it doesn't change my approval): do we need any backoff for the retries? I don't think so, but I'm wondering whether you've considered it.

@google-prow-robot google-prow-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 15, 2018
@mdemirhan (Contributor):

I just resolved the conflict and removed my hold request. Let's check this in and we can address my comments in a later review as they are not blockers.

@knative-metrics-robot:

The following is the coverage report on pkg/. Say /test pull-knative-serving-go-coverage to run the coverage report again.

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/activator/revision.go | 79.5% | 79.1% | -0.5 |
| pkg/controller/route/route_test.go | 78.5% | 78.7% | 0.2 |
*TestCoverage feature is being tested, do not rely on any info here yet

@mdemirhan (Contributor):

/lgtm

@google-prow-robot google-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 15, 2018
@vaikas (Contributor) commented Jun 15, 2018

/lgtm
/approve

@google-prow-robot:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akyyy, josephburnett, nikkithurmond, tcnghia, vaikas-google


@google-prow-robot google-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2018
@google-prow-robot google-prow-robot merged commit 9d5a63a into knative:master Jun 15, 2018
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
8 participants