This repository has been archived by the owner on Jun 19, 2022. It is now read-only.

Tune broker components #1269

Merged · 3 commits merged into google:master from b/reuseconn on Jun 12, 2020
Conversation

@yolocs (Member) commented on Jun 11, 2020

Some mitigation for #1265

Proposed Changes

  • Adjusted broker components' memory resources
    • Increased the ingress memory limit to 1000Mi
    • Set memory limits much higher than the requests
  • Adjusted HPA
    • The target average memory usage for fanout/retry is set to half of the limit (this hopefully helps mitigate problems from surging memory usage)
    • Lowered the max replicas for fanout/retry, as I've seen a higher DNS error rate when there are more than 10 replicas
  • Relaxed the liveness probe timeout (should help reduce some noise)
  • Overrode MaxIdleConns for the delivery HTTP client (should help reuse connections and reduce the chance of DNS errors); a rough sketch follows this list
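
For concreteness, here is a minimal, self-contained sketch of the delivery HTTP client override described above. The package name and file layout are illustrative, not the exact file from this PR; the values mirror the diff reviewed below, and the per-host limit is tuned further in the review thread.

```go
// Sketch only; illustrative package name, not the exact file in this PR.
package delivery

import (
	"net/http"

	"go.opencensus.io/plugin/ochttp"
)

// DefaultHTTPClient is the shared client used for event delivery. Raising the
// idle-connection limits lets deliveries reuse existing TCP connections instead
// of re-dialing (and re-resolving DNS) for every request.
var DefaultHTTPClient = &http.Client{
	Transport: &ochttp.Transport{
		Base: &http.Transport{
			MaxIdleConns:        1000,
			MaxIdleConnsPerHost: 1000, // revised during review; see the thread below
		},
	},
}
```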

Release Note

- Adjusted broker components' memory resources
  - Increased the ingress memory limit to 1000Mi
  - Set memory limits much higher than the requests
- Adjusted HPA
  - The target average memory usage for fanout/retry is set to half of the limit (this hopefully helps mitigate problems from surging memory usage)
  - Lowered the max replicas for fanout/retry, as a higher DNS error rate was seen with more than 10 replicas
- Relaxed the liveness probe timeout (should help reduce some noise)
- Overrode `MaxIdleConns` for the delivery HTTP client (should help reuse connections and reduce the chance of DNS errors)

Docs

@knative-prow-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yolocs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@googlebot added the cla: yes (override cla status due to multiple authors bug) label on Jun 11, 2020
@yolocs (Member, Author) commented on Jun 11, 2020:

/retest

```diff
@@ -37,6 +38,11 @@ var (
 	DefaultHTTPClient = &http.Client{
 		Transport: &ochttp.Transport{
+			Base: &http.Transport{
+				MaxIdleConns:        1000,
+				MaxIdleConnsPerHost: 1000,
```
Contributor commented on this hunk:
Since we generally have more than one target, I think this value should be smaller than MaxIdleConns.

I suggest the following numbers. @grantr any thoughts on this?

```go
MaxIdleConns: 1000,
// Assume an average of 20 triggers, just a guess. This makes sure idle conns are more fairly distributed across targets.
MaxIdleConnsPerHost: 50,
// Also set max conns per target to limit the resource usage per target. This should also help limit the memory usage.
MaxConnsPerHost: 100,
```

yolocs (Member, Author) replied:

Open to suggestions. Some of my thinking:

  1. We don't have to guarantee fairness here. When you have 100 triggers but only one is actively receiving events, I see no reason not to let it take more idle connections. It should naturally evolve to "active triggers have more idle conns and less active ones have fewer". Although it might be worth setting MaxIdleConnsPerHost a little lower than MaxIdleConns: under high concurrency I occasionally saw failures to put a new idle conn for a host because it was hitting the max idle conns.

  2. Setting MaxConnsPerHost should effectively limit resource usage, but there is a balance between this and maximizing throughput, especially since we don't have "queue length based" autoscaling.

Contributor replied:

RE #1: We need to find a good balance point. A high MaxIdleConnsPerHost handles the "small number of active triggers" scenario well, but not so well when there are many active triggers. So I took a guess of 20 active triggers on average; that might be too big. Ideally we could make this configurable, but we have to pick a guess for now. How about setting MaxIdleConnsPerHost to 100 or 200?

RE #2: Agreed it's a tradeoff. Since we can run into OOM, setting this will help. Maybe something like 200 is a good balance point?

yolocs (Member, Author) replied:

I'm thinking a bigger number like 500 for both MaxIdleConnsPerHost and MaxConnsPerHost. Since I now set a high memory limit with the HPA target at 50%, that should help with the OOM problem (not always, but it should help some). Thoughts?

Contributor replied:

Is it better to have MaxIdleConnsPerHost < MaxConnsPerHost? That gives other triggers more room to maintain some idle connections. The tradeoff is that it's less efficient for handling surges.

I don't think we can come up with perfect numbers; we may need to tune them later anyway. I will lgtm whatever you come up with :)

yolocs (Member, Author) replied:

Makes sense. Set them both to 500 now.
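
To make the outcome concrete, the Base transport would end up roughly like this. This is a sketch that reuses the client shape from the diff above; only the connection limits reflect what was agreed in this thread.

```go
// Sketch of the settings as agreed above (same client shape as the diff).
var DefaultHTTPClient = &http.Client{
	Transport: &ochttp.Transport{
		Base: &http.Transport{
			MaxIdleConns: 1000, // large overall idle pool so deliveries reuse connections
			// Per-host limits settled at 500: below MaxIdleConns so one target
			// cannot hold every idle connection, yet high enough for a few hot
			// triggers; MaxConnsPerHost also caps per-target resource usage.
			MaxIdleConnsPerHost: 500,
			MaxConnsPerHost:     500,
		},
	},
}
```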

@liu-cong (Contributor) commented:

/lgtm

@knative-prow-robot merged commit 1d5ef63 into google:master on Jun 12, 2020
@yolocs deleted the b/reuseconn branch on June 12, 2020 at 19:28
Labels: approved, cla: yes (override cla status due to multiple authors bug), lgtm, size/M