This repository has been archived by the owner on Jun 19, 2022. It is now read-only.

Determine resource request/limit for broker data plane components #876

Closed
5 tasks
liu-cong opened this issue Apr 16, 2020 · 10 comments · Fixed by #1121
Labels: area/broker, kind/feature-request (New feature or request), priority/1 (Blocks current release defined by release/* label or blocks current milestone), release/1, storypoint/8

Comments


liu-cong commented Apr 16, 2020

Problem
We need to run benchmark tests to determine reasonable resource requests/limits for the broker data plane components.

Persona:
Which persona is this feature for?
Operator

Exit Criteria
A recommended resource request/limit and why.

Time Estimate (optional):
How many developer-days do you think this may take to resolve?

Additional context (optional)

@liu-cong added the kind/feature-request and area/broker labels on Apr 16, 2020
@grantr added the priority/1 (Blocks current release defined by release/* label or blocks current milestone) and release/1 labels on Apr 21, 2020

grantr commented Apr 21, 2020

This is P1 for the request values, but not P1 for the limit values or for repeatable benchmarks.

@grantr added this to the Backlog milestone on Apr 21, 2020

grantr commented May 6, 2020

Clarification on scope: the request values are needed to determine the minimum cluster footprint. Optimal request sizes for performance are not necessary for R1.

@grantr modified the milestones: Backlog → v0.15.0-M3 on May 13, 2020

yolocs commented May 18, 2020

Setup

  • 3-node n1-standard-4 cluster
  • Workloads
    • namespace-1: 500x healthy subscribers; seeding one event every 5s
    • namespace-2: 150x slow subscribers (delay 15m); seeding one event every 5m
    • namespace-3: 200x err subscribers (always return 500); seeding one event every 2m
    • namespace-4: 10x echo subscribers (echo back events); seeding one event every 2m

Long connections and retries accumulate in the retry service. Every 2m there is a surge of requests because of the echoes. (A sketch of these subscriber behaviors follows.)
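
For illustration, here is a minimal sketch (hypothetical Go test code, not the actual benchmark harness) of the four subscriber behaviors described above; the endpoint paths and port are placeholders:

package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	// Healthy subscriber: accept the event immediately.
	http.HandleFunc("/healthy", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Slow subscriber: hold the request for 15 minutes before responding,
	// which keeps the delivery connection open on the broker side.
	http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(15 * time.Minute)
		w.WriteHeader(http.StatusOK)
	})

	// Err subscriber: always return 500 so the event lands in retry.
	http.HandleFunc("/err", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusInternalServerError)
	})

	// Echo subscriber: reply with the incoming body, generating extra
	// traffic (simplified; a real reply would carry CloudEvents attributes).
	http.HandleFunc("/echo", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", r.Header.Get("Content-Type"))
		if _, err := io.Copy(w, r.Body); err != nil {
			log.Printf("echo failed: %v", err)
		}
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}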

Results (one pod per component)

k top pods
NAME                             CPU(cores)   MEMORY(bytes)
broker-fanout-76867ffc8-rdpzn    1670m        166Mi
broker-ingress-746f69945-hvvfz   610m         64Mi
broker-retry-7cc4559df5-l72bh    570m         239Mi

tl;dr: Fanout is pretty CPU intensive while retry is slightly more memory intensive when there are a lot of events to be retried.

Results (more replicas)

Fanout - 3 replicas

Retry - 3 replicas

k top pods
NAME                             CPU(cores)   MEMORY(bytes)
broker-fanout-76867ffc8-jfbrq    1047m        291Mi
broker-fanout-76867ffc8-rdpzn    984m         166Mi
broker-fanout-76867ffc8-rqwzn    722m         277Mi
broker-ingress-746f69945-hvvfz   857m         85Mi
broker-retry-7cc4559df5-4plhb    251m         609Mi
broker-retry-7cc4559df5-l72bh    503m         239Mi
broker-retry-7cc4559df5-r7pt4    335m         530Mi

With increased replicas, fanout/retry were able to distribute the load, and each pod used less CPU. However, it doesn't seem to affect memory usage as much. I guess that's because memory usage has a baseline that depends on the number of pull subscriptions a pod has (and the outstanding bytes for each pull subscription).

The retry pod with significantly less memory usage (239Mi) seems to handle more long connections.

When the ingress replicas were not increased, I observed some reply delivery failures. This is an indicator that ingress might be reaching its limit.

Conclusion

Proposed resource requests/limits (a sketch of how these could be expressed follows the list):

  • Fanout
    • Request: 200m CPU, 200Mi
    • Limit: 1200m CPU, 1000Mi
  • Retry
    • Request: 200m CPU, 200Mi
    • Limit: 1000m CPU, 1000Mi
  • Ingress
    • Request: 100m CPU, 100Mi
    • Limit: 1000m CPU, 500Mi
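
For illustration, a minimal sketch of how the fanout values above could be expressed with the Kubernetes Go API types (the retry and ingress values would follow the same pattern; this is not the actual deployment config):

package resources

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// fanoutResources mirrors the proposed fanout request/limit above.
func fanoutResources() corev1.ResourceRequirements {
	return corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("200m"),
			corev1.ResourceMemory: resource.MustParse("200Mi"),
		},
		Limits: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("1200m"),
			corev1.ResourceMemory: resource.MustParse("1000Mi"),
		},
	}
}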

Action items:

  • Install HPA in cloud-run-events for the broker components (see the sketch after this list)
    • This definitely helps distribute CPU load
    • Fanout and ingress are less memory intensive; with HPA they should scale well enough
  • Limit Pub/Sub outstanding message bytes in the retry service
    • This would help reduce memory pressure at the cost of slower retries
  • Implement a dead letter queue
    • Accumulated retry events don't seem to have a huge impact on resource usage, likely because we use synchronous pull with the default 1000 outstanding messages, but eventually this won't scale well
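
A minimal sketch of the HPA idea in Go API types (illustrative only; the min/max replicas and the 80% CPU target are assumptions, not decided values):

package autoscale

import (
	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func int32Ptr(i int32) *int32 { return &i }

// brokerFanoutHPA scales the broker-fanout deployment on CPU utilization;
// the same shape would apply to ingress and retry.
func brokerFanoutHPA() *autoscalingv1.HorizontalPodAutoscaler {
	return &autoscalingv1.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "broker-fanout",
			Namespace: "cloud-run-events",
		},
		Spec: autoscalingv1.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv1.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       "broker-fanout",
			},
			MinReplicas:                    int32Ptr(1),
			MaxReplicas:                    10,
			TargetCPUUtilizationPercentage: int32Ptr(80),
		},
	}
}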


yolocs commented May 18, 2020

I now think setting these values precisely is less important than having HPA in place.
I also did some profiling of these components, but I couldn't read the data well enough to come up with any useful action items.

@liu-cong @grantr @grac3gao let me know what you think.


grantr commented May 18, 2020

Thanks @yolocs, this is really interesting. Great to see real numbers.

Was the load the same in the "single pod" run as in the "more replicas" run? I'm curious why retry was at 239Mi with a single pod but used more per pod with replicas. It seems like, if retry memory usage were determined by the number of pull subs and outstanding bytes, the single pod would have memory usage greater than or equal to that of each replica.


yolocs commented May 19, 2020

After almost a day of running, things turned pretty ugly for the retry service.

k top pods -n cloud-run-events
NAME                             CPU(cores)   MEMORY(bytes)
broker-fanout-76867ffc8-jfbrq    271m         485Mi
broker-fanout-76867ffc8-rdpzn    805m         447Mi
broker-fanout-76867ffc8-rqwzn    519m         399Mi
broker-ingress-746f69945-hvvfz   571m         541Mi
broker-retry-7cc4559df5-2xpf4    261m         5020Mi
broker-retry-7cc4559df5-mgts2    1120m        2572Mi
broker-retry-7cc4559df5-t8gdb    966m         3146Mi

It's now consuming a lot of memory and has caused a lot of evictions.

k get pod
NAME                             READY   STATUS    RESTARTS   AGE
broker-fanout-76867ffc8-jfbrq    1/1     Running   8          19h
broker-fanout-76867ffc8-rdpzn    1/1     Running   7          22h
broker-fanout-76867ffc8-rqwzn    1/1     Running   19         19h
broker-ingress-746f69945-hvvfz   1/1     Running   13         22h
broker-retry-696f7bbd8c-2z6xn    0/1     Evicted   0          24h
broker-retry-7cc4559df5-2d4vp    0/1     Evicted   0          114m
broker-retry-7cc4559df5-2mll5    0/1     Evicted   0          3h3m
broker-retry-7cc4559df5-2xpf4    1/1     Running   4          82m
broker-retry-7cc4559df5-46z9d    0/1     Evicted   0          3h59m
broker-retry-7cc4559df5-4bxww    0/1     Evicted   0          138m
broker-retry-7cc4559df5-4h74c    0/1     Evicted   0          6h1m
broker-retry-7cc4559df5-4plhb    0/1     Evicted   0          19h
broker-retry-7cc4559df5-65cpz    0/1     Evicted   0          5h14m
broker-retry-7cc4559df5-65p8s    0/1     Evicted   0          3h
broker-retry-7cc4559df5-6gxxk    0/1     Evicted   0          6h7m
broker-retry-7cc4559df5-7fkh2    0/1     Evicted   0          6h55m
broker-retry-7cc4559df5-8bn84    0/1     Evicted   0          4h28m
broker-retry-7cc4559df5-8gm4d    0/1     Evicted   0          56m
broker-retry-7cc4559df5-8mrt9    0/1     Evicted   0          4h47m
broker-retry-7cc4559df5-99ssm    0/1     Evicted   0          10m
broker-retry-7cc4559df5-bftqq    0/1     Evicted   0          5h1m
broker-retry-7cc4559df5-bpkh2    0/1     Evicted   0          4h18m
broker-retry-7cc4559df5-c7cb2    0/1     Evicted   0          53m
broker-retry-7cc4559df5-clmtw    0/1     Evicted   0          6h44m
broker-retry-7cc4559df5-d7mdh    0/1     Evicted   0          80m
broker-retry-7cc4559df5-dqmnv    0/1     Evicted   0          59m
broker-retry-7cc4559df5-f44ld    0/1     Evicted   0          3h37m
broker-retry-7cc4559df5-ggq9d    0/1     Evicted   0          5h35m
broker-retry-7cc4559df5-gj7dc    0/1     Evicted   0          75m
broker-retry-7cc4559df5-gr8s2    0/1     Evicted   0          3h24m
broker-retry-7cc4559df5-hkfjj    0/1     Evicted   0          150m
broker-retry-7cc4559df5-j427n    0/1     Evicted   0          103m
broker-retry-7cc4559df5-jc77n    0/1     Evicted   0          3h29m
broker-retry-7cc4559df5-jwgrw    0/1     Evicted   0          6h57m
broker-retry-7cc4559df5-k2nnd    0/1     Evicted   0          125m
broker-retry-7cc4559df5-kklj4    0/1     Evicted   0          173m
broker-retry-7cc4559df5-kwsf2    0/1     Evicted   0          167m
broker-retry-7cc4559df5-l6sn5    0/1     Evicted   0          159m
broker-retry-7cc4559df5-l72bh    0/1     Evicted   0          22h
broker-retry-7cc4559df5-lh5tg    0/1     Evicted   0          92m
broker-retry-7cc4559df5-m89f9    0/1     Evicted   0          3h11m
broker-retry-7cc4559df5-mdhrp    0/1     Evicted   0          98m
broker-retry-7cc4559df5-mgts2    0/1     Evicted   0          17m
broker-retry-7cc4559df5-ms6vq    0/1     Evicted   0          4h53m
broker-retry-7cc4559df5-mvrwg    0/1     Evicted   0          6h18m
broker-retry-7cc4559df5-npjwv    1/1     Running   1          7m27s
broker-retry-7cc4559df5-nzx6h    0/1     Evicted   0          85m
broker-retry-7cc4559df5-p7v6v    0/1     Evicted   0          161m
broker-retry-7cc4559df5-phpsv    0/1     Evicted   0          119m
broker-retry-7cc4559df5-pr92m    0/1     Evicted   0          5h49m
broker-retry-7cc4559df5-pz5n5    0/1     Evicted   0          4h41m
broker-retry-7cc4559df5-qwc27    1/1     Running   0          3m22s
broker-retry-7cc4559df5-r7pt4    0/1     Evicted   0          19h
broker-retry-7cc4559df5-rlzhz    0/1     Evicted   0          133m
broker-retry-7cc4559df5-rp547    0/1     Evicted   0          151m
broker-retry-7cc4559df5-rsxxf    0/1     Evicted   0          3h19m
broker-retry-7cc4559df5-rtqtf    0/1     Evicted   0          4h41m
broker-retry-7cc4559df5-shvmv    0/1     Evicted   0          4h33m
broker-retry-7cc4559df5-skxpb    0/1     Evicted   0          133m
broker-retry-7cc4559df5-t8gdb    0/1     Evicted   0          46m
broker-retry-7cc4559df5-t987z    0/1     Evicted   0          6h33m
broker-retry-7cc4559df5-tnpkp    0/1     Evicted   0          171m
broker-retry-7cc4559df5-v4qbx    0/1     Evicted   0          7h20m
broker-retry-7cc4559df5-vtg8z    0/1     Evicted   0          6h58m
broker-retry-7cc4559df5-vxk8z    0/1     Evicted   0          13m
broker-retry-7cc4559df5-vzzcz    0/1     Evicted   0          3h45m
broker-retry-7cc4559df5-x6j8w    0/1     Evicted   0          4h36m
broker-retry-7cc4559df5-xgw7n    0/1     Evicted   0          131m
broker-retry-7cc4559df5-xstx6    0/1     Evicted   0          6h27m
broker-retry-7cc4559df5-zg55z    0/1     Evicted   0          20m
broker-retry-7cc4559df5-zr9x4    0/1     Evicted   0          4h41m
broker-retry-7cc4559df5-zxsv6    0/1     Evicted   0          31m

Based on the profiling data, it seems more than half of the memory usage was spent in http dialConn and, subsequently, bufio.NewWriterSize and bufio.NewReaderSize. This code path should only be used for forwarding events to subscribers. I suspect it's caused by accumulated retry events being sent to "slow" (slower than the internal timeout) subscribers. Because they're slow, the retry service has to keep connections open for a long time (>10m), as well as keep the allocated buffers for reading/writing data.

cc @ian-mi, as you might have some insights.

I'm updating the slow subscribers to be not slow, and expecting memory consumption to drop.


yolocs commented May 19, 2020

This behavior further confirms the need to:

  1. Limit outstanding message bytes in the retry service. This won't fully solve the problem if there are a lot of triggers: if each pull subscription holds at most X outstanding bytes and we have Y triggers, then we can expect at most O(X*Y) memory usage (not including overhead from other processes). We can limit X but not Y; Y does have a hard limit of 10,000 from Pub/Sub, though. So maybe we can do some calculation to pick a reasonable X.

  2. Have a dead letter queue. Accumulated retry events sound like a nightmare for the retry service, so either we somehow shard the service or we dead-letter events.

liu-cong (issue author) commented

@yolocs

Thanks for the numbers and insights!

A few questions/comments:

  1. Can you share the profiling files? I'd like to take a look.
  2. If I remember correctly, we don't have a backoff retry strategy yet. Maybe that's the short-term mitigation.
  3. Do you have any data or evidence about throughput under different loads and different numbers of replicas? For example, regarding @grantr's question about the increased memory with more replicas, maybe there was a throughput increase that explains the memory increase.


yolocs commented May 19, 2020

Limiting the outstanding messages/bytes seems to effectively throttle resource usage. Here is a snapshot after I changed the outstanding messages to 10 and outstanding bytes to 10000 (a sketch of the client-side setting follows the output).

k top pods
NAME                             CPU(cores)   MEMORY(bytes)
broker-fanout-76867ffc8-jfbrq    1037m        219Mi
broker-fanout-76867ffc8-rdpzn    812m         175Mi
broker-fanout-76867ffc8-rqwzn    924m         248Mi
broker-ingress-746f69945-hvvfz   933m         98Mi
broker-retry-68d55cdc8c-cbdfg    328m         183Mi
broker-retry-68d55cdc8c-fd426    324m         178Mi
broker-retry-68d55cdc8c-sdm6l    296m         188Mi
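
For reference, a minimal sketch of how such a cap looks with the Go Pub/Sub client's ReceiveSettings (the project ID and subscription name are placeholders, and the actual retry service wiring differs; the values mirror the ones tried above):

package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()

	client, err := pubsub.NewClient(ctx, "my-project") // placeholder project
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	sub := client.Subscription("broker-retry-sub") // placeholder subscription
	// Cap how many unacked messages (and bytes) are held in memory at once.
	sub.ReceiveSettings.Synchronous = true
	sub.ReceiveSettings.MaxOutstandingMessages = 10
	sub.ReceiveSettings.MaxOutstandingBytes = 10000

	err = sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		// Forward the event to the subscriber here, then ack or nack.
		m.Ack()
	})
	if err != nil {
		log.Fatal(err)
	}
}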


yolocs commented May 20, 2020

Proposing new values (following the new guideline: no CPU limits, and memory request = limit):

  • Ingress (this is the baseline because it appears to be the least resource-intensive component)
    • CPU request: 1000m
    • Memory request/limit: 500Mi
  • Fanout (this is more CPU intensive and likely needs to keep long connections, which will use more memory)
    • CPU request: 1500m (it doesn't make sense to put a huge number here if we have HPA)
    • Memory request/limit: 1000Mi
  • Retry (this is more memory intensive if we don't limit outstanding messages in the service)
    • CPU request: 1000m
    • Memory request/limit: 1500Mi
      • Related to #1102 (Retry pod memory usage grows without bound). If for each pull sub we set the outstanding bytes limit to 1Mi and assume an average message size of 200K, then a single retry pod can handle retry events for (1500Mi/1Mi ≈ 1500) triggers with (1Mi/200K = 5) qps per trigger. With HPA, hopefully the retry load can be distributed so that each trigger gets better aggregate qps (from multiple replicas). A rough helper for this arithmetic follows the list.
      • Caveat: Pub/Sub allows message sizes up to 10Mi. If we set the outstanding bytes limit to 10Mi, then with 1500Mi of memory a single retry pod could handle at most 150 triggers with 1 qps per trigger.
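
A rough back-of-the-envelope helper for the capacity arithmetic in the Retry bullet above (assumptions only; it ignores the process's baseline and overhead memory):

package capacity

// triggersPerPod estimates how many triggers (pull subscriptions) a single
// retry pod can hold if each subscription may buffer up to
// outstandingBytesPerSub and the pod's memory limit is memLimitBytes.
func triggersPerPod(memLimitBytes, outstandingBytesPerSub int64) int64 {
	return memLimitBytes / outstandingBytesPerSub
}

// outstandingMsgsPerTrigger estimates how many messages can be in flight per
// trigger given the per-subscription byte cap and an average message size.
func outstandingMsgsPerTrigger(outstandingBytesPerSub, avgMsgBytes int64) int64 {
	return outstandingBytesPerSub / avgMsgBytes
}

// Example from the proposal: a 1500Mi limit with 1Mi per subscription and a
// ~200K average message gives ~1500 triggers with ~5 in-flight messages each.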

What we have observed so far:

  • Fanout CPU load can be distributed with more pods
  • Retry memory could be effectively limited with outstanding bytes

What we haven't confirmed:

  • With more retry pods, does the overall delivery qps for a trigger increase? (Guessing - yes)

In summary, there isn't much science behind these proposed values, but they should work if we have HPA to distribute the load.
