
Adjust broker retry resource requirements to meet default retry attempts per second target #1602

Closed
grantr opened this issue Aug 18, 2020 · 2 comments · Fixed by #1843
Assignees: cathyzhyi
Labels: area/broker, kind/feature-request (New feature or request), priority/1 (Blocks current release defined by release/* label or blocks current milestone), release/2, storypoint/8
Milestone: Backlog

Comments


grantr commented Aug 18, 2020

Problem
Once we have a retry attempts per second target #1598 and a default event payload size #1599, we can measure the resource requirements for a single retry pod to meet the target. That's our default resource requirement value.

Persona:
User

Exit Criteria
The broker retry deployment is created with default resource requirements allowing it to meet the default retry attempts per second target.

Additional context (optional)
Part of #1552.

@grantr grantr added the kind/feature-request, area/broker, priority/1, and release/2 labels Aug 18, 2020
@grantr grantr added this to the Backlog milestone Aug 25, 2020
@grantr grantr modified the milestones: Backlog, v0.18.0-M2 Sep 2, 2020
@cathyzhyi cathyzhyi self-assigned this Sep 3, 2020
@grantr grantr modified the milestones: v0.18.0-M2, Backlog Sep 9, 2020

grantr commented Sep 14, 2020

#970 is effectively a subtask of this one.


cathyzhyi commented Oct 7, 2020

Here is the configuration proposed to reach the target throughput in #1598 (a sketch of these values in code follows the list):

  • CPU

    • Request: 1000m

      This is based on all the experiments run with different combinations (rps × payload size × number of triggers) to reach the target throughput. The highest CPU consumption observed for a single pod is around 3500-4000m. With autoscaling, the request is set to a lower number so that a single pod's resource requirement isn't too high; an even lower value would trigger autoscaling unnecessarily for very low workloads.
  • Memory

    • Limit == request: 1.5 GB

      This protects retry from OOM when MaxOutstandingMessages is set to 200 and MaxOutstandingBytes is set to 1M. With 100 triggers × 256 KB payload at the target throughput or slightly above it, a memory request lower than 1.5 GB results in OOM. A request of 1.5 GB is quite stable even when dealing with throughput higher than the default target. The rationale for the MaxOutstandingMessages and MaxOutstandingBytes values is given in the Pub/Sub receiveSettings item below.
  • Autoscaling params

    • CPU utilization threshold: 95%

      The rationale behind this number is mainly good CPU utilization. It is set relatively high so that retry doesn't scale out too fast and end up using only part of the requested CPU.
    • Memory consumption threshold: 1000Mi

      This is set well below the request to add a safety buffer in case of a memory surge. In practice it is very unlikely that this value triggers autoscaling, because the CPU threshold is usually reached before the memory one.
  • Pub/Sub receiveSettings

    • MaxOutstandingMessages: 200 (old value 100)
    • MaxOutstandingBytes: 1M (old value 3M)

      Setting MaxOutstandingBytes to 1M protects retry from OOM with 256 KB payload × 100 triggers under very high inbound traffic. MaxOutstandingMessages is set to 200 so that smaller payloads (e.g. 32 KB payload × 100 triggers) can reach better throughput without OOM.
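
To make the proposal concrete, here is a minimal sketch of how these values could be expressed with the upstream Kubernetes API types (autoscaling/v2beta2) and the cloud.google.com/go/pubsub client. The wiring and variable names are illustrative only, not the actual knative-gcp reconciler code; only the numeric values come from the proposal above, and 1.5 GB is written as 1500Mi for illustration.

```go
package main

import (
	"fmt"

	"cloud.google.com/go/pubsub"
	autoscalingv2beta2 "k8s.io/api/autoscaling/v2beta2"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Proposed defaults for the retry container: CPU request 1000m,
	// memory limit == request (1.5 GB, written here as 1500Mi).
	retryResources := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("1000m"),
			corev1.ResourceMemory: resource.MustParse("1500Mi"),
		},
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("1500Mi"),
		},
	}

	// Autoscaling on 95% CPU utilization plus a 1000Mi average memory
	// value, kept well below the request as a safety buffer.
	cpuUtilization := int32(95)
	memoryValue := resource.MustParse("1000Mi")
	hpaMetrics := []autoscalingv2beta2.MetricSpec{
		{
			Type: autoscalingv2beta2.ResourceMetricSourceType,
			Resource: &autoscalingv2beta2.ResourceMetricSource{
				Name: corev1.ResourceCPU,
				Target: autoscalingv2beta2.MetricTarget{
					Type:               autoscalingv2beta2.UtilizationMetricType,
					AverageUtilization: &cpuUtilization,
				},
			},
		},
		{
			Type: autoscalingv2beta2.ResourceMetricSourceType,
			Resource: &autoscalingv2beta2.ResourceMetricSource{
				Name: corev1.ResourceMemory,
				Target: autoscalingv2beta2.MetricTarget{
					Type:         autoscalingv2beta2.AverageValueMetricType,
					AverageValue: &memoryValue,
				},
			},
		},
	}

	// Pub/Sub receive settings for the retry subscription.
	receiveSettings := pubsub.ReceiveSettings{
		MaxOutstandingMessages: 200,         // old value 100
		MaxOutstandingBytes:    1000 * 1000, // 1M, old value 3M
	}

	fmt.Println(retryResources, hpaMetrics, receiveSettings)
}
```

In the real deployment these values would land on the retry container's resources and the retry HPA's spec.metrics; here they are only constructed and printed so the sketch compiles standalone.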
