
Adjust broker retry resource requirements to meet default retry attempts per second target #1602

Closed
grantr opened this issue Aug 18, 2020 · 2 comments · Fixed by #1843
Assignees: cathyzhyi
Labels: area/broker, kind/feature-request (New feature or request), priority/1 (Blocks current release defined by release/* label or blocks current milestone), release/2, storypoint/8
Milestone: Backlog

Comments


grantr commented Aug 18, 2020

Problem
Once we have a retry attempts per second target #1598 and a default event payload size #1599, we can measure the resource requirements for a single retry pod to meet the target. That's our default resource requirement value.

Persona:
User

Exit Criteria
The broker retry deployment is created with default resource requirements allowing it to meet the default retry attempts per second target.

Additional context (optional)
Part of #1552.

@grantr grantr added the kind/feature-request, area/broker, priority/1, and release/2 labels Aug 18, 2020
@grantr grantr added this to the Backlog milestone Aug 25, 2020
@grantr grantr modified the milestones: Backlog, v0.18.0-M2 Sep 2, 2020
@cathyzhyi cathyzhyi self-assigned this Sep 3, 2020
@grantr grantr modified the milestones: v0.18.0-M2, Backlog Sep 9, 2020

grantr commented Sep 14, 2020

#970 is effectively a subtask of this one.


cathyzhyi commented Oct 7, 2020

Here is the configuration proposed to reach the target throughput in #1598 (a sketch of these values in code follows the list):

  • CPU

    • Request: 1000m

      This is based on all the experiments run with different combinations (rps × payload size × number of triggers) to reach the target throughput. The highest CPU consumption observed for a single pod is around 3500-4000m. With autoscaling, the request is set to a lower number so that a single pod's resource requirement isn't too high; an even lower value would trigger autoscaling unnecessarily for very low workloads.
  • Memory

    • Limit == request: 1.5 GB

      This protects retry from OOM when MaxOutstandingMessages is set to 200 and MaxOutstandingBytes is set to 1M. With 100 triggers × 256 KB payload at the target throughput or slightly above it, a memory request lower than 1.5 GB results in OOM. A request of 1.5 GB is quite stable even when dealing with throughput higher than the default target. The rationale for the MaxOutstandingMessages and MaxOutstandingBytes values is given in the Pub/Sub receiveSettings item below.
  • Autoscaling params

    • CPU utilization threshold: 95%

      The rationale behind this number is mainly good CPU utilization. It is set relatively high so that retry doesn't scale out too fast and end up using only part of the requested CPU.
    • Memory consumption threshold: 1000Mi

      This is set well below the request to add a safety buffer in case of a memory surge. In practice it is very unlikely that this value triggers autoscaling, because the CPU threshold is usually reached before the memory one.
  • Pub/Sub receiveSettings

    • MaxOutstandingMessages: 200 (old value 100)
    • MaxOutstandingBytes: 1M (old value 3M)

      Setting MaxOutstandingBytes to 1M protects retry from OOM with 256 KB payload × 100 triggers under very high inbound traffic. MaxOutstandingMessages is set to 200 so that smaller payloads (e.g. 32 KB payload × 100 triggers) can reach better throughput without OOM.
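
To make the proposal concrete, here is a minimal sketch of how these values could be expressed with the upstream Kubernetes API types (autoscaling/v2beta2) and the cloud.google.com/go/pubsub client. The wiring and variable names are illustrative only, not the actual knative-gcp reconciler code; only the numeric values come from the proposal above, and 1.5 GB is written as 1500Mi for illustration.

```go
package main

import (
	"fmt"

	"cloud.google.com/go/pubsub"
	autoscalingv2beta2 "k8s.io/api/autoscaling/v2beta2"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Proposed defaults for the retry container: CPU request 1000m,
	// memory limit == request (1.5 GB, written here as 1500Mi).
	retryResources := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("1000m"),
			corev1.ResourceMemory: resource.MustParse("1500Mi"),
		},
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("1500Mi"),
		},
	}

	// Autoscaling on 95% CPU utilization plus a 1000Mi average memory
	// value, kept well below the request as a safety buffer.
	cpuUtilization := int32(95)
	memoryValue := resource.MustParse("1000Mi")
	hpaMetrics := []autoscalingv2beta2.MetricSpec{
		{
			Type: autoscalingv2beta2.ResourceMetricSourceType,
			Resource: &autoscalingv2beta2.ResourceMetricSource{
				Name: corev1.ResourceCPU,
				Target: autoscalingv2beta2.MetricTarget{
					Type:               autoscalingv2beta2.UtilizationMetricType,
					AverageUtilization: &cpuUtilization,
				},
			},
		},
		{
			Type: autoscalingv2beta2.ResourceMetricSourceType,
			Resource: &autoscalingv2beta2.ResourceMetricSource{
				Name: corev1.ResourceMemory,
				Target: autoscalingv2beta2.MetricTarget{
					Type:         autoscalingv2beta2.AverageValueMetricType,
					AverageValue: &memoryValue,
				},
			},
		},
	}

	// Pub/Sub receive settings for the retry subscription.
	receiveSettings := pubsub.ReceiveSettings{
		MaxOutstandingMessages: 200,         // old value 100
		MaxOutstandingBytes:    1000 * 1000, // 1M, old value 3M
	}

	fmt.Println(retryResources, hpaMetrics, receiveSettings)
}
```

In the real deployment these values would land on the retry container's resources and the retry HPA's spec.metrics; here they are only constructed and printed so the sketch compiles standalone.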
