
Cache statefulset scale update/get requests #7651

Merged
4 commits merged into knative:main on Feb 6, 2024

Conversation

@Cali0707 (Member) commented Feb 1, 2024

This helps to fix the consumergroup issues we are seeing in ekb. At least some of the slow reconciliation is due to errors such as:

failed to schedule consumers: failed to unschedule consumer group: failed to schedule consumers: Get "https://10.88.0.1:443/apis/apps/v1/namespaces/knative-eventing/statefulsets/kafka-broker-dispatcher/scale": context canceled

When we increased the client rate limit, these error logs were no longer present.

Proposed Changes

  • Cache the scale of statefulsets whenever we get/update them, reusing the cached value where possible (see the sketch below)
  • Expire the keys in the cache to force a refresh every once in a while
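For illustration, a minimal sketch of this approach, assuming a ScaleCache built on client-go's StatefulSet scale client and the Expiring cache from k8s.io/apimachinery. The field names, the TTL, and the method bodies here are illustrative, not the PR's exact code:

package scheduler

import (
	"context"
	"sync"
	"time"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/cache"
	"k8s.io/client-go/kubernetes"
)

// refreshPeriod is a hypothetical TTL; expired entries force a refresh from the API server.
const refreshPeriod = 5 * time.Minute

type ScaleCache struct {
	client    kubernetes.Interface
	namespace string

	entriesMu sync.RWMutex    // guards the entries pointer; the Expiring cache is itself concurrency safe
	entries   *cache.Expiring // statefulset name -> *autoscalingv1.Scale
}

// GetScale returns the cached scale if present; otherwise it hits the API server
// and caches the result with an expiry.
func (sc *ScaleCache) GetScale(ctx context.Context, name string) (*autoscalingv1.Scale, error) {
	sc.entriesMu.RLock()
	entries := sc.entries
	sc.entriesMu.RUnlock()

	if v, ok := entries.Get(name); ok {
		return v.(*autoscalingv1.Scale), nil
	}

	scale, err := sc.client.AppsV1().StatefulSets(sc.namespace).GetScale(ctx, name, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	entries.Set(name, scale, refreshPeriod)
	return scale, nil
}

// UpdateScale writes through to the API server and refreshes the cached entry.
func (sc *ScaleCache) UpdateScale(ctx context.Context, name string, scale *autoscalingv1.Scale) (*autoscalingv1.Scale, error) {
	updated, err := sc.client.AppsV1().StatefulSets(sc.namespace).UpdateScale(ctx, name, scale, metav1.UpdateOptions{})
	if err != nil {
		return nil, err
	}

	sc.entriesMu.RLock()
	entries := sc.entries
	sc.entriesMu.RUnlock()

	entries.Set(name, updated, refreshPeriod)
	return updated, nil
}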

Pre-review Checklist

  • At least 80% unit test coverage
  • E2E tests for any new behavior
  • Docs PR for any user-facing impact
  • Spec PR for any new API feature
  • Conformance test for any change to the spec

Release Note

StatefulSet scheduling now makes fewer requests to the API server, reducing API server load.

Signed-off-by: Calum Murray <cmurray@redhat.com>
@knative-prow knative-prow bot requested review from aslom and lberk February 1, 2024 20:08
@Cali0707 Cali0707 requested a review from pierDipi February 1, 2024 20:08
@knative-prow knative-prow bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 1, 2024
@Cali0707 Cali0707 requested review from matzew, creydr and Leo6Leo and removed request for aslom and lberk February 1, 2024 20:08
Signed-off-by: Calum Murray <cmurray@redhat.com>

codecov bot commented Feb 1, 2024

Codecov Report

Attention: 60 lines in your changes are missing coverage. Please review.

Comparison is base (62c74c1) 74.29% compared to head (c76d64e) 74.01%.
Report is 5 commits behind head on main.

Files                                   Patch %   Lines
pkg/scheduler/scheduler.go              0.00%     56 Missing ⚠️
pkg/scheduler/statefulset/scheduler.go  0.00%     4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7651      +/-   ##
==========================================
- Coverage   74.29%   74.01%   -0.29%     
==========================================
  Files         262      262              
  Lines       15112    15172      +60     
==========================================
+ Hits        11227    11229       +2     
- Misses       3279     3337      +58     
  Partials      606      606              


Comment on lines 183 to 185:

func (sc *ScaleCache) Reset() {
	sc.entries = cache.NewExpiring()
}
Can we make this thread-safe?
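One possible way to do that, sketched under the assumption of a guarding mutex (the entriesMu field suggested further down in this review):

// Sketch: take the write lock so concurrent readers never observe a half-swapped pointer.
func (sc *ScaleCache) Reset() {
	sc.entriesMu.Lock()
	defer sc.entriesMu.Unlock()
	sc.entries = cache.NewExpiring()
}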

Signed-off-by: Calum Murray <cmurray@redhat.com>
@Cali0707 (Member, Author) commented Feb 2, 2024

/cc @pierDipi

@knative-prow knative-prow bot requested a review from pierDipi February 2, 2024 16:38
@@ -114,3 +123,83 @@ type VPod interface {

	GetResourceVersion() string
}

type ScaleCache struct {
	lock sync.RWMutex //protects access to entries, entries itself is concurrency safe, so we only need to ensure that we correctly access the pointer

Let's be specific about what this lock is associated with; the usual convention is <name> + Mu.

Suggested change:

-	lock sync.RWMutex //protects access to entries, entries itself is concurrency safe, so we only need to ensure that we correctly access the pointer
+	entriesMu sync.RWMutex // protects access to entries, entries itself is concurrency safe, so we only need to ensure that we correctly access the pointer
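For illustration only, the access pattern that comment describes could look like this hypothetical helper (currentEntries is not part of the PR): readers take a read lock just long enough to copy the pointer, while Reset takes the write lock to swap it.

// Sketch: the Expiring cache does its own locking for Get/Set, so the RWMutex
// only has to protect reads and writes of the entries pointer itself.
func (sc *ScaleCache) currentEntries() *cache.Expiring {
	sc.entriesMu.RLock()
	defer sc.entriesMu.RUnlock()
	return sc.entries
}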

(Two further review threads on pkg/scheduler/scheduler.go were marked outdated and resolved.)
@Cali0707 (Member, Author) commented Feb 6, 2024

/cc @pierDipi

@knative-prow knative-prow bot requested a review from pierDipi February 6, 2024 15:00
@Cali0707 (Member, Author) commented Feb 6, 2024

/retest

Signed-off-by: Calum Murray <cmurray@redhat.com>
@pierDipi (Member) left a comment


/lgtm
/approve

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2024

knative-prow bot commented Feb 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cali0707, pierDipi

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot merged commit 5500bed into knative:main Feb 6, 2024
36 of 39 checks passed
@Cali0707 (Member, Author) commented

@pierDipi do we want to backport this to 1.13?

Labels: approved, lgtm, size/L