-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve scheduler memory usage #8144
Improve scheduler memory usage #8144
Conversation
- Create a namespaced-scoped statefulset lister instead of being cluster-wide - Accept a PodLister rather than creating a cluster-wide one Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
4b17f61
to
c3d3b78
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pierDipi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8144 +/- ##
==========================================
+ Coverage 67.89% 67.91% +0.01%
==========================================
Files 368 368
Lines 17571 17581 +10
==========================================
+ Hits 11930 11940 +10
Misses 4893 4893
Partials 748 748 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
775f379
to
22bfaa9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
* Improve scheduler memory usage - Create a namespaced-scoped statefulset lister instead of being cluster-wide - Accept a PodLister rather than creating a cluster-wide one Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Update codegen Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
* Improve scheduler memory usage - Create a namespaced-scoped statefulset lister instead of being cluster-wide - Accept a PodLister rather than creating a cluster-wide one Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Update codegen Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
…its to speed up recovery time (#8202) * Improve scheduler memory usage (#8144) * Improve scheduler memory usage - Create a namespaced-scoped statefulset lister instead of being cluster-wide - Accept a PodLister rather than creating a cluster-wide one Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Update codegen Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Remove scheduler `wait`s to speed up recovery time (#8200) Currently, the scheduler and autoscaler are single threads and use a lock to prevent multiple scheduling and autoscaling decision from happening in parallel; this is not a problem for our use cases, however, the multiple `wait` currently present are slowing down recovery time. From my testing, if I delete and recreate the Kafka control plane and data plane, without this patch it takes 1h to recover when there are 400 triggers or 20 minutes when there are 100 triggers; with the patch it is immediate (only a 2/3 minutes with 400 triggers). - Remove `wait`s from state builder and autoscaler - Add additional debug logs - Use logger provided through the context as opposed to gloabal loggers in each individual component to preserve `knative/pkg` resource aware log keys. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
…its to speed up recovery time (#8203) * Improve scheduler memory usage (#8144) * Improve scheduler memory usage - Create a namespaced-scoped statefulset lister instead of being cluster-wide - Accept a PodLister rather than creating a cluster-wide one Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Update codegen Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Remove scheduler `wait`s to speed up recovery time (#8200) Currently, the scheduler and autoscaler are single threads and use a lock to prevent multiple scheduling and autoscaling decision from happening in parallel; this is not a problem for our use cases, however, the multiple `wait` currently present are slowing down recovery time. From my testing, if I delete and recreate the Kafka control plane and data plane, without this patch it takes 1h to recover when there are 400 triggers or 20 minutes when there are 100 triggers; with the patch it is immediate (only a 2/3 minutes with 400 triggers). - Remove `wait`s from state builder and autoscaler - Add additional debug logs - Use logger provided through the context as opposed to gloabal loggers in each individual component to preserve `knative/pkg` resource aware log keys. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Fixes #
Proposed Changes
Pre-review Checklist