
Multitenant PingSource #2716

Merged: 12 commits merged from mt-ping into knative:master on Mar 13, 2020

Conversation

@lionelvillard (Member) commented Mar 6, 2020

Proposed Changes

  • PingSources are now handled by a global jobrunner pod lazily installed in the knative-eventing namespace
  • PingSources carrying the eventing.knative.dev/scope: resource annotation are handled by a dedicated pod per source (as before)

Release Note

PingSources are now by default handled by a global multi-tenant job runner service. To restore the previous behavior, add the eventing.knative.dev/scope: resource annotation to PingSource custom objects.

Docs

knative/docs#2288
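
To make the opt-out in the release note concrete, here is a minimal sketch (not part of this PR) of a PingSource carrying the scope annotation, built with the v1alpha2 Go types; the name, namespace, and schedule are illustrative assumptions:

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	"knative.dev/eventing/pkg/apis/sources/v1alpha2"
)

func main() {
	// This PingSource opts out of the shared multi-tenant jobrunner and keeps
	// the previous per-resource adapter via the scope annotation.
	src := &v1alpha2.PingSource{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "ping-every-minute", // hypothetical name
			Namespace: "default",
			Annotations: map[string]string{
				"eventing.knative.dev/scope": "resource",
			},
		},
		Spec: v1alpha2.PingSourceSpec{
			Schedule: "*/1 * * * *", // standard cron syntax, at most once a minute
			// Spec.Sink (the event destination) would be filled in as usual.
		},
	}
	fmt.Printf("%s/%s scope=%s\n", src.Namespace, src.Name,
		src.Annotations["eventing.knative.dev/scope"])
}
```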

@knative-prow-robot added the do-not-merge/work-in-progress label on Mar 6, 2020
@googlebot added the cla: yes label on Mar 6, 2020
@knative-prow-robot added the size/XL label on Mar 6, 2020
@knative-prow-robot added the approved label on Mar 6, 2020
@mattmoor (Member) left a comment:

Produced via:
gofmt -s -w $(find -path './vendor' -prune -o -path './third_party' -prune -o -type f -name '*.go' -print)


@n3wscott (Contributor) commented Mar 7, 2020

I think this is the wrong direction to go. I would rather use jobs: #2478

@knative-prow-robot added the needs-rebase label on Mar 7, 2020

@lionelvillard (Member, Author)

@n3wscott The problem with CronJob is that a pod is created each time an event is sent. Creating and deleting pods is not cheap, especially compared to goroutines.

Note that SinkBinding + CronJob is still valid when the scope of the PingSource is set to resource.
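
To make the goroutine-versus-pod trade-off concrete, here is a minimal sketch (not the PR's actual cronJobsRunner) of a single long-lived process scheduling many PingSources as in-process cron entries, assuming a cron library such as github.com/robfig/cron/v3; the AddSchedule/RemoveSchedule names and the send callback are illustrative:

```go
package jobrunner

import (
	"context"
	"sync"

	"github.com/robfig/cron/v3"
)

// multiTenantRunner schedules every PingSource as an in-process cron entry
// (a goroutine firing on schedule) instead of spawning a pod per tick.
type multiTenantRunner struct {
	cron    *cron.Cron
	mu      sync.Mutex
	entries map[string]cron.EntryID // key: "namespace/name" of the PingSource
}

func newMultiTenantRunner() *multiTenantRunner {
	return &multiTenantRunner{
		cron:    cron.New(),
		entries: map[string]cron.EntryID{},
	}
}

// AddSchedule registers (or replaces) the schedule for one PingSource.
// send stands in for building and posting the CloudEvent to the sink.
func (r *multiTenantRunner) AddSchedule(key, schedule string, send func()) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if id, ok := r.entries[key]; ok {
		r.cron.Remove(id)
	}
	id, err := r.cron.AddFunc(schedule, send)
	if err != nil {
		return err
	}
	r.entries[key] = id
	return nil
}

// RemoveSchedule drops a PingSource's entry when the source is deleted.
func (r *multiTenantRunner) RemoveSchedule(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if id, ok := r.entries[key]; ok {
		r.cron.Remove(id)
		delete(r.entries, key)
	}
}

// Start runs the scheduler until the context is cancelled.
func (r *multiTenantRunner) Start(ctx context.Context) {
	r.cron.Start()
	<-ctx.Done()
	r.cron.Stop()
}
```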

@knative-prow-robot added the size/XXL label and removed the size/XL and needs-rebase labels on Mar 9, 2020

@mattmoor (Member) left a comment:

Produced via:
gofmt -s -w $(find -path './vendor' -prune -o -path './third_party' -prune -o -type f -name '*.go' -print)

@n3wscott (Contributor) commented Mar 9, 2020

> @n3wscott The problem with CronJob is that a pod is created each time an event is sent. Creating and deleting pods is not cheap, especially compared to goroutines.

It is cheap if it only runs once a week.

> Note that SinkBinding + CronJob is still valid when the scope of the PingSource is set to resource.

It is valid regardless; the question is where that job is launched.

@lionelvillard changed the title from [WIP] Multitenant PingSource to Multitenant PingSource on Mar 9, 2020
@knative-prow-robot removed the do-not-merge/work-in-progress label on Mar 9, 2020

@mattmoor (Member) left a comment:

Produced via:
gofmt -s -w $(find -path './vendor' -prune -o -path './third_party' -prune -o -type f -name '*.go' -print)

@@ -14,12 +14,14 @@ See the License for the specific language governing permissions and
 limitations under the License.
 */

-package pingsource
+package controller

 import (
 	"context"
 	"encoding/json"
 	"fmt"

Reviewer (Member): Format Go code:

Suggested change: reformat the "fmt" import line (whitespace-only change, per gofmt).
@n3wscott (Contributor) commented Mar 9, 2020

I still don't think we should go this way. We should use real cronjobs and sinkbindings.

@lionelvillard (Member, Author)

> It is cheap if it only runs once a week.

Sure, but that's not what this PR is about. We want to be able to handle thousands of cron jobs, each potentially triggering every second.

@lionelvillard (Member, Author)

> I still don't think we should go this way

I'm quite puzzled. Why? It seems you think cronjobs can handle thousands of PingSources, each potentially triggering every second.

@knative-prow-robot added the area/test-and-release label on Mar 9, 2020

@n3wscott (Contributor) commented Mar 9, 2020

> > I still don't think we should go this way
>
> I'm quite puzzled. Why? It seems you think cronjobs can handle thousands of PingSources, each potentially triggering every second.

Cron is only supported every minute, even today in PingSource 0.13. If you need a load tester, write a load tester.

@lionelvillard (Member, Author)

Oops, right, I meant every minute. BTW I did write a load tester and am currently porting it to Go.

Review thread on pkg/apis/sources/v1alpha2/ping_validation.go (outdated, resolved)
	deploymentInformer.Informer().AddEventHandler(cache.FilteringResourceEventHandler{
		FilterFunc: controller.Filter(v1alpha1.SchemeGroupVersion.WithKind("PingSource")),
		Handler:    controller.HandleAll(impl.EnqueueControllerOf),
	})

	// Watch for the global deployment
	gr := func(obj interface{}) {
		impl.GlobalResync(pingSourceInformer.Informer())
Reviewer (Contributor): Why do we need a global resync? Maybe add a comment here.

Reviewer (Contributor): I assume it's because the shared deployment for multi-tenant sources will need to requeue all the ping sources, but if there's only one deployment there, can we not just use the tracker for it and only watch / notify on changes to that deployment?

Author: What's the difference between the tracker and the informer with a FilterFunc?

Reviewer (Contributor): Well, I'm not sure why we are doing a global resync here. With the tracker you know whether a source is using the shared deployment, and you can set up the tracker to only queue the particular ping source when things change. Here you're saying (unless I'm mistaken) that whenever the shared deployment changes, we enqueue all the ping sources regardless of whether they are affected by the change or not.

Author: Ah, got it, thanks! You are not mistaken.

This is related to the open discussion about whether to allow mixing cluster- and namespace-scoped dispatchers/jobrunners. Let's say we allow it (and offer config knobs to restrict it); then using the tracker is a win in that case. Working on it...
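
For context, here is a minimal sketch of the tracker-based wiring being discussed, assuming the knative.dev/pkg tracker and controller packages; the deployment name pingsource-jobrunner and the helper names are hypothetical, not the PR's actual code:

```go
package controller

import (
	"context"

	"k8s.io/client-go/tools/cache"

	"knative.dev/pkg/controller"
	"knative.dev/pkg/system"
	"knative.dev/pkg/tracker"

	"knative.dev/eventing/pkg/apis/sources/v1alpha2"
)

// newDeploymentTracker sketches the alternative discussed above: rather than a
// GlobalResync of every PingSource whenever the shared deployment changes, a
// tracker re-enqueues only the PingSources that registered interest in it.
func newDeploymentTracker(ctx context.Context, impl *controller.Impl,
	deployments cache.SharedIndexInformer) tracker.Interface {
	// EnqueueKey puts a single namespace/name on the work queue; the tracker
	// calls it for every object that tracked the changed deployment.
	t := tracker.New(impl.EnqueueKey, controller.GetTrackerLease(ctx))

	// Any deployment change is handed to the tracker, which fans out only to
	// interested PingSources instead of resyncing all of them.
	deployments.AddEventHandler(controller.HandleAll(t.OnChanged))
	return t
}

// trackSharedDeployment would be called from reconcile() for each PingSource
// served by the multi-tenant jobrunner. The deployment name is hypothetical.
func trackSharedDeployment(t tracker.Interface, source *v1alpha2.PingSource) error {
	return t.TrackReference(tracker.Reference{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Namespace:  system.Namespace(),
		Name:       "pingsource-jobrunner",
	}, source)
}
```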

@vaikas (Contributor) commented Mar 9, 2020

I was also curious about the seconds vs. minutes question, as we'd have to expand the types to have a more granular time duration. So, the plan is to only have it fire at most once a minute?

@aslom (Member) commented Mar 11, 2020

Is there an issue that describes why we need to reimplement ping source and what the options are? I see it discussed in PRs #2478 and #2716 but I could not find related issues - could you add a link here?

@n3wscott (Contributor)

> @n3wscott said (see #2478 (comment)):
>
> > I don't really think a valid alternate impl is a new pod that runs 100% of the time to do what a job running 0.01% of the time can do.
>
> A couple of things:
>
> • the new pod consumes almost no cpu, just memory

It takes a reserve in the cluster. So it is non-zero always.

> • the actual time of a job is the job run time (negligible as you pointed out) plus the time to manage the pod (creation, deletion, collecting metrics, etc.). Soon enough (a few PingSources, each triggering every minute), you end up with a collection of jobs actually running 100% of the time.

That is a matter of combining the jobs that have the same schedule, which could be a different task than what is being proposed. Then at most it is one job per minute.

Review threads (resolved):
  • test/e2e/source_ping_test.go
  • pkg/reconciler/pingsource/jobrunner/runner_test.go
  • pkg/reconciler/pingsource/jobrunner/runner.go
  • pkg/reconciler/pingsource/controller/pingsource.go

@jasonpet commented

I am totally on board with this multi-tenant implementation. I have helped develop and support the IBM Cloud Functions alarm/cron event provider for the last 4 years and know from experience that we see far more customer adoption for this event provider than any of the others. My new team is exploring Knative Eventing and would love to see a multi-tenant implementation for cron such as this. The current cron job implementation in Knative Eventing would be far too costly for us.

@grantr (Contributor) commented Mar 12, 2020

@jasonpet does the IBM Cloud Functions alarm/cron event provider allow customers to create schedules with a period of less than 1 minute (e.g. every 10 seconds)? If so, do customers use that feature?

@lionelvillard (Member, Author)

@grantr it used to, but that feature has been removed. The minimum period is now 1 minute.

@lionelvillard (Member, Author)

The tracker has been added.
/unhold

@knative-prow-robot removed the do-not-merge/hold label on Mar 12, 2020

@lionelvillard (Member, Author)

/test pull-knative-eventing-integration-tests

@knative-metrics-robot

The following is the coverage report on the affected files.
Say /test pull-knative-eventing-go-coverage to re-run this coverage report.

| File | Old Coverage | New Coverage | Delta |
|------|--------------|--------------|-------|
| pkg/apis/sources/v1alpha1/ping_validation.go | 81.8% | 91.7% | 9.8 |
| pkg/apis/sources/v1alpha2/ping_validation.go | 87.5% | 100.0% | 12.5 |
| pkg/reconciler/pingsource/controller/controller.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/controller/pingsource.go | Do not exist | 63.9% | |
| pkg/reconciler/pingsource/controller/resources/eventtype.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/controller/resources/jobrunner.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/controller/resources/jobrunner_service.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/controller/resources/labels.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/controller/resources/receive_adapter.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/jobrunner/controller.go | Do not exist | 81.0% | |
| pkg/reconciler/pingsource/jobrunner/pingsource.go | Do not exist | 61.1% | |
| pkg/reconciler/pingsource/jobrunner/runner.go | Do not exist | 92.3% | |

@lionelvillard (Member, Author)

@vaikas I implemented the tracker and added a couple more tests.

@vaikas (Contributor) commented Mar 13, 2020

/lgtm
/approve

@knative-prow-robot added the lgtm label on Mar 13, 2020

@knative-prow-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lionelvillard, vaikas

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
  • OWNERS [lionelvillard,vaikas]

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

@knative-prow-robot merged commit 1a1f5b0 into knative:master on Mar 13, 2020

	cronRunner       *cronJobsRunner
	pingsourceLister sourceslisters.PingSourceLister
	impl             *controller.Impl

Reviewer (Member): This looks unused, and is kind of a strange thing to have the reconciler have as a field (it creates a cycle in the layering).

Author: It's coming from the in-mem reconciler. Same problem there...

}

func (r *Reconciler) reconcile(key string, source *v1alpha2.PingSource) error {
	if source.DeletionTimestamp != nil {

Reviewer (Member): Without finalizers, it isn't guaranteed that you'll actually flow through here.

Author: Indeed! Thanks!
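
For reference, here is a minimal sketch of the finalizer pattern the comment is pointing at, not the PR's actual code; the finalizer name and the ensureSchedule/cancelSchedule/patchFinalizers helpers are illustrative assumptions:

```go
package jobrunner

import (
	"context"

	"knative.dev/eventing/pkg/apis/sources/v1alpha2"
)

// pingSourceFinalizer is a hypothetical finalizer name used for illustration.
const pingSourceFinalizer = "pingsources.sources.knative.dev"

// hasFinalizer reports whether the PingSource already carries our finalizer.
func hasFinalizer(src *v1alpha2.PingSource) bool {
	for _, f := range src.Finalizers {
		if f == pingSourceFinalizer {
			return true
		}
	}
	return false
}

// reconcileSketch outlines the pattern: without a finalizer, the object can
// disappear before reconcile ever observes DeletionTimestamp, so the cron
// entry would leak. The function parameters stand in for the real calls.
func reconcileSketch(ctx context.Context, src *v1alpha2.PingSource,
	ensureSchedule, cancelSchedule func(*v1alpha2.PingSource) error,
	patchFinalizers func(context.Context, *v1alpha2.PingSource, []string) error) error {

	if src.DeletionTimestamp != nil {
		// Being deleted: tear down the schedule, then release the finalizer
		// so Kubernetes can actually remove the object.
		if err := cancelSchedule(src); err != nil {
			return err
		}
		var remaining []string
		for _, f := range src.Finalizers {
			if f != pingSourceFinalizer {
				remaining = append(remaining, f)
			}
		}
		return patchFinalizers(ctx, src, remaining)
	}

	// Not being deleted: ensure the finalizer is present before doing work,
	// so a later deletion is guaranteed to flow back through reconcile.
	if !hasFinalizer(src) {
		if err := patchFinalizers(ctx, src, append(src.Finalizers, pingSourceFinalizer)); err != nil {
			return err
		}
	}
	return ensureSchedule(src)
}
```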

@lberk mentioned this pull request on Mar 26, 2020
@lionelvillard deleted the mt-ping branch on August 28, 2020
@lionelvillard restored the mt-ping branch on September 3, 2020
@lionelvillard deleted the mt-ping branch on September 3, 2020
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/test-and-release: Test infrastructure, tests or release.
  • cla: yes: Indicates the PR's author has signed the CLA.
  • lgtm: Indicates that a PR is ready to be merged.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.