
Multitenant PingSource #2716

Merged: 12 commits merged from mt-ping into knative:master on Mar 13, 2020

Conversation

@lionelvillard (Member) commented Mar 6, 2020

Proposed Changes

  • PingSources are now handled by a global jobrunner pod lazily installed in the knative-eventing namespace
  • PingSources carrying the eventing.knative.dev/scope: resource annotation are handled by a dedicated pod per source (as before)

Release Note

PingSources are now by default handled by a global multi-tenant job runner service. To restore the previous behavior, add the eventing.knative.dev/scope: resource annotation to PingSource custom objects.

Docs

knative/docs#2288
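
To make the opt-out in the release note concrete, here is a minimal sketch (not part of this PR) of a PingSource carrying the scope annotation, built with the v1alpha2 Go types; the name, namespace, and schedule are illustrative assumptions:

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	"knative.dev/eventing/pkg/apis/sources/v1alpha2"
)

func main() {
	// This PingSource opts out of the shared multi-tenant jobrunner and keeps
	// the previous per-resource adapter via the scope annotation.
	src := &v1alpha2.PingSource{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "ping-every-minute", // hypothetical name
			Namespace: "default",
			Annotations: map[string]string{
				"eventing.knative.dev/scope": "resource",
			},
		},
		Spec: v1alpha2.PingSourceSpec{
			Schedule: "*/1 * * * *", // standard cron syntax, at most once a minute
			// Spec.Sink (the event destination) would be filled in as usual.
		},
	}
	fmt.Printf("%s/%s scope=%s\n", src.Namespace, src.Name,
		src.Annotations["eventing.knative.dev/scope"])
}
```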

@knative-prow-robot added the do-not-merge/work-in-progress label on Mar 6, 2020
@googlebot added the cla: yes label on Mar 6, 2020
@knative-prow-robot added the size/XL label on Mar 6, 2020
@knative-prow-robot added the approved label on Mar 6, 2020
@mattmoor (Member) left a comment:

Produced via:
gofmt -s -w $(find -path './vendor' -prune -o -path './third_party' -prune -o -type f -name '*.go' -print)


@n3wscott (Contributor) commented Mar 7, 2020

I think this is the wrong direction to go. I would rather use jobs: #2478

@knative-prow-robot added the needs-rebase label on Mar 7, 2020

@lionelvillard (Member, Author)

@n3wscott The problem with CronJob is that a pod is created each time an event is sent. Creating and deleting pods is not cheap, especially compared to goroutines.

Note that SinkBinding + CronJob is still valid when the scope of the PingSource is set to resource.
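
To make the goroutine-versus-pod trade-off concrete, here is a minimal sketch (not the PR's actual cronJobsRunner) of a single long-lived process scheduling many PingSources as in-process cron entries, assuming a cron library such as github.com/robfig/cron/v3; the AddSchedule/RemoveSchedule names and the send callback are illustrative:

```go
package jobrunner

import (
	"context"
	"sync"

	"github.com/robfig/cron/v3"
)

// multiTenantRunner schedules every PingSource as an in-process cron entry
// (a goroutine firing on schedule) instead of spawning a pod per tick.
type multiTenantRunner struct {
	cron    *cron.Cron
	mu      sync.Mutex
	entries map[string]cron.EntryID // key: "namespace/name" of the PingSource
}

func newMultiTenantRunner() *multiTenantRunner {
	return &multiTenantRunner{
		cron:    cron.New(),
		entries: map[string]cron.EntryID{},
	}
}

// AddSchedule registers (or replaces) the schedule for one PingSource.
// send stands in for building and posting the CloudEvent to the sink.
func (r *multiTenantRunner) AddSchedule(key, schedule string, send func()) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if id, ok := r.entries[key]; ok {
		r.cron.Remove(id)
	}
	id, err := r.cron.AddFunc(schedule, send)
	if err != nil {
		return err
	}
	r.entries[key] = id
	return nil
}

// RemoveSchedule drops a PingSource's entry when the source is deleted.
func (r *multiTenantRunner) RemoveSchedule(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if id, ok := r.entries[key]; ok {
		r.cron.Remove(id)
		delete(r.entries, key)
	}
}

// Start runs the scheduler until the context is cancelled.
func (r *multiTenantRunner) Start(ctx context.Context) {
	r.cron.Start()
	<-ctx.Done()
	r.cron.Stop()
}
```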

@knative-prow-robot added the size/XXL label and removed the size/XL and needs-rebase labels on Mar 9, 2020

@mattmoor (Member) left a comment:

Produced via:
gofmt -s -w $(find -path './vendor' -prune -o -path './third_party' -prune -o -type f -name '*.go' -print)

@n3wscott (Contributor) commented Mar 9, 2020

> @n3wscott The problem with CronJob is that a pod is created each time an event is sent. Creating and deleting pods is not cheap, especially compared to goroutines.

It is cheap if it only runs once a week.

> Note that SinkBinding + CronJob is still valid when the scope of the PingSource is set to resource.

It is valid regardless; the question is where that job is launched.

@lionelvillard changed the title from [WIP] Multitenant PingSource to Multitenant PingSource on Mar 9, 2020
@knative-prow-robot removed the do-not-merge/work-in-progress label on Mar 9, 2020

@mattmoor (Member) left a comment:

Produced via:
gofmt -s -w $(find -path './vendor' -prune -o -path './third_party' -prune -o -type f -name '*.go' -print)

@@ -14,12 +14,14 @@ See the License for the specific language governing permissions and
 limitations under the License.
 */

-package pingsource
+package controller

 import (
 	"context"
 	"encoding/json"
 	"fmt"

Reviewer (Member): Format Go code:

Suggested change: reformat the "fmt" import line (whitespace-only change, per gofmt).
@n3wscott (Contributor) commented Mar 9, 2020

I still don't think we should go this way. We should use real cronjobs and sinkbindings.

@lionelvillard (Member, Author)

> It is cheap if it only runs once a week.

Sure, but that's not what this PR is about. We want to be able to handle thousands of cron jobs, each potentially triggering every second.

@lionelvillard (Member, Author)

> I still don't think we should go this way

I'm quite puzzled. Why? It seems you think cronjobs can handle thousands of PingSources, each potentially triggering every second.

@knative-prow-robot added the area/test-and-release label on Mar 9, 2020

@n3wscott (Contributor) commented Mar 9, 2020

> > I still don't think we should go this way
>
> I'm quite puzzled. Why? It seems you think cronjobs can handle thousands of PingSources, each potentially triggering every second.

Cron is only supported every minute, even today in PingSource 0.13. If you need a load tester, write a load tester.

@lionelvillard (Member, Author)

Oops, right, I meant every minute. BTW I did write a load tester and am currently porting it to Go.

Review thread on pkg/apis/sources/v1alpha2/ping_validation.go (outdated, resolved)
	deploymentInformer.Informer().AddEventHandler(cache.FilteringResourceEventHandler{
		FilterFunc: controller.Filter(v1alpha1.SchemeGroupVersion.WithKind("PingSource")),
		Handler:    controller.HandleAll(impl.EnqueueControllerOf),
	})

	// Watch for the global deployment
	gr := func(obj interface{}) {
		impl.GlobalResync(pingSourceInformer.Informer())
Reviewer (Contributor): Why do we need a global resync? Maybe add a comment here.

Reviewer (Contributor): I assume it's because the shared deployment for multi-tenant sources will need to requeue all the ping sources, but if there's only one deployment there, can we not just use the tracker for it and only watch / notify on changes to that deployment?

Author: What's the difference between the tracker and the informer with a FilterFunc?

Reviewer (Contributor): Well, I'm not sure why we are doing a global resync here. With the tracker you know whether a source is using the shared deployment, and you can set up the tracker to only queue the particular ping source when things change. Here you're saying (unless I'm mistaken) that whenever the shared deployment changes, we enqueue all the ping sources regardless of whether they are affected by the change or not.

Author: Ah, got it, thanks! You are not mistaken.

This is related to the open discussion about whether to allow mixing cluster- and namespace-scoped dispatchers/jobrunners. Let's say we allow it (and offer config knobs to restrict it); then using the tracker is a win in that case. Working on it...
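
For context, here is a minimal sketch of the tracker-based wiring being discussed, assuming the knative.dev/pkg tracker and controller packages; the deployment name pingsource-jobrunner and the helper names are hypothetical, not the PR's actual code:

```go
package controller

import (
	"context"

	"k8s.io/client-go/tools/cache"

	"knative.dev/pkg/controller"
	"knative.dev/pkg/system"
	"knative.dev/pkg/tracker"

	"knative.dev/eventing/pkg/apis/sources/v1alpha2"
)

// newDeploymentTracker sketches the alternative discussed above: rather than a
// GlobalResync of every PingSource whenever the shared deployment changes, a
// tracker re-enqueues only the PingSources that registered interest in it.
func newDeploymentTracker(ctx context.Context, impl *controller.Impl,
	deployments cache.SharedIndexInformer) tracker.Interface {
	// EnqueueKey puts a single namespace/name on the work queue; the tracker
	// calls it for every object that tracked the changed deployment.
	t := tracker.New(impl.EnqueueKey, controller.GetTrackerLease(ctx))

	// Any deployment change is handed to the tracker, which fans out only to
	// interested PingSources instead of resyncing all of them.
	deployments.AddEventHandler(controller.HandleAll(t.OnChanged))
	return t
}

// trackSharedDeployment would be called from reconcile() for each PingSource
// served by the multi-tenant jobrunner. The deployment name is hypothetical.
func trackSharedDeployment(t tracker.Interface, source *v1alpha2.PingSource) error {
	return t.TrackReference(tracker.Reference{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Namespace:  system.Namespace(),
		Name:       "pingsource-jobrunner",
	}, source)
}
```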

@vaikas (Contributor) commented Mar 9, 2020

I was also curious about the seconds vs. minutes question, as we'd have to expand the types to have a more granular time duration. So, the plan is to only have it fire at most once a minute?

@aslom (Member) commented Mar 11, 2020

Is there an issue that describes why we need to reimplement ping source and what the options are? I see it discussed in PRs #2478 and #2716 but I could not find related issues - could you add a link here?

@n3wscott (Contributor)

> @n3wscott said (see #2478 (comment)):
>
> > I don't really think a valid alternate impl is a new pod that runs 100% of the time to do what a job running 0.01% of the time can do.
>
> A couple of things:
>
> • the new pod consumes almost no cpu, just memory

It takes a reserve in the cluster. So it is non-zero always.

> • the actual time of a job is the job run time (negligible as you pointed out) plus the time to manage the pod (creation, deletion, collecting metrics, etc.). Soon enough (a few PingSources, each triggering every minute), you end up with a collection of jobs actually running 100% of the time.

That is a matter of combining the jobs that have the same schedule, which could be a different task than what is being proposed. Then at most it is one job per minute.

Review threads (resolved):
  • test/e2e/source_ping_test.go
  • pkg/reconciler/pingsource/jobrunner/runner_test.go
  • pkg/reconciler/pingsource/jobrunner/runner.go
  • pkg/reconciler/pingsource/controller/pingsource.go

@jasonpet commented

I am totally on board with this multi-tenant implementation. I have helped develop and support the IBM Cloud Functions alarm/cron event provider for the last 4 years and know from experience that we see far more customer adoption for this event provider than any of the others. My new team is exploring Knative Eventing and would love to see a multi-tenant implementation for cron such as this. The current cron job implementation in Knative Eventing would be far too costly for us.

@grantr (Contributor) commented Mar 12, 2020

@jasonpet does the IBM Cloud Functions alarm/cron event provider allow customers to create schedules with a period of less than 1 minute (e.g. every 10 seconds)? If so, do customers use that feature?

@lionelvillard (Member, Author)

@grantr it used to, but that feature has been removed. The minimum period is now 1 minute.

@lionelvillard (Member, Author)

The tracker has been added.
/unhold

@knative-prow-robot removed the do-not-merge/hold label on Mar 12, 2020

@lionelvillard (Member, Author)

/test pull-knative-eventing-integration-tests

@knative-metrics-robot

The following is the coverage report on the affected files.
Say /test pull-knative-eventing-go-coverage to re-run this coverage report.

| File | Old Coverage | New Coverage | Delta |
|------|--------------|--------------|-------|
| pkg/apis/sources/v1alpha1/ping_validation.go | 81.8% | 91.7% | 9.8 |
| pkg/apis/sources/v1alpha2/ping_validation.go | 87.5% | 100.0% | 12.5 |
| pkg/reconciler/pingsource/controller/controller.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/controller/pingsource.go | Do not exist | 63.9% | |
| pkg/reconciler/pingsource/controller/resources/eventtype.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/controller/resources/jobrunner.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/controller/resources/jobrunner_service.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/controller/resources/labels.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/controller/resources/receive_adapter.go | Do not exist | 100.0% | |
| pkg/reconciler/pingsource/jobrunner/controller.go | Do not exist | 81.0% | |
| pkg/reconciler/pingsource/jobrunner/pingsource.go | Do not exist | 61.1% | |
| pkg/reconciler/pingsource/jobrunner/runner.go | Do not exist | 92.3% | |

@lionelvillard (Member, Author)

@vaikas I implemented the tracker and added a couple more tests.

@vaikas (Contributor) commented Mar 13, 2020

/lgtm
/approve

@knative-prow-robot added the lgtm label on Mar 13, 2020

@knative-prow-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lionelvillard, vaikas

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
  • OWNERS [lionelvillard,vaikas]

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

@knative-prow-robot merged commit 1a1f5b0 into knative:master on Mar 13, 2020

	cronRunner       *cronJobsRunner
	pingsourceLister sourceslisters.PingSourceLister
	impl             *controller.Impl

Reviewer (Member): This looks unused, and is kind of a strange thing to have the reconciler have as a field (it creates a cycle in the layering).

Author: It's coming from the in-mem reconciler. Same problem there...

}

func (r *Reconciler) reconcile(key string, source *v1alpha2.PingSource) error {
	if source.DeletionTimestamp != nil {

Reviewer (Member): Without finalizers, it isn't guaranteed that you'll actually flow through here.

Author: Indeed! Thanks!
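
For reference, here is a minimal sketch of the finalizer pattern the comment is pointing at, not the PR's actual code; the finalizer name and the ensureSchedule/cancelSchedule/patchFinalizers helpers are illustrative assumptions:

```go
package jobrunner

import (
	"context"

	"knative.dev/eventing/pkg/apis/sources/v1alpha2"
)

// pingSourceFinalizer is a hypothetical finalizer name used for illustration.
const pingSourceFinalizer = "pingsources.sources.knative.dev"

// hasFinalizer reports whether the PingSource already carries our finalizer.
func hasFinalizer(src *v1alpha2.PingSource) bool {
	for _, f := range src.Finalizers {
		if f == pingSourceFinalizer {
			return true
		}
	}
	return false
}

// reconcileSketch outlines the pattern: without a finalizer, the object can
// disappear before reconcile ever observes DeletionTimestamp, so the cron
// entry would leak. The function parameters stand in for the real calls.
func reconcileSketch(ctx context.Context, src *v1alpha2.PingSource,
	ensureSchedule, cancelSchedule func(*v1alpha2.PingSource) error,
	patchFinalizers func(context.Context, *v1alpha2.PingSource, []string) error) error {

	if src.DeletionTimestamp != nil {
		// Being deleted: tear down the schedule, then release the finalizer
		// so Kubernetes can actually remove the object.
		if err := cancelSchedule(src); err != nil {
			return err
		}
		var remaining []string
		for _, f := range src.Finalizers {
			if f != pingSourceFinalizer {
				remaining = append(remaining, f)
			}
		}
		return patchFinalizers(ctx, src, remaining)
	}

	// Not being deleted: ensure the finalizer is present before doing work,
	// so a later deletion is guaranteed to flow back through reconcile.
	if !hasFinalizer(src) {
		if err := patchFinalizers(ctx, src, append(src.Finalizers, pingSourceFinalizer)); err != nil {
			return err
		}
	}
	return ensureSchedule(src)
}
```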

@lberk mentioned this pull request on Mar 26, 2020
@lionelvillard deleted the mt-ping branch on August 28, 2020
@lionelvillard restored the mt-ping branch on September 3, 2020
@lionelvillard deleted the mt-ping branch on September 3, 2020
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/test-and-release: Test infrastructure, tests or release.
  • cla: yes: Indicates the PR's author has signed the CLA.
  • lgtm: Indicates that a PR is ready to be merged.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.