
Make pull-based receive adapter HA and scalable #3157

Closed
lionelvillard opened this issue May 18, 2020 · 23 comments
Assignees
Labels
area/api, area/sources, kind/feature-request, lifecycle/stale, priority/important-longterm
Milestone

Comments

@lionelvillard
Member

The current multi-tenant receive adapter architecture does not scale, which is a problem particularly for direct event delivery (from a source to a service).

Some background context:

  • The original receive adapter architecture scales, but at the cost of consuming a lot of resources, and it does not scale down to zero.
  • For PingSources specifically, the proposed CronJob and SinkBinding model scales (arguably, since the more CronJobs are scheduled, the less accurate they become), but it consumes a lot of resources. It does scale down to zero, though.
  • The multi-tenant receive adapter architecture is designed to reduce the resource inefficiency inherent in the above architectures, at the cost of not being scalable.
  • For the record, here is the general requirement to make sources scalable: Make Knative eventing sources more serverless and scalable #2153

The multi-tenant receive adapter does not scale, and there is a limit on the number of in-flight requests a single pod can handle. As an example, 100,000 in-flight requests require about 3GB and 3-4 (or more) CPUs. The number of in-flight requests can be very high due to direct delivery to synchronous and potentially long-running services: 100,000 in-flight requests corresponds to 10,000 PingSources each sending one event per minute to a service that takes 10 minutes to respond. Most importantly, latency increases proportionally to the number of in-flight requests (about 1-2 seconds per 500 requests).
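For reference, a back-of-the-envelope check of that number using Little's law, with the figures from the paragraph above:

```go
package main

import "fmt"

func main() {
	// Back-of-the-envelope check (Little's law):
	// in-flight requests ≈ arrival rate × time spent in the system.
	const pingSources = 10000 // PingSources, each firing one event per minute
	const eventsPerMinute = pingSources * 1
	const sinkMinutes = 10 // a synchronous sink that takes ~10 minutes to respond

	fmt.Println(eventsPerMinute * sinkMinutes) // 100000 concurrent in-flight requests
}
```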

I can think of two solutions:

  • static partitioning: the controller (e.g. the PingSource controller) assigns CRs to a particular receive adapter. The difficulty is coming up with a partitioning function that does not waste resources, and rebalancing is a challenge (e.g. how to avoid duplicate events). A minimal hash-based sketch follows this list.
  • load balancing via an event forwarding proxy: events are sent to a stateless event proxy service. This service distributes requests to the backend pods, whose sole purpose is to forward events to the actual target. One question is which load balancing strategy to use. By default, k8s load balancing is random, but in this particular case an IPVS-based load balancer with a least-connection strategy seems to be the right solution. Maybe random distribution works, but what happens when a request is sent to an overloaded pod?
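For the static partitioning option, a minimal sketch of a naive hash-based assignment of CRs to receive-adapter buckets; the function name and bucket count are purely illustrative:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucketFor assigns a CR (identified by namespace/name) to one of n receive-adapter
// buckets. It is stable only while n stays fixed: changing n reassigns most keys,
// which is exactly the rebalancing / duplicate-event problem mentioned above.
func bucketFor(namespacedName string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(namespacedName))
	return h.Sum32() % n
}

func main() {
	fmt.Println(bucketFor("default/ping-every-minute", 4))
}
```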

TL;DR I'm leaning towards adding a general-purpose event forwarding service to help with sending events to potentially long-running, synchronous destinations. This service can be used internally by sources. For instance, the PingSource controller can decide to forward events to the proxy when there are more than 500 direct PingSources. Potentially all multi-tenant sources could benefit from it.
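For the forwarding proxy option, a minimal sketch of what the data plane could look like, assuming a hypothetical K-Target header that carries the final sink URL (that header is not an existing Knative contract):

```go
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	// Generous timeout: the whole point is that sinks may hold a request open for minutes.
	client := &http.Client{Timeout: 15 * time.Minute}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Hypothetical header carrying the final sink URL; purely illustrative.
		target := r.Header.Get("K-Target")
		if target == "" {
			http.Error(w, "missing K-Target header", http.StatusBadRequest)
			return
		}
		req, err := http.NewRequestWithContext(r.Context(), http.MethodPost, target, r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		req.Header = r.Header.Clone() // pass CloudEvents (ce-*) headers through unchanged

		resp, err := client.Do(req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		w.WriteHeader(resp.StatusCode) // best effort: just relay the sink's status code
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Because the proxy is stateless, each pod only holds open connections for its share of the in-flight requests, so scaling the deployment horizontally spreads the connection load.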

I'd like to gather people's thoughts on this issue, so please add your comments below.

@lionelvillard
Member Author

Additional note: it seems that a Knative Service is the right solution for the proxy. The only possibility is to implement the proxy in knative-sandbox and to have a configuration parameter named event-forwarding-proxy in eventing.

@lionelvillard
Member Author

Additional note 2: direct event delivery does not have to be reliable (no guarantees). Think of the proxy as a trimmed-down version of the IMC, without the fan-out and without the channel/subscription.

@slinkydeveloper
Contributor

slinkydeveloper commented May 19, 2020

I have some questions about some specific things I don't get:

  • About static partitioning: you cannot watch the same CRD instances from several controllers, unless you find some way to mutually exclude them (e.g. a "third-party controller" applies a label that partitions the CRs, assigning each one to a specific receive adapter instance). Even in that case, this sounds quite hard and convoluted to implement, as you underlined.
  • About load balancing and Additional note 2: when I read this thread I had this exact feeling. When you have such a complex use case that requires scaling, would it be better to just tell users to "create a channel in the middle"? Is there any specific reason why you want to optimize this direct source -> sink use case?

@lionelvillard
Member Author

  • About static partitioning: the idea is to let the controller assign CRs to a given receive adapter. I agree it's hard and convoluted, and that's why I don't like this solution. I just put it there for the record.
  • The user does not necessarily care about scaling direct event delivery; they just want it to work. However, in a multi-tenant environment, the operator does care about it. They can't ask the user to "create a channel in the middle" because the mt receive adapter is busy handling requests from other tenants.

@slinkydeveloper does this answer your questions?

@slinkydeveloper
Contributor

However, in a multi-tenant environment, the operator does care about it. They can't ask the user to "create a channel in the middle" because the mt receive adapter is busy handling requests from other tenants.

I think I lack some context to fully understand the implications and provide some meaningful feedback here 😄 Do you have any document explaining this + the mt architecture and the reasons behind it?

@lionelvillard
Member Author

I'll have to dig up some old issues/PRs I guess :-)

In a nutshell, the multi-tenant receive adapters are more resource-efficient compared to their single-tenant equivalents. For instance, most of the time the stping receive adapter is idle, wasting resources. Instead, let's have one not-so-idle mtping receive adapter handling hundreds or thousands of PingSources. That's a win.

Note that this is not only for PingSources; it applies to other sources (like GitHub) too.

However, what we lose with the multi-tenant receive adapters is scalability, hence this proposal.

@matzew
Member

matzew commented May 20, 2020

Isn't this more like some "ksvc" acting as a source, with some sink injected?

@lionelvillard
Member Author

@matzew yes.

@duglin

duglin commented May 22, 2020

Note that if knServices supported async natively (meaning returning a 202 quickly), then I think it would lessen the need for this, because the clients (event sources) would get their outbound connection resources freed up quickly, while the sink (the ksvc) would still scale up based on processing lots of async requests rather than being limited to scaling only on inbound requests (which are gone once the 202 is returned).
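A minimal sketch of what such an async sink could look like with a plain net/http handler; this is not an existing knService feature, just an illustration of the 202 pattern:

```go
package main

import (
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

// process stands in for the sink's long-running business logic.
func process(event []byte) {
	time.Sleep(10 * time.Minute) // e.g. the 10-minute service from the example above
	log.Printf("processed %d bytes", len(event))
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		body, err := ioutil.ReadAll(r.Body) // buffer the event so the connection can be released
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusAccepted) // 202: the source's outbound connection is freed right away
		go process(body)                   // the long-running work continues asynchronously
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```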

@n3wscott
Contributor

You should inject a channel between the source and the n consumers.

@lionelvillard
Member Author

I agree with injecting a channel. I wonder if the SinkBinding controller and the SinkBinding library are the right place to do this. WDYT @n3wscott?

@n3wscott
Contributor

I would not overload SinkBinding to do this. First, how would you tell the control plane who a subscriber should be? I think this comes back to the idea of leveraging subscriptions to any object. I am not sure why this needs to be a magic layer in Knative, to be honest; we have the building blocks to provide this, and Flow would have been the object at the highest level of the model to enable what you are asking for. Perhaps it is time to think about Flow again?

TL;DR: I think this request should be solved with a new layer or abstraction, not by overloading the objects in the current model with complexity.

@lionelvillard
Member Author

I don't think we need a new abstraction. Direct delivery is IMHO a good abstraction; we (the Knative authors) "just" need to make the implementation efficient and scalable (which might sound like magic).

I agree we do have (almost?) all the building blocks we need. We just need to figure out how to put them together so that direct delivery scales out-of-the-box.

@lionelvillard
Member Author

For the record, PingSource is special because of its bursty and predictable traffic behavior. For schedules below a certain threshold (e.g. 2 minutes), connections should be kept open.

@lberk added the priority/important-longterm and area/api labels Jun 1, 2020
@duglin

duglin commented Jun 16, 2020

It feels a bit like this issue is trying to cover many different topics and I'm getting confused. If it's just me, then ignore this comment:

The current multi-tenant receive adapter architecture does not scale, which is a problem particularly for direct event delivery (from a source to a service).

Some questions:

  • are we talking about push or pull model? e.g. GitHub vs Kafka model
  • are we talking about scaling the adapter due to the size/amount of incoming events?
  • are we talking about how to deal with outbound connections (adapters -> sink), due to those connections potentially being long-lived and the risk of running out of available connections?
  • are we talking about how to partition CRs across adapters that also need to scale (either due to the incoming events or due to the large # of CRs)?
  • where does "direct event delivery" fit into any of this discussion? (aside from the outbound connection aspect mentioned above)
  • or is this focused on something totally different?

If we can narrow down which problem we're trying to solve I think it would help focus the discussion.

@lionelvillard
Member Author

Some questions:

* are we talking about push or pull model? e.g. GitHub vs Kafka model

It's more an issue for pull model.

* are we talking about scaling the adapter due to the size/amount of incoming events?

Not in this issue. I'm trying to reduce the scope of this overarching issue a bit: #2153.

* are we talking about how to deal with outbound connections (adapters -> sink) due to those connections potentially being long-lived and we might run out of available connections?

Yes. This issue applies to multi-tenant receive adapters that can handle many outbound connections.

* are we talking about how to partition CRs across adapters that need to also scale (either due to the incoming events or due to the large # of CRs) ?

That's one possible solution. The one I'm favoring right now.

* where does "direct event delivery" fit into any of this discussion? (aside from the outbound connection aspect mentioned above)

This is an attempt to reduce the scope of this discussion.

* or is this focused on something totally different?

If we can narrow down which problem we're trying to solve I think it would help focus the discussion.

Yes. On the other hand, there is an opportunity to solve the big scalability and HA issue (sorry, adding something else to the table) once and for all. Maybe too ambitious?

@lionelvillard changed the title from "Make direct event delivery scalable" to "Make pull-based receive adapter HA and scalable" Jun 17, 2020
@lionelvillard
Member Author

I updated the title to better reflect what this issue is about.

I'm working on a PoC heavily influenced by the Per-Reconciler Leader Election Feature Track. The initial plan is:

  • Add bucket-based leader election in the adapter shared main. Leader election is optional (the mt GitHub adapter is already HA and scalable). Use the ConfigMap resource lock (a minimal sketch is shown below).
  • Eventually: improve the leader-election algorithm provided by the k8s client SDK to also be push-based, in order to reduce downtime.
  • Refactor mtping:
    • use the adapter shared main (see the mt GitHub adapter for an example)
    • use ConfigMaps containing PingSource schedules, instead of informers
    • implement static partitioning and autoscaling in the PingSource controller

I haven't looked at other pull-based sources like KafkaSource or PrometheusSource in great detail, but I don't see a reason why the leader-election approach shouldn't work for scaling them. It should work for CouchDB.
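For reference, a minimal sketch of the bucket-based ConfigMap leader election from the plan above, using the client-go leaderelection package; the namespace, ConfigMap name, and bucket layout are illustrative only:

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One ConfigMap lock per bucket; the namespace and names here are illustrative only.
	lock := &resourcelock.ConfigMapLock{
		ConfigMapMeta: metav1.ObjectMeta{Namespace: "knative-eventing", Name: "mtping-bucket-0"},
		Client:        client.CoreV1(),
		LockConfig:    resourcelock.ResourceLockConfig{Identity: os.Getenv("POD_NAME")},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the pod holding this lock serves the PingSources assigned to bucket 0.
				log.Print("became leader for bucket 0")
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				log.Print("lost leadership for bucket 0; stop serving it")
			},
		},
	})
}
```

The adapter would run one such election per bucket, and only the replica holding a bucket's lock serves the PingSources assigned to that bucket.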

If you are interested in this topic and want to help, let me know!

@slinkydeveloper
Contributor

ah ok so you want to use leader election in order to perform static partitioning?

@lionelvillard
Member Author

yes :-)

@lionelvillard
Member Author

The k8s leader election implementation is buggy (kubernetes/kubernetes#91942) and not ideal. We need a way to plug in alternative implementations.

@lionelvillard
Member Author

Targeting 0.16 for mtping.

/milestone 0.16

@knative-prow-robot
Contributor

@lionelvillard: You must be a member of the knative/knative-milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your and have them propose you as an additional delegate for this responsibility.

In response to this:

Targeting 0.16 for mtping.

/milestone 0.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

@github-actions bot added the lifecycle/stale label Dec 14, 2020