Make pull-based receive adapter HA and scalable #3157
Additional note: it seems that Knative Service is the right solution for the proxy. Only possibility is to implement the proxy in
Additional note 2: direct event delivery does not have to be reliable (no guarantees). Think of the proxy as a trimmed-down version of the IMC without the fan-out and without the channel/subscription.
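To make the "trimmed-down IMC" idea concrete, here is a minimal sketch (not an actual implementation): a proxy that accepts an event over HTTP, acknowledges immediately, and forwards it to a single target with no fan-out, no subscriptions, and no delivery guarantees. The `K-Target` header used to pick the destination is hypothetical.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		target := r.Header.Get("K-Target") // hypothetical: where to forward the event

		// Ack right away so the source can release its outbound connection.
		w.WriteHeader(http.StatusAccepted)

		go func() {
			// Best-effort forward; errors are only logged, no retries (no guarantees).
			resp, err := http.Post(target, r.Header.Get("Content-Type"), bytes.NewReader(body))
			if err != nil {
				log.Printf("forward to %s failed: %v", target, err)
				return
			}
			resp.Body.Close()
		}()
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```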
I have some questions about some specific things I don't get:
@slinkydeveloper does this answer your questions?
I think I lack some context to fully understand the implications and provide some meaningful feedback here 😄 Do you have any document explaining this + the mt architecture and the reasons behind it?
I'll have to dig up some old issues/PRs I guess :-) In a nutshell, the multi-tenant receive adapters are more resource efficient compared to their single-tenant equivalents. For instance, most of the time the single-tenant ping (stping) receive adapter is idle, wasting resources. Instead, let's have one not-so-idle mtping receive adapter handling hundreds or thousands of PingSources. That's a win. Note that this is not only for PingSources; it applies to other sources (like GitHub) too. However, what we lose with the multi-tenant receive adapters is scalability, hence this proposal.
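Purely as an illustration (this is not the actual mtping code), a single multi-tenant process can drive the schedules of many PingSources with one scheduler instead of one mostly-idle pod per source. The sketch below assumes the robfig/cron library for scheduling; the `pingSpec` struct and sink URL are made up.

```go
package main

import (
	"log"
	"net/http"
	"strings"

	"github.com/robfig/cron/v3"
)

// pingSpec stands in for a reconciled PingSource: a schedule, a payload and a sink URI.
type pingSpec struct {
	Schedule string
	Data     string
	SinkURI  string
}

func main() {
	sources := []pingSpec{ // in a real adapter these would come from the informer cache
		{Schedule: "* * * * *", Data: `{"msg":"hello"}`, SinkURI: "http://sink.default.svc"},
		// ... hundreds or thousands more
	}

	c := cron.New()
	for _, s := range sources {
		s := s
		c.AddFunc(s.Schedule, func() {
			// Fire-and-forget send; retries and CloudEvents encoding elided.
			resp, err := http.Post(s.SinkURI, "application/json", strings.NewReader(s.Data))
			if err != nil {
				log.Printf("send to %s failed: %v", s.SinkURI, err)
				return
			}
			resp.Body.Close()
		})
	}
	c.Run()
}
```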
Isn't this more like some "ksvc" acting as a source, with some sink injected?
@matzew yes.
Note that if knServices supported async natively (meaning returning a 202 quickly), I think it would lessen the need for this: the clients (event sources) would get their outbound connection resources freed up quickly, while the sink (the ksvc) would still scale up based on processing lots of async requests rather than only on inbound requests (which are gone once the 202 is returned).
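A rough sketch of what such an async-acking service could look like (purely illustrative, not an existing Knative feature): the handler returns 202 before doing the long-running work, so the caller's connection is released even though processing continues in the background.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

// asyncHandler acknowledges with 202 before doing the long-running work,
// so the caller's outbound connection is released almost immediately.
func asyncHandler(w http.ResponseWriter, r *http.Request) {
	event, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusAccepted) // ack first

	go func() {
		time.Sleep(10 * time.Minute) // stand-in for the real long-running processing
		log.Printf("processed %d bytes", len(event))
	}()
}

func main() {
	http.HandleFunc("/", asyncHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```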
You should inject a channel between the source and the n consumers.
I agree with injecting a channel. I wonder if the SinkBinding controller and the SinkBinding library are the right place to do this. WDYT @n3wscott?
I would not overload SinkBinding to do this. First, how would you tell the control plane who a subscriber should be? I think this comes back to the idea of leveraging subscriptions to any object. I am not sure why this needs to be a magic layer in Knative, to be honest; we have the building blocks to provide this, and Flow would have been the object at the highest level of the model to enable what you are asking for. Perhaps it is time to think about Flow again? TL;DR: I think this request should be solved with a new layer or abstraction, not by overloading the objects in the current model with complexity.
I don't think we need a new abstraction. Direct delivery is IMHO a good abstraction; we (Knative authors) "just" need to make the implementation efficient and scalable (which might sound like magic). I agree we have (almost?) all the building blocks we need. We just need to figure out how to put them together so that direct delivery scales out-of-the-box.
For the record, PingSource is special because of its bursty and predictable traffic behavior. For schedules below a certain threshold (e.g. 2 minutes), connections should be kept open.
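For illustration, assuming the adapter can compute the interval of a schedule, it could decide per source whether to keep the connection to the sink warm. The 2-minute threshold is the one mentioned above; the helper and its numbers are hypothetical.

```go
package adapter

import (
	"net/http"
	"time"
)

// transportFor picks an HTTP transport per source based on how often it fires.
// Frequent schedules keep idle connections open between sends; infrequent ones don't.
func transportFor(interval time.Duration) *http.Transport {
	if interval <= 2*time.Minute {
		return &http.Transport{
			MaxIdleConnsPerHost: 10,
			IdleConnTimeout:     3 * time.Minute,
		}
	}
	return &http.Transport{DisableKeepAlives: true}
}
```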
It feels a bit like this issue is trying to cover many different topics and I'm getting confused. If it's just me, then ignore this comment:
Some questions:
If we can narrow down which problem we're trying to solve, I think it would help focus the discussion.
It's more an issue for the pull model.
Not in this issue. Trying to reduce the scope of this overarching issue a bit: #2153.
Yes. This issue applies to multi-tenant receive adapters that can handle many outbound connections.
That's one possible solution. The one I'm favoring right now.
This is an attempt to reduce the scope of this discussion.
Yes. On the other hand, there is an opportunity to solve the big scalability and HA issue (sorry for adding something else to the table) once and for all. Maybe too ambitious?
I updated the title to better reflect what this issue is about. I'm working on a PoC heavily influenced by the Per-Reconciler Leader Election Feature Track. The initial plan is:
I haven't looked at other pull-based sources like KafkaSource or PrometheusSource in great detail, but I don't see a reason why the leader election approach shouldn't work for scaling. It should work for CouchDB. If you are interested in this topic and want to help, let me know!
Ah, OK, so you want to use leader election in order to perform static partitioning?
yes :-)
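A sketch of static partitioning on top of leader election, roughly in the spirit of the Per-Reconciler Leader Election feature track: source keys are hashed into a fixed number of buckets, each bucket is backed by a lease, and a replica only drives the sources whose bucket it currently owns. The `ownsBucket` helper is hypothetical and stands in for a real lease check.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numBuckets = 8

// bucketFor maps a source key (e.g. "namespace/name") to a bucket index.
func bucketFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % numBuckets)
}

// ownsBucket would check whether this replica holds the lease for the bucket
// (e.g. a coordination.k8s.io Lease named "mtping-bucket-<i>"). Hypothetical placeholder.
func ownsBucket(i int) bool { return i%2 == 0 }

func main() {
	sources := []string{"default/ping-a", "default/ping-b", "team-x/ping-c"}
	for _, key := range sources {
		b := bucketFor(key)
		if ownsBucket(b) {
			fmt.Printf("this replica handles %s (bucket %d)\n", key, b)
		} else {
			fmt.Printf("skipping %s, owned by bucket %d's leader\n", key, b)
		}
	}
}
```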
The k8s leader election implementation is buggy (kubernetes/kubernetes#91942) and not ideal. We need a way to plug in alternative implementations.
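Purely as a sketch of what "pluggable" could mean here (this interface does not exist in Knative), something like:

```go
package election

import "context"

// Elector is a hypothetical abstraction over the election mechanism, so the
// default Kubernetes lease-based implementation could be swapped for another.
type Elector interface {
	// Run blocks, calling onStarted when leadership of the bucket is acquired
	// and onStopped when it is lost or ctx is cancelled.
	Run(ctx context.Context, bucket string, onStarted func(context.Context), onStopped func()) error
}
```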
Targeting 0.16 for mtping. /milestone 0.16
@lionelvillard: You must be a member of the knative/knative-milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your team's maintainers and have them propose you as an additional delegate for this responsibility. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This issue is stale because it has been open for 90 days with no activity.
The current multi-tenant receive adapter architecture does not scale, which is a problem particularly for direct event delivery (from a source to a service).
Some background context:
The multi-tenant receive adapter does not scale, and there is a limit on the number of in-flight requests a single pod can handle. As an example, 100,000 in-flight requests require 3 GB of memory and 3-4 (or more) CPUs. The number of in-flight requests can be very high due to direct delivery to synchronous and potentially long-running services: 100,000 in-flight requests corresponds to 10,000 PingSources each scheduled to send an event every minute to a service that takes 10 minutes to respond (each source keeps roughly 10 requests outstanding at any time). Most importantly, the latency increases proportionally to the number of in-flight requests (about 1-2 seconds per 500 requests).
I can think of two solutions:
TL;DR: I'm leaning towards adding a general-purpose event forwarding service to help with sending events to potentially long-running, synchronous destinations. This service can be used internally by sources. For instance, the PingSource controller can decide to forward events to the proxy when there are more than 500 direct PingSources. Potentially all multi-tenant sources could benefit from it.
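To illustrate the decision described above (the names and the helper are hypothetical), the controller could pick the adapter's target roughly like this:

```go
package pingsource

// directDeliveryLimit mirrors the threshold mentioned above.
const directDeliveryLimit = 500

// resolveTarget returns the URI the receive adapter should send to: the sink
// directly while the number of direct PingSources stays small, the forwarding
// proxy once it grows past the threshold.
func resolveTarget(numDirectPingSources int, sinkURI, proxyURI string) string {
	if numDirectPingSources > directDeliveryLimit {
		return proxyURI // let the proxy absorb the long-held connections
	}
	return sinkURI
}
```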
I'd like to gather people's thoughts on this issue, so please add your comments below.