What happens when... for JSON Batch Format. #414
Examples: In other words, the publisher MUST expect that the batch will be broken up. It is just a temporary optimization on one hop, and probably will change on the next hop. Does that help?
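To illustrate the "the batch will be broken up" point above, a minimal sketch of an intermediary re-chunking a batch between hops; the chunk size and event fields here are invented for the example, not taken from the spec:

```python
# Hypothetical sketch: an intermediary may re-chunk a batch between hops,
# so publishers cannot rely on batch boundaries surviving end to end.

def rechunk(batch, max_events):
    """Split a list of CloudEvents into smaller batches for the next hop."""
    return [batch[i:i + max_events] for i in range(0, len(batch), max_events)]

incoming = [{"specversion": "1.0", "id": str(n), "type": "com.example.ping",
             "source": "/demo"} for n in range(5)]

# One hop's optimization: 5 events become 3 batches of at most 2 events each.
outgoing = rechunk(incoming, max_events=2)
```

The point is that `outgoing` carries exactly the same events; only the transport-level grouping changed.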
|
@clemensv what was your intention w.r.t. batch responses? IMO w.r.t. the response if there's an issue with one of the events... I think the answer might be the same as what would happen if the processing of a single event failed halfway through its business logic. Do we roll it all back? Do we try to respond with some partial-success error code? I think the same applies to batching and the options are:
Net: I think we could add text to the primer stating this. |
FYI .. I've been really slow adding any language to the webhook spec for this .. sorry. |
I hereby volunteer to open a small PR, but I'm fine with Scott or Jem driving this as well 😉 |
@cneijenhuis thanks a ton! I know @n3wscott is super swamped these days, so I'd appreciate any help. |
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
Clarifications on batching to address #414
How about a new proposal for how batching works? The bones would be: the content type remains, but the payload changes to a structured form:
This allows for batch extensions that do not mix with the events themselves. And the response is changed to something that allows us to understand the acceptance of each message, like:
This needs more thought, but in general it would solve my issue with batch. Then I can do this,
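A purely hypothetical sketch of the envelope-plus-per-event-acceptance idea described above; every field name here is invented for illustration and not taken from any actual proposal:

```python
# Hypothetical envelope form: batch-level extensions kept separate from the
# events, plus a response shape reporting acceptance per event.
# All field names ("batchextensions", "results", "status") are invented.
request = {
    "batchextensions": {"traceid": "abc123"},  # hypothetical batch-only metadata
    "events": [
        {"specversion": "1.0", "id": "1", "type": "com.example.ping", "source": "/demo"},
        {"specversion": "1.0", "id": "2", "type": "com.example.ping", "source": "/demo"},
    ],
}
response = {
    "results": [  # one entry per event, in order
        {"id": "1", "status": "accepted"},
        {"id": "2", "status": "rejected", "reason": "duplicate"},
    ],
}
```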
|
The reason I strongly prefer an array at the top level is that it is then just an array of our existing definition of CloudEvents, rather than introducing a new construct to represent a batch. In other words, I strongly prefer to leave it as-is. |
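For contrast, the array form the JSON Batch Format actually defines is just a JSON array whose elements are individual CloudEvents in the JSON format (media type `application/cloudevents-batch+json`). A small sketch, with illustrative event values:

```python
import json

# The spec's batch payload: a plain top-level JSON array of CloudEvents.
# No envelope, no batch-level construct -- each element is an ordinary event.
batch = json.loads("""
[
  {"specversion": "1.0", "id": "1", "type": "com.example.created", "source": "/demo"},
  {"specversion": "1.0", "id": "2", "type": "com.example.updated", "source": "/demo"}
]
""")
```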
@rperelma how do you deal with ack'ing upstream? |
@n3wscott We do all or nothing. |
I think we have two issues, and we don't do ourselves a favor by mixing them:
For the first, I can open a PR to have a base for the WG to discuss this. I don't have a preference either way, to be honest.
For the second... I initially tried to open a PR for "batching if you use both JSON and HTTP". Clemens rightfully called out that these two should be separate: one could use JSON over UDP, or XML batching over HTTP.
The problem with your proposal is where I define the response. Either I define the response in the JSON format. Then I implicitly require that a) any transport that receives a JSON batch will also be able to reply with JSON, and b) any native transport error/status handling must be overridden by what I define in the JSON format.
Alternatively, I can define the response in the HTTP transport, with the "tools" I've got at the transport layer. That means I can't use JSON (because I may have to support e.g. XML in the future), but I could use HTTP headers.
@n3wscott Does that make sense? Should I try to open a PR for it? |
For the moment, ignore that we might say too much in our current HTTP specs, but I'd prefer not to head down this path. Once you open this door we're not just hinting about a processing model, we're inventing one. Opening the batching door will soon get people to ask us things like how they indicate that they want processing to stop on the first failure - and other orchestration actions.
I'd like to keep our scope to just "adding metadata to an existing event and its transport". There's part of me that does think defining batching might have gone too far because it gets the questions being raised by this issue, but we tried to limit it by saying that we're just defining a format (like our structured single-event format), and not much more.
One way out of this is to be very clear about our scope and "punt" on it. Meaning, say something like:
This specification defines a format for how multiple CloudEvents can appear in a single structured-format message. However, due to the limited scope of this project, this specification will not address how to process those events nor how the receiver is to indicate the status (or possible response) related to processing those events. That is an implementation detail to be agreed upon out of band between the sender and receiver of those batched events, or by some other specification.
I think this is actually consistent with what we do for singular CE messages - we don't say anything about how the transport works. I just looked at the HTTP spec and we don't even talk about 200's or 202's or 500's in there - which I think is correct. The webhook spec does, and I think we need to look at that and question why.... |
I'm also wary of adding batching to the spec. If the application sender is specifying how messages should be batched, then the batch boundaries may have significance for the application receiver. Such a batch is *not* a stream of Events - it is a weird single event with many event-like data structures packed into it - neither the batch nor the sub-events can be properly identified, acked, responded to, timestamped, or otherwise fulfill any of the responsibilities of a proper Event except the one of carrying some data.
I think this use case is much better modeled as a single, normal event with an application-specified array in its body. Avoid special cases in the CE and transport code and give the application full control over defining their own batching strategies *within* an event using arrays, indexed maps, compression, or whatever is needed.
|
From an interoperability perspective, we cannot let applications define their own batching strategies. Event importers need to know which format to use independently of what application is going to receive these events. |
This is why I was ok with us defining the format - I wanted syntax interop because I feared each impl might define their own way of encoding it. I'm just not sure we can (or should) define the semantic processing side of it beyond to say "your normal transport/processing rules apply". At least not w/o another year of work :-) and a much broader scope (IMO). |
Batching could have been implemented as an extension, with the batched events housed in an array inside of data, as Alan says. I think we should talk about removing batch for v1. |
If it's done as an extension, what happens when, let's say, an importer supports the batching extension but the consumer does not? |
@alanconway That's great, because we already decided we won't add batching to the spec itself, for the reason you've outlined. See the last section of https://github.com/cloudevents/spec/blob/master/primer.md#design-goals 😉
However, we can't NOT support batching if some transports already natively support it. Note that in all transports with batching that I have ever seen, a batch carries no semantic meaning and the boundaries are arbitrary - it is just an optimization at the transport layer, because it is often faster to transport a batch of events. And for many transports, once you define how to map a single CloudEvent into their proprietary format, you can automatically use both the batched and the non-batched APIs. E.g. Kafka, SQS, Pub/Sub.
To keep the discussion focused, can we please stick to the topic of batching for a specific transport - HTTP - and not whether batching should be a first-class citizen of the spec? Thanks! |
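The "define the single-event mapping once, get the batched API for free" point above can be sketched as follows; `to_record` and the producer are stand-ins invented for the example, not real Kafka/SQS/Pub/Sub client calls:

```python
import json

def to_record(event):
    """Hypothetical single-event transport binding: structured-mode JSON payload."""
    return {"key": event["id"],
            "value": json.dumps(event),
            "headers": {"content-type": "application/cloudevents+json"}}

def send_batch(producer, events):
    """A native batch API is just the single-event binding applied per event."""
    records = [to_record(e) for e in events]
    producer.extend(records)  # stand-in for a transport's batch-send call
    return len(records)
```

Because the batch boundaries carry no semantics, nothing beyond the per-event mapping needs to be specified.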
This is what I tried to get at with my comment in the last call. If the webhook spec lived in a separate repository, it'd be much clearer where the CloudEvents scope ends, and it'd be clear that CloudEvents' goal is NOT to provide a blueprint for implementing an eventing system over HTTP. However, the webhook spec should IMO be enough to implement an eventing system - otherwise I don't see what the value of it is. |
@cneijenhuis I'm hoping to find time to look at the webhook spec more closely today, or this weekend, but do you have a sense for how it should change (or which parts should be removed) to keep it useful but still within the scope of CE? |
@duglin I think we can't measure the webhook spec with the same yardstick as "everything" else. It is outside of the scope of CE, and this is already acknowledged in the primer:
Put it another way, in the Design goal section, we state:
But a webhook specification that ignores the processing model is incomplete (to put it nicely 😉). Which is why we have to go through the extra justification I quoted above... My suggestion is to move the webhook spec into a dedicated repository - within the Serverless WG, but outside of CloudEvents (or at least the spec). At that point, we can treat webhooks like any other transport, and can assume that each transport comes with its own processing model and a way of communicating processing states between producers and consumers. |
@cneijenhuis I took another look at the webhook spec and I agree with you that it really doesn't belong in the CE repo. It doesn't even really deal with CEs at all; it's more of a generic HTTP spec/profile - and in some cases duplicates what's already written in the HTTP specs. If there are some bits of that doc that really need to be mentioned for CE over HTTP then we should add that text to the HTTP transport spec - but in my quick scan of the doc I couldn't see anything obvious.
As for moving the spec to some other repo... I don't have an opinion on that other than I wonder whether it makes sense to be under the Serverless WG. It's not even really a serverless-specific issue. Feels more like an IETF RFC type of doc. But, before we decide where/if it should go (if anywhere), what do other people think about removing it? Clearly @clemensv will want to chime in next week when he's back from vacation - although removing it before then could make for an interesting surprise. :-) I'll open a new issue so we can discuss this independently of the batching discussion.
So, back to the issue at hand... batching. I think the options we have are: Did I miss an option? |
I vote for # 2, keep as-is |
The group has discussed this a lot and the end result is to leave processing model out of the CloudEvents spec. We can revisit how to process Batch in the processing model if we get to defining it. Closing this. |
I have a few implementation questions around the JSON batch format. An empty batch would be `[]` - is that also a valid request? I.e., should an SDK prevent you from sending `[]`?
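One possible SDK-side answer, purely as a sketch (the spec, not this code, is what should settle the question): treat `[]` as syntactically valid batch content, and leave rejecting empty sends to caller policy. `parse_batch` is a hypothetical helper, not from any actual SDK:

```python
import json

def parse_batch(payload):
    """Parse a JSON Batch Format payload: a top-level JSON array of events."""
    batch = json.loads(payload)
    if not isinstance(batch, list):
        raise ValueError("JSON batch payload must be a top-level array")
    return batch  # an empty list is syntactically valid: zero events
```

Under this reading, an empty array parses cleanly to zero events, and a non-array payload is the only hard error.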