
What happens when... for JSON Batch Format. #414

Closed
n3wscott opened this issue Apr 5, 2019 · 24 comments · Fixed by #466

@n3wscott
Member

n3wscott commented Apr 5, 2019

I have a few implementation questions around the JSON batch format.

  1. Is it expected that the receiver is delivered the array? Or can an implementation not expose batching and deliver the events one at a time?
  2. What happens when that receiver has an issue with one of n of these events? What should the response code be?
  3. If a valid response is [], is that also a valid request? I.e., should an SDK prevent you from sending []?
@cneijenhuis
Contributor

  1. I'm not sure I'm getting the question correctly. Batching is optional and must only be used if both producer and consumer support it.
    If you have a setup with a middleware, the middleware can add or remove the batching, or change the batch size.

Examples:
Publisher pushes single events every second (no batching). Middleware is a queue and aggregates events. Consumer polls the queue (with batching) every 5 seconds, and the Middleware responds with a batch of 5 events.
Publisher publishes a batch of 5 events. Middleware takes this batch, transforms each event from JSON into Protobuf and sends them one-by-one via grpc.

In other words, the publisher MUST expect that the batch will be broken up. It is just a temporary optimization on one hop, and probably will change on the next hop.

Does that help?
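That point can be sketched in a few lines. This is a minimal illustration, not spec text: events are plain dicts, `handle_event` is a hypothetical per-event callback, and the consumer's logic is identical whether or not the previous hop used batching.

```python
import json

def deliver(body, content_type, handle_event):
    """Dispatch an inbound message to a per-event handler.

    The handler never learns whether the events arrived batched or
    one at a time; batching stays a transport-level optimization
    that any hop may add or remove.
    """
    parsed = json.loads(body)
    if content_type == "application/cloudevents-batch+json":
        events = parsed       # JSON array of event objects
    else:
        events = [parsed]     # single structured-mode event
    for event in events:
        handle_event(event)

received = []
deliver('[{"id": "a"}, {"id": "b"}]',
        "application/cloudevents-batch+json", received.append)
deliver('{"id": "c"}', "application/cloudevents+json", received.append)
```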

  2. Probably 400, but the webhook spec is not very prescriptive in terms of error responses. Do you think it makes sense to specify a response format, where the status is declared per event?

  3. Clemens says "no": HTTP Transport Binding for batching JSON #370 (comment) 😉

@duglin duglin added the v1.0 label Jun 8, 2019
@duglin
Collaborator

duglin commented Jun 13, 2019

@clemensv what was your intention w.r.t. batch responses?

IMO w.r.t. the response if there's an issue with one of the events... I think the answer might be the same as what would happen if the processing of a single event failed halfway through its business logic. Do we roll it all back? Do we try to respond with some partial-success error code? I think the same applies to batching, and the options are:

  • 200 : successfully processed the incoming event or events
  • 202 : successfully accepted the incoming event or events
  • 4xx/5xx : was not able to process the incoming event or events - how far it got is unknown. Whether it rolled back anything is unknown. At least from our spec's perspective, since we don't deal with processing models.

Net: I think we could add text to the primer stating this.

@JemDay
Contributor

JemDay commented Jun 13, 2019

FYI .. I've been really slow adding any language to the webhook spec for this .. sorry.

@cneijenhuis
Contributor

I hereby volunteer to open a small PR, but I'm fine with Scott or Jem driving this as well 😉

@duglin
Collaborator

duglin commented Jun 20, 2019

@cneijenhuis thanks a ton! I know @n3wscott is super swamped these days, so any help would be appreciated.

cneijenhuis added a commit to cneijenhuis/spec that referenced this issue Jun 27, 2019
duglin pushed a commit that referenced this issue Jul 11, 2019: Clarifications on batching to address #414
@n3wscott
Member Author

How about a new proposal for how batching works, the bones would be:

contenttype remains application/cloudevents-batch+json

but the payload changes, from [{<events>},...]

To a structured form:

{"events":[]}

This allows for batch extensions that do not mix with the events themselves.

And the response is changed to something that allows us to understand the acceptance of each message, like:

{
  "result":{
     "<event-id>": {<result object>}
  }
}

And the result object is TBD, but it would have the general properties of "status":"accepted" or "status":"error". This could be transport-dependent.

Needs more thought, but in general, this would solve my issue with batch.


Then I can do this:

  1. Send: A, B, C

  2. Collector turns this into

{"events":[A, B, C]}

  3. Invoker sends A, B, C to a consumer and records the response for each. B fails.

  4. Invoker replies (assuming HTTP) with:

{
	"result": {
		"id-a": {
			"status": "accepted"
		},
		"id-b": {
			"status": "failed",
			"message": "some error"
		},
		"id-c": {
			"status": "accepted"
		}
	}
}
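A sketch of how a receiver might assemble that response body. Note this is n3wscott's proposal, not anything in the spec; `results` is a hypothetical map of event id to outcome.

```python
def batch_response(results):
    """Build the per-event acknowledgement body from the proposal
    above. `results` maps event id -> (ok, error_message)."""
    body = {"result": {}}
    for event_id, (ok, message) in results.items():
        entry = {"status": "accepted" if ok else "failed"}
        if not ok:
            entry["message"] = message
        body["result"][event_id] = entry
    return body

resp = batch_response({"id-a": (True, None),
                       "id-b": (False, "some error"),
                       "id-c": (True, None)})
```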

@rperelma
Contributor

The reason I strongly prefer an array at the top level is that it is then just an array of our existing definition of CloudEvents, vs. introducing a new construct to represent a batch. In other words, I strongly prefer to leave it as-is.

@duglin duglin reopened this Jul 25, 2019
@n3wscott
Member Author

@rperelma how do you deal with ack'ing upstream?

@rperelma
Contributor

@n3wscott We do all or nothing.

@cneijenhuis
Contributor

I think we have two issues, and we don't do ourselves a favor by mixing them:

  • Some prefer a JSON object instead of a JSON array at the top level
  • We don't have a way of ack'ing individual events

For the first, I can open a PR to have a base for the WG to discuss this. I don't have a preference either way, to be honest.

For the second... I initially tried to open a PR for "batching if you use both JSON and HTTP". Clemens rightfully called out that these two should be separate: one could use JSON over UDP, or XML-batching over HTTP.

The problem with your proposal is where I define the response.

Either I define the response in the JSON format. Then I implicitly require that a) any transport that receives a JSON batch will also be able to reply with JSON, and b) any native transport error/status handling must be overridden by what I define in the JSON format.
I think b) is the "killer": How does "status": "failed" map back to a 400 or 500 response?

Alternatively, I can define the response in the HTTP transport, with the "tools" I've got at the transport layer. That means I can't use JSON (because I may have to support e.g. XML in the future), but I could use HTTP headers.
This sounds like a more reasonable option to me. We could also try to reuse the HTTP status codes in the event-specific response. The overall HTTP status code is then the highest HTTP status of any event.
E.g.

HTTP/1.1 400 BadRequest
ce-<event-id-a>: 200
ce-<event-id-b>: 400
ce-<event-id-c>: 200

or

HTTP/1.1 400 BadRequest
ce-status-200: <event-id-a>, <event-id-c>
ce-status-400: <event-id-b>
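The second header shape above could be folded together mechanically. A sketch under the assumptions of that proposal: the `ce-status-<code>` headers are hypothetical (nothing in the spec defines them), and the overall status is taken as the highest per-event code, as suggested.

```python
def batch_status(event_statuses):
    """Fold per-event HTTP status codes into one hypothetical
    ce-status-<code> header per distinct code, plus an overall
    status (the highest per-event code)."""
    grouped = {}
    for event_id, code in event_statuses.items():
        grouped.setdefault(f"ce-status-{code}", []).append(event_id)
    overall = max(event_statuses.values())
    return overall, {k: ", ".join(v) for k, v in grouped.items()}

overall, headers = batch_status({"a": 200, "b": 400, "c": 200})
```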

@n3wscott Does that make sense? Should I try to open a PR for it?

@duglin
Collaborator

duglin commented Jul 26, 2019

For the moment, ignore that we might say too much in our current HTTP specs; I'd prefer not to head down this path. Once you open this door, we're not just hinting about a processing model, we're inventing one. Opening the batching door will soon get people to ask us things like how to indicate that they want processing to stop on the first failure - and other orchestration actions.

I'd like to keep our scope to just "adding metadata to an existing event and its transport". Part of me does think defining batching might have gone too far, because it invites the questions being raised by this issue, but we tried to limit it by saying that we're just defining a format (like our structured single-event format), and not much more.

One way out of this is to be very clear about our scope and "punt" on it. Meaning, say something like:


This specification defines a format for how multiple CloudEvents can appear in a single structure formatted message. However, due to the limited scope of this project, this specification will not address how to process those events nor how the receiver is to indicate the status (or possible response) related to processing those events. That is an implementation detail to be agreed upon out of band between the sender and receiver of those batched events, or by some other specification.


I think this is actually consistent with what we do for singular CE messages - we don't say anything about how the transport works. I just looked at the HTTP spec and we don't even talk about 200's or 202's or 500's in there - which I think is correct. The webhook spec does, and I think we need to look at that and question why....

@alanconway
Contributor

alanconway commented Jul 26, 2019 via email

@lionelvillard
Contributor

From an interoperability perspective, we cannot let applications define their own batching strategies. Event importers need to know which format to use independently of what application is going to receive these events.

@duglin
Collaborator

duglin commented Jul 26, 2019

From an interoperability perspective, we cannot let applications define their own batching strategies. Event importers need to know which format to use independently of what application is going to receive these events.

This is why I was ok with us defining the format - I wanted syntax interop because I feared each impl might define its own way of encoding it. I'm just not sure we can (or should) define the semantic processing side of it beyond saying "your normal transport/processing rules apply". At least not w/o another year of work :-) and a much broader scope (IMO).

@n3wscott
Member Author

Batching could have been implemented as an extension, with the batched events housed in an array inside data, like Alan says.

I think we should talk about removing batch for v1.

@lionelvillard
Contributor

If it's done as an extension, what happens when, let's say, an importer supports the batching extension but the consumer does not?

@cneijenhuis
Contributor

I'm also wary of adding batching to the spec.

@alanconway That's great, because we already decided we won't add batching to the spec itself, for the reason you've outlined. See last section of https://github.com/cloudevents/spec/blob/master/primer.md#design-goals 😉

However, we can't NOT support batching if some transports already natively support it. Note that in all transports with batching that I have ever seen, a batch carries no semantic meaning and the boundaries are arbitrary - it is just an optimization at the transport layer, because it is often faster to transport a batch of events.

And for many transports, once you define how to map a single CloudEvent into their proprietary format, you can automatically use both the batched and the non-batched APIs. E.g. Kafka, SQS, Pub/Sub.
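A sketch of that point, assuming a hypothetical transport whose messages are plain header/body pairs (not any real Kafka/SQS/Pub-Sub API): once the single-event mapping exists, the batch mapping is just its element-wise application.

```python
def to_transport_message(event):
    """Map one CloudEvent (as a dict) to a hypothetical transport
    message: context attributes become headers, data becomes the body."""
    headers = {f"ce-{k}": str(v) for k, v in event.items() if k != "data"}
    return {"headers": headers, "body": event.get("data")}

def to_batch(events):
    # The batched API falls out of the single-event mapping for free.
    return [to_transport_message(e) for e in events]

msgs = to_batch([{"id": "a", "type": "t", "data": 1},
                 {"id": "b", "type": "t"}])
```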

To keep the discussion focused, can we please stick to the topic of batching for a specific transport - HTTP - and not whether batching should be a first-class-citizen of the spec? Thanks!

@cneijenhuis
Contributor

I think this is actually consistent with what we do for singular CE messages - we don't say anything about how the transport works. I just looked at the HTTP spec and we don't even talk about 200's or 202's or 500's in there - which I think is correct. The webhook spec does, and I think we need to look at that and question why....

This is what I tried to get at with my comment in the last call.

If the webhook spec lived in a separate repository, it'd be much clearer where the CloudEvents scope ends, and it'd be clear that the CloudEvents goal is NOT to provide a blueprint for implementing an eventing system over HTTP.

However, the webhook spec should IMO be enough to implement an eventing system - otherwise I don't see what the value of it is.

@duglin
Collaborator

duglin commented Jul 26, 2019

@cneijenhuis I'm hoping to find time to look at the webhook spec more closely today, or this weekend, but do you have a sense for how it should change (or which parts should be removed) to keep it useful but still within the scope of CE?

@cneijenhuis
Contributor

@duglin I think we can't measure the webhook spec with the same yardstick as "everything" else. It is outside the scope of CE, and this is already acknowledged in the primer:

If required to assure broader interoperability, the CloudEvents specification set provides specific constraints for event delivery using a particular application protocol. The HTTP Webhook specification is not specific to CloudEvents and can be used to post any kind of one-way event and notifications to a conformant HTTP endpoint. However, the lack of such a specification elsewhere makes it necessary for CloudEvents to define it.

Put it another way, in the Design goal section, we state:

The specifications will not focus on the processing model of either the event producer or event consumer.

But a webhook specification that ignores the processing model is incomplete (to put it nicely 😉). Which is why we have to go through the extra justification I quoted above...

My suggestion is to move the webhook spec into a dedicated repository - within the Serverless WG, but outside of CloudEvents (or at least the spec). At that point, we can treat webhooks like any other transport, and can assume that each transport comes with its own processing model and a way of communicating processing states between producers and consumers.
We'd also be free to be a bit more opinionated in the webhook spec, and fully embrace defining a processing model.
And, finally, we can get CloudEvents to 1.0 and evolve the webhook spec independently.

@duglin
Collaborator

duglin commented Aug 1, 2019

@cneijenhuis I took another look at the webhook spec and I agree with you that it really doesn't belong in the CE repo. It doesn't even really deal with CEs at all; it's more of a generic HTTP spec/profile - and in some cases duplicates what's already written in the HTTP specs. If there are some bits of that doc that really need to be mentioned for CE over HTTP, then we should add that text to the HTTP transport spec - but in my quick scan of the doc I couldn't see anything obvious.

As for moving the spec to some other repo... I don't have an opinion on that other than I wonder whether it makes sense to be under the Serverless WG. It's not even really a serverless specific issue. Feels more like an IETF RFC type of doc.

But, before we decide where/if it should go (if anywhere), what do other people think about removing it? Clearly @clemensv will want to chime in next week when he's back from vacation - although removing it before then could make for an interesting surprise. :-)

I'll open a new issue so we can discuss this independently of the batching discussion.


So, back to the issue at hand... batching.

I think the options we have are:
1 - keep batching but define the processing model of the consumer so that the sender gets an accurate status for each event within the batch - e.g. return an array of HTTP status codes in the HTTP case
2 - keep batching but limit it to the shape of the message and assume each transport's normal processing/status model covers things. I believe this is what the spec does today.
3 - remove batching from the spec but add something to the primer saying that people could do batching, but it becomes a business-level problem. Which means the sender needs to batch up the events into data, and the consumer needs to know this and break them up as appropriate as part of the business processing logic - not a transport-level thing
4 - remove batching and say nothing about it

Did I miss an option?
Please add a comment with your POV so we can get a sense for where the group's thinking is on this.
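Option 3 above could look something like the sketch below. Everything here is an assumption for illustration: events are plain dicts, and `com.example.batch` is a hypothetical application-chosen event type, not anything the spec defines.

```python
import uuid

def wrap_batch(events, source):
    """Wrap several application events into the data of one ordinary
    CloudEvent (option 3): the transport sees a single event, and
    splitting the batch back out is the receiver's business logic."""
    return {
        "specversion": "1.0",
        "type": "com.example.batch",  # hypothetical application type
        "source": source,
        "id": str(uuid.uuid4()),
        "datacontenttype": "application/json",
        "data": events,
    }

envelope = wrap_batch([{"id": "a"}, {"id": "b"}], "/my/source")
```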

@rperelma
Contributor

rperelma commented Aug 1, 2019

I vote for #2, keep as-is.

@n3wscott
Member Author

n3wscott commented Aug 5, 2019

The group has discussed this a lot and the end result is to leave processing model out of the CloudEvents spec. We can revisit how to process Batch in the processing model if we get to defining it.

Closing this.
