
What happens when... for JSON Batch Format. #414

Closed
n3wscott opened this issue Apr 5, 2019 · 24 comments · Fixed by #466

@n3wscott
Member

n3wscott commented Apr 5, 2019

I have a few implementation questions around the JSON batch format.

  1. Is it expected that the receiver is delivered the array? Or can an implementation not expose batching and deliver the events one at a time?
  2. What happens when that receiver has an issue with one of n of these events? What should the response code be?
  3. If a valid response is [], is that also a valid request? I.e., should an SDK prevent you from sending []?
@cneijenhuis
Contributor

  1. I'm not sure I'm getting the question correctly. Batching is optional and must only be used if both producer and consumer support it.
    If you have a setup with a middleware, the middleware can add or remove the batching, or change the batch size.

Examples:
Publisher pushes single events every second (no batching). Middleware is a queue and aggregates events. Consumer polls the queue (with batching) every 5 seconds, and the Middleware responds with a batch of 5 events.
Publisher publishes a batch of 5 events. Middleware takes this batch, transforms each event from JSON into Protobuf and sends them one-by-one via grpc.

In other words, the publisher MUST expect that the batch will be broken up. It is just a temporary optimization on one hop, and probably will change on the next hop.

Does that help?
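That point can be sketched in a few lines. This is a minimal illustration, not spec text: events are plain dicts, `handle_event` is a hypothetical per-event callback, and the consumer's logic is identical whether or not the previous hop used batching.

```python
import json

def deliver(body, content_type, handle_event):
    """Dispatch an inbound message to a per-event handler.

    The handler never learns whether the events arrived batched or
    one at a time; batching stays a transport-level optimization
    that any hop may add or remove.
    """
    parsed = json.loads(body)
    if content_type == "application/cloudevents-batch+json":
        events = parsed       # JSON array of event objects
    else:
        events = [parsed]     # single structured-mode event
    for event in events:
        handle_event(event)

received = []
deliver('[{"id": "a"}, {"id": "b"}]',
        "application/cloudevents-batch+json", received.append)
deliver('{"id": "c"}', "application/cloudevents+json", received.append)
```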

  2. Probably 400, but the webhook spec is not very prescriptive in terms of error responses. Do you think it makes sense to specify a response format, where the status is declared per event?

  3. Clemens says "no": HTTP Transport Binding for batching JSON #370 (comment) 😉

@duglin duglin added the v1.0 label Jun 8, 2019
@duglin
Collaborator

duglin commented Jun 13, 2019

@clemensv what was your intention w.r.t. batch responses?

IMO w.r.t. the response if there's an issue with one of the events... I think the answer might be the same as what would happen if the processing of a single event failed halfway through its business logic. Do we roll it all back? Do we try to respond with some partial-success error code? I think the same applies to batching, and the options are:

  • 200 : successfully processed the incoming event or events
  • 202 : successfully accepted the incoming event or events
  • 4xx/5xx : was not able to process the incoming event or events - how far it got is unknown. Whether it rolled back anything is unknown. At least from our spec's perspective, since we don't deal with processing models.

Net: I think we could add text to the primer stating this.

@JemDay
Contributor

JemDay commented Jun 13, 2019

FYI .. I've been really slow adding any language to the webhook spec for this .. sorry.

@cneijenhuis
Contributor

I hereby volunteer to open a small PR, but I'm fine with Scott or Jem driving this as well 😉

@duglin
Collaborator

duglin commented Jun 20, 2019

@cneijenhuis thanks a ton! I know @n3wscott is super swamped these days, so any help would be appreciated.

cneijenhuis added a commit to cneijenhuis/spec that referenced this issue Jun 27, 2019
duglin pushed a commit that referenced this issue Jul 11, 2019: Clarifications on batching to address #414
@n3wscott
Member Author

How about a new proposal for how batching works, the bones would be:

contenttype remains application/cloudevents-batch+json

but the payload changes, from [{<events>},...]

To a structured form:

{"events":[]}

This allows for batch extensions that do not mix with the events themselves.

And the response is changed to something that allows us to understand the acceptance of each message, like:

{
  "result":{
     "<event-id>": {<result object>}
  }
}

And the result object is TBD, but it would have the general properties of "status":"accepted" or "status":"error". This could be transport-dependent.

Needs more thought, but in general, this would solve my issue with batch.


Then I can do this:

  1. Send: A, B, C

  2. Collector turns this into

{"events":[A, B, C]}

  3. Invoker sends A, B, C to a consumer and records the response for each. B fails.

  4. Invoker replies (assuming HTTP) with:

{
	"result": {
		"id-a": {
			"status": "accepted"
		},
		"id-b": {
			"status": "failed",
			"message": "some error"
		},
		"id-c": {
			"status": "accepted"
		}
	}
}
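A sketch of how a receiver might assemble that response body. Note this is n3wscott's proposal, not anything in the spec; `results` is a hypothetical map of event id to outcome.

```python
def batch_response(results):
    """Build the per-event acknowledgement body from the proposal
    above. `results` maps event id -> (ok, error_message)."""
    body = {"result": {}}
    for event_id, (ok, message) in results.items():
        entry = {"status": "accepted" if ok else "failed"}
        if not ok:
            entry["message"] = message
        body["result"][event_id] = entry
    return body

resp = batch_response({"id-a": (True, None),
                       "id-b": (False, "some error"),
                       "id-c": (True, None)})
```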

@rperelma
Contributor

The reason I strongly prefer an array at the top level is that it is then just an array of our existing definition of CloudEvents, vs. introducing a new construct to represent a batch. In other words, I strongly prefer to leave it as-is.

@duglin duglin reopened this Jul 25, 2019
@n3wscott
Member Author

@rperelma how do you deal with ack'ing upstream?

@rperelma
Contributor

@n3wscott We do all or nothing.

@cneijenhuis
Contributor

I think we have two issues, and we don't do ourselves a favor by mixing them:

  • Some prefer a JSON object instead of a JSON array at the top level
  • We don't have a way of ack'ing individual events

For the first, I can open a PR to have a base for the WG to discuss this. I don't have a preference either way, to be honest.

For the second... I initially tried to open a PR for "batching if you use both JSON and HTTP". Clemens rightfully called out that these two should be separate: one could use JSON over UDP, or XML-batching over HTTP.

The problem with your proposal is where I define the response.

Either I define the response in the JSON format. Then I implicitly require that a) any transport that receives a JSON batch will also be able to reply with JSON, and b) any native transport error/status handling must be overridden by what I define in the JSON format.
I think b) is the "killer": How does "status": "failed" map back to a 400 or 500 response?

Alternatively, I can define the response in the HTTP transport, with the "tools" I've got at the transport layer. That means I can't use JSON (because I may have to support e.g. XML in the future), but I could use HTTP headers.
This sounds like a more reasonable option to me. We could also try to reuse the HTTP status codes in the event-specific response. The overall HTTP status code is then the highest HTTP status of any event.
E.g.

HTTP/1.1 400 BadRequest
ce-<event-id-a>: 200
ce-<event-id-b>: 400
ce-<event-id-c>: 200

or

HTTP/1.1 400 BadRequest
ce-status-200: <event-id-a>, <event-id-c>
ce-status-400: <event-id-b>
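The second header shape above could be folded together mechanically. A sketch under the assumptions of that proposal: the `ce-status-<code>` headers are hypothetical (nothing in the spec defines them), and the overall status is taken as the highest per-event code, as suggested.

```python
def batch_status(event_statuses):
    """Fold per-event HTTP status codes into one hypothetical
    ce-status-<code> header per distinct code, plus an overall
    status (the highest per-event code)."""
    grouped = {}
    for event_id, code in event_statuses.items():
        grouped.setdefault(f"ce-status-{code}", []).append(event_id)
    overall = max(event_statuses.values())
    return overall, {k: ", ".join(v) for k, v in grouped.items()}

overall, headers = batch_status({"a": 200, "b": 400, "c": 200})
```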

@n3wscott Does that make sense? Should I try to open a PR for it?

@duglin
Collaborator

duglin commented Jul 26, 2019

For the moment, ignore that we might say too much in our current HTTP specs; I'd prefer not to head down this path. Once you open this door, we're not just hinting about a processing model, we're inventing one. Opening the batching door will soon get people to ask us things like how to indicate that they want processing to stop on the first failure - and other orchestration actions.

I'd like to keep our scope to just "adding metadata to an existing event and its transport". Part of me does think defining batching might have gone too far, because it invites the questions being raised by this issue, but we tried to limit it by saying that we're just defining a format (like our structured single-event format), and not much more.

One way out of this is to be very clear about our scope and "punt" on it. Meaning, say something like:


This specification defines a format for how multiple CloudEvents can appear in a single structure formatted message. However, due to the limited scope of this project, this specification will not address how to process those events nor how the receiver is to indicate the status (or possible response) related to processing those events. That is an implementation detail to be agreed upon out of band between the sender and receiver of those batched events, or by some other specification.


I think this is actually consistent with what we do for singular CE messages - we don't say anything about how the transport works. I just looked at the HTTP spec and we don't even talk about 200's or 202's or 500's in there - which I think is correct. The webhook spec does, and I think we need to look at that and question why....

@alanconway
Contributor

alanconway commented Jul 26, 2019 via email

@lionelvillard
Contributor

From an interoperability perspective, we cannot let applications define their own batching strategies. Event importers need to know which format to use independently of what application is going to receive these events.

@duglin
Collaborator

duglin commented Jul 26, 2019

From an interoperability perspective, we cannot let applications define their own batching strategies. Event importers need to know which format to use independently of what application is going to receive these events.

This is why I was ok with us defining the format - I wanted syntax interop because I feared each impl might define its own way of encoding it. I'm just not sure we can (or should) define the semantic processing side of it beyond saying "your normal transport/processing rules apply". At least not w/o another year of work :-) and a much broader scope (IMO).

@n3wscott
Member Author

Batching could have been implemented as an extension, with the batched events housed in an array inside data, like Alan says.

I think we should talk about removing batch for v1.

@lionelvillard
Contributor

If it's done as an extension, what happens when, let's say, an importer supports the batching extension but the consumer does not?

@cneijenhuis
Contributor

I'm also wary of adding batching to the spec.

@alanconway That's great, because we already decided we won't add batching to the spec itself, for the reason you've outlined. See last section of https://github.com/cloudevents/spec/blob/master/primer.md#design-goals 😉

However, we can't NOT support batching if some transports already natively support it. Note that in all transports with batching that I have ever seen, a batch carries no semantic meaning and the boundaries are arbitrary - it is just an optimization at the transport layer, because it is often faster to transport a batch of events.

And for many transports, once you define how to map a single CloudEvent into their proprietary format, you can automatically use both the batched and the non-batched APIs. E.g. Kafka, SQS, Pub/Sub.
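A sketch of that point, assuming a hypothetical transport whose messages are plain header/body pairs (not any real Kafka/SQS/Pub-Sub API): once the single-event mapping exists, the batch mapping is just its element-wise application.

```python
def to_transport_message(event):
    """Map one CloudEvent (as a dict) to a hypothetical transport
    message: context attributes become headers, data becomes the body."""
    headers = {f"ce-{k}": str(v) for k, v in event.items() if k != "data"}
    return {"headers": headers, "body": event.get("data")}

def to_batch(events):
    # The batched API falls out of the single-event mapping for free.
    return [to_transport_message(e) for e in events]

msgs = to_batch([{"id": "a", "type": "t", "data": 1},
                 {"id": "b", "type": "t"}])
```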

To keep the discussion focused, can we please stick to the topic of batching for a specific transport - HTTP - and not whether batching should be a first-class-citizen of the spec? Thanks!

@cneijenhuis
Contributor

I think this is actually consistent with what we do for singular CE messages - we don't say anything about how the transport works. I just looked at the HTTP spec and we don't even talk about 200's or 202's or 500's in there - which I think is correct. The webhook spec does, and I think we need to look at that and question why....

This is what I tried to get at with my comment in the last call.

If the webhook spec lived in a separate repository, it'd be much clearer where the CloudEvents scope ends, and it'd be clear that the CloudEvents goal is NOT to provide a blueprint for implementing an eventing system over HTTP.

However, the webhook spec should IMO be enough to implement an eventing system - otherwise I don't see what the value of it is.

@duglin
Collaborator

duglin commented Jul 26, 2019

@cneijenhuis I'm hoping to find time to look at the webhook spec more closely today, or this weekend, but do you have a sense for how it should change (or which parts should be removed) to keep it useful but still within the scope of CE?

@cneijenhuis
Contributor

@duglin I think we can't measure the webhook spec with the same yardstick as "everything" else. It is outside the scope of CE, and this is already acknowledged in the primer:

If required to assure broader interoperability, the CloudEvents specification set provides specific constraints for event delivery using a particular application protocol. The HTTP Webhook specification is not specific to CloudEvents and can be used to post any kind of one-way event and notifications to a conformant HTTP endpoint. However, the lack of such a specification elsewhere makes it necessary for CloudEvents to define it.

Put it another way, in the Design goal section, we state:

The specifications will not focus on the processing model of either the event producer or event consumer.

But a webhook specification that ignores the processing model is incomplete (to put it nicely 😉). Which is why we have to go through the extra justification I quoted above...

My suggestion is to move the webhook spec into a dedicated repository - within the Serverless WG, but outside of CloudEvents (or at least the spec). At that point, we can treat webhooks like any other transport, and can assume that each transport comes with its own processing model and a way of communicating processing states between producers and consumers.
We'd also be free to be a bit more opinionated in the webhook spec, and fully embrace defining a processing model.
And, finally, we can get CloudEvents to 1.0 and evolve the webhook spec independently.

@duglin
Collaborator

duglin commented Aug 1, 2019

@cneijenhuis I took another look at the webhook spec and I agree with you that it really doesn't belong in the CE repo. It doesn't even really deal with CEs at all; it's more of a generic HTTP spec/profile - and in some cases duplicates what's already written in the HTTP specs. If there are some bits of that doc that really need to be mentioned for CE over HTTP, then we should add that text to the HTTP transport spec - but in my quick scan of the doc I couldn't see anything obvious.

As for moving the spec to some other repo... I don't have an opinion on that other than I wonder whether it makes sense to be under the Serverless WG. It's not even really a serverless specific issue. Feels more like an IETF RFC type of doc.

But, before we decide where/if it should go (if anywhere), what do other people think about removing it? Clearly @clemensv will want to chime in next week when he's back from vacation - although removing it before then could make for an interesting surprise. :-)

I'll open a new issue so we can discuss this independently of the batching discussion.


So, back to the issue at hand... batching.

I think the options we have are:
1 - keep batching but define the processing model of the consumer so that the sender gets an accurate status for each event within the batch - e.g. return an array of HTTP status codes in the HTTP case
2 - keep batching but limit it to the shape of the message and assume each transport's normal processing/status model covers things. I believe this is what the spec does today.
3 - remove batching from the spec but add something to the primer saying that people could do batching, but it becomes a business-level problem. Which means the sender needs to batch up the events into data, and the consumer needs to know this and break them up as appropriate as part of the business processing logic - not a transport-level thing
4 - remove batching and say nothing about it

Did I miss an option?
Please add a comment with your POV so we can get a sense for where the group's thinking is on this.
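Option 3 above could look something like the sketch below. Everything here is an assumption for illustration: events are plain dicts, and `com.example.batch` is a hypothetical application-chosen event type, not anything the spec defines.

```python
import uuid

def wrap_batch(events, source):
    """Wrap several application events into the data of one ordinary
    CloudEvent (option 3): the transport sees a single event, and
    splitting the batch back out is the receiver's business logic."""
    return {
        "specversion": "1.0",
        "type": "com.example.batch",  # hypothetical application type
        "source": source,
        "id": str(uuid.uuid4()),
        "datacontenttype": "application/json",
        "data": events,
    }

envelope = wrap_batch([{"id": "a"}, {"id": "b"}], "/my/source")
```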

@rperelma
Contributor

rperelma commented Aug 1, 2019

I vote for #2, keep as-is.

@n3wscott
Member Author

n3wscott commented Aug 5, 2019

The group has discussed this a lot and the end result is to leave processing model out of the CloudEvents spec. We can revisit how to process Batch in the processing model if we get to defining it.

Closing this.
