Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update (rewrite) overview, motivation, and data plane #24

Closed
wants to merge 17 commits into from

Conversation

evankanderson
Copy link
Member

I'm afraid these ended up being mostly-complete rewrites.

I plan to merge broker.md, channel.md, spec.md, and interfaces.md into a control-plane.md document, which will address both lifecycle and resource interactions as well as event routing semantics.

My current plan is probably to remove helper.md and sources.md, as the former seems like implementation-specific guidance, and the latter feels like it might be better in a 1.1 document along with the Discovery API for sources. I'm happy to be convinced otherwise, but conformance testing for sources feels somewhat uncharted.

@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label May 6, 2021
@knative-prow-robot knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 6, 2021
@evankanderson evankanderson force-pushed the re-spect-part-1 branch 3 times, most recently from 1e38d41 to 2d25faf Compare May 6, 2021 18:49
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
If a recipient chooses to reply to a sender with a `200` response code and a
reply event in the absence of a `Prefer: reply` header, the sender SHOULD treat
the event as accepted, and MAY log an error about the unexpected payload. The
sender MUST NOT process the reply event if it did not advertise the `Prefer:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this section brings the Prefer: reply to a mandatory and I thought before it was more of a "you can provide this, but the sink is free to ignore it, but if you say "Prefer: " and get a reply, you are free to drop the event without warning.

This language makes the header mandatory, and it is not used anywhere yet, so it would be an API change for eventing at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this attempts to make the expected behavior for e.g. a receiving function explicit.

I'm willing to change this, but the current formulation implemented in eventing makes it difficult for a function to know if the reply might be dropped, which seems to go against the goal of avoiding message loss.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I didn't change this yet, because I'd like to discuss whether it makes more sense to change the contract or the implementation.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, here's some more background on this.
knative/eventing#4560

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the background, and the concern that adding MUST language around this header would make previous versions of Knative incompatible with the 1.0 spec.

knative/eventing#4560 (comment)

I'm wondering if we have a requirement that versions of Knative prior to the 1.0 spec MUST be compliant? If so, what is our boundary?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this functionality is not written in the reference implementation, I would be inclined to not make it a MUST.

The original PR was using SHOULD and MAY, no MUST NOT nor MUST

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original PR did not include any MUST clauses.

Unfortunately, I have a conflict with the eventing meeting today (interview); is there a way you could see this becoming a "MUST", or do you think it always needs to be a MAY/SHOULD?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it as always a MAY/SHOULD for 1.0

@n3wscott
Copy link

n3wscott commented May 7, 2021

Relates to knative/eventing#4595

Copy link
Contributor

@vaikas vaikas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
If a recipient chooses to reply to a sender with a `200` response code and a
reply event in the absence of a `Prefer: reply` header, the sender SHOULD treat
the event as accepted, and MAY log an error about the unexpected payload. The
sender MUST NOT process the reply event if it did not advertise the `Prefer:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, here's some more background on this.
knative/eventing#4560

specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
Copy link
Member Author

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I think I've address most of the issues, and I re-ordered the overview to put the interfaces first.

specs/eventing/overview.md Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
@bsnchan
Copy link

bsnchan commented May 12, 2021

Just wanted to say THANK YOU for doing the hard to write this up @evankanderson

@evankanderson
Copy link
Member Author

/assign @duglin

specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
Comment on lines 44 to 47
The current version of this document does not describe protocol negotiation or
the ability to upgrade an HTTP 1.1 event delivery into a more efficient protocol
such as GRPC, AMQP, or the like. It is expected that a future compatible version
of this specification might describe a protocol negotiation mechanism.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should remove this sentence, It doesn't make sense to me to describe the spec future inside the spec itself. Maybe we can have such claims somewhere else, but not in the spec.

Maybe just keep:

The current version of this document does not describe protocol negotiation or the ability to upgrade an HTTP 1.1 event delivery into another protocol.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this so that readers of the specification would be aware when building compatible implementations. Since we don't have a separate version number, it's possible that an implementation built against this specification might be run against a future specification which would offer extra fields. The (older) implementation should not reject the event because of additional fields that it doesn't understand.

(For example, if the Upgrade header is used, the older implementation should just ignore the protocol upgrade offer.)

Copy link

@slinkydeveloper slinkydeveloper May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, but what you're talking about is a concern of the future spec, not of the existing one: when we're going to add such mechanisms, we need to make sure they'll be backwards compatible, in the sense that old implementations should still work properly.

On the other hand, this sentence sounds wrong, because we're saying in the spec what the future of the spec might look like, which is just going to confuse the reader that needs to implement the spec today. And also, what if we're never going to keep this promise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important for implementers to realize that there may be future-compatible extensions. How about the following language:

The current version of this document does not describe protocol negotiation or any delivery mechanism other than HTTP 1.1. Future versions may define protocol negotiation to optimize delivery; compliant implementations SHOULD aim to interoperate by ignoring unrecognized negotiation options (such as HTTP Upgrade headers).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording you proposed is definitely better, but it still doesn't sound right to me writing such "future proof" sentence inside the spec. I prefer just keeping the first sentence of this new wording, but it's ok if you want to have all the text block.

For me from Future versions onward should go in a doc like a primer: #24 (comment)

specs/eventing/data-plane.md Outdated Show resolved Hide resolved
Comment on lines +77 to +82
Senders MAY probe the recipient with an
[HTTP OPTIONS request](https://tools.ietf.org/html/rfc7231#section-4.3.7); if
implemented, the recipient MUST indicate support for the POST verb using the
[`Allow` header](https://tools.ietf.org/html/rfc7231#section-7.4.1). Senders
which receive an error when probing with HTTP OPTIONS SHOULD proceed using the
HTTP POST mechanism.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually implemented by somebody? At least in our ref implementation, it's not implemented at all. Can you remove it now and maybe we add it later?

The reason I ask is that there is another CE spec, called Webhook, which I believe overlaps with this sentence knative/eventing#3092

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, I found it in one of the existing eventing docs. This seems to be compatible with https://github.com/cloudevents/spec/blob/master/http-webhook.md.

The CE WebHook specification provides the following for webhook authorization:

Any system that allows registration of and delivery of notifications to arbitrary HTTP endpoints ...

  • Sender sends an HTTP OPTIONS request with Webhook-Request-Origin (which should match the Origin header sent on each event delivery) and optionally Webhook-Request-Callback and Webhook-Request-Rate.
  • Responder replies with Allow: POST and WebHook-Allowed-Origin and preferably WebHook-Allowed-Rate (required if WebHook-Request-Rate is set). If the responder needs to do some manual setup and Webhook-Request-Callback is provided, the responder can respond without headers and then use the callback URL later to grant access.

Since this is "sender MAY" and "if implemented, recipient MUST respond with Allow: POST, I think that lines up. The "receive an error when probing ... SHOULD" allows for the webhook authorization behavior, but doesn't require it to be implemented.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this is implemented, neither supported in any way right now... So I prefer to remove it now and add it later, when we'll have discussed and implemented it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. If the spec is now rewritten in a way mandating certain implementation work, that's not there, is IMO not flying well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting; it appears that the current Brokers are not compliant with https://datatracker.ietf.org/doc/html/rfc7231#section-7.4.1

The actual set of allowed methods is defined by the origin server at
the time of each request. An origin server MUST generate an Allow
field in a 405 (Method Not Allowed) response and MAY do so in any
other response. An empty Allow field value indicates that the
resource allows no methods, which might occur in a 405 response if
the resource has been temporarily disabled by configuration.

curl -vv -X OPTIONS -H "Host: broker-ingress.knative-eventing.svc.cluster.local" http://127.0.0.1:8090/default/example-broker
*   Trying 127.0.0.1:8090...
* Connected to 127.0.0.1 (127.0.0.1) port 8090 (#0)
> OPTIONS /default/example-broker HTTP/1.1
> Host: broker-ingress.knative-eventing.svc.cluster.local
> User-Agent: curl/7.71.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 405 Method Not Allowed
< Date: Thu, 20 May 2021 18:29:47 GMT
< Content-Length: 0
< 

In any case, the current Broker implementation matches this clause: my sender sent an HTTP OPTIONS, and the recipient did not support HTTP OPTIONS, so I proceed with POST delivery.

specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
Comment on lines 111 to 112
be retried? What about `405` (Method Not Allowed), 413 (Payload Too Large), 414
(URI Too Long), 426 (Upgrade Required), 431 (Header Fields Too Large), 451
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was working off of the following:

https://github.com/knative/specs/blob/main/specs/eventing/data-plane.md#http-support

Which specified 200/202, 404, and 400 (unparseable CloudEvent). This is somewhat contradicted by:

https://github.com/knative/specs/blob/main/specs/eventing/data-plane.md#error-handling

If Sink is not returning HTTP success header (200 or 202) then the event may be sent again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to default to 4xx = no retry, should any of following codes be retried?

  • 401 (Unauthorized)
  • 403 (Forbidden)
  • 404 (Not Found)
  • 405 (Method Not Allowed)
  • 408 (Request Timeout)
  • 412 (Precondition Failed)
  • 413 (Payload Too Large)
  • 414 (URI Too Long)
  • 421 (Misdirected Request)
  • 426 (Upgrade Required)
  • 431 (Request Header Fields Too Large)

Currently, the spec specifies:

If an HTTP request's URL does not correspond to an existing endpoint, then the Sink MUST respond with 404 Not Found.

I interpreted this to mean that 404s were retriable, but the current spec says that every error "may be sent again" (non-spec language), which doesn't really line up with the code implementation.

I've adjusted this to only retry the three codes mentioned and all other 400s are non-retriable, though the 404 case seems strange to me, and I'd love to get more details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure on all the "Too Long" things,

Also not sure on 426, as that indicates the current protocol is refused. I doubt we do upgrade...

Copy link
Member Author

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews!

I've updated things again; one semantic change:

  • 4xx responses default to not-retriable except for 404, 409, and 429.

specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
Comment on lines 44 to 47
The current version of this document does not describe protocol negotiation or
the ability to upgrade an HTTP 1.1 event delivery into a more efficient protocol
such as GRPC, AMQP, or the like. It is expected that a future compatible version
of this specification might describe a protocol negotiation mechanism.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this so that readers of the specification would be aware when building compatible implementations. Since we don't have a separate version number, it's possible that an implementation built against this specification might be run against a future specification which would offer extra fields. The (older) implementation should not reject the event because of additional fields that it doesn't understand.

(For example, if the Upgrade header is used, the older implementation should just ignore the protocol upgrade offer.)

Comment on lines +77 to +82
Senders MAY probe the recipient with an
[HTTP OPTIONS request](https://tools.ietf.org/html/rfc7231#section-4.3.7); if
implemented, the recipient MUST indicate support for the POST verb using the
[`Allow` header](https://tools.ietf.org/html/rfc7231#section-7.4.1). Senders
which receive an error when probing with HTTP OPTIONS SHOULD proceed using the
HTTP POST mechanism.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, I found it in one of the existing eventing docs. This seems to be compatible with https://github.com/cloudevents/spec/blob/master/http-webhook.md.

The CE WebHook specification provides the following for webhook authorization:

Any system that allows registration of and delivery of notifications to arbitrary HTTP endpoints ...

  • Sender sends an HTTP OPTIONS request with Webhook-Request-Origin (which should match the Origin header sent on each event delivery) and optionally Webhook-Request-Callback and Webhook-Request-Rate.
  • Responder replies with Allow: POST and WebHook-Allowed-Origin and preferably WebHook-Allowed-Rate (required if WebHook-Request-Rate is set). If the responder needs to do some manual setup and Webhook-Request-Callback is provided, the responder can respond without headers and then use the callback URL later to grant access.

Since this is "sender MAY" and "if implemented, recipient MUST respond with Allow: POST, I think that lines up. The "receive an error when probing ... SHOULD" allows for the webhook authorization behavior, but doesn't require it to be implemented.

specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/overview.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
Copy link
Member Author

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorporated feedback.

My intent is not to add new requirements except the Prefer: Reply header, which I'm willing to send a PR for next week. I think event recipients knowing whether or not a reply event is going to the bit-bucket is important enough to implement for the next release.

It would also be nice for Brokers and Channels to be RFC 7231 (HTTP/1.1) compliant with respect to the Allow header, but I think the spec language doesn't require that except indirectly.

Comment on lines 44 to 47
The current version of this document does not describe protocol negotiation or
the ability to upgrade an HTTP 1.1 event delivery into a more efficient protocol
such as GRPC, AMQP, or the like. It is expected that a future compatible version
of this specification might describe a protocol negotiation mechanism.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important for implementers to realize that there may be future-compatible extensions. How about the following language:

The current version of this document does not describe protocol negotiation or any delivery mechanism other than HTTP 1.1. Future versions may define protocol negotiation to optimize delivery; compliant implementations SHOULD aim to interoperate by ignoring unrecognized negotiation options (such as HTTP Upgrade headers).

Comment on lines +77 to +82
Senders MAY probe the recipient with an
[HTTP OPTIONS request](https://tools.ietf.org/html/rfc7231#section-4.3.7); if
implemented, the recipient MUST indicate support for the POST verb using the
[`Allow` header](https://tools.ietf.org/html/rfc7231#section-7.4.1). Senders
which receive an error when probing with HTTP OPTIONS SHOULD proceed using the
HTTP POST mechanism.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting; it appears that the current Brokers are not compliant with https://datatracker.ietf.org/doc/html/rfc7231#section-7.4.1

The actual set of allowed methods is defined by the origin server at
the time of each request. An origin server MUST generate an Allow
field in a 405 (Method Not Allowed) response and MAY do so in any
other response. An empty Allow field value indicates that the
resource allows no methods, which might occur in a 405 response if
the resource has been temporarily disabled by configuration.

curl -vv -X OPTIONS -H "Host: broker-ingress.knative-eventing.svc.cluster.local" http://127.0.0.1:8090/default/example-broker
*   Trying 127.0.0.1:8090...
* Connected to 127.0.0.1 (127.0.0.1) port 8090 (#0)
> OPTIONS /default/example-broker HTTP/1.1
> Host: broker-ingress.knative-eventing.svc.cluster.local
> User-Agent: curl/7.71.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 405 Method Not Allowed
< Date: Thu, 20 May 2021 18:29:47 GMT
< Content-Length: 0
< 

In any case, the current Broker implementation matches this clause: my sender sent an HTTP OPTIONS, and the recipient did not support HTTP OPTIONS, so I proceed with POST delivery.

specs/eventing/data-plane.md Outdated Show resolved Hide resolved
Comment on lines 117 to 121
in return to the sender. Event recipients MUST use the
[`source` and `id` attributes](https://github.com/cloudevents/spec/blob/v1.0.1/spec.md#required-attributes)
to determine duplicate events (see [observability](#observability) for an
example case where other event attributes may vary from one delivery attempt to
another).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Ville said -- CloudEvents requires that each source ensure that id is globally-for-all-time unique for events emitted by that source.

If a destination system such as a Knative Service can't handle the events idempotently, then it may need something like a lookaside cache (e.g. Redis) where it can track the events it has already processed. This isn't built into Knative serving, and is a state store that's outside of Knative. That's hardly unique to Knative; AWS Lambda has similar requirements and solutions, 2, 3.

I'm happy to rewrite this section to make the two points clearer -- would the following help?

As specified in the CloudEvents specification, event recipients MUST use the source and id attributes to determine duplicate events if needed. This specification does not describe state requirements for clients which need to detect duplicate events. (See observability for anexample case where other event attributes may vary from one delivery attempt toanother)

Comment on lines 135 to 136
[`traceparent` and `tracestate` distributed tracing attributes](https://github.com/cloudevents/spec/blob/v1.0/extensions/distributed-tracing.md)
may be modified in this way for each delivery attempt of the same event.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I pointed out that was a mess a while ago; I can also link directly to the w3c in case a future version of the CE spec removes the tracecontext content.

specs/eventing/overview.md Outdated Show resolved Hide resolved
Comment on lines 117 to 121
in return to the sender. Event recipients MUST use the
[`source` and `id` attributes](https://github.com/cloudevents/spec/blob/v1.0.1/spec.md#required-attributes)
to determine duplicate events (see [observability](#observability) for an
example case where other event attributes may vary from one delivery attempt to
another).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added a note that idempotency was a reasonable way to deal with duplicate events.

| other `2xx` | (Unspecified) | No\* | No\* | Yes\* |
| other `3xx` | (Unspecified) | No\* | No\* | Yes\* |
| `400` | Unparsable event | No | No | Yes |
| `404` | Endpoint does not exist | Yes | No | Yes |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a Yes in the Retry column means that the sender MUST retry or that it MAY retry if it wants. I'm not sure I would want to mandate a Retry on a 404.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or on any error TBH

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a SHOULD follow the sender's defined retry behavior, so this is a "MAY".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L134 specifies this as a "SHOULD":

Where possible, event senders SHOULD re-attempt delivery of events where the HTTP request failed or returned a retryable status code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still uncomfortable with a SHOULD for a 404 and 5xx errors. I can buy an argument that 429 might be retried, but the others (and probably 409) don't give me any reason to believe that a subsequent replay of the event would result in any different result. Maybe 404 if you believe the sink is still coming up I guess, but all of the others are just asking to be DDOSs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a MAY if the sender has reason to believe that a retry might yield different results?

[observability](#observability) for an example case where other event attributes
may vary from one delivery attempt to another).

Where possible, event senders SHOULD re-attempt delivery of events where the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think SHOULD may be too strong here. In serving we specifically stopped the networking layer from retrying on 5xx errors because it ended up spamming the ksvc when the ksvc was correctly returning a 503. Even with a MAY I'd still like to see some text that talks about how the sender needs to be selective about it to avoid DDOSing the sink.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the mention of congestion control in the next sentence sufficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good thing to mention, but it still pushes people towards resending... just slower. I keep having flashbacks to wasting my time debugging a ksvc trying to figure out why I was seeing 3 times the errors in its logs... and it was because of the network retries. I think it's sufficient to talk about how it's ok to retry if the sender thinks a new result will happen, but not just because it failed with a 5xx

event senders implement some form of congestion control (such as exponential
backoff) when managing retry timing. This specification does not document any
specific congestion control algorithm or parameters.
[Brokers](./overview.md#broker) and [Channels](./overview.md#channel) MUST
Copy link
Contributor

@duglin duglin May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the previous comment, MUST may be too strong here. I don't want my broker/channel to spam my ksvc if the error returned by the ksvc is an error and it's not going to change no matter how many times it is called.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to lean on congestion control a bit here to avoid overwhelming the destination, possibly including discarding some messages?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added:

Congestion control MAY cause event delivery to fail or retry attempts to be skipped

(Delivery failure would trigger dead-letter behavior, see #25 for details of that.)


In some applications, an event receiver might emit an event in reaction to a
received event. Components MAY to choose to support this pattern by accepting an
encoded CloudEvent in the HTTP response. The sender SHOULD NOT assume that a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the SHOULD NOT here is very interesting when paired with the "might emit an event in reaction to a received event". The sentence above sure implies a relationship to me. So, if we're explicitly saying that it could be a totally unrelated event how can components like a Sequence provide any guarantees that it'll work properly? Can you explain why this SHOULD NOT is needed? Seems like we're opening the door for people to ask "ok, if they can be unrelated then how can I indicate that it IS related? Or how do I send a related response later?"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from here:
https://github.com/knative/specs/blob/main/specs/eventing/data-plane.md#http-support

If Sink is Callable it MAY respond with 200 OK and a single event in the HTTP response. A returned event is not required to be related to the received event. The Callable should return a successful response if the event was processed successfully. If there is no event to send back then Callable Sink MUST respond with 2xx HTTP and with empty body.

TBH, I'm not sure why that's here, and I'd be happy to remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n3wscott @slinkydeveloper @vaikas do you know why this is here? W/o a clear reason for why it could be a random event, I'd prefer to remove this and then leave the implication be that it's related. That leave it open for us to "clarify" it later by adding text to say that it can be random or mandate that it be related once we get more feedback.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 with removing it. I don't know why that sentence is here.

reply event in the absence of a `Prefer: reply` header, the sender SHOULD treat
the event as accepted, and MAY log an error about the unexpected payload. The
sender MUST NOT process the reply event if it did not advertise the
`Prefer: reply` capability.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can mandate what a sender does w.r.t. the reply at all since we can't verify it. Or put another way, we can only control the interactions between the sender and receiver, not the sender's actions after that interaction.

Likewise, it feels wrong to push people to treat it as an error to get a response w/o the Prefer header. There are going to be lots of sinks that have no idea about this header and to encourage the sender to treat them all as errors doesn't feel right. Especially when they can't actually do anything about it. Granted it's a MAY but it feels like we're encouraging it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a header that the sender controls. I've attempted to clarify. This is saying "if a sender does not set Prefer: reply in the request and receives a response, it should not process the response". I'm not sure how a sender would reach that situation by accident (except that our existing Channels / Broker didn't implement the extension we described, but I'm going to PR that shortly).

This also provides a sink the ability to determine whether they are on the subscriber or reply leg of a Subscription; subscriber should advertise Prefer: reply, while reply doesn't have a place to handle the reply and shouldn't send that header.

In terms of determining the MUST for Brokers and Channels, we can actually determine what it does with the response message, because processing the event means re-enqueing it in the Broker or sending it to the reply Addressable. We can have the subscriber target refuse to send an event (or have it log an error or what-not) if the delivered event doesn't have the expected Prefer: reply header.

1. Services can be connected to create new applications
- without modifying producer or consumer.
1. Services can be connected to create new applications:
- without modifying producer or consumer
Copy link
Contributor

@duglin duglin May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use common terms... introducing "consume" in this doc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you're suggesting here; in the data plane doc, receiver is being used specifically to refer to the "origin server" side of HTTP, whereas this is intended to speak about the logical connections and flow of events. (All of motivation should use the produce/consume terminology, I think.)


1. Services can be connected to create new applications
- without modifying producer or consumer.
1. Services can be connected to create new applications:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what this one is trying to say. Do you mean we can add new sinks into the eventing infra/mesh w/o impacting existing producers and consumers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, this is saying that new event sources and sinks can be connected to the topology without needing to change any code in the producers or consumers.

Compare this with explicitly configuring exchanges in AMQP or topics in Kafka, where consuming from two exchanges or topics would required updating the code to create two subscriptions or consumer groups.

- [Resource type overview](overview.md)
- [Interface contracts](interfaces.md)
- [Object model specification](spec.md)
Knative Eventing also defines patterns to simplify the construction and usage of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have an example of what you had in mind with this last sentence?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Destination/Addressable and the high availability of Channels and Brokers are the two benefits to the event producer side. For the event consumer side, I think the reply pattern and retry policies along with CloudEvents-over-HTTP are the main contributors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm willing to cut this sentence if it doesn't feel like it belongs, though.


### Addressable

**Addressable** resources expose a resource address (HTTP URL) in their `status`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/address/address/ to make it clear that's the name of the property. And we should show an example to make it even more clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


### Destination

**Destination** is an interface (resource fragment) which is used consistently
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you call it an "interface", but Addressable is a "resource"... it might be better to use consistent terms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a discussion with @slinkydeveloper about this:

I think Addressable is a contract, because you can access an object (via the K8s apiserver), and process it based on unpacking the partial schema.

You can't do this for a Destination, because Destination is not rooted at the base of the object. Instead, you'll find Destinations in several different places in spec, but they all work the same. I wanted to distinguish between duck typing and "common fragments like ObjectReference", because common fragments don't have the same ability to work across variant Kinds.

Destination is a different type of thing than Addressable (which is a duck type). This is more akin to a K8s ObjectReference -- a common type used in many places, but which can't be operated on abstractly in the same way as a duck type.

simple URL or relative to an **Addressable** object reference; it also supports
a Kubernetes Service object reference (as a special case). An absolute URL in a
Destination may be used to reference cluster-external resources such as a
virtual machine or SaaS service.
Copy link
Contributor

@duglin duglin May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we show exactly what the resource definitions look like and make them normative? This applies to all resources define here. Having just textual descriptions leaves too much to the imagination.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


- Topology-based event routing ([`messaging.knative.dev`](#messaging))

- Content-based event routing ([`eventing.knative.dev`](#eventing))
Copy link
Contributor

@duglin duglin May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Knowing the history here I understand why you wrote this, but w/o more context (or explaining these concepts and how they related to one another) this is hard for a newbie to follow or understand why you're mentioning this. These two terms are never used again. I'd suggest we either remove this section and just talk about the individual resources below (including how they related to each other), or go into a lot more detail about the meta-model so people understand it better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at #25 (which includes this PR), you'll see that these are referenced there: https://github.com/knative/specs/blob/99bb2e0cb0e33f36dfd3a4475ebf08668fbca2f3/specs/eventing/control-plane.md#content-based-routing. I broke the PR up so people could get started and didn't need to digest all 1500 lines at once.

The Knative Eventing API provides primitives for two common event-processing
patterns:

- Topology-based event routing ([`messaging.knative.dev`](#messaging))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these *.knative.dev namespaces significant from a spec perspective?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because the Kubernetes apiserver will serve these from:

/apis/messaging.knative.dev/v1/namespaces/{namespace}/channels/{name}

for Channels, for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - then I think we need to add some RFC keywords in here to mandate that these be used to ensure interop.

may reference additional implementation-specific configuration options via a
reference to an external object (either a kubernetes built-in like ConfigMap or
a custom object); the format of the external objects is intentionally not
standardized.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing normative in here so what are people meant to do with this section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended to be something like the CE primer, i.e. to provide background on the intent. Compare with https://github.com/knative/specs/blob/main/specs/serving/overview.md

Copy link
Member Author

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duglin I'm going to resolve comments where I made a change that mostly lines up with your suggestions, and ask you to ack or request further changes on the remaining comments that aren't resolved.

Thanks for the review! It definitely helped make this part clearer and stronger.


A **Sink** MUST be able to handle duplicate events.
- Knative Eventing assumes a sender-driven (push) event delivery system. That
is, each event processor is actively responsible for an event until it is
Copy link
Member Author

@evankanderson evankanderson May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, it actually means:

Each sender is responsible for an event until it receives an acknowledgement from the receiver. Each receiver MUST NOT send an acknowledgement for an event until it is ready to take responsibility for the event.

That's a little longer than "each event processor is actively responsible for an event", but maybe it's clearer?

A **Sink** MUST be able to handle duplicate events.
- Knative Eventing assumes a sender-driven (push) event delivery system. That
is, each event processor is actively responsible for an event until it is
handled (or affirmatively delivered to all following recipients).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous comment. It's like "hot potato" -- you're responsible for not dropping the potato until you hand it to someone else.

[version 0.3](https://github.com/cloudevents/spec/blob/v0.3/http-transport-binding.md)
with restrictions and extensions specified below.
- Knative Eventing aims to make writing
[event sources](./overview.md#event-source) and event-processing software
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a few different roles:

  • Event Sender - the party initiating the HTTP POST in this document
  • Event Receiver - the party receiving the HTTP POST in this document
  • Event Processor - a component which may be an "event sender", "event receiver", or both
  • Source - an event processor which is only ever an "event sender"
  • Sink - an event processor which is only ever an "event receiver"

Would a glossary of these terms near the top help?

I also notice that this says "make writing event sources and ... easier to write". I took out the first "writing"

specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Outdated Show resolved Hide resolved
- [Resource type overview](overview.md)
- [Interface contracts](interfaces.md)
- [Object model specification](spec.md)
Knative Eventing also defines patterns to simplify the construction and usage of
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm willing to cut this sentence if it doesn't feel like it belongs, though.


- Topology-based event routing ([`messaging.knative.dev`](#messaging))

- Content-based event routing ([`eventing.knative.dev`](#eventing))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at #25 (which includes this PR), you'll see that these are referenced there: https://github.com/knative/specs/blob/99bb2e0cb0e33f36dfd3a4475ebf08668fbca2f3/specs/eventing/control-plane.md#content-based-routing. I broke the PR up so people could get started and didn't need to digest all 1500 lines at once.


### Addressable

**Addressable** resources expose a resource address (HTTP URL) in their `status`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


### Destination

**Destination** is an interface (resource fragment) which is used consistently
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a discussion with @slinkydeveloper about this:

I think Addressable is a contract, because you can access an object (via the K8s apiserver), and process it based on unpacking the partial schema.

You can't do this for a Destination, because Destination is not rooted at the base of the object. Instead, you'll find Destinations in several different places in spec, but they all work the same. I wanted to distinguish between duck typing and "common fragments like ObjectReference", because common fragments don't have the same ability to work across variant Kinds.

Destination is a different type of thing than Addressable (which is a duck type). This is more akin to a K8s ObjectReference -- a common type used in many places, but which can't be operated on abstractly in the same way as a duck type.

simple URL or relative to an **Addressable** object reference; it also supports
a Kubernetes Service object reference (as a special case). An absolute URL in a
Destination may be used to reference cluster-external resources such as a
virtual machine or SaaS service.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| `400` | Unparsable event | No | No | Yes |
| `404` | Endpoint does not exist | Yes | No | Yes |
| `409` | Conflict / Processing in progress | Yes | No | Yes |
| `429` | Too Many Requests / Overloaded | Yes | No | Yes |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec should explicitly reference https://datatracker.ietf.org/doc/html/rfc7231#section-7.1.3 as stated here: knative-extensions/eventing-kafka#666 (comment)

The CloudEvent Webhook spec says:

... The sender MUST observe the value of the Retry-After header and refrain from sending further requests until the indicated time.

With a proper reference to 7231 we should be good, right?

@evankanderson
Copy link
Member Author

/close

We're going to merge #25 and close this.

@knative-prow-robot
Copy link

@evankanderson: Closed this PR.

In response to this:

/close

We're going to merge #25 and close this.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

odacremolbap pushed a commit to odacremolbap/specs that referenced this pull request Jun 19, 2022
Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Indicates the PR's author has signed the CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.