[CloudEvents] CloudEvents for event bus events #77
Right, I proposed using CloudEvents because there was no sense in reinventing the wheel, but I don't know what validation would be necessary to ensure a fully correct implementation. I don't think it's the most pressing concern; validation would be nice if it's cheap to implement, but I wouldn't prioritize it over the other work needed to get a functioning system solving real use cases end to end. This is based on my assumption that the spec is small and flexible enough that it won't require overly invasive changes, even if we do end up having to make some fixes to our output later.
I generally agree with both of you here. Follow CloudEvents as best we can, but given that we'll have our own well-defined schema validators for each event type, which are more precise, we should use those for validation where possible instead.
Thanks @ormsbee and @feanil. To add some additional clarity, here are the required attributes according to the spec: https://github.com/cloudevents/spec/blob/v1.0.2/cloudevents/spec.md#required-attributes. I am guessing we don't have them all, and our team is prioritizing other work over reading up on this, discussing how this should be implemented, etc. This is informational, in case anyone else wants to take up this work.
FWIW, there are four required fields listed there (id, source, specversion, type), all of which have behaviors defined for them in the fields section of OEP-41. |
Thanks for the reminder @ormsbee. I'm sure I've read those docs and forgotten about them just as many times. I appreciate that you have defined some of these for us. I can imagine implementing at least some of the missing fields once we work on observability. In my defense, there is OEP-41, the CloudEvents spec, the CloudEvents for Avro spec, and the CloudEvents for Kafka spec, and it was easy to get lost among these in the past. Sorry I forgot about the OEP-41 interpretations.
Given that OEP-41 already provides good starting values, some of which can be easily generated and others of which will only make it easier to resolve issues, I recommend spending some time-boxed effort to just add these in early rather than defer them to later. |
We'll look into what is clearly defined and easy to implement as part of the observability work. Originally, I thought the CloudEvent attributes belonged in the data. Then, after reading the kafka-protocol-binding doc for CloudEvents, I thought this data lived in the headers with special prefixes. Now, I think that they belong in the data, but that there are special keys where this data is duplicated in the headers? I'm still not clear.
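For what it's worth, here is a sketch of my current reading of the Kafka binding's *binary* content mode: the CloudEvents attributes travel as `ce_`-prefixed message headers, and the message value carries only the event data (no duplication). The topic-independent helper, the source value, and the payload below are all made up for illustration:

```python
# Sketch: CloudEvents attributes as Kafka headers (binary content mode).
# Header names follow the ce_ prefix convention from the CloudEvents Kafka
# protocol binding; the event type, source, and payload are hypothetical.
import json
import uuid
from datetime import datetime, timezone

def build_kafka_message(event_type: str, source: str, payload: dict):
    """Return (headers, value) for a binary-mode CloudEvents Kafka message."""
    headers = [
        ("ce_specversion", b"1.0"),
        ("ce_id", str(uuid.uuid4()).encode()),
        ("ce_source", source.encode()),
        ("ce_type", event_type.encode()),
        ("ce_time", datetime.now(timezone.utc).isoformat().encode()),
        ("content-type", b"application/json"),  # describes the value itself
    ]
    value = json.dumps(payload).encode()  # event data only, no envelope
    return headers, value

headers, value = build_kafka_message(
    "org.openedx.learning.course.enrollment.created.v1",
    "openedx/lms/web",
    {"user_id": 42},
)
```

If that reading is right, the attributes would *not* also need to appear inside the data.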
Additional notes:
@ormsbee: The OEP has the following text:
UPDATE: Oops. This is from the CloudEvents spec, and not from the OEP. Dave, you may not have the answer in this case, but maybe you understand this?
More notes/questions:
The CloudEvents spec doesn't strictly require that `id` be a UUID; UUIDs only appear as an example. What the spec requires is that the (`source`, `id`) pair be unique for each distinct event. So for CloudEvents, it's legal for one message to have the same `id` as another, as long as the two have different `source` values. I tried to talk about this a bit in the `id` and `source` field descriptions in the OEP.
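To illustrate the uniqueness rule as I understand it (the (`source`, `id`) pair identifies an event, not `id` alone), here is a toy sketch with made-up source and id values:

```python
# Toy illustration: CloudEvents requires the (source, id) pair to be
# unique, not the id alone, so two producers may reuse the same id as
# long as their source values differ. All values here are hypothetical.
def is_duplicate(seen: set, source: str, event_id: str) -> bool:
    """Record the (source, id) pair and report whether it was seen before."""
    key = (source, event_id)
    if key in seen:
        return True
    seen.add(key)
    return False

seen = set()
assert not is_duplicate(seen, "openedx/lms/web", "1001")
assert not is_duplicate(seen, "openedx/studio/web", "1001")  # same id, different source: legal
assert is_duplicate(seen, "openedx/lms/web", "1001")  # true duplicate
```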
Thanks for clarifying @ormsbee. I didn't notice that UUID was under an "Examples" section in the CloudEvents spec, and I didn't realize that it was the OEP that decided on a UUID (which is what I remembered about the field).
Yeah, I thought `minorversion` and `sourcehost` would be useful for debugging.
The CloudEvents spec for `time` states:

> Timestamp of when the occurrence happened. If the time of the occurrence cannot be determined then this attribute MAY be set to some other time (such as the current time) by the CloudEvents producer, however all producers for the same `source` MUST be consistent in this respect.

So to be spec-compliant, my sense is that we have three broad paths:

1. Force callers to explicitly set the time of the event.
2. Let the producer automatically fill in the current time for every event.
3. Drop the `time` attribute entirely for now.
Forcing the clients to choose an explicit time is a pain, but I put it that way in the OEP because I thought it would be the least painful thing in the long term. I have a few areas of concern:

**Backfill scenarios**

Letting the producer automatically fill in the time of the event could cause garbage data to be emitted when running backfills. For instance, if there were a service listening for CourseOverview-related metadata updates, and we decided to bootstrap it by force-emitting a bunch of "update" events. Or if we need to re-emit some set of events from the past week because of some bug or other.

**People equating time received with time of event**

If we remove the `time` attribute, people may start treating the time a message was received as the time the event actually happened, which breaks down in exactly those backfill and re-emit cases.

**Fuzzy data inaccuracies over time**

My long-term concern is that we get messy data that looks plausible but is just slightly different, sprinkled inconsistently across random parts of the codebase, making reconciliation a pain. Did that enrollment really happen at the database time or the event emission time? Those times are a second apart, and that goes over a date boundary for this small set of enrollments, so which month is the revenue recognized in? These four events use the real database time, but those two others use producer time, so you can't trust them for reporting... etc.

I'm not as familiar with the data pipeline as @bmtcril and others who have been on the data eng team, but my sense is that this sort of thing has been a pain point, and I'd like to address those issues early on. I'm in favor of forcing callers to explicitly set the time to make sure it actually reflects the time the event happened and matches the database records when appropriate.
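The "force callers to set the time explicitly" option could be sketched like this. All names here are hypothetical, not the actual openedx-events API; the point is just that leaving `time` without a default makes backfills pass the original occurrence time instead of silently getting now():

```python
# Sketch of the "caller must supply the time" option: the metadata class
# has no default for `time`, so a backfill must pass the original record's
# timestamp rather than the emission time. All names are hypothetical.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class EventMetadata:
    event_type: str
    time: datetime  # no default: the caller decides what "when it happened" means

    def __post_init__(self):
        if self.time.tzinfo is None:
            raise ValueError("event time must be timezone-aware")

# Normal emission: use the database record's timestamp, not now().
meta = EventMetadata(
    event_type="org.openedx.learning.course.enrollment.created.v1",
    time=datetime(2022, 10, 1, 12, 0, tzinfo=timezone.utc),
)
```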
You mean, can we remove it from the
Sure, though I'm not sure exactly what you're asking here. Are you thinking that the message-listening part of a service automatically emits an INFO-level log message on each incoming message so that we can see what the following log messages are responding to?
Quick message to say "Thank you @ormsbee!" and I owe a response and haven't gotten to it yet. Note: If someone wants to groom/pick this up and I haven't responded, please ask. Thanks.
@ormsbee: [inform] Our team has other urgent priorities and will be taking a (hopefully) temporary break from event bus work for a while. I'm trying to determine if there are quick wins here we can complete now, and leave larger efforts until later (i.e. moved to new tickets). Here are the fields defined in OEP-41:
Noting here some potential updates to OEP-41:
Regarding Avro:
Regarding Kafka Protocol:
Here is a proposal:
Questions to be addressed now:
Sorry. This wasn't meant to be a question. I just meant that we should update our logging to include the headers.
Quick note before I give a fuller response: the links from the OEP to the spec are broken because they moved the spec and I only pointed to the doc in the branch. But here's the latest, as far as I know: https://github.com/cloudevents/spec/blob/main/cloudevents/spec.md
Thanks. This was more just an FYI about some upkeep on the OEP that any one of us could make when we have the time.
This is meant to describe the content type of the thing in the `data` attribute.
Okay. My inclination would be to make it an explicit Django setting for now rather than do anything particularly clever.
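A minimal sketch of that direction, with a hypothetical setting name (`EVENT_BUS_SOURCE` and the fallback value are made up for illustration, not existing settings):

```python
# Hypothetical Django-style setting for the CloudEvents source value.
# EVENT_BUS_SOURCE and the "openedx/unknown" fallback are illustrative;
# a SimpleNamespace stands in for django.conf.settings in this sketch.
import types

def get_event_source(settings) -> str:
    """Read the event source from settings, with an explicit fallback."""
    return getattr(settings, "EVENT_BUS_SOURCE", "openedx/unknown")

fake_settings = types.SimpleNamespace(EVENT_BUS_SOURCE="openedx/lms/web")
print(get_event_source(fake_settings))
```

Deployments that need to distinguish processes (e.g. workers vs. webapp) could then just override the setting per process.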
I think we could call

With respect to the time thing, if we don't have time to do what we think is the right thing, let's just drop it entirely for now.
Maybe I'm just reading the Avro and Kafka CloudEvent specs wrong? In any case, I know I'm not clear. Also, we are currently using binary (which I know is not in OEP-42), so I don't think
I thought settings were shared across both, but hopefully I'm missing something.
Sounds simple enough.
I like the idea of a new extension field that always means the time the event was produced, but that's just a thought. Even this idea can be punted.
Ah, sorry, I misunderstood what Binary Content Mode implied. I thought it would work like the Structured Content Mode example, where the
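For contrast, here's how I currently understand the two content modes, as a sketch with made-up attribute and payload values. In structured mode the whole event (attributes plus data) is one envelope in the message value; in binary mode the attributes become `ce_`-prefixed headers and the value holds only the data:

```python
# Contrast of the two Kafka content modes for CloudEvents, as I read the
# binding spec. Attribute and payload values here are hypothetical.
import json

attributes = {"specversion": "1.0", "id": "abc-123",
              "source": "openedx/lms/web", "type": "example.event.v1"}
data = {"user_id": 42}

# Structured content mode: one JSON envelope carrying attributes and data.
structured_value = json.dumps({**attributes, "data": data}).encode()

# Binary content mode: attributes in headers, bare data in the value.
binary_headers = [("ce_" + key, val.encode()) for key, val in attributes.items()]
binary_value = json.dumps(data).encode()
```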
Ah, right. Maybe we could inspect
Quick note on implementation. Here is an example of what we'll need to do for now:
@robrap |
@rgraber: On the logging PR for the consumer, how would you feel about adding a
Which PR? This one: openedx/event-bus-kafka#87?
@rgraber: I decided to create a different ticket for this. See edx/edx-arch-experiments#121. It probably makes sense to do this separately.
Closing this issue as all the headers specified in the Acceptance Criteria are now present. |
I'm not sure if this was already noted above, but the metadata of the signal is already coded to have a `time` field that uses `now()`. See `openedx_events/data.py` line 59 in openedx-events at commit 25b8471.
Note that Kafka has a timestamp tuple on its messages, which gives the timestamp and how it was defined (source time, or broker receive time). See https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.Message.timestamp
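For reference, `Message.timestamp()` returns a `(timestamp_type, timestamp)` tuple. Interpreting it might look like the sketch below; the integer constants mirror my understanding of confluent-kafka's `TIMESTAMP_*` values, and the helper and sample tuple are hypothetical:

```python
# Interpreting a Kafka (timestamp_type, timestamp_ms) tuple. The integer
# constants mirror confluent_kafka's TIMESTAMP_* conventions as I
# understand them; the helper and sample tuple are illustrative only.
TIMESTAMP_NOT_AVAILABLE = 0    # broker/topic provides no timestamp
TIMESTAMP_CREATE_TIME = 1      # set by the producer (source time)
TIMESTAMP_LOG_APPEND_TIME = 2  # set by the broker on receipt

def describe_timestamp(ts_tuple):
    """Turn a (type, ms-since-epoch) tuple into a human-readable string."""
    ts_type, ts_ms = ts_tuple
    if ts_type == TIMESTAMP_NOT_AVAILABLE:
        return "no timestamp"
    kind = "producer" if ts_type == TIMESTAMP_CREATE_TIME else "broker"
    return f"{kind} time: {ts_ms} ms since epoch"

print(describe_timestamp((TIMESTAMP_CREATE_TIME, 1666000000000)))
```

Note that neither of those broker-level timestamps necessarily equals the CloudEvents `time` (when the occurrence actually happened), which is the distinction discussed above.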
Acceptance Criteria:

- `ce_source`: workers vs webapp (defer if challenging)
- `ce_type` (Done)
- `minorversion` (extension) (Deferred)
- `ce_time` (Deferred)

Notes:
From https://github.com/openedx/openedx-events/blob/main/docs/decisions/0005-external-event-schema-format.rst:
Our current approach has been to generally follow the CloudEvents spec as we understand it where and when we need to make a choice. However, we are not putting attention to ensuring that we have a complete or purely correct implementation. It is unclear how to do so, and what the benefits would be of such an effort at this time.
For current references, see:
The purpose of this issue is to communicate the current approach. If someone wants to put in more effort to review/implement any gaps around this, that help is welcome.
FYI @ormsbee, since you were the author of the OEP. Note: if you have no issues with this, we can close this issue.