Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CloudEvents] Field ce_time for event bus events #159

Closed
Tracked by #183
robrap opened this issue Dec 15, 2022 · 3 comments
Closed
Tracked by #183

[CloudEvents] Field ce_time for event bus events #159

robrap opened this issue Dec 15, 2022 · 3 comments
Assignees
Labels
event-bus Work related to the Event Bus.

Comments

@robrap
Copy link
Contributor

robrap commented Dec 15, 2022

This is a continuation of #77, where we decided to defer the CloudEvent field ce_time, as defined in OEP-41.

@robrap robrap added event-bus Work related to the Event Bus. backlog To be put on a team's backlog or wishlist labels Dec 15, 2022
@robrap robrap added this to the [Event Bus] Future milestone Dec 15, 2022
@robrap robrap added this to Arch-BOM Dec 15, 2022
@robrap robrap moved this to Todo in Arch-BOM Dec 15, 2022
@robrap robrap changed the title [CloudEvents] Deferred CloudEvents for event bus events [CloudEvents] Field ce_time for event bus events Dec 23, 2022
@robrap
Copy link
Contributor Author

robrap commented Jan 5, 2023

There are a variety of times related to event production:

  • DB created_at time: In the case that an event is related to inserting data in the database, the created_at time is one related time. OEP-41 recommends this as the primary option for the CloudEvent time field.
  • DB updated_at time: In the case that an event is related to updating data in the database, the updated_at time is another possibility. I don't think this time is mentioned anywhere.
  • Event signal current time: This would be the equivalent of calling datetime.now() in the code that initiates sending the event signal.
  • Event bus signal handler current time: This would be the equivalent of calling datetime.now() in the code that initiates sending the signal to the event bus. It is doubtful that we will use both these "current times", and will instead probably choose one and document, if it is used at all.
  • Broker received event time: This is the time that the event broker (e.g. Kafka) received the event from the Producer. No consumer would be able to have consumed the event before this time, but it could be delayed from the earlier "current times" due to a variety of reasons. For example, maybe the broker was down, and retries were required. Or, maybe the outbox pattern is introduced, and the outbox gets backed up.

For the purposes of the rest of this discussion, we will ignore any times related to consuming the event. Any consumer can call datetime.now() when required, and at this time, we have no need to provide this as an attribute on the event.

According to the OEP, we must implement a "time" field that could be either db_created_at_time or signal_produced_time. Note: I will ignore the other current time possibilities. As a consumer, possibly the only way to know which of these you are getting for a certain event would be to read the docs for the specific event.

As the OEP states, using the created_at time is ideal for the consumer for a variety of reasons. However, it is more challenging to implement. Note: maybe this is no big deal for legacy event and I just need to better understand?

My main concern is holding up the ability to pass along any of these times, being blocked on how long it would take to implement the db_created_at_time solution, and knowing that once a ce_time header is added, it should not be changed without causing a major version change. Additionally, this is not a one time cost for our single event. This will come up for any new events (and especially for legacy event?). I'd like to consider sending a signal_produced_time header that will always mean the same thing, is simple to implement, and may or may not differ from ce_time. This could enable us to defer implementing ce_time, if we continue to be stuck on this.

Separately, confluent-kafka offers a tuple for its timestamp that provides both the time and a type of the time. We could consider enhancing the ce_time header with that information in a separate custom header. See https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.Message.timestamp. This is not a priority. I just thought it was interesting.

@robrap
Copy link
Contributor Author

robrap commented Jan 6, 2023

As an example of implementing the production timestamp, here is the code where we get the course for the event bus event data: https://github.com/openedx/edx-platform/blob/0b68c84286fe53c32b6b8c1e01ce30845bb8496c/cms/djangoapps/contentstore/signals/handlers.py#L85. I don't see how to get the db dates for that course. Maybe others know? Or, do we think making that data available (here and in all cases) should be a blocker to sending any new event?

I'm not certain of the answer, which is why I am open to punting on this and instead using a specifically named and easily implemented field that comes from datetime.now(). It could be documented as not necessarily matching the DB time of the related data.

@robrap
Copy link
Contributor Author

robrap commented Jan 6, 2023

UPDATE: @ormsbee and I discussed and he provided me with a solution for the DB time of this event: course.subtree_edited_on. So, we will update to pass through this event time. Related, OpenEdxPublicSignal.send_event() and new send_event_to_bus will need an optional time argument to use as the metadata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
event-bus Work related to the Event Bus.
Projects
None yet
Development

No branches or pull requests

2 participants