[Evolution] Discovery: Schema Evolution #53

Open
3 tasks
robrap opened this issue Aug 4, 2022 · 3 comments
Labels
event-bus Work related to the Event Bus.

robrap commented Aug 4, 2022

Important: We may want to implement #53 (comment) first, to protect against unexpected schema evolution, before completing this larger effort to ensure that schema evolution can be done safely.

Note: Although this work is important, it is being deferred until we actually have events that need to evolve. Our use of a schema registry enables this work, but this ticket is for getting into the nuts and bolts of proper configuration, processes, testing, docs, etc. to make this all work.

The following is a list of questions about best practices for Schema Evolution.

A/C

  • Answer the following questions:
    • [ ]
  • Ensure our topics meet our chosen compatibility setting, or create an issue for that work
  • Create issues for any other necessary work (if there is any)

Open Questions:

  • What else is required to document and/or learn about our use of the Schema Registry?
  • What is our default schema compatibility configuration? Is it what we want?
    • Can we change the configuration in terraform without destroying existing topics? If not, we can do this manually
  • With the chosen configuration, do we have enough flexibility for rolling out changes to publishers and consumers, or do we need to be able to handle optional fields?
    • Note: We have since added some capabilities for optional fields, and there is an ADR on the topic.
  • What docs are needed to help someone translate the compatibility rules into acceptable changes to an attr object and an OpenEdxPublicSignal's data dict?
  • Do we need to handle compatibility rules in OpenEdxPublicSignal validation that match the Avro rules?
  • Assuming we ultimately need to support multiple events per topic (e.g. life cycle events like create, update, etc. that require ordering):
    • How do we properly configure the schemas?
    • How would we introduce a new event to an existing topic? What should a consumer do if it doesn't yet recognize the new event?
    • How would we introduce a backward-incompatible version of an event? Typically, this would go to a new topic. But this single topic is being used for ordering.
    • Does the introduction of a new backward-incompatible version of an event become any more complicated when one listener is covering multiple related/ordered events?
  • What will we need to do to make this work for Open edX deployments, which usually take big-step upgrades from one open-release to the next?
    • ~~Does skipping several schema versions during an upgrade mean that a different compatibility setting is required?~~
      • Named releases are a separate concern; leave it to Axim.
    • Documentation for how to handle Kafka/other event bus tech during upgrades, e.g. "let topics drain before upgrading".
  • How should OEP-42's minorversion field requirement be implemented for backward-compatible changes?
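To make the question about translating compatibility rules into acceptable changes more concrete, here is a minimal sketch of a pairwise check for flat Avro record schemas. It is a simplification, not Avro's full schema resolution: it ignores type promotion, aliases, and nested records, and it compares only two versions (so it says nothing about the transitive settings). The function names and the example schemas are hypothetical.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """Simplified Avro resolution for flat record schemas.

    A reader can decode the writer's data if every reader field either
    exists in the writer with the same type, or carries a default.
    Writer fields unknown to the reader are simply ignored.
    """
    writer_fields = {f["name"]: f for f in writer["fields"]}
    for field in reader["fields"]:
        written = writer_fields.get(field["name"])
        if written is None:
            if "default" not in field:
                return False
        elif written["type"] != field["type"]:
            return False
    return True


def classify_change(old: dict, new: dict) -> str:
    """Name the strongest (non-transitive) compatibility level a change satisfies."""
    backward = can_read(reader=new, writer=old)  # new consumers, old data
    forward = can_read(reader=old, writer=new)   # old consumers, new data
    if backward and forward:
        return "FULL"
    if backward:
        return "BACKWARD"
    if forward:
        return "FORWARD"
    return "NONE"


# Hypothetical schemas, for illustration only.
USER_ID = {"name": "user_id", "type": "long"}
OLD = {"type": "record", "name": "ExampleEvent", "fields": [USER_ID]}
ADD_OPTIONAL = {
    "type": "record",
    "name": "ExampleEvent",
    "fields": [USER_ID, {"name": "nickname", "type": ["null", "string"], "default": None}],
}
ADD_REQUIRED = {
    "type": "record",
    "name": "ExampleEvent",
    "fields": [USER_ID, {"name": "email", "type": "string"}],
}

print(classify_change(OLD, ADD_OPTIONAL))  # adding an optional field
print(classify_change(OLD, ADD_REQUIRED))  # adding a required field
print(classify_change(ADD_REQUIRED, OLD))  # removing a required field
```

Under these simplified rules, adding or removing a field that has a default passes in both directions, while adding a required field passes only the forward check and removing one passes only the backward check.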

To be confirmed: Confluent’s Schema Registry will not necessarily warn you when you change the event schema for a topic. It will just evolve the schema and move on. This opens the possibility of errors if someone accidentally updates a schema on the producing side without a corresponding update on the consumer side.

  • Do we need to specially handle this situation, or potential errors that may result?
  • Will our settings help keep us from creating errors in this situation?
  • Do we want to detect and notify when a newer version is available, and recommend upgrades?
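One possible way to detect this situation is to compare the locally generated schema with the latest version the registry holds for the subject. The sketch below assumes the standard Schema Registry REST endpoint `GET /subjects/{subject}/versions/latest`, whose JSON response includes `subject`, `version`, and `schema` (the schema as a string). The helper names, the registry URL, and the subject name are hypothetical, and the key-sorting normalization is a simple stand-in, not Avro's Parsing Canonical Form.

```python
import json
from urllib.parse import quote


def latest_version_url(registry_url: str, subject: str) -> str:
    """Build the URL for the latest registered schema of a subject."""
    return f"{registry_url.rstrip('/')}/subjects/{quote(subject, safe='')}/versions/latest"


def normalize(schema_str: str) -> str:
    """Re-serialize a schema so key order and whitespace do not matter."""
    return json.dumps(json.loads(schema_str), sort_keys=True, separators=(",", ":"))


def check_for_drift(local_schema_str: str, registry_payload: dict):
    """Return None if the schemas match, otherwise a human-readable message.

    `registry_payload` is the decoded JSON body returned by the registry;
    fetching it is left to the caller so this function needs no network.
    """
    if normalize(local_schema_str) == normalize(registry_payload["schema"]):
        return None
    return (
        f"Schema for subject {registry_payload['subject']!r} differs from "
        f"registered version {registry_payload['version']}"
    )


# Hypothetical local schema and registry response, for illustration only.
local = json.dumps({"type": "record", "name": "ExampleEvent",
                    "fields": [{"name": "user_id", "type": "long"}]})
payload = {
    "subject": "example-topic-value",
    "version": 3,
    "id": 17,
    "schema": json.dumps({"name": "ExampleEvent", "type": "record",
                          "fields": [{"type": "long", "name": "user_id"}]}),
}
print(latest_version_url("http://localhost:8081", payload["subject"]))
print(check_for_drift(local, payload))
```

A check like this could run in the producer or the consumer at startup, or in CI, and either log a warning or fail, depending on how strictly we want to guard against accidental evolution.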

Note: This ticket was copied/moved from original private ticket: https://2u-internal.atlassian.net/browse/ARCHBOM-2013.

@robrap robrap added event-bus Work related to the Event Bus. backlog To be put on a team's backlog or wishlist labels Aug 4, 2022
@robrap robrap added this to the [Event Bus] Future milestone Aug 4, 2022
@robrap robrap changed the title Discovery: Schema Evolution [Defer] Discovery: Schema Evolution Aug 9, 2022

robrap commented Oct 20, 2022

  1. There is this ADR: https://github.com/openedx/openedx-events/blob/main/docs/decisions/0006-event-schema-serialization-and-evolution.rst. It selects “Forward” or “Forward Transitive”, but it is outdated because it was written before we had optional fields. It should probably be updated, and I think we may switch to “Full Transitive”, which is documented as a future desire.
  2. How will we transition configs?
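If the registry exposes the standard Schema Registry REST API, one option for transitioning a subject's setting is the `PUT /config/{subject}` endpoint. The sketch below only builds the request and does not send it. The function name, registry URL, and subject name are hypothetical, and whether this is preferable to managing the setting in terraform is still an open question.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Compatibility levels accepted by the Schema Registry.
LEVELS = {
    "BACKWARD", "BACKWARD_TRANSITIVE",
    "FORWARD", "FORWARD_TRANSITIVE",
    "FULL", "FULL_TRANSITIVE",
    "NONE",
}


def build_compatibility_request(registry_url: str, subject: str, level: str) -> Request:
    """Build (but do not send) a request that sets a subject's compatibility level."""
    if level not in LEVELS:
        raise ValueError(f"Unknown compatibility level: {level}")
    url = f"{registry_url.rstrip('/')}/config/{quote(subject, safe='')}"
    body = json.dumps({"compatibility": level}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


# Hypothetical registry URL and subject name, for illustration only.
req = build_compatibility_request(
    "http://localhost:8081", "example-topic-value", "FULL_TRANSITIVE"
)
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would also require whatever authentication the deployment uses, which is omitted here.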

robrap commented Oct 20, 2022

When trying to remove an optional field from a data class, we started getting an error. See this discussion for more details: openedx/openedx-events#131 (comment). We'll need some sort of workaround for this.

FYI: @rgraber: Feel free to provide any additional details if you wish. Thanks.

robrap commented Jan 13, 2023

[idea] Add a snapshot unit test alongside the tests of Avro processing for all events. The idea would be to have a script (or something similar) that runs the schema generation for all event definitions and produces a data structure containing the schema as a string for each event. Then, when testing the Avro serialization/deserialization process, we'd also compare against this snapshot. If the snapshot is changing in a known way for a known reason, then the snapshot could simply be updated. If not, we have caught an undesired change, which may or may not be related to an unexpected schema evolution.

Note that the comments for that test should also direct users back to this ticket while discovery still has not been completed.
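A minimal sketch of the snapshot idea follows. It assumes the schemas are available as a mapping from event type to schema string; the `CURRENT_SCHEMAS` dictionary stands in for whatever schema-generation step the project actually uses, and the event type and function names are hypothetical.

```python
import json


def render_snapshot(schemas: dict) -> str:
    """Serialize {event_type: schema_string} deterministically for check-in."""
    return json.dumps(schemas, indent=2, sort_keys=True) + "\n"


def changed_events(snapshot_text: str, current: dict) -> list:
    """Return event types that were added, removed, or whose schema changed."""
    stored = json.loads(snapshot_text)
    names = stored.keys() | current.keys()
    return sorted(name for name in names if stored.get(name) != current.get(name))


# Stand-in for the output of schema generation across all event definitions.
CURRENT_SCHEMAS = {
    "org.example.thing.created.v1": json.dumps(
        {"type": "record", "name": "CloudEvent",
         "fields": [{"name": "thing_id", "type": "long"}]}
    ),
}

# In a real test the snapshot would be read from a file committed to the repo.
SNAPSHOT = render_snapshot(CURRENT_SCHEMAS)


def test_schemas_match_snapshot():
    # If this fails unexpectedly, a schema may have evolved by accident.
    # See issue #53 (Discovery: Schema Evolution) before updating the snapshot.
    assert changed_events(SNAPSHOT, CURRENT_SCHEMAS) == []


test_schemas_match_snapshot()
```

When a schema change is intentional, the snapshot file would be regenerated with `render_snapshot` and committed along with the change, so the diff is visible in review.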

@robrap robrap removed the status in Arch-BOM Mar 23, 2023
@robrap robrap removed the backlog To be put on a team's backlog or wishlist label Apr 20, 2023
@robrap robrap moved this to Prioritized in Arch-BOM Apr 21, 2023
@robrap robrap removed this from the [Event Bus] Future milestone Jun 8, 2023
@robrap robrap removed the status in Arch-BOM Jun 12, 2023
@jristau1984 jristau1984 moved this to Backlog in Arch-BOM Jul 1, 2024
Projects
Status: Backlog