Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data types update - implement protocol versioning (upgrades) #17639

Closed
edgao opened this issue Oct 5, 2022 · 8 comments · Fixed by #19240
Closed

data types update - implement protocol versioning (upgrades) #17639

edgao opened this issue Oct 5, 2022 · 8 comments · Fixed by #19240
Assignees
Labels
team/destinations Destinations team's backlog

Comments

@edgao
Copy link
Contributor

edgao commented Oct 5, 2022

after #17486 is accepted, we can implement the protocol versioning migrations. This issue is to implement the upgrade logic.

  • Catalogs can be up/downgraded easily, using this mapping of new <> old types types: (note that the old types sometimes use format in addition to airbyte_type)
    • string <> {"type": "string"}
    • binary data <> {"type": "string"}
      • the older version of the protocol has no concept of binary data; sources simply produce base64-encoded strings
    • date <> {"type": "string", "airbyte_type": "date", "format": "date"}
    • timestamp_with_timezone <> {"type": "string", "airbyte_type": "timestamp_with_timezone", "format": "date-time"}
    • timestamp_without_timezone <> {"type": "string", "airbyte_type": "timestamp_without_timezone", "format": "date-time"}
    • time_with_timezone <> {"type": "string", "airbyte_type": "time_with_timezone", "format": "time"}
    • time_without_timezone <> {"type": "string", "airbyte_type": "time_without_timezone", "format": "time"}
    • number <> {"type": "number"}
    • integer <> {"type": "integer"}
      • some sources (generally DB sources) produce {"type": number", "airbyte_type": "integer"} for historical reasons, but normalization can handle both formats, and type: integer is more descriptive.
    • boolean <> {"type": "boolean"}
  • messages are also up/downgradable based on the catalog:
    • If an old-style record has a numeric field, it can be upgraded by wrapping that field in a string
    • And if an old-style message has a numeric field, it can be downgraded by unwrapping that field
      • If this results in an error (buggy source provides a non-numeric string) then just leave the value unchanged
        • This matches existing behavior, where non-schema-conforming values are passed through without error
      • If the field has an unusual value (i.e. Infinity/-Infinity/NaN) then it should be nulled (i.e. just remove the field from the record completely)
        • This matches existing behavior in e.g. source-postgres, where values that the old protocol couldn't support were just dropped from the record.

As part of this issue, we should update https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/protocol-models/src/main/java/io/airbyte/protocol/models/JsonSchemaType.java with the new types, and tag the old-style schema declarations as @Deprecated. They'll still need to exist for the sake of unupdated connectors, but all new connector development should use new-style declarations.

@edgao
Copy link
Contributor Author

edgao commented Oct 6, 2022

@edgao ask platform folks about state of protocol versioning, is it ready for use

@edgao
Copy link
Contributor Author

edgao commented Oct 19, 2022

from jimmy:

The core of the migration piece is there, I am plugging it where needed right now
14:11
Most if it lives: https://github.com/airbytehq/airbyte/tree/master/airbyte-commons-protocol/src/main/java/io/airbyte/commons/protocol
What to look at:
about the migrations itself: https://github.com/airbytehq/airbyte/blob/master/airbyte-commons-protocol/src/main[…]byte/commons/protocol/migrations/AirbyteMessageMigrationV0.java
serialization/deserialization should be mostly https://github.com/airbytehq/airbyte/blob/master/airbyte-commons-protocol/src/main[…]/airbyte/commons/protocol/serde/AirbyteMessageV0Serializer.java
for the first actual iteration, I should be around to help to get it battle tested

message migrator test

@edgao
Copy link
Contributor Author

edgao commented Oct 20, 2022

@edgao split into upgrade/downgrade migration implementation

@edgao edgao changed the title data types update - implement protocol versioning data types update - implement protocol versioning (upgrades) Oct 20, 2022
@edgao
Copy link
Contributor Author

edgao commented Oct 27, 2022

@edgao figure out how to use the e2e up/downgrade test tool here

@grishick
Copy link
Contributor

grishick commented Nov 8, 2022

Does this ticket depend on this epic being done (or any part of it being done): #14276 ?

@edgao
Copy link
Contributor Author

edgao commented Nov 8, 2022

soft blocker: #16814 - I'll need to switch a few v0 to v1 in my code once this is completed, since we're building the 1.0.0 -> 1.1.0 migration. But nothing preventing me from starting implementation.

other than that, none of the open issues are relevant to this issue.

@edgao
Copy link
Contributor Author

edgao commented Jan 5, 2023

current state - #19240 has working migrations, but need to get platform acceptance tests to pass (there's a ton of protocol-related changes in there, apparently). Then we can merge this into the superbracnch (#20036)

@edgao
Copy link
Contributor Author

edgao commented Jan 11, 2023

merged into #20036; closing

@edgao edgao closed this as completed Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team/destinations Destinations team's backlog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants