You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
┌─────────────────────────┐ Merged Stream
│ ┌───┬───┬───┬───┐ │ ┌───────────────────────────────┐
│ │ A │ B │ C │ D │ ... │────────┐ │ ┌───┬───╦═══╦───┬───╦═══╗ │
│ └───┴───┴───┴───┘ │ ├───────▶│ │ A │ B ║ B ║ C │ D ║ E ║ ... │
└─────────────────────────┘ │ │ └───┴─▲─╩═══╩───┴───╩═══╝ │
Stream 1 │ └───────────────────────────────┘
│ └ ─ ─ ─ ─ ─ ─ ┐
│
┌─────────────────────────┐ │ │
│ ╔═══╦═══╗ │ │
│ ║ B ║ E ║ ... │────────┘ Stable Sort: the merged
│ ╚═══╩═══╝ │ stream places equal rows from
└─────────────────────────┘ stream 1 *before* stream 2
Stream 2
Thus, if stream 1 and stream 2 had the same record values, it was guaranteed that the row from stream1 came out before stream 2. IOx uses this property correctly deduplicate updates (as streams with a larger index were inserted later).
IOx uses this property correctly deduplicate updates (as streams with a larger index were inserted later).
SortPreservingMergeStream appears to no longer be stable after #1596 was merged Specifically, the previous implementation of SortPreservingMergeSteam would always pick inputs with a lower 'stream index' when the sort keys were tied.
Describe the solution you'd like
I would like SortPreservingMergeStream to be stable
Describe alternatives you've considered
We could add a synthetic "column" to our input and then add it to the sort key, but this seems like a larger overhead than necessary
Additional context
I have a simple fix for this. Will get it up shortly.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Thus, if stream 1 and stream 2 had the same record values, it was guaranteed that the row from stream1 came out before stream 2. IOx uses this property correctly deduplicate updates (as streams with a larger index were inserted later).
IOx uses this property correctly deduplicate updates (as streams with a larger index were inserted later).
SortPreservingMergeStream
appears to no longer be stable after #1596 was merged Specifically, the previous implementation ofSortPreservingMergeSteam
would always pick inputs with a lower 'stream index' when the sort keys were tied.Describe the solution you'd like
I would like
SortPreservingMergeStream
to be stableDescribe alternatives you've considered
We could add a synthetic "column" to our input and then add it to the sort key, but this seems like a larger overhead than necessary
Additional context
I have a simple fix for this. Will get it up shortly.
cc @yjshen
The text was updated successfully, but these errors were encountered: