Stable SortPreservingMergeStream #1686

Closed
alamb opened this issue Jan 26, 2022 · 0 comments · Fixed by #1687
Labels
enhancement New feature or request

alamb commented Jan 26, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

                                                                                 
┌─────────────────────────┐  Merged Stream                                       
│ ┌───┬───┬───┬───┐       │                 ┌───────────────────────────────┐    
│ │ A │ B │ C │ D │ ...   │────────┐        │ ┌───┬───╦═══╦───┬───╦═══╗     │    
│ └───┴───┴───┴───┘       │        ├───────▶│ │ A │ B ║ B ║ C │ D ║ E ║ ... │    
└─────────────────────────┘        │        │ └───┴─▲─╩═══╩───┴───╩═══╝     │    
  Stream 1                         │        └───────────────────────────────┘    
                                   │                └ ─ ─ ─ ─ ─ ─ ┐              
                                   │                                             
┌─────────────────────────┐        │                              │              
│ ╔═══╦═══╗               │        │                                             
│ ║ B ║ E ║     ...       │────────┘                   Stable Sort: the merged   
│ ╚═══╩═══╝               │                         stream places equal rows from
└─────────────────────────┘                          stream 1 *before* stream 2  
  Stream 2                                                                       
                                                                                 

Thus, if stream 1 and stream 2 had rows with the same record values, the row from stream 1 was guaranteed to come out before the row from stream 2. IOx relies on this property to correctly deduplicate updates (streams with a larger index were inserted later).

SortPreservingMergeStream appears to no longer be stable after #1596 was merged. Specifically, the previous implementation of SortPreservingMergeStream would always pick the input with the lower 'stream index' when the sort keys were tied.
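
To make the tie-breaking rule concrete, here is a minimal sketch of a stable k-way merge over in-memory inputs. It is not the DataFusion implementation: `stable_merge`, the `char` keys, and the `Vec` inputs are all hypothetical stand-ins for illustration. The only point is that including the stream index in the comparison, right after the sort key, is enough to make ties resolve toward the lower-indexed stream.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already-sorted inputs into one sorted output (hypothetical helper).
/// Each output element is (key, index of the stream it came from).
fn stable_merge(streams: &[Vec<char>]) -> Vec<(char, usize)> {
    // Heap entries are (key, stream index, offset within the stream).
    // `Reverse` turns the max-heap into a min-heap. Tuples compare the key
    // first and the stream index second, so equal keys are popped in
    // stream-index order.
    let mut heap = BinaryHeap::new();
    for (stream_idx, stream) in streams.iter().enumerate() {
        if let Some(&key) = stream.first() {
            heap.push(Reverse((key, stream_idx, 0usize)));
        }
    }

    let mut merged = Vec::new();
    while let Some(Reverse((key, stream_idx, offset))) = heap.pop() {
        merged.push((key, stream_idx));
        // Refill the heap from the stream that was just consumed.
        if let Some(&next) = streams[stream_idx].get(offset + 1) {
            heap.push(Reverse((next, stream_idx, offset + 1)));
        }
    }
    merged
}
```

With the inputs from the diagram above (stream 1 = A, B, C, D at index 0 and stream 2 = B, E at index 1), the `B` from stream 1 is emitted before the `B` from stream 2.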

Describe the solution you'd like
I would like SortPreservingMergeStream to be stable.

Describe alternatives you've considered
We could add a synthetic "column" to our input and then add it to the sort key, but this seems like a larger overhead than necessary.
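
For comparison, the synthetic-column alternative could look roughly like the sketch below. Everything here is an assumption for illustration: `tag_and_sort` is a hypothetical helper, rows are reduced to a single `char` key, and a plain sort stands in for the merge. It shows where the overhead comes from: every row carries an extra value, and that value takes part in every comparison.

```rust
/// Tag each row with the index of the stream it came from, then order rows
/// by the composite key (key, stream index). Hypothetical helper.
fn tag_and_sort(streams: &[Vec<char>]) -> Vec<(char, u32)> {
    let mut rows: Vec<(char, u32)> = streams
        .iter()
        .enumerate()
        // The synthetic "column" is the stream index attached to every row.
        .flat_map(|(idx, stream)| stream.iter().map(move |&key| (key, idx as u32)))
        .collect();
    // Even an unstable sort gives a deterministic order across streams here,
    // because rows from different streams never compare equal.
    rows.sort_unstable();
    rows
}
```

The result matches what a stable merge produces, but only by widening every row and every sort-key comparison.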

Additional context
I have a simple fix for this. Will get it up shortly.

cc @yjshen
