Fix full outer result mismatch issue when output contains multiple matching rows #11068
@JkSelf please let me know when it is ready for review. Also, please add a summary explaining the exact problem and fix. Thanks!
@JkSelf thanks for looking into this. I'm still having a hard time following the code: what exactly was broken, and how does the proposed change address it? Could you elaborate more and maybe provide a few examples in the summary? We would also need to update the documentation of the structures and classes in the header file; updating the documentation of the parts you are changing may make this easier to review.
@pedroerp I added examples describing the issue to the summary and added comments to the header file. Could you review again? Thanks.
Assume the left table has columns a and b:
The right table has columns c and d:
The two tables are joined using a full outer join on the condition a == c and b < d. During the doGetOutput phase, the result is matched using a left join, resulting in 3 * 4 = 12 records:
Then, in the filter method, the records are filtered based on the condition b < d, resulting in the following:
Finally, records from the left table that do not have a match are filled with nulls, resulting in the following final output:
The above result is incorrect because it is missing rows from the right table that do not have a match. Among the 12 rows above, rows 0, 4, and 8 correspond to the first record (2, 3) from the right table; rows 1, 5, and 9 correspond to the second record (2, -1); rows 2, 6, and 10 correspond to the third record (2, -1); and rows 3, 7, and 11 correspond to the fourth record (2, 3). In the filter results above, rows 1, 5, and 9, as well as rows 2, 6, and 10, are all false, meaning that the second and third records from the right table do not match any left row. These unmatched right-table rows are missing from the final result. The correct final result should be:
This PR applies the filter to rows whose join keys are equal in order to identify rows from the right table that have no match. For each right-table row without a match, a new record is inserted with the corresponding left-table columns set to null.
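The difference between the two flows can be sketched in a few lines of Python. This is only an illustration of the join semantics, not the Velox implementation: the function names are hypothetical, the right-table rows are the ones listed above, and the left-table rows are assumed values chosen so that one left row fails `b < d` against every right row.

```python
# Right-side rows (c, d) come from the PR description; the left-side rows
# (a, b) are hypothetical, picked so that one left row fails b < d everywhere.
LEFT = [(2, 1), (2, 2), (2, 6)]
RIGHT = [(2, 3), (2, -1), (2, -1), (2, 3)]


def passes(l, r):
    """Join condition a == c and b < d."""
    return l[0] == r[0] and l[1] < r[1]


def left_join_then_filter(left, right):
    """Flow described as buggy: only left-side misses are null-padded."""
    out = []
    for l in left:
        hits = [l + r for r in right if passes(l, r)]
        out.extend(hits if hits else [l + (None, None)])
    return out


def full_outer_join(left, right):
    """Reference semantics: also track which right rows ever matched."""
    out = []
    right_matched = [False] * len(right)
    for l in left:
        left_matched = False
        for j, r in enumerate(right):
            if passes(l, r):
                out.append(l + r)
                left_matched = True
                right_matched[j] = True
        if not left_matched:
            out.append(l + (None, None))
    # Emit right rows that never passed the filter, with a and b set to null.
    for j, r in enumerate(right):
        if not right_matched[j]:
            out.append((None, None) + r)
    return out


buggy = left_join_then_filter(LEFT, RIGHT)
correct = full_outer_join(LEFT, RIGHT)
print(len(buggy), len(correct))
```

With these assumed inputs, the second and third right-table rows `(2, -1)` never pass the filter, so they appear only in the output of `full_outer_join`, each padded with nulls on the left side.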