Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix full outer result mismatch issue when output contains multiple matching rows #11068

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

JkSelf
Copy link
Collaborator

@JkSelf JkSelf commented Sep 23, 2024

Assume the left table has columns a and b:

a       b
2	100
2	1
2	1

The right table has columns c and d:

c       d
2	3
2	-1
2	-1
2	3

The two tables are joined using a full outer join on the condition a == c and b < d. During the doGetOutput phase, the result is matched using a left join, resulting in 3 * 4 = 12 records:

No      a        b       c      d
0	2	100	 2	3
1	2	100	 2	-1
2	2	100	 2	-1
3	2	100	 2	3
4	2	1	 2	3
5	2	1	 2	-1
6	2	1	 2	-1
7	2	1	 2	3
8	2	1	 2	3
9	2	1	 2	-1
10	2	1	 2	-1
11	2	1	 2	3

Then, in the filter method, the records are filtered based on the condition b < d, resulting in the following:

No	a	b	c	d	matched
0	2	100	2	3	FALSE
1	2	100	2	-1	FALSE
2	2	100	2	-1	FALSE
3	2	100	2	3	FALSE
4	2	1	2	3	TRUE
5	2	1	2	-1	FALSE
6	2	1	2	-1	FALSE
7	2	1	2	3	TRUE
8	2	1	2	3	TRUE
9	2	1	2	-1	FALSE
10	2	1	2	-1	FALSE
11	2	1	2	3	TRUE

Finally, records from the left table that do not have a match are filled with nulls, resulting in the following final output:

No	a	b	c	 d
0	2	100	null null
1	2	1	2	 3
2	2	1	2	 3
3	2	1	2	 3
4	2	1	2	 3

The above result is incorrect because it is missing rows from the right table that do not have a match. Among the 12 rows above, rows 0, 4, and 8 correspond to the first record (2, 3) from the right table, rows 1, 5, and 9 correspond to the second record (2, -1) from the right table, rows 2, 6, and 10 correspond to the third record (2, -1) from the right table, and rows 3, 7, and 11 correspond to the fourth record (2, 3) from the right table. From the matching results above, rows 1, 5, and 9, as well as rows 2, 6, and 10, are all false, meaning that the third and fourth records from the right table do not have matching rows. Therefore, the final result is missing rows from the right table that do not have matches. The correct final result should be:

No	a	b	c	 d
0	2	100	null   null
1	2	1	2	 3
2	2	1	2	 3
3	2	1	2	 3
4	2	1	2	 3
5       null    null    2       -1
6       null    null    2       -1

This PR calls the filter function when the keys are the same to filter out rows from the right table that do not have matches. If a row from the right table does not have a match, a new record is inserted with the corresponding columns from the left table set to null.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 23, 2024
Copy link

netlify bot commented Sep 23, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit f634a8c
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/670a3af2a663520008d8e9e9

@pedroerp
Copy link
Contributor

@JkSelf please let me know when it is ready for review. Also, please add summary explaining the exact problem and fix. Thanks!

@JkSelf JkSelf force-pushed the full-outer-result-mismatch branch 2 times, most recently from b13b661 to 30158db Compare September 25, 2024 07:48
@JkSelf
Copy link
Collaborator Author

JkSelf commented Sep 25, 2024

@JkSelf please let me know when it is ready for review. Also, please add summary explaining the exact problem and fix. Thanks!

@pedroerp This PR is ready to review. And i also add the problem and fix description. Can you help to review again? Thanks.

@JkSelf JkSelf changed the title [WIP] Fix full outer result mismatch Fix full outer result mismatch issue Sep 25, 2024
@pedroerp
Copy link
Contributor

pedroerp commented Oct 1, 2024

@JkSelf thanks for looking into this. I'm still having a hard time to follow the code, what exactly was broken and how the proposed change addresses it. Could you elaborate more and maybe provide a few examples in the summary?

We would also need to update the documentation of the structures and classes in the header file; if you update the documentation of the parts you are changing it may make it easier to review.

@JkSelf JkSelf changed the title Fix full outer result mismatch issue Fix full outer result mismatch issue when output contains multiple matching rows Oct 9, 2024
@JkSelf
Copy link
Collaborator Author

JkSelf commented Oct 9, 2024

@JkSelf thanks for looking into this. I'm still having a hard time to follow the code, what exactly was broken and how the proposed change addresses it. Could you elaborate more and maybe provide a few examples in the summary?

We would also need to update the documentation of the structures and classes in the header file; if you update the documentation of the parts you are changing it may make it easier to review.

@pedroerp Add examples to describe the issue in the summary. And also add the comments in header file. Can you help to review again? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants