-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-35791][SQL] Release on-going map properly for NULL-aware ANTI join #32939
Conversation
cc @cloud-fan could you help take a look when you have time? Thanks. |
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #139892 has finished for PR 32939 at commit
|
@@ -479,6 +479,7 @@ private[joins] object UnsafeHashedRelation { | |||
throw QueryExecutionErrors.cannotAcquireMemoryToBuildUnsafeHashedRelationError() | |||
} | |||
} else if (isNullAware) { | |||
binaryMap.free() | |||
return HashedRelationWithAllNullKeys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we return it before creating the binary map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - the NULL-aware ANTI join will be done if there's row with NULL join keys. Note the if (!key.anyNull || allowsNullKey)
at line 472. So it will be taking effect when we reading the row with NULL join keys, and this information is not known before creating the binary map
@@ -1060,6 +1061,7 @@ private[joins] object LongHashedRelation { | |||
val key = rowKey.getLong(0) | |||
map.append(key, unsafeRow) | |||
} else if (isNullAware) { | |||
map.free() | |||
return HashedRelationWithAllNullKeys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
thanks, merging to master/3.1! |
…join ### What changes were proposed in this pull request? NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for `HashedRelation`, and will immediately return `HashedRelationWithAllNullKeys` without taking care of the map built already. Before returning `HashedRelationWithAllNullKeys`, the map needs to be freed properly to save memory and keep memory accounting correctly. ### Why are the changes needed? Save memory and keep memory accounting correctly for the join query. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing unit tests introduced in #29104 . Closes #32939 from c21/free-null-aware. Authored-by: Cheng Su <chengsu@fb.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e0d81d9) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thank you @cloud-fan for quick review! |
…join ### What changes were proposed in this pull request? NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for `HashedRelation`, and will immediately return `HashedRelationWithAllNullKeys` without taking care of the map built already. Before returning `HashedRelationWithAllNullKeys`, the map needs to be freed properly to save memory and keep memory accounting correctly. ### Why are the changes needed? Save memory and keep memory accounting correctly for the join query. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing unit tests introduced in apache#29104 . Closes apache#32939 from c21/free-null-aware. Authored-by: Cheng Su <chengsu@fb.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e0d81d9) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…join ### What changes were proposed in this pull request? NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for `HashedRelation`, and will immediately return `HashedRelationWithAllNullKeys` without taking care of the map built already. Before returning `HashedRelationWithAllNullKeys`, the map needs to be freed properly to save memory and keep memory accounting correctly. ### Why are the changes needed? Save memory and keep memory accounting correctly for the join query. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing unit tests introduced in apache#29104 . Closes apache#32939 from c21/free-null-aware. Authored-by: Cheng Su <chengsu@fb.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e0d81d9) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for
HashedRelation
, and will immediately returnHashedRelationWithAllNullKeys
without taking care of the map built already. Before returningHashedRelationWithAllNullKeys
, the map needs to be freed properly to save memory and keep memory accounting correctly.Why are the changes needed?
Save memory and keep memory accounting correctly for the join query.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing unit tests introduced in #29104 .