Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35791][SQL] Release on-going map properly for NULL-aware ANTI join #32939

Closed
wants to merge 1 commit into from

Conversation

c21
Copy link
Contributor

@c21 c21 commented Jun 16, 2021

What changes were proposed in this pull request?

NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for HashedRelation, and will immediately return HashedRelationWithAllNullKeys without taking care of the map built already. Before returning HashedRelationWithAllNullKeys, the map needs to be freed properly to save memory and keep memory accounting correctly.

Why are the changes needed?

Save memory and keep memory accounting correctly for the join query.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests introduced in #29104 .

@github-actions github-actions bot added the SQL label Jun 16, 2021
@c21
Copy link
Contributor Author

c21 commented Jun 17, 2021

cc @cloud-fan could you help take a look when you have time? Thanks.

@SparkQA
Copy link

SparkQA commented Jun 17, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44422/

@SparkQA
Copy link

SparkQA commented Jun 17, 2021

Test build #139892 has finished for PR 32939 at commit d944e92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -479,6 +479,7 @@ private[joins] object UnsafeHashedRelation {
throw QueryExecutionErrors.cannotAcquireMemoryToBuildUnsafeHashedRelationError()
}
} else if (isNullAware) {
binaryMap.free()
return HashedRelationWithAllNullKeys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we return it before creating the binary map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan - the NULL-aware ANTI join will be done if there's row with NULL join keys. Note the if (!key.anyNull || allowsNullKey) at line 472. So it will be taking effect when we reading the row with NULL join keys, and this information is not known before creating the binary map

@@ -1060,6 +1061,7 @@ private[joins] object LongHashedRelation {
val key = rowKey.getLong(0)
map.append(key, unsafeRow)
} else if (isNullAware) {
map.free()
return HashedRelationWithAllNullKeys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.1!

@cloud-fan cloud-fan closed this in e0d81d9 Jun 17, 2021
cloud-fan pushed a commit that referenced this pull request Jun 17, 2021
…join

### What changes were proposed in this pull request?

NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for `HashedRelation`, and will immediately return `HashedRelationWithAllNullKeys` without taking care of the map built already. Before returning `HashedRelationWithAllNullKeys`, the map needs to be freed properly to save memory and keep memory accounting correctly.

### Why are the changes needed?

Save memory and keep memory accounting correctly for the join query.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests introduced in #29104 .

Closes #32939 from c21/free-null-aware.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e0d81d9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@c21
Copy link
Contributor Author

c21 commented Jun 17, 2021

Thank you @cloud-fan for quick review!

@c21 c21 deleted the free-null-aware branch June 17, 2021 05:58
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…join

### What changes were proposed in this pull request?

NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for `HashedRelation`, and will immediately return `HashedRelationWithAllNullKeys` without taking care of the map built already. Before returning `HashedRelationWithAllNullKeys`, the map needs to be freed properly to save memory and keep memory accounting correctly.

### Why are the changes needed?

Save memory and keep memory accounting correctly for the join query.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests introduced in apache#29104 .

Closes apache#32939 from c21/free-null-aware.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e0d81d9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
…join

### What changes were proposed in this pull request?

NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for `HashedRelation`, and will immediately return `HashedRelationWithAllNullKeys` without taking care of the map built already. Before returning `HashedRelationWithAllNullKeys`, the map needs to be freed properly to save memory and keep memory accounting correctly.

### Why are the changes needed?

Save memory and keep memory accounting correctly for the join query.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests introduced in apache#29104 .

Closes apache#32939 from c21/free-null-aware.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e0d81d9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants