-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33524][SQL][TESTS] Change InMemoryTable
not to use Tuple.hashCode for BucketTransform
#30477
Conversation
BucketTransform
not to use Tuple.hashCode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for tracking this down, @dongjoon-hyun!
Thank you, @rdblue ! |
BucketTransform
not to use Tuple.hashCode.BucketTransform
not to use Tuple.hashCode.
BucketTransform
not to use Tuple.hashCode.InMemoryTable
not to use Tuple.hashCode for BucketTransform
(extractor(ref.fieldNames, schema, row).hashCode() & Integer.MAX_VALUE) % numBuckets | ||
val (value, dataType) = extractor(ref.fieldNames, schema, row) | ||
val valueHashCode = if (value == null) 0 else value.hashCode | ||
((valueHashCode + dataType.hashCode()) & Integer.MAX_VALUE) % numBuckets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine. One common hashCode pattern is a + 31 * b, in the JVM source, FWIW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, @srowen . I'll update like that.
Test build #131576 has finished for PR 30477 at commit
|
The last change affects only
|
I'll merge this. Thank you, @rdblue , @HyukjinKwon , @srowen . |
…hCode for `BucketTransform` This PR aims to change `InMemoryTable` not to use `Tuple.hashCode` for `BucketTransform`. SPARK-32168 made `InMemoryTable` to handle `BucketTransform` as a hash of `Tuple` which is dependents on Scala versions. - https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala#L159 **Scala 2.12.10** ```scala $ bin/scala Welcome to Scala 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_272). Type in expressions for evaluation. Or try :help. scala> (1, 1).hashCode res0: Int = -2074071657 ``` **Scala 2.13.3** ```scala Welcome to Scala 2.13.3 (OpenJDK 64-Bit Server VM, Java 1.8.0_272). Type in expressions for evaluation. Or try :help. scala> (1, 1).hashCode val res0: Int = -1669302457 ``` Yes. This is a correctness issue. Pass the UT with both Scala 2.12/2.13. Closes #30477 from dongjoon-hyun/SPARK-33524. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 8380e00) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR aims to change
InMemoryTable
not to useTuple.hashCode
forBucketTransform
.Why are the changes needed?
SPARK-32168 made
InMemoryTable
to handleBucketTransform
as a hash ofTuple
which is dependents on Scala versions.Scala 2.12.10
Scala 2.13.3
Does this PR introduce any user-facing change?
Yes. This is a correctness issue.
How was this patch tested?
Pass the UT with both Scala 2.12/2.13.