-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21344][SQL] BinaryType comparison does signed byte array comparison #18571
Conversation
Test build #79381 has finished for PR 18571 at commit
|
Aren't byte comparisons signed though elsewhere? as well as other integer type comparison? this would become inconsistent. Maybe I'm missing the reason this should be different. |
IIUC, What do you think? |
I checked other dbms (dbms-like) systems handled binary as unassigned;
In some parts of the Spark implementation, they also handle binary as unassigned (https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java#L36). I feel, based on the principle of least astonishment, we'd be better to handle binary as unassigned in user-facing pieces. |
@maropu, thank you for your comment. |
|
||
test("SPARK-21344: BinaryType comparison does signed byte array comparison") { | ||
val b1 = Array[Byte](1) // 0x01 | ||
val b2 = Array[Byte](-1) // 0xff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @kiszk
Can we have another test case with more than one byte for code coverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comment, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it looks better to check the boundary values like -128
and 127
.
Test build #79425 has finished for PR 18571 at commit
|
@@ -44,7 +44,7 @@ public static long getPrefix(byte[] bytes) { | |||
final int minLen = Math.min(bytes.length, 8); | |||
long p = 0; | |||
for (int i = 0; i < minLen; ++i) { | |||
p |= (128L + Platform.getByte(bytes, Platform.BYTE_ARRAY_OFFSET + i)) | |||
p |= ((long)Platform.getByte(bytes, Platform.BYTE_ARRAY_OFFSET + i) & 0xff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe we need a single space like ((long) Platform...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. done.
Test build #79436 has finished for PR 18571 at commit
|
Test build #79439 has finished for PR 18571 at commit
|
Jenkins, retest this please. |
Test build #79450 has finished for PR 18571 at commit
|
@gatorsmile could you please review this? |
SELECT x'00' < x'0f'
SELECT x'00' < x'ff' Could you add the above queries to a new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
Test build #79600 has finished for PR 18571 at commit
|
@gatorsmile thank you for your comment, done |
(Array[Byte](1, 1, 1, 1, 1, 1, 1, 1, 1), Array[Byte](1, 1, 1, 1, 1, 1, 1, 1, -1)) | ||
) | ||
data.foreach { case (b1, b2) => | ||
val rowOrdering = InterpretedOrdering.forSchema(Seq(BinaryType, BinaryType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, don't you just have one binary field for each row? Why you specify two fields to compare?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, updated.
BoundReference(1, BinaryType, nullable = true).asc :: Nil) | ||
val rowType = StructType( | ||
StructField("b1", BinaryType, nullable = true) :: | ||
StructField("b2", BinaryType, nullable = true) :: Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You specify two binary fields as row schema. But actually your input just has one binary field (i.e., Row(b1)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misunderstood, updated.
@@ -44,7 +44,7 @@ public static long getPrefix(byte[] bytes) { | |||
final int minLen = Math.min(bytes.length, 8); | |||
long p = 0; | |||
for (int i = 0; i < minLen; ++i) { | |||
p |= (128L + Platform.getByte(bytes, Platform.BYTE_ARRAY_OFFSET + i)) | |||
p |= ((long) Platform.getByte(bytes, Platform.BYTE_ARRAY_OFFSET + i) & 0xff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we don't have unit test for this. Shall we add a ByteArraySuite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the unit test for this. I added some cases.
Test build #79612 has finished for PR 18571 at commit
|
It sounds like all the comments from @viirya have been resolved. Will merge it tonight. Thanks! |
LGTM |
…rison ## What changes were proposed in this pull request? This PR fixes a wrong comparison for `BinaryType`. This PR enables unsigned comparison and unsigned prefix generation for an array for `BinaryType`. Previous implementations uses signed operations. ## How was this patch tested? Added a test suite in `OrderingSuite`. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #18571 from kiszk/SPARK-21344. (cherry picked from commit ac5d5d7) Signed-off-by: gatorsmile <gatorsmile@gmail.com>
…rison ## What changes were proposed in this pull request? This PR fixes a wrong comparison for `BinaryType`. This PR enables unsigned comparison and unsigned prefix generation for an array for `BinaryType`. Previous implementations uses signed operations. ## How was this patch tested? Added a test suite in `OrderingSuite`. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #18571 from kiszk/SPARK-21344. (cherry picked from commit ac5d5d7) Signed-off-by: gatorsmile <gatorsmile@gmail.com>
…rison ## What changes were proposed in this pull request? This PR fixes a wrong comparison for `BinaryType`. This PR enables unsigned comparison and unsigned prefix generation for an array for `BinaryType`. Previous implementations uses signed operations. ## How was this patch tested? Added a test suite in `OrderingSuite`. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #18571 from kiszk/SPARK-21344. (cherry picked from commit ac5d5d7) Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Thanks! Merging to master/2.2/2.1/2.0 |
…rison ## What changes were proposed in this pull request? This PR fixes a wrong comparison for `BinaryType`. This PR enables unsigned comparison and unsigned prefix generation for an array for `BinaryType`. Previous implementations uses signed operations. ## How was this patch tested? Added a test suite in `OrderingSuite`. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#18571 from kiszk/SPARK-21344. (cherry picked from commit ac5d5d7) Signed-off-by: gatorsmile <gatorsmile@gmail.com>
…rison This PR fixes a wrong comparison for `BinaryType`. This PR enables unsigned comparison and unsigned prefix generation for an array for `BinaryType`. Previous implementations uses signed operations. Added a test suite in `OrderingSuite`. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#18571 from kiszk/SPARK-21344. (cherry picked from commit ac5d5d7) Signed-off-by: gatorsmile <gatorsmile@gmail.com>
What changes were proposed in this pull request?
This PR fixes a wrong comparison for
BinaryType
. This PR enables unsigned comparison and unsigned prefix generation for an array forBinaryType
. Previous implementations uses signed operations.How was this patch tested?
Added a test suite in
OrderingSuite
.