Make consistent behavior on zeros equality on floating point types #3510

viirya · 2023-01-11T18:50:46Z

Which issue does this PR close?

Closes #3509.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

tustvold · 2023-01-11T20:42:31Z

I'm not sure about this, as it means we are no longer comparing with respect to a standard predicate but one of our own devising. Why special-case zero, and not other values like NaNs?

I'd also be interested to know what impact this has on benchmarks.

FWIW, if we do make this change, we will need to normalise within the row format, and potentially in other places too. Nothing insurmountable, just noting it.

tustvold · 2023-01-11T20:43:50Z

arrow-ord/src/ord.rs

 {
    let left: PrimitiveArray<T> = PrimitiveArray::from(left.data().clone());
    let right: PrimitiveArray<T> = PrimitiveArray::from(right.data().clone());
-    Box::new(move |i, j| left.value(i).cmp(&right.value(j)))
+    Box::new(move |i, j| left.value(i).compare(right.value(j)))


👍 regardless this is a good change

viirya · 2023-01-11T21:10:16Z

NaNs are treated as equal by total ordering. I guess total ordering needs to give a comprehensive ordering over all possible floating point values. But in practical computation, we don't actually distinguish positive and negative zeros.
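To make the distinction concrete, here is a minimal sketch using only the standard library. The helper names `ieee_eq` and `total_order` are made up for illustration and are not part of arrow. IEEE 754 `==` treats the two zeros as equal, while `total_cmp` orders `-0.0` before `+0.0`.

```rust
use std::cmp::Ordering;

/// IEEE 754 equality: +0.0 and -0.0 compare equal.
/// (Hypothetical helper, for illustration only.)
fn ieee_eq(a: f32, b: f32) -> bool {
    a == b
}

/// The totalOrder predicate as exposed by the standard library's `total_cmp`:
/// -0.0 is ordered strictly before +0.0.
/// (Hypothetical helper, for illustration only.)
fn total_order(a: f32, b: f32) -> Ordering {
    a.total_cmp(&b)
}
```

Calling `ieee_eq(0.0, -0.0)` yields `true`, whereas `total_order(-0.0, 0.0)` yields `Ordering::Less`, which is the inconsistency under discussion.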

tustvold · 2023-01-11T21:39:11Z

> NaNs are treated as equal by total ordering

Not under the ordering we currently use: values are ordered by their constituent bits, so NaNs with different bit representations will not compare equal.

viirya · 2023-01-11T23:00:30Z

> Not the ordering we use currently, they're ordered based on their constituent bits, NaNs with different byte representations will not compare equal

We have a NaN equality test that verifies they are equal. I also did a quick check in the Rust playground:

fn main() {
    let a = f32::NAN;
    let b = f32::NAN;
    
    println!("a == b: {}", a.to_bits() == b.to_bits());
}

Output:

a == b: true

tustvold · 2023-01-12T00:57:16Z

f32::NAN always has the same NaN bit pattern. If you obtain NaNs by other means, such that they have different bit representations, you will see the difference.

Edit: in fact comparing NaN with -NaN will probably show this
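
A minimal sketch of this point, using only the standard library. The helpers `other_nan` and `same_bits` are hypothetical, and the sketch assumes that flipping the lowest mantissa bit of `f32::NAN` still yields a NaN, which holds for the usual quiet-NaN constant.

```rust
/// Hypothetical helper: a NaN whose payload differs from `f32::NAN`,
/// built by flipping the lowest mantissa bit of the constant.
fn other_nan() -> f32 {
    f32::from_bits(f32::NAN.to_bits() ^ 1)
}

/// Hypothetical helper: bitwise equality, as in the playground snippet above.
fn same_bits(a: f32, b: f32) -> bool {
    a.to_bits() == b.to_bits()
}
```

Here `same_bits(f32::NAN, f32::NAN)` is `true`, but `same_bits(f32::NAN, other_nan())` is `false`, and `f32::NAN.total_cmp(&other_nan())` is not `Ordering::Equal` even though both values are NaN.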

viirya · 2023-01-12T01:16:27Z

I see. That explains why these NaNs are equal. I roughly remember from JVM experience that NaN values can have different bits, so I was a bit surprised to see them compare equal in the test and playground above. If other bit patterns are also treated as NaN in Rust, then NaNs are not guaranteed to be equal.

NaNs should be treated as equal in computation too, like zeros.

Either we add a NaN-specific condition like the one for zero, or we avoid such special cases here and require users to handle them before calling arrow kernels, for example by replacing negative zeros with positive zeros and normalizing NaNs to the standard f32::NAN (likewise for f64 and f16).
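
As a rough sketch of the second option, a caller could canonicalise each value before handing it to a kernel. The function `normalize` below is hypothetical and not part of arrow; it only illustrates the idea for `f32` using the standard library.

```rust
/// Hypothetical pre-processing step (not an arrow API): map every NaN bit
/// pattern to `f32::NAN` and negative zero to positive zero, so that a
/// bit-based total ordering sees them as equal.
fn normalize(v: f32) -> f32 {
    if v.is_nan() {
        // Collapse all NaN payloads and signs into the standard constant.
        f32::NAN
    } else if v == 0.0 {
        // `-0.0 == 0.0` is true under IEEE 754, so both zeros land here
        // and are replaced by +0.0.
        0.0
    } else {
        v
    }
}
```

After normalization, `normalize(-0.0).total_cmp(&normalize(0.0))` is `Ordering::Equal`, and any two NaNs normalize to the same bit pattern.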

viirya · 2023-01-12T07:06:08Z

arrow-array/src/arithmetic.rs

+                if self.abs() == $zero && rhs.abs() == $zero {
+                    // `total_cmp` treats positive zero and negative zero as different.
+                    // But for computation system, it usually treats them as equal.
+                    Ordering::Equal
+                } else {
+                    <$t>::total_cmp(&self, &rhs)
+                }


I removed these changes.

viirya · 2023-01-12T07:06:36Z

arrow-ord/src/comparison.rs

+/// Note that totalOrder treats positive and negative zeros are different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.


I updated these docs to make the behavior clear to users.

viirya · 2023-01-12T07:07:46Z

arrow-ord/src/ord.rs

+        assert_eq!(Ordering::Less, (cmp)(0, 1));
+        assert_eq!(Ordering::Greater, (cmp)(1, 0));


build_compare's behavior when comparing zeros was inconsistent with the comparison kernels. I changed it to be consistent.
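
The behavior asserted by this test can be sketched without arrow. In the snippet below, `build_float_compare` and the `Comparator` alias are hypothetical stand-ins that operate on plain `Vec<f32>` rather than arrow arrays; they only mirror the shape of the closure in the `ord.rs` diff above, assuming the float comparison is a total ordering like `total_cmp`.

```rust
use std::cmp::Ordering;

/// Hypothetical alias for a comparator over indices into two columns.
type Comparator = Box<dyn Fn(usize, usize) -> Ordering>;

/// Hypothetical stand-in for a float comparator builder: compares
/// `left[i]` with `right[j]` under the total ordering, so -0.0 < +0.0.
fn build_float_compare(left: Vec<f32>, right: Vec<f32>) -> Comparator {
    Box::new(move |i, j| left[i].total_cmp(&right[j]))
}
```

With both inputs set to `vec![-0.0, 0.0]`, `cmp(0, 1)` is `Ordering::Less` and `cmp(1, 0)` is `Ordering::Greater`, matching the two assertions in the diff.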

tustvold

Thank you

jhorstmann · 2023-01-12T12:50:39Z

Looks good to me!

Just a note that the min/max aggregation kernels also use a different definition, I think following the postgres behavior of considering NaN to be greater than any other value.
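
For comparison, an ordering in which NaN is greater than any other value might look like the sketch below. This is an illustration only: `nan_greatest_cmp` is a hypothetical name, and the code is not taken from the arrow aggregation kernels.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator: every NaN is greater than every non-NaN value,
/// and all NaNs are equal to each other regardless of sign or payload.
fn nan_greatest_cmp(a: f64, b: f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither value is NaN, so `partial_cmp` always returns `Some`.
        (false, false) => a.partial_cmp(&b).unwrap(),
    }
}
```

This differs from `total_cmp` in two ways: NaNs with a negative sign bit sort below all other values under `total_cmp`, and `total_cmp` distinguishes `-0.0` from `+0.0` whereas this comparator treats them as equal.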

tustvold · 2023-01-12T13:06:38Z

> I think following the postgres behavior of considering NaN to be greater than any other value

Yeah it is honestly baffling to me that they took so long to define a total ordering predicate, we now have a standard but few people follow it 😅

ursabot · 2023-01-13T07:52:42Z

Benchmark runs are scheduled for baseline = 8688dba and contender = d49cd21. d49cd21 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

viirya added 3 commits January 11, 2023 10:46

Treat positive and negative float zeros as equal

854af9c

Update doc

ec4791e

Add test

9cbace1

github-actions bot added the arrow Changes to the arrow crate label Jan 11, 2023

tustvold reviewed Jan 11, 2023

github-actions bot removed the arrow Changes to the arrow crate label Jan 12, 2023

viirya commented Jan 12, 2023

Make build_compare consistent with comparison kernels

cc6eb8d

viirya force-pushed the float_zeros branch from dc06917 to cc6eb8d Compare January 12, 2023 07:12

tustvold approved these changes Jan 12, 2023

tustvold merged commit d49cd21 into apache:master Jan 13, 2023

tustvold added the arrow Changes to the arrow crate label Jan 30, 2023

tustvold mentioned this pull request Aug 14, 2023

Different sort behavior for floats between single-column and lexicographical sort #1941

Closed
