Aggregate functions Max/Min return different results based on order of NaN for floating point types. #21877

kgpai · 2024-02-07T06:03:24Z

Aggregate functions Max/Min return different results depending on where NaN appears in the input for floating point types. If NaN is the first value, the result is NaN regardless of the other values. This seems wrong.

Expected Behavior

Max/Min should not be sensitive to the position of NaN in the input and should return the same result for any ordering.

Current Behavior

presto:di> select max(x) from (values 4.0,nan(),null) T(x);
 _col0
-------
   4.0
(1 row)

presto:di> select max(x) from (values nan(),4.0,null) T(x);
 _col0
-------
   NaN
(1 row)

Possible Solution

The bug is likely here (https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/aggregation/AbstractMinMaxAggregationFunction.java), where the state is initially null, gets set to NaN, and then every subsequent comparison against it fails.
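
The order dependence can be reproduced outside Presto. The following is a minimal Python sketch, not the Presto Java code: it assumes the aggregation keeps a nullable state, seeds it with the first non-null input, and replaces it only when `value > state` is true. The name `naive_max` is hypothetical.

```python
def naive_max(values):
    """Order-sensitive max: assumed stand-in for a state seeded by the first non-null input."""
    state = None
    for value in values:
        if value is None:  # SQL NULLs are skipped by aggregates
            continue
        # Every ordered comparison involving NaN is false, so a NaN state is
        # never replaced and a NaN input never replaces a non-NaN state.
        if state is None or value > state:
            state = value
    return state


nan = float("nan")
first = naive_max([4.0, nan, None])   # NaN arrives second
second = naive_max([nan, 4.0, None])  # NaN arrives first
print(first, second)
```

Under these assumptions the first call keeps 4.0 and the second keeps NaN, which matches the two queries above.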

kgpai · 2024-02-07T06:04:30Z

cc: @spershin @amitkdutta

mbasmanova · 2024-02-07T11:23:56Z

CC: @tdcmeehan @majetideepak @aditi-pandit @feilong-liu @mlyublena @kaikalur

hainenber · 2024-02-09T12:34:09Z

Hi, can I be assigned for this one? Thanks!

tdcmeehan · 2024-02-11T19:07:47Z

It seems that most systems treat NaN as higher than any other floating point value¹²³⁴, which is not consistent with IEEE 754⁵. We should revisit this to see whether the Presto behavior is too incongruent with broader industry expectations. But it seems like we can separate out the small inconsistency in the ordering of the NaN value in our min/max aggregation functions for now.

mbasmanova · 2024-02-12T10:43:36Z

It seems that most systems treat NaN as higher than any other floating point value

@tdcmeehan Tim, thank you for the references. Small correction: it appears that BigQuery doesn't treat NaN as greater than any other value. Both MIN and MAX in BigQuery are documented as returning NaN if any input is NaN.

"If the argument is NaN for any row in the group, returns NaN."

Yuhta · 2024-02-14T23:01:38Z

If the argument is NaN for any row in the group, returns NaN.

I would say this makes most sense.

kgpai · 2024-02-14T23:03:15Z

@Yuhta Only BigQuery does that, though, not the others (Spark, Presto, etc.).
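
To make the two conventions discussed in this thread concrete, here is a hedged Python sketch. The function names are hypothetical and this is not how any of the cited systems implement their aggregates. Both conventions are order-insensitive, and both return NaN from max when any input is NaN. They differ only for min.

```python
import math


def _non_null(values):
    return [v for v in values if v is not None]


def max_nan_largest(values):
    """Max where NaN is treated as greater than every other value."""
    present = _non_null(values)
    if not present:
        return None
    if any(math.isnan(v) for v in present):
        return float("nan")
    return max(present)


def min_nan_largest(values):
    """Min where NaN is the largest value: NaN only if every input is NaN."""
    present = _non_null(values)
    if not present:
        return None
    non_nan = [v for v in present if not math.isnan(v)]
    return min(non_nan) if non_nan else float("nan")


def min_nan_propagating(values):
    """Min that returns NaN when any input is NaN (the rule quoted from the BigQuery docs)."""
    present = _non_null(values)
    if not present:
        return None
    if any(math.isnan(v) for v in present):
        return float("nan")
    return min(present)
```

For the input `[4.0, nan]`, `min_nan_largest` returns 4.0 while `min_nan_propagating` returns NaN, regardless of where the NaN appears.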

spershin · 2024-02-14T23:42:03Z

Folks, however we resolve this, we had better make all the min/max functions on doubles consistent:
array_min/array_max currently both return NaN when at least one element is NaN.
So min/max should do the same; if not, array_min/array_max should be changed accordingly.
It makes sense to double check min_by/max_by too.
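
One way to check this kind of consistency is to run each function over every permutation of a small input and confirm the result never changes. The sketch below is in Python and uses hypothetical helper names. `builtin_max` stands in for a comparison-based aggregate and `nan_aware_max` for a variant that returns NaN when any input is NaN. Neither is Presto code.

```python
import itertools
import math


def same_result(a, b):
    """Equality that treats NaN as equal to NaN and None as equal to None."""
    if a is None or b is None:
        return a is b
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b


def is_order_insensitive(func, values):
    """Return True if func gives the same result for every permutation of values."""
    results = [func(list(p)) for p in itertools.permutations(values)]
    return all(same_result(results[0], r) for r in results[1:])


def builtin_max(values):
    # Python's built-in max relies on plain comparisons, so NaN makes it order-sensitive.
    return max(values)


def nan_aware_max(values):
    # Hypothetical order-insensitive variant: NaN wins whenever it is present.
    if any(math.isnan(v) for v in values):
        return float("nan")
    return max(values)


sample = [4.0, float("nan"), 1.0]
print(is_order_insensitive(builtin_max, sample))
print(is_order_insensitive(nan_aware_max, sample))
```

The same check could be pointed at min, array_min/array_max, or min_by/max_by equivalents to compare their behavior on identical inputs.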

steveburnett · 2024-02-15T18:48:20Z

However this is resolved, the docs should be checked to see whether any update is needed to match the resolution of the issue. For example, http://prestodb.io/docs/current/functions/math.html.

mbasmanova · 2024-02-15T23:11:13Z

We also noticed that min(x) != min(x, 1).

presto:di> select max(x), min(x) from unnest(array[1.0, nan(), infinity()]) as t(x);
  _col0   | _col1
----------+-------
 Infinity |   1.0
(1 row)

presto:di> select max(x, 1), min(x, 1) from unnest(array[1.0, nan(), infinity()]) as t(x);
 _col0 | _col1
-------+-------
 [NaN] | [1.0]
(1 row)
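
One plausible explanation for the mismatch, offered as an assumption rather than a confirmed reading of the Presto code: max(x) uses plain `>` comparisons, while max(x, n) orders values with a total order in which NaN sorts above everything, including Infinity (as Java's `Double.compare` does). The Python sketch below uses hypothetical function names to show how those two choices diverge on the same input.

```python
import math


def comparison_max(values):
    """Plain max: keep the running value unless `value > state` holds."""
    state = None
    for value in values:
        if state is None or value > state:
            state = value
    return state


def total_order_key(value):
    """Sort key for a total order with NaN above everything, including +Infinity."""
    return (1, 0.0) if math.isnan(value) else (0, value)


def top_n_max(values, n):
    """Top-n max under the total order, largest first."""
    return sorted(values, key=total_order_key, reverse=True)[:n]


data = [1.0, float("nan"), float("inf")]
print(comparison_max(data))  # prints inf
print(top_n_max(data, 1))    # prints [nan]
```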

mlyublena · 2024-02-15T23:47:54Z

It seems that most systems treat NaN as higher than any other floating point value¹²³⁴, which is not consistent with IEEE 754⁵. We should revisit this to see whether the Presto behavior is too incongruent with broader industry expectations. But it seems like we can separate out the small inconsistency in the ordering of the NaN value in our min/max aggregation functions for now.

Footnotes

1. https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#max
2. https://docs.snowflake.com/en/sql-reference/data-types-numeric#special-values
3. https://spark.apache.org/docs/latest/api/sql/index.html#array_max
4. https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL
5. https://en.wikipedia.org/wiki/NaN

One comment on the above: BigQuery actually treats NaN as a special value, and both min and max return NaN if any of the inputs is NaN: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#min

mlyublena · 2024-02-16T00:03:33Z

In addition to the various aggs and scalar functions listed above, can we make the behavior consistent across all functions + comparisons involving NaN? In particular how do we deal with comparisons and expressions of the sort:

x > nan()
order by <column_containing_nulls>

Note that for ORDER BY with nulls some systems allow specifying where NULLs get ordered with the NULLS FIRST/NULLS LAST construct.
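
To illustrate what is at stake for comparisons and ordering, here is a small Python sketch. The first part shows standard IEEE 754 comparison results. The two functions are hypothetical: they model a total order in which NaN is the largest value, plus a selectable NULL placement, and do not describe what Presto does.

```python
import math

nan = float("nan")

# IEEE 754: every ordered comparison with NaN is false, and NaN is not equal to itself.
ieee_results = {
    "1.0 > nan": 1.0 > nan,
    "1.0 < nan": 1.0 < nan,
    "nan == nan": nan == nan,
}


def greater_than_nan_largest(x, y):
    """x > y under an assumed total order where NaN is the largest value and equals itself."""
    if math.isnan(x):
        return not math.isnan(y)
    if math.isnan(y):
        return False
    return x > y


def order_by(values, nulls_first=False):
    """Ascending ORDER BY sketch: NaN sorts after all numbers; None models SQL NULL."""
    def key(v):
        if v is None:
            return (0 if nulls_first else 2, 0, 0.0)
        return (1, 1, 0.0) if math.isnan(v) else (1, 0, v)
    return sorted(values, key=key)


print(ieee_results)
print(order_by([3.0, None, nan, 1.0]))
print(order_by([3.0, None, nan, 1.0], nulls_first=True))
```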

rschlussel · 2024-02-29T18:26:36Z

I've been looking into the NaN inconsistency and commented on this issue (since it had the more relevant title) with my update #21936 (comment). Would appreciate any feedback so we can move forward.

kgpai added the bug label Feb 7, 2024

kgpai mentioned this issue Feb 7, 2024: Presto Aggregate Max(Nan) yields non Nan value facebookincubator/velox#8690 (Open)

tdcmeehan added the good first issue label Feb 7, 2024

hainenber mentioned this issue Feb 9, 2024: Fix min and max for inputs that include NaN values #21893 (Closed, 6 tasks)

tdcmeehan mentioned this issue Feb 23, 2024: Clarifications on comparisons with NaN #21936 (Closed)

rschlussel mentioned this issue Feb 28, 2024: map_top_n returns wrong results if NaN appears in the input #22040 (Closed)

kagamiori mentioned this issue Feb 29, 2024: Fix equality check for simple floating types in RowContainer facebookincubator/velox#7780 (Closed)

rschlussel mentioned this issue May 31, 2024: Implement new NaN behavior #22386 (Merged, 6 tasks)

elharo closed this as completed Jul 18, 2024
