-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aggregate functions Max/Min return different results based on order of NaN for floating point types. #21877
Comments
cc: @spershin @amitkdutta |
Hi, can I be assigned for this one? Thanks! |
It seems that most systems treat Footnotes
|
@tdcmeehan Tim, thank you for the references. Small correction: it appears that BigQuery doesn't treat NaN as greater than any other value. Both MIN and MAX in BigQuery are documented as returning NaN if any input is NaN. "If the argument is NaN for any row in the group, returns NaN." |
I would say this makes most sense. |
@Yuhta Only BigQuery does that though not the others (spark/presto etc..). |
Folks, anyhow we resolve this, we better make all the min/max of doubles consistent: |
However this is resolved, the docs should be checked if any update is needed to match the resolution of the issue. For example, http://prestodb.io/docs/current/functions/math.html. |
We also noticed that min(x) != min(x, 1).
|
One comment to the above: BigQuery actually treats NaNs as special values and for both min and max returns NaN if any of the inputs is NaN: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#min |
In addition to the various aggs and scalar functions listed above, can we make the behavior consistent across all functions + comparisons involving NaN? In particular how do we deal with comparisons and expressions of the sort:
Note that for ORDER BY with nulls some systems allow specifying where NULLs get ordered with the NULLS FIRST/NULLS LAST construct. |
I've been looking into the NaN inconsistency and commented on this issue (since it had the more relevant title) with my update #21936 (comment). Would appreciate any feedback so we can move forward. |
Aggregate functions Max/Min return different results based on when NaN is encountered in the input for floating point types. If Nan is the first value then irrespective of what the other values are the result is Nan. This seems wrong.
Expected Behavior
Max/Min shouldnt be sensitive to order of NaN and should return same result.
Current Behavior
Possible Solution
The bug is likely here (https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/aggregation/AbstractMinMaxAggregationFunction.java) where state is initially null and set to Nan and then subsequently all comparison against it fails.
The text was updated successfully, but these errors were encountered: