Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maximum([fill(NaN, 255); missing]) !== maximum([fill(NaN, 256); missing]) (Should we fix reduce(max, ...) etc.?) #36287

Closed
tkf opened this issue Jun 14, 2020 · 8 comments
Labels
domain:fold sum, maximum, reduce, foldl, etc. domain:missing data Base.missing and related functionality kind:bug Indicates an unexpected problem or unintended behavior kind:correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing kind:minor change Marginal behavior change acceptable for a minor release

Comments

@tkf
Copy link
Member

tkf commented Jun 14, 2020

Currently, we have

julia> maximum([fill(NaN, 256); missing])
NaN

julia> maximum([fill(NaN, 255); missing])
missing

julia> maximum(x for x in [fill(NaN, 256); missing])
missing

julia> maximum(x for x in [fill(NaN, 255); missing])
missing

julia> max(missing, NaN)
missing

As you can see, maximum is incompatible with the definition of max depending on the size of the array. Furthermore, the result changes when processed with the identity iterator transformation.

I think maximum(xs) should be defined as reduce(max, xs) semantically. Currently, it is defined as such at the "syntax" level:

julia/base/reducedim.jl

Lines 716 to 725 in 13b07fc

for (fname, _fname, op) in [(:sum, :_sum, :add_sum), (:prod, :_prod, :mul_prod),
(:maximum, :_maximum, :max), (:minimum, :_minimum, :min)]
@eval begin
# User-facing methods with keyword arguments
@inline ($fname)(a::AbstractArray; dims=:, kw...) = ($_fname)(a, dims; kw...)
@inline ($fname)(f, a::AbstractArray; dims=:, kw...) = ($_fname)(f, a, dims; kw...)
# Underlying implementations using dispatch
($_fname)(a, ::Colon; kw...) = ($_fname)(identity, a, :; kw...)
($_fname)(f, a, ::Colon; kw...) = mapreduce(f, $op, a; kw...)

However, mapreduce implementation for max/min is not compatible with the mathematical definition of mapreduce.

There were related discussions on findmax/min and argmax/min (#27613, #35316) and I've been suggesting to define the total orderings compatible with max/min and use it to define all the similar functions, at least at the API definition level. I think this is the right direction as otherwise parallelizing these reducers is hard; i.e., we need to express them as folds with an associative operator which requires transitivity in the corresponding comparison. Furthermore, rigorously defining these reducers would let us confidently leverage the commutativity of max/min and define efficient maximum(::Diagonal) etc. #30236.

So, I suggest to:

  1. Use the mathematically correct definition in mapreduce(min, ...) and mapreduce(max, ...).
  2. Define the total ordering compatible with min (maybe in Add 2-arg versions of findmax/min, argmax/min #35316) and use it in argmin and findmin. (Note: isless is already max-compatible)

cc @nalimilan @StefanKarpinski @mbauman @JeffBezanson

@tkf tkf added domain:fold sum, maximum, reduce, foldl, etc. domain:missing data Base.missing and related functionality kind:bug Indicates an unexpected problem or unintended behavior kind:minor change Marginal behavior change acceptable for a minor release labels Jun 14, 2020
@nalimilan
Copy link
Member

Good catch. I don't think this qualifies as a minor change though, it's just a bug that should be fixed. maximum([missing; fill(NaN, 256)]), which is very close to one of your examples, even errors in 1.4. The fix I implemented at #35989 should be improved to avoid short-circuiting for NaN when the array can contain missing.

I can't comment on your more general proposal. It sounds appealing but I'm not able to judge. :-)

@tkf
Copy link
Member Author

tkf commented Jun 15, 2020

@nalimilan What do you think about

julia> findmax([NaN, missing])
(NaN, 1)

being incompatible with maximum? That's the minor change part that I wanted to fix; i.e., return (missing, 2) (which is not clear at all from reading the OP, now that I look at it again).

@nalimilan
Copy link
Member

Well I'd say it qualifies as a bug fix, just like changing maximum([fill(NaN, 256); missing]) to return missing. Anyway saying it's a minor change is OK too, what matters is not to wait until 2.0. :-)

@tkf
Copy link
Member Author

tkf commented Jun 15, 2020

OK. findmax explicitly says "If any data element is NaN" (even though it directly contradicts with "The result is in line with max.") so I was a bit worried:

julia/base/array.jl

Lines 2172 to 2177 in 13b07fc

findmax(itr) -> (x, index)
Return the maximum element of the collection `itr` and its index. If there are multiple
maximal elements, then the first one will be returned.
If any data element is `NaN`, this element is returned.
The result is in line with `max`.

@nalimilan
Copy link
Member

Woops. I wasn't aware of that. So yeah that would be a minor change.

@tkf
Copy link
Member Author

tkf commented Jun 15, 2020

I should have mentioned it :)

@LilithHafner LilithHafner added the kind:correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing label Aug 7, 2023
@ViralBShah
Copy link
Member

ViralBShah commented Oct 24, 2023

This specific issue is fixed in 1.9:


julia> maximum([fill(NaN, 256); missing])
missing

julia> maximum([fill(NaN, 255); missing])
missing

julia> maximum(x for x in [fill(NaN, 256); missing])
missing

julia> maximum(x for x in [fill(NaN, 255); missing])
missing

julia> max(missing, NaN)
missing

@nalimilan
Copy link
Member

nalimilan commented Oct 24, 2023

I wonder what fixed it though, as I couldn't find a relevant PR and the code doesn't seem to have been touched since the issue was filed. Examples at #45932 also seem to work now. EDIT: OK, it was fixed by #43604, which removed the short-circuiting in the presence of NaN or missing, and added a test similar to the one in this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:fold sum, maximum, reduce, foldl, etc. domain:missing data Base.missing and related functionality kind:bug Indicates an unexpected problem or unintended behavior kind:correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing kind:minor change Marginal behavior change acceptable for a minor release
Projects
None yet
Development

No branches or pull requests

4 participants