Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

numeric equality ignoring type with NaNs equal #5314

Closed
StefanKarpinski opened this issue Jan 6, 2014 · 25 comments
Closed

numeric equality ignoring type with NaNs equal #5314

StefanKarpinski opened this issue Jan 6, 2014 · 25 comments
Labels
breaking This change will break code needs decision A decision on this change is needed
Milestone

Comments

@StefanKarpinski
Copy link
Member

One thing I find highly objectionable about the current state of affairs with ==, isequal and === is that there is no way to compare numeric values of different types so that type doesn't matter and NaNs are equal. I was trying to do this test of the fidelity of the a + b*im construct on my sk/imaginary branch and there's just no obvious way to check that a + b*im and Complex(a,b) are the same value even if they are different types, where a and/or b may be NaN.

@StefanKarpinski
Copy link
Member Author

The whole issue can be reduced to the fact that there's no equality operation such that NaN and NaN32 are equal:

julia> NaN == NaN32
false

julia> isequal(NaN,NaN32)
false

julia> NaN === NaN32
false

@JeffBezanson
Copy link
Member

Maybe we should try again here. We could have every type generally implement ==, and then add

isequal(x::Real, y::Real) = ((x==y) & (signbit(x)==signbit(y))) | (isnan(x)&isnan(y))

It would be nice for isequal to only be defined where it is different from == (same for isless and <). Although this is simpler, it has worse pitfalls than what we do now. Now, we can say <(x,y) = isless(x,y), because a total order isless suffices for <, but not the other way around. Therefore if a type only provides < or isless, everything is correct. If we switched to the simpler approach we would have to make sure every type with a partial < implements both functions.

Also, I'm not sure it's necessary for every data structure to have == that calls == recursively, and isequal that calls isequal recursively. Arguably, the IEEE standard only applies when an argument to == is a float. Having == for a non-IEEE-754 type like Dict call isequal on its elements might be the right thing to do. Getting false when comparing two identical data structures just because there's a NaN buried inside always seemed kind of strange to me.

@StefanKarpinski
Copy link
Member Author

Fucking NaN. Honestly, I'm starting to think that the idea of a thing that isn't equal to itself is just not compatible with a sane notion of equality. I'm starting to think we should just throw errors when we encounter NaN :-\

@JeffBezanson
Copy link
Member

That's what makes me want to push NaN into a corner where it only matters for a==b where a and/or b is floating-point. Every other context can do something more sane.

Most operations that can give NaN should throw errors instead; the only problem is performance-critical functions like +.

Then there is also the issue of signed zero.

Switching briefly to ordering, some other types, like sets, also have a canonical strictly-partial order. Currently isless for sets is strict-subset, and I'm not sure that's right. That should be a < method, and we could maybe also define lexcmp for sets.

@lindahua
Copy link
Contributor

lindahua commented Jan 6, 2014

FWIW, MATLAB has a function isequaln that Determine array equality, treating NaN values as equal.

Reference: http://www.mathworks.com/help/matlab/ref/isequaln.html

@pao
Copy link
Member

pao commented Jan 6, 2014

@lindahua I believe we had that function for a while (I may have even added it). It may also have been renamed. I seem to recall a discussion along those lines.

@lindahua
Copy link
Contributor

lindahua commented Jan 6, 2014

This function doesn't live in Julia base though.

@pao
Copy link
Member

pao commented Jan 6, 2014

Yeah, it didn't make the transition out of extras.jl. I don't see any explanation as to why, so it might have been either oversight or disgust; can't say for sure. :D

@JeffBezanson
Copy link
Member

isequal is supposed to do that.

@jiahao
Copy link
Member

jiahao commented Jan 7, 2014

If a=[1.0, NaN] and == is changed to call isequal elementwise and isequal is changed to become true on NaNs, then a==a != all(a.==a)

@jiahao
Copy link
Member

jiahao commented Jan 7, 2014

#5234

@JeffBezanson
Copy link
Member

Numeric arrays are also fairly "number-like" and so would probably decide to call == recursively. But I'm not sure this applies to all data structures.

@nalimilan
Copy link
Member

FWIW, there's a very similar issue when dealing with NAs in DataArrays, and there's also the question of == vs. isequal(), with the former returning NA as soon as one NA is found. JuliaStats/DataArrays.jl#46

@gitfoxi
Copy link
Contributor

gitfoxi commented Jan 7, 2014

As a side thing, in general, it would be cool if arbitrary data structures
with the same contents would evaluate as being equal. Instead you have to
write custom isequal for every type.

The reason is probably that if it the structure were to have circular
references you'd end up in an infinite loop. But there's some
straightforward algorithms for comparing possibly-recursive structures
which detect cycles by keeping track of everywhere it's already been and
terminating recursion -- which is some overhead, but maybe worth it?

On Tue, Jan 7, 2014 at 12:50 AM, Milan Bouchet-Valat <
notifications@github.com> wrote:

FWIW, there's a very similar issue when dealing with NAs in DataArrays,
and there's also the question of == vs. isequal(), with the former
returning NA as soon as one NA is found. JuliaStats/DataArrays.jl#46JuliaStats/DataArrays.jl#46


Reply to this email directly or view it on GitHubhttps://github.com//issues/5314#issuecomment-31721430
.

Michael

@stevengj
Copy link
Member

Note that it's not just NaN != NaN32. As per the IEEE-754 standard, NaN != NaN (it is the only numeric value that does not compare equal to itself).

@JeffBezanson
Copy link
Member

That is settled; we're thinking of isequal here.

@JeffBezanson
Copy link
Member

I want to point out that there is no equality predicate that always does what you want. All we can do is provide some reasonable primitives, and in specific cases you must reason about what kind of equality you need. For example, this exact same issue could have been filed about sign bits instead of NaN (a comparison like ==, but also requiring sign bits to match). Sometimes you will have to write a==b && signbit(a)==signbit(b), or a==b || (isnan(a) && isnan(b)). Or you might have to write a==b && typeof(a)==typeof(b).

@JeffBezanson
Copy link
Member

Perhaps isequal should ignore the types of NaNs. Sounds like a hack, but it is true that nearly all numeric functions will do the same thing given a NaN of any type --- it will propagate and you'll get another NaN. This isn't the case with non-NaNs; different precision arguments will yield different answers.

@StefanKarpinski
Copy link
Member Author

How does this help?

isequal(Complex(1e0,1e0),Complex(1f0,1f0)) ==> false
isequal(Complex(NaN,NaN),Complex(NaN32,NaN32)) ==> true

Clearly you can't satisfy everyone, but this is a case that is particularly nasty to express generically.

The more I think about it, the more I think that making isequal behave as much like == as possible is the best approach. That minimizes the number of things for programmers to think about since isequal isn't really an entirely new thing – it's just == modified as little as possible to make it work for hashing. There just isn't any sensible intermediate between === and == that isn't fundamentally arbitrary.

I also think this is bit of a mistake:

julia> 1 == "foo"
false

julia> 1 < "foo"
ERROR: no method isless(ASCIIString, Int64)
 in < at operators.jl:18

It seems to me that == and <= should fail or work on exactly the same arguments. Making === work everywhere makes sense though since it's a completely global comparison predicate.

@JeffBezanson
Copy link
Member

I do think there is a non-arbitrary intermediate between === and ==, which is to behave like === but treat everything as immutable. This is justified because many values are immutable by convention.

It's really helpful for == to fall back to ===. Otherwise it would be hard to know what methods are needed. For example, if you have a value-or-nothing argument, you might informally check if x==nothing. Having that not work is an unnecessary gotcha. Equality is really different from ordering, since we already admit that there is at least one valid equivalence relation among all values.

@StefanKarpinski
Copy link
Member Author

Unfortunately, that definition is not at all what people expect hashing to do. To play devil's advocate, currently it's unclear whether you should write x == nothing or x === nothing, leading to a lot of variation in people's code. If x == nothing was a no method error, then it would be clear that x === nothing should be used.

@JeffBezanson
Copy link
Member

I agree that one can argue that definition is not useful. The first example I always think of is that string encoding shouldn't matter for hash keys. But for numeric types I'm not so sure.

It's a tough call. It is always nice to trap questionable operations like comparing totally unrelated things, but (1) a lot of people want to just use == everywhere and forget about it, (2) it does make code more generic. Otherwise we might start to see boilerplate like if (isa(x,AbstractArray) && x==y) || x===y.

@stevengj
Copy link
Member

I think that not defining == for comparison of unrelated types would just confuse people for no purpose; we should only throw an error in cases like 3 < "hello" where the desired meaning is not clear.

@StefanKarpinski
Copy link
Member Author

Fair enough. That's really a tangential issue.

@JeffBezanson
Copy link
Member

Fixed by #6624

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests

8 participants