Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type hashing breaks equality #26105

Closed
chethega opened this issue Feb 18, 2018 · 24 comments
Closed

Type hashing breaks equality #26105

chethega opened this issue Feb 18, 2018 · 24 comments
Labels
bug Indicates an unexpected problem or unintended behavior hashing types and dispatch Types, subtyping and method dispatch

Comments

@chethega
Copy link
Contributor

t1 = Tuple{Union{UInt, Int}};
t2=Union{Tuple{UInt}, Tuple{Int}};
@show isequal(t1,t2);
#isequal(t1, t2) = true
@show t1===t2;
#t1 === t2 = false
@show hash(t1), hash(t2);
#(hash(t1), hash(t2)) = (0x7a1ad6fdb34f6f46, 0x525bdf12f9a66789)

Both on 0.62 and current master.

@StefanKarpinski StefanKarpinski added bug Indicates an unexpected problem or unintended behavior hashing labels Feb 18, 2018
@StefanKarpinski
Copy link
Member

This is going to be quite difficult to fix. It seems to require being able to completely normalize types to a canonical form in which equality is equivalent to structure. For types, one typically wants to use IdDictalthough, of course, that distinguishes equal types that look different.

@nalimilan nalimilan added the types and dispatch Types, subtyping and method dispatch label Feb 18, 2018
@nalimilan
Copy link
Member

As a defensive strategy, we return the same hash for all complex types until a solution is found. I'm not sure how to define "complex", but an extensive definition could include all non-concrete types.

@chethega
Copy link
Contributor Author

I think normalization already exists?

struct type_normalize{T} end
type_normalize{t1}===type_normalize{t2}
#true 
hash(type_normalize{t1})===hash(type_normalize{t2})
#true 

@chethega
Copy link
Contributor Author

...so the low-effort ugly proposal would be to use Base.hash(T::Type,h::UInt)= hash(objectid(type_normalize{T}),h).

@StefanKarpinski
Copy link
Member

Right, we could actually do some kind of "approximate normalization" – i.e. apply simplifications that aren't really correct but are roughly correct and use the resulting normalized type as a proxy for hashing. All you need is for T1 == T2 approximate normalizations of T1 and T2 are the same.

@chethega
Copy link
Contributor Author

Do we have an example where objectid(type_normalize{t1}) !== objectid(type_normalize{t2}) but t1==t2, with (t1 isa Type) && (t2 isa Type)?

Or are you concerned that this is too expensive?

@StefanKarpinski
Copy link
Member

I'm fairly certain there are cases where that happens. @JeffBezanson will have specific examples.

@vtjnash
Copy link
Member

vtjnash commented Feb 19, 2018

We try to ensure this for concrete types (the cases where it fails are considered bugs), but it's a pretty expensive operation (can be a linear scan testing type-equal against all other types).

@martinholters
Copy link
Member

martinholters commented Feb 19, 2018

Ref. #25430. Here is a failing example:

julia> t1 = Vector{Tuple{Any}}
Array{Tuple{Any},1}

julia> t2 = Vector{Tuple{>:Int}}
Array{Tuple{#s1} where #s1>:Int64,1}

julia> t1 == t2
true

julia> type_normalize{t1} === type_normalize{t2}
false

Note also that the current normalization code by design only ensures that for every concrete type t1, there is at most one t2 such that t1==t2 but t1!==t2. (There are two distinct caches and type-equal types can end up distributed between them.)

Furthermore, assuming we had only one cache and it worked perfectly, the normalization would produce the first type-equal (concrete) type that had been instantiated in the session, so the calculated hash for the same type could differ between sessions. I don't know how problematic that is, but I think we usually try to avoid it.

@JeffBezanson
Copy link
Member

Right, what we're doing now is not intended to be normalization, just caching.

I very much doubt it's worth using type equality (==) in table lookup. One option is to make isequal call === for types. That makes equal types somewhat like 0.0 and -0.0, with isequal making a finer distinction than ==. Another option is to make hash throw an error on types. That would probably be too strict though, since people do successfully use types as dict keys in practice.

@StefanKarpinski
Copy link
Member

What's wrong with the approximate normalization approach? Sure, using type equality in lookup may not be the most typical thing to do, but why not just make it correct?

@JeffBezanson
Copy link
Member

It seems to me that approximate normalization would still be quite expensive and also unclear how to implement. Taking the example in the OP, Tuple{Union{UInt, Int}} needs to be normalized to Union{Tuple{UInt}, Tuple{Int}} since you can't generally go the other way. However that generates union types of size O(2^n). So we might want a much coarser approximation, but I'm not sure what it would be.

@oscardssmith
Copy link
Member

Tuple{Number}?

@chethega
Copy link
Contributor Author

...this also affects dispatch. Separate bug, known bug or not-a-bug?

t1 = Tuple{Union{UInt, Int}};
t2=Union{Tuple{UInt}, Tuple{Int}};
foo(::Type{t1})=println("t1");
t2 isa Type{t1}
#true 
foo(t2)
#ERROR: MethodError: no method matching foo(::Type{Union{Tuple{Int64}, Tuple{UInt64}}})
#Closest candidates are:
#  foo(::Type{Tuple{Union{Int64, UInt64}}}) at REPL[3]:1

@nalimilan
Copy link
Member

@JeffBezanson How about returning the same hash for all problematic types?

@martinholters
Copy link
Member

How about returning the same hash for all problematic types?

It's not so easy to discern a problematic type. Is Vector{Tuple{Any}} problematic? Yes, see example above. But who would have thought so?

@nalimilan
Copy link
Member

I wouldn't, but maybe Jeff would have... ;-)

@JeffBezanson
Copy link
Member

Nope, I wouldn't have :)

@martinholters
Copy link
Member

martinholters commented Feb 20, 2018

So even if we could do problematic_type_oracle = convert(Function, @JeffBezanson), that wouldn't help us.

More seriously, I think given a type, deciding a) "this as as simple as it gets" versus b) "there might be a simpler type equal to this one" might be doable (for some appropriate measure of "simple"). But either deciding for a type of class a) whether there is a more complicated type equal to it (but not already being normalized to it) or finding the simpler-but-equal type (if any) for a type of class b) seems much harder.

So I guess what we really should be discussing is what the better alternative is: throwing an error when trying to hash a type or making all types hash equal?

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Feb 20, 2018

Hashing all types the same is the simplest function that's guaranteed to give the same hash value whenever the inputs are type-equal. The question is if we can compute a somewhat less simple function that still has this property. It does seem to be quite tricky to do so.

@JeffBezanson
Copy link
Member

I definitely think it would be better to make isequal call === on types than to make type dict lookup O(n). In practice, there is not a great need to equate differently-structured but semantically-equal types in table lookup, even in some internal use cases.

@StefanKarpinski
Copy link
Member

So you would then rename the current == for types to something else? Presumably there is some utility of that function, or would you just delete it entirely?

@chethega
Copy link
Contributor Author

chethega commented Sep 4, 2018

The dispatch example from above works as expected on 1.0.0:

t1 = Tuple{Union{UInt, Int}};
t2=Union{Tuple{UInt}, Tuple{Int}};
foo(::Type{t1})=println("t1");
t1==t2 #true
foo(t2) #t1
hash(t1),hash(t2) #(0x81f0591d17fe9f21, 0xfdad4743f836317c)

@vtjnash
Copy link
Member

vtjnash commented Jun 3, 2020

This is captured by #35130

@vtjnash vtjnash closed this as completed Jun 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior hashing types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

No branches or pull requests

7 participants