-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate HashLaws#sameAsUniversalHash
#4319
Conversation
) | ||
def hashDistinct[A](fa: F[A])(implicit H: Hash[A]): F[A] = { | ||
|
||
final class Wrapper(val value: A) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idea: what if we did:
trait Hash[A] extends Eq[A] {
final class Wrapper private (val unwrap: A) {
override def equals(x: Any): Boolean =
x match {
case that: Wrapper =>
eqv(unwrap, that.unwrap)
case _ => false
}
override val hashCode: Int = hash(unwrap)
}
def wrap(a: A): Wrapper = new Wrapper(a)
}
Then we could use H.wrap(a)
below and this same pattern could be useful for others using JVM interop.
I don't know what the binary compatibility story is on adding member classes like this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked and it's binary-compatible. But it feels a bit ad-hoc. Should we add similar wrappers for Eq
and Order
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be ad-hoc but there are two important differences:
- scala doesn't use typeclasses for hashing but did that with Ordering.
- hashCode and equals are so deeply built into the JVM that many use cases assume that ability.
Eq is somewhat similar, but less impactful IMO due to the outsized role of hashing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question: isn't it awkward to work with a type defined like this? Since it's not a static type, but a member of a specific instance of the typeclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a path dependent type, which a lot of scala users almost pretend don't exist but they are quite powerful.
I've used them in several cases an can give some really nice invariants. In this case it is exactly what we want: statically guaranteeing that two values wrap something using the same Hash value.
But anyway, it's just a suggestion. We could add it later. You could make a more minimal change by just using your code (which is basically inlining this design at each use site, but there is only one use site so, it's fine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, since we're wrapping each a
into Wrapper
, it should have some impact on the overall performance. But we're ok with it for now – is it correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. The alternative is to use the typeclass-based HashMap
that was ultimately added to cats-collections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an alternative.
We could use an IntMap[List[A]]
where the key is the hashCode. For all lists of length 1 emit them all. For all lists with more than one, do the N^2 pair wise check (if the first item is not equal to anything in the tail, emit, else discard repeat).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like:
traverseFilter[State[IntMap[A], *], A, A](fa) { a =>
State {
alreadyIn => {
val ahash = H.hash(a)
alreadyIn.get(ahash) match {
case None =>
// the common case of no collision:
(alreadyIn.updated(ahash, a :: Nil), Some(a)))
case Some(as) =>
// the collision case
val aexists = existing.exists(H.eqv(a, _))
if (aexists) (alreadyIn, None) else (alreadyIn.updated(ahash, a :: existing), Some(a)))
}
}
}
.run(IntMap.empty)
.value
._2
Co-authored-by: Patrick Oscar Boykin <boykin@pobox.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perf nit you can take or leave.
traverseFilter(fa) { a => | ||
State { (distinct: IntMap[List[A]]) => | ||
val ahash = H.hash(a) | ||
val existing = distinct.getOrElse(ahash, Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only nit is this is call by name so we allocate the thunk on the common case of no collisions (which is why I used pattern matching).
I wish we didn't have to think of such things. Maybe eventually in scala 3 getOrElse could be an inline method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh. I was concerned about the allocation of the Option
for the pattern match, but I guess it doesn't allocate in the None
case. I wonder if the JIT can inline this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, yeah,
the reason I used the match was to branch on the common case of no collisions and then we also avoid the if
branch and the dispatch on contains_
.
So, personally, I think internal to a library it is worth avoiding those things and using the code I wrote before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason I used the match was to branch on the common case of no collisions
Why is this the common case? What if you are filtering a collection of duplicates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought Option
pattern matching was optimized by the compiler, but maybe that's in some deprecated version of past optimizing backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question. If you do List.fill(n)(a)
definitely you will have a lot of collisions.
I was assuming that collisions are usually rare, mostly because for most common use cases in HashMaps collisions are rare.
But in any case, once you know you have not seen a hash before, by keeping the match, you can leverage that.
Using the getOrElse
you forget it a bit (you can recover by checking if the list isEmpty, but that requires dispatch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern match decompiles to this (EDIT: updated to e4630d3):
Option option = distinct.get(ahash);
if (None$.MODULE$.equals(option)) {
Object object = a;
return new Tuple2((Object)distinct.updated(ahash, (Object)Nil$.MODULE$.$colon$colon(object)), (Object)new Some(a));
}
if (option instanceof Some) {
Some some = (Some)option;
List existing = (List)some.value();
if (Traverse$.MODULE$.apply(UnorderedFoldable$.MODULE$.catsTraverseForList()).contains_(existing, a, H)) {
return new Tuple2(distinct, (Object)None$.MODULE$);
}
Object object = a;
return new Tuple2((Object)distinct.updated(ahash, (Object)existing.$colon$colon(object)), (Object)new Some(a));
}
throw new MatchError((Object)option);
And also
sameAsScalaHashing
, based on this comment:cats/kernel/src/main/scala/cats/kernel/Hash.scala
Lines 40 to 41 in f47123e
Closes #4284.
I think there may be a couple other places we are relying on this behavior in Cats (@satorg do you remember?). We should make sure to quash those.