Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate HashLaws#sameAsUniversalHash #4319

Merged
merged 4 commits into from
Oct 18, 2022
Merged

Conversation

armanbilge
Copy link
Member

And also sameAsScalaHashing, based on this comment:

// `Hash#toHashing` deliberately not implemented since `scala.util.hashing.Hashing` is only
// compatible with universal equality.

Closes #4284.

I think there may be a couple other places we are relying on this behavior in Cats (@satorg do you remember?). We should make sure to quash those.

@armanbilge armanbilge added this to the 2.9.0 milestone Oct 15, 2022
johnynek
johnynek previously approved these changes Oct 15, 2022
)
def hashDistinct[A](fa: F[A])(implicit H: Hash[A]): F[A] = {

final class Wrapper(val value: A) {
Copy link
Contributor

@johnynek johnynek Oct 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea: what if we did:

trait Hash[A] extends Eq[A] {
  final class Wrapper private (val unwrap: A) {
    override def equals(x: Any): Boolean =
      x match {
        case that: Wrapper =>
          eqv(unwrap, that.unwrap)
        case _ => false
      }

     override val hashCode: Int = hash(unwrap) 
  }

  def wrap(a: A): Wrapper = new Wrapper(a)
}

Then we could use H.wrap(a) below and this same pattern could be useful for others using JVM interop.

I don't know what the binary compatibility story is on adding member classes like this though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and it's binary-compatible. But it feels a bit ad-hoc. Should we add similar wrappers for Eq and Order?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be ad-hoc but there are two important differences:

  1. scala doesn't use typeclasses for hashing but did that with Ordering.
  2. hashCode and equals are so deeply built into the JVM that many use cases assume that ability.

Eq is somewhat similar, but less impactful IMO due to the outsized role of hashing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question: isn't it awkward to work with a type defined like this? Since it's not a static type, but a member of a specific instance of the typeclass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a path dependent type, which a lot of scala users almost pretend don't exist but they are quite powerful.

I've used them in several cases an can give some really nice invariants. In this case it is exactly what we want: statically guaranteeing that two values wrap something using the same Hash value.

But anyway, it's just a suggestion. We could add it later. You could make a more minimal change by just using your code (which is basically inlining this design at each use site, but there is only one use site so, it's fine)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, since we're wrapping each a into Wrapper, it should have some impact on the overall performance. But we're ok with it for now – is it correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. The alternative is to use the typeclass-based HashMap that was ultimately added to cats-collections.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an alternative.

We could use an IntMap[List[A]] where the key is the hashCode. For all lists of length 1 emit them all. For all lists with more than one, do the N^2 pair wise check (if the first item is not equal to anything in the tail, emit, else discard repeat).

Copy link
Contributor

@johnynek johnynek Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like:

    traverseFilter[State[IntMap[A], *], A, A](fa) { a =>
      State {
        alreadyIn => {
          val ahash = H.hash(a)
          alreadyIn.get(ahash) match {
            case None =>
              // the common case of no collision:
              (alreadyIn.updated(ahash, a :: Nil), Some(a)))
            case Some(as) =>
              // the collision case
              val aexists = existing.exists(H.eqv(a, _))
          
              if (aexists) (alreadyIn, None) else (alreadyIn.updated(ahash, a :: existing), Some(a)))
          }
      }
    }
    .run(IntMap.empty)
    .value
    ._2

Co-authored-by: Patrick Oscar Boykin <boykin@pobox.com>
johnynek
johnynek previously approved these changes Oct 17, 2022
Copy link
Contributor

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perf nit you can take or leave.

traverseFilter(fa) { a =>
State { (distinct: IntMap[List[A]]) =>
val ahash = H.hash(a)
val existing = distinct.getOrElse(ahash, Nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only nit is this is call by name so we allocate the thunk on the common case of no collisions (which is why I used pattern matching).

I wish we didn't have to think of such things. Maybe eventually in scala 3 getOrElse could be an inline method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh. I was concerned about the allocation of the Option for the pattern match, but I guess it doesn't allocate in the None case. I wonder if the JIT can inline this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, yeah,

the reason I used the match was to branch on the common case of no collisions and then we also avoid the if branch and the dispatch on contains_.

So, personally, I think internal to a library it is worth avoiding those things and using the code I wrote before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason I used the match was to branch on the common case of no collisions

Why is this the common case? What if you are filtering a collection of duplicates?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought Option pattern matching was optimized by the compiler, but maybe that's in some deprecated version of past optimizing backends.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. If you do List.fill(n)(a) definitely you will have a lot of collisions.

I was assuming that collisions are usually rare, mostly because for most common use cases in HashMaps collisions are rare.

But in any case, once you know you have not seen a hash before, by keeping the match, you can leverage that.

Using the getOrElse you forget it a bit (you can recover by checking if the list isEmpty, but that requires dispatch).

Copy link
Member Author

@armanbilge armanbilge Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern match decompiles to this (EDIT: updated to e4630d3):

            Option option = distinct.get(ahash);
            if (None$.MODULE$.equals(option)) {
                Object object = a;
                return new Tuple2((Object)distinct.updated(ahash, (Object)Nil$.MODULE$.$colon$colon(object)), (Object)new Some(a));
            }
            if (option instanceof Some) {
                Some some = (Some)option;
                List existing = (List)some.value();
                if (Traverse$.MODULE$.apply(UnorderedFoldable$.MODULE$.catsTraverseForList()).contains_(existing, a, H)) {
                    return new Tuple2(distinct, (Object)None$.MODULE$);
                }
                Object object = a;
                return new Tuple2((Object)distinct.updated(ahash, (Object)existing.$colon$colon(object)), (Object)new Some(a));
            }
            throw new MatchError((Object)option);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove Hash law that it equals .hashCode
3 participants