Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Chain length methods #4166

Merged
merged 7 commits into from
Apr 9, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions core/src/main/scala-2.12/cats/compat/ChainCompat.scala
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
package cats.data

private[data] trait ChainCompat[+A] { _: Chain[A] =>

/**
* The number of elements in this chain, if it can be cheaply computed, -1 otherwise.
* Cheaply usually means: Not requiring a collection traversal.
*/
final def knownSize: Long =
this match {
case Chain.Empty => 0
case Chain.Singleton(_) => 1
case _ => -1
}
}
17 changes: 17 additions & 0 deletions core/src/main/scala-2.13+/cats/data/ChainCompat.scala
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
package cats
package data

private[data] trait ChainCompat[+A] { self: Chain[A] =>

/**
* The number of elements in this chain, if it can be cheaply computed, -1 otherwise.
* Cheaply usually means: Not requiring a collection traversal.
*/
final def knownSize: Long =
this match {
case Chain.Empty => 0
case Chain.Singleton(_) => 1
case Chain.Wrap(seq) => seq.knownSize.toLong
case _ => -1
}
}
34 changes: 14 additions & 20 deletions core/src/main/scala/cats/data/Chain.scala
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ import Chain.{
* O(1) `uncons`, such that walking the sequence via N successive `uncons`
* steps takes O(N).
*/
sealed abstract class Chain[+A] {
sealed abstract class Chain[+A] extends ChainCompat[A] {

/**
* Returns the head and tail of this Chain if non empty, none otherwise. Amortized O(1).
Expand Down Expand Up @@ -566,32 +566,26 @@ sealed abstract class Chain[+A] {
* Returns the number of elements in this structure
*/
final def length: Long = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we have a test for this, but if not we should add that chain.length == chain.toList.length.toLong

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really wondering how come Chain#length became Long rather than just Int (which is used for all other collections' length). Is there any chance that Chain#length can outgrow Int type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's annoying - it means Chain can never extend Iterable or Seq.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure it can, just cast to Int, but I don't think that's a good idea anyway. Make a wrapper and call toSeq and just implement the methods that way.

Why is the length an Int in scala? I think because java did this in java.util. Why did java.util do it? In those days computers were all 32 bits, so Long was more expensive (that isn't the case now, I don't think), also the idea of an arraylist with more than 2 billion items seemed crazy in the late 90s.

With scala List, there is nothing that I know of that prevents you from making a list more than 2 billion long: it is just a series of pointers. But if you do, I guess length will just throw or return junk.

In cats, we preferred Long for two reasons:

  1. Long is as cheap as Int now.
  2. We can actually imagine cases where Int is overflowed, but Long will basically never be overflowed because 2^63 is a gigantic number (we may live long enough to see computers with that much ram, but I doubt it).

So, if you want to implement def toSeq: Seq[A] on Chain, you can totally do it. The length being cast to Int is no more of a lie than when List does it, and we live with that all the time. I don't see the problem.

Copy link
Contributor Author

@bplommer bplommer Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure it can, just cast to Int

Not without return type polymorphism (the method name length is already taken by the Long version)- but yeah, you can still do a conversion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but the toSeq or toIterable without doing a deep copy does work.

// TODO: consider optimizing for `Chain.Wrap` case.
// Some underlying seq may not need enumerating all elements to calculate its size.
val iter = iterator
var i: Long = 0
while (iter.hasNext) { i += 1; iter.next(); }
i
@annotation.tailrec
def loop(chains: List[Chain[A]], acc: Long): Long =
chains match {
case Nil => acc
case h :: tail =>
h match {
case Empty => loop(tail, acc)
case Wrap(seq) => loop(tail, acc + seq.length)
case Singleton(a) => loop(tail, acc + 1)
case Append(l, r) => loop(l :: r :: tail, acc)
}
}
loop(this :: Nil, 0L)
}

/**
* Alias for length
*/
final def size: Long = length

/**
* The number of elements in this chain, if it can be cheaply computed, -1 otherwise.
* Cheaply usually means: Not requiring a collection traversal.
*/
final def knownSize: Long =
// TODO: consider optimizing for `Chain.Wrap` case – call the underlying `knownSize` method.
// Note that `knownSize` was introduced since Scala 2.13 only.
this match {
case _ if isEmpty => 0
case Chain.Singleton(_) => 1
case _ => -1
}

/**
* Compares the length of this chain to a test value.
*
Expand Down