-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Chain length methods #4166
Changes from 2 commits
f331cc8
4ec151c
237c2ed
d5000f3
bb24501
718dc8d
2125218
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
package cats.data | ||
|
||
private[data] trait ChainCompat[+A] { _: Chain[A] => | ||
|
||
/** | ||
* The number of elements in this chain, if it can be cheaply computed, -1 otherwise. | ||
* Cheaply usually means: Not requiring a collection traversal. | ||
*/ | ||
final def knownSize: Long = | ||
this match { | ||
case Chain.Empty => 0 | ||
case Chain.Singleton(_) => 1 | ||
case _ => -1 | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
package cats | ||
package data | ||
|
||
private[data] trait ChainCompat[+A] { _: Chain[A] => | ||
|
||
/** | ||
* The number of elements in this chain, if it can be cheaply computed, -1 otherwise. | ||
* Cheaply usually means: Not requiring a collection traversal. | ||
*/ | ||
final def knownSize: Long = | ||
this match { | ||
case Chain.Empty => 0 | ||
case Chain.Singleton(_) => 1 | ||
case Chain.Wrap(seq) => seq.knownSize.toLong | ||
case _ => -1 | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,7 +32,7 @@ import Chain.{ | |
* O(1) `uncons`, such that walking the sequence via N successive `uncons` | ||
* steps takes O(N). | ||
*/ | ||
sealed abstract class Chain[+A] { | ||
sealed abstract class Chain[+A] extends ChainCompat[A] { | ||
|
||
/** | ||
* Returns the head and tail of this Chain if non empty, none otherwise. Amortized O(1). | ||
|
@@ -565,33 +565,19 @@ sealed abstract class Chain[+A] { | |
/** | ||
* Returns the number of elements in this structure | ||
*/ | ||
final def length: Long = { | ||
// TODO: consider optimizing for `Chain.Wrap` case. | ||
// Some underlying seq may not need enumerating all elements to calculate its size. | ||
val iter = iterator | ||
var i: Long = 0 | ||
while (iter.hasNext) { i += 1; iter.next(); } | ||
i | ||
} | ||
final def length: Long = | ||
this match { | ||
case Empty => 0 | ||
case Wrap(seq) => seq.length.toLong | ||
case Singleton(a) => 1 | ||
case Append(leftNE, rightNE) => leftNE.length + rightNE.length | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this can blow the stack. I think you need to use the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This case for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ... on the second thought (and as an alternative to just iteration), we can consider implementing a separate (perhaps private) method based on
This method can handle all the length calculations in a stack-safe way. Not sure if it is really worth it, though... |
||
} | ||
|
||
/** | ||
/* | ||
* Alias for length | ||
*/ | ||
final def size: Long = length | ||
|
||
/** | ||
* The number of elements in this chain, if it can be cheaply computed, -1 otherwise. | ||
* Cheaply usually means: Not requiring a collection traversal. | ||
*/ | ||
final def knownSize: Long = | ||
// TODO: consider optimizing for `Chain.Wrap` case – call the underlying `knownSize` method. | ||
// Note that `knownSize` was introduced since Scala 2.13 only. | ||
this match { | ||
case _ if isEmpty => 0 | ||
case Chain.Singleton(_) => 1 | ||
case _ => -1 | ||
} | ||
|
||
/** | ||
* Compares the length of this chain to a test value. | ||
* | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I think what we really might want is:
That is stack safe and leverages associtivity (and we can override the foldMap in the Foldable instance).
With that, you can do:
def length: Long = foldMap(Function.const(1L))
Or you could write a custom loop based on the same idea and make it stack safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: using such
foldMap
for calculatinglength
will undermine the original efforts forWrap
case optimization.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...although the method
foldMap
itself is nice to have for sure.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I mean you can make 3 specialized methods:
combineAll, foldMap, length
and each of them can have their own code.If you just care about length right now, you can just add that one and change the
Wrap(seq)
line to beloop(tail, acc + seq.length.toLong)
and theSingleton(_)
line to beloop(tail, acc + 1L)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: is it going to be faster than a stack-safe implementation based on
Eval
? I knowEval
uses closures, but this code also does some additional allocations ofList
items all the way while recursing. Wdyt?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eval also allocates lists to manage a stack. I think this will be faster since it won't also allocate and call lambdas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option (what did I say about getting sucked down rabbit-holes...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't sweat it. Just copy the code. The inlined copied code will also be more efficient since there is no Long boxing.
We prefer a copy of the code if we can get efficiency wins in library code. We aren't trying to make the internals of cats as beautiful as possible. IMO, we want the API to be beautiful, then we want the library to be stack safe, then we want it to be fast, then we want it to be implemented in a beautiful fashion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking more about this, I am not sure we actually want to implement
foldMap
the way I suggested. Doing so would likely defeat the optimizations ofMonoid.combineAll
. I think the right way is justMonoid.combineAll(toIterator.map(fn))
which is free to use internal mutability on the monoid. If we implement the way I suggested above, we force the monoid to continue to concatenate (which could make things like string concatenation quadratic vs linear).