-
Notifications
You must be signed in to change notification settings - Fork 72
Conversation
d57cbf3
to
c49ed4e
Compare
@@ -166,12 +184,19 @@ trait IterableMonoTransforms[+A, +Repr] extends Any { | |||
def drop(n: Int): Repr = fromIterableWithSameElemType(View.Drop(coll, n)) | |||
|
|||
/** The rest of the collection without its first element. */ | |||
def tail: Repr = drop(1) | |||
def tail: Repr = if (coll.isEmpty) ??? else drop(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I think tail
should not be defined on empty collections, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Precedent is to throw an UnsupportedOperationException
, see List
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should go so far as to eliminate non-total methods, given their usefulness. Apart from ::
I don't think we have any type that is statically guaranteed to be non-empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szeiger I’m not eliminating tail
, I’m fixing its implementation. It is not supposed to terminate when called on an empty list.
Concerning the non-emptiness, we also have +:
and :+
, and we could overload the apply
factory method to return a non-empty collection if called with at least one element (as I did in #9).
I just added a commit that solves the stackoverflow that happened at compile-time. |
89dc364
to
9d0d618
Compare
@@ -166,12 +184,19 @@ trait IterableMonoTransforms[+A, +Repr] extends Any { | |||
def drop(n: Int): Repr = fromIterableWithSameElemType(View.Drop(coll, n)) | |||
|
|||
/** The rest of the collection without its first element. */ | |||
def tail: Repr = drop(1) | |||
def tail: Repr = if (coll.isEmpty) ??? else drop(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should go so far as to eliminate non-total methods, given their usefulness. Apart from ::
I don't think we have any type that is statically guaranteed to be non-empty.
* | ||
* Supports efficient indexed access and has a small memory footprint. | ||
*/ | ||
class Array[+A] private (private val elements: scala.Array[AnyRef]) extends IndexedSeq[A] with SeqLike[A, Array] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a public constructor or companion apply
to make it really useful. Wrapping an existing array without copying is important for performance. Wrapping with copying should also be supported (and be the safe default, of course).
To enable wrapping and copying, you need to take primitive arrays into account as well (like the array-like types in my specialization PR do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a companion’s apply
method (inherited from IterableFactory
) which creates an ArrayBuffer
and then copies its contents into a fresh new Array
.
You think we should also add a public constructor aliasing an Array
passed as parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's unsafe but too useful not to have. It doesn't have to be a constructor or factory methods. Depending on the outcome of the mutable vs immutable discussion, it could naturally become Array(...).readOnly
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I agree.
@@ -34,6 +35,10 @@ class LazyList[+A](expr: => LazyList.Evaluated[A]) | |||
case Some((hd, tl)) => s"$hd #:: $tl" | |||
} | |||
else "LazyList(?)" | |||
|
|||
// HACK Needed because we inherit from two implementations (one from LinearSeq and one from SeqLikeFromIterable) | |||
override def drop(n: Int): LazyList[A] = super.drop(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should LinearSeq
have implementations at all or only be a marker trait with a subtype of SeqLikeFromIterable
providing the implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d be happy to see if it’s possible to remove any implementations from LinearSeq
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added a commit that solves the issue. The solution is to not automatically implement drop
in terms of head
and tail
but provide this implementation as a LinearSeqLikeOpt
trait that can optionally be mixed into a concrete collection implementation. (Note that the problem still exists for List
because we want to inherit from both LinearSeqLikeOpt
– to get the optimized drop
implementation — and SeqLikeFromIterable
— to get the implementations of all the other methods)
ea20833
to
4e7a59b
Compare
Shouldn't this backed by an unboxed scala.Array[T], instead of an From my research in http://www.lihaoyi.com/post/BenchmarkingScalaCollections.html, using an I think an immutable array won't be generally useful as a compact data-store unless we ensure that it's backed by an unboxed Array. That may require us to change the API, and take classtags during construction and during transformations like FWIW in my own code I'm currently using my own custom unboxed "immutable" array (which is a read-only view over a growable, unboxed, append-only array buffer): sealed abstract class Agg[+T] extends (Int => T) with TraversableOnce[T]{
def apply(i: Int): T
def length: Int
def toArray[B >: T: ClassTag]: Array[B]
def zipWithIndex: Agg[(T, Int)]
def withFilter(f: T => Boolean): Agg[T]
def filter(f: T => Boolean): Agg[T]
def map[V: ClassTag](f: T => V): Agg[V]
def flatMap[V: ClassTag](f: T => TraversableOnce[V]): Agg[V]
def last: T
def lastOption: Option[T]
def head: T
def contains[B >: T](t: B): Boolean
def headOption: Option[T]
def indices: Range
def indexOf[B >: T](t: B): Int
def groupBy[V](key: T => V): Map[V, Agg[T]]
}
/**
* A custom growable-array of unboxed `T`s.
*/
class Aggregator[T: ClassTag](initialSize: Int = 4) extends Agg[T] {
private[this] var data = new Array[T](initialSize)
private[this] var length0 = 0
def length = length0
def apply(i: Int) = data(i)
def append(i: T) = {
if (length >= data.length) {
// Grow by 3/2 + 1 each time, same as java.util.ArrayList. Saves a bit
// of memory over doubling each time, at the cost of more frequent
// doublings,
val newData = new Array[T](data.length * 3 / 2 + 1)
System.arraycopy(data, 0, newData, 0, length)
data = newData
}
data(length) = i
length0 += 1
}
...
} This really did help reduce memory usage in my current project by several times (I'm holding a lot of large primitive caches and indices in memory...), so this isn't just a theoretical concern. |
This new collection should come with benchmarks and usage recommendations. When should someone pick it over |
Yes, that’s the plan! I will follow the design proposed in #2. |
@julienrf I'd propose to try(and benchmark) out a version that uses an underlying |
@lihaoyi Yes, I already have the machinery necessary for that in https://github.com/scala/collection-strawman/blob/specialization/src/main/scala/strawman/collection/CollectionStrawman6.scala#L469. We should use it here, too. I'll rebase and polish up my specialization branch so we can merge it. Slick also uses its own array-based collections for the AST: https://github.com/slick/slick/blob/master/slick/src/main/scala/slick/util/ConstArray.scala. AFAIR this resulted in a 30% performance improvement for the whole query compiler compared to Scala collections. Switching between different Scala collections (only Vector vs only ArrayBuffer vs whatever was most convenient in a specific case) made almost no difference. |
Even if the operations can be specialized, you have boxing and inefficient
layout to begin with.
…On Jan 9, 2017 12:45, "Dmitry Petrashko" ***@***.***> wrote:
How does the performance of operations compare to Vector and Array? How
does it perform on primitive types?
@julienrf <https://github.com/julienrf> I'd propose to try(and benchmark)
out a version that uses an underlying Array[Any]. The main reason is that
HotSpot can unroll loops that iterate on an array and speculate on the
first element of the array and perfrom specialized operation on the
remaining elements.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AADT9k8Qqz-JxU6bypIX3IFiufjZsRabks5rQh3SgaJpZM4LTFJ3>
.
|
@szeiger if you use Array[Any] the underlying array can be unboxed and stored efficiently, but every access will need to go through |
Interesting. I thought Array[Any] would erase to Array[Object].
…On Jan 12, 2017 14:28, "Dmitry Petrashko" ***@***.***> wrote:
@szeiger <https://github.com/szeiger> if you use Array[Any] the
underlying array can be unboxed and stored efficiently, but every access
will need to go through ScalaRuntime.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AADT9oq_fcvHS9G3VVcyfZupFtRmKyY6ks5rRiqMgaJpZM4LTFJ3>
.
|
@szeiger, no, it erazes to simply Object, and you'll need methods like https://github.com/lampepfl/dotty/blob/master/library/src/scala/runtime/ScalaRunTime.scala#L82 to access elements inside it. |
@szeiger, same happens with Array[T] where T isn't bounded by AnyRef. |
Not sure if I'm missing something to but Maybe you need some specialization cleverness to avoid boxing on-access but having an efficient at-rest data structure is trivial, and important. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think immutable.Array should indeed be polymorphic. I would actually have a tendency to make it a value class.
There's also the issue about how to organize the ...Like traits, i.e. whether to split them into completely abstract traits and fromIterable traits. In my view these two topics should be treated separately.
with IterableMonoTransforms[A, C[A @uncheckedVariance]] // sound bcs of VarianceNote | ||
with IterablePolyTransforms[A, C] { | ||
with IterablePolyTransforms[A, C] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to split IterableLike and IterableLikeFromIterable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IterableLike
defines abstract methods with no implementation.
IterableLikeFromIterable
implements these methods based on a fromIterable(it: Iterable[A]): C[A]
method.
In the case of arrays, it didn’t make sense to inherit from operations implemented in terms of fromIterable
(most of them are implemented in terms of Array.tabulate
).
Does that make sense to you?
@DarkDimius I changed the implementation to use an |
I got preliminary answers to these questions. I ran the benchmarks (see #19) and compared You can find the comparison of execution time of various operations here. Unsurprisingly, we see that You can find the comparison of memory footprint here. Slightly more compact than |
@julienrf presumably your tail performance probably would be better if tail used arrayCopy? Im curious what the goal is here - if this is supposed to be a new general purpose immutable Seq as in List/Vector I have a number of thoughts (this being the internet and all) from my years working on a large scala codebase that I would be happy to flesh out. |
FWIW I'd actually prefer an immutable "dumb" flat-array as the "standard" data-type v.s. the current The only thing that
Furthermore, for the things where
In general, I think that To put it another way, I don't think people should lightly take the 10x slowdown in common operations or the 3-4x increase in memory usage for small collections unless they really need both item-by-item construction and indexed lookup. And I think that's a sufficiently niche use case that pushing people towards an |
One problem I see is naming: Scala collections that are imported by default are immutable ( |
@lihaoyi I definitely agree - the 200 byte size (56 bytes for the My dream for a new immutable Seq is array-backed (With unboxed primitive via specialization or otherwise), with start/end indexes + a dirty flag to allow for fast Making the wrapped array "smarter" could probably be squeezed in 8 bytes (so 16 -> 24), which is an important target because otherwise you are 32 bytes (in x86-64 with CompressedOops land) |
On a somewhat related note, |
https://slides.com/mukel/delaying-arrays/ (in order to correctly iterate through slides press space instead of left) |
The big thing about this array was that instead of counting complexity as standard algorithmic complexity, I wanted to optimize memory locality. |
The biggest drawback of the DelayingArrays is that it may indeed consume twice the memory needed to efficiently store the data. |
Yes, good point. And I could do the same for
First, I wanted to compare the time and space characteristics of an immutable collection backed by an array. I think its characteristics make it worth introducing as a concrete immutable collection type, for use cases where |
https://github.com/DarkDimius/DelayingArrays/blob/rewrite/src/main/scala/me/d_d/delaying/DelayingArray.scala - here is a source for a collection that was presented in those reports and presentation. https://github.com/DarkDimius/DelayingArrays/blob/rewrite/src/main/scala/me/d_d/delaying/DArray.scala - here is a more complex version which I played with where I was trying to squeeze even more performance but it wasn't worth it) |
Array is not the only non-immutable collection in the default namespace though. I'm not convinced by the name Regarding the performance, I agree that |
We will need to be cautious changing However, there isn't much problem cutting out the 32-wide trie of Vector and replacing it with anything that has superior characteristic. Vector does nothing dreadfully, and it actually does higher-order operations pretty well. So despite its immensity and shocking constant factors, it's a really good go-to collection for "I don't care if it's fast, I just want it to work, and O(n^2) doesn't count as 'working'". Any other collection which is at least vaguely similar in speed and memory usage is a good alternate candidate. We should try out delaying arrays in some other common usage scenarios like concurrent access. The benchmarks in the paper look good! |
One thing of value in a "dumb" While this may seem trivial, I would argue that it isn't. In the last 5 years, the fact that Thus I think the simplicity and ease-of-understandability of a "dumb" |
@lihaoyi - I completely agree. That we may be able to improve Vector doesn't, I think, have much bearing on whether we want an immutable Array. It's really simple and really fast for what it's really fast at. These are nice qualities to have, and if it's specialized to be even faster that's even better. |
I also think we are talking about two different things here
The two do not necessarily have the same best solutions. #6 is clearly about the first issue, not the second (for some time, I misunderstood this point). I therefore think we should not use the name In any case, this discussion is not critical to reach the the go/no go decision on the strawman. We can do this later once we flesh out the supported implementation types. |
d1264ac
to
1fcc234
Compare
Superseeded by #31. |
I introduce immutable arrays, a concrete collection that uses an
Array
as internal representation. It supports fast indexed access and has a compact in-memory representation.The implementations of the collections operations (
filter
,map
, etc.) were originally provided by theIterableLike
trait, and these implementations were all based on iterators, and thus were not appropriate for working with arrays (where, most of the time we know a priori the size of the resulting array).Instead of inheriting from these “default” implementations and then overriding them, I decided to move the “default” implementations of collection operations into separate traits. As a consequence,
IterableMonoTransforms
andIterablePolyTransforms
are now completely abstract.Most of the operations on immutable arrays are based on
tabulate
.