
Simplify Chunk #2181

Merged
merged 22 commits into from
Dec 16, 2020
Conversation

mpilquist
Member

@mpilquist mpilquist commented Dec 11, 2020

Fixes the bug found in #2180, where Chunk.concat throws on Scala.js as a result of 1.isInstanceOf[Byte] returning true.

This PR makes a number of simplifications:

  • Chunk.concat now takes an implicit ClassTag and always instantiates an array via that class tag. If passing a ClassTag[Byte], you get a Chunk backed by a primitive byte array. No more trickery with KnownElementType and runtime type checks on chunk contents. As a result of this change, use sites of concat which are polymorphic in the element type need a ClassTag.
  • Chunk.Queue is now a subtype of Chunk, and the toChunk method (which called concat) is gone. In its place, a compact method was added to Chunk, which returns a Chunk.ArraySlice; compact requires a ClassTag as well. Previously, chunks were accumulated into a Chunk.Queue and then copied to an Array[Any] via the final call to toChunk. Now the Chunk.Queue is returned directly, avoiding any copying. As a result, pulls that used Chunk.Queue to accumulate and then called Pull.output(acc.toChunk) must be modified to accumulate a Chunk[O] via ++ and then call Pull.output(acc).
  • Chunk now has an amortized O(1) ++ method, which uses Chunk.Queue underneath.
  • The various primitive array subtypes -- e.g., Chunk.Ints, Chunk.Bytes -- were all removed, as they are now redundant with the single ArraySlice. One caveat: the primitive array chunks allowed unboxed indexed access via .at(idx), but this wasn't used in practice; folks should instead access the underlying array directly.
  • The various NIO byte buffer chunks were removed as well, though ByteBuffer and CharBuffer were kept due to their frequent use in Java NIO APIs.
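The core mechanism behind the first bullet can be illustrated in plain Scala: a ClassTag lets generic code instantiate an array of the correct runtime type, so byte elements land in a primitive array without any runtime type checks. A minimal sketch (the concatArrays helper is illustrative, not fs2 API):

```scala
import scala.reflect.ClassTag

// Illustrative helper: the ClassTag picks the right runtime array type,
// so Array[Byte] inputs produce an unboxed Array[Byte] result.
def concatArrays[A: ClassTag](xs: Array[A], ys: Array[A]): Array[A] = {
  val out = new Array[A](xs.length + ys.length)
  Array.copy(xs, 0, out, 0, xs.length)
  Array.copy(ys, 0, out, xs.length, ys.length)
  out
}

concatArrays(Array[Byte](1, 2), Array[Byte](3)) // an unboxed Array[Byte]
```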

@mpilquist
Member Author

Note, I tried adding a rope-like balanced binary tree implementation of Chunk, but it performed worse than Chunk.Queue:

  private case class Concat[+O](left: Chunk[O], right: Chunk[O]) extends Chunk[O] {
    def size: Int = left.size + right.size

    def apply(i: Int): O =
      if (i < left.size) left(i)
      else right(i - left.size)

    def copyToArray[O2 >: O](xs: Array[O2], start: Int): Unit = {
      left.copyToArray(xs, start)
      right.copyToArray(xs, start + left.size)
    }

    protected def splitAtChunk_(n: Int): (Chunk[O], Chunk[O]) =
      if (n < left.size) {
        val (pfx, sfx) = left.splitAtChunk_(n)
        (pfx, sfx ++ right)
      } else {
        val (pfx, sfx) = right.splitAtChunk_(n - left.size)
        (left ++ pfx, sfx)
      }

    override def ++[O2 >: O](that: Chunk[O2]): Chunk[O2] = 
      if (that.isEmpty) this
      else if (isEmpty) that
      else {
        @annotation.tailrec
        def go(node: Concat[O2], last: Chunk[O2]): Chunk[O2] = {
          val lastN = last.size
          if (lastN >= node.size || lastN * 2 <= node.right.size)
            Concat(node, last)
          else
            node.left match {
              case left: Concat[O2] => go(left, Concat(node.right, last))
              case _                => Concat(node, last)
            }
        }
        go(this, that)
      }
  }

@SystemFw
Collaborator

SystemFw commented Dec 12, 2020

Do you mind expanding a bit more on how this avoids A: ClassTag bubbling up? Making ChunkQueue <: Chunk avoids needing to do that in most places, but won't you need to pick one up somewhere, if there are no specialised versions anymore? Pretty sure I'm missing something simple :)

EDIT:

Never mind, I think I've got it. ArraySlice works for primitives (for which there is a ClassTag); ChunkQueue works for everything else, boxes its elements, and requires no class tags.

I wonder if that can result in boxing if a polymorphic combinator gets instantiated to a primitive, but I haven't fully thought it through

@mpilquist
Member Author

I'm going to update the main ticket description with a summary of the changes, especially as there's migration work needed for folks that were using Chunk.Queue or the specialized subtypes of Chunk.

Couple of main themes:

  • Chunk.concat now takes a ClassTag and always returns a chunk backed by a single array. No trickery is needed to create that array now, thanks to the ClassTag.
  • Polymorphic call sites of Chunk.concat now use ++ on Chunk, which creates a Chunk.Queue, effectively deferring the concatenation until some later point. Chunk.Queue is now a proper implementation of Chunk, so there's no need to flatten it when done accumulating (previously done by calling toChunk, which called concat).
  • The key tradeoff is that if you have lots of tiny chunks backing a chunk, index-based access is O(n), but this is a rare pattern in practice, and folks can always call .compact to flatten to a single array-backed chunk.
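As a hedged sketch of the migration described in the second bullet (assuming fs2's Stream/Pull API; the accumulate name is illustrative), accumulation that previously built up a Chunk.Queue and flattened it with toChunk now just uses ++ and outputs the accumulated chunk directly:

```scala
import fs2.{Chunk, Pull, Stream}

// Illustrative combinator: gather all elements of a stream into one chunk.
def accumulate[F[_], O](s: Stream[F, O]): Pull[F, O, Unit] = {
  def go(acc: Chunk[O], rest: Stream[F, O]): Pull[F, O, Unit] =
    rest.pull.uncons.flatMap {
      case Some((hd, tl)) => go(acc ++ hd, tl) // amortized O(1), builds a Chunk.Queue
      case None           => Pull.output(acc)  // previously: Pull.output(acc.toChunk)
    }
  go(Chunk.empty, s)
}
```

If O(1) indexed access is needed afterwards, the accumulated chunk can be flattened with .compact, at the cost of a ClassTag bound on O.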

@SystemFw
Collaborator

Key tradeoff is that if you have lots of tiny chunks backing a chunk, index based access is O(n), but this is a rare pattern in practice, and folks can always call .compact to flatten to a single array backed chunk.

But this requires a ClassTag bound, right? (if you're being generic on the chunk contents, which would hopefully happen rarely when combined with the other constraints above)

@mpilquist
Member Author

Right, three choices:

  1. Pick up a ClassTag to allocate arrays and avoid boxing individual elements
  2. Defer concatenation by returning concatenated chunks
  3. Box all elements via List/Vector/Buffer/Array[Any]

While (2) can cause performance issues when working with lots of singleton chunks, that pattern already causes performance problems in fs2 -- the interpreter overhead ends up being the dominating factor.

@SystemFw
Collaborator

I see, that makes sense. It would be good to get some of this in the scaladoc as well, once you're done coding it up. The current one still mentions individual specialised subtypes of Chunk

@mpilquist
Member Author

Docs and PR description updated. This one is ready for review.

@mpilquist mpilquist marked this pull request as ready for review December 13, 2020 14:34
/** Returns a chunk consisting of the elements of this chunk followed by the elements of
* the supplied chunk. This operation is amortized O(1).
*/
def ++[O2 >: O](that: Chunk[O2]): Chunk[O2] =
Contributor


😍

go(chunks, i)
}

override def take(n: Int): Queue[O] =
Contributor


An interesting fact

According to the results of this benchmark


package fs2
package benchmark

import scala.util.Random
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import org.openjdk.jmh.annotations.{Benchmark, Param, Scope, Setup, State}

@State(Scope.Thread)
class ChunkQueueBenchmark {
  @Param(Array("10", "100", "10000"))
  var chunkSize: Int = _

  var chunk: Chunk.Queue[Int] = _
  @Setup
  def setup(): Unit = {
    val seq = Seq.fill(100_000/chunkSize)(Chunk.seq(Seq.fill(chunkSize)(Random.nextInt())))
    chunk = Chunk.Queue(seq: _*)
  }

  @Benchmark
  def drop(): Unit =
    chunk.drop(chunkSize - 1)

  @Benchmark
  def take(): Unit =
    chunk.take(chunkSize - 1)
}

drop doesn't degrade as the chunk size increases, while take does.

[info] ChunkQueueBenchmark.drop           10  thrpt       21521956,531          ops/s
[info] ChunkQueueBenchmark.drop          100  thrpt       21206792,447          ops/s
[info] ChunkQueueBenchmark.drop        10000  thrpt       19296012,270          ops/s

[info] ChunkQueueBenchmark.take           10  thrpt       13121396,574          ops/s
[info] ChunkQueueBenchmark.take          100  thrpt        2185148,679          ops/s
[info] ChunkQueueBenchmark.take        10000  thrpt          21255,437          ops/s

Member Author


How about chunk.drop(1)?

Contributor


Oh yes, I messed up the benchmark -- I used chunkSize - 1 instead of 100_000 - 1. Thanks! 😳

Contributor

@nikiforo nikiforo Dec 15, 2020


I previously thought that appending to the end of an immutable queue was linear. In take(n), elements are appended, so I assumed it would be quadratic in the number of chunks. However, I was wrong. TIL that Scala's immutable Queue supports amortized O(1) appends and prepends, as well as amortized O(1) head access and dequeue (removing from the front).
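The observation above can be sanity-checked with a few lines against scala.collection.immutable.Queue (the costs noted below are amortized, per the standard library's documented performance characteristics):

```scala
import scala.collection.immutable.Queue

val q          = Queue(1, 2, 3)
val appended   = q.enqueue(4)        // append to the end: amortized O(1)
val prepended  = 0 +: appended       // prepend to the head: O(1)
val (hd, rest) = prepended.dequeue   // remove from the front: amortized O(1)
// hd == 0, rest == Queue(1, 2, 3, 4)
```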

Member Author


scala.collection.immutable.Queue doesn't get nearly enough credit :)

@mpilquist mpilquist merged commit 8ee6c87 into typelevel:develop Dec 16, 2020
@mpilquist mpilquist deleted the topic/simplify-chunk branch February 18, 2024 13:33