
Simplify Chunk #2181

Merged
merged 22 commits into from
Dec 16, 2020
Conversation

mpilquist
Member

@mpilquist mpilquist commented Dec 11, 2020

Fixes the bug found in #2180, where Chunk.concat throws on Scala.js as a result of 1.isInstanceOf[Byte] returning true.

This PR makes a number of simplifications:

  • Chunk.concat now takes an implicit ClassTag and always instantiates an array via that class tag. If passing a ClassTag[Byte], you get a Chunk backed by a primitive byte array. No more trickery with KnownElementType and runtime type checks on chunk contents. As a result of this change, use sites of concat which are polymorphic in the element type need a ClassTag.
  • Chunk.Queue is now a subtype of Chunk, and the toChunk method (which called concat) is gone. In its place, a compact method was added to Chunk, which returns a Chunk.ArraySlice; compact requires a ClassTag as well. Previously, chunks were accumulated into a Chunk.Queue and then copied to an Array[Any] via the final call to toChunk. Now the Chunk.Queue is returned directly, avoiding any copying. As a result, pulls that used Chunk.Queue to accumulate and then called Pull.output(acc.toChunk) must be modified to accumulate a Chunk[O] via ++ and then call Pull.output(acc).
  • Chunk now has an amortized O(1) ++ method, which uses Chunk.Queue underneath.
  • The various primitive array subtypes -- e.g., Chunk.Ints, Chunk.Bytes -- were all removed, as they are now redundant with the single ArraySlice. One caveat: the primitive array chunks allowed unboxed indexed access via .at(idx), but this wasn't used in practice; folks should instead access the underlying array directly.
  • The various NIO byte buffer chunks were removed as well, though ByteBuffer and CharBuffer were kept due to their frequent use in Java NIO APIs.
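The core mechanism behind the first bullet can be illustrated in plain Scala: a ClassTag lets generic code instantiate an array of the correct runtime type, so byte elements land in a primitive array without any runtime type checks. A minimal sketch (the concatArrays helper is illustrative, not fs2 API):

```scala
import scala.reflect.ClassTag

// Illustrative helper: the ClassTag picks the right runtime array type,
// so Array[Byte] inputs produce an unboxed Array[Byte] result.
def concatArrays[A: ClassTag](xs: Array[A], ys: Array[A]): Array[A] = {
  val out = new Array[A](xs.length + ys.length)
  Array.copy(xs, 0, out, 0, xs.length)
  Array.copy(ys, 0, out, xs.length, ys.length)
  out
}

concatArrays(Array[Byte](1, 2), Array[Byte](3)) // an unboxed Array[Byte]
```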

@mpilquist
Member Author

Note, I tried adding a rope-like balanced binary tree implementation of Chunk, but it performed worse than Chunk.Queue:

  private case class Concat[+O](left: Chunk[O], right: Chunk[O]) extends Chunk[O] {
    def size: Int = left.size + right.size

    def apply(i: Int): O =
      if (i < left.size) left(i)
      else right(i - left.size)

    def copyToArray[O2 >: O](xs: Array[O2], start: Int): Unit = {
      left.copyToArray(xs, start)
      right.copyToArray(xs, start + left.size)
    }

    protected def splitAtChunk_(n: Int): (Chunk[O], Chunk[O]) =
      if (n < left.size) {
        val (pfx, sfx) = left.splitAtChunk_(n)
        (pfx, sfx ++ right)
      } else {
        val (pfx, sfx) = right.splitAtChunk_(n - left.size)
        (left ++ pfx, sfx)
      }

    override def ++[O2 >: O](that: Chunk[O2]): Chunk[O2] = 
      if (that.isEmpty) this
      else if (isEmpty) that
      else {
        @annotation.tailrec
        def go(node: Concat[O2], last: Chunk[O2]): Chunk[O2] = {
          val lastN = last.size
          if (lastN >= node.size || lastN * 2 <= node.right.size)
            Concat(node, last)
          else
            node.left match {
              case left: Concat[O2] => go(left, Concat(node.right, last))
              case _                => Concat(node, last)
            }
        }
        go(this, that)
      }
  }

@SystemFw
Collaborator

SystemFw commented Dec 12, 2020

Do you mind expanding a bit more on how this avoids A: ClassTag bubbling up? Making ChunkQueue <: Chunk avoids needing to do that in most places, but won't you need to pick one up somewhere, if there are no specialised versions anymore? Pretty sure I'm missing something simple :)

EDIT:

Never mind, I think I've got it. ArraySlice works for primitives (for which there is a ClassTag); ChunkQueue works for everything else, boxes its elements, and requires no class tags.

I wonder if that can result in boxing if a polymorphic combinator gets instantiated to a primitive, but I haven't fully thought it through

@mpilquist
Member Author

I'm going to update the main ticket description with a summary of the changes, especially as there's migration work needed for folks that were using Chunk.Queue or the specialized subtypes of Chunk.

Couple of main themes:

  • Chunk.concat now takes a ClassTag and always returns a chunk backed by a single array. No trickery is needed to create that array now, thanks to the ClassTag.
  • Polymorphic call sites of Chunk.concat now use ++ on Chunk, which creates a Chunk.Queue, effectively deferring the concatenation until some later point. Chunk.Queue is now a proper implementation of Chunk, so there's no need to flatten it when done accumulating (previously done by calling toChunk, which called concat).
  • The key tradeoff is that if you have lots of tiny chunks backing a chunk, index-based access is O(n), but this is a rare pattern in practice, and folks can always call .compact to flatten to a single array-backed chunk.
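As a hedged sketch of the migration described in the second bullet (assuming fs2's Stream/Pull API; the accumulate name is illustrative), accumulation that previously built up a Chunk.Queue and flattened it with toChunk now just uses ++ and outputs the accumulated chunk directly:

```scala
import fs2.{Chunk, Pull, Stream}

// Illustrative combinator: gather all elements of a stream into one chunk.
def accumulate[F[_], O](s: Stream[F, O]): Pull[F, O, Unit] = {
  def go(acc: Chunk[O], rest: Stream[F, O]): Pull[F, O, Unit] =
    rest.pull.uncons.flatMap {
      case Some((hd, tl)) => go(acc ++ hd, tl) // amortized O(1), builds a Chunk.Queue
      case None           => Pull.output(acc)  // previously: Pull.output(acc.toChunk)
    }
  go(Chunk.empty, s)
}
```

If O(1) indexed access is needed afterwards, the accumulated chunk can be flattened with .compact, at the cost of a ClassTag bound on O.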

@SystemFw
Collaborator

Key tradeoff is that if you have lots of tiny chunks backing a chunk, index based access is O(n), but this is a rare pattern in practice, and folks can always call .compact to flatten to a single array backed chunk.

But this requires a ClassTag bound, right? (if you're being generic on the chunk contents, which would hopefully happen rarely when combined with the other constraints above)

@mpilquist
Member Author

Right, three choices:

  1. Pick up a ClassTag to allocate arrays and avoid boxing individual elements
  2. Defer concatenation by returning concatenated chunks
  3. Box all elements via List/Vector/Buffer/Array[Any]

While (2) can cause performance issues when working with lots of singleton chunks, that pattern already causes performance problems in fs2 -- the interpreter overhead ends up being the dominating factor.

@SystemFw
Collaborator

I see, that makes sense. It would be good to get some of this in the scaladoc as well, once you're done coding it up. The current one still mentions individual specialised subtypes of Chunk

@mpilquist
Member Author

Docs and PR description updated. This one is ready for review.

@mpilquist mpilquist marked this pull request as ready for review December 13, 2020 14:34
/** Returns a chunk consisting of the elements of this chunk followed by the elements of
* the supplied chunk. This operation is amortized O(1).
*/
def ++[O2 >: O](that: Chunk[O2]): Chunk[O2] =
Contributor


😍

go(chunks, i)
}

override def take(n: Int): Queue[O] =
Contributor


An interesting fact

According to the results of this benchmark


package fs2
package benchmark

import scala.util.Random
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import org.openjdk.jmh.annotations.{Benchmark, Param, Scope, Setup, State}

@State(Scope.Thread)
class ChunkQueueBenchmark {
  @Param(Array("10", "100", "10000"))
  var chunkSize: Int = _

  var chunk: Chunk.Queue[Int] = _
  @Setup
  def setup(): Unit = {
    val seq = Seq.fill(100_000/chunkSize)(Chunk.seq(Seq.fill(chunkSize)(Random.nextInt())))
    chunk = Chunk.Queue(seq: _*)
  }

  @Benchmark
  def drop(): Unit =
    chunk.drop(chunkSize - 1)

  @Benchmark
  def take(): Unit =
    chunk.take(chunkSize - 1)
}

drop doesn't degrade as the chunk size increases, while take does.

[info] ChunkQueueBenchmark.drop           10  thrpt       21521956,531          ops/s
[info] ChunkQueueBenchmark.drop          100  thrpt       21206792,447          ops/s
[info] ChunkQueueBenchmark.drop        10000  thrpt       19296012,270          ops/s

[info] ChunkQueueBenchmark.take           10  thrpt       13121396,574          ops/s
[info] ChunkQueueBenchmark.take          100  thrpt        2185148,679          ops/s
[info] ChunkQueueBenchmark.take        10000  thrpt          21255,437          ops/s

Member Author


How about chunk.drop(1)?

Contributor


Oh yes, I messed up the benchmark -- I used chunkSize - 1 instead of 100_000 - 1. Thanks! 😳

Contributor

@nikiforo nikiforo Dec 15, 2020


I previously thought that appending to the end of an immutable queue was linear. In take(n), elements are appended, so I assumed it would be quadratic in the number of chunks. However, I was wrong. TIL that Scala's immutable Queue supports amortized O(1) appends and prepends, as well as amortized O(1) head access and dequeue (removing from the front).
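The observation above can be sanity-checked with a few lines against scala.collection.immutable.Queue (the costs noted below are amortized, per the standard library's documented performance characteristics):

```scala
import scala.collection.immutable.Queue

val q          = Queue(1, 2, 3)
val appended   = q.enqueue(4)        // append to the end: amortized O(1)
val prepended  = 0 +: appended       // prepend to the head: O(1)
val (hd, rest) = prepended.dequeue   // remove from the front: amortized O(1)
// hd == 0, rest == Queue(1, 2, 3, 4)
```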

Member Author


scala.collection.immutable.Queue doesn't get nearly enough credit :)

@mpilquist mpilquist merged commit 8ee6c87 into typelevel:develop Dec 16, 2020
@mpilquist mpilquist deleted the topic/simplify-chunk branch February 18, 2024 13:33