
fix rechunkRandomly with large ending chunk #3277

Conversation

domaspoliakas (Contributor)

Fixes #3276

Not sure if I put the test in the right place, let me know if you'd like to move it!

@@ -1042,4 +1042,13 @@ class StreamSuite extends Fs2Suite {
}
}

test("rechunkRandomlyWithSeed should correclty rechunk big chunks at the end of a stream") {
Member

Looks like we keep those tests here.

group("rechunkRandomlyWithSeed") {

Co-authored-by: Arman Bilge <armanbilge@gmail.com>
}
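For context, the added test looks roughly like this. This is a reconstruction rather than the exact diff: the constants are illustrative, chosen so that maxFactor times the size of the single source chunk equals the 500-element bound asserted below (the strictness of that bound is discussed in the review that follows).

import fs2.{Chunk, Stream}

test("rechunkRandomlyWithSeed should correctly rechunk big chunks at the end of a stream") {
  val chunks = Stream
    .chunk(Chunk.seq(List.fill(5000)(1)))   // one big 5000-element chunk
    .rechunkRandomlyWithSeed(0.01, 0.1)(1L) // max output size: 5000 * 0.1 = 500
    .chunks
    .compile
    .toList
  assert(chunks.forall(_.size < 500))
}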

underlying.uncons.flatMap {
  case None => Pull.done
filipwiech (Aug 15, 2023)

Idea: since the new implementation is slightly more complex anyway, maybe we could avoid creating the random object (val random = new scala.util.Random(seed)) in this empty case, since it isn't used there? Or perhaps it's not worth it, not sure. 🙂
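A minimal, self-contained sketch of that idea (not the actual fs2 implementation: rechunkSketch, maxSize, and the size-drawing logic are invented for illustration). The point is simply to uncons first and allocate the Random only on the non-empty branch:

import fs2.{Chunk, Pull, Stream}

def rechunkSketch[F[_], O](s: Stream[F, O], seed: Long, maxSize: Int): Stream[F, O] = {
  def go(random: scala.util.Random, target: Int, acc: Chunk[O], rest: Stream[F, O]): Pull[F, O, Unit] =
    if (acc.size >= target)
      // enough buffered: emit exactly `target` elements and draw a new target size
      Pull.output(acc.take(target)) >> go(random, 1 + random.nextInt(maxSize), acc.drop(target), rest)
    else
      rest.pull.uncons.flatMap {
        case None           => if (acc.isEmpty) Pull.done else Pull.output(acc)
        case Some((hd, tl)) => go(random, target, acc ++ hd, tl)
      }

  s.pull.uncons.flatMap {
    case None => Pull.done // empty stream: the Random is never created
    case Some((hd, tl)) =>
      val random = new scala.util.Random(seed) // allocated only on the non-empty path
      go(random, 1 + random.nextInt(maxSize), hd, tl)
  }.stream
}

With this shape, rechunking an empty stream completes without ever touching scala.util.Random; whether that allocation is worth avoiding is exactly the question raised above.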

domaspoliakas (Contributor, Author)

We could, but I'm not sure it's worth it. It would only have an effect if the random rechunking were called repeatedly on empty streams, which seems like an extraordinarily niche scenario.

Anyway, I'm happy to do it if told otherwise, but I'll save the CI electricity for now.

  .chunks
  .compile
  .toList
assert(chunks.forall(_.size < 500))

filipwiech

Just a small comment: if I understand the documentation and the implementation of rechunkRandomlyWithSeed correctly, the minimum and maximum size factors are inclusive, so maybe the assertion should actually be:

Suggested change:
- assert(chunks.forall(_.size < 500))
+ assert(chunks.forall(_.size <= 500))

But I could be wrong about that. 🙂
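If that reading is right, the boundary value is attainable, which is what makes the strict < too strong. A back-of-the-envelope check (constants are illustrative, matching the reconstruction above):

val sourceSize = 5000 // size of the single input chunk (illustrative)
val maxFactor  = 0.1  // inclusive upper size factor (illustrative)
val maxChunk   = (sourceSize * maxFactor).toInt
assert(maxChunk == 500) // 500 is a legal output size, so the assertion needs <=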

domaspoliakas (Contributor, Author)

You're right, it should be inclusive.

domaspoliakas (Contributor, Author)

CI seems to have broken on some unrelated part of the code.

filipwiech left a comment

LGTM. 👍

go(acc ++ hd, size, tl)
s.uncons.flatMap {
  case None => Pull.output(acc)
  case Some((hd, tl)) =>

filipwiech

Nit: while reading the newest iteration of the code I had to do a double take, because there is now shadowing here (the hd and tl used in the go function hide the variables with the same names from the enclosing underlying.uncons.flatMap scope). Maybe it would be nice to avoid this for the sake of clarity. 🙂
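A hypothetical illustration of such a rename (the helper and the fresh names are invented, and it uses the public pull API rather than the internal one in the PR): bind new names in the inner uncons so the outer hd and tl stay visible and unambiguous.

import fs2.{Pull, Stream}

// Invented example: drops the first chunk and echoes the rest. The inner case
// binds freshHd/freshTl instead of reusing hd/tl from the outer uncons,
// so nothing is shadowed.
def dropFirstChunk[F[_], O](underlying: Stream[F, O]): Stream[F, O] =
  underlying.pull.uncons.flatMap {
    case None => Pull.done
    case Some((hd, tl)) => // outer bindings; hd is intentionally dropped
      def go(s: Stream[F, O]): Pull[F, O, Unit] =
        s.pull.uncons.flatMap {
          case None                     => Pull.done
          case Some((freshHd, freshTl)) => // renamed: hd/tl are not hidden
            Pull.output(freshHd) >> go(freshTl)
        }
      go(tl)
  }.stream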

mpilquist merged commit 26fd0aa into typelevel:main on Aug 28, 2023
domaspoliakas deleted the fix-rechunkRandomly-with-large-ending-chunk branch on August 29, 2023 at 07:06
Jasper-M (Contributor)

#3195 also seems to have progressed after this PR. You no longer get one very large Chunk.

Successfully merging this pull request may close these issues.

rechunkRandomlyWithSeed fails to rechunk in case of large chunks
5 participants