Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider renaming groupBy to chunkBy #901

Closed
lJoublanc opened this issue Jul 21, 2017 · 3 comments
Closed

consider renaming groupBy to chunkBy #901

lJoublanc opened this issue Jul 21, 2017 · 3 comments
Milestone

Comments

@lJoublanc
Copy link
Contributor

This suggestion comes from the fact that the signature of groupBy in other libraries is typically something along the lines of

class Collection[T] {
  def groupBy[S](key : T => S) : Collection[(S,Collection[T])]
}

Furthermore, the semantics of groupBy are to typically return each key only a single time, whereas this method returns the same key multiple times, potentially.

I'm thinking of stdlib collections and also RX.

I think the function is still useful, but would suggest changing the name to chunkBy (and possibly the return type from Vector to Chunk?), to make it clear the semantics are inherently different from the groupBy found in other libraries.

Although the behaviour of groupBy is not ubiquitous in the same way as map,filter, and flatMap are, this will probably cause confusion to fs2 converts, like me!

Also comment on gitter from original implementor :

Torsten Scholak @tscholak Jul 20 16:39
Hey, I did the initial implementation of groupBy in fs2, and I remember thinking about the semantics. It's just not feasible to assure hat each key is unique unless you are willing to buffer the whole stream.
If you can't do that, for instance, because the stream could grow beyond all bounds, or if you need to process on the fly without waiting for all elements, then that is not an option

@mpilquist
Copy link
Member

WDYT @tscholak?

@tscholak
Copy link
Contributor

tscholak commented Jul 23, 2017

I guess there are several issues here worth discussing:

  1. Can we actually get exactly-once semantics for groupBy?
  2. Should we rename the existing groupBy so that people are not thrown off by the at-least-once semantics for the keys?
  3. Can we just leave everything as is?

First, when I was implementing groupBy, I first thought that we could somehow have exactly-once semantics with the return type Stream[F, (K, Stream[F, V])]. Maybe it could work with topics, I thought. Maybe one could produce a stream of bounded topics, one for each unique key as they appear, and map subscriptions over them. I could not figure out how to do it, mostly due to problems with termination and error propagation. Maybe there is a way, but this function would have weird blocking behaviour.

Second, I am open to suggestions regarding a new name for the existing groupBy with at-least-once semantics. I don't think it should be chunkBy, though, because I would expect that such a function does not change the type at all (like bufferBy) -- it would just re-chunk the stream according to some user-provided pattern. groupChangesBy?

The third option is to make no changes at all. The current example for groupBy, I think, makes it pretty clear how the function behaves, cf. https://github.com/functional-streams-for-scala/fs2/blob/series/0.10/core/shared/src/main/scala/fs2/Stream.scala#L541. The type signature of groupBy also does not indicate any de-duplication of keys.

@mpilquist
Copy link
Member

What about the name indexBy or characterize?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants