consider renaming `groupBy` to `chunkBy` #901

lJoublanc · 2017-07-21T07:24:32Z

This suggestion comes from the fact that the signature of groupBy in other libraries is typically something along the lines of:

class Collection[T] {
  def groupBy[S](key : T => S) : Collection[(S,Collection[T])]
}

Furthermore, groupBy in other libraries typically returns each key only once, whereas this method can return the same key multiple times.

I'm thinking of stdlib collections and also RX.
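To make the difference concrete, here is a minimal sketch using only the Scala standard library. The `groupAdjacent` helper and the `GroupingDemo` object are hypothetical names made up for illustration; this is not the fs2 implementation, it only mimics the "same key can appear more than once" behaviour described above.

```scala
object GroupingDemo {
  // Hypothetical helper (not fs2 code): groups only *adjacent* elements
  // that share a key, so a key can show up in several groups.
  def groupAdjacent[A, K](xs: List[A])(key: A => K): List[(K, Vector[A])] =
    xs.foldLeft(Vector.empty[(K, Vector[A])]) { (acc, a) =>
      val k = key(a)
      acc.lastOption match {
        // Same key as the previous group: extend that group.
        case Some((prevK, group)) if prevK == k => acc.init :+ (k -> (group :+ a))
        // Different key (or first element): start a new group.
        case _ => acc :+ (k -> Vector(a))
      }
    }.toList

  val words = List("apple", "avocado", "banana", "apricot")

  // stdlib groupBy: a Map, so each key appears exactly once.
  val byKey: Map[Char, List[String]] = words.groupBy(_.head)

  // Adjacent grouping: 'a' appears twice because "banana" sits in between.
  val adjacent: List[(Char, Vector[String])] = groupAdjacent(words)(_.head)
}
```

With this input, `byKey` collects all three `a` words under one key, while `adjacent` yields an `a` group, a `b` group, and then a second `a` group.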

I think the function is still useful, but would suggest changing the name to chunkBy (and possibly the return type from Vector to Chunk?), to make it clear the semantics are inherently different from the groupBy found in other libraries.

Although the behaviour of groupBy is not as ubiquitous as that of map, filter, and flatMap, this will probably confuse fs2 converts like me!

Also, a comment on Gitter from the original implementor:

Torsten Scholak @tscholak Jul 20 16:39
Hey, I did the initial implementation of groupBy in fs2, and I remember thinking about the semantics. It's just not feasible to ensure that each key is unique unless you are willing to buffer the whole stream.
If you can't do that, for instance, because the stream could grow beyond all bounds, or if you need to process on the fly without waiting for all elements, then that is not an option.
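The buffering point can be sketched with a plain `Iterator` standing in for a stream. Everything here (`IncrementalGrouping`, `groupAdjacent`) is a hypothetical illustration under that assumption, not fs2 code: adjacent grouping only needs to hold the current group, so it can emit results from an unbounded source, whereas a unique-key grouping would have to consume the source to the end first.

```scala
object IncrementalGrouping {
  // Hypothetical helper (not fs2 code): lazily groups adjacent elements.
  // Only the group currently being built is held in memory.
  def groupAdjacent[A, K](it: Iterator[A])(key: A => K): Iterator[(K, Vector[A])] =
    new Iterator[(K, Vector[A])] {
      private val src = it.buffered // lets us peek at the next element

      def hasNext: Boolean = src.hasNext

      def next(): (K, Vector[A]) = {
        val k = key(src.head)
        val group = Vector.newBuilder[A]
        // Pull elements while the key stays the same, then emit the group.
        while (src.hasNext && key(src.head) == k) group += src.next()
        k -> group.result()
      }
    }

  // Unbounded source 0, 1, 2, ... keyed by n / 3; taking two groups terminates.
  val firstTwo: List[(Int, Vector[Int])] =
    groupAdjacent(Iterator.from(0))(_ / 3).take(2).toList
}
```

Calling something like `Iterator.from(0).toList.groupBy(_ / 3)` instead would never return, since a grouping with unique keys has to see every element before it can emit any group.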


mpilquist · 2017-07-23T13:23:30Z

WDYT @tscholak?

tscholak · 2017-07-23T14:56:06Z

I guess there are several issues here worth discussing:

1. Can we actually get exactly-once semantics for groupBy?
2. Should we rename the existing groupBy so that people are not thrown off by the at-least-once semantics for the keys?
3. Can we just leave everything as is?

First, when I was implementing groupBy, I initially thought that we could somehow have exactly-once semantics with the return type Stream[F, (K, Stream[F, V])]. Maybe it could work with topics, I thought. Maybe one could produce a stream of bounded topics, one for each unique key as they appear, and map subscriptions over them. I could not figure out how to do it, mostly due to problems with termination and error propagation. Maybe there is a way, but this function would have weird blocking behaviour.

Second, I am open to suggestions regarding a new name for the existing groupBy with at-least-once semantics. I don't think it should be chunkBy, though, because I would expect that such a function does not change the type at all (like bufferBy) -- it would just re-chunk the stream according to some user-provided pattern. groupChangesBy?

The third option is to make no changes at all. The current example for groupBy, I think, makes it pretty clear how the function behaves, cf. https://github.com/functional-streams-for-scala/fs2/blob/series/0.10/core/shared/src/main/scala/fs2/Stream.scala#L541. The type signature of groupBy also does not indicate any de-duplication of keys.

mpilquist · 2017-08-18T22:05:10Z

What about the name indexBy or characterize?

mpilquist added this to the 0.10 milestone Aug 22, 2017

mpilquist mentioned this issue Nov 27, 2017

Renamed groupBy to groupAdjacentBy and changed its return type to provide a Segment instead of a Vector #1004

Merged

pchlupacek closed this as completed in #1004 Nov 28, 2017
