Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scala 2.13 woes: Parallelism silently lost #101

Closed
swsnr opened this issue May 8, 2020 · 8 comments
Closed

Scala 2.13 woes: Parallelism silently lost #101

swsnr opened this issue May 8, 2020 · 8 comments

Comments

@swsnr
Copy link

swsnr commented May 8, 2020

Hello,

the following code compiles in Scala 2.12 and 2.13 but silently looses parallelism in 2.13:

val items = if (processInParallelEnabled) {
  someSetOfThings.toArray.par
} else {
  someSetOfThings
}
items.map(expensiveFunction).foldLeft(…)

In Scala 2.12 items is of type GenIterable[T] but in 2.13 it's IterableOnce[T] which does not define .map. Instead .map now comes from IterableOnceExtensionMethods invokes .iterator.map on anything that's not a Scala 2.13 Iterable[T].

.iterator.map however is a sequential interface (by design), so .map becomes sequential no matter whether a parallel collection was constructed.

This kind of silent change in behaviour can silently cause some serious performance regressions (luckily we had a person with a lot of foresight at our group who wrote a test for parallel execution of a certain piece of code which caught this regression).

I am not so familiar with Scala collections as to suggest a solution but I'd like to make a few remarks:

  1. I'd rather prefer if the code didn't compile on 2.13 instead of silently changing behaviour 🙂
  2. I believe that we should be able to use sequential and parallel collection under a shared interface, to transparently switch between both worlds.
  3. As far as I can see .iterator is sequential by design; this makes IterableOnce a rather dangerous interface for parallel collections to inherit because it allows to accidentally turn a parallel collection into a sequential one.

I guess parallel collections should implement some base trait of Scala 2.13 collections;,if that's not possible I think I'd prefer if it did implement no interface at all, not even IterableOnce, and lived entirely in a world on its own; then it's at least not possible to misuse parallel collections.

Cheers

@SethTisue
Copy link
Member

reminiscent of scala/bug#11987 (there, ordering is lost when using the supertype; here, parallelism is lost instead)

@swsnr
Copy link
Author

swsnr commented May 8, 2020

This one only affects Scala 2.13, whereas the issue you linked already affects 2.12, though.

@julienrf
Copy link
Contributor

the following code compiles in Scala 2.12 and 2.13 but silently looses parallelism in 2.13:

I wouldn’t say “silently” since you get a warning telling you that you should not call map directly on an IterableOnce: https://scastie.scala-lang.org/SxXClEMbQum1s9oKR6d9lQ

However, I agree that this is an issue. Actually, I don’t remember why we decided to have a common super type (IterableOnce) between parallel collections and sequential collections. I believe we should remove the IterableOnce parent trait from the parallel collections, since this type is inherently sequential.

@swsnr
Copy link
Author

swsnr commented May 11, 2020

@julienrf I've seen this warning but all things considered I'll definitely say that there's a silent and serious change in behaviour 🙂 The warning fails to highlight the consequences, recommends the wrong thing, and applying scalafix would probably make the wrong choice silently.

@SethTisue
Copy link
Member

I believe we should remove the IterableOnce parent trait from the parallel collections

We were planning on going 1.0.0 soon, so if that were to happen, now would be the time.

I don’t remember why we decided to have a common super type (IterableOnce) between parallel collections and sequential collections

the discussions were here:

several people (including Rex and Seb) advocated no-common-supertype, but Julien, Martin, and Stefan all lined up on the "IterableOnce is okay" side

@SethTisue
Copy link
Member

SethTisue commented May 12, 2020

Note that if we decide just to improve the documentation on what we have, the logical place to do it would be https://docs.scala-lang.org/overviews/core/collections-migration-213.html

@SethTisue
Copy link
Member

SethTisue commented May 20, 2020

To explore the possibility of making the parallel collections no longer extend IterableOnce, I tried it over at #103. It took me about half a day. Most tests already pass. The remaining failures don't look like fundamental difficulties to me, offhand anyway.

I had to add overloads in a few places, other than that it was mainly a matter of adding a bunch of calls to .seq.

I'm left ambivalent about this whole issue. the IterableOnce parentage is not a straight-up mistake; it's a design decision that several of the major people behind the collections redesign were in favor of, and it's pretty late to be second-guessing it now.

I don't intend to pursue this further, personally.

I would add that the question of whether this could still be changed has to be considered in the context of the overall maintenance status of this repo. The work of splitting this repo off from the main Scala repo was done by the core Scala team (namely by Stefan), but as with some of the previously spun-off modules like scala-xml and scala-parser-combinators, we can't really afford to remain heavily involved with driving this repo forward, beyond the usual light maintenance, version bumps, anointing of new maintainers, and such.

So, hypothetically, if one or more volunteer maintainers were to adopt this repo and really take charge of driving it forward (as has happened in other module repos: scala-xml, scala-parser-combinators, and Scala-swing all come to mind), and then if those maintainers really wanted to own this parentage change and push it through in a new major version, with appropriate care taken on documentation and migration, then I wouldn't stand in the way. But this is all hypothetical.

@swsnr
Copy link
Author

swsnr commented May 20, 2020

@SethTisue Thank you.

As far as I'm concerned we can close this issue; I see that it's not easy to change, and we're not so invested to parallel collections (we only use these in a few places for quick wins) as to push this through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants