Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory efficiency for used names collection and other Relations building #764

Merged
merged 3 commits into from
Mar 28, 2021

Conversation

retronym
Copy link
Member

@retronym retronym commented Apr 28, 2020

  • Build the maps in the used name relation in a mutable map,
    then use a HashMap builder in a second pass to build the
    resulting maps. In Scala 2.12.12+ or 2.13.0+, use of the
    builder is fastest as the builder mutates the structure
    internally.
  • Intern the keys and values, so we don't have a separate
    instance of UsedName("toString", EnumSet.of(UseScope.Default))
    for each call site.
  • Factor this into RelationBuilder, and use this in
    ProtobufReaders where I recently made similar opts.

@retronym retronym requested a review from jvican April 28, 2020 05:12
def toImmutable[K, V](
map: java.util.HashMap[K, (K, mutable.Builder[V, immutable.HashSet[V]])]
): immutable.HashMap[K, immutable.HashSet[V]] = {
val builder = immutable.HashMap.newBuilder[K, immutable.HashSet[V]]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala/scala#8726 is the 2.12.x backport of efficient immutable.HashMap$Builder (thanks, @mkeskells!). With that, or with 2.13, this step is step is quite efficient.

I considered an alternative of just doing an unsafeToImmutableMap wrapper, but that is a somewhat more fragile if the users of the built Relation start to call map operations on it.

It would be great if we had a richer immutable.HashMap.Builder that supported key getOrElseUpdate on the map being built, as well as a find transformValues step. Then, we could build up the desired structure from the begining and do a second pass that just mutates the data in the leaves.

@retronym retronym requested a review from dwijnand April 28, 2020 05:29
@retronym retronym force-pushed the topic/used-names-memory branch from 21f554c to 130309b Compare April 28, 2020 05:57
@retronym

This comment has been minimized.

@retronym retronym changed the title Memory efficiency for used names collection Memory efficiency for used names collection and other relations Apr 28, 2020
@retronym retronym changed the title Memory efficiency for used names collection and other relations Memory efficiency for used names collection and other Relations building Apr 28, 2020
@jvican

This comment has been minimized.

@retronym

This comment has been minimized.

@eed3si9n

This comment has been minimized.

Copy link
Member

@dwijnand dwijnand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A light-touch/non-paranoid LGTM.

@retronym retronym force-pushed the topic/used-names-memory branch from e47b500 to 07b336f Compare April 30, 2020 10:42
@retronym retronym marked this pull request as draft April 30, 2020 10:42
@retronym

This comment has been minimized.

@retronym
Copy link
Member Author

Note to self: the benchmarks are unlikely to show much benefit on 2.12.11. Run on 2.12.x-SNAPSHOT instead.

In that build:

  • ambient allocations are reduced by sharing the buffer for classfile reading, so we can better focus on the changes in this PR.
  • immutable.HashMap.newBuilder is way more efficient.

@dwijnand dwijnand mentioned this pull request May 4, 2020
16 tasks
@retronym retronym added this to the 1.4.0 milestone May 19, 2020
@retronym retronym force-pushed the topic/used-names-memory branch 2 times, most recently from d44f509 to 3715290 Compare May 22, 2020 03:44
@retronym retronym marked this pull request as ready for review May 22, 2020 03:50
@retronym retronym force-pushed the topic/used-names-memory branch 2 times, most recently from 6830263 to f7e1812 Compare May 22, 2020 03:55
@dwijnand dwijnand force-pushed the topic/used-names-memory branch from 666b3ea to 31d3263 Compare July 3, 2020 19:58
  - Build the maps in the used name relation in a mutable map,
    then use a HashMap builder in a second pass to build the
    resulting maps. In Scala 2.12.12+ or 2.13.0+, use of the
    builder is fastest as the builder mutates the structure
    internally.
  - Intern the keys and values, so we don't have a separate
    instance of `UsedName("toString", EnumSet.of(UseScope.Default)`
    for each call site.
  - Factor this into RelationBuilder, and use this in
    ProtobufReaders where I recently made similar opts.

```
[info] All tests passed.
[info] Passed: Total 17, Failed 0, Errors 0, Passed 17
[success] Total time: 149 s (02:29), completed 28/04/2020 3:06:32 PM
sbt:zinc Root>
```
@retronym retronym force-pushed the topic/used-names-memory branch from 31d3263 to 8d49d9c Compare August 19, 2020 05:05
@eed3si9n eed3si9n modified the milestones: 1.4.0, 1.4.1 Oct 4, 2020
@eed3si9n eed3si9n modified the milestones: 1.4.1, 1.5.0 Mar 28, 2021
@eed3si9n eed3si9n merged commit 80be7d7 into sbt:develop Mar 28, 2021
@dotta
Copy link
Contributor

dotta commented Apr 4, 2021

@retronym Hey! :)

I'm concerned that with this change we may experience concurrency hazards when compiling with Hydra (such as visibility issues), as the underlying AnalysisCallback instance is shared between the different compilation threads. If you agree, could we find a way to make this change compatible with Hydra?

\cc @dragos @eed3si9n

@retronym
Copy link
Member Author

retronym commented Apr 4, 2021

Yep, that's totally fine with me. Looking at this again, I see I did synchronizes writes to the usedName RelationBuilder, but I don't have a synchronized in the final read in def addUsedNames so updates from the threads might not be visible.

@retronym
Copy link
Member Author

retronym commented Apr 4, 2021

You might want a different performance tradeoff (preferring less potential for contention and worrying less about allocation pressure). In that case you could create a second implementation of RelationBuilder that is more like the original code (TrieMap-based) and enable that in Hydra.

@dotta
Copy link
Contributor

dotta commented Apr 5, 2021

Yep, that's it. Problem is that swapping the AnalysisCallback from the outside (i.e., from a sbt plugin) isn't really possible at the moment (and we should do it in a way that works within the IntelliJ scala-plugin, as it leverages zinc and we have a Hydra integration in IntelliJ too). This doesn't look simple to achieve.

As the changes in this PR are motivated by (allocation) performance, what's the order of magnitude of the savings obtained using the current implementation? (I suspect you have been seeing a meaningful difference in large code bases, haven't you? If so, could you share some figures) By the way, would you have a benchmark that we can use to see if we can tweak the implementation in a way that would make it work with Hydra too, and at no meaningful penalty for vanilla Scala?

@dotta
Copy link
Contributor

dotta commented Apr 10, 2021

@retronym ping :)

@retronym
Copy link
Member Author

@dotta We've now got a way to avoid building relations (bi-directional maps) for used names. So we can go back to the lock-free data structures: https://github.com/sbt/zinc/pull/995/files#r677157060

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants