Memory efficiency for used names collection and other Relations building #764

retronym · 2020-04-28T05:08:25Z

Build the maps in the used name relation in a mutable map,
then use a HashMap builder in a second pass to build the
resulting maps. In Scala 2.12.12+ or 2.13.0+, use of the
builder is fastest as the builder mutates the structure
internally.
Intern the keys and values, so we don't have a separate
instance of UsedName("toString", EnumSet.of(UseScope.Default))
for each call site.
Factor this into RelationBuilder, and use this in
ProtobufReaders where I recently made similar opts.

retronym · 2020-04-28T05:19:57Z

internal/zinc-core/src/main/scala/sbt/internal/inc/RelationBuilder.scala

+    def toImmutable[K, V](
+        map: java.util.HashMap[K, (K, mutable.Builder[V, immutable.HashSet[V]])]
+    ): immutable.HashMap[K, immutable.HashSet[V]] = {
+      val builder = immutable.HashMap.newBuilder[K, immutable.HashSet[V]]


scala/scala#8726 is the 2.12.x backport of efficient immutable.HashMap$Builder (thanks, @mkeskells!). With that, or with 2.13, this step is step is quite efficient.

I considered an alternative of just doing an unsafeToImmutableMap wrapper, but that is a somewhat more fragile if the users of the built Relation start to call map operations on it.

It would be great if we had a richer immutable.HashMap.Builder that supported key getOrElseUpdate on the map being built, as well as a find transformValues step. Then, we could build up the desired structure from the begining and do a second pass that just mutates the data in the leaves.

internal/zinc-core/src/main/scala/sbt/internal/inc/Relations.scala

internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCompile.scala

dwijnand

A light-touch/non-paranoid LGTM.

internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCompile.scala

retronym · 2020-04-30T12:23:04Z

Note to self: the benchmarks are unlikely to show much benefit on 2.12.11. Run on 2.12.x-SNAPSHOT instead.

In that build:

ambient allocations are reduced by sharing the buffer for classfile reading, so we can better focus on the changes in this PR.
immutable.HashMap.newBuilder is way more efficient.

- Build the maps in the used name relation in a mutable map, then use a HashMap builder in a second pass to build the resulting maps. In Scala 2.12.12+ or 2.13.0+, use of the builder is fastest as the builder mutates the structure internally. - Intern the keys and values, so we don't have a separate instance of `UsedName("toString", EnumSet.of(UseScope.Default)` for each call site. - Factor this into RelationBuilder, and use this in ProtobufReaders where I recently made similar opts. ``` [info] All tests passed. [info] Passed: Total 17, Failed 0, Errors 0, Passed 17 [success] Total time: 149 s (02:29), completed 28/04/2020 3:06:32 PM sbt:zinc Root> ```

dotta · 2021-04-04T07:44:33Z

@retronym Hey! :)

I'm concerned that with this change we may experience concurrency hazards when compiling with Hydra (such as visibility issues), as the underlying AnalysisCallback instance is shared between the different compilation threads. If you agree, could we find a way to make this change compatible with Hydra?

\cc @dragos @eed3si9n

retronym · 2021-04-04T08:18:47Z

Yep, that's totally fine with me. Looking at this again, I see I did synchronizes writes to the usedName RelationBuilder, but I don't have a synchronized in the final read in def addUsedNames so updates from the threads might not be visible.

retronym · 2021-04-04T08:20:49Z

You might want a different performance tradeoff (preferring less potential for contention and worrying less about allocation pressure). In that case you could create a second implementation of RelationBuilder that is more like the original code (TrieMap-based) and enable that in Hydra.

dotta · 2021-04-05T08:38:13Z

Yep, that's it. Problem is that swapping the AnalysisCallback from the outside (i.e., from a sbt plugin) isn't really possible at the moment (and we should do it in a way that works within the IntelliJ scala-plugin, as it leverages zinc and we have a Hydra integration in IntelliJ too). This doesn't look simple to achieve.

As the changes in this PR are motivated by (allocation) performance, what's the order of magnitude of the savings obtained using the current implementation? (I suspect you have been seeing a meaningful difference in large code bases, haven't you? If so, could you share some figures) By the way, would you have a benchmark that we can use to see if we can tweak the implementation in a way that would make it work with Hydra too, and at no meaningful penalty for vanilla Scala?

dotta · 2021-04-10T11:40:33Z

@retronym ping :)

retronym · 2021-07-27T06:32:56Z

@dotta We've now got a way to avoid building relations (bi-directional maps) for used names. So we can go back to the lock-free data structures: https://github.com/sbt/zinc/pull/995/files#r677157060

retronym requested a review from jvican April 28, 2020 05:12

retronym commented Apr 28, 2020

View reviewed changes

internal/zinc-core/src/main/scala/sbt/internal/inc/Relations.scala Show resolved Hide resolved

retronym commented Apr 28, 2020

View reviewed changes

internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCompile.scala Outdated Show resolved Hide resolved

retronym requested a review from dwijnand April 28, 2020 05:29

retronym force-pushed the topic/used-names-memory branch from 21f554c to 130309b Compare April 28, 2020 05:57

This comment has been minimized.

Sign in to view

retronym changed the title ~~Memory efficiency for used names collection~~ Memory efficiency for used names collection and other relations Apr 28, 2020

retronym changed the title ~~Memory efficiency for used names collection and other relations~~ Memory efficiency for used names collection and other Relations building Apr 28, 2020

This comment has been minimized.

Sign in to view

retronym added the performance label Apr 30, 2020

dwijnand approved these changes Apr 30, 2020

View reviewed changes

internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCompile.scala Outdated Show resolved Hide resolved

retronym force-pushed the topic/used-names-memory branch from e47b500 to 07b336f Compare April 30, 2020 10:42

retronym marked this pull request as draft April 30, 2020 10:42

This comment has been minimized.

Sign in to view

dwijnand mentioned this pull request May 4, 2020

Epic: Integrate changes in mspnf/zinc #777

Closed

16 tasks

retronym mentioned this pull request May 7, 2020

Reduce memory of Zinc by de-duplicating Strings, etc #780

Closed

retronym added this to the 1.4.0 milestone May 19, 2020

retronym force-pushed the topic/used-names-memory branch 2 times, most recently from d44f509 to 3715290 Compare May 22, 2020 03:44

retronym marked this pull request as ready for review May 22, 2020 03:50

retronym force-pushed the topic/used-names-memory branch 2 times, most recently from 6830263 to f7e1812 Compare May 22, 2020 03:55

dwijnand force-pushed the topic/used-names-memory branch from 666b3ea to 31d3263 Compare July 3, 2020 19:58

retronym added 2 commits August 19, 2020 15:05

Reduce footprint by interning stringified Names in Analysis

8d49d9c

retronym force-pushed the topic/used-names-memory branch from 31d3263 to 8d49d9c Compare August 19, 2020 05:05

eed3si9n modified the milestones: 1.4.0, 1.4.1 Oct 4, 2020

Merge branch 'develop' into topic/used-names-memory

0500da6

eed3si9n modified the milestones: 1.4.1, 1.5.0 Mar 28, 2021

eed3si9n merged commit 80be7d7 into sbt:develop Mar 28, 2021

retronym mentioned this pull request Jul 27, 2021

Improve performance of loading used names from persisted Analysis file #995

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory efficiency for used names collection and other Relations building #764

Memory efficiency for used names collection and other Relations building #764

retronym commented Apr 28, 2020 •

edited

Loading

retronym Apr 28, 2020

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

dwijnand left a comment

This comment has been minimized.

retronym commented Apr 30, 2020

dotta commented Apr 4, 2021

retronym commented Apr 4, 2021

retronym commented Apr 4, 2021 •

edited

Loading

dotta commented Apr 5, 2021 •

edited

Loading

dotta commented Apr 10, 2021

retronym commented Jul 27, 2021

Memory efficiency for used names collection and other Relations building #764

Memory efficiency for used names collection and other Relations building #764

Conversation

retronym commented Apr 28, 2020 • edited Loading

retronym Apr 28, 2020

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

dwijnand left a comment

Choose a reason for hiding this comment

This comment has been minimized.

retronym commented Apr 30, 2020

dotta commented Apr 4, 2021

retronym commented Apr 4, 2021

retronym commented Apr 4, 2021 • edited Loading

dotta commented Apr 5, 2021 • edited Loading

dotta commented Apr 10, 2021

retronym commented Jul 27, 2021

retronym commented Apr 28, 2020 •

edited

Loading

retronym commented Apr 4, 2021 •

edited

Loading

dotta commented Apr 5, 2021 •

edited

Loading