Improve join index memory accounting #15672

arhimondr · 2023-01-11T17:09:58Z

Description

Additional context and related issues

Release notes

(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

arhimondr · 2023-01-11T17:10:16Z

Still WIP

core/trino-main/src/main/java/io/trino/operator/join/unspilled/HashBuilderOperator.java

losipiuk · 2023-01-11T18:10:22Z

core/trino-main/src/main/java/io/trino/operator/join/JoinHashSupplier.java

@@ -79,9 +82,29 @@ public JoinHashSupplier(
            positionLinksFactoryBuilder = ArrayPositionLinks.builder(addresses.size());
        }

-        this.pages = channelsToPages(channels);
+        long additionalPagesRetainedSizeInBytes = 0;
+        // Currently PageIndex is retained by the HashBuilderOperator after JoinHash is created.


Is or is-not. If it is retained then why do we need to account for this memory once again?

This is rather convoluted. HashBuilderOperator retains PageIndex. While PageIndex is being built, memory usage is set to PageIndex#getEstimatedSize. After the LookupSource (JoinHash) is created, the memory usage of HashBuilderOperator is set to JoinHash#getInMemorySizeInBytes while a reference to PageIndex is still retained. I think the assumption is that JoinHash retains the same data as PageIndex, and that had been (almost) true up until the introduction of positionCounts. However, the overhead of the pages created for JoinHash has never been accounted for; that's also what I'm trying to address here.

It feels like ideally it has to be refactored:

Once JoinHash is created PageIndex has to be released. This is problematic with spilling, but should be doable in a version that doesn't support spilling

JoinHash ideally has to be refactored to avoid this duality of having List<Page> in addition to List<List<Block>>. However it is not a straightforward change.
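To make the gap concrete, here is a minimal toy model of the lifecycle described above. All class names, fields, and byte counts are hypothetical stand-ins chosen for illustration; they are not the Trino classes or their real sizes. The sketch only shows how accounted memory can fall below retained memory when the index stays referenced after the lookup source is built.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical stand-ins, not the Trino classes from this PR.
final class ToyPageIndex {
    private final List<long[]> columns = new ArrayList<>();
    private long bookkeepingBytes; // stands in for index-only state such as positionCounts

    void add(long[] column) {
        columns.add(column);
        bookkeepingBytes += Integer.BYTES;
    }

    long dataBytes() {
        long total = 0;
        for (long[] column : columns) {
            total += (long) column.length * Long.BYTES;
        }
        return total;
    }

    long estimatedSize() {
        return dataBytes() + bookkeepingBytes;
    }
}

final class ToyLookupSource {
    private final long sizeInBytes;

    ToyLookupSource(long sharedDataBytes, long hashArrayBytes) {
        // Shares the column data with the index and adds its own hash array.
        this.sizeInBytes = sharedDataBytes + hashArrayBytes;
    }

    long inMemorySizeInBytes() {
        return sizeInBytes;
    }
}

final class ToyBuildOperator {
    private ToyPageIndex index = new ToyPageIndex();
    private ToyLookupSource lookupSource;
    private long accountedBytes;

    void addInput(long[] column) {
        index.add(column);
        accountedBytes = index.estimatedSize();
    }

    void finish(long hashArrayBytes, boolean releaseIndex) {
        lookupSource = new ToyLookupSource(index.dataBytes(), hashArrayBytes);
        // Accounting switches to the lookup source size, as described in the comment above.
        accountedBytes = lookupSource.inMemorySizeInBytes();
        if (releaseIndex) {
            index = null; // nothing index-only stays reachable
        }
    }

    long accountedBytes() {
        return accountedBytes;
    }

    // Bytes actually reachable from the operator in this toy model.
    long retainedBytes() {
        if (lookupSource == null) {
            return index.estimatedSize();
        }
        long indexOnlyBytes = (index == null) ? 0 : index.estimatedSize() - index.dataBytes();
        return lookupSource.inMemorySizeInBytes() + indexOnlyBytes;
    }
}
```

In this model, keeping the index after `finish` leaves its bookkeeping bytes retained but unaccounted, while releasing it makes the two numbers agree.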

losipiuk · 2023-01-11T18:11:03Z

core/trino-main/src/main/java/io/trino/operator/join/JoinHashSupplier.java

-        this.pages = channelsToPages(channels);
+        long additionalPagesRetainedSizeInBytes = 0;
+        // Currently PageIndex is retained by the HashBuilderOperator after JoinHash is created.
+        // JoinHash has to account for all data structures retained by PageIndex as after JoinHash is created memory retained by PageIndex is no longer accounted for.


Also, can we just see how much memory PageIndex consumed instead of recomputing it piece by piece here?

PageIndex stores data as a List<List<Block>>. However, JoinHash needs the same data in the form of a List<Page>, so we have to account for the overhead of the Page wrappers while avoiding counting the size of the blocks twice. Ideally it would be great to refactor and only use the data in one form.
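A rough sketch of that idea, under loud assumptions: blocks are modeled as `long[]` arrays, and the per-wrapper and per-reference costs are made-up constants rather than measured instance sizes. The point is only that block bytes are counted once and the wrapper overhead is added on top.

```java
import java.util.List;

// Hypothetical illustration; the constants below are assumptions, not measured sizes.
final class PageWrapperAccounting {
    static final long ASSUMED_PAGE_INSTANCE_BYTES = 32;
    static final long ASSUMED_REFERENCE_BYTES = 8;

    // Size of the underlying blocks, counted exactly once.
    // Outer list: channels. Inner list: one block per page.
    static long blockBytes(List<List<long[]>> channels) {
        long total = 0;
        for (List<long[]> channel : channels) {
            for (long[] block : channel) {
                total += (long) block.length * Long.BYTES;
            }
        }
        return total;
    }

    // Overhead of the page-shaped view: one wrapper per page plus one
    // reference per channel inside each wrapper. Block contents are excluded.
    static long pageWrapperBytes(List<List<long[]>> channels) {
        if (channels.isEmpty()) {
            return 0;
        }
        int pageCount = channels.get(0).size();
        return pageCount * (ASSUMED_PAGE_INSTANCE_BYTES + channels.size() * ASSUMED_REFERENCE_BYTES);
    }

    static long totalBytes(List<List<long[]>> channels) {
        return blockBytes(channels) + pageWrapperBytes(channels);
    }
}
```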

sopel39 · 2023-01-12T15:33:39Z

core/trino-main/src/main/java/io/trino/operator/join/unspilled/HashBuilderOperator.java

@@ -306,6 +306,11 @@ private void finishInput()
            return;
        }

+        ListenableFuture<Void> reserved = localUserMemoryContext.setBytes(index.getEstimatedLookupSourceSizeInBytes(hashArraySizeSupplier));
+        if (!reserved.isDone()) {
+            // Yield when not enough memory is available to proceed, finish is expected to be called again when some memory is freed


That seems to be awfully dependent on Driver detail. Could we make operator return blocked future instead.

Please add tests also

> That seems to be awfully dependent on Driver detail. Could we make operator return blocked future instead.

It is generally true that currently it is the responsibility of the system to "park" Drivers when the system is low on memory. For example, if the HashAggregationOperator pushes the memory pool over the limit, we currently don't explicitly block the operator; instead we simply update the memory utilization and expect the engine not to proceed if the system is low on memory.

> Please add tests also

Yeah, still working on it.
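The reserve-then-yield pattern from the diff in this thread can be sketched as follows. This is a toy under stated assumptions: `ToyMemoryContext` and `YieldingHashBuilder` are hypothetical, and `CompletableFuture` stands in for the `ListenableFuture` used in the actual change. As in the discussion above, the reservation is always recorded, and the returned future only signals whether the caller should proceed right now.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical memory context: records the reservation unconditionally and
// returns a future that completes once the reservation fits under the limit.
final class ToyMemoryContext {
    private long limitBytes;
    private long reservedBytes;
    private CompletableFuture<Void> pending = CompletableFuture.completedFuture(null);

    ToyMemoryContext(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    CompletableFuture<Void> setBytes(long bytes) {
        reservedBytes = bytes;
        if (reservedBytes <= limitBytes) {
            pending.complete(null); // no-op if already completed
            return pending;
        }
        if (pending.isDone()) {
            pending = new CompletableFuture<>();
        }
        return pending;
    }

    // Simulates memory being freed elsewhere in the system.
    void raiseLimit(long newLimitBytes) {
        limitBytes = newLimitBytes;
        if (reservedBytes <= limitBytes) {
            pending.complete(null);
        }
    }
}

// Hypothetical operator: reserves the estimated size first and yields
// (returns false) when the reservation is not yet satisfied.
final class YieldingHashBuilder {
    private final ToyMemoryContext memory;
    private final long estimatedLookupSourceBytes;
    private boolean built;

    YieldingHashBuilder(ToyMemoryContext memory, long estimatedLookupSourceBytes) {
        this.memory = memory;
        this.estimatedLookupSourceBytes = estimatedLookupSourceBytes;
    }

    // Returns true once the lookup source has been built. The caller is
    // expected to call this again later if it returns false.
    boolean finishInput() {
        if (built) {
            return true;
        }
        CompletableFuture<Void> reserved = memory.setBytes(estimatedLookupSourceBytes);
        if (!reserved.isDone()) {
            return false; // yield; retry when memory is freed
        }
        built = true; // the expensive build step would run here
        return true;
    }
}
```

The design relies on the caller retrying `finishInput`, which is the dependency on the driver that the review comment points out.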

arhimondr · 2023-01-12T22:05:41Z

Ready for review

core/trino-main/src/main/java/io/trino/operator/join/JoinHashSupplier.java

sopel39 · 2023-01-16T14:00:02Z

core/trino-main/src/main/java/io/trino/operator/PagesIndex.java

@@ -616,4 +618,12 @@ public Page computeNext()
            }
        };
    }
+
+    public long getEstimatedLookupSourceSizeInBytes(HashArraySizeSupplier hashArraySizeSupplier)


Estimate will depend on which PagesHash implementation is used and is also subject to change in the future.

You could pass LocalMemoryContext to io.trino.operator.PagesIndex#createLookupSourceSupplier and make createLookupSourceSupplier return a future.

Discussed offline. Decided to extract helper methods for estimating every part of the JoinHash.
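One possible shape for such helpers, as a sketch only: the parts listed here (hash array, position links, addresses), their element sizes, and the default sizing rule are assumptions for illustration, not the breakdown used in the merged change. The hash array size is passed in as a function, loosely mirroring the `hashArraySizeSupplier` parameter in the diff above.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical helpers; part names and per-entry sizes are illustrative assumptions.
final class LookupSourceEstimates {
    // Assumed sizing rule: smallest power of two >= positions / 0.75.
    static int defaultHashArraySize(int positions) {
        long needed = (long) Math.ceil(positions / 0.75);
        long size = 1;
        while (size < needed) {
            size <<= 1;
        }
        return Math.toIntExact(size);
    }

    static long estimateHashArrayBytes(int positions, IntUnaryOperator hashArraySizeSupplier) {
        return (long) hashArraySizeSupplier.applyAsInt(positions) * Integer.BYTES;
    }

    static long estimatePositionLinksBytes(int positions) {
        return (long) positions * Integer.BYTES;
    }

    static long estimateAddressesBytes(int positions) {
        return (long) positions * Long.BYTES;
    }

    // The total is the sum of the per-part estimates, so each part can be
    // adjusted independently if the underlying implementation changes.
    static long estimateLookupSourceBytes(int positions, IntUnaryOperator hashArraySizeSupplier) {
        return estimateHashArrayBytes(positions, hashArraySizeSupplier)
                + estimatePositionLinksBytes(positions)
                + estimateAddressesBytes(positions);
    }
}
```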

sopel39 · 2023-01-16T14:52:54Z

core/trino-main/src/main/java/io/trino/operator/join/JoinHashSupplier.java

@@ -79,7 +83,20 @@ public JoinHashSupplier(
            positionLinksFactoryBuilder = ArrayPositionLinks.builder(addresses.size());
        }

+        long additionalPagesRetainedSizeInBytes = 0;
+        // Currently PageIndex is retained by the HashBuilderOperator after JoinHash is created.
+        // JoinHash has to account for all data structures retained by PageIndex as after JoinHash is created memory retained by PageIndex is no longer accounted for.


> for all data structures retained by PageIndex as after JoinHash

Where is PagesIndex still referenced? Can we simplify it (composite retained memory computation or use a memory context) rather than feeding back special parameters?

Discussed offline. Decided to set PageIndex to null once it is no longer needed.

arhimondr · 2023-01-16T22:16:52Z

Updated

core/trino-main/src/main/java/io/trino/operator/join/unspilled/HashBuilderOperator.java

sopel39 · 2023-01-17T21:44:37Z

core/trino-main/src/main/java/io/trino/operator/join/unspilled/HashBuilderOperator.java

@@ -321,14 +323,13 @@ private void disposeLookupSourceIfRequested()
            return;
        }

-        index.clear();


check state index is null?

close sets it to null unconditionally (as well as lookupSourceSupplier). Actually, let me remove the lookupSourceSupplier = null as well.

core/trino-spi/src/main/java/io/trino/spi/Page.java

core/trino-main/src/main/java/io/trino/operator/join/JoinUtils.java

core/trino-main/src/main/java/io/trino/operator/join/JoinHashSupplier.java

core/trino-main/src/test/java/io/trino/operator/join/unspilled/TestHashJoinOperator.java

core/trino-main/src/main/java/io/trino/operator/join/JoinHashSupplier.java

core/trino-main/src/main/java/io/trino/operator/join/BigintPagesHash.java

arhimondr · 2023-01-18T00:41:57Z

@sopel39 Updated

close assigns lookupSourceSupplier to null unconditionally

sopel39 · 2023-01-18T10:24:50Z

core/trino-main/src/main/java/io/trino/operator/join/unspilled/HashBuilderOperator.java

@@ -322,7 +322,6 @@ private void disposeLookupSourceIfRequested()
            return;
        }

-        lookupSourceSupplier = null;


It is pre-existing

arhimondr requested review from losipiuk and sopel39 January 11, 2023 17:09

cla-bot bot added the cla-signed label Jan 11, 2023

arhimondr changed the title ~~Improve join index memory accounting~~ [WIP] Improve join index memory accounting Jan 11, 2023

losipiuk reviewed Jan 11, 2023

core/trino-main/src/main/java/io/trino/operator/join/unspilled/HashBuilderOperator.java

losipiuk reviewed Jan 11, 2023

arhimondr force-pushed the fix-join-index-memory-accounting branch from 4d635cf to f6dd7fa Compare January 11, 2023 19:34

sopel39 reviewed Jan 12, 2023

arhimondr force-pushed the fix-join-index-memory-accounting branch from f6dd7fa to 8e0cd24 Compare January 12, 2023 22:05

arhimondr changed the title ~~[WIP] Improve join index memory accounting~~ Improve join index memory accounting Jan 12, 2023

losipiuk reviewed Jan 13, 2023

core/trino-main/src/main/java/io/trino/operator/join/JoinHashSupplier.java Outdated

losipiuk approved these changes Jan 13, 2023

arhimondr force-pushed the fix-join-index-memory-accounting branch from 8e0cd24 to 1ca769a Compare January 15, 2023 20:37

losipiuk approved these changes Jan 16, 2023

sopel39 reviewed Jan 16, 2023

arhimondr force-pushed the fix-join-index-memory-accounting branch from 1ca769a to 36c42eb Compare January 16, 2023 22:09

arhimondr force-pushed the fix-join-index-memory-accounting branch from 36c42eb to 31b33c1 Compare January 17, 2023 03:02

sopel39 requested a review from gaurav8297 January 17, 2023 15:38

sopel39 reviewed Jan 17, 2023

core/trino-main/src/main/java/io/trino/operator/join/BigintPagesHash.java Outdated

arhimondr force-pushed the fix-join-index-memory-accounting branch from 31b33c1 to fabf99c Compare January 18, 2023 00:41

arhimondr added 3 commits January 17, 2023 22:43

Reset positionCounts,pageCount in PagesIndex#clear

922eecb

Do not retain PageIndex after LookupSource is built

5e0e567

Remove unnecessary assignment

b1ffea3

close assigns lookupSourceSupplier to null unconditionally

arhimondr added 2 commits January 17, 2023 22:43

Account for memory retained by Page instances in JoinHash

bf3963a

Reserve memory before creating a LookupSource

702d868

arhimondr force-pushed the fix-join-index-memory-accounting branch from fabf99c to 702d868 Compare January 18, 2023 04:00

sopel39 approved these changes Jan 18, 2023

arhimondr merged commit 9d6764d into trinodb:master Jan 18, 2023

arhimondr deleted the fix-join-index-memory-accounting branch January 18, 2023 15:36

github-actions bot added this to the 406 milestone Jan 18, 2023

colebow mentioned this pull request Jan 18, 2023

Add Trino 406 release notes #15625

Merged

MnO2 mentioned this pull request Mar 7, 2023

Improve lookup source memory accounting prestodb/presto#19144

Open
