Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum #12699

gf2121 · 2023-10-19T05:09:19Z

Description

Previous talk is too long to track so i opened a new PR to make a summery here. More details are available in #12661.

After merging of #12631, we found a regression for term-dictionary-related tasks like PKLookup, Fuzzy ... After some digging I suspect that the regression is caused by more Outputs#add and Outputs#read on reading side, as the 'MSB VLong output format' making FST sharing more output prefix. The difference of calling times of Outputs#add and Outputs#read is shown below:

	LSB VLong	MSB VLong	diff
Outputs#read times	116097	149803	29.03%
Outputs#add times	144	111568	77377.78%

Solution

This patch tries to do two optimizations to get back the speed:

Instead of combining all outputs into single one, this patch collect all BytesRefs into an array and build a DataInput view over it, reducing the cost of objects construction and memory copy.
Instead of copying bytes into floor data, this patch directly uses the last output as floor data. Floor data is guaranteed continuous in single arc because we can not have two same fp encoded before floor data.

Benchmark

Both run 30 JVM round and 50 tasks per JVM:

Comparing to after merging of #12631

BaseLine ( After merging of #12631 )

$ git log
Write MSB VLong for better outputs sharing in block tree index (#12631)
DeletedTerms#clear should reset ByteBlockPool (#12630)

Candidate

$ git log
Optimize output accumulating
Write MSB VLong for better outputs sharing in block tree index
DeletedTerms#clear should reset ByteBlockPool

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                   OrHighNotHigh      166.79      (4.5%)      165.12      (4.3%)   -1.0% (  -9% -    8%) 0.376
                    OrHighNotLow      258.47      (4.3%)      256.46      (3.3%)   -0.8% (  -7% -    7%) 0.430
            HighIntervalsOrdered        0.38      (3.3%)        0.38      (3.1%)   -0.8% (  -6% -    5%) 0.359
                 LowSloppyPhrase        1.64      (3.4%)        1.63      (4.9%)   -0.7% (  -8% -    7%) 0.544
                   OrNotHighHigh      150.83      (5.1%)      149.86      (4.4%)   -0.6% (  -9% -    9%) 0.597
                HighSloppyPhrase        3.67      (3.3%)        3.64      (3.6%)   -0.6% (  -7% -    6%) 0.470
                         Prefix3       67.16      (4.3%)       66.78      (2.8%)   -0.6% (  -7% -    6%) 0.543
                    OrNotHighMed      135.65      (4.4%)      134.95      (3.7%)   -0.5% (  -8% -    7%) 0.620
                    OrNotHighLow      257.14      (2.0%)      255.85      (1.7%)   -0.5% (  -4% -    3%) 0.291
                     AndHighHigh       19.33      (2.8%)       19.25      (2.2%)   -0.4% (  -5% -    4%) 0.517
                    OrHighNotMed      128.51      (5.3%)      128.06      (4.2%)   -0.3% (  -9% -    9%) 0.778
                 MedSloppyPhrase        9.56      (3.6%)        9.54      (3.8%)   -0.3% (  -7% -    7%) 0.788
                      OrHighHigh       13.23      (3.3%)       13.20      (3.2%)   -0.2% (  -6% -    6%) 0.801
                       OrHighLow      199.40      (1.9%)      199.18      (1.7%)   -0.1% (  -3% -    3%) 0.812
                        Wildcard       69.22      (2.7%)       69.19      (1.9%)   -0.0% (  -4% -    4%) 0.952
                      AndHighMed       42.14      (3.7%)       42.14      (2.9%)    0.0% (  -6% -    6%) 0.993
                    HighSpanNear        7.00      (3.1%)        7.00      (3.1%)    0.1% (  -5% -    6%) 0.945
                          Fuzzy1       44.05      (1.8%)       44.08      (1.7%)    0.1% (  -3% -    3%) 0.872
                       OrHighMed       38.54      (2.6%)       38.59      (3.2%)    0.1% (  -5% -    6%) 0.871
                         Respell       28.40      (2.2%)       28.45      (2.2%)    0.2% (  -4% -    4%) 0.790
                     LowSpanNear        4.23      (1.7%)        4.24      (1.5%)    0.2% (  -2% -    3%) 0.580
             MedIntervalsOrdered        3.47      (6.6%)        3.49      (7.3%)    0.3% ( -12% -   15%) 0.856
             LowIntervalsOrdered        2.76      (4.3%)        2.77      (5.0%)    0.4% (  -8% -   10%) 0.761
                          Fuzzy2       35.28      (1.6%)       35.42      (1.6%)    0.4% (  -2% -    3%) 0.325
                      HighPhrase       38.91      (2.6%)       39.07      (2.5%)    0.4% (  -4% -    5%) 0.532
                         MedTerm      248.39      (4.2%)      249.48      (4.2%)    0.4% (  -7% -    9%) 0.685
                       LowPhrase       26.44      (2.0%)       26.57      (2.3%)    0.5% (  -3% -    4%) 0.385
                       MedPhrase        9.51      (2.3%)        9.57      (2.6%)    0.6% (  -4% -    5%) 0.365
                      AndHighLow      338.26      (3.0%)      340.61      (2.6%)    0.7% (  -4% -    6%) 0.339
                         LowTerm      242.60      (4.5%)      244.38      (4.8%)    0.7% (  -8% -   10%) 0.545
                     MedSpanNear        1.11      (1.4%)        1.12      (1.5%)    0.8% (  -2% -    3%) 0.042
                        HighTerm      230.63      (3.8%)      233.01      (4.6%)    1.0% (  -7% -    9%) 0.339
                        PKLookup      100.76      (2.9%)      105.17      (3.2%)    4.4% (  -1% -   10%) 0.000

Comparing to before merging of #12631

BaseLine ( Before merging of #12631 )

$ git log
DeletedTerms#clear should reset ByteBlockPool

Candidate

$ git log
Optimize output accumulating
Write MSB VLong for better outputs sharing in block tree index
DeletedTerms#clear should reset ByteBlockPool

(Not sure why seeing tiny speed up for Orxxx tasks. I suspect it is a noise)

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
             MedIntervalsOrdered        3.50      (7.9%)        3.43      (7.4%)   -2.2% ( -16% -   14%) 0.269
                        Wildcard       70.27      (2.9%)       69.22      (2.2%)   -1.5% (  -6% -    3%) 0.023
            HighIntervalsOrdered        0.38      (4.0%)        0.37      (4.0%)   -1.3% (  -8% -    6%) 0.194
             LowIntervalsOrdered        2.78      (5.3%)        2.75      (5.2%)   -1.1% ( -10% -    9%) 0.417
                         Prefix3       66.95      (4.3%)       66.25      (3.1%)   -1.0% (  -8% -    6%) 0.282
                         LowTerm      246.48      (8.3%)      244.15      (6.8%)   -0.9% ( -14% -   15%) 0.631
                       MedPhrase        9.57      (3.6%)        9.50      (3.1%)   -0.8% (  -7% -    6%) 0.366
                HighSloppyPhrase        3.63      (3.5%)        3.62      (3.8%)   -0.3% (  -7% -    7%) 0.761
                      HighPhrase       39.07      (3.5%)       38.96      (3.3%)   -0.3% (  -6% -    6%) 0.749
                        PKLookup      105.27      (2.8%)      105.13      (3.0%)   -0.1% (  -5% -    5%) 0.859
                 MedSloppyPhrase        9.49      (3.2%)        9.48      (3.9%)   -0.1% (  -6% -    7%) 0.920
                         Respell       28.45      (2.6%)       28.46      (2.2%)    0.0% (  -4% -    5%) 0.957
                 LowSloppyPhrase        1.63      (5.0%)        1.63      (5.6%)    0.1% (  -9% -   11%) 0.929
                       LowPhrase       26.31      (3.3%)       26.36      (2.1%)    0.2% (  -5% -    5%) 0.782
                          Fuzzy1       43.71      (1.3%)       43.82      (1.7%)    0.3% (  -2% -    3%) 0.509
                   OrHighNotHigh      164.97      (3.7%)      165.40      (4.4%)    0.3% (  -7% -    8%) 0.806
                     AndHighHigh       19.17      (2.1%)       19.23      (2.3%)    0.3% (  -3% -    4%) 0.569
                     MedSpanNear        1.11      (2.0%)        1.12      (2.0%)    0.6% (  -3% -    4%) 0.266
                      AndHighLow      338.06      (2.1%)      340.30      (1.9%)    0.7% (  -3% -    4%) 0.196
                     LowSpanNear        4.20      (2.4%)        4.23      (2.1%)    0.7% (  -3% -    5%) 0.194
                      OrHighHigh       13.15      (4.0%)       13.26      (4.5%)    0.8% (  -7% -    9%) 0.454
                   OrNotHighHigh      148.59      (3.9%)      149.94      (5.1%)    0.9% (  -7% -   10%) 0.442
                        HighTerm      229.24      (3.7%)      231.60      (4.2%)    1.0% (  -6% -    9%) 0.314
                    HighSpanNear        6.94      (3.3%)        7.01      (3.6%)    1.1% (  -5% -    8%) 0.224
                    OrHighNotMed      126.33      (4.3%)      127.74      (4.9%)    1.1% (  -7% -   10%) 0.343
                    OrHighNotLow      254.19      (3.1%)      257.19      (3.9%)    1.2% (  -5% -    8%) 0.200
                      AndHighMed       41.35      (5.0%)       42.00      (3.2%)    1.6% (  -6% -   10%) 0.148
                          Fuzzy2       34.61      (1.8%)       35.34      (2.1%)    2.1% (  -1% -    6%) 0.000
                       OrHighMed       37.84      (5.8%)       38.65      (4.2%)    2.1% (  -7% -   12%) 0.107
                    OrNotHighLow      248.52      (2.8%)      254.75      (2.5%)    2.5% (  -2% -    7%) 0.000
                       OrHighLow      193.81      (2.3%)      200.07      (2.5%)    3.2% (  -1% -    8%) 0.000
                    OrNotHighMed      130.29      (3.6%)      134.66      (4.7%)    3.4% (  -4% -   12%) 0.002
                         MedTerm      241.15      (3.8%)      249.46      (4.5%)    3.4% (  -4% -   12%) 0.001

mikemccand

Net/net I love this idea -- it's basically the same idea as StringBuilder (append small strings to list and concatenate only in the end), better than the O(N^2) "just keep concatenating a single bigger and bigger string" that FST does today.

I love that this change erases a longstanding TODO also!

I wonder if we could make this appendable output more widely accessible (in oal.fst.util) for other use cases that need Outputs<BytesRef>, but let's not do that here.

I left some small comments ... I think this is close. It would be nice to recover the PKLookup perf hit. Thanks @gf2121!

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnumFrame.java

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java

mikemccand · 2023-11-21T17:32:40Z

I'll try to review the latest PR soon -- thanks @gf2121.

mikemccand

Thanks @gf2121 -- I left mostly smallish comments, but some head scratching -- hard to remember how these two crazy iterators work.

mikemccand · 2023-11-21T17:25:33Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java

+    }
+
+    void pop() {
+      this.num--;


Maybe assert num > 0 above this line?

Also you don't need the this. prefix when accessing the members in these methods.

mikemccand · 2023-11-22T12:29:12Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnumFrame.java

+
+  void load(SegmentTermsEnum.OutputAccumulator outputAccumulator) throws IOException {
+    outputAccumulator.prepareRead();
+    final long code = ite.fr.readVLongOutput(outputAccumulator);


Remove this silly final?

mikemccand · 2023-11-22T12:29:51Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnumFrame.java

+    load(code);
+  }
+
+  void load(Long boxCode) throws IOException {


Hmm rename to blockCode? Not sure why it's a box :)

I called it box because we are using a big Long... blockCode is better :)

mikemccand · 2023-11-22T12:31:56Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnumFrame.java

+  }
+
+  void load(Long boxCode) throws IOException {
+    if (boxCode != null) {


Maybe add comment that this means this block is the first one in a possible sequence of floor blocks corresponding to a single seek point from the FST terms index?

mikemccand · 2023-11-22T12:33:22Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java

@@ -373,9 +380,6 @@ public boolean seekExact(BytesRef target) throws IOException {

      int cmp = 0;

-      // TODO: reverse vLong byte order for better FST


Yay, stale TODO garbage collection!

mikemccand · 2023-11-22T12:34:49Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java

+    int index;
+
+    void push(BytesRef output) {
+      if (output != Lucene90BlockTreeTermsReader.NO_OUTPUT) {


Might be better to just check if (output.length > 0) {? It'd handle cases (not sure this happens) where caller is passing an empty BytesRef that is not NO_OUTPUT?

This object comparing is what we did before. I believe it is right because we have strict contract for Outputs operations.

While we have strict contracts, I can see that BytesSequenceOutputs#add(BytesRef, BytesRef) has assertions that the length is > 0.

Here in OutputAccumulator readByte() makes a big assumption that a BytesRef has at least length 1. If it had a length of 0, we would read past the ref end and read bytes sitting in a byte[] that we shouldn't.

IMO OutputAccumulator needs to be way more cautious than BytesSequenceOutputs#add(BytesRef, BytesRef) because OutputAccumulator isn't making copies and is relying on the underlying byte arrays not changing.

mikemccand · 2023-11-22T12:38:49Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java

+    }
+
+    void setFloorData(ByteArrayDataInput floorData) {
+      assert outputIndex == num - 1


Could we also assert index == 0 in this assert? I.e. we are at the start of the last BytesRef?

I think the index can be non-zero here. The last arc could contains suffix of the MSBVLong output.

mikemccand · 2023-11-22T12:41:56Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java

+      this.current = outputs[0];
+    }
+
+    void setFloorData(ByteArrayDataInput floorData) {


Could you add a comment that, while this reads/decodes bytes into floorData, it does not alter the read position?

mikemccand · 2023-11-22T12:45:50Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java

-    floorDataReader.reset(floorData, 0, numBytes);
+  public void setFloorData(SegmentTermsEnum.OutputAccumulator outputAccumulator) {
+    outputAccumulator.setFloorData(floorDataReader);
+    rewindPos = floorDataReader.getPosition();


Can we move this above line 107? It smells a bit putting this after -- it requires knowing setFloorData doesn't change the read position.

Hmm i'm a bit confused. I think setFloorData changes the position of floorDataReader because it calls
floorDataReader#reset, so we can not move this line above?

Woops you're right! Sorry :)

mikemccand · 2023-11-22T12:46:46Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java

@@ -247,7 +243,7 @@ void rewind() {
    nextEnt = -1;
    hasTerms = hasTermsOrig;
    if (isFloor) {
-      floorDataReader.rewind();
+      floorDataReader.setPosition(rewindPos);


I wonder if this is still needed? setFloorData didn't change the position? Or might other operations before the reset() have altered the read position? Hard to think about.

I think we need this. The floorDataReader is a ByteArrayDataInput and its position will get changed when we read from it.

gf2121 · 2023-11-22T14:54:23Z

Thanks for review @mikemccand !

but some head scratching -- hard to remember how these two crazy iterators work.

Agree that this is head scratching... I make a chart to try to explain this better.

mikemccand

Looks great @gf2121 -- you want to merge and backport to 9.x?

gf2121 · 2023-11-28T05:04:00Z

Thanks for review and great suggestions @mikemccand !

you want to merge and backport to 9.x?

Yes. I'll merge and backport this this.

gf2121 · 2023-11-29T08:13:41Z

Hi @mikemccand @jpountz !

The newest nightly benchmark shows we get back some speed for PKLookUp, but still not as good as the performance before the merge of #12631.

This result is not quite same as what i saw locally, but reasonable for me as we do more Outputs#read anyway.

Since we are preparing for the release of 9.9.0. I'm not sure if it is an acceptable trade-off now. Should we backport this or revert #12631? I'd appreciate your suggestions :)

mikemccand · 2023-11-29T10:54:47Z

I don't think we should revert #12631 -- that one was a nice disk space improvement, and the PKLookup performance is quite noisy -- going up or down by just as much when we upgrade JDKs or switch to Panama MMap impl. Thanks @gf2121.

…ectTermsEnum (apache#12699)" This reverts commit 9574cbd.

gf2121 · 2023-11-29T11:00:28Z

Thanks @mikemccand for feedback! I'll backport this to 9.9.0 then.

…Enum (#12699)

jpountz · 2023-11-29T15:09:57Z

+1 to not reverting #12631. The fact that our terms index used to not share prefixes properly was a bug, I'm glad it's fixed even though it may make performance slightly slower because more nodes have non-empty outputs to contribute.

FWIW it looks like your change did not only help PKLookup, but possibly also CountTerm, Respell, Fuzzy1 and Fuzzy2.

…ectTermsEnum (apache#12699)" This reverts commit d92efa3.

This commit updates Lucene to a patched version of 9.9.0 without apache/lucene#12699. See apache/lucene#12895.

…ectTermsEnum (apache#12699)" This reverts commit 9574cbd.

optimize output accumulating

71ef88a

gf2121 mentioned this pull request Oct 19, 2023

Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum #12661

Closed

gf2121 requested review from jpountz and mikemccand October 19, 2023 05:12

gf2121 mentioned this pull request Oct 19, 2023

Avoid object construction when linear searching arcs #12692

Merged

mikemccand reviewed Nov 13, 2023

View reviewed changes

gf2121 added 2 commits November 13, 2023 20:46

iter with review comments

0509ba8

more rename

1d30a46

gf2121 requested a review from mikemccand November 15, 2023 09:52

mikemccand reviewed Nov 22, 2023

View reviewed changes

gf2121 added 2 commits November 22, 2023 22:20

review iter

61c7034

Merge remote-tracking branch 'my/outputs_acc' into outputs_acc

42b2d0f

gf2121 requested a review from mikemccand November 22, 2023 15:36

gf2121 added 2 commits November 27, 2023 23:24

add comment

127fa00

remove final

94dbc69

mikemccand approved these changes Nov 27, 2023

View reviewed changes

gf2121 merged commit 9574cbd into apache:main Nov 28, 2023
4 checks passed

gf2121 added this to the 9.9.0 milestone Nov 28, 2023

gf2121 added a commit to gf2121/lucene that referenced this pull request Nov 29, 2023

Revert "Optimize outputs accumulating for SegmentTermsEnum and Inters…

b0558bd

…ectTermsEnum (apache#12699)" This reverts commit 9574cbd.

asfgit pushed a commit that referenced this pull request Nov 29, 2023

Optimize outputs accumulating for SegmentTermsEnum and IntersectTerms…

d92efa3

…Enum (#12699)

benwtrent mentioned this pull request Dec 8, 2023

Corruption read on term dictionaries in Lucene 9.9 #12895

Closed

rjernst added a commit to rjernst/lucene that referenced this pull request Dec 8, 2023

Revert "Optimize outputs accumulating for SegmentTermsEnum and Inters…

361aaae

…ectTermsEnum (apache#12699)" This reverts commit d92efa3.

rjernst added a commit to rjernst/lucene that referenced this pull request Dec 8, 2023

Revert "Optimize outputs accumulating for SegmentTermsEnum and Inters…

bb4fec6

…ectTermsEnum (apache#12699)" This reverts commit d92efa3.

rjernst added a commit to rjernst/elasticsearch that referenced this pull request Dec 8, 2023

Update Lucene to remove terms enum optimzation

cf9401e

This commit updates Lucene to a patched version of 9.9.0 without apache/lucene#12699. See apache/lucene#12895.

rjernst mentioned this pull request Dec 8, 2023

Update Lucene to remove terms enum optimzation elastic/elasticsearch#103229

Merged

elasticsearchmachine pushed a commit to elastic/elasticsearch that referenced this pull request Dec 9, 2023

Update Lucene to remove terms enum optimzation (#103229)

6fac62e

This commit updates Lucene to a patched version of 9.9.0 without apache/lucene#12699. See apache/lucene#12895.

javanna added a commit to javanna/lucene that referenced this pull request Dec 9, 2023

Revert "Optimize outputs accumulating for SegmentTermsEnum and Inters…

f2c4c4a

…ectTermsEnum (apache#12699)" This reverts commit 9574cbd.

javanna mentioned this pull request Dec 9, 2023

Revert "Optimize outputs accumulating for SegmentTermsEnum and InterssectTermsEnum " #12899

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum #12699

Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum #12699

gf2121 commented Oct 19, 2023

mikemccand left a comment

mikemccand commented Nov 21, 2023

mikemccand left a comment

mikemccand Nov 21, 2023

mikemccand Nov 22, 2023

mikemccand Nov 22, 2023

gf2121 Nov 22, 2023

mikemccand Nov 22, 2023

mikemccand Nov 22, 2023

mikemccand Nov 22, 2023

gf2121 Nov 22, 2023

benwtrent Dec 8, 2023 •

edited

Loading

mikemccand Nov 22, 2023

gf2121 Nov 22, 2023

mikemccand Nov 22, 2023

mikemccand Nov 22, 2023

gf2121 Nov 22, 2023

mikemccand Nov 27, 2023

mikemccand Nov 22, 2023

gf2121 Nov 22, 2023

gf2121 commented Nov 22, 2023

mikemccand left a comment

gf2121 commented Nov 28, 2023

gf2121 commented Nov 29, 2023

mikemccand commented Nov 29, 2023

gf2121 commented Nov 29, 2023

jpountz commented Nov 29, 2023

		@@ -373,9 +380,6 @@ public boolean seekExact(BytesRef target) throws IOException {

		int cmp = 0;

		// TODO: reverse vLong byte order for better FST

Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum #12699

Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum #12699

Conversation

gf2121 commented Oct 19, 2023

Description

Solution

Benchmark

Comparing to after merging of #12631

Comparing to before merging of #12631

mikemccand left a comment

Choose a reason for hiding this comment

mikemccand commented Nov 21, 2023

mikemccand left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benwtrent Dec 8, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gf2121 commented Nov 22, 2023

mikemccand left a comment

Choose a reason for hiding this comment

gf2121 commented Nov 28, 2023

gf2121 commented Nov 29, 2023

mikemccand commented Nov 29, 2023

gf2121 commented Nov 29, 2023

jpountz commented Nov 29, 2023

benwtrent Dec 8, 2023 •

edited

Loading