Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Push and pop OutputAccumulator as IntersectTermsEnumFrames are pushed and popped #12900

Merged
merged 9 commits into from
Dec 12, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,6 @@
import org.apache.lucene.util.Version;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Ignore;

/*
Verify we can read previous versions' indexes, do searches
Expand Down Expand Up @@ -2245,7 +2244,6 @@ public void testFailOpenOldIndex() throws IOException {
// #12895: test on a carefully crafted 9.8.0 index (from a small contiguous subset
// of wikibigall unique terms) that shows the read-time exception of
// IntersectTermsEnum (used by WildcardQuery)
@Ignore("re-enable once we merge #12900")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay :)

public void testWildcardQueryExceptions990() throws IOException {
Path path = createTempDir("12895");

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,8 @@ public IntersectTermsEnum(
assert setSavedStartTerm(startTerm);

currentFrame = f;
outputAccumulator.push(currentFrame.arc.output());

if (startTerm != null) {
seekToStartTerm(startTerm);
}
Expand Down Expand Up @@ -184,8 +186,7 @@ private IntersectTermsEnumFrame pushFrame(int state) throws IOException {
int idx = currentFrame.prefix;
assert currentFrame.suffix > 0;

outputAccumulator.reset();
outputAccumulator.push(arc.output());
int initOutputCount = outputAccumulator.outputCount();
while (idx < f.prefix) {
final int target = term.bytes[idx] & 0xff;
// TODO: we could be more efficient for the next()
Expand All @@ -198,9 +199,11 @@ private IntersectTermsEnumFrame pushFrame(int state) throws IOException {
}

f.arc = arc;
f.outputNum = outputAccumulator.outputCount() - initOutputCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK we need this because a single frame might traverse multiple FST arcs, so we need to accumulate possibly / often more than one output.

assert arc.isFinal();
outputAccumulator.push(arc.nextFinalOutput());
f.load(outputAccumulator);
outputAccumulator.pop(arc.nextFinalOutput());
return f;
}

Expand Down Expand Up @@ -343,6 +346,7 @@ private boolean popPushNext() throws IOException {
throw NoMoreTermsException.INSTANCE;
}
final long lastFP = currentFrame.fpOrig;
outputAccumulator.pop(currentFrame.outputNum);
currentFrame = stack[currentFrame.ord - 1];
currentTransition = currentFrame.transition;
assert currentFrame.lastSubFP == lastFP;
Expand Down Expand Up @@ -429,6 +433,7 @@ private BytesRef _next() throws IOException {
currentFrame = null;
return null;
}
outputAccumulator.pop(currentFrame.outputNum);
currentFrame = stack[currentFrame.ord - 1];
currentTransition = currentFrame.transition;
isSubBlock = popPushNext();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,8 @@ final class IntersectTermsEnumFrame {

final ByteArrayDataInput bytesReader = new ByteArrayDataInput();

int outputNum;

int startBytePos;
int suffix;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -495,7 +495,7 @@ public boolean seekExact(BytesRef target) throws IOException {
targetUpto = 0;
outputAccumulator.push(arc.nextFinalOutput());
currentFrame = pushFrame(arc, 0);
outputAccumulator.pop();
outputAccumulator.pop(arc.nextFinalOutput());
Copy link
Contributor Author

@gf2121 gf2121 Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like another potential bug : If arc.nextFinalOutput() == NO_OUTPUT, we will not add the output into the accumulator but we pop one out.

Not sure why we have not seen any bug report for this. Maybe our output format protect it implicitly? This is in SegmentTermsEnum so it could possibly affect merge. I'll dig to see if it is a real bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! And the new check you added in pop will prevent this bug, right?

But I also think this was (luckily!?) not a bug, because this case is when the STE is not yet seek'd to any term, so we are starting at the FST entry arc / root block, and the empty string FST input must always have a final output (the "root block"), which will never be NO_OUTPUT. Are there any other places in STE where we were blindly popping like this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the other places where we "blindly" pop are fine -- we conditionalize by arc.isFinal() check, and we know FST for BlockTree terms index will always have a nextFinalOutput on final nodes since it holds the floor data for that terms block. And there there is one other place in seekCeil where we also rely on empty string having a final output, which is fine. So net/net I don't think this was a bug, but it smelled close to one!

}

// if (DEBUG) {
Expand Down Expand Up @@ -569,7 +569,7 @@ public boolean seekExact(BytesRef target) throws IOException {
// if (DEBUG) System.out.println(" arc is final!");
outputAccumulator.push(arc.nextFinalOutput());
currentFrame = pushFrame(arc, targetUpto);
outputAccumulator.pop();
outputAccumulator.pop(arc.nextFinalOutput());
// if (DEBUG) System.out.println(" curFrame.ord=" + currentFrame.ord + " hasTerms=" +
// currentFrame.hasTerms);
}
Expand Down Expand Up @@ -767,7 +767,7 @@ public SeekStatus seekCeil(BytesRef target) throws IOException {
targetUpto = 0;
outputAccumulator.push(arc.nextFinalOutput());
currentFrame = pushFrame(arc, 0);
outputAccumulator.pop();
outputAccumulator.pop(arc.nextFinalOutput());
}

// if (DEBUG) {
Expand Down Expand Up @@ -841,7 +841,7 @@ public SeekStatus seekCeil(BytesRef target) throws IOException {
// if (DEBUG) System.out.println(" arc is final!");
outputAccumulator.push(arc.nextFinalOutput());
currentFrame = pushFrame(arc, targetUpto);
outputAccumulator.pop();
outputAccumulator.pop(arc.nextFinalOutput());
// if (DEBUG) System.out.println(" curFrame.ord=" + currentFrame.ord + " hasTerms=" +
// currentFrame.hasTerms);
}
Expand Down Expand Up @@ -1187,14 +1187,27 @@ static class OutputAccumulator extends DataInput {

void push(BytesRef output) {
if (output != Lucene90BlockTreeTermsReader.NO_OUTPUT) {
assert output.length > 0;
outputs = ArrayUtil.grow(outputs, num + 1);
outputs[num++] = output;
}
}

void pop() {
assert num > 0;
num--;
void pop(BytesRef output) {
if (output != Lucene90BlockTreeTermsReader.NO_OUTPUT) {
assert num > 0;
assert outputs[num - 1] == output;
num--;
}
}

void pop(int cnt) {
assert num >= cnt;
num -= cnt;
}

int outputCount() {
return num;
}

void reset() {
Expand Down
Loading