-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issues with ReinitializingSourceProvider #118370
Fix issues with ReinitializingSourceProvider #118370
Conversation
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough. When multiple threads access `ReinitializingSourceProvider` the simple thread accounting could still result in returned `SourceProvider` being used by multiple threads concurrently. The ReinitializingSourceProvider was introduced via elastic#117792 Closes elastic#118238
Hi @martijnvg, I've created a changelog YAML for you. |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
@@ -1673,17 +1674,32 @@ public void testScriptField() throws Exception { | |||
String sourceMode = randomBoolean() ? "stored" : "synthetic"; | |||
Settings.Builder settings = indexSettings(1, 0).put(indexSettings()).put("index.mapping.source.mode", sourceMode); | |||
client().admin().indices().prepareCreate("test-script").setMapping(mapping).setSettings(settings).get(); | |||
for (int i = 0; i < 10; i++) { | |||
int numDocs = 256; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test adjustments were required to make it much more likely that the test failure as was reported in #118238 would reproduce locally. Without the fix to ReinitializingSourceProvider
this test should fail after a few iterations, with the fix to ReinitializingSourceProvider
this test failure reproduces.
provider = new PerThreadSourceProvider(sourceProviderFactory.get(), currentThread); | ||
this.perThreadProvider = provider; | ||
} | ||
return perThreadProvider.source.getSource(ctx, doc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use provider
not this.perThreadProvider
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did try this, but then I ran into this assertion failure:
java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([89DC8DB16062966D]:0)
at org.apache.lucene.tests.index.AssertingLeafReader$AssertingNumericDocValues.advanceExact(AssertingLeafReader.java:757)
at org.apache.lucene.index.SingletonSortedNumericDocValues.advanceExact(SingletonSortedNumericDocValues.java:62)
at org.elasticsearch.index.mapper.SortedNumericDocValuesSyntheticFieldLoader$ImmediateDocValuesLoader.advanceToDoc(SortedNumericDocValuesSyntheticFieldLoader.java:142)
at org.elasticsearch.index.mapper.ObjectMapper$SyntheticSourceFieldLoader$ObjectDocValuesLoader.advanceToDoc(ObjectMapper.java:965)
at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.write(SourceLoader.java:210)
at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.source(SourceLoader.java:181)
at org.elasticsearch.index.mapper.SourceLoader$Synthetic$LeafWithMetrics.source(SourceLoader.java:146)
at org.elasticsearch.search.lookup.SyntheticSourceProvider$SyntheticSourceLeafLoader.getSource(SyntheticSourceProvider.java:58)
at org.elasticsearch.search.lookup.SyntheticSourceProvider.getSource(SyntheticSourceProvider.java:42)
at org.elasticsearch.xpack.esql.plugin.ReinitializingSourceProvider.getSource(ReinitializingSourceProvider.java:37)
at org.elasticsearch.search.lookup.LeafSearchLookup.lambda$new$0(LeafSearchLookup.java:40)
at org.elasticsearch.script.AbstractFieldScript.extractFromSource(AbstractFieldScript.java:107)
at org.elasticsearch.script.AbstractFieldScript.emitFromSource(AbstractFieldScript.java:127)
at org.elasticsearch.script.LongFieldScript$1$1.execute(LongFieldScript.java:29)
at org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:159)
at org.elasticsearch.index.fielddata.LongScriptDocValues.advanceExact(LongScriptDocValues.java:26)
at org.elasticsearch.search.MultiValueMode$6.advanceExact(MultiValueMode.java:534)
at org.elasticsearch.index.fielddata.FieldData$17.advanceExact(FieldData.java:620)
at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.getValueForDoc(LongComparator.java:80)
at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.copy(LongComparator.java:105)
at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.collectAnyHit(TopFieldCollector.java:124)
at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.collect(TopFieldCollector.java:209)
at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:305)
at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:264)
at org.elasticsearch.compute.lucene.LuceneOperator$LuceneScorer.scoreNextRange(LuceneOperator.java:193)
at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.collect(LuceneTopNSourceOperator.java:168)
at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.getCheckedOutput(LuceneTopNSourceOperator.java:148)
at org.elasticsearch.compute.lucene.LuceneOperator.getOutput(LuceneOperator.java:118)
at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258)
at org.elasticsearch.compute.operator.Driver.run(Driver.java:189)
at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:378)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1575)
So then I went with this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But thinking more about this, just using the provider
local variable should fix the issue. This is why I also went for it initially. Maybe there is something wrong elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed: 653ed40
The assertion that tripped here is because of another reason. The linked test failure failed because different threads using the same stored field readers or doc values instances (for the latter case, I observed this locally). This assertion failure here is because we go backwards with the current docid. I added protection against that in this commit, which reinitializes source provider if last seen docid is lower than the current docid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to reproduce this failure by running: ./gradlew ":x-pack:plugin:esql:internalClusterTest" --tests "org.elasticsearch.xpack.esql.action.Esql*IT.testScriptField" -Dtests.iters=128
Without the commit this fails, with the commit the executes successfully.
…using doc values APIs when source mode is synthetic: ``` java.lang.AssertionError at __randomizedtesting.SeedInfo.seed([BA10D1FF912D808]:0) at org.apache.lucene.tests.index.AssertingLeafReader$AssertingNumericDocValues.advanceExact(AssertingLeafReader.java:757) at org.apache.lucene.index.SingletonSortedNumericDocValues.advanceExact(SingletonSortedNumericDocValues.java:62) at org.elasticsearch.index.mapper.SortedNumericDocValuesSyntheticFieldLoader$ImmediateDocValuesLoader.advanceToDoc(SortedNumericDocValuesSyntheticFieldLoader.java:142) at org.elasticsearch.index.mapper.ObjectMapper$SyntheticSourceFieldLoader$ObjectDocValuesLoader.advanceToDoc(ObjectMapper.java:965) at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.write(SourceLoader.java:210) at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.source(SourceLoader.java:181) at org.elasticsearch.index.mapper.SourceLoader$Synthetic$LeafWithMetrics.source(SourceLoader.java:146) at org.elasticsearch.search.lookup.SyntheticSourceProvider$SyntheticSourceLeafLoader.getSource(SyntheticSourceProvider.java:58) at org.elasticsearch.search.lookup.SyntheticSourceProvider.getSource(SyntheticSourceProvider.java:42) at org.elasticsearch.xpack.esql.plugin.ReinitializingSourceProvider.getSource(ReinitializingSourceProvider.java:41) at org.elasticsearch.search.lookup.LeafSearchLookup.lambda$new$0(LeafSearchLookup.java:40) at org.elasticsearch.script.AbstractFieldScript.extractFromSource(AbstractFieldScript.java:107) at org.elasticsearch.script.AbstractFieldScript.emitFromSource(AbstractFieldScript.java:127) at org.elasticsearch.script.LongFieldScript$1$1.execute(LongFieldScript.java:29) at org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:159) at org.elasticsearch.index.fielddata.LongScriptDocValues.advanceExact(LongScriptDocValues.java:26) at org.elasticsearch.search.MultiValueMode$6.advanceExact(MultiValueMode.java:534) at org.elasticsearch.index.fielddata.FieldData$17.advanceExact(FieldData.java:620) at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.getValueForDoc(LongComparator.java:80) at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.copy(LongComparator.java:105) at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.collectAnyHit(TopFieldCollector.java:124) at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.collect(TopFieldCollector.java:209) at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:305) at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:264) at org.elasticsearch.compute.lucene.LuceneOperator$LuceneScorer.scoreNextRange(LuceneOperator.java:193) at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.collect(LuceneTopNSourceOperator.java:168) at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.getCheckedOutput(LuceneTopNSourceOperator.java:148) at org.elasticsearch.compute.lucene.LuceneOperator.getOutput(LuceneOperator.java:118) at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258) at org.elasticsearch.compute.operator.Driver.run(Driver.java:189) at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:378) at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34) at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1575) ```
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough. The read from `perThreadProvider` field could be stale and therefore returning a previous source provider. Instead the source provider should be returned from `provider` local variable. This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected. Closes elastic#118238
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough. The read from `perThreadProvider` field could be stale and therefore returning a previous source provider. Instead the source provider should be returned from `provider` local variable. This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected. Closes elastic#118238
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough. The read from `perThreadProvider` field could be stale and therefore returning a previous source provider. Instead the source provider should be returned from `provider` local variable. This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected. Closes #118238
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough. The read from `perThreadProvider` field could be stale and therefore returning a previous source provider. Instead the source provider should be returned from `provider` local variable. This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected. Closes #118238 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough. The read from
perThreadProvider
field could be stale and therefore returning a previous source provider. Instead the source provider should be returned fromprovider
local variable.This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.
The ReinitializingSourceProvider was introduced via #117792
Closes #118238