Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues with ReinitializingSourceProvider #118370

Merged
merged 6 commits into from
Dec 11, 2024

Conversation

martijnvg
Copy link
Member

@martijnvg martijnvg commented Dec 10, 2024

The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough. The read from perThreadProvider field could be stale and therefore returning a previous source provider. Instead the source provider should be returned from provider local variable.

This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.

The ReinitializingSourceProvider was introduced via #117792

Closes #118238

The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.
When multiple threads access `ReinitializingSourceProvider` the simple thread accounting could still result in returned `SourceProvider` being used by multiple threads concurrently.

The ReinitializingSourceProvider was introduced via elastic#117792

Closes elastic#118238
@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@@ -1673,17 +1674,32 @@ public void testScriptField() throws Exception {
String sourceMode = randomBoolean() ? "stored" : "synthetic";
Settings.Builder settings = indexSettings(1, 0).put(indexSettings()).put("index.mapping.source.mode", sourceMode);
client().admin().indices().prepareCreate("test-script").setMapping(mapping).setSettings(settings).get();
for (int i = 0; i < 10; i++) {
int numDocs = 256;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test adjustments were required to make it much more likely that the test failure as was reported in #118238 would reproduce locally. Without the fix to ReinitializingSourceProvider this test should fail after a few iterations, with the fix to ReinitializingSourceProvider this test failure reproduces.

provider = new PerThreadSourceProvider(sourceProviderFactory.get(), currentThread);
this.perThreadProvider = provider;
}
return perThreadProvider.source.getSource(ctx, doc);
Copy link
Member

@dnhatn dnhatn Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use provider not this.perThreadProvider here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try this, but then I ran into this assertion failure:

java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([89DC8DB16062966D]:0)
        at org.apache.lucene.tests.index.AssertingLeafReader$AssertingNumericDocValues.advanceExact(AssertingLeafReader.java:757)
        at org.apache.lucene.index.SingletonSortedNumericDocValues.advanceExact(SingletonSortedNumericDocValues.java:62)
        at org.elasticsearch.index.mapper.SortedNumericDocValuesSyntheticFieldLoader$ImmediateDocValuesLoader.advanceToDoc(SortedNumericDocValuesSyntheticFieldLoader.java:142)
        at org.elasticsearch.index.mapper.ObjectMapper$SyntheticSourceFieldLoader$ObjectDocValuesLoader.advanceToDoc(ObjectMapper.java:965)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.write(SourceLoader.java:210)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.source(SourceLoader.java:181)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$LeafWithMetrics.source(SourceLoader.java:146)
        at org.elasticsearch.search.lookup.SyntheticSourceProvider$SyntheticSourceLeafLoader.getSource(SyntheticSourceProvider.java:58)
        at org.elasticsearch.search.lookup.SyntheticSourceProvider.getSource(SyntheticSourceProvider.java:42)
        at org.elasticsearch.xpack.esql.plugin.ReinitializingSourceProvider.getSource(ReinitializingSourceProvider.java:37)
        at org.elasticsearch.search.lookup.LeafSearchLookup.lambda$new$0(LeafSearchLookup.java:40)
        at org.elasticsearch.script.AbstractFieldScript.extractFromSource(AbstractFieldScript.java:107)
        at org.elasticsearch.script.AbstractFieldScript.emitFromSource(AbstractFieldScript.java:127)
        at org.elasticsearch.script.LongFieldScript$1$1.execute(LongFieldScript.java:29)
        at org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:159)
        at org.elasticsearch.index.fielddata.LongScriptDocValues.advanceExact(LongScriptDocValues.java:26)
        at org.elasticsearch.search.MultiValueMode$6.advanceExact(MultiValueMode.java:534)
        at org.elasticsearch.index.fielddata.FieldData$17.advanceExact(FieldData.java:620)
        at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.getValueForDoc(LongComparator.java:80)
        at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.copy(LongComparator.java:105)
        at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.collectAnyHit(TopFieldCollector.java:124)
        at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.collect(TopFieldCollector.java:209)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:305)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:264)
        at org.elasticsearch.compute.lucene.LuceneOperator$LuceneScorer.scoreNextRange(LuceneOperator.java:193)
        at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.collect(LuceneTopNSourceOperator.java:168)
        at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.getCheckedOutput(LuceneTopNSourceOperator.java:148)
        at org.elasticsearch.compute.lucene.LuceneOperator.getOutput(LuceneOperator.java:118)
        at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258)
        at org.elasticsearch.compute.operator.Driver.run(Driver.java:189)
        at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:378)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1575)

So then I went with this approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But thinking more about this, just using the provider local variable should fix the issue. This is why I also went for it initially. Maybe there is something wrong elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed: 653ed40

The assertion that tripped here is because of another reason. The linked test failure failed because different threads using the same stored field readers or doc values instances (for the latter case, I observed this locally). This assertion failure here is because we go backwards with the current docid. I added protection against that in this commit, which reinitializes source provider if last seen docid is lower than the current docid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to reproduce this failure by running: ./gradlew ":x-pack:plugin:esql:internalClusterTest" --tests "org.elasticsearch.xpack.esql.action.Esql*IT.testScriptField" -Dtests.iters=128

Without the commit this fails, with the commit the executes successfully.

…using doc values APIs when source mode is synthetic:

```
java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([BA10D1FF912D808]:0)
        at org.apache.lucene.tests.index.AssertingLeafReader$AssertingNumericDocValues.advanceExact(AssertingLeafReader.java:757)
        at org.apache.lucene.index.SingletonSortedNumericDocValues.advanceExact(SingletonSortedNumericDocValues.java:62)
        at org.elasticsearch.index.mapper.SortedNumericDocValuesSyntheticFieldLoader$ImmediateDocValuesLoader.advanceToDoc(SortedNumericDocValuesSyntheticFieldLoader.java:142)
        at org.elasticsearch.index.mapper.ObjectMapper$SyntheticSourceFieldLoader$ObjectDocValuesLoader.advanceToDoc(ObjectMapper.java:965)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.write(SourceLoader.java:210)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.source(SourceLoader.java:181)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$LeafWithMetrics.source(SourceLoader.java:146)
        at org.elasticsearch.search.lookup.SyntheticSourceProvider$SyntheticSourceLeafLoader.getSource(SyntheticSourceProvider.java:58)
        at org.elasticsearch.search.lookup.SyntheticSourceProvider.getSource(SyntheticSourceProvider.java:42)
        at org.elasticsearch.xpack.esql.plugin.ReinitializingSourceProvider.getSource(ReinitializingSourceProvider.java:41)
        at org.elasticsearch.search.lookup.LeafSearchLookup.lambda$new$0(LeafSearchLookup.java:40)
        at org.elasticsearch.script.AbstractFieldScript.extractFromSource(AbstractFieldScript.java:107)
        at org.elasticsearch.script.AbstractFieldScript.emitFromSource(AbstractFieldScript.java:127)
        at org.elasticsearch.script.LongFieldScript$1$1.execute(LongFieldScript.java:29)
        at org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:159)
        at org.elasticsearch.index.fielddata.LongScriptDocValues.advanceExact(LongScriptDocValues.java:26)
        at org.elasticsearch.search.MultiValueMode$6.advanceExact(MultiValueMode.java:534)
        at org.elasticsearch.index.fielddata.FieldData$17.advanceExact(FieldData.java:620)
        at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.getValueForDoc(LongComparator.java:80)
        at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.copy(LongComparator.java:105)
        at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.collectAnyHit(TopFieldCollector.java:124)
        at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.collect(TopFieldCollector.java:209)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:305)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:264)
        at org.elasticsearch.compute.lucene.LuceneOperator$LuceneScorer.scoreNextRange(LuceneOperator.java:193)
        at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.collect(LuceneTopNSourceOperator.java:168)
        at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.getCheckedOutput(LuceneTopNSourceOperator.java:148)
        at org.elasticsearch.compute.lucene.LuceneOperator.getOutput(LuceneOperator.java:118)
        at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258)
        at org.elasticsearch.compute.operator.Driver.run(Driver.java:189)
        at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:378)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1575)
```
@martijnvg martijnvg added the auto-backport Automatically create backport pull requests when merged label Dec 11, 2024
@martijnvg martijnvg changed the title Fix concurrency issue with ReinitializingSourceProvider Fix issues with ReinitializingSourceProvider Dec 11, 2024
@martijnvg martijnvg merged commit e3bddd0 into elastic:main Dec 11, 2024
16 checks passed
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 11, 2024
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.  The read from `perThreadProvider` field could be stale and therefore returning a previous source provider.  Instead the source provider should be returned from `provider` local variable.

This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.

Closes elastic#118238
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.17
8.x

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 11, 2024
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.  The read from `perThreadProvider` field could be stale and therefore returning a previous source provider.  Instead the source provider should be returned from `provider` local variable.

This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.

Closes elastic#118238
elasticsearchmachine pushed a commit that referenced this pull request Dec 11, 2024
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.  The read from `perThreadProvider` field could be stale and therefore returning a previous source provider.  Instead the source provider should be returned from `provider` local variable.

This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.

Closes #118238
elasticsearchmachine pushed a commit that referenced this pull request Dec 11, 2024
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.  The read from `perThreadProvider` field could be stale and therefore returning a previous source provider.  Instead the source provider should be returned from `provider` local variable.

This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.

Closes #118238

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.17.1 v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] EsqlActionBreakerIT class failing
4 participants