-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the remaining scroll response documents on update by query bulk requests #71430
Conversation
…equests In update by query requests where max_docs < size and conflicts=proceed we weren't using the remaining documents from the scroll response in cases where there were conflicts and in the first bulk request the successful updates < max_docs. This commit address that problem and use the remaining documents from the scroll response instead of requesting a new page. Closes elastic#63671
325c9e8
to
9d8e5b1
Compare
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments. Also, I would like to have the documentation refined to make it clear how max_docs
now work with conflicts in all 3 cases (reindex, update and delete by query).
...s/reindex/src/main/java/org/elasticsearch/index/reindex/AbstractAsyncBulkByScrollAction.java
Outdated
Show resolved
Hide resolved
...s/reindex/src/main/java/org/elasticsearch/index/reindex/AbstractAsyncBulkByScrollAction.java
Outdated
Show resolved
Hide resolved
modules/reindex/src/test/java/org/elasticsearch/index/reindex/UpdateByQueryConflictTests.java
Outdated
Show resolved
Hide resolved
@elasticmachine run elasticsearch-ci/packaging-sample-unix
Some of the builds running on centos were timing out yesterday and it seems to be a known problem. |
@henningandersen could you take a look into this PR when you have the chance? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates here, looks good. I have a few more comments.
true, | ||
(bulkByScrollResponse, maxDocs, conflictingUpdates) -> { | ||
assertThat(bulkByScrollResponse.getUpdated(), is((long) maxDocs)); | ||
assertThat(bulkByScrollResponse.getVersionConflicts(), is((long) conflictingUpdates)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can move this assertion into the generic test case (executeConcurrentUpdatesOnSubsetOfDocs
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tackled in 9fe5874
// Modify a subset of the target documents concurrently | ||
final List<SearchHit> originalDocs = Arrays.asList(searchResponse.getHits().getHits()); | ||
final List<SearchHit> docsModifiedConcurrently = new ArrayList<>(); | ||
if (randomBoolean()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a case where we run out of docs on the query, i.e., we end up updating less than max_docs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tackled in 9fe5874
static class ScrollConsumableHitsResponse { | ||
private final ScrollableHitSource.AsyncResponse asyncResponse; | ||
private final List<? extends ScrollableHitSource.Hit> hits; | ||
private final AtomicInteger consumedOffset = new AtomicInteger(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can just be an int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That offset can be updated from different threads, as AbstractAsyncBulkByScrollAction#prepareBulkRequest
runs in the GENERIC
thread pool, after throttling has been applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@elasticmachine update branch |
…equests In update by query requests where max_docs < size and conflicts=proceed we weren't using the remaining documents from the scroll response in cases where there were conflicts and in the first bulk request the successful updates < max_docs. This commit address that problem and use the remaining documents from the scroll response instead of requesting a new page. Closes elastic#63671 Backport of elastic#71430
…equests (#71931) In update by query requests where max_docs < size and conflicts=proceed we weren't using the remaining documents from the scroll response in cases where there were conflicts and in the first bulk request the successful updates < max_docs. This commit address that problem and use the remaining documents from the scroll response instead of requesting a new page. Closes #63671 Backport of #71430
In update by query requests where max_docs < size and conflicts=proceed
we weren't using the remaining documents from the scroll response in
cases where there were conflicts and in the first bulk request the
successful updates < max_docs. This commit address that problem and
use the remaining documents from the scroll response instead of
requesting a new page.
Closes #63671