-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Fix pagination during rolling upgrade #85399
base: main
Are you sure you want to change the base?
SQL: Fix pagination during rolling upgrade #85399
Conversation
bc3c25d
to
1b10dc1
Compare
d355c5f
to
dd1e0f5
Compare
Pinging @elastic/es-ql (Team:QL) |
Hi @Luegg, I've created a changelog YAML for you. |
assertTextCursorCompatibleAcrossVersions(bwcVersion, oldNodesClient, newNodesClient); | ||
} | ||
|
||
@AwaitsFix(bugUrl = "tbd") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See last remark in the PR description. I'll create an according issue once this PR is approved and ready to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind about this comment. The @AwaitsFix
referred to a previous attempt to address the issue. With the current attempt, text format based cursors should also be usable across versions if all nodes support the cursor redirect from this PR.
a3333d1
to
25c3ddb
Compare
Hi @Luegg, I've created a changelog YAML for you. |
assertEquals(scrollJson.get("succeeded"), true); | ||
} | ||
|
||
public void testCursorForIndexAllocatedOnNewNodeWorkOnOldNode() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this test and testTextCursorFromNewNodeWorksOnOldNode
can only be tested once we have a release including the cursor redirect. Any ideas on how to run the test anyway would be highly appreciated :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since 8.2 has now been branched out, I can backport this PR to 8.2 when it's ready to merge, see how the tests are doing and then merge to master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it seems pretty good.
Just leaving a couple of comments
private static DiscoveryNode findNodeWithVersionOtherThanLocalNode(ClusterService clusterService, Version expectedVersion) { | ||
DiscoveryNode localNode = clusterService.state().nodes().getLocalNode(); | ||
if (expectedVersion.equals(localNode.getVersion())) { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the logic here: if current node has the right version, it should be a good candidate itself... and a QlVersionMismatchException shouldn't have been thrown in the first place. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. This condition should not happen and I don't think you can end up in this situation. But if it is the case, you would want to avoid the retry because you could end up in an endless loop otherwise. Throwing an exception here is probably even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it makes sense.
The exception makes the intention much more clear 👍
Consumer<Exception> onFailure, | ||
Logger log | ||
) { | ||
if (log.isDebugEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(very minor note) IMHO this log does not belong here. This method is responsible of retrying the request, not of checking the version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a good place could also be the catch
block in the calling methods but that would duplicate the logging code. I've added some more context to the log message that also emphasizes that it looks for a matching node.
12b915a
to
509a31f
Compare
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Resolves #83726 by trying to redirect requests with a cursor with the wrong version to a node with the matching version.
The most frequent scenario that this PR addresses is the following situation during a rolling upgrade:
TransportActionUtils.executeRequestWithRetryAttempt
. The returned cursor will only work on the nodes with the old version.Especially in larger clusters, one might also encounter an additional error mode that is addressed by this PR:
The fix should also work with requests with
format=txt
but only if all nodes support the redirect. This case is more complicated because the cursor wraps another cursor together with the formatter and inner and outer cursor might not have the same version.