Increase the minimum size of the management pool to 2 #90193
Conversation
Make sure that individual `wait_for_completion=true` requests don't block concurrent stats and cat requests.
Hi @arteam, I've created a changelog YAML for you.
Pinging @elastic/es-distributed (Team:Distributed)
I don't think this fixes the bug that #89564 exposes, unfortunately: if there were two concurrent calls to the list-tasks API then they could both deadlock within `TaskManager#waitForTaskCompletion` even after this change. Instead, I think we should make `TaskManager#waitForTaskCompletion` async rather than blocking the management thread as it does today.
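The deadlock described above can be reproduced in miniature with a plain `java.util.concurrent` fixed-size pool (a hypothetical demo, not Elasticsearch code): a task that blocks waiting on another task submitted to the same pool can only finish if a second thread is free to run the inner task.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolDeadlockDemo {

    // Returns true if a task that waits for another task on the SAME pool
    // completes; with a single thread the inner task never gets scheduled.
    static boolean nestedWaitCompletes(int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Future<Boolean> outer = pool.submit(() -> {
                // Simulates a blocking wait_for_completion handler: it occupies
                // a pool thread while waiting for work queued on the same pool.
                Future<String> inner = pool.submit(() -> "done");
                try {
                    return "done".equals(inner.get(500, TimeUnit.MILLISECONDS));
                } catch (TimeoutException e) {
                    return false; // inner task starved: this is the deadlock
                }
            });
            return outer.get(5, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool size 1: " + nestedWaitCompletes(1)); // false
        System.out.println("pool size 2: " + nestedWaitCompletes(2)); // true
    }
}
```

With two such blocking waiters and a pool of two threads the same starvation recurs, which is why bumping the minimum to 2 only narrows the window rather than removing it.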
LGTM as a short term fix, provided that we follow up with a proper fix for the task management wait and in that fix lower the minimum again to 1.
I agree that is the ideal fix. However, if that cannot be done in time for 8.5.0, then increasing the minimum number of threads to 2 at least keeps ESS in the same situation as it was in 8.4 and earlier. In 8.4 and earlier, ESS told every node it had at least 2 processors to avoid strange lockups, and those strange lockups (undiagnosed at the time they were originally mentioned) were almost certainly the same problem as #89564. In 8.5, ESS is going to tell nodes with one processor that they have one processor, so the strange lockups will be seen by ESS users on small nodes unless something is done in Elasticsearch. So by all means fix this properly for 8.5.0 if there's time, but something has to be done for 8.5.0 even if that's a quick workaround.
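The thread does not show the actual sizing formula, but the shape of the change can be sketched as a clamp on the processor-derived pool size (the method name and the upper bound of 5 here are illustrative assumptions, not the real `ThreadPool` code):

```java
public class ManagementPoolSize {

    // Hypothetical sketch: derive the management pool size from the number of
    // allocated processors, but never go below 2 (so one blocking
    // wait_for_completion request cannot monopolize the whole pool) and cap it
    // at an assumed maximum of 5.
    static int managementPoolSize(int allocatedProcessors) {
        return Math.max(2, Math.min(5, allocatedProcessors));
    }

    public static void main(String[] args) {
        System.out.println(managementPoolSize(1));  // 2: the new floor
        System.out.println(managementPoolSize(16)); // 5: the assumed cap
    }
}
```

The effect on a one-processor ESS node is that the pool behaves as it did when ESS reported a minimum of 2 processors.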
@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample
💔 Backport failed

The backport operation could not be completed due to the following error:

You can use sqren/backport to manually backport by running
The commit went to the master branch before the 8.5 branch was cut, so this change is included in 8.5 and doesn't need a backport.
* main: (186 commits)
  - [DOCS] Add 8.5 release notes and fix links (elastic#90201)
  - Mute DownsampleActionSingleNodeTests.testCannotRollupWhileOtherRollupInProgress (elastic#90198)
  - Bump to version 8.6.0
  - Increase the minimum size of the management pool to 2 (elastic#90193)
  - Speed `getIntLE` from `BytesReference` (elastic#90147)
  - Restrict nodes for testClusterPrimariesActive1 (elastic#90191)
  - Fix bug with `BigIntArray` serialization (elastic#90142)
  - Synthetic _source: test _source filtering (elastic#90138)
  - Modernize cardinality agg tests (elastic#90114)
  - Mute failing test (elastic#90186)
  - Move assertion in ES85BloomFilterPostingsFormat to fix test (elastic#90150)
  - Restrict nodes for testClusterPrimariesActive2 (elastic#90184)
  - Batch index delete cluster state updates (elastic#90033)
  - Register stable plugins in ActionModule (elastic#90067)
  - Mute failing test (elastic#90180)
  - [HealthAPI] Disk: Use _ for diagnosis id (elastic#90179)
  - [HealtAPI] Disk: use shorter help URLs (elastic#90178)
  - Fixing disk health indicator unit tests (elastic#90175)
  - Enable the health node and the disk health indicator elastic#84811 (elastic#90085)
  - Add missing Disk Indicator health api IDs (elastic#90174)
  - ...
Following #90193 this should no longer fail.
Instead of blocking either a transport or management thread here (the latter will lead to a deadlock if there's only a single management thread available), we should just queue up those sends and run them async via a listener. This test currently only works on `main` because of #90193; this change makes it work with a single management thread as well, so it works on 7.x again. Closes #92629
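The listener-based approach suggested above can be sketched with a minimal completion-notification class (the name `TrackedTask` and its methods are illustrative, not the real `TaskManager` API): the waiting request registers a callback instead of parking a pool thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an async wait-for-completion: callers register a
// listener that fires when the task finishes, so no thread blocks waiting.
public class TrackedTask {
    private final List<Runnable> completionListeners = new ArrayList<>();
    private boolean completed;

    // Called by the wait_for_completion handler; never blocks the caller.
    public synchronized void addCompletionListener(Runnable listener) {
        if (completed) {
            listener.run(); // task already finished: notify immediately
        } else {
            completionListeners.add(listener);
        }
    }

    // Called once by whichever thread completes the task.
    public synchronized void markCompleted() {
        completed = true;
        completionListeners.forEach(Runnable::run);
        completionListeners.clear();
    }
}
```

With this shape the management pool size no longer matters for correctness: a single thread can register any number of waiters and move on, which is why the follow-up could lower the minimum again.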
Make sure that individual `wait_for_completion=true` requests don't block concurrent stats and cat requests. See #89564.