
Increase the minimum size of the management pool to 2 #90193

Merged — arteam merged 4 commits from increase-size-of-management-pool into elastic:main on Sep 21, 2022

Conversation

@arteam (Contributor) commented on Sep 21, 2022

Make sure that individual `wait_for_completion=true` requests don't block concurrent stats and cat requests.

See #89564
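For context, here is a minimal sketch of the sizing rule in question. It is illustrative only, not the literal diff: the method name and the clamp bounds are assumptions based on the PR title and the scaling-pool pattern. The management pool scales with the node's allocated processors, clamped to a small range, and this change raises the lower bound of that clamp from 1 to 2.

```java
// Illustrative sketch, not the actual Elasticsearch source: the management
// thread pool scales with the node's allocated processors, clamped to a
// small [min, max] range. Raising min from 1 to 2 guarantees that a single
// blocking wait_for_completion=true request cannot occupy the only thread
// and starve concurrent stats and cat requests.
static int managementPoolSize(int allocatedProcessors) {
    final int min = 2; // previously 1, which allowed one-thread pools
    final int max = 5; // assumed upper bound for illustration
    return Math.min(max, Math.max(min, allocatedProcessors));
}
```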

@arteam added the :Distributed/Distributed, auto-backport, v8.5.0, >bug, and >enhancement labels and removed the >bug label on Sep 21, 2022
@elasticsearchmachine (Collaborator) commented:
Hi @arteam, I've created a changelog YAML for you.

@arteam marked this pull request as ready for review on September 21, 2022 16:13
@elasticsearchmachine added the Team:Distributed label on Sep 21, 2022
@elasticsearchmachine (Collaborator) commented:
Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner (Contributor) left a comment:

I don't think this fixes the bug that #89564 exposes, unfortunately: if there were two concurrent calls to the list-tasks API, they could both deadlock within `TaskManager#waitForTaskCompletion` even after this change.

Instead, I think we should make `TaskManager#waitForTaskCompletion` async rather than blocking the management thread as it does today.
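To make the failure mode concrete, here is a small self-contained demonstration (not Elasticsearch code; the pool and latch are stand-ins) of why a blocking wait on a one-thread pool deadlocks: the work that would signal completion is queued behind the very thread that is waiting for it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SingleThreadDeadlockDemo {
    public static void main(String[] args) {
        // Stand-in for a management pool with only one thread.
        ExecutorService management = Executors.newFixedThreadPool(1);
        CountDownLatch taskFinished = new CountDownLatch(1);

        // A wait_for_completion=true handler: blocks its pool thread until
        // some other piece of work signals that the task has finished.
        management.submit(() -> {
            try {
                taskFinished.await(); // occupies the only management thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The work that would signal completion is queued behind the blocked
        // thread, so it never runs: the program hangs. With two threads (or
        // an async, listener-based wait) it completes immediately.
        management.submit(taskFinished::countDown);
    }
}
```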

@henningandersen (Contributor) left a comment:

LGTM as a short-term fix, provided that we follow up with a proper fix for the task-management wait and, in that fix, lower the minimum again to 1.

@DaveCTurner dismissed their stale review on September 21, 2022 16:34

discussion in another channel

@droberts195 (Contributor) commented:

> Instead, I think we should make `TaskManager#waitForTaskCompletion` async rather than blocking the management thread as it does today.

I agree that is the ideal fix. However, if it cannot be done in time for 8.5.0, then increasing the minimum number of threads to 2 at least keeps ESS in the same situation as in 8.4 and earlier. In 8.4 and earlier, ESS told every node it had at least 2 processors to avoid strange lockups, and those lockups (undiagnosed when they were originally reported) were almost certainly the same problem as #89564. In 8.5, ESS is going to tell nodes with one processor that they have one processor, so ESS users on small nodes will hit those lockups unless something is done in Elasticsearch. So by all means fix this properly for 8.5.0 if there's time, but something has to be done for 8.5.0 even if it's a quick workaround.

@arteam (Contributor, Author) commented on Sep 21, 2022

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

@arteam merged commit beea2b0 into elastic:main on Sep 21, 2022
@elasticsearchmachine (Collaborator) commented:
💔 Backport failed

The backport operation could not be completed due to the following error:

There are no branches to backport to. Aborting.

You can use sqren/backport to backport manually by running `backport --upstream elastic/elasticsearch --pr 90193`

@arteam deleted the increase-size-of-management-pool branch on September 21, 2022 18:35
@arteam (Contributor, Author) commented on Sep 21, 2022

The commit went to the main branch before the 8.5 branch was cut, so this change is included in 8.5 and doesn't need a backport.

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Sep 22, 2022
* main: (186 commits)
  [DOCS] Add 8.5 release notes and fix links (elastic#90201)
  Mute DownsampleActionSingleNodeTests.testCannotRollupWhileOtherRollupInProgress (elastic#90198)
  Bump to version 8.6.0
  Increase the minimum size of the management pool to 2 (elastic#90193)
  Speed `getIntLE` from `BytesReference` (elastic#90147)
  Restrict nodes for testClusterPrimariesActive1 (elastic#90191)
  Fix bug with `BigIntArray` serialization (elastic#90142)
  Synthetic _source: test _source filtering (elastic#90138)
  Modernize cardinality agg tests (elastic#90114)
  Mute failing test (elastic#90186)
  Move assertion in ES85BloomFilterPostingsFormat to fix test (elastic#90150)
  Restrict nodes for testClusterPrimariesActive2 (elastic#90184)
  Batch index delete cluster state updates (elastic#90033)
  Register stable plugins in ActionModule (elastic#90067)
  Mute failing test (elastic#90180)
  [HealthAPI] Disk: Use _ for diagnosis id (elastic#90179)
  [HealtAPI] Disk: use shorter help URLs (elastic#90178)
  Fixing disk health indicator unit tests (elastic#90175)
  Enable the health node and the disk health indicator elastic#84811 (elastic#90085)
  Add missing Disk Indicator health api IDs (elastic#90174)
  ...
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Sep 22, 2022
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Sep 22, 2022
droberts195 added a commit that referenced this pull request Sep 23, 2022
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 4, 2023
Instead of blocking either a transport or management thread here (the latter will lead to a deadlock if there's only a single management thread available), we should just queue up those sends and run them asynchronously via a listener.
This test currently works on `main` only because of elastic#90193; this change makes it work with a single management thread as well, so it works on 7.x again.

closes elastic#92629
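A minimal sketch of the listener-based shape this commit message describes (the class and method names below are hypothetical, not the actual `TaskManager` API): rather than a pool thread sleeping in a poll loop, callers register a callback that is fired by whichever thread unregisters the task, so no thread ever blocks.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.LongConsumer;

// Hypothetical sketch of an async wait-for-completion: no thread blocks.
class AsyncTaskRegistry {
    private final Map<Long, Queue<LongConsumer>> waiters = new ConcurrentHashMap<>();
    private final Map<Long, Boolean> running = new ConcurrentHashMap<>();

    void register(long taskId) {
        running.put(taskId, Boolean.TRUE);
    }

    // Non-blocking replacement for a polling waitForTaskCompletion: either
    // fire the callback immediately or queue it for the completing thread.
    void onCompletion(long taskId, LongConsumer listener) {
        if (running.containsKey(taskId) == false) {
            listener.accept(taskId); // task already finished
            return;
        }
        waiters.computeIfAbsent(taskId, id -> new ConcurrentLinkedQueue<>()).add(listener);
        if (running.containsKey(taskId) == false) {
            notifyWaiters(taskId); // task finished while we were registering
        }
    }

    void unregister(long taskId) {
        running.remove(taskId);
        notifyWaiters(taskId); // completing thread runs the queued callbacks
    }

    private void notifyWaiters(long taskId) {
        // remove() is atomic, so each queued listener fires exactly once
        // even if this races with a concurrent onCompletion re-check.
        Queue<LongConsumer> queue = waiters.remove(taskId);
        if (queue != null) {
            queue.forEach(l -> l.accept(taskId));
        }
    }
}
```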
original-brownbear added a commit that referenced this pull request Jan 4, 2023
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 4, 2023
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 4, 2023
elasticsearchmachine pushed a commit that referenced this pull request Jan 4, 2023
elasticsearchmachine pushed a commit that referenced this pull request Jan 4, 2023
arteam added a commit to arteam/elasticsearch that referenced this pull request Jan 19, 2023
Labels
auto-backport, :Distributed/Distributed, >enhancement, Team:Distributed, v8.5.0
5 participants