Fix/flaky `split` round robin limited fds #6043

cre4ture · 2024-03-03T13:16:20Z

Addresses #6031

refactoring: split the large function get_writer into two parts
robustify closing of fds by retrying until it works or there are no fds left to close. The previous implementation made only one attempt and then failed.
robustify test setup: add sleep 0.5 && to ensure that the limits are applied before the split application starts.
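
The retry behavior described above can be sketched roughly as follows. The names open_with_retry and open_writers are hypothetical, and the real get_writer code in split may differ. In particular, this sketch retries on any error, whereas a real implementation would likely retry only when the error indicates that the process has run out of fds.

```rust
use std::io;

/// Hypothetical helper: keep calling `open`; after each failure, close one
/// already-open writer and try again. Give up only when nothing is left
/// to close.
fn open_with_retry<T, W>(
    mut open: impl FnMut() -> io::Result<T>,
    open_writers: &mut Vec<W>,
) -> io::Result<T> {
    loop {
        match open() {
            Ok(handle) => return Ok(handle),
            Err(err) => {
                // Popping drops the writer; for a file-backed writer,
                // dropping it closes the underlying file descriptor.
                if open_writers.pop().is_none() {
                    // Nothing left to close, so report the last error.
                    return Err(err);
                }
            }
        }
    }
}
```

With a single attempt, the second failure would be fatal; with the loop, split keeps freeing fds for as long as it has any to free.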

update: I implemented the idea with pre_exec from @tertsdiepraam as a replacement for the sleep 0.5 test-setup fix.
The results of 15*2*54 (1620) runs can be seen here: https://github.com/cre4ture/coreutils/actions/runs/8155517787/job/22291219367?pr=10
There was not a single case where the original implementation of split failed.

I verified that each change on its own fixes the flakiness of the test by running all tests on 22 runners, ~50 times each:

Evaluation:
The reference run had 12 out of 22 cases where the split test failed.
Neither of the other two runs had a single case where this test failed.

Test runs:
reference: https://github.com/cre4ture/coreutils/actions/runs/8123936859/job/22204882466?pr=9
split robustification: https://github.com/cre4ture/coreutils/actions/runs/8124460686/job/22205966825?pr=11
test robustification: https://github.com/cre4ture/coreutils/actions/runs/8124437301/job/22205913310?pr=10

tests/by-util/test_split.rs

github-actions · 2024-03-03T13:47:23Z

GNU testsuite comparison:

Congrats! The gnu test tests/mv/backup-dir is no longer failing!
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

github-actions · 2024-03-03T22:22:14Z

GNU testsuite comparison:

Congrats! The gnu test tests/mv/backup-dir is no longer failing!
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

tertsdiepraam · 2024-03-04T10:19:55Z

If I understand this all correctly, I think the correct thing to do is to set the limits on the process not with prlimit but with setrlimit from pre_exec. Would you like to give that a shot?
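
The suggestion above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: it assumes 64-bit Linux (rlim_t as u64, RLIMIT_NOFILE equal to 7) and declares setrlimit by hand to avoid an external crate. command_with_fd_limit is a hypothetical helper name.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::Command;

// Hand-written FFI so the sketch needs no external crate.
// Assumption: 64-bit Linux, where rlim_t is u64 and RLIMIT_NOFILE is 7.
#[repr(C)]
struct RLimit {
    cur: u64, // soft limit
    max: u64, // hard limit
}

const RLIMIT_NOFILE: i32 = 7;

extern "C" {
    fn setrlimit(resource: i32, rlim: *const RLimit) -> i32;
}

/// Hypothetical helper: build a command whose child lowers its own
/// open-file limit after fork but before exec.
fn command_with_fd_limit(program: &str, limit: u64) -> Command {
    let mut cmd = Command::new(program);
    let apply_limit = move || {
        let lim = RLimit { cur: limit, max: limit };
        // Runs in the child process only; the parent's limits are untouched.
        let rc = unsafe { setrlimit(RLIMIT_NOFILE, &lim) };
        if rc == 0 {
            Ok(())
        } else {
            Err(io::Error::last_os_error())
        }
    };
    // pre_exec is unsafe because the closure runs between fork and exec,
    // where only a restricted set of operations is sound.
    unsafe {
        cmd.pre_exec(apply_limit);
    }
    cmd
}
```

Because the closure runs in the child before exec, the limit is already in force when the target program starts, so no sleep is needed to wait for an external prlimit call to take effect.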

cre4ture · 2024-03-05T15:16:49Z

update: I implemented the idea with pre_exec from @tertsdiepraam.
The results of 15*2*54 (1620) runs can be seen here: https://github.com/cre4ture/coreutils/actions/runs/8155517787/job/22291219367?pr=10
There was not a single case where the original implementation of split failed.

github-actions · 2024-03-05T18:09:55Z

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

tertsdiepraam

The test fix looks great! Now I just wonder about the split change. I can't think of a case where we'd have to close multiple fds to be able to open 1. Would this really make it more robust? Or has this to do with changing limits while running split?

BenWiederhake · 2024-03-06T09:36:55Z

> Or has this to do with changing limits while running split?

I think that's precisely the case. The limit shrank drastically while executing (from unlimited to 9), so split had to close many fds before being able to operate again. I don't know whether that's a scenario that we want to support, but given that cre4ture already wrote the code, this would be an easy win.

tertsdiepraam · 2024-03-06T10:33:11Z

Sure, but I think we then need a comment explaining why it works the way it does.

cre4ture · 2024-03-06T18:50:57Z

I think the split-side change is still useful, even though the test is also green without it.

Why?
I could imagine some (possibly exotic?) use cases for this.
E.g. split running in parallel with other processes (e.g. another split) doing similar things and sharing the same limits.
In this scenario, after closing one fd, the other process might "steal" the freed fd and open a file on its side.
Then it would be beneficial if split were able to close another fd before aborting.

I'm not sure whether the limits of a common parent process are actually "shared" as I described. Or does each child get exactly half of them, or again the same number of free fds as the parent?

But at least according to https://unix.stackexchange.com/a/34335 the limits can also be set per user or group.
With this, I assume multiple processes share the same limited resources. Then, with parallel execution, anything can happen. ;-)
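
On the inheritance question: per getrlimit(2), a child created via fork inherits its parent's resource limits, and they are preserved across exec, so each child starts with the same RLIMIT_NOFILE value as its parent rather than a share of it. That does not rule out contention on other, system-wide limits. A minimal sketch to observe the inheritance, assuming 64-bit Linux (rlim_t as u64, RLIMIT_NOFILE equal to 7) and an sh on the PATH that supports ulimit -n; the helper names are hypothetical and not part of this PR:

```rust
use std::io;
use std::process::Command;

// Hand-written FFI so the sketch needs no external crate.
// Assumption: 64-bit Linux, where rlim_t is u64 and RLIMIT_NOFILE is 7.
#[repr(C)]
struct RLimit {
    cur: u64, // soft limit
    max: u64, // hard limit
}

const RLIMIT_NOFILE: i32 = 7;

extern "C" {
    fn getrlimit(resource: i32, rlim: *mut RLimit) -> i32;
}

/// Soft open-file limit of the current process.
fn own_soft_fd_limit() -> io::Result<u64> {
    let mut lim = RLimit { cur: 0, max: 0 };
    let rc = unsafe { getrlimit(RLIMIT_NOFILE, &mut lim) };
    if rc == 0 {
        Ok(lim.cur)
    } else {
        Err(io::Error::last_os_error())
    }
}

/// Soft open-file limit as reported by a freshly spawned child shell.
fn child_soft_fd_limit() -> io::Result<String> {
    let out = Command::new("sh").arg("-c").arg("ulimit -n").output()?;
    Ok(String::from_utf8_lossy(&out.stdout).trim().to_string())
}
```

Comparing the two values on a given machine shows whether the child sees the parent's full limit.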

So, if we agree that this feature is useful, we can either add a comment about it or try to test it explicitly. I'm just not yet sure what a deterministic test for this could look like.

tertsdiepraam · 2024-03-06T21:55:42Z

Alright, I'm convinced :)

Could you add a comment about it?

cre4ture · 2024-03-07T22:51:47Z

> Alright, I'm convinced :)
>
> Could you add a comment about it?

done :-)

tests/common/util.rs

cre4ture changed the title ~~Fix/flaky split round robin limited fds~~ Fix/flaky split round robin limited fds Mar 3, 2024

sylvestre reviewed Mar 3, 2024

tests/by-util/test_split.rs (outdated, resolved)

cre4ture force-pushed the fix/flaky_split_round_robin_limited_fds branch 3 times, most recently from 66d406c to eed5f1c Compare March 5, 2024 15:10

cre4ture force-pushed the fix/flaky_split_round_robin_limited_fds branch 2 times, most recently from ee9691b to a6b5699 Compare March 5, 2024 17:39

extend error message for case when writer instanciation fails second time

294c9de

cre4ture force-pushed the fix/flaky_split_round_robin_limited_fds branch from a6b5699 to e69caed Compare March 5, 2024 19:43

tertsdiepraam reviewed Mar 5, 2024

split: close as much fds as needed for opening new one

dab02d0

cre4ture force-pushed the fix/flaky_split_round_robin_limited_fds branch from e69caed to 2889119 Compare March 7, 2024 22:51

cre4ture requested review from tertsdiepraam and sylvestre March 7, 2024 23:23

anastygnome reviewed Mar 8, 2024

tests/common/util.rs (outdated, resolved)

use std::command::pre_exec() to set limits on child before exec

db142f9

cre4ture force-pushed the fix/flaky_split_round_robin_limited_fds branch from 2889119 to db142f9 Compare March 8, 2024 19:30

cre4ture requested review from anastygnome and BenWiederhake March 8, 2024 21:21

anastygnome approved these changes Mar 9, 2024

sylvestre merged commit 991d718 into uutils:main Mar 9, 2024
62 checks passed

chenrui333 mentioned this pull request Mar 24, 2024

uutils-coreutils 0.0.25 Homebrew/homebrew-core#167053

Merged

moonfruit mentioned this pull request Mar 28, 2024

uutils-selected 0.0.25 moonfruit/homebrew-tap#92

Merged

BenWiederhake mentioned this pull request Mar 31, 2024

test_split::test_round_robin_limited_file_descriptors fails sporadically on x86_64-unknown-linux-musl #6031

Closed
