
CI: Fix retry action #306

Merged: 12 commits into mne-tools:main on Aug 6, 2024

Conversation

larsoner (Member) commented Aug 5, 2024

Closes #301
First part of the suggestion from #301 (comment). Let's see if the default is to have errexit on.

larsoner (Member, Author) commented Aug 5, 2024

Confirmed errexit is on, at least. Now let's see if we can get it to fail three times...

larsoner (Member, Author) commented Aug 5, 2024

Okay, it exited with code 0 even though I passed a command that should fail, I think because of a bug in the attempt-count logic.
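
(For context: this failure mode, a failing command but an overall exit code of 0, is typical of a retry loop that runs out of attempts without re-raising the last exit code. The actual action is a bash composite step; the sketch below is only a hypothetical Python rendering of the corrected control flow, not the repository's code.)

```python
# Hypothetical sketch of the intended retry control flow: re-run the command
# up to `max_attempts` times and, crucially, exit with the *last* return code
# instead of falling through to an implicit exit 0.
import subprocess
import sys

def run_with_retries(cmd: list[str], max_attempts: int = 3) -> int:
    code = 0
    for attempt in range(1, max_attempts + 1):
        code = subprocess.run(cmd).returncode
        if code == 0:
            return 0
        print(f"Command exited with code {code} after {attempt} attempts.")
    return code  # propagate the final failure instead of reporting success

if __name__ == "__main__":
    sys.exit(run_with_retries(sys.argv[1:]))
```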

larsoner (Member, Author) commented Aug 5, 2024

larsoner changed the title from "WIP: Fix retry action" to "CI: Fix retry action" on Aug 5, 2024
larsoner (Member, Author) commented Aug 5, 2024

Appears to have worked on the Pytest step, too.

Command exited with code 1 after 1 attempts.

I guess maybe you also want exit code 1 in retry_error_codes, not just 134,139?

larsoner (Member, Author) commented Aug 5, 2024

mscheltienne (Member) commented:

set -eo pipefail, set +e, and set -e... I completely missed that part, thanks!
Actually, I don't want the CIs to retry on exit code 1, only on segmentation faults and Python fatal errors, which still occur at least once every 20-30 runs, probably during teardown/cleanup. If the tests fail, that should be fixable in the tests, and there is no need to wait for 3 retries to get that information ;)
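
(A minimal sketch of that policy, assuming retry_error_codes holds POSIX-style exit codes: 139 is 128 + SIGSEGV, i.e. a segmentation fault, and 134 is 128 + SIGABRT, which a Python fatal error produces, while an ordinary test failure exits with 1 and should fail fast. The function name is hypothetical, not the action's code.)

```python
# Hypothetical gate for the retry decision: only crash-like exit codes are
# retried; a regular pytest failure (exit 1) is reported immediately.
RETRY_CODES = {134, 139}  # 128 + SIGABRT (fatal error), 128 + SIGSEGV (segfault)

def should_retry(exit_code: int, retry_codes: set[int] = RETRY_CODES) -> bool:
    """Retry only on crash-like exit codes; ordinary failures fail fast."""
    return exit_code in retry_codes

assert should_retry(139)     # segmentation fault -> retry
assert not should_retry(1)   # regular test failure -> report immediately
```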

mscheltienne enabled auto-merge (squash) on August 5, 2024, 19:50
larsoner (Member, Author) commented Aug 5, 2024

Okay, look at 18c2a49 then: it had one or two jobs fail with exit 1.

larsoner (Member, Author) commented Aug 5, 2024

larsoner (Member, Author) commented Aug 5, 2024

Maybe those tests could use https://pypi.org/project/pytest-retry/ since they're flaky but don't actually segfault?
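
(A sketch of what that could look like, assuming pytest-retry's flaky marker as described on its PyPI page; the exact keyword names should be verified against its documentation. pytest-rerunfailures offers a similar @pytest.mark.flaky(reruns=...) marker.)

```python
# Sketch: per-test retries for a known-flaky (but non-crashing) test.
# Assumes pytest-retry is installed (pip install pytest-retry); the marker
# keywords below follow its README and should be double-checked there.
import pytest

@pytest.mark.flaky(retries=2, delay=1)
def test_sometimes_flaky(tmp_path):
    # Placeholder body; the real flaky tests live in the mne-lsl suite.
    assert tmp_path.exists()
```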

mscheltienne merged commit 81bcef3 into mne-tools:main on Aug 6, 2024
18 of 19 checks passed
mscheltienne (Member) commented Aug 6, 2024

> Like this run specifically https://github.com/mne-tools/mne-lsl/actions/runs/10253602827/job/28366606652

Yes, in this one, an LSL stream from other tests was not properly terminated. Probably one of the rarest sources of flakiness.

Anyway, the idea was to have this small action to recover from hard crashes, and eventually pytest-retry, flaky, or pytest-rerunfailures for flaky tests, but that might not even be needed. For instance, it would not have salvaged the test run with this additional LSL stream that was not properly terminated.

mscheltienne (Member) commented:

But this run https://github.com/mne-tools/mne-lsl/actions/runs/10253602827/job/28366606652#step:8:14480 now suggests that when a hard crash occurs, LSL streams are not properly terminated and linger in the background, making the test suite fail on the next iteration when it looks for a fixed number of streams with resolve_streams(). I'll mark those tests/parts as xfail and add a random UUID string at the end of the stream names to guarantee uniqueness.
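
(A hypothetical illustration of both mitigations; the helper and test names are made up, and only the UUID-suffix idea and the xfail marking come from the comment above.)

```python
# 1) Give every test stream a unique name so streams lingering after a hard
#    crash cannot collide with the next run.
# 2) xfail the parts that count streams via resolve_streams().
import uuid

import pytest

def unique_stream_name(base: str) -> str:
    """Append a short random suffix, e.g. 'test-stream-3f9c1a2b'."""
    return f"{base}-{uuid.uuid4().hex[:8]}"

@pytest.mark.xfail(reason="lingering LSL streams after a hard crash", strict=False)
def test_stream_count():
    # Would call mne_lsl.lsl.resolve_streams() and compare against the
    # expected number of streams; xfail because a prior hard crash can leave
    # extra streams around.
    ...
```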

Development

Successfully merging this pull request may close: "GitHub action to rerun a step upon failure with specific exit-codes is doing nothing" (#301)