Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Make tests less flaky #168

Merged
merged 81 commits into from
Nov 15, 2023
Merged

MAINT: Make tests less flaky #168

merged 81 commits into from
Nov 15, 2023

Conversation

larsoner
Copy link
Member

@larsoner larsoner commented Oct 18, 2023

Closes #166

  • Tests pass on my M1 laptop.
  • Fix logic for mne._fiff import (must check 1.6 before checking 1.5)
  • Treat warnings as errors as in MNE-Python
  • Add pre-commit
  • Enable pre-commit.ci
  • Enable debug LSL logging (library and MNE-lsl levels)
  • Change to test-specific Player use in fixtures
  • Try on my Linux machine to see if I can replicate the failures seen in Fix flaky tests #166
  • Add debug lines before all or most lib calls
  • Fix intermittent failures
  • Fix segfaults

@codecov
Copy link

codecov bot commented Oct 18, 2023

Codecov Report

Merging #168 (f5595cf) into main (278ae34) will increase coverage by 1.04%.
The diff coverage is 93.33%.

❗ Current head f5595cf differs from pull request most recent head cf18526. Consider uploading reports for the commit cf18526 to get more accurate results

@@            Coverage Diff             @@
##             main     #168      +/-   ##
==========================================
+ Coverage   72.27%   73.32%   +1.04%     
==========================================
  Files          29       29              
  Lines        3012     3033      +21     
  Branches      671      673       +2     
==========================================
+ Hits         2177     2224      +47     
+ Misses        736      714      -22     
+ Partials       99       95       -4     
Flag Coverage Δ
unittests 72.99% <90.00%> (+0.78%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
mne_lsl/lsl/load_liblsl.py 63.98% <ø> (+4.21%) ⬆️
mne_lsl/lsl/stream_outlet.py 95.87% <100.00%> (-2.02%) ⬇️
mne_lsl/player/player_lsl.py 100.00% <100.00%> (ø)
mne_lsl/player/_base.py 85.45% <85.71%> (+3.04%) ⬆️
mne_lsl/stream/_base.py 80.98% <91.66%> (+1.44%) ⬆️
mne_lsl/stream/stream_lsl.py 86.11% <93.93%> (+1.78%) ⬆️

... and 4 files with indirect coverage changes

@larsoner
Copy link
Member Author

Okay now the tests fail with at least more verbose information at the MNE-lsl and liblsl levels now, e.g., this one where there is a segfault:

mne_lsl/lsl/tests/test_stream_inlet.py::test_processing_flags[float32-flags0] PASSED
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:35       2| test: Started unicast udp server at port 16572 (addr 0x7f7a4de00018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at 127.0.0.1 port 16571 (addr 0x7f7a4ddef018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at 255.255.255.255 port 16571 (addr 0x7f7a4ddcd018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at 224.0.0.1 port 16571 (addr 0x7f7a4ddab018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at 224.0.0.183 port 16571 (addr 0x7f7a4bb44018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at 239.255.172.215 port 16571 (addr 0x7f7a4ddbc018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80] stream_outlet_impl.cpp:77       2| test: Trying to listen at address ''
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:35       2| test: Started unicast udp server at port 16572 (addr 0x7f7a4ddde018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:82    WARN| Could not bind multicast responder for ff02:113d:6fdd:2c17:a643:ffe2:1bd1:3cd2 to interface ::1 (Can't assign requested address)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at ff02:113d:6fdd:2c17:a643:ffe2:1bd1:3cd2 port 16571 (addr 0x7f7a4bb22018)
2023-10-20 15:02:55.632 (  62.938s) [            3D80]         udp_server.cpp:82    WARN| Could not bind multicast responder for ff05:113d:6fdd:2c17:a643:ffe2:1bd1:3cd2 to interface ::1 (Can't assign requested address)
2023-10-20 15:02:55.632 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at ff05:113d:6fdd:2c17:a643:ffe2:1bd1:3cd2 port 16571 (addr 0x7f7a4bb55018)
2023-10-20 15:02:55.632 (  62.938s) [            3D80]         tcp_server.cpp:160      1| Created IPv4 TCP acceptor for test @ port 16577
2023-10-20 15:02:55.632 (  62.938s) [            3D80]         tcp_server.cpp:171      1| Created IPv6 TCP acceptor for test @ port 16577
2023-10-20 15:02:57.830 (  65.136s) [IO_test         ]         tcp_server.cpp:270      1| Destructing session 0x7f7a4e753468
Fatal Python error: Segmentation fault
mne_lsl/lsl/tests/test_stream_inlet.py::test_processing_flags[float32-flags1] 

Thread 0x0000000106726600 (most recent call first):
  File 2023-10-20 15:02:58.473 (  65.778s) [R_test          ]      data_receiver.cpp:344    ERR| Stream transmission broke off (Input stream error.); re-connecting...
"/Users/runner/work/mne-lsl/mne-lsl/mne_lsl/lsl/stream_outlet.py", line 83 in __del__
  File "/Users/runner/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/Users/runner/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/pluggy/_callers.py", line 77 in _multicall

@mscheltienne do you want to look now that there is more information available? There is some logging stuff in there that is suspicious to my untrained eye like the "Stream transmission broke off". Locally running with stuff like pytest mne_lsl -k test_stream_add_reference_channels -sx you can see the output in real time during the run which is nice.

For now I added -s to the pytest CI run to maybe help diagnose the segfaults. I think to diagnose these we should add more and more logger.debug calls just before LSL lib.whatever calls, e.g. maybe something like lib.lsl_destroy_inlet or lib.lsl_destroy_outlet is trying to free some memory as it's being accessed by some other thread. Knowing which lib calls occur just before the segfault might tell us this.

FYI each test now uses its own Player. This seemed like a cleaner approach to help isolate tests from one another.

@larsoner
Copy link
Member Author

Can you confirm that you're losing samples in between good samples (i.e., not at the beginning or end)?

This part we can confirm because we hack in the increasing sample numbers as an EEG channel, and find that for example 70 samples in we get a 17-sample jump when we should get only 1-sample jumps.

And that this is not accompanied by logs of the stream breaking off and reconnecting?

This could be the culprit. Looking at https://github.com/mne-tools/mne-lsl/actions/runs/6865120608/job/18668396515?pr=168#step:9:1217 for example we see:

2023-11-14 14:45:33.212 ( 252.746s) [R_P_test_strea  ]      data_receiver.cpp:344    ERR| Stream transmission broke off (Input stream error.); re-connecting...

Any ideas on what could cause this part?

@cboulay
Copy link

cboulay commented Nov 14, 2023

Let's see if we can summon @tstenner in here who I know has worked on the "Stream transmission broke off" problem in the past. I've encountered it too but I never looked into the origin.

@cboulay
Copy link

cboulay commented Nov 14, 2023

Just pulling what I can from memory...

If you see this error repeated over-and-over then this is probably a protocol mismatch between the outlet and the inlet. There were a couple funky versions of liblsl that caused this. But you're only seeing this message once in a while so I don't think that's it.

Here is where that particular error is being generated. Unfortunately we have 2 functions that generate the same error string.

Again, this seems like a protocol problem. I would only expect that in the middle of normal transmission if there was an actual network problem, like the interface went down for a moment and there was an incomplete transmission.

@mscheltienne
Copy link
Member

We are using and limiting liblsl to recent version, IIRC, above 1.15 or 1.16, so I don't think we have a protocol mismatch.
In CIs, this Stream transmission broke off message is issued when the outlet is deleted (destroyed) before the inlet, which can happen at teardown sometimes (I thought I eliminated this problem but maybe not).

def test(...):
    outlet = StreamOutlet(...)
    inlet = StreamInlet(...)
    del outlet
    del inlet

Anyway, closing this for today 🙈 Back at it tomorrow ;)

@cboulay
Copy link

cboulay commented Nov 14, 2023

@larsoner , after 4487567 the run only had a couple segfaults.

For a while, we had the same problem: segfaults in macos only, and only on GitHub CI -- we couldn't reproduce locally.
https://github.com/sccn/liblsl/actions

This kind of went away on its own with some other changes. We still have segfaults, but it doesn't seem to be platform specific. They mostly happen during liblsl\testing\ext\bench_pushpull.cpp. This one is a little weird. In one of its parameterizations, it creates a single outlet and quickly creates up to 10 inlets for that outlet. Then it pushes some samples to the outlet and pulls them in each inlet.
https://github.com/sccn/liblsl/blob/master/testing/ext/bench_pushpull.cpp#L29-L53

I hope this was helpful somehow.

@larsoner larsoner marked this pull request as ready for review November 14, 2023 18:59
Copy link
Member Author

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a while, we had the same problem: segfaults in macos only, and only on GitHub CI -- we couldn't reproduce locally.

That seems to be the case for us, too! Good to know we're not alone. I managed to stabilize things at least somewhat. Got one green run, then re-ran CIs and at least don't seem to get segfaults anymore. I think the big gain probably came from using a block size of 64 vs 16... CIs tend to be slower at everything so I was thinking that would give everything a bit more time to process by reducing call overhead. But who knows...

@mscheltienne I suggest we merge this as-is and investigate the bad_gh_macos separately later. In the meantime this improves stuff a lot.

mne_lsl/conftest.py Show resolved Hide resolved
@@ -3,12 +3,14 @@
import time
from ctypes import byref, c_char_p, c_double, c_int, c_size_t, c_void_p
from functools import reduce
from threading import Lock
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and I added threading.Lock to help prevent segfaults.

Comment on lines +131 to +140
@property
def _obj(self):
if self.__obj is None:
raise RuntimeError("The StreamInlet has been destroyed.")
return self.__obj

@_obj.setter
def _obj(self, obj):
self.__obj = obj

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus some protection against errantly using self._obj after it has been destroyed

mne_lsl/player/player_lsl.py Show resolved Hide resolved
Comment on lines 92 to 93
# On macOS we get differences like -0.030293 vs -0.03031, -0.030286 vs -0.030313
atol = 0.0001 if platform.system() == "Darwin" else 0.0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this but it seems necessary 🤷

mne_lsl/stream/tests/test_stream_lsl.py Outdated Show resolved Hide resolved
Comment on lines +31 to +34
bad_gh_macos = pytest.mark.skipif(
platform.system() == "Darwin" and os.getenv("GITHUB_ACTIONS", "") == "true",
reason="Unreliable on macOS CI",
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and I had to add this skip for the drop_channels test. Looking at the last few failures, other than the segfaults most were on this test and only on macOS.

@mscheltienne
Copy link
Member

Thanks a lot @larsoner; first green run in a long long time 🙈
I'm not sure how the threading.Lock is helping, could you detail your thought?

The same close_io magic for sphinx-gallery examples would be useful to prevent segfaults in the doc build.

One more CI rerun; still one failure on windows. The surprising part is that this match_stream_and_raw_data failures always occur in a function which adds or remove (pick) channels. They do not occur e.g. in test_stream. To be fair, that's also where 80-90% of the match_stream_and_raw_data are concentrated.. so maybe that's why it only fails those tests and I might have missed a couple of failures in test_stream or other non-pick related tests.

@mscheltienne
Copy link
Member

Happy to merge as is and to keep digging for the remaining points separately, that's already a huge improvements.
Do you think we should keep the LSL log level in test default to 2 or maybe do we bring it back to something less verbose and use the variable MNE_LSL_LOG_LEVEL when needed?

@larsoner
Copy link
Member Author

Do you think we should keep the LSL log level in test default to 2 or maybe do we bring it back to something less verbose and use the variable MNE_LSL_LOG_LEVEL when needed?

So far I'm happy with the CI output at least. We should probably make the LSL level configurable using an env var. Then we can have a default config for devs (maybe less verbose?) and set the verbosity in CIs however we want. But I'll merge this as-is and feel free to tweak those in a follow-up PR!

@larsoner larsoner merged commit 9c18ac0 into mne-tools:main Nov 15, 2023
19 checks passed
@larsoner larsoner deleted the flaky branch November 15, 2023 16:17
@tstenner
Copy link

Let's see if we can summon @tstenner in here who I know has worked on the "Stream transmission broke off" problem in the past. I've encountered it too but I never looked into the origin.

The three big causes are handshake failures (mainly weird decimal point issues for some europeans) before the first sample, the outlet getting closed before the input and the outlet disappearing before the inlet. The MacOS problems happened (as Chadwick pointed out) mostly in a synthetic benchmark, but only on the CI systems and never on one of my local machines. If anyone manages to get a stack trace or dump, I could investigate this further.

@mscheltienne
Copy link
Member

Thanks @cboulay @tstenner We will keep you updated if we get a stack trace or something that helps figuring out the remaining issues :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix flaky tests
4 participants