MAINT: Make tests less flaky #168

larsoner · 2023-10-18T19:05:41Z

Closes #166

codecov · 2023-10-18T19:12:26Z

Codecov Report

Merging #168 (f5595cf) into main (278ae34) will increase coverage by 1.04%.
The diff coverage is 93.33%.

❗ Current head f5595cf differs from pull request most recent head cf18526. Consider uploading reports for the commit cf18526 to get more accurate results

@@            Coverage Diff             @@
##             main     #168      +/-   ##
==========================================
+ Coverage   72.27%   73.32%   +1.04%     
==========================================
  Files          29       29              
  Lines        3012     3033      +21     
  Branches      671      673       +2     
==========================================
+ Hits         2177     2224      +47     
+ Misses        736      714      -22     
+ Partials       99       95       -4

Flag	Coverage Δ
unittests	`72.99% <90.00%> (+0.78%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
mne_lsl/lsl/load_liblsl.py	`63.98% <ø> (+4.21%)`	⬆️
mne_lsl/lsl/stream_outlet.py	`95.87% <100.00%> (-2.02%)`	⬇️
mne_lsl/player/player_lsl.py	`100.00% <100.00%> (ø)`
mne_lsl/player/_base.py	`85.45% <85.71%> (+3.04%)`	⬆️
mne_lsl/stream/_base.py	`80.98% <91.66%> (+1.44%)`	⬆️
mne_lsl/stream/stream_lsl.py	`86.11% <93.93%> (+1.78%)`	⬆️

... and 4 files with indirect coverage changes

larsoner · 2023-10-20T16:30:32Z

Okay now the tests fail with at least more verbose information at the MNE-lsl and liblsl levels now, e.g., this one where there is a segfault:

mne_lsl/lsl/tests/test_stream_inlet.py::test_processing_flags[float32-flags0] PASSED
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:35       2| test: Started unicast udp server at port 16572 (addr 0x7f7a4de00018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at 127.0.0.1 port 16571 (addr 0x7f7a4ddef018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at 255.255.255.255 port 16571 (addr 0x7f7a4ddcd018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at 224.0.0.1 port 16571 (addr 0x7f7a4ddab018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at 224.0.0.183 port 16571 (addr 0x7f7a4bb44018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at 239.255.172.215 port 16571 (addr 0x7f7a4ddbc018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80] stream_outlet_impl.cpp:77       2| test: Trying to listen at address ''
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:35       2| test: Started unicast udp server at port 16572 (addr 0x7f7a4ddde018)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:82    WARN| Could not bind multicast responder for ff02:113d:6fdd:2c17:a643:ffe2:1bd1:3cd2 to interface ::1 (Can't assign requested address)
2023-10-20 15:02:55.631 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at ff02:113d:6fdd:2c17:a643:ffe2:1bd1:3cd2 port 16571 (addr 0x7f7a4bb22018)
2023-10-20 15:02:55.632 (  62.938s) [            3D80]         udp_server.cpp:82    WARN| Could not bind multicast responder for ff05:113d:6fdd:2c17:a643:ffe2:1bd1:3cd2 to interface ::1 (Can't assign requested address)
2023-10-20 15:02:55.632 (  62.938s) [            3D80]         udp_server.cpp:89       2| test: Started multicast udp server at ff05:113d:6fdd:2c17:a643:ffe2:1bd1:3cd2 port 16571 (addr 0x7f7a4bb55018)
2023-10-20 15:02:55.632 (  62.938s) [            3D80]         tcp_server.cpp:160      1| Created IPv4 TCP acceptor for test @ port 16577
2023-10-20 15:02:55.632 (  62.938s) [            3D80]         tcp_server.cpp:171      1| Created IPv6 TCP acceptor for test @ port 16577
2023-10-20 15:02:57.830 (  65.136s) [IO_test         ]         tcp_server.cpp:270      1| Destructing session 0x7f7a4e753468
Fatal Python error: Segmentation fault
mne_lsl/lsl/tests/test_stream_inlet.py::test_processing_flags[float32-flags1] 

Thread 0x0000000106726600 (most recent call first):
  File 2023-10-20 15:02:58.473 (  65.778s) [R_test          ]      data_receiver.cpp:344    ERR| Stream transmission broke off (Input stream error.); re-connecting...
"/Users/runner/work/mne-lsl/mne-lsl/mne_lsl/lsl/stream_outlet.py", line 83 in __del__
  File "/Users/runner/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/Users/runner/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/pluggy/_callers.py", line 77 in _multicall

@mscheltienne do you want to look now that there is more information available? There is some logging stuff in there that is suspicious to my untrained eye like the "Stream transmission broke off". Locally running with stuff like pytest mne_lsl -k test_stream_add_reference_channels -sx you can see the output in real time during the run which is nice.

For now I added -s to the pytest CI run to maybe help diagnose the segfaults. I think to diagnose these we should add more and more logger.debug calls just before LSL lib.whatever calls, e.g. maybe something like lib.lsl_destroy_inlet or lib.lsl_destroy_outlet is trying to free some memory as it's being accessed by some other thread. Knowing which lib calls occur just before the segfault might tell us this.

FYI each test now uses its own Player. This seemed like a cleaner approach to help isolate tests from one another.

larsoner · 2023-11-14T16:43:49Z

Can you confirm that you're losing samples in between good samples (i.e., not at the beginning or end)?

This part we can confirm because we hack in the increasing sample numbers as an EEG channel, and find that for example 70 samples in we get a 17-sample jump when we should get only 1-sample jumps.

And that this is not accompanied by logs of the stream breaking off and reconnecting?

This could be the culprit. Looking at https://github.com/mne-tools/mne-lsl/actions/runs/6865120608/job/18668396515?pr=168#step:9:1217 for example we see:

2023-11-14 14:45:33.212 ( 252.746s) [R_P_test_strea  ]      data_receiver.cpp:344    ERR| Stream transmission broke off (Input stream error.); re-connecting...

Any ideas on what could cause this part?

cboulay · 2023-11-14T16:49:19Z

Let's see if we can summon @tstenner in here who I know has worked on the "Stream transmission broke off" problem in the past. I've encountered it too but I never looked into the origin.

cboulay · 2023-11-14T16:58:34Z

Just pulling what I can from memory...

If you see this error repeated over-and-over then this is probably a protocol mismatch between the outlet and the inlet. There were a couple funky versions of liblsl that caused this. But you're only seeing this message once in a while so I don't think that's it.

Here is where that particular error is being generated. Unfortunately we have 2 functions that generate the same error string.

Again, this seems like a protocol problem. I would only expect that in the middle of normal transmission if there was an actual network problem, like the interface went down for a moment and there was an incomplete transmission.

mscheltienne · 2023-11-14T17:02:28Z

We are using and limiting liblsl to recent version, IIRC, above 1.15 or 1.16, so I don't think we have a protocol mismatch.
In CIs, this Stream transmission broke off message is issued when the outlet is deleted (destroyed) before the inlet, which can happen at teardown sometimes (I thought I eliminated this problem but maybe not).

def test(...):
    outlet = StreamOutlet(...)
    inlet = StreamInlet(...)
    del outlet
    del inlet

Anyway, closing this for today 🙈 Back at it tomorrow ;)

cboulay · 2023-11-14T18:28:48Z

@larsoner , after 4487567 the run only had a couple segfaults.

For a while, we had the same problem: segfaults in macos only, and only on GitHub CI -- we couldn't reproduce locally.
https://github.com/sccn/liblsl/actions

This kind of went away on its own with some other changes. We still have segfaults, but it doesn't seem to be platform specific. They mostly happen during liblsl\testing\ext\bench_pushpull.cpp. This one is a little weird. In one of its parameterizations, it creates a single outlet and quickly creates up to 10 inlets for that outlet. Then it pushes some samples to the outlet and pulls them in each inlet.
https://github.com/sccn/liblsl/blob/master/testing/ext/bench_pushpull.cpp#L29-L53

I hope this was helpful somehow.

larsoner

For a while, we had the same problem: segfaults in macos only, and only on GitHub CI -- we couldn't reproduce locally.

That seems to be the case for us, too! Good to know we're not alone. I managed to stabilize things at least somewhat. Got one green run, then re-ran CIs and at least don't seem to get segfaults anymore. I think the big gain probably came from using a block size of 64 vs 16... CIs tend to be slower at everything so I was thinking that would give everything a bit more time to process by reducing call overhead. But who knows...

@mscheltienne I suggest we merge this as-is and investigate the bad_gh_macos separately later. In the meantime this improves stuff a lot.

mne_lsl/conftest.py

larsoner · 2023-11-14T18:42:52Z

mne_lsl/lsl/stream_inlet.py

@@ -3,12 +3,14 @@
 import time
 from ctypes import byref, c_char_p, c_double, c_int, c_size_t, c_void_p
 from functools import reduce
+from threading import Lock


... and I added threading.Lock to help prevent segfaults.

larsoner · 2023-11-14T18:43:10Z

mne_lsl/lsl/stream_inlet.py

+    @property
+    def _obj(self):
+        if self.__obj is None:
+            raise RuntimeError("The StreamInlet has been destroyed.")
+        return self.__obj
+
+    @_obj.setter
+    def _obj(self, obj):
+        self.__obj = obj
+


Plus some protection against errantly using self._obj after it has been destroyed

mne_lsl/player/player_lsl.py

larsoner · 2023-11-14T18:51:17Z

mne_lsl/utils/_tests.py

+    # On macOS we get differences like -0.030293 vs -0.03031, -0.030286 vs -0.030313
+    atol = 0.0001 if platform.system() == "Darwin" else 0.0


I don't love this but it seems necessary 🤷

mne_lsl/stream/tests/test_stream_lsl.py

larsoner · 2023-11-14T19:39:22Z

mne_lsl/stream/tests/test_stream_lsl.py

+bad_gh_macos = pytest.mark.skipif(
+    platform.system() == "Darwin" and os.getenv("GITHUB_ACTIONS", "") == "true",
+    reason="Unreliable on macOS CI",
+)


... and I had to add this skip for the drop_channels test. Looking at the last few failures, other than the segfaults most were on this test and only on macOS.

mscheltienne · 2023-11-15T10:12:51Z

Thanks a lot @larsoner; first green run in a long long time 🙈
I'm not sure how the threading.Lock is helping, could you detail your thought?

The same close_io magic for sphinx-gallery examples would be useful to prevent segfaults in the doc build.

One more CI rerun; still one failure on windows. The surprising part is that this match_stream_and_raw_data failures always occur in a function which adds or remove (pick) channels. They do not occur e.g. in test_stream. To be fair, that's also where 80-90% of the match_stream_and_raw_data are concentrated.. so maybe that's why it only fails those tests and I might have missed a couple of failures in test_stream or other non-pick related tests.

mscheltienne · 2023-11-15T10:14:45Z

Happy to merge as is and to keep digging for the remaining points separately, that's already a huge improvements.
Do you think we should keep the LSL log level in test default to 2 or maybe do we bring it back to something less verbose and use the variable MNE_LSL_LOG_LEVEL when needed?

This reverts commit 8ce0263.

larsoner · 2023-11-15T16:17:04Z

Do you think we should keep the LSL log level in test default to 2 or maybe do we bring it back to something less verbose and use the variable MNE_LSL_LOG_LEVEL when needed?

So far I'm happy with the CI output at least. We should probably make the LSL level configurable using an env var. Then we can have a default config for devs (maybe less verbose?) and set the verbosity in CIs however we want. But I'll merge this as-is and feel free to tweak those in a follow-up PR!

tstenner · 2023-11-15T18:25:13Z

Let's see if we can summon @tstenner in here who I know has worked on the "Stream transmission broke off" problem in the past. I've encountered it too but I never looked into the origin.

The three big causes are handshake failures (mainly weird decimal point issues for some europeans) before the first sample, the outlet getting closed before the input and the outlet disappearing before the inlet. The MacOS problems happened (as Chadwick pointed out) mostly in a synthetic benchmark, but only on the CI systems and never on one of my local machines. If anyone manages to get a stack trace or dump, I could investigate this further.

mscheltienne · 2023-11-16T10:16:56Z

Thanks @cboulay @tstenner We will keep you updated if we get a stack trace or something that helps figuring out the remaining issues :)

larsoner added 2 commits October 18, 2023 14:53

MAINT: Make tests less flaky

c259099

FIX: Fix version check

c785d7d

larsoner added 26 commits October 18, 2023 15:30

FIX: More

add96a6

FIX: More

1d2c089

FIX: pre-commit

46c8767

FIX: More

4a0db05

FIX: Revert

dab235e

FIX: More

6c4fd07

FIX: More

ac75c80

FIX: More

e4765bb

FIX: More

80aa3c6

FIX: Marks

8bd7b68

MAINT: Function-level player

c04b450

FIX: Catches

866132c

FIX: Name

7b697b7

FIX: More

a499700

FIX: Flexible

ee8371b

FIX: Color

5279dc4

FIX: Mre

20875ca

FIX: Logging

602fed4

FIX: Cruft

62ac638

FIX: Output during run

1f6267d

FIX: Cleanup

3cd6cba

FIX: Attr

438c932

FIX: 312

3c055cb

FIX: Safer repr

2f123eb

FIX: More

801681b

FIX: More

f5595cf

FIX: atol

64558be

larsoner added 2 commits November 14, 2023 12:17

FIX: Lock lower

4487567

FIX: Segfaults?

bfd7687

larsoner added 2 commits November 14, 2023 13:29

FIX: Chunk sizze

0a3b818

TST: Tempt fate

9ac04cd

larsoner marked this pull request as ready for review November 14, 2023 18:59

larsoner added 2 commits November 14, 2023 14:11

TST: Fate wins again

7883780

FIX: More

fa1c8be

larsoner commented Nov 14, 2023

View reviewed changes

trigger cis

956ca0e

mscheltienne added 8 commits November 15, 2023 11:33

fix set_channel_units

0579bd3

add TODO to comment

d7bdfc5

get rid of warnings from outlet.push_chunk

367c0b1

Merge branch 'main' into flaky

2b07c77

fix

2c4fab7

skip in macOS runner

8dd9b71

try sphinx-gallery reset_modules

8ce0263

Revert "try sphinx-gallery reset_modules"

60a3bc9

This reverts commit 8ce0263.

larsoner merged commit 9c18ac0 into mne-tools:main Nov 15, 2023
19 checks passed

larsoner deleted the flaky branch November 15, 2023 16:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MAINT: Make tests less flaky #168

MAINT: Make tests less flaky #168

larsoner commented Oct 18, 2023 •

edited

Loading

codecov bot commented Oct 18, 2023 •

edited

Loading

larsoner commented Oct 20, 2023

larsoner commented Nov 14, 2023

cboulay commented Nov 14, 2023

cboulay commented Nov 14, 2023 •

edited

Loading

mscheltienne commented Nov 14, 2023

cboulay commented Nov 14, 2023

larsoner left a comment

larsoner Nov 14, 2023

larsoner Nov 14, 2023

larsoner Nov 14, 2023

larsoner Nov 14, 2023

mscheltienne commented Nov 15, 2023

mscheltienne commented Nov 15, 2023

larsoner commented Nov 15, 2023

tstenner commented Nov 15, 2023

mscheltienne commented Nov 16, 2023

		# On macOS we get differences like -0.030293 vs -0.03031, -0.030286 vs -0.030313
		atol = 0.0001 if platform.system() == "Darwin" else 0.0

MAINT: Make tests less flaky #168

MAINT: Make tests less flaky #168

Conversation

larsoner commented Oct 18, 2023 • edited Loading

codecov bot commented Oct 18, 2023 • edited Loading

Codecov Report

larsoner commented Oct 20, 2023

larsoner commented Nov 14, 2023

cboulay commented Nov 14, 2023

cboulay commented Nov 14, 2023 • edited Loading

mscheltienne commented Nov 14, 2023

cboulay commented Nov 14, 2023

larsoner left a comment

Choose a reason for hiding this comment

larsoner Nov 14, 2023

Choose a reason for hiding this comment

larsoner Nov 14, 2023

Choose a reason for hiding this comment

larsoner Nov 14, 2023

Choose a reason for hiding this comment

larsoner Nov 14, 2023

Choose a reason for hiding this comment

mscheltienne commented Nov 15, 2023

mscheltienne commented Nov 15, 2023

larsoner commented Nov 15, 2023

tstenner commented Nov 15, 2023

mscheltienne commented Nov 16, 2023

larsoner commented Oct 18, 2023 •

edited

Loading

codecov bot commented Oct 18, 2023 •

edited

Loading

cboulay commented Nov 14, 2023 •

edited

Loading