Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Couple of flaky pthread tests on Firefox #18307

Closed
RReverser opened this issue Dec 2, 2022 · 10 comments · Fixed by #18314
Closed

Couple of flaky pthread tests on Firefox #18307

RReverser opened this issue Dec 2, 2022 · 10 comments · Fixed by #18314

Comments

@RReverser
Copy link
Collaborator

I'm commonly seeing these 5 tests in particular failing on the CI in Firefox, and working after couple of attempts:

  • test_pthread_sbrk
  • test_pthread_safe_stack
  • test_pthread_run_script
  • test_pthread_run_on_main_thread_flood
  • test_pthread_run_on_main_thread

Here's an example log from recently seen failures (attaching as a file because I'm not sure about retention duration on CircleCI):
log.txt

@RReverser
Copy link
Collaborator Author

RReverser commented Dec 2, 2022

Maybe sbrk is related to #10006? It was closed as stale, but I guess it's still an issue that, for whatever reason, manifests in Firefox more frequently than elsewhere.

Doesn't explain others though.

@RReverser
Copy link
Collaborator Author

OTOH I wonder if it's something that would be accidentally fixed by my changes in #18305.

One of the things I noticed and had to do there is manually start executing messages only after Module variable has been definitely assigned. It became a pretty noticeable issue in Firefox due to my changes to postMessage scheduling in the same PR, but now I wonder if there was always a racy issue, just less noticeable before.

A bit suspiciously, that PR also passed on the first run, but it still might be all a coincidence...

@RReverser
Copy link
Collaborator Author

Although no, that should've affected only modularize builds, and those tests are not it...

RReverser added a commit that referenced this issue Dec 2, 2022
Extracted some changes out of #18305 that report Worker as ready only once runtime is initialized *and* `Module` variable is assigned.

Want to see if it incidentally also helps with #18307.
@RReverser
Copy link
Collaborator Author

OTOH I wonder if it's something that would be accidentally fixed by my changes in #18305.

Just tested that theory by extracting those changes separately and, no, that doesn't help. Must be some other reason, although I don't see what's common between specifically those 5 tests.

@kleisauke
Copy link
Collaborator

Looking at the log, I think(?) that test_pthread_run_on_main_thread makes the whole browser unresponsive, causing subsequent tests to fail as well. Which explains that this is only a issue on Firefox, because on Chrome this test is already disabled (see commit 6ea085c).

As for a possible fix, perhaps commit 25a6cb2 in PR #13007 would fix this? Just a wild guess, it's based upon that this test makes heavily use of those helpers in <emscripten/threading.h>, which would end-up calling emscripten_wait_for_call_v() in most cases.

That PR also includes some other goodies, such as commit fd05b36. Sorry for not further upstreaming those commits, I sometimes have limited spare-time. :(

@kleisauke
Copy link
Collaborator

... as always, I could certainly use some help in upstreaming these patches. FWIW, I'm currently tracking this in the "Upstream patches" card at https://github.com/users/kleisauke/projects/1.

@kleisauke
Copy link
Collaborator

As for a possible fix, perhaps commit 25a6cb2 in PR #13007 would fix this?

Though, I think the intention of PR #15681 was to deprecate emscripten_wait_for_call_v() in favor of the proposed APIs mentioned in comment #15705 (comment).

@RReverser
Copy link
Collaborator Author

As for a possible fix, perhaps commit 25a6cb2 in PR #13007 would fix this?

Though, I think the intention of PR #15681 was to deprecate emscripten_wait_for_call_v() in favor of the proposed APIs mentioned in comment #15705 (comment).

Ok I'm not familiar with that side of the things so tagging @tlively.

sbc100 added a commit that referenced this issue Dec 3, 2022
Hopefully this one test was causing all the flakiness.

Closes: #18307
sbc100 added a commit that referenced this issue Dec 3, 2022
Hopefully this one test was causing all the flakiness.

Closes: #18307
@tlively
Copy link
Member

tlively commented Dec 6, 2022

emscripten_wait_for_call_v should be implemented in terms of the new proxying API (with the system proxying queue), so it's not inherently bad to use, but should only be used for things that act like signal handlers. Looking at the test_pthread_run_on_main_thread test, it looks like it's calling printf, which internally acquires locks. My guess is that printfs are racing, and when one acquires the lock, the system proxying queue is pumped and another starts running in the middle of the first, so the main thread is deadlocking on itself.

@RReverser
Copy link
Collaborator Author

Looks like test_pthread_proxying_cpp is also flaky, probably for similar reasons. A bit hard to tell as that test seems to combine several in one file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants