Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test: test_pthread_proxying_cpp #18353

Closed
RReverser opened this issue Dec 12, 2022 · 10 comments · Fixed by #18358
Closed

Flaky test: test_pthread_proxying_cpp #18353

RReverser opened this issue Dec 12, 2022 · 10 comments · Fixed by #18358

Comments

@RReverser
Copy link
Collaborator

Looks like test_pthread_proxying_cpp is flaky on the CI too, probably similar to #18307 / #18314.

@RReverser
Copy link
Collaborator Author

Probably related to #16569 too, although there it was reported only to hang with wasm2js. cc @tlively

@tlively
Copy link
Member

tlively commented Dec 12, 2022

Thanks, taking a look now.

@kleisauke
Copy link
Collaborator

kleisauke commented Dec 12, 2022

Interesting, so assuming this is related to these linked issues, it also affects Node.js and the new proxying API.

Perhaps we could experiment with removing those std::cout calls (to make sure it doesn't conflict with the printf-locks introduced in commit fdd486f)?

@RReverser
Copy link
Collaborator Author

(to make sure it doesn't conflict with the printf-locks introduced in commit fdd486f)?

@kleisauke Interesting catch. It started failing pretty suddenly and pretty frequently, so a recent change being a culprit seems likely.

@kleisauke
Copy link
Collaborator

FWIW, issue #16136 was the first one that reported this flaky behavior, a couple months after that commit was landed. At the time, I thought it only affected the browser tests (unfortunately, all those CI logs disappeared due to the retention time on CircleCI).

@kleisauke
Copy link
Collaborator

Given these tests started to fail more frequently, it could also be related to commit a37b4b2 and/or the current libc.need_locks logic.

I once had a idea to do this during compile-time, see for e.g. commit 27c2a3a. However, I abandoned it because that could lead to redundant file locking if no pthreads are running. Although in retrospect, I think this could resolve a similar issue in the Wasm Workers API (which I reproduced in PR #18201).

@kleisauke
Copy link
Collaborator

... let me know if help is needed with testing/implementation. Note that the above assumptions were made for triage purposes (and given that race conditions/deadlocks are generally difficult to debug).

@tlively
Copy link
Member

tlively commented Dec 12, 2022

I assume that this is due to my change to this test a few days ago (#18350), and not anything to do with musl or other system libraries.

@RReverser
Copy link
Collaborator Author

Whew, I started worrying this is due to my recent changes to Node.js pthreads.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 12, 2022

Interesting, so assuming this is related to these linked issues, it also affects Node.js and the new proxying API.

Perhaps we could experiment with removing those std::cout calls (to make sure it doesn't conflict with the printf-locks introduced in commit fdd486f)?

BTW a good way to avoid the C stdio subsystem completely in such tests its to use emscripten_log and/or emscripten_err which are defined in <emscripten/console.h>

tlively added a commit that referenced this issue Dec 12, 2022
The tests in test_pthread_proxying_cpp proxy functions that increment local
variables. For the async proxying functions, the proxying is followed by a wait
on a condition variable so the test can assert that the proxied work was
completed. However, the incrementing of the local variables was not protected by
the lock used with the condition variable, so it was possible for the variable
to be incremented and the condition variable notified after checking the wait
condition but before waiting. Since the condition variable would not be notified
again, the test would wait on the condition variable forever.

Fix the bug by taking the lock before incrementing the variable in tests where
this could cause problems.

Fixes #18353.
tlively added a commit that referenced this issue Dec 13, 2022
The tests in test_pthread_proxying_cpp proxy functions that increment local
variables. For the async proxying functions, the proxying is followed by a wait
on a condition variable so the test can assert that the proxied work was
completed. However, the incrementing of the local variables was not protected by
the lock used with the condition variable, so it was possible for the variable
to be incremented and the condition variable notified after checking the wait
condition but before waiting. Since the condition variable would not be notified
again, the test would wait on the condition variable forever.

Fix the bug by taking the lock before incrementing the variable in tests where
this could cause problems.

Fixes #18353.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants