-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sock_async_event: fix race-condition #13438
sock_async_event: fix race-condition #13438
Conversation
If a new event is fired during the execution of the event callback, `event->type` might change. However as `event->type` is set to 0 after the execution of the callback, that leads to it being 0 on the next round of the event handler's execution, leading in the event getting lost.
Backport for 2020.01 probably not needed as no support for TCP for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot reproduce the race conditions. Therefore I have set it to 'request changes' to avoid accidental merging.
I don't understand your reasoning for blocking this then... Even if you can't reproduce, isn't it obvious that a race condition can occur and that this PR is a fix for that? Why should a merge be bad, even if accidentally? |
(can't reproduce it with #13427 either btw, most likely because my latest fixes hardened the backend itself against the race condition. Nevertheless, it still can occur and should be fixed.) |
Maybe, I missed something but I can't reproduce the race condition with PR #13427, neither with Without this PR I can observe following behavior when I leave
I can reconnect afterwards. With this PR on top of PR #13427, I observer the following:
But, at least I can reconnect afterwards. |
As said in #13438 (comment), the behavior seems to be more strange with this PR than before. |
This should have been fixed with 24ce2ba. Are you sure you used the most recent version? |
So... to prevent the garbage output and still have the race condition you need to revert 54069b1... |
OK, that's my fault, the last commit in PR #13427 was 54069b1. I will check again. |
Here is what I got:
|
(edited comment as I copy-pasted something wrongly 😅) |
Note the version strings that should make clear that I use the version I am actually claiming to use. Is this justification enough to have a bug fix for my own code merged 😉? |
I can confirm that #13427 resetted to aaa644b gives an error and I can't reconnect:
I can further confirm that this PR on top of aaa644b gives some garbage but I can reconnect:
I can also confirm that this PR on top of the last commit of PR #13427 works as expected.
|
BTW, I saw this
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change makes sense. The problem desribed in #13438 (comment) seems not to be related to this PR. ACK
I am pretty sure this PR has nothing to do with it or in the worst case just reveals a problem with #13427. From a code perspective this PR should be ready to go. |
Yes, but would you try to reproduce the behavior described in #13438 (comment)? |
Already was able to :-). It is definitely something being wrong with #13427 (the |
Contribution description
If a new event is fired during the execution of the event callback,
event->type
might change. However asevent->type
is set to 0 after the execution of the callback, that leads to it being 0 on the next round of the event handler's execution, leading in the event getting lost.Testing procedure
make -C tests/gnrc_sock_async_event flash test
should still work with a board of your choice.tests/lwip
with lwip: provide sock_async support #13427 merged and start a TCP server withtcp server start 1337
netcat
(e.g. nc fe80::0200:dead:beef:affe%tapbr0 1337 when usingnative
)Ctrl+C
innetcat
is registered and printed to stdout. You can reconnect to the TCP server by starting a newnetcat
process.Ctrl+C
innetcat
is not registered. Reconnecting to the TCP server is not possible (accept will error with an-ENOMEM
).Issues/PRs references
Discovered in #13427.