Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[14385] Avoid data race on UDPChannelResource #2646

Merged
merged 4 commits into from
Apr 26, 2022
Merged

Conversation

MiguelCompany
Copy link
Member

@MiguelCompany MiguelCompany commented Apr 20, 2022

Description

This PR moves the close operation over the UDP socket from UDPChannelResource::release to the destructor of UDPChannelResource.

As the destructor is called after calling UDPChannelResource::clear, the receiving thread would have been already joined, so the thread sanitizer should not complain.

As a bonus, I introduced a small change on TCPChannelResourceBasic::disconnect that I think is responsible for leaving TCP sockets on a FIN_2 state sometimes.

@Mergifyio backport 2.5.x 2.3.x 2.1.x

Contributor Checklist

  • Commit messages follow the project guidelines.
  • The code follows the style guidelines of this project.
  • Tests that thoroughly check the new feature have been added/Regression tests checking the bug and its fix have been added.
  • N/A Any new/modified methods have been properly documented using Doxygen.
  • Fast DDS test suite has been run locally.
  • Changes are ABI compatible.
  • Changes are API compatible.
  • N/A Documentation builds and tests pass locally.
  • N/A New feature has been added to the versions.md file (if applicable).
  • N/A New feature has been documented/Current behavior is correctly described in the documentation.

Reviewer Checklist

  • Check contributor checklist is correct.
  • Check CI results: changes do not issue any warning.
  • Check CI results: failing tests are unrelated with the changes.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
@MiguelCompany MiguelCompany changed the title Avoid data race on UDPChannelResource [14385] Avoid data race on UDPChannelResource Apr 20, 2022
Comment on lines +49 to +50
asio::error_code ec;
socket()->close(ec);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use close() instead? We are not performing any error checking here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because close() can throw, and we are inside a destructor 😨

@xbreak
Copy link

xbreak commented Apr 20, 2022

In UDPChannelResource::Receive() with the call socket()->receive_from(...): What happens now if there is nothing to receive?
When does call fail so the thread can detect that it should exit? The call has no timeout after all, and what previously triggered a read error due to closing the socket, no longer happens because thread is joined first?

@MiguelCompany
Copy link
Member Author

In UDPChannelResource::Receive() with the call socket()->receive_from(...): What happens now if there is nothing to receive? When does call fail so the thread can detect that it should exit? The call has no timeout after all, and what previously triggered a read error due to closing the socket, no longer happens because thread is joined first?

The trick is using shutdown, which happens to unblock the receive_from call, which returns an error

@xbreak
Copy link

xbreak commented Apr 21, 2022

In UDPChannelResource::Receive() with the call socket()->receive_from(...): What happens now if there is nothing to receive? When does call fail so the thread can detect that it should exit? The call has no timeout after all, and what previously triggered a read error due to closing the socket, no longer happens because thread is joined first?

The trick is using shutdown, which happens to unblock the receive_from call, which returns an error

Is that not also a data race then as ASIO document it to be thread unsafe?

@MiguelCompany
Copy link
Member Author

Is that not also a data race then as ASIO document it to be thread unsafe?

The docs do not explicitly mention shutdown, and in my local testing, the thread sanitizer did not complain. I did investigate it a bit further. and I think the race is on the socket descriptor. Calling close will mark the descriptor as ready to be reused, so if another socket is open, the same descriptor could be returned by the OS, and the recvfrom may receive data from the new socket. As the shutdown call is not freeing the resource (i.e. the descriptor) there is no race in this case.

@xbreak
Copy link

xbreak commented Apr 21, 2022

The docs do not explicitly mention shutdown, and in my local testing, the thread sanitizer did not complain. I did investigate it a bit further. and I think the race is on the socket descriptor. Calling close will mark the descriptor as ready to be reused, so if another socket is open, the same descriptor could be returned by the OS, and the recvfrom may receive data from the new socket. As the shutdown call is not freeing the resource (i.e. the descriptor) there is no race in this case.

It may or may not be safe for the tested platform and version, but from what I read it's explicitly unsafe as socket is documented as thread unsafe with the exception of a few enumerated operations. shutdown is not listed as safe and assuming so could break things for different OS or ASIO version.

The limitations of the synchronous interface is well known and most often the recommendation seems to be to use the async version to support cancellation or timeouts (see e.g. https://github.com/boostorg/asio/blob/develop/example/cpp11/timeouts/blocking_udp_client.cpp). This may come with an performance penalty though.

Excerpt from https://www.boost.org/doc/libs/1_76_0/doc/html/boost_asio/reference/ip__udp/socket.html:

Thread Safety
Distinct objects: Safe.
Shared objects: Unsafe.
Synchronous send, send_to, receive, receive_from, and connect operations are thread safe with respect to each other, if the underlying operating system calls are also thread safe. This means that it is permitted to perform concurrent calls to these synchronous operations on a single socket object. Other synchronous operations, such as open or close, are not thread safe.

Copy link
Contributor

@JLBuenoLopez JLBuenoLopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MiguelCompany
Copy link
Member Author

MiguelCompany commented Apr 26, 2022

@xbreak I opened chriskohlhoff/asio#1046 so the ASIO maintainers tell us whether the shutdown operation is thread-safe.

In the meantime, we are going to merge and backport this PR, since it solves the data race on the supported platforms (at least according to the sanitizers.

@MiguelCompany MiguelCompany merged commit 184e982 into master Apr 26, 2022
@MiguelCompany MiguelCompany deleted the hotfix/14367 branch April 26, 2022 09:35
@MiguelCompany
Copy link
Member Author

@Mergifyio backport 2.5.x 2.3.x 2.1.x

mergify bot pushed a commit that referenced this pull request Apr 26, 2022
* Refs #14367. Closing socket on UDPChannelResource destructor.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

* Refs #14367. Do not close socket on UDPChannelResource::release.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

* Refs #14367. Bonus: do not call socket->release() on TCP disconnect.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

* Refs #14367. Linters.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
(cherry picked from commit 184e982)
mergify bot pushed a commit that referenced this pull request Apr 26, 2022
* Refs #14367. Closing socket on UDPChannelResource destructor.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

* Refs #14367. Do not close socket on UDPChannelResource::release.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

* Refs #14367. Bonus: do not call socket->release() on TCP disconnect.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

* Refs #14367. Linters.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
(cherry picked from commit 184e982)
mergify bot pushed a commit that referenced this pull request Apr 26, 2022
* Refs #14367. Closing socket on UDPChannelResource destructor.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

* Refs #14367. Do not close socket on UDPChannelResource::release.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

* Refs #14367. Bonus: do not call socket->release() on TCP disconnect.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

* Refs #14367. Linters.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
(cherry picked from commit 184e982)
@mergify
Copy link
Contributor

mergify bot commented Apr 26, 2022

backport 2.5.x 2.3.x 2.1.x

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants