FastDDS nodes cannot communicate if another user ran a node in the past #3535

Closed
1 task done
ksuszka opened this issue May 24, 2023 · 1 comment · Fixed by #3924


ksuszka commented May 24, 2023

Is there an already existing issue for this?

  • I have searched the existing issues

Expected behavior

Nodes using FastDDS should communicate correctly if run under the same user on the same machine.

Current behavior

Nodes using FastDDS on a single machine cannot communicate if another node was previously run as a different user.

Steps to reproduce

Open three terminals.

In the first terminal run:

docker run -it --ipc=host -u 1000 -e HOME=/tmp --name=first --rm ros:humble bash -c "for i in {1..150}; do (ros2 topic hz /foo &); done; sleep infinity"

In the second terminal run:

docker kill first
docker run -it --ipc=host -u 1001 -e HOME=/tmp --name=second --rm ros:humble ros2 topic pub /foo std_msgs/msg/String "data: foo"

In the third terminal run:

docker exec -it second bash -c ". /opt/ros/humble/setup.bash && ros2 topic hz /foo"

You should see stats for messages received from the publisher process, but you will most likely see "WARNING: topic [/foo] does not appear to be published yet"

Fast DDS version/commit

ros-humble-fastrtps-cmake-module/now 2.2.0-2jammy.20230112.142430 amd64 [installed,local]
ros-humble-fastrtps/now 2.6.4-1jammy.20230117.223829 amd64 [installed,local]
ros-humble-rmw-fastrtps-cpp/now 6.2.2-1jammy.20230117.225910 amd64 [installed,local]
ros-humble-rmw-fastrtps-shared-cpp/now 6.2.2-1jammy.20230117.225455 amd64 [installed,local]
ros-humble-rosidl-typesupport-fastrtps-c/now 2.2.0-2jammy.20230112.145514 amd64 [installed,local]
ros-humble-rosidl-typesupport-fastrtps-cpp/now 2.2.0-2jammy.20230112.145146 amd64 [installed,local]

Platform/Architecture

Other. Please specify in Additional context section.

Transport layer

Default configuration, UDPv4 & SHM

Additional context

The cause is that the first process creates /dev/shm/* files as the first user. When that process is killed, the /dev/shm files remain. Any process started afterwards as the second user cannot access those files, so it silently fails to communicate.

The most disappointing part is that no failure is reported: the node silently ignores the fact that communication does not work. This is the definition of unreliability.
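
To check whether a machine is in this state, the leftover files can be listed together with their owners. The following is only a minimal sketch: the function name `list_foreign_shm` is hypothetical, the `*fastrtps*` file-name pattern is an assumption about how Fast DDS names its segment files, and `stat -c` assumes GNU coreutils (as in the ros:humble image).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Fast DDS or ROS 2.
# Prints shared-memory files whose owner differs from the given uid.
# Assumption: Fast DDS segment files match "*fastrtps*" in /dev/shm.
list_foreign_shm() {
    local dir="${1:-/dev/shm}"        # directory to scan
    local uid="${2:-$(id -u)}"        # uid the files are expected to belong to
    local f owner
    for f in "$dir"/*fastrtps*; do
        [ -e "$f" ] || continue       # the glob matched nothing
        owner=$(stat -c '%u' "$f")    # numeric owner uid (GNU stat)
        if [ "$owner" != "$uid" ]; then
            printf '%s (owner uid %s)\n' "$f" "$owner"
        fi
    done
}
```

Running `list_foreign_shm` with no arguments inside the second container would print any matching /dev/shm file that does not belong to the current user. An empty result suggests the failure has a different cause.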

XML configuration file

No response

Relevant log output

No response

Network traffic capture

No response

Member

Mario-DL commented Sep 1, 2023

Hi @ksuszka

Thanks for the detailed report and reproducer. This is a known issue: abruptly killing the first container leaves the shared memory segments in /dev/shm without being properly cleaned up, so reusing the node name with a non-privileged user causes the issue. We will work on showing an informative error message.
BTW, we have recently worked on improving the SHM transport (#3763).
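
Until such an error message exists, a launch script could fail loudly instead of silently. This is a sketch of a user-side guard, not the fix referenced in this thread: the function name `check_shm_ownership` is hypothetical, the `*fastrtps*` pattern is an assumption about Fast DDS file naming, and `-maxdepth` assumes GNU find.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch guard, not part of Fast DDS or ROS 2.
# Returns 1 and prints an error if shared-memory files matching the
# assumed Fast DDS pattern are owned by a uid other than the given one.
check_shm_ownership() {
    local dir="${1:-/dev/shm}"
    local uid="${2:-$(id -u)}"
    local stale
    # "! -uid" selects entries NOT owned by the given uid.
    stale=$(find "$dir" -maxdepth 1 -name '*fastrtps*' ! -uid "$uid" 2>/dev/null)
    if [ -n "$stale" ]; then
        echo "ERROR: shared-memory files owned by another user found in $dir:" >&2
        printf '%s\n' "$stale" >&2
        return 1
    fi
    return 0
}
```

For example, `check_shm_ownership && ros2 topic hz /foo` would refuse to start the subscriber while files left by another user are present. Because /dev/shm normally has the sticky bit set, only the owning user or root can remove such files.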

Mario-DL added the "in progress" label (Issue or PR which is being reviewed) and removed the "triage" label (Issue pending classification) on Sep 1, 2023
Mario-DL self-assigned this on Sep 1, 2023