Launching several nodes at the same time causes 100% CPU load and hangups #408

Closed
Yadunund opened this issue Jan 10, 2025 · 16 comments
Labels: bug (Something isn't working)


@Yadunund
Member

It has been reported that starting many nodes (on the order of hundreds) at the same time in separate processes can lead to a spike in CPU usage and hangups.

Zettascale has identified the cause of the issue as the list of network interfaces on the host machine being re-scanned for each node, combined with short timeouts while doing so. A fix is underway.
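One generic way to avoid paying for an interface scan on every node creation is to scan once per process and reuse the result. The Python sketch below is illustrative only: `list_interfaces` and `create_node` are hypothetical names, this is not how Zenoh implements its fix, and it assumes Python 3.9+ for `functools.cache`. Note that this kind of caching only helps when several nodes share one process; when hundreds of separate processes start at once, each still performs its own scan, so the scan itself has to be cheap.

```python
import functools
import socket


@functools.cache
def list_interfaces():
    """Scan the host's network interfaces once per process and reuse the result.

    socket.if_nameindex() returns a list of (index, name) pairs; it is
    available on Unix and, since Python 3.8, on Windows.
    """
    return tuple(name for _, name in socket.if_nameindex())


def create_node(name):
    # Hypothetical node factory: every node reuses the cached interface list
    # instead of triggering a fresh scan with its own timeouts.
    return {"name": name, "interfaces": list_interfaces()}


nodes = [create_node(f"node_{i}") for i in range(100)]
# Only the first call performed a scan; the other 99 were cache hits.
print(list_interfaces.cache_info())
```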

Yadunund added the bug (Something isn't working) label on Jan 10, 2025
@YuanYuYuan
Contributor

YuanYuYuan commented Jan 16, 2025

The network interface scan has been significantly improved by eclipse-zenoh/zenoh#1704.

@JEnoch
Contributor

JEnoch commented Jan 16, 2025

eclipse-zenoh/zenoh#1717 will also significantly reduce CPU usage at launch time.

@Tacha-S

Tacha-S commented Jan 22, 2025

It seems like there has been significant improvement with Zenoh. 🎉

However, when running the following command:

ros2 launch turtlebot4_gz_bringup turtlebot4_gz.launch.py slam:=true nav2:=true rviz:=true

the following errors are still appearing:

...
[joint_state_publisher-4] [ERROR][rmw_zenoh_cpp]: topic name /rosout not found in topic_map. Report this.
[static_transform_publisher-45] [ERROR][rmw_zenoh_cpp]: topic name /parameter_events not found in topic_map. Report this.
[rviz2-57] [ERROR][rmw_zenoh_cpp]: topic name /controller_manager/load_controller not found in topic_map. Report this.
[rviz2-57] [ERROR][rmw_zenoh_cpp]: qos :2:,10:,:,:,, not found in for topic type controller_manager_msgs::srv::dds_::LoadController_. Report this.
[static_transform_publisher-45] [ERROR][rmw_zenoh_cpp]: topic name /controller_manager/list_controllers not found in topic_map. Report this.
[static_transform_publisher-45] [ERROR][rmw_zenoh_cpp]: qos :2:,10:,:,:,, not found in for topic type controller_manager_msgs::srv::dds_::ListControllers_. Report this.
...

Do you know anything about this?

@Yadunund
Member Author

@Tacha-S could you provide more details? Specifically:

  • OS version
  • Version of ROS 2 and how you installed it (i.e., source install vs. binaries)
  • Version of rmw_zenoh and how you installed it (source vs. binaries)
  • Detailed steps on how you built your workspace to run that command.
  • Any environment variables that may have been exported

@Tacha-S

Tacha-S commented Jan 24, 2025

  • OS version
    • Ubuntu 24.04
  • Version of ROS 2 and how you installed it (i.e., source install vs. binaries)
    • Jazzy (binaries)
  • Version of rmw_zenoh and how you installed it (source vs. binaries)
  • Detailed steps on how you built your workspace to run that command.
sudo apt install ros-jazzy-turtlebot4-simulator ros-jazzy-irobot-create-nodes
ros2 launch turtlebot4_gz_bringup turtlebot4_gz.launch.py slam:=true nav2:=true rviz:=true
  • Any environment variables that may have been exported
export RMW_IMPLEMENTATION=rmw_zenoh_cpp
export ZENOH_ROUTER_CONFIG_URI=$HOME/zenoh_router_config.json5
export ZENOH_SESSION_CONFIG_URI=$HOME/zenoh_session_config.json5
export ROS_DOMAIN_ID=67

configs

@mpollayil

Do you have any updates on this? We are also experiencing the same issue.


This happens, for example, when we launch some nav2 nodes (but it's not related to nav2).

Any help or insights would be appreciated!

@kulkarni-raunak

We are having the same issues on a RISC-V board. Is this related to starting processes independently of each other?

@boxanm

boxanm commented Feb 4, 2025

We had a similar issue, seeing many "topic name /... not found in topic_map" error messages on our robot when launching complex software packages (SLAM, arm control) through ROS 2 launch files. We solved the problem by upgrading our hardware to a more powerful computer. Providing Zenoh with a few extra CPU cores really improved our experience with the RMW.

@Yadunund
Member Author

Yadunund commented Feb 4, 2025

#446 should fix these issues. Could you compile from source against that branch and try again?

@Yadunund
Member Author

Yadunund commented Feb 4, 2025

@Tacha-S I just tried your example with the latest jazzy branch of rmw_zenoh built from source and I am unable to reproduce the problem any longer.

I'll close this ticket now but let me know if this is still an issue and we can re-open it.

Yadunund closed this as completed on Feb 4, 2025
@Tacha-S

Tacha-S commented Feb 5, 2025

It's not resolved. 😢

[static_transform_publisher-6] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/get_parameters not found in topic_map. Report this.
[static_transform_publisher-6] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/get_parameters not found in topic_map. Report this.
[static_transform_publisher-6] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/get_parameter_types not found in topic_map. Report this.
[static_transform_publisher-6] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/get_parameter_types not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /controller_manager/configure_controller not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: qos :2:,10:,:,:,, not found in for topic type controller_manager_msgs::srv::dds_::ConfigureController_. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /controller_manager/switch_controller not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: qos :2:,10:,:,:,, not found in for topic type controller_manager_msgs::srv::dds_::SwitchController_. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/describe_parameters not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/describe_parameters not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/get_parameters not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/get_parameters not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/get_parameter_types not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/get_parameter_types not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/get_parameter_types not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/get_parameter_types not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /controller_manager/configure_controller not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: qos :2:,10:,:,:,, not found in for topic type controller_manager_msgs::srv::dds_::ConfigureController_. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /controller_manager/switch_controller not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: qos :2:,10:,:,:,, not found in for topic type controller_manager_msgs::srv::dds_::SwitchController_. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/describe_parameters not found in topic_map. Report this.
[parameter_bridge-18] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/describe_parameters not found in topic_map. Report this.
[collision_monitor-54] [ERROR][rmw_zenoh_cpp]: topic name /controller_manager/configure_controller not found in topic_map. Report this.
[collision_monitor-54] [ERROR][rmw_zenoh_cpp]: qos :2:,10:,:,:,, not found in for topic type controller_manager_msgs::srv::dds_::ConfigureController_. Report this.
[collision_monitor-54] [ERROR][rmw_zenoh_cpp]: topic name /controller_manager/switch_controller not found in topic_map. Report this.
[collision_monitor-54] [ERROR][rmw_zenoh_cpp]: qos :2:,10:,:,:,, not found in for topic type controller_manager_msgs::srv::dds_::SwitchController_. Report this.
[static_transform_publisher-6] [ERROR][rmw_zenoh_cpp]: topic name /spawner_diffdrive_controller/list_parameters not found in topic_map. Report this.

@Tacha-S

Tacha-S commented Feb 5, 2025

@boxanm I am using a 12th Gen Intel(R) Core(TM) i7-12700. Did you solve it by using a more powerful CPU?

@Yadunund
Member Author

Yadunund commented Feb 7, 2025

@Tacha-S apart from those error messages, is the application working?

From a recent discussion with Zettascale, we learnt that Zenoh does not guarantee the ordering of liveliness tokens, so the token announcing an entity's deletion can be received before the one announcing its creation. This is especially likely for short-lived nodes like the controller spawner, whose job (iirc) is to load a controller into the controller manager. Hence, I've also opened #454 so that this error is no longer printed.
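To illustrate why an out-of-order deletion need not be an error, here is a hypothetical Python sketch of a graph cache that remembers a deletion for an entity it has not seen yet (a "tombstone") and discards the late creation when it arrives. `GraphCache`, `on_put`, and `on_delete` are illustrative names, not rmw_zenoh's API, and the tombstone approach is an assumption; as described in the comment above, #454 only stops printing the error.

```python
class GraphCache:
    """Hypothetical cache of entities announced via liveliness tokens.

    Tolerates a deletion arriving before the matching creation, which can
    happen because token delivery order is not guaranteed.
    """

    def __init__(self):
        self.entities = {}       # key -> info for entities currently alive
        self.tombstones = set()  # keys whose deletion arrived before creation

    def on_put(self, key, info):
        if key in self.tombstones:
            # The deletion already arrived, so this creation is stale:
            # consume the tombstone and do not register the entity.
            self.tombstones.discard(key)
            return
        self.entities[key] = info

    def on_delete(self, key):
        if key in self.entities:
            del self.entities[key]
        else:
            # Out-of-order delivery: remember the deletion instead of
            # logging "not found in topic_map".
            self.tombstones.add(key)


cache = GraphCache()
# A short-lived spawner's tokens arrive in reverse order.
cache.on_delete("/spawner_diffdrive_controller/get_parameters")
cache.on_put("/spawner_diffdrive_controller/get_parameters", {"kind": "service"})
print(cache.entities, cache.tombstones)  # {} set()
```

A real implementation would also need to bound the tombstone set, for example by expiring old entries, in case a matching creation never arrives.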

@Tacha-S

Tacha-S commented Feb 7, 2025

I see, that makes sense.
In that case, it's quite possible that the error log appeared, but the system was actually functioning correctly.
It seems difficult to verify everything, but I'll check the pub/sub of the important topics.
Thank you for sharing the information.

@Yadunund
Member Author

Yadunund commented Feb 7, 2025

@Tacha-S glad to know that the system might have been functioning correctly. Keep us posted if you see any issues otherwise.

@Tacha-S

Tacha-S commented Feb 12, 2025

#455 sometimes occurs, but not always.
In addition, during a launch that puts a higher load on the CPU, the following error occurred:

[create-3] thread '<unnamed>' panicked at /home/gisen/.cargo/git/checkouts/zenoh-cc237f2570fab813/e4ea6f0/commons/zenoh-runtime/src/lib.rs:154:21:
[create-3] The Thread Local Storage inside Tokio is destroyed. This might happen when Zenoh API is called at process exit, e.g. in the atexit handler. Calling the Zenoh API at process exit is not supported and should be avoided.
[create-3] note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

7 participants