
Performance using UCLIENT_PROFILE_MULTITHREAD #311

Closed
ted-miller opened this issue Feb 17, 2022 · 15 comments
@ted-miller

Micro-XRCE-DDS-Client: v2.1.0
Platform: Yaskawa Motoman YRC1000

I am running into an issue with performance when using multiple threads to publish concurrently. Have you guys done any performance benchmarks comparing single-threaded and multi-threaded communication?

I've got one task that is running an executor. It has a timer that publishes a bunch of topics (joint_states, tf, etc.), and it also contains an action server (FollowJointTrajectory). With only that task publishing, the executor works great. The timer triggers the publishers at a nice consistent interval of 40 ms.

However, when I receive an action goal, I start another task to publish the action-feedback messages. When this task sends a feedback message at the same time as one of the other publishers, the tasks "lock up" for about 1 second. I have traced it down to the uxr_lock function in multithread.c.

I am using a recursive mutex semaphore for the lock. I can see that one task is sitting inside uxr_lock until the other task calls uxr_unlock. So, it appears that the lock is working to protect data from multiple threads (as intended), but it's causing my publish rates to plummet.

So, I'm wondering if this is the expected behavior. Is there something I can do to improve upon this? Or do I need to rethink my approach to get all communication onto the same thread? (Or implement my own lock to prevent concurrent calls to publish. ) If you've got some benchmark data from your own testing, I'd be interested to see it.

Thanks!

@pablogs9
Member

Hello @ted-miller, this second seems to be related to the time that the micro-ROS RMW waits for confirmation from the agent when it publishes a topic.

In summary, when the micro-ROS RMW publishes a reliable topic, it waits up to RMW_UXRCE_PUBLISH_RELIABLE_TIMEOUT milliseconds to retrieve the ACKNACK from the micro-ROS Agent. Action feedback is published the same way. RMW_UXRCE_PUBLISH_RELIABLE_TIMEOUT is 1000 ms by default.
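At the Micro-XRCE-DDS-Client level, this wait corresponds to something like the following fragment (a sketch, not the exact RMW code; `uxr_run_session_until_confirm_delivery` is the client call that blocks until the ACKNACK arrives or the timeout expires):

```c
/* After writing a reliable topic, the session spins until the agent
 * confirms delivery or the timeout (in ms) elapses. With the default
 * RMW_UXRCE_PUBLISH_RELIABLE_TIMEOUT this wait can last up to 1000 ms,
 * and the session lock is held for that whole window. */
bool acked = uxr_run_session_until_confirm_delivery(&session, 1000);
if (!acked)
{
    /* No ACKNACK within the timeout: the data stays queued for resend. */
}
```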

So this can be:

  • Some kind of interlock
  • A lack of confirmation from the agent side

Some ideas:

  • If you compile micro-ROS with RMW_UXRCE_PUBLISH_RELIABLE_TIMEOUT set to 500, is the lock time reduced?
  • You can also use the timing APIs to set this timeout at runtime.
  • What happens if you set the publisher to best effort?
  • Which transport are you using?
  • Can you share your micro-ROS application so we can reproduce the issue?
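For the compile-time option above, on a standard micro-ROS workspace the timeout is usually set through a colcon.meta entry (a sketch; the package name follows the usual rmw_microxrcedds layout):

```json
{
    "names": {
        "rmw_microxrcedds": {
            "cmake-args": [
                "-DRMW_UXRCE_PUBLISH_RELIABLE_TIMEOUT=500"
            ]
        }
    }
}
```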

@ted-miller
Author

Hi @pablogs9. Thanks for the reply.

This second seems to be related to the time that the micro-ROS RMW waits for confirmation from the agent when it publishes a topic. If you compile micro-ROS with RMW_UXRCE_PUBLISH_RELIABLE_TIMEOUT set to 500, is the lock time reduced?

Yep. You're right about that.

Which transport are you using?

UDP

What happens if you set the publisher to best effort?

The delay goes away entirely. It also dramatically improved the amount of time it takes to publish all the topics (which makes sense if it's not waiting for confirmation). I'm really liking this performance gain. Now I'm wondering if I can combine the best-effort policy with a TCP transport for reliability, to see if that gives me a good balance.
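For anyone landing here later: in rclc, the best-effort variant is a different initializer, not a flag on the default one. A minimal sketch (topic name and message type are placeholders; `node` is an already-initialized `rcl_node_t`):

```c
/* Reliable (default) init would be rclc_publisher_init_default();
 * the best-effort variant skips the ACKNACK wait, which is what
 * removes the delay described above. */
rcl_publisher_t pub;
rclc_publisher_init_best_effort(
    &pub,
    &node,
    ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Int32),  /* placeholder type */
    "feedback_topic");                                  /* placeholder name */
```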

While changing my publishers over to best effort, I didn't find a way to adjust the QoS settings for the action-feedback publisher. I see that the only QoS setting is for the allocator. Is there a way to update QoS settings after initialization? Or would I need to make a custom fork of rcl to add a new rclc_action_server_init function?

Can you share your micro-ROS application so we can reproduce the issue?

Unfortunately my application is very heavily tied to specific hardware. I haven't taken the time to make it runnable on a PC.

But, given some previous issues that I had, I'm pretty sure that it occurs any time you have multiple threads publishing at the same time. So, I should be able to create a small test app to reproduce.

@gavanderhoorn
Contributor

gavanderhoorn commented Feb 18, 2022

I would be interested to know why the Agent takes so long to ack the reliable transmission (edit: in this particular case).

My assumption would be that should not be necessary, and perhaps points to something we're doing incorrectly, or something we've configured incorrectly.

@pablogs9
Member

I'll wait for a reproducible use case.

@gavanderhoorn
Contributor

gavanderhoorn commented Feb 21, 2022

I was not implying this was a task for you @pablogs9. I just made a 'mental note' -- but as a comment.

@ted-miller
Author

I'm not entirely confident that this code is reproducing the issue exactly. But, the symptom is similar enough that I'm hoping to blame the difference on the platform differences.
(The 'difference' being the frequency of occurrence and the amount of time that the thread is blocked.)

I modified the complex_msg_publisher application from micro-ROS-demos/rclc. The main function creates a node and then starts a thread. That thread will create 3 publishers that publish in an infinite loop, with 10 ms between publishes. The main thread waits 6 seconds and then creates a single publisher with 20 ms between publishes.

(BTW: Semi-off-topic... If you attempt to create publishers at the same time on different threads, they will fail. This is the reason for the 6 second delay. The number 6 was an arbitrary choice; I did not measure.)

Before and after each publish, I broadcast a UDP text message. I am measuring execution time by monitoring the broadcast messages.

  • Open the wireshark capture
  • Add a display column for data.text
  • Set filter to data contains "Thread 1"
  • Change Time Display format to Seconds since previously displayed packet
  • Scroll through the list looking for abnormally large numbers
  • Repeat with filter data contains "Thread 2"

The attached capture was started at an arbitrary point after starting the demo application. The first anomaly occurs between packets 56 and 124. On "Thread 1", the publish routine takes nearly half a second.

multithread publisher.zip

@pablogs9
Member

Hello, regarding the multithread entity creation: it is not safe to create entities concurrently as stated in the API documentation.

@Acuadros95 can you reproduce this issue?

@Acuadros95
Contributor

Acuadros95 commented Feb 22, 2022

I could not replicate this, though I had to modify a few things to compile the code:
multithread.publisher.zip (Updated)

  • Create publishers on main thread, then start publish loops.
  • Use the internal uxr_millis() and printf() to debug publish elapsed time instead of UDP debug messages.
  • You can also check the publish rate with ros2 commands: ros2 topic hz
    This will give you the max/min elapsed time between each msg on a topic:
    root@mydevice:/uros_ws# ros2 topic hz /complex_message_1_1
    average rate: 85.340
          min: 0.006s max: 0.020s std dev: 0.00068s window: 2839
    
  • Had to modify some flags on colcon.meta to compile the workspace, but nothing relevant.

Are you using the released v2.1.0 client commit? If that's the case, you are missing relevant updates:

Both commits could help to solve your problem, please try again with the latest commit.

If the problem persists, share the agent output in debug mode (launch with flag -v6) so we can rule out problems on the Agent side.

@gavanderhoorn
Contributor

thanks @Acuadros95, we can try with post v2.1.0.

btw:

  • You can also check the publish rate with ros2 commands: ros2 topic hz

ros2 topic hz is really inconsistent in the rates it reports.

So much so that we have multiple Q&As on ROS Answers about this, and people writing a simple node to subscribe and calculate the message rate themselves.

@gavanderhoorn
Contributor

Just to clarify: you're basically suggesting to test with develop, correct?

@Acuadros95
Contributor

ros2 topic hz is really inconsistent in the rates it reports.

Didn't know about that.

Just to clarify: you're basically suggesting to test with develop, correct?

And yes, you can use the current develop branch.

@ted-miller
Author

Both commits could help to solve your problem, please try again with the latest commit.

The initial testing looks good. My small test app (using my hardware) is working great with these updates.

If the problem persist, share the agent output on debug mode (Launch with flag -v6) so we can discard problems on the Agent side.

If I run into any problems, I'll be sure to get some verbose output.

Thanks for the quick support!

@gavanderhoorn
Contributor

Just to get it clear: @ted-miller: what was the solution in the end?

Did you change everything to best-effort, or did the changes on the develop branch (with regular guaranteed QoS) solve the problems you were seeing?

@pablogs9
Member

FYI: We will release v2.1.1 incorporating those fixes to the stable version in the next weeks.

@ted-miller
Author

I used the changes on the develop branch with default qos to resolve the issue.

I will certainly use some best-effort publishers going forward. But, they were not used for this test.
