-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving jitter through the driver and middleware stack for data processors #41
Improving jitter through the driver and middleware stack for data processors #41
Conversation
The driver now gives up ownership of the PointCloud message in the pointcloud_processor at publish time to allow for middleware optimizations to include the elimination of unnecessary copies. Several tests on my machine show order of magnitude jitter improvements through the driver + middleware stack both out-of-process and while running as a component to sub-millisecond levels.
This looks good, it'll go in as is. |
I believe it has to do with cutting down on the number of data copies and allocations once the data leaves the driver. As the code was written prior, the I have to play a bit with the intra-process comms. I have not gotten this to give us true zero-copy yet but I haven't looked that hard. I will spend some time with that next week as it is relevant to what we are currently working on. |
That begs the question to me then: why does ROS2 even allow you to publish a non-pointer type? It seems like that's a reasonable burden to put on developers for that much increase in stability. |
I certainly like the smart pointers as a matter of taste as ownership semantics are clearly communicated. It is a facility provided to us by the language and we should exploit it. I suppose the same performance could be achieved by casting a non-pointer type to an r-value reference, however, that can silently degrade to an l-value reference and you would not realize it until the performance became a problem. The smart pointers make the intention clear. |
I should also note that I have done a bit of tuning on my machine (DDS, UDP buffer sizes, etc.), however, the delta shown in those plots are simply as a result of the change in the way we are publishing through the driver. |
Let me know if you want me to handle the merge. |
@wjwwood maybe something to consider in your API review for rclcpp. I think it would make alot of sense to force the use of smart pointers if this is the kind if impact it can make. |
The reason why we allow a reference is to allow the user to have messages on the stack if they want or to reuse a message loop to loop. Because you give up ownership you need to allocate a new one each time. You could also potentially get a similar performance improvement by using the middleware publisher allocation feature which lets the middleware preallocate resources and reuse them on each publish call rather than needing to allocate resources each publish. Not all the middleware implementations leverage this optimization but they could in the future. @mjcarroll worked on that feature maybe he could point you to an example, I’m on my phone atm and don’t have an example at hand. For the zerocopy intraprocess you just need to publish and subscribe with unique ptr and ensure there’s only one publisher and subscription pair. One further thing you could do is use the loaned message api where you can get a message from the middleware, fill it out, and then publish it (returning ownership to the middleware). This can allow the middleware to do zero copy, even Between processes. Iceoryx does this, @Karsten1987 gave a ROSCon talk about this. |
Ah, I do remember us talking about that. Thanks for the reminder. Now that I'm in front of a computer, I'm scouring all of the navigation repos for instances where we don't do that to update to work with unique_ptrs at the bare minimum. We play around with too many pointclouds/images/maps to not use that boost. I appreciate the detailed response. I think I've covered the major bases with tickets where updates are required. |
Hey @wjwwood thanks for your comments. Quick point of clarity on the zero-copy strategy...
This makes sense that zero-copy would work in this scenario. I assume when there is a 1-N (producer->consumer) topology, subscriptions to a Also, something basic related to this that maybe you can help with. Is it a true statement that to get zero-copy the This AM, I wanted to see if I could get this to work correctly. However, I ran into a problem. The driver is implemented as a managed (lifecycle) node. As the driver was coming up and an external maanger was pushing it through its FSM, I got the following exception:
I got the exact same behavior when 1) running the driver standalone (not in a component container) and driving the FSM via My direct question is: is it possible to do zero-copy / intraprocess comms with managed nodes? If yes, I think there is a fundamental piece I am missing. Any insights you can lend are really appreciated. Pinging @mjcarroll on this too as I think he is pretty deep on these topics as well -- Hi Mike :-) The one particular line of the error above that has me puzzled is:
I know what it is trying to say, but, is there a way that I can communicate to the middleware something like: "hey, do intraprocess if you can but for things like messages whose QoS preclude it, just take the interprocess approach." All we really care about are zero-copy semantics for the data, those with
Yes. I am looking forward to getting Iceoryx set up. It gives us the best of both worlds: Zero-copy / low latency data transformation pipelines with 3D LiDAR scans and the loose coupling / modularity of using out-of-process nodes. |
@tpanzarella your charts are very compelling, I've be interested to see what that looks like with ROS1's Ouster Drivers. I think showing a side by side of "this than that" could be a really good propaganda piece for ROS2 docs / a Box Robotics blog post. |
In both the
pointcloud_processor
andscan_processor
(the latter is really insignificant) this patch has the driver give up ownership of the message at the time of data publishing. This allows for middleware optimizations to include the elimination of unnecessary copies. Several tests on my machine show order of magnitude jitter improvements through the driver + middleware stack both out-of-process and while running as a component to sub-millisecond levels. Here are some data:NOTE:
msg_stamp
here is the time stamped on the message by the driver (so, how consistent is the LiDAR at delivering the data).recv_stamp
here is the time received by a subscriber (so, how long spent in the driver and through the ROS2 middleware before a subscriber can work with the data). I'm usingTIME_FROM_ROS_RECEPTION
as my timestamp mode so I can truly measure time spent in the driver and ROS2 middleware and not worry about in-LiDAR or network latency.Before this patch, here is what the data looked like for the pointcloud processor (these are for interprocess communication tests over 1,000 messages -- the effect is similar when run as a component):
Here is a quantile plot of the same:
Here are the end-to-end numbers for this same set of data (again, before this patch):
Here is what the data look like with this patch:
(Not perfect, there is much more we can do, but significantly improved in terms of jitter)
In terms of the end-to-end numbers, prior to this patch, the median end-to-end pointcloud delivery latency was 18.164 ms, however, the median absolute deviation (MAD) was 4.256 ms. With this patch, the median latency was 23.706 ms, but the MAD was only .252 ms. So, the patch significantly lessens the jitter which makes the system more stable. We can always tune to lessen absolute latency. Indeed, in my setup, the DDS could be further tuned.