Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds design document for Deadline, Liveliness, and Lifespan. #212
Adds design document for Deadline, Liveliness, and Lifespan. #212
Changes from 1 commit
70cf9aa
05ec0a5
33431f5
26c08be
2715a7b
de871b0
50ebf99
3f9aad6
f29b7be
2de4243
3e74964
2b4b937
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a behavior that DDS ensures (like in RTI's request/reply API)? or is it something that we would ensure in
rcl
and above? (presumably by emulating the behavior of deadline QoS for topics)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I looked before I did not see anything in the DDS RPC spec about deadline behavior for the request/reply pattern. This is something that would need to be tracked in the ROS layers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjwwood, @nburek: There is LatencyBudged which is unfortunately just a hint to the DDS implementation and not enforced.
Maybe it makes sense to use separate two semantically different things by calling them differently in ROS 2:
The notation of a latency budget could then implemented enforceable on the ROS 2 level for pub/sub, RPC and actions as an additional QOS. Which could also be made even more useful by allowing the user to select between sender timestamp and data timestamp (stored in ROS 2 stamped types) for discerning if the latency budget was exceeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DDS QoS settings such as deadline and liveliness map to instances. So this sentence is only true if one keeps the semantics consistent with DDS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relates to a couple of your comments above. I need to do a little more research to figure out the best approach for addressing your concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you maybe clarify why would we be tied to a particular DDS implementation? Because some (e.g. OpenSplice, RTI Connext Micro) do not implement https://www.omg.org/spec/DDS-RPC/About-DDS-RPC/?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining deadline for Services in terms of the underlying topics ties the rmw implementation to the DDS style of using multiple pub/sub channels for RPC. So it's not that it ties you to a particular DDS vendor, but rather that it ties you to a DDS style implementation of RMW because it is defining implementation details when it doesn't need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there are multiple publishers but each providing a subset of the necessary data model?
In DDS liveliness information is provided with the SampleInfo
Also, lifecycle impacts data buffering:
https://community.rti.com/static/documentation/connext-dds/5.2.0/doc/api/connext_dds/api_cpp2/classdds_1_1core_1_1policy_1_1WriterDataLifecycle.html#details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be covered in a follow up issue (the same one as for deadline or in a separate one).
I not sure what @deeplearningrobotics is speaking about here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can create a follow up issue here as well, but this sounds like a message composition feature to me. I know that DDS allows you to compose data from multiple publishers into a single subscription, but does ROS have plans to allow that as well? If not, then I don't think it makes sense to open an issue to update the design for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nburek I'm not sure what you mean by composition, since you can definitely have multiple publishers per subscription (many to one/many).
I think @deeplearningrobotics means that liveliness is based on the instance, so in the case that we have one instance (most of the time, i.e. with key-less types) the liveliness isn't lost unless all the publishers stop asserting liveliness. If at least one is still asserting it, then it will not drop. So it may not be a reliable way to detect that an individual publisher is gone, but instead that all publishers for a particular instance is gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may just need rewording here, because we're not doing anything special or different from DDS, we're just using a single instance. But that does not indicate a single publisher, which is kind of what this verbiage seems to be saying to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump, if we can address this we can merge it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a change to the wording to indicate that the subscriber receives an event when there are no publishers alive to continue asserting liveliness on the topic. I did not specifically mention instances since it's assumed through the doc that ROS 2 only supports single instances and that multiple instances of a topic based on a key need further work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this second option dangerous. Experience shows that if users have the option they will use it and they will ignore warnings on the status events, out of ignorance or out of not understanding the limitation. I'd vote for raising an exception as being the only option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raising an exception (or returning a non-SUCCESS error code as the case may be) then requires every rmw implementation to implement every QoS policy or risk nothing being able to run. Yes, you now have the risk of users ignoring the warnings, but it also allows the application to actually run in cases where you don't actually care.
For example, I imagine that a large number of commonly used Nodes are going to provide LIVELINESS_MANUAL_TOPIC to allow the most verbose tracking of liveliness by consumers even though it isn't really required for them to run. There may be an rmw implementation that a user wants to use for their particular use case that does not implement the liveliness QoS policy. The user may have decided that they don't actually need liveliness tracked in their system but do need some other feature offered by this particular rmw implementation. In that scenario, throwing an exception means that the user will not be able to use any of those common Nodes that offer that offer liveliness because the thrown exception would prevent them from starting up. On the other hand, going the route of only making this a warning allows the Nodes to still run without the use of liveliness and the user can still use them.
I guess the tradeoff here is that by not using exceptions we allow Nodes to be more portable out of the box when the implementor doesn't fully handle the situation when an rmw implementation doesn't support a QoS policy. However, it does push some burden onto the end user of the Node to understand the impact of their rmw choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the feature is not implemented uniformly across middleware implementations, then I question it's value in the first place. Making the API optional just makes user code more complicated.
In the case of emitting warnings, I think that's simply not good enough. If I write a program and expect the deadline (for example) to be respected, but it fails to do so, only emitting a warning, then my program may no longer be valid. Instead, I'd like the API to throw (or return an error code), and if you want to write code that can use it optionally, then add an API call which can be used to check and see if it is not implemented, in which case the user can avoid calling it, or configure their system differently. Or they can just call it and check for the error state, catch it / handle it and continue with an alternative code path that doesn't require it.
In general, however, I would prefer to avoid optional components of the API due to the complexity they add at all layers of the system. Instead we should aim to only include features in the API that are useful enough to require all middleware implementations to support one way or the other. This contributes to my personal preference of including as few QoS settings as we can manage and still do useful stuff. The breadth of configuration is useful, but makes portability difficult and implementations complicated. Having optional API's try's to address this, but fails to avoid the complexity in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be one way or the other and guaranteed by rmw, because I don't think it will acceptable to say "sometimes you'll get all the events, but in other situations you may not". The easy solution is to only ever have the latest event available. Again we can discuss this in more detail in a live meeting.