Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky lifecycle node tests #1606

Merged
merged 5 commits into from
Mar 30, 2021
Merged

Fix flaky lifecycle node tests #1606

merged 5 commits into from
Mar 30, 2021

Conversation

jacobperron
Copy link
Member

Apparently, the topics and services that LifecycleNode provides are not
available immediately after creating a node.

Fix flaky tests by accounting for some delay between the LifecycleNode
constructor and queries about its topics and services.

Fixes #1605

Apparently, the topics and services that LifecycleNode provides are not
available immediately after creating a node.

Fix flaky tests by accounting for some delay between the LifecycleNode
constructor and queries about its topics and services.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@jacobperron jacobperron self-assigned this Mar 30, 2021
Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

rclcpp_lifecycle/test/test_lifecycle_node.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit inline, but I don't feel strongly about it. Seems good to me with green CI.

rclcpp_lifecycle/test/test_lifecycle_node.cpp Outdated Show resolved Hide resolved
Signed-off-by: Jacob Perron <jacob@openrobotics.org>
Signed-off-by: Jacob Perron <jacob@openrobotics.org>
Signed-off-by: Jacob Perron <jacob@openrobotics.org>
Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@jacobperron
Copy link
Member Author

jacobperron commented Mar 30, 2021

PR job failure looks unrelated (a test failure in rclcpp). Here's CI for rclcpp_lifecycle:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status (unrelated CMake warning)

@jacobperron jacobperron merged commit f986ca3 into master Mar 30, 2021
@delete-merged-branch delete-merged-branch bot deleted the jacob/fix_1605 branch March 30, 2021 21:42
@jacobperron
Copy link
Member Author

jacobperron commented Mar 30, 2021

I just noticed that this rclcpp test is also flaky due to a similar pattern of creating a ROS entity and then immediately checking that it exists:

auto publisher = node->create_publisher<test_msgs::msg::BasicTypes>(topic_name, qos);
// List should have one item
auto publisher_list = node->get_publishers_info_by_topic(fq_topic_name);
ASSERT_EQ(publisher_list.size(), (size_t)1);

Seems like this pattern is not reliable since the switch to CycloneDDS. I'm not sure we guarantee anywhere that the graph query API will contain entities that were created right before the query, even if they were created by the same node making the query.

@clalancette
Copy link
Contributor

Seems like this pattern is not reliable since the switch to CycloneDDS. I'm not sure we guarantee anywhere that the graph query API will contain entities that were created right before the query, even if they were created by the same node making the query.

Hm, good point. @eboasson if you get a chance, could you share some thoughts here? We don't have any guarantees around this, but the behavior is different than we used to see. So we're wondering if this is expected, and we should update the tests, or if it indicates an issue elsewhere.

@eboasson
Copy link
Contributor

eboasson commented Apr 1, 2021

@clalancette I checked, the reason is that the Cyclone RMW layer updates the graph layer asynchronously from the discovery thread. There's no particular reason the create/destroy subscription and publisher operations can't update the graph synchronously, one would just have to make sure to filter out any local entities in the discovery thread.

@clalancette
Copy link
Contributor

@clalancette I checked, the reason is that the Cyclone RMW layer updates the graph layer asynchronously from the discovery thread. There's no particular reason the create/destroy subscription and publisher operations can't update the graph synchronously, one would just have to make sure to filter out any local entities in the discovery thread.

Thanks for looking into it, it is appreciated. So it sounds this isn't unexpected, it is just different from what we had before. I guess in order to be robust to different implementations, the tests really shouldn't expect that the topics and services are immediately available. @jacobperron does that make sense to you?

@jacobperron
Copy link
Member Author

Makes sense to me. I'm working on adding API to rcl to make this kind of testing pattern easier, e.g. rcl_wait_for_publishers(). Stay tuned...

@jacobperron
Copy link
Member Author

Stay tuned...

ros2/rcl#907

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test_lifecycle_node tests
5 participants