Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock when calling get_unread_count() and destroying subscription #646

Closed
mauropasse opened this issue Nov 30, 2022 · 4 comments · Fixed by #657
Closed

Deadlock when calling get_unread_count() and destroying subscription #646

mauropasse opened this issue Nov 30, 2022 · 4 comments · Fixed by #657
Labels

Comments

@mauropasse
Copy link
Contributor

I've identified a deadlock when a subscription (having assigned an on_new_message_callback) is being destroyed, while is also getting a message from a publisher. I'm using the latest master branch of FastDDS.

Attached, an example to reproduce the issue and the resulting cleaned backtrace.

FastDDS-example-and-backtrace.zip

You can run the example with GDB, wait until it deadlocks, and ctrl+c to check where happened.

The issue seems to be in

return subscriber_info_->data_reader_->get_unread_count(true);

According to @jefferyyjhsu, one mutex belongs to rmw_fastrtps while the other to fast-dds.

This issue happens since this PR got merged: https://github.com/ros2/rmw_fastrtps/pull/633/files#diff-26246d1a82280de3805240670a0e83425979cd7598ac0eaad21db840ebab7d7dR118-R122

Besides this, I can also see that in custom_service_info.hpp , there is no check about if on_new_request_cb_ exists before calling it:

void
on_data_available(
eprosima::fastdds::dds::DataReader *) final
{
std::unique_lock<std::mutex> lock_mutex(on_new_request_m_);
auto unread_requests = get_unread_resquests();
if (0u < unread_requests) {
on_new_request_cb_(user_data_, unread_requests);
}
}

Let me know if you need any more details.

fujitatomoya added a commit to fujitatomoya/ros2_test_prover that referenced this issue Dec 5, 2022
@fujitatomoya
Copy link
Collaborator

I cannot reproduce the deadlock from sample code, but core crash as following.

stacktrace with debug build
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/root/ros2_ws/colcon_ws/install/prover_rclcpp/lib/prover_rclcpp/rmw_fastrtps_64'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fff50c71950 in ?? ()
[Current thread is 1 (Thread 0x7f2c32304640 (LWP 37624))]
(gdb) bt
#0  0x00007fff50c71950 in ?? ()
#1  0x000055df7a3cc575 in std::function<void (unsigned long)>::operator()(unsigned long) const (this=0x55df7bc2fda0, __args#0=1)
    at /usr/include/c++/11/bits/std_function.h:590
#2  0x000055df7a3c852c in rclcpp::SubscriptionBase::set_on_new_message_callback(std::function<void (unsigned long)>)::{lambda(unsigned long)#1}::operator()(unsigned long) const (__closure=0x55df7bc2fda0, number_of_messages=1) at /root/ros2_ws/colcon_ws/install/rclcpp/include/rclcpp/rclcpp/subscription_base.hpp:328
#3  0x000055df7a3dda9a in std::__invoke_impl<void, rclcpp::SubscriptionBase::set_on_new_message_callback(std::function<void (unsigned long)>)::{lambda(unsigned long)#1}&, unsigned long>(std::__invoke_other, rclcpp::SubscriptionBase::set_on_new_message_callback(std::function<void (unsigned long)>)::{lambda(unsigned long)#1}&, unsigned long&&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#4  0x000055df7a3d7741 in std::__invoke_r<void, rclcpp::SubscriptionBase::set_on_new_message_callback(std::function<void (unsigned long)>)::{lambda(unsigned long)#1}&, unsigned long>(rclcpp::SubscriptionBase::set_on_new_message_callback(std::function<void (unsigned long)>)::{lambda(unsigned long)#1}&, unsigned long&&) (__fn=...)
    at /usr/include/c++/11/bits/invoke.h:111
#5  0x000055df7a3d378a in std::_Function_handler<void (unsigned long), rclcpp::SubscriptionBase::set_on_new_message_callback(std::function<void (unsigned long)>)::{lambda(unsigned long)#1}>::_M_invoke(std::_Any_data const&, unsigned long&&) (__functor=..., __args#0=@0x7f2c32301ef0: 1) at /usr/include/c++/11/bits/std_function.h:290
#6  0x000055df7a3cc575 in std::function<void (unsigned long)>::operator()(unsigned long) const (this=0x55df7bc3cbe8, __args#0=1)
    at /usr/include/c++/11/bits/std_function.h:590
#7  0x000055df7a3cc871 in rclcpp::detail::cpp_callback_trampoline<void const*, unsigned long, void> (user_data=0x55df7bc3cbe8)
    at /root/ros2_ws/colcon_ws/install/rclcpp/include/rclcpp/rclcpp/detail/cpp_callback_trampoline.hpp:61
#8  0x00007f2c399a3dd5 in SubListener::on_data_available(eprosima::fastdds::dds::DataReader*) ()
   from /root/ros2_ws/colcon_ws/install/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#9  0x00007f2c38fe0961 in eprosima::fastdds::dds::DataReaderImpl::InnerDataReaderListener::on_data_available (this=0x55df7bc4ad28, writer_guid=..., first_sequence=..., 
    last_sequence=..., should_notify_individual_changes=@0x7f2c3230204e: false)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/subscriber/DataReaderImpl.cpp:904
#10 0x00007f2c38ec6a22 in eprosima::fastrtps::rtps::StatefulReader::NotifyChanges (this=0x55df7bc470d0, prox=0x55df7bc4faa0)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/reader/StatefulReader.cpp:1168
#11 0x00007f2c38ec666b in eprosima::fastrtps::rtps::StatefulReader::change_received (this=0x55df7bc470d0, a_change=0x55df7bc2d8a0, prox=0x55df7bc4faa0, 
    unknown_missing_changes_up_to=0) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/reader/StatefulReader.cpp:1099
#12 0x00007f2c38ec4bf8 in eprosima::fastrtps::rtps::StatefulReader::processDataMsg (this=0x55df7bc470d0, change=0x55df7bc0a730)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/reader/StatefulReader.cpp:597
#13 0x00007f2c38e0a429 in eprosima::fastrtps::rtps::StatefulWriter::intraprocess_delivery (this=0x55df7bc12ca0, change=0x55df7bc0a730, reader_proxy=0x55df7bc4e940)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/writer/StatefulWriter.cpp:473
#14 0x00007f2c38e0aef5 in eprosima::fastrtps::rtps::StatefulWriter::deliver_sample_to_intraprocesses (this=0x55df7bc12ca0, change=0x55df7bc0a730)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/writer/StatefulWriter.cpp:625
#15 0x00007f2c38e111dd in eprosima::fastrtps::rtps::StatefulWriter::deliver_sample_nts (this=0x55df7bc12ca0, cache_change=0x55df7bc0a730, group=..., 
    locator_selector=..., max_blocking_time=...) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/writer/StatefulWriter.cpp:2070
#16 0x00007f2c39299730 in eprosima::fastdds::rtps::FlowControllerImpl<eprosima::fastdds::rtps::FlowControllerSyncPublishMode, eprosima::fastdds::rtps::FlowControllerFifoSchedule>::add_new_sample_impl<eprosima::fastdds::rtps::FlowControllerSyncPublishMode> (this=0x55df7bab61a0, writer=0x55df7bc12ca0, change=0x55df7bc0a730, 
    max_blocking_time=...) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:1151
#17 0x00007f2c392968c6 in eprosima::fastdds::rtps::FlowControllerImpl<eprosima::fastdds::rtps::FlowControllerSyncPublishMode, eprosima::fastdds::rtps::FlowControllerFifoSchedule>::add_new_sample (this=0x55df7bab61a0, writer=0x55df7bc12ca0, change=0x55df7bc0a730, max_blocking_time=...)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:998
#18 0x00007f2c38e0a168 in eprosima::fastrtps::rtps::StatefulWriter::unsent_change_added_to_history (this=0x55df7bc12ca0, change=0x55df7bc0a730, max_blocking_time=...)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/writer/StatefulWriter.cpp:434
#19 0x00007f2c38e485c2 in eprosima::fastrtps::rtps::WriterHistory::notify_writer (this=0x55df7bc0ff20, a_change=0x55df7bc0a730, max_blocking_time=...)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/history/WriterHistory.cpp:123
--Type <RET> for more, q to quit, c to continue without paging--
#20 0x00007f2c38f8d581 in eprosima::fastrtps::rtps::WriterHistory::add_change_with_commit_hook<eprosima::fastdds::dds::DataWriterImpl::perform_create_new_change(eprosima::fastrtps::rtps::ChangeKind_t, void*, eprosima::fastrtps::rtps::WriteParams&, const InstanceHandle_t&)::<lambda(eprosima::fastdds::dds::DataWriterImpl::CacheChange_t&)> >(eprosima::fastrtps::rtps::CacheChange_t *, eprosima::fastrtps::rtps::WriteParams &, struct {...}, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1, 1000000000> > >) (this=0x55df7bc0ff20, a_change=0x55df7bc0a730, wparams=..., pre_commit=..., max_blocking_time=...)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/include/fastdds/rtps/history/WriterHistory.h:183
#21 0x00007f2c38f8d1c3 in eprosima::fastdds::dds::DataWriterHistory::add_pub_change_with_commit_hook<eprosima::fastdds::dds::DataWriterImpl::perform_create_new_change(eprosima::fastrtps::rtps::ChangeKind_t, void*, eprosima::fastrtps::rtps::WriteParams&, const InstanceHandle_t&)::<lambda(eprosima::fastdds::dds::DataWriterImpl::CacheChange_t&)> >(eprosima::fastrtps::rtps::CacheChange_t *, eprosima::fastrtps::rtps::WriteParams &, struct {...}, std::unique_lock<std::recursive_timed_mutex> &, const std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1, 1000000000> > > &) (this=0x55df7bc0ff20, change=0x55df7bc0a730, wparams=..., 
    pre_commit=..., lock=..., max_blocking_time=...) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/publisher/DataWriterHistory.hpp:139
#22 0x00007f2c38f86e70 in eprosima::fastdds::dds::DataWriterImpl::perform_create_new_change (this=0x55df7bc0fa70, change_kind=eprosima::fastrtps::rtps::ALIVE, 
    data=0x7f2c32302da0, wparams=..., handle=...) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/publisher/DataWriterImpl.cpp:947
#23 0x00007f2c38f87470 in eprosima::fastdds::dds::DataWriterImpl::create_new_change_with_params (this=0x55df7bc0fa70, changeKind=eprosima::fastrtps::rtps::ALIVE, 
    data=0x7f2c32302da0, wparams=...) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/publisher/DataWriterImpl.cpp:1021
#24 0x00007f2c38f86709 in eprosima::fastdds::dds::DataWriterImpl::create_new_change (this=0x55df7bc0fa70, changeKind=eprosima::fastrtps::rtps::ALIVE, 
    data=0x7f2c32302da0) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/publisher/DataWriterImpl.cpp:868
#25 0x00007f2c38f84fe4 in eprosima::fastdds::dds::DataWriterImpl::write (this=0x55df7bc0fa70, data=0x7f2c32302da0)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/publisher/DataWriterImpl.cpp:548
#26 0x00007f2c38f7ece1 in eprosima::fastdds::dds::DataWriter::write (this=0x55df7bc096d0, data=0x7f2c32302da0)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/publisher/DataWriter.cpp:84
#27 0x00007f2c399ed445 in rmw_fastrtps_shared_cpp::__rmw_publish(char const*, rmw_publisher_s const*, void const*, rmw_publisher_allocation_s*) ()
   from /root/ros2_ws/colcon_ws/install/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#28 0x00007f2c39ac6213 in rmw_publish (publisher=0x55df7bc06950, ros_message=0x7f2c32303310, allocation=0x0)
    at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_publish.cpp:34
#29 0x00007f2c3a291f77 in rcl_publish (publisher=0x55df7bc06690, ros_message=0x7f2c32303310, allocation=0x0)
    at /root/ros2_ws/colcon_ws/src/ros2/rcl/rcl/src/rcl/publisher.c:249
#30 0x000055df7a3d1ab6 in rclcpp::Publisher<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> >::do_inter_process_publish (this=0x55df7ba4b730, 
    msg=...) at /root/ros2_ws/colcon_ws/install/rclcpp/include/rclcpp/rclcpp/publisher.hpp:453
#31 0x000055df7a3ce850 in rclcpp::Publisher<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> >::publish<std_msgs::msg::String_<std::allocator<void> > > (this=0x55df7ba4b730, msg=...) at /root/ros2_ws/colcon_ws/install/rclcpp/include/rclcpp/rclcpp/publisher.hpp:297
#32 0x000055df7a3caf3d in MinimalPublisher::MinimalPublisher(rclcpp::NodeOptions)::{lambda()#1}::operator()() const (__closure=0x55df7bc141a0)
    at /root/ros2_ws/colcon_ws/src/ros2/ros2_test_prover/prover_rclcpp/src/rmw_fastrtps_646.cpp:18
#33 0x000055df7a435b80 in rclcpp::GenericTimer<MinimalPublisher::MinimalPublisher(rclcpp::NodeOptions)::{lambda()#1}, (void*)0>::execute_callback_delegate<MinimalPublisher::MinimalPublisher(rclcpp::NodeOptions)::{lambda()#1}, (void*)0>() (this=0x55df7bc14170) at /root/ros2_ws/colcon_ws/install/rclcpp/include/rclcpp/rclcpp/timer.hpp:244
#34 0x000055df7a432e87 in rclcpp::GenericTimer<MinimalPublisher::MinimalPublisher(rclcpp::NodeOptions)::{lambda()#1}, (void*)0>::execute_callback() (
    this=0x55df7bc14170) at /root/ros2_ws/colcon_ws/install/rclcpp/include/rclcpp/rclcpp/timer.hpp:230
#35 0x00007f2c3acbdb95 in rclcpp::Executor::execute_timer (timer=warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<rclcpp::WallTimer<MinimalPublisher::MinimalPublisher(rclcpp::NodeOptions)::{lambda()#1}, (void*)0>, std::allocator<rclcpp::WallTimer<MinimalPublisher::MinimalPublisher(rclcpp::NodeOptions)::{lambda()#1}, (void*)0> >, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<rclcpp::WallTimer<MinimalPublisher::MinimalPublisher(rclcpp::NodeOptions)::{lambda()#1}, (void*)0>, std::allocator<rclcpp::WallTimer<MinimalPublisher::MinimalPublisher(rclcpp::NodeOptions)::{lambda()#1}, (void*)0> >, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr<rclcpp::TimerBase> (use count 3, weak count 1) = {...})
    at /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:656
#36 0x00007f2c3acbc5c7 in rclcpp::Executor::execute_any_executable (this=0x55df7bbe4a50, any_exec=...)
    at /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:518
#37 0x00007f2c3accd8a1 in rclcpp::executors::SingleThreadedExecutor::spin (this=0x55df7bbe4a50)
    at /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executors/single_threaded_executor.cpp:37
#38 0x000055df7a3c4d2e in operator() (__closure=0x55df7bc144c8) at /root/ros2_ws/colcon_ws/src/ros2/ros2_test_prover/prover_rclcpp/src/rmw_fastrtps_646.cpp:63
#39 0x000055df7a3c5558 in std::__invoke_impl<void, main(int, char**)::<lambda()> >(std::__invoke_other, struct {...} &&) (__f=...)
--Type <RET> for more, q to quit, c to continue without paging--
    at /usr/include/c++/11/bits/invoke.h:61
#40 0x000055df7a3c551b in std::__invoke<main(int, char**)::<lambda()> >(struct {...} &&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#41 0x000055df7a3c54c8 in std::thread::_Invoker<std::tuple<main(int, char**)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x55df7bc144c8)
    at /usr/include/c++/11/bits/std_thread.h:253
#42 0x000055df7a3c549c in std::thread::_Invoker<std::tuple<main(int, char**)::<lambda()> > >::operator()(void) (this=0x55df7bc144c8)
    at /usr/include/c++/11/bits/std_thread.h:260
#43 0x000055df7a3c5480 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<main(int, char**)::<lambda()> > > >::_M_run(void) (this=0x55df7bc144c0)
    at /usr/include/c++/11/bits/std_thread.h:211
#44 0x00007f2c3a0d92b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#45 0x00007f2c39e49b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#46 0x00007f2c39edba00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

@mauropasse
Copy link
Contributor Author

Hi @fujitatomoya, yes I forgot to mention. This example I created to reproduce easily the error you just got. The fix is ros2/rclcpp#2024 .
With that fix applied, then you should get the deadlock I mention in this issue.
Thanks for trying!

@MiguelCompany
Copy link
Collaborator

@mauropasse Please see #657 for a possible fix of the deadlock.

@mauropasse
Copy link
Contributor Author

Hi, I tested it today and it fixes the issue!
Thanks @MiguelCompany

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants