pdp discovery coredump #3770

Closed
1 task done
midding opened this issue Aug 2, 2023 · 2 comments
Labels: in progress, need more info

Comments

midding commented Aug 2, 2023

Is there an already existing issue for this?

  • I have searched the existing issues

Expected behavior

PDPListener::onNewCacheChangeAdded
{
    ...
    std::unique_lock<std::recursive_mutex> lock(*parent_pdp_->getMutex());
    ...
    if (pdata == nullptr)
    {
        ...
        parent_pdp_->assignRemoteEndpoints(pdata);  // use parent_pdp_ while the lock is still held
        ...
        lock.unlock();
    }
    else
    {
        pdata->updateData(temp_participant_data_);
        ...
        lock.unlock();
    }
    ...
}

However, if I change it this way, deadlocks can occur when communicating within a single process. I hope there is a good solution for this, thanks.
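One common way to avoid both the race and the deadlock is to copy the shared data while the mutex is held and then call into other components with only that private copy. The sketch below illustrates this pattern in isolation. `Registry`, `ParticipantInfo`, `Callback` and `on_new_change` are hypothetical stand-ins invented for this illustration. They are not Fast DDS API, and this is not necessarily how the library resolves the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for per-participant discovery data (not a Fast DDS type).
struct ParticipantInfo
{
    std::string name;
    int version;
};

// Hypothetical stand-in for the shared state guarded by the PDP mutex.
class Registry
{
public:
    // Inserts or updates an entry and returns a private copy of it.
    // The shared map is only touched while the mutex is held; the copy is
    // made by the return statement before the guard releases the mutex.
    ParticipantInfo upsert_and_snapshot(const std::string& key, const ParticipantInfo& incoming)
    {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        ParticipantInfo& stored = table_[key];
        stored = incoming;
        return stored;
    }

    std::size_t size()
    {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        return table_.size();
    }

private:
    std::recursive_mutex mutex_;
    std::map<std::string, ParticipantInfo> table_;
};

// Hypothetical callback that may take other locks internally.
using Callback = std::function<void(const ParticipantInfo&)>;

void on_new_change(Registry& registry, const std::string& key,
        const ParticipantInfo& incoming, const Callback& notify)
{
    assert(static_cast<bool>(notify));
    // Step 1: mutate shared state and take a snapshot under the lock.
    ParticipantInfo snapshot = registry.upsert_and_snapshot(key, incoming);
    // Step 2: the lock is released here, so the callback cannot deadlock on it,
    // and it only reads the private snapshot, never the shared entry.
    notify(snapshot);
}
```

The trade-off is an extra copy per notification, and the callback may observe data that is already slightly stale. Whether that is acceptable for `assignRemoteEndpoints` depends on what that function needs from the participant data.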

Current behavior

PDPListener::onNewCacheChangeAdded
{
    ...
    std::unique_lock<std::recursive_mutex> lock(*parent_pdp_->getMutex());
    ...
    if (pdata == nullptr)
    {
        ...
        lock.unlock();
        ...
        parent_pdp_->assignRemoteEndpoints(pdata);  // use parent_pdp_ outside the lock
        ...
    }
    else
    {
        pdata->updateData(temp_participant_data_);  // update parent_pdp_ inside the lock
        ...
        lock.unlock();
    }
    ...
}

Steps to reproduce

uint32_t times = 1000;
while (times--) {
    DomainParticipant* participant_ =
        DomainParticipantFactory::get_instance()->create_participant_with_profile(0, "participant_profile51");
    if (participant_ == nullptr) {
        continue;
    }
    TypeSupport m_type(new HelloWorldPubSubType());
    m_type.register_type(participant_);
    evbs::edds::dds::Subscriber* subscriber_ = participant_->create_subscriber(SUBSCRIBER_QOS_DEFAULT, nullptr);
    if (subscriber_ == nullptr) {
        continue;
    }
    Topic* topic_(participant_->create_topic("HelloWorldTopic", "HelloWorld", TOPIC_QOS_DEFAULT));
    if (topic_ == nullptr) {
        continue;
    }
    DataReader* reader_(subscriber_->create_datareader(topic_, DATAREADER_QOS_DEFAULT));
    if (reader_ == nullptr) {
        continue;
    }
    participant_->delete_contained_entities();
    DomainParticipantFactory::get_instance()->delete_participant(participant_);
}

The PDP concurrency problem can be reproduced by running the above code in multiple processes.
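To run the same loop in several processes at once, a small POSIX launcher along these lines could be used. This is only a sketch: `run_in_processes` is a hypothetical helper, and `work` stands in for the reproducer loop above. It relies on `fork`, so it assumes a POSIX platform such as the Ubuntu system reported here, and it assumes `work` creates its participants inside the child, because forking a process that already has running threads is unsafe.

```cpp
#include <cassert>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Runs `work` in `count` concurrent child processes (POSIX only) and
// returns how many of them exited normally with status 0.
// A child killed by a signal (for example a crash) is not counted.
int run_in_processes(int count, const std::function<int()>& work)
{
    assert(count >= 0);
    std::vector<pid_t> children;
    for (int i = 0; i < count; ++i)
    {
        pid_t pid = fork();
        if (pid == 0)
        {
            // Child: run the workload and exit immediately so that it never
            // returns into the parent's code path.
            _exit(work());
        }
        if (pid > 0)
        {
            children.push_back(pid);
        }
        // pid < 0 means fork failed; that child is simply not started.
    }

    int clean_exits = 0;
    for (pid_t pid : children)
    {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0)
        {
            ++clean_exits;
        }
    }
    return clean_exits;
}
```

Comparing the returned count with the number of processes requested gives a quick signal that at least one child crashed or failed.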

Fast DDS version/commit

latest master

Platform/Architecture

Ubuntu Focal 20.04 amd64

Transport layer

Default configuration, UDPv4 & SHM

Additional context

No response

XML configuration file

<?xml version='1.0' encoding='utf-8'?>
<dds>
  <profiles>
    <transport_descriptors>
      <transport_descriptor>
        <transport_id>Udp4LoTransport</transport_id>
        <type>UDPv4</type>
        <maxMessageSize>65000</maxMessageSize>
        <maxInitialPeersRange>43</maxInitialPeersRange>
        <sendBufferSize>65000</sendBufferSize>
        <receiveBufferSize>65000</receiveBufferSize>
        <TTL>10</TTL>
      </transport_descriptor>
    </transport_descriptors>
    <participant profile_name="participant_profile51">
      <domainId>0</domainId>
      <rtps>
        <useBuiltinTransports>false</useBuiltinTransports>
        <userTransports>
          <transport_id>Udp4LoTransport</transport_id>
        </userTransports>
        <builtin>
          <discovery_config>
            <discoveryProtocol>SIMPLE</discoveryProtocol>
            <EDP>STATIC</EDP>
            <static_edp_xml_config>file://participant_profile51WrapperDiscoverConfig.xml</static_edp_xml_config>
          </discovery_config>
        </builtin>
        <participantID>1</participantID>
        <name>participant_51</name>
      </rtps>
    </participant>
  </profiles>
</dds>

<?xml version='1.0' encoding='utf-8'?>
<staticdiscovery>
  <participant local_name="participant_51">
    <name>participant_51</name>
    <reader>
      <userId>110</userId>
      <entityID>10</entityID>
      <topicName>HelloWorldTopic</topicName>
      <topicDataType>HelloWorld</topicDataType>
      <topicKind>NO_KEY</topicKind>
    </reader>
  </participant>
</staticdiscovery>

Relevant log output

No response

Network traffic capture

No response

midding added the triage label on Aug 2, 2023
@Mario-DL
Member

Hi @midding
Thanks for using Fast DDS and for the report. We were already aware of the issue and we recently introduced several fixes in this regard. Please check if the issue persists with #4220.

I tested with the reproducer you provided and I can no longer reproduce the race condition across multiple processes.

Mario-DL added the in progress and need more info labels and removed the triage label on Jan 31, 2024
@Mario-DL
Member

I'm afraid that, in accordance with our CONTRIBUTING.md guidelines, I am closing this issue due to inactivity. In addition, we recently merged the PR fixing a PDP data race in this regard.
Please, feel free to reopen it if necessary.
