Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor performance of SDF Element Pointer Construction #1478

Open
arjo129 opened this issue Sep 5, 2024 · 7 comments
Open

Poor performance of SDF Element Pointer Construction #1478

arjo129 opened this issue Sep 5, 2024 · 7 comments
Labels
bug Something isn't working major

Comments

@arjo129
Copy link

arjo129 commented Sep 5, 2024

Environment

  • OS Version: Ubuntu 22.04
  • Source

Description

  • Expected behavior: Destroying sdf::Element should be relatively fast.
  • Actual behavior: sdf::Element takes forever to destroy.

Steps to reproduce

Here is a minimal program:

#include <sdf/Model.hh>
#include <sdf/Root.hh>

void deserialize()
{
    std::string sdf = "<?xml version=\"1.0\" ?><sdf version='1.6'></sdf>";
    sdf::Root root;
    sdf::Errors errors = root.LoadSdfString(sdf);
    if (!root.Model())
    {
        return;
    }
    sdf::Model _model = *root.Model();
    return;
}

int main(int argc, char **argv)
{
    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
    deserialize();
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();

    std::cout << "Whole function call took: " <<
        std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count()
        << "[µs]" << std::endl;
}

On my 8 core intel i7-11850H @ 2.50GHz with 64GB of ram I get the following slow speed:

Whole function call took: 108759[µs]

I.E: It takes 0.1s to parse, validate and destroy what is effectively an empty xml file. We use this function extensively in Gazebo. For instance, when we serialize and deserialize model components. In fact this exact set of steps runs everytime a model is deserialized, hence when we try to run 3k_worlds.sdf where we load 3k shapes, we end up spending 300s just to deserialize the initial scene to the user. In fact when we run perf on the simulated world during, we see a large amount of time being spent inside the sdf library. Additionally, if we were to exit early (as done in this hack) instead of trying to deserialize we see a massive speedup (22s to load vs 300s, I suspect there is additional overhead caused by this bug inside renderutils as well). There are other places where we create and destroy, sdf elements within gazebo fairly frequently, I think its important we address this issue as it can lead to a much better user experience for end-users.

Output

@arjo129 arjo129 added bug Something isn't working major labels Sep 5, 2024
@arjo129 arjo129 changed the title Really poor performance of SDF Element Pointer Destruction Poor performance of SDF Element Pointer Destruction Sep 5, 2024
@luca-della-vedova
Copy link
Member

I wonder, do you have a flame graph for this test as well? It would be nice to figure out which function calls are the bottleneck (your comment implies that it's the destruction?)

@arjo129
Copy link
Author

arjo129 commented Sep 6, 2024

Sure. Here it is: https://drive.google.com/file/d/1TS4VsLKVfTsSNSTF0d-q88Ndq2tMXG_t/view?usp=drive_link

I seem to be missing some frame information, not sure exactly why but, the destructor does take a lot of time.

@luca-della-vedova
Copy link
Member

I'm not really an expert on sdformat but it seems that the bulk of the time spent in the destruction of XML Elements is actually spent in... Construction of XML Elements? It would be interesting to setup a breakpoint and step through the code to understand what is the path that leads to construction within the destructor

@arjo129 arjo129 changed the title Poor performance of SDF Element Pointer Destruction Poor performance of SDF Element Pointer Construction Sep 6, 2024
@arjo129
Copy link
Author

arjo129 commented Sep 6, 2024

I've been unable to find a way to reproduce the destructor issues. It is possible that the flamegraph is not 100% accurate. However, based on GDB, I've found that the construction itself is very expensive as we reparse the specification every single time. By cacheing the initial parse of the spec as done in #1479 , we can significantly improve performance of repeated construction.

@scpeters
Copy link
Member

scpeters commented Sep 9, 2024

From reading through the SDFormat parsing code, I see that there is a critical loop in the readXml function. The readXml function is an internal function called by the readDoc functions and called by itself recursively. An ElementPtr _sdf argument is passed to readXml, and a for loop iterates over the child elements of _sdf. Each child element with a matching SDF description is then cloned with Element::Clone and then passed recursively to readXml.

So Element::Clone is worth some scrutiny, and as implied by the name, it includes some dynamic memory allocation. There are a few places where we could call std::vector::reserve before calling vector::push_back in a loop (see #1480), which would help a small amount. Ultimately, I think a bigger issue is that each Element stores a copy of its schema in private member variables like elementDescriptions and attributes rather than holding a reference to a canonical representation of the schema. Changing this would be a bigger endeavor but could reduce the amount of memory allocations needed during Element::Clone.

@arjo129
Copy link
Author

arjo129 commented Sep 9, 2024

Hey Steve, I agree that Element::Clone() also needs some scrutiny. I also noticed that we reparse attributes after cloning. I've got a draft PR up for that.

@scpeters
Copy link
Member

scpeters commented Sep 9, 2024

Hey Steve, as I agree that Element::Clone() also needs some scrutiny. I also noticed that we reparse attributes after cloning. I've got a draft PR up for that.

yeah, thanks I just noticed your PR (#1479) after posting my comment, and because I didn't look into the calls to Param::SetParentElement in Element::Clone, I didn't notice that Param::SetParentElement calls Param::Reparse, which you are tinkering with in #1479. Note also that the Param::Reparse method was added in #589.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working major
Projects
Status: Inbox
Development

No branches or pull requests

3 participants