Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add in a series of micro-optimizations in rcutils #374

Merged
merged 8 commits into from
Aug 17, 2022

Conversation

clalancette
Copy link
Contributor

@clalancette clalancette commented Aug 14, 2022

These were all discovered as part of optimizing the logging implementation. By themselves, none of these changes make a huge change to performance, but when combined with some of the other optimizations (like in #372 and another upcoming optimization), they can make a small but measurable difference.

The optimizations are as follows:

  1. Make hash_map_find() use a shift rather than a divide to find modulo (only works with power-of-two capacities).
  2. When increasing the size of a char_array, add at least 1.5x the size to avoid another reallocation soon after.
  3. If the hash_map is empty, don't bother computing the key hash when we don't need it. In the corner case of an empty hash_map, the key hash computation can actually dominate the computation time.
  4. When using glibc, use the memrchr function that is available with _GNU_SOURCE. This uses optimized instructions where possible. On all other platforms, fall back to the old implementation.
  5. Make a couple of global variables in logging static. This should allow the optimizer more freedom. (this is honestly more a code-cleanliness change rather than performance, but I included it here anyway)

That is, avoid a modulus operator (which uses an expensive divide)
and instead use AND.  The downside is that this only works for
power-of-two capacities, which we now enforce in this implementation.

It turns out that we only use the hash_map in one place (in rcl
logging), and that was already a power-of-two so this restriction
should have limited practical effect.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
This is a pretty common optimization and reduces the number
of times we invoke the reallocation logic.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
That is, for unset(), get(), key_exists(), and next_key(), if the
hash map is empty we can skip the potentially expensive step of
computing the key hash.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
In particular, when using glibc we can use memrchr, which is
a faster version of what we are looking to do.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
We have functions that can manipulate it, so there is no
reason to expose it directly.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
There is no reason to expose it publically.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
@clalancette
Copy link
Contributor Author

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@gavanderhoorn

This comment was marked as off-topic.

@clalancette
Copy link
Contributor Author

Edit: /ignore: the divisor is not known at compile-time. That prevents any such optimisation.

Yeah, exactly. I should point out that I put in these optimizations as I was actually profiling logging in rcutils. In this particular case, I saw that the divide was taking up a not-insignificant amount of time, which is why I added the optimization here.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
@fujitatomoya
Copy link
Collaborator

@ros-pull-request-builder retest this please

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
@fujitatomoya
Copy link
Collaborator

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@clalancette
Copy link
Contributor Author

Thanks for the detailed reviews! With approvals and CI being green, I'm going to go ahead and merge this one.

@clalancette clalancette merged commit 53f952f into rolling Aug 17, 2022
@clalancette clalancette deleted the clalancette/micro-optimizations branch August 17, 2022 12:40
@@ -156,7 +156,7 @@ rcutils_get_zero_initialized_hash_map();
* ```
*
* \param[inout] hash_map rcutils_hash_map_t to be initialized
* \param[in] initial_capacity the amount of initial capacity for the hash_map
* \param[in] initial_capacity the amount of initial capacity for the hash_map - this must be greater than zero and a power of 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a silly thing to do, now everyone that calls it has to do this calculation (it isn't trivial btw). Instead this function should take this as a request and document that it's rounding up to the nearest power of 2, not require the user to do this...

See what @methylDragon is having to do in his new pr:

https://github.com/ros2/rosidl/pull/728/files#diff-f7fe239aa4ebdf12a882356e1ef293b766aced118d5a9b59e487efb08f08c41aR45-R57

btw, the example, a few lines above uses 10, which isn't a power of two 🤦‍♂️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a silly thing to do, now everyone that calls it has to do this calculation (it isn't trivial btw). Instead this function should take this as a request and document that it's rounding up to the nearest power of 2, not require the user to do this...

That's fair. There were only two users of this before, and they were both trivial to fix, so I just did it this way. But I'd be fine modifying this so it does the calculation internally.

btw, the example, a few lines above uses 10, which isn't a power of two man_facepalming

I just missed that example in the documentation. All of the code does it properly. I can fix that as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #420

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants