-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add in a series of micro-optimizations in rcutils #374
Conversation
That is, avoid a modulus operator (which uses an expensive divide) and instead use AND. The downside is that this only works for power-of-two capacities, which we now enforce in this implementation. It turns out that we only use the hash_map in one place (in rcl logging), and that was already a power-of-two so this restriction should have limited practical effect. Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
This is a pretty common optimization and reduces the number of times we invoke the reallocation logic. Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
That is, for unset(), get(), key_exists(), and next_key(), if the hash map is empty we can skip the potentially expensive step of computing the key hash. Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
In particular, when using glibc we can use memrchr, which is a faster version of what we are looking to do. Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
We have functions that can manipulate it, so there is no reason to expose it directly. Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
There is no reason to expose it publically. Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
This comment was marked as off-topic.
This comment was marked as off-topic.
Yeah, exactly. I should point out that I put in these optimizations as I was actually profiling logging in rcutils. In this particular case, I saw that the divide was taking up a not-insignificant amount of time, which is why I added the optimization here. |
Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
@ros-pull-request-builder retest this please |
Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
Thanks for the detailed reviews! With approvals and CI being green, I'm going to go ahead and merge this one. |
@@ -156,7 +156,7 @@ rcutils_get_zero_initialized_hash_map(); | |||
* ``` | |||
* | |||
* \param[inout] hash_map rcutils_hash_map_t to be initialized | |||
* \param[in] initial_capacity the amount of initial capacity for the hash_map | |||
* \param[in] initial_capacity the amount of initial capacity for the hash_map - this must be greater than zero and a power of 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a silly thing to do, now everyone that calls it has to do this calculation (it isn't trivial btw). Instead this function should take this as a request and document that it's rounding up to the nearest power of 2, not require the user to do this...
See what @methylDragon is having to do in his new pr:
btw, the example, a few lines above uses 10, which isn't a power of two 🤦♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a silly thing to do, now everyone that calls it has to do this calculation (it isn't trivial btw). Instead this function should take this as a request and document that it's rounding up to the nearest power of 2, not require the user to do this...
That's fair. There were only two users of this before, and they were both trivial to fix, so I just did it this way. But I'd be fine modifying this so it does the calculation internally.
btw, the example, a few lines above uses 10, which isn't a power of two man_facepalming
I just missed that example in the documentation. All of the code does it properly. I can fix that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #420
These were all discovered as part of optimizing the logging implementation. By themselves, none of these changes make a huge change to performance, but when combined with some of the other optimizations (like in #372 and another upcoming optimization), they can make a small but measurable difference.
The optimizations are as follows:
memrchr
function that is available with _GNU_SOURCE. This uses optimized instructions where possible. On all other platforms, fall back to the old implementation.