-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HashMap starts the search even if the map is empty #38880
Comments
That function checks for zero capacity the first thing it does. The only work that is done between the Now, if the map was empty (but had nonzero capacity) we'd have a problem. In Ferris' example, were hashmaps with nonzero capacity but zero length used? You can make that happen with Bailing out on |
How much "small" is the negative impact in the regular case needs to be benchmarked. |
Yes. It's probably negligible. |
@Manishearth That really depends on the use case. If you have a large vector of mostly empty Anyways, wasn't there some talk about falling back on linear search/storage for small maps/sets? That would probably be more useful in practice. |
I wanted to move the capacity check from search_hashed up in the call stack, notably to the get* and contain methods (and there it can become a len = 0 check, solving this), as entry and insert reserve at least 1 space upfront anyway (so it's never really capacity 0). This saves 1 branch offsetting the extra ones required by #38368 . Probably not worth the trouble though :) |
So I've had a look into this issue.
Computing the hash can be quite costly, even for integer keys on an empty
I think it should be possible to make an early
This works well for small collections (e.g. integer keys up to about 16 items), however the problem is that different types require different amounts of time to compute the hash, and and also different amounts of time to compute equality. This is not necessarily consistent between values (e.g. strings or lists). I think I can implement this for types with simple equality checks, but it cannot be used in general. (The empty check can be used in general, so I propose we implement both.) |
This addresses issue rust-lang#38880
…ap-38880, r=arthurprs Early exit for empty HashMap (issue rust-lang#38880) Addresses issue rust-lang#38880 by checking if the HashMap is empty before computing the value of the hash. Before (integer keys) ``` running 4 tests test empty_once ... bench: 13 ns/iter (+/- 0) test empty_100 ... bench: 1,367 ns/iter (+/- 35) test exist_once ... bench: 14 ns/iter (+/- 0) test exist_100 ... bench: 1,518 ns/iter (+/- 40) ``` After ``` running 4 tests test empty_once ... bench: 2 ns/iter (+/- 0) test empty_100 ... bench: 221 ns/iter (+/- 0) test exist_once ... bench: 15 ns/iter (+/- 0) test exist_100 ... bench: 1,515 ns/iter (+/- 92) ``` When the HashMap is not empty, the performance remains the same, and when it is empty the performance is significantly improved.
Given that #48035 landed, is there anything else pending from this ticket? |
For this issue
there is nothing left to do. For the linear search
I don't know if it is possible: The time to compute equality and the time to compute a hash varies with different types (these two time determine the threshold at which it becomes faster to hash than to linear search). Not only that, but moreover, it could vary with different architectures. This means that determining the threshold is probably unfeasible in any kind of generality. (You might be able to say that for a particular processor architecture, when the entire map is in the L1 cache, and your keys are 32-bit integers, then it is faster to do a linear search if the size of your map is up to 27 entries (for example), and after that, it is faster to hash the integer key and do a regular lookup. However, if any of these assumptions change then it might not be possible to guarantee a faster lookup for a map with 27 entries. The one feature that you really want from your HashMap is that the lookup time is bounded (with a certain probability) and it's very hard to do a linear search and guarantee that you will be at least as fast as a hash-lookup.) The only thing that could certainly be implemented is if the hashmap contains only one element, then just do a single comparison rather than a hash lookup and then the comparison. i.e. hashmaps with size 1 don't need to compute the hash. |
I didn't realize this issue wasn't closed by #48035, and I also think this should be closed since it would be more appropriate to open a new issue for future optimizations. |
Yes, I agree. It wasn't closed because I wrote "Addresses issue #38880" in the pull request not "Fixes #38880" which would have caused GitHub to auto close it. (At the time I thought linear search was more feasible and might be implemented as an extension to the empty-hashmap case, and so it seemed like it could make sense for the issue to remain open a little while longer. Clearly that was not to be.) |
It was mentioned in the latest Ferris Makes Emulators episode that
contains
showed up in the profiling results, although a lot of the times the map itself was empty.I think the extra checking is worth it before starting the search and that way the hashing is also skipped.
The text was updated successfully, but these errors were encountered: