-
-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use heterogenous lookup for hash index #796
Conversation
88255af
to
677e1e8
Compare
Note: I highly recommend to look at the commits individually when reviewing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor things.
Also, why not add robin-map as a submodule instead? Copying over everything makes keeping track of new releases quite hard.
Edit: A changelog entry seems to be missing as well.
b960a51
to
adf2f3f
Compare
I updated the PR to add robin-map as a subtree and added a CHANGELOG entry. |
git-subtree-dir: aux/robin-map git-subtree-split: e4316eb284d87f37b93334808bd4aa7e7d819c85
The library was selected because it is stand-alone, reasonably fast [1], a smaller dependency than competing implementations from e.g. abseil or folly, and published under a permissive license. [1] https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html
The previously used `steady_map` had disastrous lookup and insertion performance, resulting in wait times from several minutes to several hours (!) when importing non-tiny data sets. After switching to a hash map implementation supporting heterogenous lookups instead, we observe import times that are on par with the regular index.
Adjust the hash computation for data view to give the same results as the hash for data by default.
Some notes:
transparent_unordered_map
to work, but it breaks serialization due to the different key type. Since the API was a bit weird anyways, I gave up on that approach.hash<data>()
andhash<data_view>()
give different result, not completely sure why yet.