Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use heterogenous lookup for hash index #796

Merged
merged 9 commits into from
Mar 16, 2020
Merged

Use heterogenous lookup for hash index #796

merged 9 commits into from
Mar 16, 2020

Conversation

lava
Copy link
Member

@lava lava commented Mar 11, 2020

Some notes:

  • I tried to get the transparent_unordered_map to work, but it breaks serialization due to the different key type. Since the API was a bit weird anyways, I gave up on that approach.
  • The unit tests are actually failing because hash<data>() and hash<data_view>() give different result, not completely sure why yet.
  • Apart from that, lookups with and without hash index now perform at the same speed in my benchmarking, but hash indices still don't make things faster. That said, I don't know if performance was the main motivation for introducing hash indices, and I didn't go beyond the obvious optimizations so there is probably still room for further improvements.
  • The hash map is not properly integrated into the build system yet; locally I hard-coded the include to my global CXXFLAGS.

@lava lava force-pushed the story/ch13760 branch 6 times, most recently from 88255af to 677e1e8 Compare March 12, 2020 15:05
@lava lava marked this pull request as ready for review March 12, 2020 15:06
@lava
Copy link
Member Author

lava commented Mar 12, 2020

Note: I highly recommend to look at the commits individually when reviewing.

@lava lava requested review from tobim, dominiklohmann and mavam and removed request for tobim March 12, 2020 15:06
libvast/vast/view.hpp Show resolved Hide resolved
Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor things.

Also, why not add robin-map as a submodule instead? Copying over everything makes keeping track of new releases quite hard.

Edit: A changelog entry seems to be missing as well.

libvast/test/view.cpp Show resolved Hide resolved
libvast/vast/hash_index.hpp Outdated Show resolved Hide resolved
@lava lava force-pushed the story/ch13760 branch 2 times, most recently from b960a51 to adf2f3f Compare March 13, 2020 13:21
@lava
Copy link
Member Author

lava commented Mar 13, 2020

Also, why not add robin-map as a submodule instead? Copying over everything makes keeping track of new releases quite hard.

Edit: A changelog entry seems to be missing as well.

I updated the PR to add robin-map as a subtree and added a CHANGELOG entry.

lava and others added 9 commits March 16, 2020 11:39
This reverts commits 92a17df
and 33d441c (which were
themselves reverting the introduction of the hash index), with
some minor additional conflict resolution.
git-subtree-dir: aux/robin-map
git-subtree-split: e4316eb284d87f37b93334808bd4aa7e7d819c85
The library was selected because it is stand-alone, reasonably
fast [1], a smaller dependency than competing implementations
from e.g. abseil or folly, and published under a permissive license.

  [1] https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html
The previously used `steady_map` had disastrous lookup and
insertion performance, resulting in wait times from several
minutes to several hours (!) when importing non-tiny data sets.

After switching to a hash map implementation supporting heterogenous
lookups instead, we observe import times that are on par with the
regular index.
Adjust the hash computation for data view to give
the same results as the hash for data by default.
@lava lava merged commit 8d8d002 into master Mar 16, 2020
@lava lava deleted the story/ch13760 branch March 16, 2020 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants