Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Optimize the is_permutation family and _Hash::operator== for multicon…
…tainers (microsoft#423) * Optimize the is_permutation family and _Hash::operator== for multicontaniers slightly. <xutility> 4660: _Find_pr is a helper for is_permutation, so move it down to that area. 4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions. 4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized. 4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format. 4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment. 4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us. <xhash> 1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it. 1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts. Improvements over the old implementation: * For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large. * Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths. * We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range. The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range. Performance results: ``` Benchmark Before (ns) After (ns) Percent Better HashRandomUnequal<unordered_multimap>/1 18.7 11.7 59.83% HashRandomUnequal<unordered_multimap>/10 137 97 41.24% HashRandomUnequal<unordered_multimap>/100 1677 1141 46.98% HashRandomUnequal<unordered_multimap>/512 10386 7036 47.61% HashRandomUnequal<unordered_multimap>/4096 173807 119391 45.58% HashRandomUnequal<unordered_multimap>/32768 2898405 1529710 89.47% HashRandomUnequal<unordered_multimap>/100000 27441112 18602792 47.51% HashRandomUnequal<hash_multimap>/1 18.9 11.8 60.17% HashRandomUnequal<hash_multimap>/10 138 101 36.63% HashRandomUnequal<hash_multimap>/100 1613 1154 39.77% HashRandomUnequal<hash_multimap>/512 10385 7178 44.68% HashRandomUnequal<hash_multimap>/4096 171718 120115 42.96% HashRandomUnequal<hash_multimap>/32768 3352231 1510245 121.97% HashRandomUnequal<hash_multimap>/100000 26532471 19209741 38.12% HashRandomEqual<unordered_multimap>/1 16 9.4 70.21% HashRandomEqual<unordered_multimap>/10 126 89.2 41.26% HashRandomEqual<unordered_multimap>/100 1644 1133 45.10% HashRandomEqual<unordered_multimap>/512 10532 7183 46.62% HashRandomEqual<unordered_multimap>/4096 174580 120029 45.45% HashRandomEqual<unordered_multimap>/32768 3031653 1455416 108.30% HashRandomEqual<unordered_multimap>/100000 26100504 19240571 35.65% HashRandomEqual<hash_multimap>/1 15.9 9.38 69.51% HashRandomEqual<hash_multimap>/10 123 94.1 30.71% HashRandomEqual<hash_multimap>/100 1645 1151 42.92% HashRandomEqual<hash_multimap>/512 10177 7144 42.46% HashRandomEqual<hash_multimap>/4096 172994 121381 42.52% HashRandomEqual<hash_multimap>/32768 3045242 1966513 54.85% HashRandomEqual<hash_multimap>/100000 26013781 22025482 18.11% HashUnequalDifferingBuckets<unordered_multimap>/2 5.87 3.41 72.14% HashUnequalDifferingBuckets<unordered_multimap>/10 12 3.39 253.98% HashUnequalDifferingBuckets<unordered_multimap>/100 106 3.41 3008.50% HashUnequalDifferingBuckets<unordered_multimap>/512 691 3.46 19871.10% HashUnequalDifferingBuckets<unordered_multimap>/4096 6965 3.47 200620.46% HashUnequalDifferingBuckets<unordered_multimap>/32768 91451 3.46 2642992.49% HashUnequalDifferingBuckets<unordered_multimap>/100000 290430 3.52 8250752.27% HashUnequalDifferingBuckets<hash_multimap>/2 5.97 3.4 75.59% HashUnequalDifferingBuckets<hash_multimap>/10 11.8 3.54 233.33% HashUnequalDifferingBuckets<hash_multimap>/100 105 3.54 2866.10% HashUnequalDifferingBuckets<hash_multimap>/512 763 3.46 21952.02% HashUnequalDifferingBuckets<hash_multimap>/4096 6862 3.4 201723.53% HashUnequalDifferingBuckets<hash_multimap>/32768 94583 3.4 2781752.94% HashUnequalDifferingBuckets<hash_multimap>/100000 287996 3.43 8396284.84% ``` Benchmark code: ``` #undef NDEBUG #define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS #include <assert.h> #include <benchmark/benchmark.h> #include <hash_map> #include <random> #include <stddef.h> #include <unordered_map> #include <utility> #include <vector> using namespace std; template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; next(b.begin(), b.size() - 1)->second = 1u; for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; for (auto &&_ : state) { (void)_; assert(a == b); } } BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) { std::unordered_multimap<unsigned, unsigned> a; std::unordered_multimap<unsigned, unsigned> b; const auto range0 = static_cast<ptrdiff_t>(state.range(0)); for (ptrdiff_t idx = 0; idx < range0; ++idx) { a.emplace(0, 1); b.emplace(1, 0); } a.emplace(1, 0); b.emplace(0, 1); for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_MAIN(); * Apply a bunch of code review comments from Casey. * clang-format * Apply @miscco's code deduplication idea for <xhash>. * Fix code review comments from Stephan: comments and add DMIs.
- Loading branch information