PERF: using murmur hash for float64 khash-tables #36729

Merged
merged 8 commits into from
Nov 14, 2020

realead
Contributor

@realead realead commented Sep 29, 2020

The currently used hash-function can lead to many collisions (see #28303 or this comment) for series like 0.0, 1.0, 2.0, ... n.

This PR uses a specialization (for a simple double-value) of murmur2-hash, which is used in stdc++ and libc++ and more or less state of the art.

An alternative would be to use Python's _Py_HashDouble, but it has the property hash(1.0)=1, hash(2.0)=2 and so on, so there is no desirable avalanche effect. That is not an issue for Python's dict implementation, but it is problematic for khash, which uses a different strategy for collision handling. See #13436, where _Py_HashDouble was replaced by the simple hash-function used until now.
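
To make the avalanche point concrete, here is a minimal Python sketch. `murmur2_style_hash` is a hypothetical stand-in written for illustration only: a MurmurHash2-style mix of the two 32-bit halves of a double, using the commonly published constants and a seed picked for this sketch. It is not the C code added by this PR, and it skips the special handling of values such as -0.0 and NaN.

```python
import struct

MASK32 = 0xFFFFFFFF
M = 0x5BD1E995      # MurmurHash2 multiplication constant
R = 24              # MurmurHash2 shift constant
SEED = 0xC70F6907   # seed picked for this sketch (assumption)


def _mix(h, k):
    """Fold one 32-bit word k into the running 32-bit state h."""
    k = (k * M) & MASK32
    k ^= k >> R
    k = (k * M) & MASK32
    h = (h * M) & MASK32
    return h ^ k


def murmur2_style_hash(x):
    """Hypothetical MurmurHash2-style 32-bit hash of a float64 (illustration only)."""
    lo, hi = struct.unpack("<II", struct.pack("<d", x))
    h = SEED ^ 8  # 8 = key length in bytes
    h = _mix(h, lo)
    h = _mix(h, hi)
    # Final avalanche step
    h ^= h >> 13
    h = (h * M) & MASK32
    h ^= h >> 15
    return h


values = [float(i) for i in range(5)]
python_hashes = [hash(v) for v in values]                # 0, 1, 2, 3, 4: neighbors stay neighbors
murmur_hashes = [murmur2_style_hash(v) for v in values]  # spread over the 32-bit range
```

Printing both lists side by side shows the difference: Python's float hash maps small integral floats to themselves, while the mixed hash gives unrelated-looking 32-bit values for neighboring inputs.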

@realead
Contributor Author

realead commented Sep 29, 2020

Running asv_bench yields the following results:

+        979±30μs          163±1ms   166.84  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+     1.07±0.01ms          163±2ms   152.35  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+      6.16±0.1ms          163±1ms    26.53  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+     7.03±0.07ms          166±3ms    23.62  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+      84.7±0.6ms          163±1ms     1.92  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+        86.1±2ms          163±3ms     1.90  indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+        81.4±2ms          146±2ms     1.79  series_methods.IsInFloat64.time_isin_many_different
+      16.1±0.4ms       22.0±0.6ms     1.36  io.hdf.HDFStoreDataFrame.time_read_store_table
+        39.0±1ms       50.1±0.7ms     1.28  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, 2)
+      16.6±0.2ms       21.3±0.3ms     1.28  reindex.DropDuplicates.time_frame_drop_dups_int(True)
+     10.5±0.07ms       13.3±0.1ms     1.27  index_object.Indexing.time_get_loc_non_unique('Float')
+      11.1±0.5ms       14.0±0.7ms     1.26  algorithms.Duplicated.time_duplicated(False, 'first', 'float')
+         831±8μs       1.03±0.1ms     1.24  groupby.GroupByMethods.time_dtype_as_field('float', 'min', 'direct')
+      11.2±0.1ms       13.6±0.1ms     1.21  index_object.Indexing.time_get_loc_non_unique('Int')
+        420±10ms          505±9ms     1.20  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+        424±10ms         499±10ms     1.18  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+        426±20ms         501±10ms     1.18  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+        544±10ms         638±40ms     1.17  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+         548±7ms         642±20ms     1.17  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+      3.98±0.2μs       4.60±0.3μs     1.15  index_cached_properties.IndexCache.time_values('DatetimeIndex')
+        538±20ms          619±8ms     1.15  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+         551±7ms          631±6ms     1.14  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+         548±7ms          625±8ms     1.14  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+        14.2±1μs         16.1±3μs     1.14  index_cached_properties.IndexCache.time_engine('PeriodIndex')
+        172±10μs         195±50μs     1.13  ctors.SeriesConstructors.time_series_constructor(<function no_change at 0x7f0581527b80>, False, 'int')
+      4.19±0.3μs       4.72±0.2μs     1.13  index_cached_properties.IndexCache.time_is_all_dates('PeriodIndex')
+     1.72±0.09ms       1.93±0.3ms     1.12  ctors.SeriesConstructors.time_series_constructor(<function list_of_str at 0x7f0581527c10>, False, 'int')
+        20.8±1μs         23.1±2μs     1.11  index_cached_properties.IndexCache.time_engine('TimedeltaIndex')
+      1.77±0.1ms       1.96±0.2ms     1.11  ctors.SeriesConstructors.time_series_constructor(<function list_of_str at 0x7f0581527c10>, False, 'float')
-      8.13±0.8μs       7.33±0.1μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 10000, None)
-        22.8±1μs       20.5±0.3μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 3000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
-     2.43±0.07ms      2.12±0.03ms     0.88  tslibs.fields.TimeGetDateField.time_get_date_field(10000, 'woy')
-      7.37±0.5μs       6.38±0.4μs     0.87  index_cached_properties.IndexCache.time_inferred_type('MultiIndex')
-        28.1±2μs       24.2±0.5μs     0.86  tslibs.resolution.TimeResolution.time_get_resolution('D', 100, None)
-        742±90ns         614±10ns     0.83  tslibs.timestamp.TimestampProperties.time_microsecond(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, None)
-      65.6±0.2ms       49.3±0.3ms     0.75  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, -2)
-      6.46±0.2ms      4.07±0.08ms     0.63  series_methods.ValueCounts.time_value_counts('float')
-        78.3±2ms       48.6±0.2ms     0.62  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, 2)
-        82.9±1ms         36.3±1ms     0.44  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, 0)
-         122±2ms       48.9±0.5ms     0.40  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, 2)
-         136±3ms       49.0±0.3ms     0.36  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, -2)
-         220±3ms       49.1±0.7ms     0.22  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, -2)
-         171±3ms       32.8±0.5ms     0.19  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, 0)
-         292±7ms         35.9±2ms     0.12  hash_functions.UniqueAndFactorizeArange.time_factorize(11)
-         298±5ms         35.5±3ms     0.12  hash_functions.UniqueAndFactorizeArange.time_factorize(8)
-        278±10ms       31.8±0.3ms     0.11  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, 0)
-        328±40ms         33.0±2ms     0.10  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, 0)
-         277±7ms         24.4±3ms     0.09  hash_functions.UniqueAndFactorizeArange.time_unique(11)
-         297±6ms         24.1±2ms     0.08  hash_functions.UniqueAndFactorizeArange.time_unique(8)
-      1.26±0.01s         42.0±2ms     0.03  hash_functions.UniqueAndFactorizeArange.time_factorize(10)
-      1.25±0.02s       24.2±0.9ms     0.02  hash_functions.UniqueAndFactorizeArange.time_unique(10)
-      2.52±0.07s         36.3±1ms     0.01  hash_functions.UniqueAndFactorizeArange.time_factorize(9)
-      3.95±0.04s         56.5±1ms     0.01  hash_functions.GH28303Example.time_groupby

The example from #28303 is hash_functions.GH28303Example.time_groupby and is now about 70 times faster (3.95 s vs. 56.5 ms).

Curiously, the measurement

+        979±30μs          163±1ms   166.84  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')

actually shows how much better the murmur2-hash is:

  • there are 1e6 look-ups (when hashmap is built), i.e. about 1 ns per look-up for the old hash-function and about 200ns per look-up when murmur2 is used.
  • there are 1e6 double elements in the map, that means 8-16 MB of data => no longer in L3 cache.
  • Latency of RAM in my machine is about 80-100ns + prefetcher has no chance for a good hash-function to improve something - about every look up with the murmur2 hash produces a cache-miss, as one would expect from a good hasher.
  • The result of the old function can only be explained if it always had a cache hit and the prefetcher could easily guess the next access.

Note: the above is just an educated guess, I still have to profile/debug to be sure. For example it looks as if, while the table is built up, a rehashing happens. This would explain why 160ms and not 80ms (latency of my machine * 1e6) are needed to build up the hash map.
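
The per-look-up figures above follow from simple arithmetic on the two benchmark numbers. A small sketch (the variable names are made up here; the timings and the 80-100 ns latency range are the ones quoted in this comment):

```python
N_LOOKUPS = 1_000_000   # elements inserted while the hash map is built

old_total_s = 979e-6    # 979 μs with the old hash function
new_total_s = 163e-3    # 163 ms with murmur2

old_ns_per_lookup = old_total_s / N_LOOKUPS * 1e9   # about 1 ns
new_ns_per_lookup = new_total_s / N_LOOKUPS * 1e9   # about 163 ns

# How many RAM latencies fit into one murmur2 look-up?
ram_latency_ns = (80, 100)
latencies_per_lookup = [new_ns_per_lookup / lat for lat in ram_latency_ns]
```

The result is roughly 1.6 to 2 RAM latencies per element, which is what one would see if every element were touched twice, for example once before and once after a rehash.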

This behavior of the old hash function is an advantage in this special case, but quite a problem for other cases, as shown by the hash_functions examples: probably not only by a constant factor but also in terms of a worse asymptotic running time (maybe even somewhere in the direction of O(n^2)).

Also compared to Int64 and UInt64 the old Float64 behavior is an outlier (130ms vs 1ms); the explanation is the same as above:

              --                                                         index_structure                 
              ------------------------------------------ ------------------------------------------------
                             index_dtype                  unique_monotonic_inc   nonunique_monotonic_inc 
              ========================================== ====================== =========================
                pandas.core.indexes.numeric.Int64Index          185±10μs                 130±5ms         
               pandas.core.indexes.numeric.UInt64Index         524±200μs                 133±7ms         
               pandas.core.indexes.numeric.Float64Index         193±20μs                979±30μs      
              ========================================== ====================== =========================

@realead realead changed the title using murmur hash for float64 khash-tables PERF: using murmur hash for float64 khash-tables Sep 29, 2020
self.s.isin(self.values)


class GH28303Example:
Member

can you give this a descriptive name and then in a comment # see GH#28303

Contributor Author

Done.

@jreback jreback added the Performance Memory or execution speed performance label Sep 29, 2020
@jreback
Contributor

jreback commented Sep 29, 2020

@realead looks really interesting. let me have a closer look soonish.

@realead realead force-pushed the gh_28303_float_hash branch from 72eea8e to ee8dd93 Compare September 30, 2020 20:17
@jbrockmendel
Member

which is used in stdc++ and libc++ and more or less state of the art.

is there any chance we can get it via from libcpp.foo cimport bar?

[...] actually shows how much better the murmur2-hash is: [...]

I'd like to understand this, but it may just be above my pay-grade. Let me know if I've got any of this right: So the naive interpretation of the asv result is that the murmur implementation is slower, but you're saying that interpretation is incorrect. Instead, the murmur implementation is so much faster that the [mumble] is using a slower cache level, making it look like it's slower?

@realead
Contributor Author

realead commented Sep 30, 2020

@jbrockmendel

is there any chance we can get it via from libcpp.foo cimport bar?

One could replace khint32_t PANDAS_INLINE murmur2_32_32to32(khint32_t k1, khint32_t k2) with

    #include <cstddef>
    #include <cstdint>
    #include <functional>

    // khint32_t is the 32-bit unsigned integer type from khash.h
    khint32_t my_hash(double d) {
        std::hash<double> op;
        std::size_t res = op(d);
    #if INTPTR_MAX == INT64_MAX
        // 64-bit build, probably safer not just to cast to 32bit...
        return static_cast<khint32_t>(res ^ (res >> 32));
    #elif INTPTR_MAX == INT32_MAX
        // 32-bit build, good to go:
        return static_cast<khint32_t>(res);
    #else
    #error Unknown pointer size or missing size macros!
    #endif
    }

and build every pyx-file which cimports khash.pxd as c++-extension, e.g. by adding

# distutils: language = c++

to these files. However, C++11 is needed, so the corresponding compiler option might need to be added as well.

@realead
Contributor Author

realead commented Sep 30, 2020

@jbrockmendel

Instead, the murmur implementation is so much faster that the [mumble] is using a slower cache level, making it look like it's slower?

I've said the murmur hash is better (not faster):)

In a nutshell: not having cache misses while building the hash-table only shows that the work isn't done "properly", and the bill will come later, when this hash-table is used. (Note that in the example we are talking about, only the building of the hash-table is measured.)


Here are more details; I hope they make things clearer and not worse...

From a good hash we would expect the following behavior:

  • the probability for hitting a bucket in a hash-map is (almost) the same for all buckets and all hash-map-sizes
  • changing a bit in the input would completely change the whole resulting hash, so knowing the hash(1.0) doesn't help to know hash(2.0).

That means accessing buckets for hash(1.0), hash(2.0) and so on with the new function will lead to cache misses, and so be relatively slow.

The old hash has a behavior similar to hash(1.0)=1, hash(2.0)=2, hash(3.0)=3 and so on, so we access the buckets in an order which optimizes the utilization of the L1-cache and which is 100 times faster, i.e. latency RAM/latency L1. We basically just copy the memory - you cannot beat that.

However in indexing.NumericSeriesIndexing.time_loc_slice we just build the hash-table, but don't use it.

If we look up series 1.0, 2.0, 3.0... the old hash-function would be very fast again: we just look at one bucket after another utilizing the L1 cache, while the new hash-function would be getting cache-misses and being a constant factor of about 100 slower.

However, if we are unlucky and have to look up a key which is not in the hash-table but hits the first bucket, the old hash function becomes a problem: the first bucket is not empty but holds the wrong key, so we need to look into the second bucket (that is how khash resolves collisions), which is also not empty and also holds the wrong key, and so on, until almost all n buckets have been checked, leading to O(n) for the look-up (the situation is not much better for key+1, which will hit the second bucket). That or something similar is at the core of #28303. Such a hash-map is not much different from a simple unsorted vector with linear search.

In the meantime, the new hash function would make the look-up in O(1) even if with a bigger constant.
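
The unlucky look-up described above can be reproduced with a toy open-addressing table. This is a minimal sketch assuming linear probing, as the argument above does; all names are hypothetical, integer keys stand in for the floats, and `scrambling_hash` is a Knuth-style multiplicative hash used only as a stand-in for murmur2.

```python
def build_table(keys, n_buckets, hash_func):
    """Open-addressing table with linear probing; None marks an empty bucket."""
    table = [None] * n_buckets
    for key in keys:
        i = hash_func(key) % n_buckets
        while table[i] is not None:
            i = (i + 1) % n_buckets
        table[i] = key
    return table


def probes_for_lookup(table, key, hash_func):
    """Number of buckets inspected until the key or an empty bucket is found."""
    n_buckets = len(table)
    i = hash_func(key) % n_buckets
    probes = 1
    while table[i] is not None and table[i] != key:
        i = (i + 1) % n_buckets
        probes += 1
    return probes


def identity_hash(key):
    return key  # behaves like hash(1.0)=1, hash(2.0)=2, ...


def scrambling_hash(key):
    # Keep the top 11 bits of the 32-bit product (table has 2**11 buckets).
    return ((key * 2654435761) & 0xFFFFFFFF) >> 21


N_BUCKETS = 2048
keys = range(1000)
missing_key = N_BUCKETS  # not in the table, but lands on bucket 0 with identity_hash

weak_table = build_table(keys, N_BUCKETS, identity_hash)
strong_table = build_table(keys, N_BUCKETS, scrambling_hash)

weak_probes = probes_for_lookup(weak_table, missing_key, identity_hash)
strong_probes = probes_for_lookup(strong_table, missing_key, scrambling_hash)
```

With the identity hash the failed look-up walks the whole run of 1000 occupied buckets before it reaches an empty one; with the scrambling hash it stops after a handful of probes.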

Obviously, we could also construct a series, which would trigger the same behavior also for the new hash-function, but the performance degradation will not happen for such "usual" series like 1.0, 2.0, and so on.

@realead realead force-pushed the gh_28303_float_hash branch from 5cb5b3b to 0b5a7c7 Compare September 30, 2020 23:31
@pep8speaks

pep8speaks commented Sep 30, 2020

Hello @realead! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-13 21:18:23 UTC

@realead realead force-pushed the gh_28303_float_hash branch from 0b5a7c7 to 8048f97 Compare October 1, 2020 19:24
@realead
Contributor Author

realead commented Oct 1, 2020

is there any chance we can get it via from libcpp.foo cimport bar?

After thinking about it for a while longer: the issue with the C++ hash function is not only that it means switching to another language, but also that the behavior of the hash function will depend on the platform (Linux: gcc and libstdc++, macOS: clang and libc++, Windows: MSVC, which doesn't use murmur-hash IIRC) and on the build (different hash-functions for 32bit and 64bit), so the behavior of the hash-map can differ between platforms.

@jreback jreback added this to the 1.2 milestone Oct 2, 2020
Contributor

@jreback jreback left a comment

thanks @realead lgtm. can you add a note in the Perf section of 1.2, ok to be reasonably specific (e.g. perf increase on groupby of floats vs ints).

@jreback
Contributor

jreback commented Oct 2, 2020

cc @jbrockmendel @TomAugspurger ok here?

@jbrockmendel
Member

@realead are the slower-here asvs in indexing.NumericSeriesIndexing measuring the wrong thing? or if not wrong, then incomplete?

@realead realead force-pushed the gh_28303_float_hash branch from 8048f97 to 821966e Compare October 3, 2020 07:39
@realead
Contributor Author

realead commented Oct 3, 2020

@jbrockmendel In my opinion indexing.NumericSeriesIndexing.time_loc_slice is a legit test as it is: it measures the performance of loc_slice. It is an implementation detail that for some inputs this is more or less the building of a hash-table.

However, measuring the performance of building the hash-table alone is a legit test as well. An example: as noted here:

... it looks as if, while the table is built up, a rehashing happens. This would explain why 160ms and not 80ms (latency of my machine * 1e6) are needed to build up the hash map.

Given the size of SIZE_HINT_LIMIT as

SIZE_HINT_LIMIT = (1 << 20) + 7

rehashing should not happen for 1e6 elements.

Once this bug is fixed, indexing.NumericSeriesIndexing.time_loc_slice will become twice as fast and a good regression test.

However, indexing.NumericSeriesIndexing.time_loc_slice isn't a good test for assessment of the hash-table's performance, because it measures only one aspect (how fast the table is built) and not the other (maybe even more important aspect) - the time needed for the look-up. Hash-table is a trade-off between fast creation and fast look-up.

@realead
Contributor Author

realead commented Oct 3, 2020

@jreback whatsnew added.

@TomAugspurger
Contributor

Not really qualified to comment here. My only question would be whether this should be controlled by an option since (IIUC) the performance depends on the access pattern. But I haven't followed things closely.

@jbrockmendel
Member

Does this render #8524 unnecessary?

@realead
Contributor Author

realead commented Oct 4, 2020

I wasn't aware that pandas' khash isn't the latest version and thus uses double hashing (which makes at least some things I said in the above comments wrong).

Sorry for the long text that follows; I hope it will help us to reach a decision. Here are the most important points up front:

  • I would like to try to improve this PR, so such series like 1.0, 2.0, 3.0, ... keep their out-of-this world performance (as "requested" by @TomAugspurger, I hope this can be done without adding some special options)
  • @jbrockmendel, so far this PR is somewhat orthogonal to switching to khash 0.2.8 with quadratic probing, but has the potential to make it obsolete, if improvements I have in mind will work out (not sure about this though).
  • we should try to improve double hashing in this khash-version, before considering switching to quadratic probing (i.e. khash 0.2.8), for which we will see O(n*sqrt(n))-behavior instead of O(n) at least for some "quite usual" series.

Here are descriptions of what linear probing, double hashing and quadratic probing do.

The costs per element for building a hash map or for a look-up are:

  • calculating the hash
  • getting the element from RAM (above all, latency due to cache misses)
  • checking equality

While checking for equality can be costly (strings, PyObjects) and even dominate, for floats and ints it is almost free - we aren't going to consider these costs.

Linear probing:

When we create a hash-table from, or look up, a series of random elements, the best strategy is linear probing with a very simple hash function. For integers, let's say hash(x)=x%m where m is the number of buckets in the hash-table. Linear probing means that if x%m=i and bucket i is already occupied, we look at i+1, i+2 and so on until we find a free one. Because the data is random, the probability is very high that after one to three probing steps we will find a free bucket. However, due to the cache, we don't have to pay for the look-up in bucket i+1: it was fetched together with bucket i, and the same goes for i+2. That means the cost of one look-up is (the hash-function cost in this case is almost nothing, so we neglect it):

1*T(latency)

The behavior is even better for series like 1,2,3,4,5,...,n<m, with a subsequent look-up series 1,2,3,...,n: we access one bucket after another, the prefetcher recognizes the pattern and gets the memory in advance, thus we only have the latency of the L1 cache and are on par with just using an index-vector.

Good: We have the performance of a vector for very simple (but probably pretty usual for pandas, let's call them "important") pattern of 1,2,3..n, but also flexibility to be able to handle other series quite efficiently.

This is also what khash-comments say:

* Allow to optionally use linear probing which usually has better
performance for random input. Double hashing is still the default as it
is more robust to certain non-random input.

There are however some non-random inputs which make the above strategy degenerate to an O(n^2) running time: 0, m, 2*m,... and so on.
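
A short sketch of that degenerate case, counting the probes needed to build a table with hash(x)=x%m and linear probing. The function and variable names are made up for this illustration; only occupancy is tracked, since the keys themselves don't matter for the count.

```python
def total_insert_probes(keys, n_buckets):
    """Insert keys using hash(x) = x % n_buckets with linear probing; return total probes."""
    occupied = [False] * n_buckets
    total = 0
    for key in keys:
        i = key % n_buckets
        total += 1
        while occupied[i]:
            i = (i + 1) % n_buckets
            total += 1
        occupied[i] = True
    return total


M_BUCKETS = 1024
N = 500

# 1, 2, 3, ..., n: every key finds its home bucket free
friendly = total_insert_probes(range(1, N + 1), M_BUCKETS)
# 0, m, 2*m, ...: every key starts at bucket 0 and walks the whole cluster
hostile = total_insert_probes(range(0, N * M_BUCKETS, M_BUCKETS), M_BUCKETS)
```

The friendly series needs n probes in total, the hostile one n*(n+1)/2, which is the O(n^2) behavior mentioned above.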

Single hashing:

So we could keep linear probing and take a proper/strong hash-function like murmur2, which (almost) guarantees:

  • there are very few hash-collisions
  • "avalanche effect": changing a bit changes the whole hash. That means 1 and 2 will have completely different hashes and never be next to each other.

The second property is "good" because having multiple occupied buckets next to each other combined with linear probing is a recipe for an O(n^2) disaster.

The costs are now:

1*T(calc_hash)+1*T(latency)

So we paid T(calc_hash) but are as safe from O(n^2) as one can be. For large tables, T(calc_hash)<<T(latency), so the price isn't that high.

However, we aren't 100% happy, because now for "important" series 1,2,3,4,5 ... we pay T(latency RAM) and no longer T(latency L1) which is about factor 50-100. So in this (and some other cases) we cannot match the performance of a vector.

indexing.NumericSeriesIndexing.time_loc_slice is such an example, where we aren't happy with single hashing.

Double hashing (current state in pandas):

The idea is to be optimistic about the data and use a simple hash-function without "avalanche effect", thus 1 and 2 will land in neighboring buckets once again. But then, if our belief in the goodness of the data is betrayed and we have a collision, we go into panic mode: instead of probing the next bucket we use a second (more or less strong) hash to calculate the step length and jump somewhere for the next probe; if not successful we take the same step length and jump again (we must and can(!) ensure that the step is chosen in such a way that all buckets can be probed).

Good news: we are on par with a vector for "important" series like 1,2,3,4.... But there is an additional price for random inputs and crowded hash-maps, for which let's say 50% of look-ups are collisions:

1*T(latency)+0.5*(1*T(calc_hash)+1*T(latency))

So we have paid 0.5*T(latency) but saved 0.5*T(calc_hash) compared to strong-single-hash + linear probing. For bigger tables this means we are losing some performance for random input, but win performance for "important" series.
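
The remark that the step can be chosen so that all buckets are probed can be checked directly. This is a minimal sketch, assuming a power-of-two table size and a step that is forced to be odd; the function name and the way the step is derived from the second hash are illustrative and not taken from khash.

```python
def double_hash_probe_sequence(first_hash, second_hash, n_buckets):
    """Yield the buckets visited by double hashing in a power-of-two table."""
    mask = n_buckets - 1
    step = (second_hash | 1) & mask  # odd step: shares no factor with 2**k
    i = first_hash & mask
    for _ in range(n_buckets):
        yield i
        i = (i + step) & mask


N_BUCKETS = 16
sequence = list(double_hash_probe_sequence(first_hash=5, second_hash=6, n_buckets=N_BUCKETS))
# second_hash=6 becomes step 7, so the sequence starts 5, 12, 3, 10, ...
```

Because an odd step is coprime with the table size, the sequence visits each of the 16 buckets exactly once before it repeats, whatever the second hash is.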

We can conclude:

In theory, double hashing should be more robust than quadratic probing.
However, my implementation is probably not for large hash tables, because
the second hash function is closely tied to the first hash function,
which reduce the effectiveness of double hashing.

Quadratic probing:

Obviously it pains us to pay this price for random series. Quadratic probing lets us make another trade-off: unlike linear probing, we probe not the neighbors at offsets 1,2,3,4... but those at offsets 1,3,6,10,15 and so on. The advantages:

  • we don't pay additional latency-cost for the first jumps, so in the normal case we aren't worse than linear probing
  • in the worst case of all neighboring buckets being occupied, we need sqrt(n) jumps and not n jumps as with linear probing to reach a free bucket.

So the cost for random series is:

1*T(latency)

we are on par with vector for "important" series and we can handle bad "occupied neighbors"-case in O(n*sqrt(n)) which is worse than O(n), but not as catastrophic as O(n^2) of linear probing.
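
The sqrt(n) figure can be illustrated with a small sketch. It assumes probe offsets that grow as triangular numbers (0, 1, 3, 6, 10, ...), with the jump increased by one after every probe; that is an assumption about how khash 0.2.8 probes rather than something stated in this thread, and the function names are made up.

```python
def triangular_probe_sequence(home, n_buckets):
    """Yield buckets visited when the jump grows by one each step (offsets 0, 1, 3, 6, 10, ...)."""
    mask = n_buckets - 1
    i = home & mask
    for step in range(1, n_buckets + 1):
        yield i
        i = (i + step) & mask


def jumps_to_escape(run_length):
    """Jumps needed until the cumulative offset leaves a run of occupied neighbors."""
    offset, jumps = 0, 0
    while offset < run_length:
        jumps += 1
        offset += jumps
    return jumps


offsets = list(triangular_probe_sequence(0, 64))[:6]
linear_jumps = 1000                       # linear probing walks the whole run
quadratic_jumps = jumps_to_escape(1000)   # on the order of sqrt(2 * 1000)
```

For a run of 1000 occupied neighbors, 45 jumps are enough to get past it instead of 1000, and in a power-of-two table this probe sequence still reaches every bucket.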

Before switching to quadratic probing (as proposed in #8524), one should consider the following:

  • it will not solve (all) problems with float64: right now the hash-function is such, that there are many collisions (see PERF: Always using panda's hashtable approach, dropping np.in1d #36611 (comment)), quadratic probing cannot help (but double hashing does, as it uses the whole first 32bit-hash and not just the value modulo m to calculate the step!)
  • For cases where costs for checking equality are high (large similar strings, big PyObjects), T(latency) doesn't play that big a role, thus we have the costs in bad cases (O(n*sqrt(n)) vs. O(n)) without having any benefits for the random case.
  • The O(n*sqrt(n)) will probably be observable for quite some cases which aren't 1,2,3,... but something similar.

What we should try:

  • change the current float64-hash function to something that doesn't have collisions for 1.0, 2.0, 3.0 ..., but places them into neighboring buckets (so we still have good speed for "important" series).
  • change the second hash in khash-library to a stronger one - the murmur2.
  • try a slightly different probing strategy: for the first 4-8 probing steps use linear probing (due to cache it is free lunch) if not successful go into panic-mode, use second hash and take cache-misses to avoid O(n^2).
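
The third idea could look roughly like the sketch below. It is only an illustration of the proposed probing order, not code from this PR: `LINEAR_STEPS`, the function name and the way the jump is derived from the second hash are all hypothetical.

```python
LINEAR_STEPS = 4  # hypothetical tuning knob; the text above suggests 4-8


def hybrid_probe_sequence(first_hash, second_hash, n_buckets, max_probes):
    """Linear probing for the first few steps, then fixed odd jumps as in double hashing."""
    mask = n_buckets - 1
    jump = (second_hash | 1) & mask  # odd jump for a power-of-two table
    i = first_hash & mask
    for probe in range(max_probes):
        yield i
        step = 1 if probe < LINEAR_STEPS else jump
        i = (i + step) & mask


sequence = list(hybrid_probe_sequence(first_hash=10, second_hash=36, n_buckets=64, max_probes=8))
```

Here the home bucket 10 and its four neighbors 11 to 14 are probed first, which should mostly hit memory that is already cached, and only then does the sequence start jumping by 37 buckets at a time.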

@jreback
Contributor

jreback commented Oct 4, 2020

@realead thanks for the detailed cases and analysis. looking forward to your experimentation. If possible, can you capture the analysis in a doc / notebook / gist that we can stick somewhere, so the next time we need to look at this there will be enough info. i think a notebook might be the best here.

@realead
Contributor Author

realead commented Oct 12, 2020

Sorry it took so long, here is the first part of my analysis/experiments: pleadings for a stronger hash.

In a nutshell:

  • the currently used hash-functions for float64, (u)int64 (and probably PyObjects) are quite weak.
  • these weak hash-functions lead to a regular pattern of occupied buckets, which, while quite fast for some special series, leads to catastrophic performance for other "usual" series.
  • the freakish performance of weak hashes with "interesting" series like 1,2,3,4,... is due to fewer cache misses, yet:
    - it is only significant for sizes around 1e6: for smaller sizes the cache is big enough for stronger hashes as well, for bigger sizes the cache is not big enough even for weak hashes.
    - there is already special handling of such "interesting" series in some workflows (see e.g. test cases indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')), so possibly the performance of hash-tables for such "interesting" series is not very important?
  • My proposal: use a strong hash (murmur2) as the hash for float64 and as the secondary hash (47125d4#diff-75baab7a2979bf70f29f7e096b686cabR177) and accept some performance hit. Using a strong hash for int64, uint64 and PyObject should also be considered.

While for random series the performance of the current (weak) hash-function and the proposed murmur2 hash-function is on par, the weak hash-function seems to be an order of magnitude faster for some series, e.g. for asv-test indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc').

However when we see the performance for different sizes (asv-tests hash_functions.NumericSeriesIndexing) :

              ========================================== ============ ============ ============ ========== =========
              --                                                                      N                             
              ------------------------------------------ -----------------------------------------------------------
                             index_dtype                    10000        100000       500000     1000000    5000000 
              ========================================== ============ ============ ============ ========== =========
                pandas.core.indexes.numeric.Int64Index    53.5±0.7μs   75.1±0.4μs    193±5μs     318±10μs   309±3ms 
               pandas.core.indexes.numeric.UInt64Index     75.1±2μs     361±8μs     5.94±0.2ms   18.7±2ms   331±7ms 
               pandas.core.indexes.numeric.Float64Index    55.7±3μs    69.4±0.7μs    206±8μs     339±20μs   479±4ms 
              ========================================== ============ ============ ============ ========== =========
  • there is an anomaly for UInt64Index (18ms vs 0.3ms for Int64Index), this is however unrelated to hash-tables: perf shows that the time is spent in _aligned_contig_cast_ulong_to_double. So we no longer consider UInt64 case.
  • from 1e6 to 5e6 the performance decreases by a factor of about 1000 (e.g. 318 μs vs. 309 ms for Int64Index).

The second point is easiest to explain, when looking at the Int64-case (see HashEmulator.ipynb.zip for an emulation of the khash-lib behavior), we can see, that the elements are inserted in the following order in the following buckets:

0, 2049,   2*2049,   .... until 1<<21,
1, 2049+1, 2*2049+1, .... until 1<<21,
2, 2049+2, 2*2049+2, .... until 1<<21

the interesting thing is that when the first row is done, the first bucket in the second row is still in L2 cache (around 64Kb are needed for ca. 1000 such "heads" at 0, 2049, 2*2049 and so on). However, for 5e6 elements, there are 8 times as many heads, which means 512Kb of cache are used => L3 cache, which has 10 times the latency of the L2 cache.
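
The stride of 2049 in the first row can be reproduced with a few lines. This sketch assumes khash's stock 64-bit integer hash, `(key>>33) ^ key ^ (key<<11)` truncated to 32 bits, and a table with 1<<21 buckets; the function name is made up, and the full emulation is in the notebook mentioned above.

```python
def khash_int64_bucket(key, n_buckets):
    """Bucket index, assuming khash's stock 64-bit integer hash and a power-of-two table."""
    h = ((key >> 33) ^ key ^ (key << 11)) & 0xFFFFFFFF  # truncate to 32 bits
    return h & (n_buckets - 1)


N_BUCKETS = 1 << 21
first_buckets = [khash_int64_bucket(k, N_BUCKETS) for k in range(4)]
# keys 0, 1, 2, 3 land 2049 buckets apart: [0, 2049, 4098, 6147]
```

For small keys the `key << 11` term does not overlap with `key` itself, so the XOR acts like an addition and key k lands in bucket 2049*k until the table size is exceeded.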

The bigger problem of such regular behavior of the hash is the pattern of the occupied buckets (1=occupied bucket, 0=free bucket):

[image: pattern of occupied and free buckets for Int64 with the current hash]

There are 512 occupied buckets, then 512 non-occupied buckets, then 512 occupied buckets and so on. This regular pattern means that there are step-sizes for which we will iterate over all (or many) such islands until we can resolve a collision, which will lead to superlinear behavior of the hash-table.

The situation is even worse for Float64: there will be an occupied bucket-island consisting of 5e5 (out of 2e6) buckets (but other occupied islands are relatively small), so no wonder there are some bugs/series showing degenerate performance:

[image: pattern of occupied and free buckets for Float64 with the current hash]

As comparison, occupied buckets with murmur-hash:

[image: pattern of occupied and free buckets with murmur-hash]

There is no pattern and there are no large occupied bucket-islands, thus the collision resolution never takes more than around 20 steps and is done after 2 steps on average.

This PR currently proposes using murmur2-hash for float64 and for the secondary hash function. The performance for this special case is somewhat worse:

              ========================================== ========== ========== ============ ========== ==========
              --                                                                    N                            
              ------------------------------------------ --------------------------------------------------------
                             index_dtype                   10000      100000      500000     1000000    5000000  
              ========================================== ========== ========== ============ ========== ==========
                pandas.core.indexes.numeric.Int64Index    54.0±3μs   74.1±3μs    221±7μs     352±10μs   308±4ms  
               pandas.core.indexes.numeric.UInt64Index    77.5±4μs   366±10μs   5.71±0.3ms   17.4±1ms   328±8ms  
               pandas.core.indexes.numeric.Float64Index   55.4±2μs   73.3±4μs    281±6μs     653±30μs   640±20ms 
              ========================================== ========== ========== ============ ========== ==========

i.e.:

       before           after         ratio
     [d5cddbda]       [47125d44]
+        339±20μs         653±30μs     1.92  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
+         206±8μs          281±6μs     1.37  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 500000)
+         479±4ms         640±20ms     1.34  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)

but it is not that much worse. On the other hand, the advantage is that the danger of superlinear behavior is much smaller.


Right now pandas' khash version uses the most robust probing strategy: double hashing. Using murmur2 as the secondary hash will improve the robustness even more. However, only with a stronger first hash can more performant but less robust probing strategies (such as quadratic and linear probing) be taken into consideration (which probably should not be part of this PR).
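As a reminder of how the three probing strategies differ, here is a small generic sketch for a power-of-two sized table. `probe_sequence` is a hypothetical helper, the quadratic variant uses triangular-number steps, and the double-hashing step is forced to be odd so that it is coprime to the table size; none of this is copied from pandas' khash.

```python
def probe_sequence(start, n_buckets, strategy, second_hash=0):
    """Yield the buckets visited by one lookup in a power-of-two sized table."""
    mask = n_buckets - 1
    pos = start & mask
    # An odd step is coprime to the power-of-two table size.
    dh_step = (second_hash | 1) & mask
    for i in range(1, n_buckets + 1):
        yield pos
        if strategy == "linear":
            step = 1
        elif strategy == "quadratic":
            step = i  # triangular-number probing
        elif strategy == "double":
            step = dh_step
        else:
            raise ValueError(strategy)
        pos = (pos + step) & mask


for name in ("linear", "quadratic", "double"):
    print(name, list(probe_sequence(5, 16, name, second_hash=6)))
```

All three sequences visit every bucket exactly once. Linear and quadratic probing follow the same path for every key that starts at the same bucket, which is why they depend more heavily on a well-mixing first hash, while double hashing varies the path per key through the second hash.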

Another option is to replace only the second hash with murmur2, which would add robustness and be less intrusive than the proposal of this PR (on the other hand, the PR proposal gives more safety that the performance will not take a hit for some more or less "usual" series). The results of the run for only-second-hash-murmur2 can be found at the end.


Here are the asv-bench timings for the PR (strong first and second hashes):

       before           after         ratio
     [d5cddbda]       [47125d44]
+      55.0±0.8ms          134±2ms     2.43  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 1000000)
+      81.3±0.6μs        155±0.9μs     1.91  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+         289±6μs          542±6μs     1.87  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+        45.8±1ms         83.4±1ms     1.82  series_methods.IsInFloat64.time_isin_many_different
+         292±9μs         530±10μs     1.82  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
+         274±3μs          493±2μs     1.80  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+     1.92±0.02ms      3.43±0.06ms     1.79  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+      7.13±0.1ms       12.4±0.5ms     1.73  hash_functions.UniqueAndFactorizeArange.time_unique(6)
+     7.09±0.08ms       12.3±0.3ms     1.73  hash_functions.UniqueAndFactorizeArange.time_unique(5)
+     3.32±0.05ms       5.74±0.2ms     1.73  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 100000)
+     1.71±0.01ms      2.95±0.04ms     1.73  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+      44.3±0.6μs       74.5±0.6μs     1.68  indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+      7.56±0.2ms       12.3±0.3ms     1.63  hash_functions.UniqueAndFactorizeArange.time_unique(4)
+      17.2±0.5ms       27.3±0.9ms     1.59  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 19)
+      7.33±0.2ms       11.2±0.2ms     1.53  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 18)
+      10.8±0.3ms       16.5±0.4ms     1.53  hash_functions.UniqueAndFactorizeArange.time_factorize(5)
+      8.32±0.3ms       12.4±0.4ms     1.50  hash_functions.UniqueAndFactorizeArange.time_unique(14)
+      8.29±0.2ms       12.3±0.4ms     1.49  hash_functions.UniqueAndFactorizeArange.time_unique(15)
+      11.2±0.2ms       16.4±0.3ms     1.46  hash_functions.UniqueAndFactorizeArange.time_factorize(4)
+      7.69±0.2ms       10.8±0.8ms     1.41  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
+      51.1±0.8ms       71.1±0.9ms     1.39  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 20)
+      44.6±0.9ms       60.9±0.3ms     1.36  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 20)
+         193±9μs          262±6μs     1.36  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 500000)
+      12.3±0.3ms       16.6±0.3ms     1.35  hash_functions.UniqueAndFactorizeArange.time_factorize(15)
+     3.92±0.04ms      5.14±0.08ms     1.31  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 17)
+      4.47±0.1ms      5.84±0.08ms     1.31  algorithms.Factorize.time_factorize(False, False, 'boolean')
+      18.3±0.1ms       23.7±0.1ms     1.30  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, 2)
+     1.22±0.03ms      1.55±0.03ms     1.28  algorithms.Factorize.time_factorize(True, True, 'boolean')
+         116±2ms         146±10ms     1.26  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, 2)
+         199±2ms          250±5ms     1.26  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+      3.79±0.2ms       4.73±0.2ms     1.25  algorithms.Duplicated.time_duplicated(False, False, 'int')
+     17.5±0.03ms       21.7±0.3ms     1.24  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, -2)
+      38.7±0.3ms       48.1±0.3ms     1.24  series_methods.IsInFloat64.time_isin_nan_values
+         471±2ms          585±8ms     1.24  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
+      9.99±0.4ms       12.4±0.5ms     1.24  hash_functions.UniqueAndFactorizeArange.time_unique(13)
+         195±2ms          242±4ms     1.24  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+     4.05±0.04ms      5.02±0.07ms     1.24  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, -2)
+      23.7±0.3ms       29.1±0.6ms     1.23  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 19)
+     3.27±0.04ms      4.01±0.05ms     1.23  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 80000)
+      39.4±0.2ms       48.2±0.2ms     1.22  series_methods.IsInFloat64.time_isin_few_different
+      5.50±0.1ms      6.73±0.06ms     1.22  algorithms.Factorize.time_factorize(False, True, 'boolean')
+         202±4ms          247±4ms     1.22  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+     4.74±0.08ms      5.78±0.05ms     1.22  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 0)
+     2.06±0.05ms      2.52±0.03ms     1.22  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 16)
+     2.24±0.03ms      2.74±0.03ms     1.22  series_methods.IsInForObjects.time_isin_long_series_short_values
+      9.12±0.4ms       11.1±0.2ms     1.22  index_object.IntervalIndexMethod.time_intersection(100000)
+      6.10±0.2ms       7.38±0.2ms     1.21  algorithms.Duplicated.time_duplicated(False, 'last', 'float')
+     2.85±0.09ms      3.44±0.05ms     1.21  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 70000)
+      8.21±0.3ms       9.89±0.7ms     1.20  algorithms.Duplicated.time_duplicated(False, False, 'float')
+     3.52±0.04ms      4.24±0.03ms     1.20  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 80000)
+      2.29±0.1ms       2.76±0.1ms     1.20  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 500000)
+         566±6μs         680±20μs     1.20  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 8000)
+      8.15±0.3ms       9.78±0.3ms     1.20  algorithms.Factorize.time_factorize(False, False, 'datetime64[ns, tz]')
+         612±5μs         732±10μs     1.19  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 8000)
+      17.7±0.1ms       21.0±0.1ms     1.19  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 2)
+      61.3±0.8ms       72.9±0.9ms     1.19  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 750000)
+     4.97±0.03ms      5.90±0.03ms     1.19  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 0)
+     4.75±0.08ms      5.64±0.07ms     1.19  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 0)
+         268±4ms          318±6ms     1.19  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+      90.2±0.4ms        107±0.4ms     1.18  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, -2)
+     2.64±0.04ms      3.12±0.05ms     1.18  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 70000)
+         265±2ms          313±4ms     1.18  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+        96.3±1ms          114±2ms     1.18  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 900000)
+      19.5±0.4ms       23.0±0.3ms     1.18  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, -2)
+      5.44±0.1ms      6.39±0.08ms     1.17  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, -2)
+     6.07±0.06ms       7.12±0.1ms     1.17  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 0)
+         265±4ms          311±3ms     1.17  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+     6.00±0.04ms       7.03±0.1ms     1.17  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 0)
+         265±5ms          311±5ms     1.17  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+         265±5ms          310±1ms     1.17  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+      14.2±0.3ms       16.6±0.2ms     1.17  hash_functions.UniqueAndFactorizeArange.time_factorize(13)
+      19.2±0.1ms       22.5±0.3ms     1.17  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, 2)
+         553±3μs         646±10μs     1.17  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 7000)
+      10.5±0.3ms       12.2±0.6ms     1.17  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 18)
+         525±4μs         612±10μs     1.17  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 7000)
+         105±1ms          122±5ms     1.16  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 900000)
+      23.2±0.5ms         26.8±1ms     1.16  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'object'>, 18)
+      22.7±0.1μs       26.1±0.4μs     1.15  index_object.Indexing.time_get_loc_non_unique_sorted('Float')
+     2.82±0.03ms       3.25±0.1ms     1.15  groupby.TransformBools.time_transform_mean
+      11.2±0.1ms      12.9±0.09ms     1.15  categoricals.Isin.time_isin_categorical('int64')
+      5.32±0.2ms       6.10±0.1ms     1.15  algorithms.Factorize.time_factorize(False, False, 'uint')
+      12.7±0.5μs       14.6±0.7μs     1.14  index_cached_properties.IndexCache.time_is_all_dates('TimedeltaIndex')
+      22.9±0.2ms       26.1±0.3ms     1.14  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, -2)
+     2.11±0.05ms      2.41±0.05ms     1.14  timeseries.DatetimeIndex.time_add_timedelta('tz_naive')
+      59.9±0.9ms       68.3±0.8ms     1.14  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 750000)
+      20.6±0.2ms       23.4±0.4ms     1.14  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 2)
+     21.3±0.05ms       24.2±0.5ms     1.14  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 2)
+      64.9±0.5ms       73.3±0.8ms     1.13  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, -2)
+      14.3±0.1ms       16.2±0.2ms     1.13  categoricals.Isin.time_isin_categorical('object')
+         254±3μs          287±6μs     1.13  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 2000)
+      59.9±0.7ms         67.4±1ms     1.13  gil.ParallelGroupbyMethods.time_loop(4, 'mean')
+      23.0±0.2ms       25.8±0.2ms     1.12  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 2)
+      22.1±0.4ms       24.7±0.4ms     1.12  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 2)
+     4.46±0.03ms      4.95±0.05ms     1.11  index_object.Indexing.time_get_loc_non_unique('Float')
+      61.1±0.6ms         67.6±2ms     1.11  gil.ParallelGroupbyMethods.time_loop(4, 'min')
+        60.3±2ms       66.6±0.8ms     1.10  gil.ParallelGroupbyMethods.time_loop(4, 'sum')
+      2.16±0.2μs       2.39±0.2μs     1.10  index_cached_properties.IndexCache.time_values('DatetimeIndex')
+         649±2ms          714±2ms     1.10  join_merge.I8Merge.time_i8merge('inner')
-        985±60μs         895±70μs     0.91  ctors.SeriesConstructors.time_series_constructor(<function list_of_str at 0x7f63734ef9d0>, False, 'float')
-      9.15±0.1μs       8.29±0.1μs     0.91  tslibs.timestamp.TimestampProperties.time_is_quarter_end(tzfile('/usr/share/zoneinfo/US/Central'), 'B')
-        316±10μs          286±5μs     0.91  groupby.GroupByMethods.time_dtype_as_group('float', 'head', 'direct')
-      8.04±0.6μs       7.25±0.6μs     0.90  index_cached_properties.IndexCache.time_engine('DatetimeIndex')
-     2.97±0.03ms      2.66±0.02ms     0.90  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 16)
-      2.91±0.1ms      2.61±0.04ms     0.90  series_methods.IsIn.time_isin('object')
-      1.99±0.3μs       1.76±0.2μs     0.88  index_cached_properties.IndexCache.time_inferred_type('PeriodIndex')
-         666±5ms         586±10ms     0.88  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
-      4.00±0.2μs       3.50±0.2μs     0.88  index_cached_properties.IndexCache.time_values('UInt64Index')
-      4.12±0.3μs       3.60±0.2μs     0.87  index_cached_properties.IndexCache.time_inferred_type('MultiIndex')
-      6.53±0.7μs         5.65±1μs     0.87  index_cached_properties.IndexCache.time_shape('CategoricalIndex')
-        17.3±1μs       14.8±0.2μs     0.85  tslibs.tz_convert.TimeTZConvert.time_tz_localize_to_utc(1, tzlocal())
-     1.57±0.04ms      1.33±0.02ms     0.85  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 15)
-      2.13±0.2μs       1.81±0.2μs     0.85  index_cached_properties.IndexCache.time_values('PeriodIndex')
-      9.51±0.3ms       7.95±0.2ms     0.84  frame_methods.MaskBool.time_frame_mask_floats
-      19.3±0.3ms       16.1±0.3ms     0.84  join_merge.Merge.time_merge_dataframe_integer_2key(True)
-      5.43±0.4μs       4.45±0.2μs     0.82  index_cached_properties.IndexCache.time_shape('MultiIndex')
-      4.46±0.6μs       3.61±0.4μs     0.81  index_cached_properties.IndexCache.time_shape('DatetimeIndex')
-      30.1±0.2ms       23.3±0.2ms     0.78  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, -2)
-        7.86±1μs       5.99±0.6μs     0.76  index_cached_properties.IndexCache.time_shape('TimedeltaIndex')
-      3.71±0.5μs       2.78±0.2μs     0.75  index_cached_properties.IndexCache.time_shape('PeriodIndex')
-         981±5μs          728±9μs     0.74  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 14)
-         370±3μs          274±3μs     0.74  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 12)
-      8.65±0.4ms       6.40±0.1ms     0.74  join_merge.Merge.time_merge_dataframe_integer_2key(False)
-         308±4μs          204±2μs     0.66  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 11)
-      35.6±0.1ms       23.4±0.3ms     0.66  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, 2)
-        5.42±1μs       3.56±0.2μs     0.66  index_cached_properties.IndexCache.time_values('TimedeltaIndex')
-      25.9±0.4ms       16.6±0.5ms     0.64  hash_functions.UniqueAndFactorizeArange.time_factorize(7)
-     3.08±0.05ms      1.95±0.03ms     0.63  series_methods.ValueCounts.time_value_counts('float')
-      26.3±0.6ms       16.4±0.5ms     0.62  hash_functions.UniqueAndFactorizeArange.time_factorize(12)
-         696±7μs          431±6μs     0.62  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 13)
-     1.20±0.01ms         665±10μs     0.56  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 8000)
-      23.0±0.4ms       12.5±0.3ms     0.54  hash_functions.UniqueAndFactorizeArange.time_unique(12)
-      23.1±0.6ms       12.3±0.3ms     0.53  hash_functions.UniqueAndFactorizeArange.time_unique(7)
-        546±10μs          281±4μs     0.51  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 12)
-      38.3±0.2ms       17.0±0.2ms     0.45  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, 0)
-         477±8μs          204±2μs     0.43  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 11)
-         389±5μs          163±3μs     0.42  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 10)
-        57.5±2ms       22.9±0.3ms     0.40  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, 2)
-     1.42±0.03ms          532±5μs     0.37  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 2000)
-        888±10μs          332±6μs     0.37  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 1300)
-      62.3±0.5ms       23.2±0.1ms     0.37  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, -2)
-         738±5μs          273±3μs     0.37  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 2000)
-     1.44±0.01ms         531±10μs     0.37  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 2000)
-         996±9μs          343±4μs     0.34  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 1300)
-      5.13±0.1ms      1.68±0.04ms     0.33  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 7000)
-     5.97±0.09ms       1.94±0.2ms     0.33  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 8000)
-         617±7μs          199±5μs     0.32  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 1000)
-     5.05±0.06ms      1.57±0.02ms     0.31  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 7000)
-     6.23±0.05ms      1.83±0.05ms     0.29  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 8000)
-        69.9±2ms         18.7±1ms     0.27  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 70000)
-        91.9±3ms       23.0±0.4ms     0.25  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 80000)
-       102±0.3ms      23.9±0.09ms     0.23  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, -2)
-      3.11±0.03s          679±8ms     0.22  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 900000)
-      78.3±0.4ms       16.6±0.2ms     0.21  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, 0)
-      3.13±0.01s          635±5ms     0.20  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 900000)
-        83.0±2ms       16.0±0.3ms     0.19  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 70000)
-         120±6ms       21.8±0.6ms     0.18  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 80000)
-      2.80±0.02s          473±4ms     0.17  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 750000)
-      3.55±0.02s          527±3ms     0.15  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 750000)
-         131±2ms       16.5±0.3ms     0.13  hash_functions.UniqueAndFactorizeArange.time_factorize(11)
-         132±1ms       16.3±0.4ms     0.12  hash_functions.UniqueAndFactorizeArange.time_factorize(8)
-       128±0.7ms       15.7±0.1ms     0.12  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, 0)
-         125±2ms       12.1±0.4ms     0.10  hash_functions.UniqueAndFactorizeArange.time_unique(11)
-         129±1ms       12.3±0.4ms     0.10  hash_functions.UniqueAndFactorizeArange.time_unique(8)
-         553±4ms       16.5±0.3ms     0.03  hash_functions.UniqueAndFactorizeArange.time_factorize(10)
-        549±20ms       12.4±0.4ms     0.02  hash_functions.UniqueAndFactorizeArange.time_unique(10)
-      1.80±0.05s       27.8±0.8ms     0.02  hash_functions.Float64GroupIndex.time_groupby
-      1.14±0.01s       16.5±0.4ms     0.01  hash_functions.UniqueAndFactorizeArange.time_factorize(9)
-      1.04±0.01s       12.4±0.4ms     0.01  hash_functions.UniqueAndFactorizeArange.time_unique(9)

Only second-hash is strong (murmur2):

       before           after         ratio
     [9cb37237]       [18cf44ae]
+      1.77±0.1μs       2.74±0.7μs     1.55  index_cached_properties.IndexCache.time_values('DatetimeIndex')
+     3.36±0.08ms      4.96±0.07ms     1.48  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 100000)
+         472±3ms          628±3ms     1.33  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
+      17.6±0.6ms       23.0±0.9ms     1.31  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
+      2.96±0.2μs       3.78±0.5μs     1.28  index_cached_properties.IndexCache.time_shape('DatetimeIndex')
+     3.52±0.06ms      4.46±0.04ms     1.27  algorithms.Duplicated.time_duplicated(False, False, 'uint')
+     3.98±0.04ms      5.03±0.09ms     1.26  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, -2)
+     4.94±0.08ms       6.20±0.2ms     1.26  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 0)
+         608±9ms         759±70ms     1.25  frame_methods.Nunique.time_frame_nunique
+     2.01±0.03ms      2.49±0.05ms     1.24  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 16)
+         142±7ms          175±9ms     1.24  gil.ParallelReadCSV.time_read_csv('datetime')
+      38.8±0.2ms       47.8±0.3ms     1.23  series_methods.IsInFloat64.time_isin_few_different
+     4.68±0.04ms      5.71±0.04ms     1.22  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 0)
+      6.38±0.3ms       7.76±0.4ms     1.22  algorithms.Duplicated.time_duplicated(False, False, 'datetime64[ns]')
+      11.2±0.1ms       13.6±0.2ms     1.22  categoricals.Isin.time_isin_categorical('int64')
+         290±3μs          352±6μs     1.21  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+     2.24±0.05ms      2.71±0.06ms     1.21  series_methods.IsInForObjects.time_isin_long_series_short_values
+       115±0.5ms        138±0.4ms     1.21  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, 2)
+      54.6±0.8ms         65.6±1ms     1.20  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 1000000)
+      6.09±0.4μs         7.30±1μs     1.20  index_cached_properties.IndexCache.time_engine('PeriodIndex')
+        145±10ms          174±8ms     1.20  gil.ParallelReadCSV.time_read_csv('float')
+      1.86±0.2μs       2.23±0.3μs     1.20  index_cached_properties.IndexCache.time_inferred_type('DatetimeIndex')
+      17.9±0.5ms       21.2±0.2ms     1.19  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, -2)
+     1.19±0.03ms      1.41±0.01ms     1.18  algorithms.Factorize.time_factorize(True, True, 'boolean')
+      44.4±0.5ms       52.5±0.8ms     1.18  series_methods.IsInFloat64.time_isin_many_different
+      90.0±0.8ms        106±0.3ms     1.18  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, -2)
+     1.60±0.02ms      1.89±0.02ms     1.18  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+     6.02±0.04ms       7.10±0.1ms     1.18  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 0)
+     4.84±0.08ms      5.70±0.04ms     1.18  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 0)
+      18.2±0.3ms       21.3±0.2ms     1.17  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 2)
+     5.42±0.08ms      6.32±0.09ms     1.17  algorithms.Factorize.time_factorize(False, True, 'boolean')
+        53.9±2ms         62.8±4ms     1.16  gil.ParallelGroupbyMethods.time_parallel(4, 'count')
+     6.69±0.09ms       7.79±0.2ms     1.16  timeseries.ResampleSeries.time_resample('datetime', '5min', 'ohlc')
+      5.51±0.1ms      6.41±0.06ms     1.16  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, -2)
+     6.13±0.07ms      7.09±0.08ms     1.16  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 0)
+      5.87±0.1ms       6.77±0.2ms     1.15  algorithms.Factorize.time_factorize(False, False, 'Int64')
+      7.12±0.2ms       8.20±0.3ms     1.15  hash_functions.UniqueAndFactorizeArange.time_unique(6)
+      14.5±0.1ms       16.7±0.3ms     1.15  categoricals.Isin.time_isin_categorical('object')
+      19.7±0.6ms       22.6±0.3ms     1.15  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, 2)
+     7.44±0.08ms       8.53±0.2ms     1.15  hash_functions.UniqueAndFactorizeArange.time_unique(4)
+         130±2ms          148±1ms     1.14  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'object'>, 20)
+     2.53±0.04ms      2.89±0.07ms     1.14  stat_ops.SeriesMultiIndexOps.time_op(0, 'prod')
+      1.77±0.2μs       2.02±0.2μs     1.14  index_cached_properties.IndexCache.time_values('PeriodIndex')
+      1.76±0.1μs       2.01±0.3μs     1.14  index_cached_properties.IndexCache.time_inferred_type('PeriodIndex')
+        60.2±1ms       68.7±0.7ms     1.14  gil.ParallelGroupbyMethods.time_loop(4, 'max')
+      14.1±0.2ms       16.0±0.6ms     1.14  hash_functions.UniqueAndFactorizeArange.time_factorize(13)
+     3.67±0.05ms      4.18±0.09ms     1.14  algorithms.Duplicated.time_duplicated(False, 'last', 'uint')
+      9.25±0.3ms       10.5±0.1ms     1.14  index_object.IntervalIndexMethod.time_intersection(100000)
+        82.4±1μs       93.4±0.2μs     1.13  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+        59.5±1ms       67.5±0.7ms     1.13  gil.ParallelGroupbyMethods.time_loop(4, 'prod')
+        57.0±1ms         64.6±1ms     1.13  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'object'>, 19)
+      64.5±0.7ms       72.6±0.2ms     1.13  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, -2)
+      59.0±0.6ms         66.4±1ms     1.12  gil.ParallelGroupbyMethods.time_loop(4, 'last')
+      8.37±0.2ms       9.39±0.3ms     1.12  hash_functions.UniqueAndFactorizeArange.time_unique(15)
+      26.0±0.2ms       29.2±0.4ms     1.12  groupby.Groups.time_series_groups('int64_small')
+       118±0.8ms          133±1ms     1.12  gil.ParallelGroupbyMethods.time_loop(8, 'mean')
+     1.18±0.01ms      1.32±0.03ms     1.12  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 15)
+         132±2ms          148±2ms     1.12  gil.ParallelGroupbyMethods.time_loop(8, 'count')
+     1.31±0.02ms      1.46±0.02ms     1.12  timeseries.DatetimeIndex.time_normalize('repeated')
+        59.5±1ms         66.3±1ms     1.12  gil.ParallelGroupbyMethods.time_loop(4, 'mean')
+      4.77±0.3μs       5.32±0.4μs     1.12  index_cached_properties.IndexCache.time_shape('UInt64Index')
+         121±2ms          135±2ms     1.11  gil.ParallelGroupbyMethods.time_loop(8, 'max')
+         119±2ms          132±1ms     1.11  gil.ParallelGroupbyMethods.time_loop(8, 'prod')
+     1.31±0.01ms      1.46±0.02ms     1.11  timeseries.DatetimeIndex.time_normalize('tz_naive')
+      7.22±0.1ms      8.04±0.08ms     1.11  hash_functions.UniqueAndFactorizeArange.time_unique(5)
+      21.8±0.4ms       24.3±0.4ms     1.11  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 2)
+         120±3ms          133±3ms     1.11  gil.ParallelGroupbyMethods.time_loop(8, 'sum')
+     10.3±0.04ms       11.4±0.2ms     1.11  algorithms.Factorize.time_factorize(False, True, 'Int64')
+      30.4±0.4ms       33.7±0.8ms     1.11  gil.ParallelGroupbyMethods.time_loop(2, 'sum')
+     1.65±0.02ms      1.83±0.02ms     1.10  timeseries.DatetimeAccessor.time_dt_accessor_normalize(tzutc())
+      8.45±0.2ms       9.32±0.3ms     1.10  hash_functions.UniqueAndFactorizeArange.time_unique(14)
+     1.63±0.03ms      1.80±0.05ms     1.10  timeseries.DatetimeAccessor.time_dt_accessor_normalize('UTC')
+      23.6±0.7ms       26.0±0.2ms     1.10  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, -2)
-      72.3±0.6ms       65.7±0.3ms     0.91  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, 2)
-         177±1ms          159±3ms     0.90  timeseries.ToDatetimeISO8601.time_iso8601_tz_spaceformat
-      5.43±0.6μs       4.87±0.5μs     0.90  index_cached_properties.IndexCache.time_shape('CategoricalIndex')
-      8.70±0.1ms       7.71±0.3ms     0.89  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 1000000)
-        88.9±1ms         78.3±2ms     0.88  timeseries.ToDatetimeCache.time_dup_string_tzoffset_dates(False)
-      7.72±0.7μs       6.76±0.5μs     0.88  index_cached_properties.IndexCache.time_shape('IntervalIndex')
-      4.10±0.5μs       3.55±0.2μs     0.87  index_cached_properties.IndexCache.time_inferred_type('IntervalIndex')
-         515±6μs          435±5μs     0.84  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 13)
-      19.4±0.3ms       16.0±0.3ms     0.82  join_merge.Merge.time_merge_dataframe_integer_2key(True)
-     8.05±0.09ms      6.45±0.07ms     0.80  join_merge.Merge.time_merge_dataframe_integer_2key(False)
-     1.00±0.01ms          790±3μs     0.79  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 14)
-         368±4μs          287±3μs     0.78  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 12)
-      3.14±0.2ms      2.34±0.05ms     0.74  series_methods.ValueCounts.time_value_counts('float')
-      18.5±0.2ms       12.9±0.7ms     0.70  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, 2)
-         713±6μs         483±30μs     0.68  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 13)
-         311±8μs          204±2μs     0.66  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 11)
-      30.3±0.7ms      19.9±0.08ms     0.66  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, -2)
-      27.3±0.5ms       17.5±0.6ms     0.64  hash_functions.UniqueAndFactorizeArange.time_factorize(7)
-      26.2±0.3ms       16.6±0.3ms     0.63  hash_functions.UniqueAndFactorizeArange.time_factorize(12)
-     1.20±0.01ms         744±10μs     0.62  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 8000)
-      22.9±0.5ms       12.7±0.4ms     0.55  hash_functions.UniqueAndFactorizeArange.time_unique(7)
-      22.6±0.6ms       12.2±0.4ms     0.54  hash_functions.UniqueAndFactorizeArange.time_unique(12)
-        556±10μs          296±5μs     0.53  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 12)
-        38.5±1ms      20.4±0.05ms     0.53  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, 0)
-         465±7μs          211±2μs     0.45  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 11)
-         389±3μs          169±3μs     0.44  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 10)
-        743±40μs          292±3μs     0.39  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 2000)
-     1.44±0.02ms         546±10μs     0.38  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 2000)
-     1.43±0.05ms         538±10μs     0.38  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 2000)
-         890±9μs         333±20μs     0.37  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 1300)
-      35.7±0.3ms      12.6±0.09ms     0.35  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, 2)
-         609±5μs          214±4μs     0.35  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 1000)
-     4.91±0.03ms      1.71±0.04ms     0.35  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 7000)
-        995±10μs          342±4μs     0.34  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 1300)
-     5.97±0.06ms      1.91±0.03ms     0.32  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 8000)
-      62.7±0.7ms       19.7±0.2ms     0.31  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, -2)
-     5.06±0.05ms      1.59±0.01ms     0.31  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 7000)
-     6.16±0.09ms      1.84±0.02ms     0.30  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 8000)
-        66.0±1ms       18.5±0.8ms     0.28  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 70000)
-        88.9±4ms       22.3±0.6ms     0.25  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 80000)
-      78.2±0.4ms      19.5±0.08ms     0.25  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, 0)
-      56.0±0.3ms       12.7±0.3ms     0.23  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, 2)
-      3.11±0.02s          688±3ms     0.22  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 900000)
-        80.3±3ms         16.9±1ms     0.21  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 70000)
-      3.09±0.02s          629±3ms     0.20  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 900000)
-         104±1ms       20.0±0.2ms     0.19  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, -2)
-         114±6ms       21.5±0.6ms     0.19  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 80000)
-      2.80±0.03s          467±5ms     0.17  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 750000)
-       128±0.5ms       19.4±0.2ms     0.15  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, 0)
-      3.49±0.01s         528±10ms     0.15  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 750000)
-         133±1ms       17.6±0.4ms     0.13  hash_functions.UniqueAndFactorizeArange.time_factorize(8)
-         131±2ms       16.2±0.2ms     0.12  hash_functions.UniqueAndFactorizeArange.time_factorize(11)
-         128±1ms       12.9±0.3ms     0.10  hash_functions.UniqueAndFactorizeArange.time_unique(8)
-       124±0.3ms       12.1±0.8ms     0.10  hash_functions.UniqueAndFactorizeArange.time_unique(11)
-        551±10ms      16.3±0.09ms     0.03  hash_functions.UniqueAndFactorizeArange.time_factorize(10)
-         548±6ms       12.0±0.5ms     0.02  hash_functions.UniqueAndFactorizeArange.time_unique(10)
-      1.87±0.02s       31.4±0.9ms     0.02  hash_functions.Float64GroupIndex.time_groupby
-      1.06±0.01s       17.7±0.6ms     0.02  hash_functions.UniqueAndFactorizeArange.time_factorize(9)
-      1.07±0.01s       12.9±0.4ms     0.01  hash_functions.UniqueAndFactorizeArange.time_unique(9)

@realead realead force-pushed the gh_28303_float_hash branch 2 times, most recently from c852b2e to efae65c Compare October 14, 2020 22:15
@realead realead closed this Oct 14, 2020

realead commented Oct 14, 2020

Now the comparison of different probing strategies:

The choice of probing strategy matters most for types where comparisons are cheap (so cache misses play a role), e.g. float64/int64/uint64. For heavier types (PyObject, strings), it only plays a role when there are robustness problems such that look-up costs more than O(n).

Linear probing:

Linear probing is really bad, at least for some series, for types whose first (and only) hash function is weak. This is what we see (the whole comparison is at the end of this comment):

  • the hash for PyObject is really bad: some test cases become more than 100 times slower
  • the hash for int/uint is better, but there is still a "bad" series which becomes a factor of 20 slower
  • the murmur2 hash used for float64 is quite strong: there are no problematic series in the test suite.
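
To illustrate why a weak hash hurts so much with linear probing, here is a small pure-Python sketch (not the khash code). The table layout, the function names and the "bad" series are made up for this illustration, and `mixing_hash` is only a stand-in bit mixer in the style of the MurmurHash3 finalizer, not the murmur2 specialization used for float64 in this PR.

```python
MASK64 = (1 << 64) - 1


def identity_hash(key):
    """Weak hash: the key itself (stand-in for a hash without avalanche)."""
    return key


def mixing_hash(key):
    """Stronger hash: a 64-bit bit mixer in the style of the MurmurHash3
    finalizer. Stand-in only, not the hash function from this PR."""
    x = key & MASK64
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK64
    x ^= x >> 33
    x = (x * 0xC4CEB9FE1A85EC53) & MASK64
    x ^= x >> 33
    return x


def total_linear_probes(keys, n_buckets, hash_func):
    """Insert keys using linear probing; return the number of extra probes."""
    mask = n_buckets - 1  # n_buckets must be a power of two
    table = [None] * n_buckets
    probes = 0
    for key in keys:
        i = hash_func(key) & mask
        while table[i] is not None and table[i] != key:
            i = (i + 1) & mask  # linear probing: try the next bucket
            probes += 1
        table[i] = key
    return probes


n_buckets = 1024
# Hypothetical "bad" series: every key maps to bucket 0 under the identity hash.
bad_series = [k * n_buckets for k in range(512)]

weak = total_linear_probes(bad_series, n_buckets, identity_hash)
strong = total_linear_probes(bad_series, n_buckets, mixing_hash)
print(f"identity hash: {weak} probes, mixing hash: {strong} probes")
```

With the identity hash, every insert has to walk past all previously inserted keys, so the total work grows quadratically with the number of keys; with the mixing hash the same series spreads over the table.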

Quadratic probing:

It is supposed to be better than linear probing because it is more robust. This is also what we observe (all results at the end):

  • PyObject is only 20 times slower
  • problematic series for int64/uint64 are only 2 times slower
  • otherwise gains are similar to linear
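
A rough pure-Python sketch of the difference (again not the khash code; the table size, the series and the function name are assumptions for illustration). Both variants use the same deliberately weak identity hash, and only the step between probes differs: always +1 for linear, and +1, +2, +3, ... (triangular offsets) for quadratic.

```python
def total_probes(keys, n_buckets, quadratic):
    """Insert integer keys with an identity hash; return extra probes needed."""
    mask = n_buckets - 1  # n_buckets must be a power of two
    table = [None] * n_buckets
    probes = 0
    for key in keys:
        i = key & mask  # deliberately weak hash: the key itself
        step = 0
        while table[i] is not None and table[i] != key:
            step += 1
            # linear: always +1; quadratic: +1, +2, +3, ... (triangular offsets)
            i = (i + (step if quadratic else 1)) & mask
            probes += 1
        table[i] = key
    return probes


n_buckets = 2048
# The first run fills buckets 0..511; the second run hashes into the same buckets.
series = list(range(512)) + [n_buckets + k for k in range(512)]

linear_probes = total_probes(series, n_buckets, quadratic=False)
quadratic_probes = total_probes(series, n_buckets, quadratic=True)
print(f"linear: {linear_probes} probes, quadratic: {quadratic_probes} probes")
```

With linear probing every key of the second run has to walk through the whole occupied block, while the growing steps of quadratic probing leave the block much faster. Note that quadratic probing does not help when all keys start in exactly the same bucket, because they then all follow the same probe sequence.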

Combined probing:

Theoretically the best of both worlds: as robust as double hashing, but with as few cache misses as quadratic probing. The results look as expected: there are some cases with about a 10% slow-down, but many more cases with speed-ups comparable to those from quadratic probing. Also, nothing bad happens for weak hashes. The way it is implemented, combined probing can be switched on/off per type (it is only on for float64/(u)int64; in tests for PyObject, no noteworthy changes were seen with combined probing).
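
The comment above does not spell out how combined probing is implemented, so the following pure-Python sketch is only one hypothetical way to combine the two ideas, not the code from this PR: the first `n_local` probes use small quadratic steps (close to the start bucket, so cache friendly), and after that the walk switches to a key-dependent odd stride as in double hashing. The names `mix64`, `total_probes` and `n_local`, as well as the colliding series, are assumptions made for this illustration.

```python
MASK64 = (1 << 64) - 1


def mix64(key):
    """Bit mixer in the style of the MurmurHash3 finalizer (assumed second hash)."""
    x = key & MASK64
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK64
    x ^= x >> 33
    x = (x * 0xC4CEB9FE1A85EC53) & MASK64
    x ^= x >> 33
    return x


def total_probes(keys, n_buckets, n_local=None):
    """Insert keys with a weak identity hash and count extra probes.

    n_local=None: pure quadratic (triangular) probing.
    n_local=k:    hypothetical combined scheme, k nearby quadratic probes,
                  then a key-dependent odd stride as in double hashing.
    """
    mask = n_buckets - 1  # n_buckets must be a power of two
    table = [None] * n_buckets
    probes = 0
    for key in keys:
        i = key & mask
        stride = mix64(key) | 1  # odd, so the walk reaches every bucket
        step = 0
        while table[i] is not None and table[i] != key:
            step += 1
            if n_local is None or step <= n_local:
                i = (i + step) & mask    # stay close: cache friendly
            else:
                i = (i + stride) & mask  # spread out: robust to clustering
            probes += 1
        table[i] = key
    return probes


n_buckets = 2048
# Every key maps to bucket 0 under the identity hash.
colliding = [k * n_buckets for k in range(512)]

quadratic_probes = total_probes(colliding, n_buckets)
combined_probes = total_probes(colliding, n_buckets, n_local=4)
print(f"quadratic: {quadratic_probes} probes, combined: {combined_probes} probes")
```

In this sketch, keys that collide on the first hash share only the first few probes and then diverge, which is the robustness double hashing is known for, while keys that do not collide heavily never leave the neighbourhood of their start bucket.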

Here are all the comparisons for combined probing:

      before           after         ratio
     [c852b2ee]       [5e44098d]
+     23.0±0.07ms       27.9±0.9ms     1.21  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 2)
+      1.93±0.2μs       2.29±0.2μs     1.18  index_cached_properties.IndexCache.time_inferred_type('DatetimeIndex')
+      24.7±0.3ms       28.5±0.7ms     1.15  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 2)
+      23.7±0.1ms       27.1±0.5ms     1.15  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, 2)
+      1.83±0.1μs       2.08±0.2μs     1.14  index_cached_properties.IndexCache.time_values('DatetimeIndex')
+      23.5±0.3ms       26.6±0.7ms     1.13  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, -2)
+      23.5±0.2ms       26.4±0.5ms     1.12  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, -2)
+      23.4±0.2ms       26.2±0.5ms     1.12  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, 2)
+      12.6±0.1ms       14.1±0.5ms     1.12  indexing.IntervalIndexing.time_loc_list
+     4.98±0.02ms      5.57±0.05ms     1.12  timeseries.ResampleSeries.time_resample('period', '1D', 'mean')
+      23.2±0.2ms       25.9±0.8ms     1.11  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, -2)
+        664±50ns         739±70ns     1.11  index_cached_properties.IndexCache.time_inferred_type('RangeIndex')
+     1.18±0.02ms      1.32±0.01ms     1.11  series_methods.IsInForObjects.time_isin_short_series_long_values
+      1.78±0.1μs       1.98±0.2μs     1.11  index_cached_properties.IndexCache.time_values('PeriodIndex')
+      3.66±0.2μs       4.05±0.4μs     1.11  index_cached_properties.IndexCache.time_inferred_type('Float64Index')
+     4.28±0.04ms      4.72±0.08ms     1.10  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 80000)
-     3.34±0.05μs      3.04±0.02μs     0.91  tslibs.tslib.TimeIntsToPydatetime.time_ints_to_pydatetime('time', 0, tzlocal())
-        16.9±1μs       15.4±0.2μs     0.91  dtypes.Dtypes.time_pandas_dtype('timedelta64')
-      1.05±0.2ms         949±60μs     0.91  ctors.SeriesConstructors.time_series_constructor(<function list_of_str at 0x7f8dd9d979d0>, False, 'int')
-      5.01±0.1ms      4.54±0.03ms     0.91  index_object.Indexing.time_get_loc_non_unique('Float')
-      3.08±0.3ms      2.79±0.07ms     0.91  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'linear')
-        705±10ms          638±5ms     0.90  join_merge.I8Merge.time_i8merge('outer')
-         727±4ms          656±4ms     0.90  join_merge.I8Merge.time_i8merge('left')
-     3.19±0.05μs      2.88±0.02μs     0.90  tslibs.tslib.TimeIntsToPydatetime.time_ints_to_pydatetime('time', 0, datetime.timezone.utc)
-     2.79±0.03ms      2.52±0.02ms     0.90  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      34.8±0.5ms       31.4±0.7ms     0.90  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.uint64'>, 20)
-         393±4ms          354±5ms     0.90  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 5000000)
-     3.34±0.07μs      3.00±0.06μs     0.90  tslibs.tslib.TimeIntsToPydatetime.time_ints_to_pydatetime('time', 0, None)
-         311±3μs          279±3μs     0.90  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.int64'>, 8000)
-        3.51±1ms       3.14±0.1ms     0.90  ctors.SeriesConstructors.time_series_constructor(<class 'list'>, False, 'int')
-      1.44±0.5ms      1.29±0.08ms     0.90  frame_ctor.FromRecords.time_frame_from_records_generator(1000)
-         113±1ms        101±0.8ms     0.90  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 900000)
-        72.1±2ms       64.4±0.9ms     0.89  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 20)
-        13.3±1μs       11.9±0.6μs     0.89  index_cached_properties.IndexCache.time_is_all_dates('TimedeltaIndex')
-     3.70±0.08μs      3.29±0.05μs     0.89  tslibs.tslib.TimeIntsToPydatetime.time_ints_to_pydatetime('date', 1, datetime.timezone.utc)
-      3.32±0.8ms       2.95±0.1ms     0.89  ctors.SeriesConstructors.time_series_constructor(<function arr_dict at 0x7f8dd9d97af0>, False, 'float')
-        88.9±1ms         79.0±1ms     0.89  multiindex_object.Duplicated.time_duplicated
-         394±2ms          349±4ms     0.89  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 5000000)
-      12.5±0.6ms       11.1±0.6ms     0.89  hash_functions.UniqueAndFactorizeArange.time_unique(10)
-         753±2ms          667±6ms     0.88  join_merge.I8Merge.time_i8merge('right')
-      4.20±0.1μs      3.72±0.06μs     0.88  tslibs.tslib.TimeIntsToPydatetime.time_ints_to_pydatetime('datetime', 1, None)
-      3.14±0.3ms      2.77±0.03ms     0.88  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'higher')
-        42.4±2ms       37.4±0.4ms     0.88  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0.5, 'midpoint')
-         138±5ms          122±3ms     0.88  gil.ParallelGroupbyMethods.time_loop(8, 'last')
-         196±2ms          173±2ms     0.88  sparse.SparseSeriesToFrame.time_series_to_frame
-        35.1±4ms       30.9±0.8ms     0.88  gil.ParallelGroupbyMethods.time_loop(2, 'sum')
-      32.0±0.8ms       28.1±0.4ms     0.88  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.int64'>, 20)
-      12.4±0.6ms       10.9±0.4ms     0.88  hash_functions.UniqueAndFactorizeArange.time_unique(15)
-        137±10ms          120±1ms     0.88  gil.ParallelGroupbyMethods.time_loop(8, 'prod')
-        12.7±2ms       11.0±0.5ms     0.87  hash_functions.UniqueAndFactorizeArange.time_unique(11)
-        958±10μs         834±30μs     0.87  timeseries.DatetimeIndex.time_unique('repeated')
-      1.01±0.3ms         876±40μs     0.87  ctors.SeriesConstructors.time_series_constructor(<function list_of_str at 0x7f8dd9d979d0>, False, 'float')
-        29.1±2μs         25.3±2μs     0.87  index_cached_properties.IndexCache.time_engine('CategoricalIndex')
-      12.5±0.4ms       10.9±0.4ms     0.87  hash_functions.UniqueAndFactorizeArange.time_unique(4)
-        40.1±3ms       34.8±0.3ms     0.87  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0.5, 'higher')
-      1.04±0.2ms         903±50μs     0.87  ctors.SeriesConstructors.time_series_constructor(<function list_of_str at 0x7f8dd9d979d0>, True, 'float')
-         141±6ms          122±1ms     0.87  gil.ParallelGroupbyMethods.time_loop(8, 'min')
-      1.44±0.2ms      1.25±0.08ms     0.87  ctors.SeriesConstructors.time_series_constructor(<class 'list'>, False, 'float')
-       94.5±20μs        81.7±10μs     0.86  ctors.SeriesConstructors.time_series_constructor(<function no_change at 0x7f8dd9d97940>, False, 'float')
-        72.5±1ms         62.6±1ms     0.86  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 750000)
-        3.64±1ms       3.13±0.1ms     0.86  ctors.SeriesConstructors.time_series_constructor(<function arr_dict at 0x7f8dd9d97af0>, True, 'float')
-        580±10ms          499±6ms     0.86  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
-        40.2±2ms       34.5±0.3ms     0.86  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0.5, 'nearest')
-      83.8±0.6ms       71.9±0.3ms     0.86  series_methods.IsInFloat64.time_isin_many_different
-     3.41±0.03ms      2.92±0.03ms     0.86  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      12.6±0.5ms       10.8±0.4ms     0.85  hash_functions.UniqueAndFactorizeArange.time_unique(8)
-         106±1ms         89.9±2ms     0.85  join_merge.Align.time_series_align_int64_index
-        580±10ms          492±4ms     0.85  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
-        41.2±3ms       34.7±0.2ms     0.84  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0.5, 'lower')
-      1.53±0.5ms      1.28±0.09ms     0.84  ctors.SeriesConstructors.time_series_constructor(<class 'list'>, True, 'float')
-        76.7±4ms         63.5±2ms     0.83  gil.ParallelGroupbyMethods.time_loop(4, 'max')
-      2.69±0.5ms      2.23±0.03ms     0.83  frame_methods.Lookup.time_frame_fancy_lookup
-         123±1ms          102±2ms     0.83  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 900000)
-      1.96±0.1ms      1.62±0.03ms     0.83  dtypes.SelectDtypes.time_select_dtype_float_exclude('UInt8')
-      73.8±0.9ms         61.0±1ms     0.83  gil.ParallelGroupbyMethods.time_loop(4, 'sum')
-         134±2ms          111±1ms     0.83  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 1000000)
-        74.7±4ms         61.4±1ms     0.82  gil.ParallelGroupbyMethods.time_loop(4, 'last')
-      92.0±0.8ms         75.2±1ms     0.82  join_merge.Align.time_series_align_left_monotonic
-        120±40μs         97.3±8μs     0.81  ctors.SeriesConstructors.time_series_constructor(<function no_change at 0x7f8dd9d97940>, True, 'float')
-      8.13±0.1ms      6.59±0.09ms     0.81  index_object.SetOperations.time_operation('int', 'symmetric_difference')
-      13.4±0.6ms       10.9±0.4ms     0.81  hash_functions.UniqueAndFactorizeArange.time_unique(12)
-       659±200ms          533±9ms     0.81  categoricals.Indexing.time_reindex_missing
-        7.14±1ms      5.71±0.07ms     0.80  algorithms.Factorize.time_factorize(False, True, 'boolean')
-      2.03±0.4ms      1.62±0.03ms     0.79  arithmetic.NumericInferOps.time_divide(<class 'numpy.uint8'>)
-       77.0±20ms       61.2±0.8ms     0.79  gil.ParallelGroupbyMethods.time_loop(4, 'mean')
-        7.72±1μs       6.11±0.5μs     0.79  index_cached_properties.IndexCache.time_is_all_dates('DatetimeIndex')
-      5.39±0.8ms       4.25±0.2ms     0.79  algorithms.Factorize.time_factorize(True, False, 'datetime64[ns]')
-        151±20ms          119±2ms     0.79  frame_methods.Duplicated.time_frame_duplicated
-      3.55±0.3ms      2.79±0.06ms     0.79  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'midpoint')
-        37.6±4ms       29.5±0.4ms     0.79  frame_ctor.FromLists.time_frame_from_lists
-      9.55±0.1ms       7.44±0.1ms     0.78  index_object.SetOperations.time_operation('datetime', 'symmetric_difference')
-        38.8±3ms       30.0±0.6ms     0.77  arithmetic.IrregularOps.time_add
-       635±200μs          490±7μs     0.77  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<Day>)
-     7.32±0.05ms       5.59±0.2ms     0.76  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, 0)
-     2.16±0.01ms      1.63±0.08ms     0.75  series_methods.NanOps.time_func('prod', 1000000, 'int8')
-     3.12±0.06ms       2.34±0.2ms     0.75  arithmetic.MixedFrameWithSeriesAxis.time_frame_op_with_series_axis0('sub')
-      7.10±0.1ms      5.13±0.07ms     0.72  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 0)
-      1.71±0.1ms      1.24±0.01ms     0.72  frame_methods.Iteration.time_items_cached
-     7.06±0.06ms      5.08±0.08ms     0.72  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 0)
-        23.2±3ms      16.6±0.07ms     0.72  frame_ctor.FromDictwithTimestamp.time_dict_with_timestamp_offsets(<Nano>)
-       51.5±30ms         36.6±1ms     0.71  gil.ParallelGroupbyMethods.time_loop(2, 'var')
-        214±20ms        152±0.8ms     0.71  frame_ctor.FromDicts.time_nested_dict_int64
-        132±80μs         92.4±6μs     0.70  ctors.SeriesConstructors.time_series_constructor(<function no_change at 0x7f8dd9d97940>, True, 'int')
-      1.42±0.3ms          991±9μs     0.70  dtypes.SelectDtypes.time_select_dtype_int_exclude('Int64')
-       718±200μs          500±8μs     0.70  categoricals.Constructor.time_interval
-     5.92±0.06ms      4.03±0.08ms     0.68  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 0)
-     6.40±0.09ms      4.26±0.05ms     0.67  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, -2)
-     5.63±0.02ms      3.74±0.05ms     0.66  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 0)
-     1.73±0.02ms      1.14±0.08ms     0.66  series_methods.NanOps.time_func('sum', 1000000, 'int8')
-      6.17±0.4ms      4.02±0.06ms     0.65  arithmetic.IndexArithmetic.time_divide('int')
-      5.79±0.1ms      3.77±0.04ms     0.65  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 0)
-       95.3±40ms         61.6±2ms     0.65  gil.ParallelGroupbyMethods.time_loop(4, 'prod')
-       50.1±20ms       31.1±0.5ms     0.62  gil.ParallelGroupbyMethods.time_loop(2, 'min')
-      5.13±0.4ms      2.97±0.03ms     0.58  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, -2)
-      4.02±0.5ms       2.30±0.1ms     0.57  arithmetic.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function le>)
-       831±100μs         453±10μs     0.54  arithmetic.NumericInferOps.time_add(<class 'numpy.uint16'>)
-        763±10μs        414±100μs     0.54  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int8Engine'>, <class 'numpy.int8'>), 'monotonic_incr')
-        774±10μs         342±10μs     0.44  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int16Engine'>, <class 'numpy.int16'>), 'monotonic_incr')
-      26.1±0.5ms       8.32±0.3ms     0.32  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 2)
-      25.9±0.2ms       7.93±0.3ms     0.31  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, -2)
-      24.0±0.1ms       6.87±0.2ms     0.29  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 2)
-      24.5±0.2ms       6.70±0.2ms     0.27  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, -2)
-        92.1±1ms        130±0.9μs     0.00  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')

Other comparisons

Linear probing

       before           after         ratio
     [c852b2ee]       [1dc7af75]
!      96.5±0.5ms           failed      n/a  hash_functions.IsinWithArange.time_isin(<class 'object'>, 8000, -2)
!         110±1ms           failed      n/a  hash_functions.IsinWithArange.time_isin(<class 'object'>, 8000, 2)
+         109±1ms       12.8±0.01s   117.14  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, -2)
+      66.9±0.8ms       7.45±0.01s   111.28  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, 2)
+         143±1ms       14.9±0.04s   104.27  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, 2)
+        73.5±2ms       6.40±0.01s    87.12  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, -2)
+      24.2±0.9ms          611±2ms    25.30  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 2)
+        25.9±1ms          614±4ms    23.66  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 2)
+         520±6ms       11.8±0.03s    22.73  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 750000)
+      21.1±0.1ms         394±10ms    18.68  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 80000)
+         628±4ms       11.1±0.07s    17.60  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 900000)
+      16.0±0.4ms         281±10ms    17.59  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 70000)
+        480±10ms       8.15±0.05s    16.97  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 750000)
+     1.84±0.03ms       27.8±0.2ms    15.07  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 8000)
+         348±8μs      4.87±0.06ms    13.97  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 1300)
+     1.64±0.02ms       22.7±0.3ms    13.85  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 7000)
+         538±7μs       6.96±0.1ms    12.93  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 2000)
+         683±4ms       8.59±0.06s    12.59  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 900000)
+      22.4±0.3ms          263±9ms    11.75  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 80000)
+     1.90±0.02ms       21.9±0.6ms    11.50  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 8000)
+      17.8±0.5ms          194±4ms    10.88  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 70000)
+     1.69±0.02ms       17.9±0.2ms    10.58  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 7000)
+         332±6μs      3.42±0.02ms    10.32  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 1300)
+         537±5μs      5.41±0.03ms    10.08  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 2000)
+      33.6±0.4ms          145±1ms     4.31  io.csv.ReadCSVConcatDatetime.time_read_csv
+     6.47±0.05ms       26.5±0.1ms     4.09  join_merge.Merge.time_merge_dataframe_integer_2key(False)
+      94.0±0.6ms          324±4ms     3.44  multiindex_object.GetLoc.time_large_get_loc
+      16.2±0.1ms       53.1±0.5ms     3.28  join_merge.Merge.time_merge_dataframe_integer_2key(True)
+       111±0.8ms          340±2ms     3.06  multiindex_object.GetLoc.time_large_get_loc_warm
+         662±8μs      1.17±0.04ms     1.77  groupby.GroupByMethods.time_dtype_as_field('datetime', 'nunique', 'transformation')
+     7.23±0.06ms       12.6±0.6ms     1.74  io.csv.ReadUint64Integers.time_read_uint64_neg_values
+         665±6μs      1.15±0.01ms     1.72  groupby.GroupByMethods.time_dtype_as_field('datetime', 'nunique', 'direct')
+      7.78±0.3ms       12.5±0.1ms     1.61  io.csv.ReadUint64Integers.time_read_uint64_na_values
+         224±3μs          332±9μs     1.48  timeseries.DatetimeIndex.time_unique('dst')
+     1.17±0.02ms      1.67±0.04ms     1.43  groupby.GroupByMethods.time_dtype_as_field('datetime', 'value_counts', 'direct')
+        23.3±2ms       33.2±0.6ms     1.42  eval.Eval.time_and('numexpr', 'all')
+     1.20±0.03ms      1.68±0.03ms     1.41  groupby.GroupByMethods.time_dtype_as_field('datetime', 'value_counts', 'transformation')
+      6.76±0.1ms       8.67±0.9ms     1.28  timeseries.ResampleSeries.time_resample('datetime', '5min', 'ohlc')
+      24.7±0.6ms         31.6±1ms     1.28  categoricals.SetCategories.time_set_categories
+         142±5ms          178±9ms     1.25  gil.ParallelReadCSV.time_read_csv('float')
+      13.0±0.2ms       16.1±0.3ms     1.24  categoricals.Isin.time_isin_categorical('int64')
+         387±3ms          463±2ms     1.20  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 5000000)
+     2.10±0.03ms       2.50±0.1ms     1.19  io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(False, 'ymd')
+         386±2ms          459±2ms     1.19  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 5000000)
+     4.37±0.01ms       5.12±0.2ms     1.17  timeseries.ResampleSeries.time_resample('datetime', '5min', 'mean')
+      49.8±0.8ms       58.0±0.6ms     1.17  index_object.SetOperations.time_operation('date_string', 'symmetric_difference')
+     7.59±0.06ms       8.78±0.7ms     1.16  frame_methods.MaskBool.time_frame_mask_floats
+     2.86±0.03ms       3.29±0.1ms     1.15  io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(True, 'ymd')
+      16.4±0.2ms       18.6±0.3ms     1.13  categoricals.Isin.time_isin_categorical('object')
+      12.7±0.2ms       14.4±0.4ms     1.13  algorithms.Hashing.time_series_string
-      4.05±0.1ms      3.68±0.07ms     0.91  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 80000)
-     1.38±0.04ms      1.25±0.01ms     0.91  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.uint64'>, 16)
-     1.80±0.06μs      1.63±0.02μs     0.91  index_object.Float64IndexMethod.time_get_loc
-         245±5ms          221±4ms     0.90  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      3.91±0.2ms       3.52±0.1ms     0.90  stat_ops.FrameMultiIndexOps.time_op(1, 'var')
-     5.78±0.05ms      5.21±0.08ms     0.90  reindex.DropDuplicates.time_frame_drop_dups_int(True)
-      14.9±0.4ms       13.4±0.2ms     0.90  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.int64'>, 19)
-      24.5±0.2ms       22.0±0.2ms     0.90  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, 0)
-      4.84±0.1ms      4.35±0.07ms     0.90  stat_ops.SeriesMultiIndexOps.time_op(1, 'sem')
-     2.54±0.07ms      2.28±0.02ms     0.90  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.int64'>, 17)
-     3.14±0.07ms      2.81±0.04ms     0.90  stat_ops.FrameMultiIndexOps.time_op(0, 'mean')
-     3.47±0.07ms      3.11±0.04ms     0.90  stat_ops.SeriesMultiIndexOps.time_op(0, 'var')
-      3.26±0.2ms      2.92±0.05ms     0.90  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 70000)
-         249±9ms          222±5ms     0.89  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      2.04±0.2μs       1.83±0.2μs     0.89  index_cached_properties.IndexCache.time_values('PeriodIndex')
-        1.28±0ms      1.14±0.02ms     0.89  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.int64'>, 16)
-         295±6μs          263±2μs     0.89  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.int64'>, 13)
-      26.2±0.6ms       23.3±0.1ms     0.89  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 19)
-      3.15±0.1ms      2.81±0.04ms     0.89  stat_ops.FrameMultiIndexOps.time_op(0, 'sum')
-        672±10μs          598±8μs     0.89  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.int64'>, 15)
-      5.07±0.2ms       4.51±0.1ms     0.89  stat_ops.SeriesMultiIndexOps.time_op(0, 'sem')
-      3.68±0.5μs       3.27±0.2μs     0.89  index_cached_properties.IndexCache.time_inferred_type('CategoricalIndex')
-         770±8μs         681±10μs     0.88  series_methods.IsInForObjects.time_isin_nans
-      3.93±0.5μs       3.48±0.2μs     0.88  index_cached_properties.IndexCache.time_values('IntervalIndex')
-     1.41±0.02ms      1.25±0.02ms     0.88  series_methods.ValueCounts.time_value_counts('int')
-     3.23±0.08ms      2.85±0.06ms     0.88  reindex.DropDuplicates.time_frame_drop_dups_bool(False)
-      39.7±0.5ms       34.9±0.8ms     0.88  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.uint64'>, 20)
-     3.19±0.07ms      2.81±0.04ms     0.88  stat_ops.FrameMultiIndexOps.time_op(1, 'sum')
-        316±20μs          278±1μs     0.88  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.uint64'>, 13)
-      12.4±0.3ms       10.9±0.3ms     0.88  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 18)
-     1.45±0.03ms      1.27±0.04ms     0.88  series_methods.ValueCounts.time_value_counts('uint')
-      2.98±0.1ms      2.62±0.06ms     0.88  stat_ops.SeriesMultiIndexOps.time_op(0, 'prod')
-      17.0±0.3ms       14.9±0.1ms     0.88  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.uint64'>, 19)
-        65.3±1ms         57.1±2ms     0.87  gil.ParallelGroupbyMethods.time_loop(4, 'last')
-     3.90±0.05ms      3.41±0.02ms     0.87  timeseries.DatetimeIndex.time_add_timedelta('tz_aware')
-        27.0±1ms       23.6±0.6ms     0.87  reindex.DropDuplicates.time_frame_drop_dups_int(False)
-      16.3±0.6ms       14.2±0.2ms     0.87  hash_functions.UniqueAndFactorizeArange.time_factorize(8)
-      7.44±0.1ms       6.49±0.2ms     0.87  algorithms.Duplicated.time_duplicated(False, 'last', 'float')
-         134±2ms          117±1ms     0.87  gil.ParallelGroupbyMethods.time_loop(8, 'max')
-     4.97±0.06ms      4.33±0.05ms     0.87  index_object.Indexing.time_get_loc_non_unique('Float')
-      37.3±0.5ms       32.4±0.2ms     0.87  algorithms.Factorize.time_factorize(False, False, 'string')
-     7.65±0.08ms      6.65±0.08ms     0.87  algorithms.Factorize.time_factorize(True, True, 'datetime64[ns]')
-         132±2ms          114±3ms     0.87  gil.ParallelGroupbyMethods.time_loop(8, 'sum')
-         159±6ms          137±1ms     0.86  gil.ParallelGroupbyMethods.time_loop(8, 'var')
-     1.79±0.03ms      1.54±0.03ms     0.86  timeseries.DatetimeAccessor.time_dt_accessor_normalize(None)
-      7.81±0.1ms       6.74±0.1ms     0.86  io.hdf.HDFStoreDataFrame.time_read_store_table
-     1.05±0.04ms         903±10μs     0.86  groupby.GroupByMethods.time_dtype_as_group('int', 'cummax', 'transformation')
-        78.2±3ms         67.4±1ms     0.86  gil.ParallelGroupbyMethods.time_loop(4, 'var')
-     3.32±0.03ms      2.86±0.07ms     0.86  timeseries.DatetimeIndex.time_unique('tz_aware')
-      33.9±0.9ms       29.2±0.7ms     0.86  gil.ParallelGroupbyMethods.time_loop(2, 'max')
-        39.3±1ms       33.8±0.3ms     0.86  gil.ParallelGroupbyMethods.time_loop(2, 'var')
-      34.3±0.6ms       29.4±0.9ms     0.86  gil.ParallelGroupbyMethods.time_loop(2, 'prod')
-     1.79±0.01ms      1.54±0.02ms     0.86  timeseries.DatetimeAccessor.time_dt_accessor_normalize('UTC')
-        29.6±2ms       25.3±0.4ms     0.85  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 19)
-         134±7ms          114±2ms     0.85  gil.ParallelGroupbyMethods.time_loop(8, 'mean')
-         133±3ms          114±2ms     0.85  gil.ParallelGroupbyMethods.time_loop(8, 'min')
-      6.62±0.2ms       5.65±0.2ms     0.85  stat_ops.FrameMultiIndexOps.time_op([0, 1], 'sum')
-      2.63±0.1ms      2.24±0.04ms     0.85  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 500000)
-      67.8±0.6ms         57.8±1ms     0.85  gil.ParallelGroupbyMethods.time_loop(4, 'prod')
-         135±4ms          114±2ms     0.85  gil.ParallelGroupbyMethods.time_loop(8, 'prod')
-         147±3ms          125±1ms     0.85  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'object'>, 20)
-      35.1±0.5ms       29.7±0.6ms     0.85  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.uint64'>, 20)
-     3.34±0.09ms      2.82±0.05ms     0.84  timeseries.DatetimeIndex.time_unique('tz_local')
-         133±4ms          112±2ms     0.84  gil.ParallelGroupbyMethods.time_loop(8, 'last')
-      68.0±0.9ms         57.3±1ms     0.84  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'object'>, 19)
-         482±3μs          406±5μs     0.84  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      5.10±0.1ms      4.29±0.06ms     0.84  stat_ops.SeriesMultiIndexOps.time_op(1, 'median')
-        67.4±1ms       56.8±0.8ms     0.84  gil.ParallelGroupbyMethods.time_loop(4, 'min')
-     2.97±0.07ms      2.50±0.06ms     0.84  stat_ops.SeriesMultiIndexOps.time_op(1, 'prod')
-      12.2±0.7ms       10.3±0.2ms     0.84  hash_functions.UniqueAndFactorizeArange.time_unique(11)
-      7.60±0.6ms       6.38±0.6ms     0.84  algorithms.Duplicated.time_duplicated(False, False, 'datetime64[ns]')
-     1.45±0.03ms      1.21±0.03ms     0.84  timeseries.DatetimeIndex.time_normalize('repeated')
-      31.9±0.8ms       26.7±0.5ms     0.84  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.int64'>, 20)
-     1.83±0.04ms      1.53±0.03ms     0.84  timeseries.DatetimeAccessor.time_dt_accessor_normalize(tzutc())
-      12.1±0.5ms       10.2±0.2ms     0.84  hash_functions.UniqueAndFactorizeArange.time_unique(13)
-      6.82±0.1ms       5.70±0.2ms     0.83  algorithms.Factorize.time_factorize(False, False, 'Int64')
-      12.1±0.4ms       10.1±0.3ms     0.83  hash_functions.UniqueAndFactorizeArange.time_unique(8)
-      12.2±0.5ms       10.1±0.3ms     0.83  hash_functions.UniqueAndFactorizeArange.time_unique(15)
-      34.1±0.7ms       28.2±0.5ms     0.83  gil.ParallelGroupbyMethods.time_loop(2, 'sum')
-      71.4±0.8ms         58.8±1ms     0.82  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 20)
-      12.3±0.3ms       10.1±0.1ms     0.82  hash_functions.UniqueAndFactorizeArange.time_unique(9)
-      17.4±0.8ms       14.3±0.5ms     0.82  hash_functions.UniqueAndFactorizeArange.time_factorize(5)
-         620±9μs          507±5μs     0.82  categoricals.Constructor.time_interval
-      7.82±0.2ms       6.39±0.3ms     0.82  algorithms.Duplicated.time_duplicated(False, False, 'datetime64[ns, tz]')
-         104±1ms         84.8±1ms     0.82  join_merge.Align.time_series_align_int64_index
-      12.4±0.5ms       10.1±0.2ms     0.82  hash_functions.UniqueAndFactorizeArange.time_unique(10)
-         758±4ms          618±4ms     0.82  join_merge.I8Merge.time_i8merge('right')
-         706±3ms          575±7ms     0.81  join_merge.I8Merge.time_i8merge('outer')
-         732±5ms          596±4ms     0.81  join_merge.I8Merge.time_i8merge('left')
-      4.29±0.1ms      3.49±0.08ms     0.81  algorithms.Duplicated.time_duplicated(False, 'first', 'int')
-      12.3±0.5ms       10.0±0.2ms     0.81  hash_functions.UniqueAndFactorizeArange.time_unique(5)
-      12.4±0.6ms       10.0±0.3ms     0.81  hash_functions.UniqueAndFactorizeArange.time_unique(12)
-         709±4ms          576±4ms     0.81  join_merge.I8Merge.time_i8merge('inner')
-     4.08±0.03ms      3.31±0.04ms     0.81  algorithms.Duplicated.time_duplicated(False, 'first', 'uint')
-     3.00±0.02ms      2.42±0.05ms     0.81  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      12.4±0.5ms       9.99±0.3ms     0.81  hash_functions.UniqueAndFactorizeArange.time_unique(4)
-         260±2μs          209±7μs     0.80  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 500000)
-        492±10μs         395±10μs     0.80  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
-         113±1ms         90.9±1ms     0.80  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 900000)
-      6.25±0.1ms       5.00±0.1ms     0.80  algorithms.Factorize.time_factorize(False, False, 'uint')
-      2.52±0.2ms      2.01±0.02ms     0.80  timeseries.DatetimeIndex.time_add_timedelta('tz_naive')
-      88.0±0.8ms       70.0±0.6ms     0.80  multiindex_object.Duplicated.time_duplicated
-      73.0±0.2μs       58.0±0.4μs     0.79  indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-        83.1±2ms       66.0±0.5ms     0.79  series_methods.IsInFloat64.time_isin_many_different
-      4.37±0.9μs       3.46±0.2μs     0.79  index_cached_properties.IndexCache.time_values('UInt64Index')
-      4.51±0.2ms       3.57±0.2ms     0.79  algorithms.Duplicated.time_duplicated(False, False, 'int')
-     5.02±0.03ms      3.95±0.04ms     0.79  algorithms.Factorize.time_factorize(True, False, 'datetime64[ns]')
-        7.90±1ms       6.18±0.3ms     0.78  stat_ops.FrameMultiIndexOps.time_op([0, 1], 'var')
-        89.9±2ms         70.3±2ms     0.78  join_merge.Align.time_series_align_left_monotonic
-     5.00±0.09ms      3.91±0.08ms     0.78  algorithms.Factorize.time_factorize(True, False, 'datetime64[ns, tz]')
-      1.93±0.2ms      1.51±0.02ms     0.78  series_methods.NanOps.time_func('prod', 1000000, 'int8')
-     3.51±0.09ms      2.73±0.02ms     0.78  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     6.69±0.09ms       5.17±0.2ms     0.77  algorithms.Factorize.time_factorize(False, True, 'boolean')
-      2.33±0.6μs       1.80±0.1μs     0.77  index_cached_properties.IndexCache.time_inferred_type('PeriodIndex')
-      9.39±0.3ms       7.24±0.2ms     0.77  index_object.SetOperations.time_operation('datetime', 'symmetric_difference')
-     2.71±0.02ms      2.08±0.03ms     0.77  series_methods.IsInForObjects.time_isin_long_series_short_values
-     1.49±0.03ms      1.14±0.02ms     0.77  algorithms.Factorize.time_factorize(True, True, 'boolean')
-         137±3ms          105±3ms     0.76  frame_methods.Duplicated.time_frame_duplicated
-        74.6±3ms       56.9±0.5ms     0.76  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 750000)
-      3.68±0.8μs       2.79±0.2μs     0.76  index_cached_properties.IndexCache.time_shape('PeriodIndex')
-      23.4±0.6ms       17.6±0.1ms     0.75  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, 2)
-     2.55±0.08ms      1.91±0.05ms     0.75  arithmetic.FrameWithFrameWide.time_op_same_blocks(<built-in function add>)
-      21.6±0.3ms       16.2±0.3ms     0.75  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 2)
-       156±0.3μs          117±1μs     0.75  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-         133±2ms         99.4±1ms     0.75  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 1000000)
-      10.3±0.3ms       7.59±0.2ms     0.74  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
-     1.17±0.01ms         861±10μs     0.74  algorithms.Factorize.time_factorize(True, False, 'boolean')
-         124±2ms       90.7±0.2ms     0.73  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 900000)
-         589±8ms          432±2ms     0.73  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
-      24.0±0.7ms       17.5±0.2ms     0.73  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, -2)
-         587±8ms          428±4ms     0.73  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
-      22.4±0.5ms       16.2±0.1ms     0.72  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, -2)
-         533±6μs          379±1μs     0.71  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      7.54±0.2ms       5.34±0.1ms     0.71  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, 0)
-      47.8±0.3ms       33.7±0.2ms     0.70  series_methods.IsInFloat64.time_isin_few_different
-      48.0±0.9ms       33.7±0.3ms     0.70  series_methods.IsInFloat64.time_isin_nan_values
-      6.69±0.4ms       4.70±0.1ms     0.70  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, -2)
-      7.25±0.1ms      5.06±0.05ms     0.70  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 0)
-      7.33±0.3ms      5.08±0.04ms     0.69  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 0)
-      4.53±0.1ms      3.12±0.04ms     0.69  algorithms.Duplicated.time_duplicated(False, False, 'uint')
-        964±30μs         662±20μs     0.69  timeseries.DatetimeIndex.time_unique('repeated')
-      6.20±0.1ms      4.02±0.02ms     0.65  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 0)
-      5.73±0.1ms      3.65±0.03ms     0.64  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 0)
-     5.37±0.05ms      3.31±0.07ms     0.62  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, -2)
-      6.12±0.1ms       3.70±0.1ms     0.60  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 0)
-       641±100μs          295±3μs     0.46  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int8Engine'>, <class 'numpy.int8'>), 'monotonic_incr')
-      26.9±0.6ms       6.48±0.1ms     0.24  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 2)
-        26.7±1ms       6.20±0.1ms     0.23  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, -2)
-      25.3±0.8ms       5.08±0.1ms     0.20  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 2)
-      25.1±0.4ms      4.80±0.07ms     0.19  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, -2)

Quadratic probing:

       before           after         ratio
     [c852b2ee]       [59600e58]
+         533±5ms       15.3±0.03s    28.66  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 750000)
+         474±8ms          11.8±0s    24.81  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 750000)
+      21.6±0.5ms         502±10ms    23.24  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 80000)
+      16.5±0.4ms         339±20ms    20.50  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 70000)
+         631±6ms       12.1±0.02s    19.25  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 900000)
+      22.5±0.5ms         386±10ms    17.19  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 80000)
+      18.2±0.5ms         280±10ms    15.33  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 70000)
+         111±2ms       1.44±0.01s    13.04  hash_functions.IsinWithArange.time_isin(<class 'object'>, 8000, 2)
+     1.85±0.05ms       23.9±0.3ms    12.92  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 8000)
+      95.4±0.8ms       1.22±0.01s    12.82  hash_functions.IsinWithArange.time_isin(<class 'object'>, 8000, -2)
+     1.95±0.05ms       23.8±0.6ms    12.22  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 8000)
+        345±20μs      4.04±0.05ms    11.71  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 1300)
+     1.64±0.02ms       19.1±0.2ms    11.66  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 7000)
+     1.68±0.02ms       18.9±0.3ms    11.25  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 7000)
+         527±6μs      5.58±0.07ms    10.58  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 2000)
+        332±10μs      3.49±0.04ms    10.53  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 1300)
+         541±9μs      5.43±0.09ms    10.05  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 2000)
+      66.5±0.5ms          466±3ms     7.01  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, 2)
+        72.8±1ms          424±2ms     5.83  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, -2)
+         108±9ms          589±4ms     5.44  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, -2)
+         143±9ms          657±6ms     4.59  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, 2)
+      6.57±0.1ms       16.2±0.3ms     2.46  join_merge.Merge.time_merge_dataframe_integer_2key(False)
+      23.0±0.2ms      48.9±0.08ms     2.13  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 2)
+      16.3±0.2ms       34.0±0.4ms     2.09  join_merge.Merge.time_merge_dataframe_integer_2key(True)
+      24.6±0.1ms       50.2±0.4ms     2.04  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 2)
+      34.0±0.5ms       43.3±0.4ms     1.27  io.csv.ReadCSVConcatDatetime.time_read_csv
+        96.8±2ms          118±1ms     1.22  multiindex_object.GetLoc.time_large_get_loc
+      1.71±0.2μs       2.09±0.3μs     1.22  index_cached_properties.IndexCache.time_inferred_type('PeriodIndex')
+         114±1ms          135±1ms     1.19  multiindex_object.GetLoc.time_large_get_loc_warm
+         228±3μs          260±8μs     1.14  timeseries.DatetimeIndex.time_unique('dst')
+        613±30ns         693±30ns     1.13  index_cached_properties.IndexCache.time_inferred_type('Int64Index')
+     7.97±0.09ms       8.99±0.8ms     1.13  inference.ToNumericDowncast.time_downcast('int32', 'signed')
+      6.37±0.4μs       7.02±0.6μs     1.10  index_cached_properties.IndexCache.time_engine('DatetimeIndex')
-     4.95±0.08ms      4.50±0.03ms     0.91  indexing.NonNumericSeriesIndexing.time_getitem_list_like('period', 'non_monotonic')
-      28.6±0.8ms       25.9±0.4ms     0.91  hash_functions.IsinWithArange.time_isin(<class 'object'>, 8000, 0)
-     3.25±0.05ms      2.94±0.01ms     0.90  stat_ops.FrameMultiIndexOps.time_op(0, 'prod')
-        65.9±3ms       59.5±0.5ms     0.90  timedelta.ToTimedeltaErrors.time_convert('ignore')
-      1.88±0.2μs       1.70±0.1μs     0.90  index_cached_properties.IndexCache.time_values('PeriodIndex')
-      3.74±0.5μs       3.37±0.2μs     0.90  index_cached_properties.IndexCache.time_values('CategoricalIndex')
-        11.3±1μs       10.2±0.8μs     0.90  index_cached_properties.IndexCache.time_engine('TimedeltaIndex')
-     3.08±0.02ms      2.77±0.06ms     0.90  stat_ops.FrameMultiIndexOps.time_op(0, 'mean')
-      7.27±0.2ms       6.52±0.1ms     0.90  algorithms.Duplicated.time_duplicated(False, 'first', 'float')
-         155±2ms        139±0.6ms     0.90  gil.ParallelGroupbyMethods.time_loop(8, 'var')
-      3.97±0.3μs       3.56±0.2μs     0.90  index_cached_properties.IndexCache.time_values('Float64Index')
-     1.98±0.09ms      1.76±0.06ms     0.89  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-         387±2ms          345±1ms     0.89  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 5000000)
-      32.6±0.3ms       29.1±0.8ms     0.89  arithmetic.IrregularOps.time_add
-         390±3ms          348±2ms     0.89  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 5000000)
-     16.2±0.09ms       14.4±0.2ms     0.89  categoricals.Isin.time_isin_categorical('object')
-     3.85±0.09ms      3.43±0.03ms     0.89  timeseries.DatetimeIndex.time_add_timedelta('tz_aware')
-        67.8±5ms       60.2±0.3ms     0.89  timedelta.ToTimedeltaErrors.time_convert('coerce')
-      68.5±0.6ms       60.7±0.4ms     0.89  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 750000)
-        66.6±2ms       58.9±0.6ms     0.88  gil.ParallelGroupbyMethods.time_loop(4, 'prod')
-      16.4±0.1ms       14.5±0.3ms     0.88  hash_functions.UniqueAndFactorizeArange.time_factorize(7)
-      8.63±0.4μs      7.62±0.09μs     0.88  tslibs.timestamp.TimestampProperties.time_is_quarter_end(None, 'B')
-       133±0.9ms          117±2ms     0.88  gil.ParallelGroupbyMethods.time_loop(8, 'prod')
-        70.4±3ms       62.0±0.9ms     0.88  gil.ParallelGroupbyMethods.time_loop(4, 'max')
-      16.3±0.4ms       14.4±0.3ms     0.88  hash_functions.UniqueAndFactorizeArange.time_factorize(14)
-         133±1ms          117±3ms     0.88  gil.ParallelGroupbyMethods.time_loop(8, 'last')
-      16.3±0.2ms       14.3±0.4ms     0.88  hash_functions.UniqueAndFactorizeArange.time_factorize(9)
-        67.3±1ms       59.1±0.5ms     0.88  gil.ParallelGroupbyMethods.time_loop(4, 'min')
-         147±2ms          130±1ms     0.88  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'object'>, 20)
-      14.9±0.4ms       13.1±0.2ms     0.88  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.int64'>, 19)
-      60.4±0.3ms       52.9±0.6ms     0.88  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 20)
-      17.4±0.4ms       15.3±0.1ms     0.88  groupby.MultiColumn.time_col_select_numpy_sum
-        29.3±1ms       25.6±0.2ms     0.87  groupby.Groups.time_series_groups('int64_small')
-         134±2ms          117±2ms     0.87  gil.ParallelGroupbyMethods.time_loop(8, 'sum')
-         133±2ms          116±2ms     0.87  gil.ParallelGroupbyMethods.time_loop(8, 'mean')
-      9.54±0.2ms       8.32±0.2ms     0.87  algorithms.Factorize.time_factorize(False, True, 'uint')
-      33.8±0.8ms       29.4±0.3ms     0.87  gil.ParallelGroupbyMethods.time_loop(2, 'prod')
-      7.71±0.2ms       6.71±0.1ms     0.87  timeseries.ResampleSeries.time_resample('datetime', '5min', 'ohlc')
-      26.1±0.4ms       22.7±0.3ms     0.87  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 19)
-        40.2±1ms       34.8±0.4ms     0.87  gil.ParallelGroupbyMethods.time_loop(2, 'var')
-     1.47±0.02ms      1.27±0.02ms     0.86  timeseries.DatetimeIndex.time_normalize('tz_naive')
-      2.85±0.2ms      2.46±0.06ms     0.86  stat_ops.SeriesMultiIndexOps.time_op(0, 'mean')
-     16.5±0.09ms       14.3±0.2ms     0.86  hash_functions.UniqueAndFactorizeArange.time_factorize(5)
-     1.79±0.01ms      1.54±0.01ms     0.86  timeseries.DatetimeAccessor.time_dt_accessor_normalize('UTC')
-     1.45±0.03ms      1.25±0.01ms     0.86  timeseries.DatetimeIndex.time_normalize('repeated')
-         199±4ms          171±4ms     0.86  sparse.SparseSeriesToFrame.time_series_to_frame
-      16.5±0.1ms       14.2±0.2ms     0.86  hash_functions.UniqueAndFactorizeArange.time_factorize(13)
-      34.2±0.3ms       29.4±0.4ms     0.86  gil.ParallelGroupbyMethods.time_loop(2, 'max')
-      16.5±0.4ms       14.2±0.5ms     0.86  hash_functions.UniqueAndFactorizeArange.time_factorize(11)
-      16.4±0.3ms       14.1±0.4ms     0.86  hash_functions.UniqueAndFactorizeArange.time_factorize(15)
-      6.87±0.2ms       5.88±0.2ms     0.86  stat_ops.FrameMultiIndexOps.time_op([0, 1], 'prod')
-     1.45±0.03ms      1.24±0.04ms     0.85  series_methods.ValueCounts.time_value_counts('int')
-      34.0±0.4ms       29.0±0.4ms     0.85  gil.ParallelGroupbyMethods.time_loop(2, 'min')
-      16.7±0.5ms       14.3±0.1ms     0.85  hash_functions.UniqueAndFactorizeArange.time_factorize(12)
-        68.9±1ms       58.7±0.4ms     0.85  gil.ParallelGroupbyMethods.time_loop(4, 'mean')
-      16.8±0.6ms       14.3±0.5ms     0.85  hash_functions.UniqueAndFactorizeArange.time_factorize(8)
-      6.76±0.2ms       5.73±0.3ms     0.85  stat_ops.FrameMultiIndexOps.time_op([0, 1], 'mean')
-        29.0±1ms       24.6±0.6ms     0.85  reindex.DropDuplicates.time_frame_drop_dups_int(False)
-     2.92±0.07ms       2.48±0.1ms     0.85  stat_ops.SeriesMultiIndexOps.time_op(1, 'sum')
-      4.26±0.1ms       3.61±0.1ms     0.85  algorithms.Duplicated.time_duplicated(False, 'first', 'int')
-         740±5ms          626±3ms     0.85  join_merge.I8Merge.time_i8merge('right')
-         246±6μs          208±7μs     0.85  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 500000)
-     2.69±0.07ms      2.27±0.08ms     0.85  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 500000)
-        33.9±1ms       28.7±0.3ms     0.84  gil.ParallelGroupbyMethods.time_loop(2, 'sum')
-      7.89±0.4ms       6.66±0.2ms     0.84  algorithms.Factorize.time_factorize(True, True, 'datetime64[ns]')
-     2.88±0.09ms      2.42±0.05ms     0.84  stat_ops.SeriesMultiIndexOps.time_op(0, 'sum')
-      5.77±0.5μs       4.86±0.4μs     0.84  index_cached_properties.IndexCache.time_shape('UInt64Index')
-      29.1±0.3ms       24.5±0.3ms     0.84  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 19)
-     3.39±0.08ms      2.84±0.03ms     0.84  timeseries.DatetimeIndex.time_unique('tz_naive')
-      70.8±0.9ms         59.1±1ms     0.84  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 20)
-      9.32±0.4ms       7.75±0.4ms     0.83  algorithms.Factorize.time_factorize(False, False, 'datetime64[ns]')
-     4.08±0.02ms      3.39±0.04ms     0.83  algorithms.Duplicated.time_duplicated(False, 'first', 'uint')
-         722±4ms          598±5ms     0.83  join_merge.I8Merge.time_i8merge('left')
-      4.26±0.2ms       3.53±0.2ms     0.83  algorithms.Duplicated.time_duplicated(False, 'last', 'int')
-         700±5ms          578±6ms     0.83  join_merge.I8Merge.time_i8merge('outer')
-      12.3±0.6ms       10.1±0.5ms     0.82  hash_functions.UniqueAndFactorizeArange.time_unique(5)
-      10.4±0.1ms       8.55±0.1ms     0.82  timeseries.ResampleSeries.time_resample('period', '5min', 'ohlc')
-      4.21±0.4μs       3.46±0.2μs     0.82  index_cached_properties.IndexCache.time_inferred_type('TimedeltaIndex')
-      12.5±0.4ms       10.3±0.5ms     0.82  hash_functions.UniqueAndFactorizeArange.time_unique(9)
-      12.5±0.3ms       10.3±0.6ms     0.82  hash_functions.UniqueAndFactorizeArange.time_unique(6)
-      12.4±0.6ms       10.1±0.6ms     0.82  hash_functions.UniqueAndFactorizeArange.time_unique(7)
-     3.34±0.09ms      2.73±0.02ms     0.82  timeseries.DatetimeIndex.time_unique('tz_local')
-      7.91±0.6ms       6.47±0.4ms     0.82  algorithms.Duplicated.time_duplicated(False, False, 'datetime64[ns]')
-      12.4±0.5ms       10.1±0.5ms     0.82  hash_functions.UniqueAndFactorizeArange.time_unique(8)
-         104±3ms         84.8±2ms     0.82  join_merge.Align.time_series_align_int64_index
-         711±3ms          577±4ms     0.81  join_merge.I8Merge.time_i8merge('inner')
-     2.72±0.05ms      2.20±0.04ms     0.81  series_methods.IsInForObjects.time_isin_long_series_short_values
-      12.3±0.4ms       9.99±0.3ms     0.81  hash_functions.UniqueAndFactorizeArange.time_unique(11)
-      12.4±0.6ms       10.0±0.5ms     0.81  hash_functions.UniqueAndFactorizeArange.time_unique(4)
-      12.4±0.5ms       10.0±0.3ms     0.81  hash_functions.UniqueAndFactorizeArange.time_unique(10)
-      12.3±0.5ms       9.92±0.2ms     0.81  hash_functions.UniqueAndFactorizeArange.time_unique(13)
-      12.4±0.4ms       9.99±0.2ms     0.81  hash_functions.UniqueAndFactorizeArange.time_unique(12)
-      5.10±0.6ms      4.11±0.07ms     0.81  indexing.IntervalIndexing.time_getitem_scalar
-     6.23±0.06ms      5.02±0.08ms     0.81  algorithms.Factorize.time_factorize(False, False, 'uint')
-      9.38±0.3ms       7.51±0.2ms     0.80  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
-      9.83±0.5ms       7.83±0.4ms     0.80  algorithms.Factorize.time_factorize(False, False, 'datetime64[ns, tz]')
-      3.02±0.1ms      2.40±0.03ms     0.80  stat_ops.SeriesMultiIndexOps.time_op(1, 'mean')
-      12.5±0.6ms       9.88±0.2ms     0.79  hash_functions.UniqueAndFactorizeArange.time_unique(15)
-        72.5±2μs       57.4±0.7μs     0.79  indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      12.5±0.9ms       9.92±0.3ms     0.79  hash_functions.UniqueAndFactorizeArange.time_unique(14)
-      8.22±0.3ms       6.51±0.2ms     0.79  algorithms.Duplicated.time_duplicated(False, False, 'datetime64[ns, tz]')
-     3.43±0.04ms      2.71±0.02ms     0.79  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      90.5±0.9ms         71.5±2ms     0.79  join_merge.Align.time_series_align_left_monotonic
-      4.24±0.4μs       3.35±0.1μs     0.79  index_cached_properties.IndexCache.time_values('TimedeltaIndex')
-     1.45±0.03ms      1.14±0.03ms     0.79  algorithms.Factorize.time_factorize(True, True, 'boolean')
-     6.96±0.09ms       5.43±0.1ms     0.78  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 0)
-      3.91±0.7μs       3.04±0.2μs     0.78  index_cached_properties.IndexCache.time_shape('DatetimeIndex')
-      7.48±0.3ms       5.83±0.2ms     0.78  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, 0)
-         154±3μs          120±1μs     0.78  timedelta.TimedeltaIndexing.time_unique
-      9.42±0.1ms      7.30±0.06ms     0.77  index_object.SetOperations.time_operation('datetime', 'symmetric_difference')
-      4.52±0.2ms       3.47±0.2ms     0.77  algorithms.Duplicated.time_duplicated(False, False, 'int')
-      88.7±0.7ms         68.2±1ms     0.77  multiindex_object.Duplicated.time_duplicated
-         594±5μs         455±20μs     0.77  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-         136±2ms          104±2ms     0.77  frame_methods.Duplicated.time_frame_duplicated
-         115±2ms         87.3±1ms     0.76  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 900000)
-        514±10μs         390±10μs     0.76  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
-     5.70±0.07ms       4.31±0.1ms     0.75  algorithms.Factorize.time_factorize(False, False, 'boolean')
-      6.82±0.2ms      5.14±0.09ms     0.75  algorithms.Factorize.time_factorize(False, True, 'boolean')
-      5.04±0.1ms      3.80±0.08ms     0.75  algorithms.Factorize.time_factorize(True, False, 'datetime64[ns, tz]')
-      5.16±0.2ms      3.85±0.07ms     0.75  algorithms.Factorize.time_factorize(True, False, 'datetime64[ns]')
-         525±3μs          390±4μs     0.74  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-        85.4±1ms       63.4±0.5ms     0.74  series_methods.IsInFloat64.time_isin_many_different
-     6.37±0.08ms      4.69±0.09ms     0.74  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, -2)
-      7.23±0.7ms      5.31±0.08ms     0.73  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 0)
-         135±1ms         98.6±1ms     0.73  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 1000000)
-     1.19±0.04ms         864±10μs     0.73  algorithms.Factorize.time_factorize(True, False, 'boolean')
-         124±2ms       90.2±0.9ms     0.73  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 900000)
-       157±0.8μs          114±2μs     0.73  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     5.89±0.06ms       4.25±0.1ms     0.72  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 0)
-     5.59±0.03ms      4.03±0.04ms     0.72  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 0)
-     2.99±0.05ms      2.15±0.02ms     0.72  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     4.49±0.05ms      3.23±0.04ms     0.72  algorithms.Duplicated.time_duplicated(False, False, 'uint')
-         583±3ms          412±6ms     0.71  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
-      47.8±0.3ms       33.8±0.3ms     0.71  series_methods.IsInFloat64.time_isin_few_different
-      47.8±0.4ms       33.7±0.2ms     0.71  series_methods.IsInFloat64.time_isin_nan_values
-     5.67±0.06ms      3.99±0.05ms     0.70  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 0)
-         590±2ms          409±5ms     0.69  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
-        981±30μs         649±10μs     0.66  timeseries.DatetimeIndex.time_unique('repeated')
-     4.99±0.04ms      3.29±0.07ms     0.66  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, -2)
-        8.27±1μs       5.36±0.4μs     0.65  index_cached_properties.IndexCache.time_shape('TimedeltaIndex')
-      22.5±0.3ms       14.3±0.2ms     0.63  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, 2)
-      12.7±0.3ms      7.98±0.05ms     0.63  index_object.IntervalIndexMethod.time_intersection_one_duplicate(100000)
-      23.4±0.7ms       14.3±0.1ms     0.61  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, -2)
-      21.3±0.3ms      12.8±0.05ms     0.60  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 2)
-      21.4±0.2ms      12.8±0.06ms     0.60  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, -2)
-      10.7±0.2ms       6.29±0.1ms     0.59  index_object.IntervalIndexMethod.time_intersection(100000)
-      26.1±0.5ms      6.69±0.07ms     0.26  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 2)
-      26.2±0.5ms       6.60±0.4ms     0.25  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, -2)
-      24.3±0.4ms       5.52±0.3ms     0.23  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 2)
-      24.6±0.3ms      5.08±0.02ms     0.21  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, -2)

@realead realead reopened this Oct 14, 2020
@realead
Contributor Author

realead commented Oct 14, 2020

@jreback @jbrockmendel

The above long investigation in a nutshell:

  1. No-brainer: a stronger second hash (18cf44a) is needed. This alone would fix this bug, but possibly not the problems for other series.

  2. Are stronger first hashes for float64/(u)int64/PyObject needed? The currently used weak hashes have some advantages: fewer collisions, fast to calculate, getting the best out of the cache. However, they also carry an intrinsic risk of catastrophic performance degradation. float64 and int64 use basically the same hash function; however, due to the bit pattern of the double representation, we quite easily get problematic "series" with float64 that would be "unusual" as int64.

I think a stronger hash would be good for float64 (this PR) and PyObject (not (yet) this PR). For the sake of consistency it probably should also be done for (u)int64 (the impact is not yet investigated). This trade-off is, however, not as clear-cut as the first.

  3. Using the combined probing strategy has advantages, so it is worth considering switching to it.

  4. Switching to quadratic probing (Klib upgrade (factorizing performance increase) #8524, khash 0.2.8) only makes sense when a strong first hash is used for all types (all above PyObject).
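
To illustrate the bit-pattern point in item 2, here is a minimal sketch. It assumes a hypothetical weak hash that keeps only the low bits of the 64-bit representation; this is not the khash hash function, and `float_bits` and `low_bits_bucket` are names made up for the example. Integer-valued doubles have all-zero low mantissa bits, so they land in a single bucket, while the same values stored as int64 spread out perfectly.

```python
import struct


def float_bits(x: float) -> int:
    """Return the 64-bit IEEE-754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def low_bits_bucket(bits: int, n_buckets: int) -> int:
    """Hypothetical weak hash: keep only the low bits (n_buckets is a power of two)."""
    return bits & (n_buckets - 1)


n_buckets = 1 << 10

# The series 0.0, 1.0, 2.0, ... as float64 versus 0, 1, 2, ... as int64.
float_buckets = {low_bits_bucket(float_bits(float(i)), n_buckets) for i in range(1000)}
int_buckets = {low_bits_bucket(i, n_buckets) for i in range(1000)}

print(len(float_buckets), len(int_buckets))  # 1 1000
```

A real first hash mixes in higher bits as well, but the sketch shows why a series that is harmless as int64 can be problematic as float64.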

Now, it is your call...

@jreback
Contributor

jreback commented Oct 15, 2020

Can you comment on the implementation complexity / maintainability of the strategies you have outlined? Meaning, assuming they are well documented, is any one significantly more complex than the others? E.g. the next thing we need to investigate: how approachable is the code (of course we will have this PR for reference as well)? Or are they all 'similar' in code complexity?

@realead
Contributor Author

realead commented Oct 15, 2020

@jreback

Sorry, the formatting of a previous comment went wrong; here are the needed code changes for the different probing strategies:

I would say not much is going on code-complexity-wise: linear probing and quadratic probing are slightly simpler than the current state; the combined approach adds a little bit of complexity, as there is an additional parameter (how many linear steps), but nothing crucial.
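
For readers who want to see how small the differences are, here is a rough Python sketch of the four probe sequences discussed in this thread. It is not the khash code: the function names, the `h2` second-hash argument and the `linear_steps` parameter are hypothetical, and the combined variant is only one plausible reading of "a few linear steps, then jump".

```python
from itertools import islice


def linear(h, mask):
    i = h & mask
    while True:
        yield i
        i = (i + 1) & mask


def quadratic(h, mask):
    # Triangular-number increments: +1, +2, +3, ...
    i, step = h & mask, 0
    while True:
        yield i
        step += 1
        i = (i + step) & mask


def double_hashing(h, h2, mask):
    i = h & mask
    step = (h2 | 1) & mask  # odd step, so every slot of a power-of-two table is reached
    while True:
        yield i
        i = (i + step) & mask


def combined(h, h2, mask, linear_steps=3):
    # Hypothetical: look at `linear_steps` neighbours first, then jump.
    base = h & mask
    step = (h2 | 1) & mask
    while True:
        for k in range(linear_steps + 1):
            yield (base + k) & mask
        base = (base + step) & mask


mask = 15  # table with 16 buckets
print(list(islice(linear(5, mask), 6)))             # [5, 6, 7, 8, 9, 10]
print(list(islice(quadratic(5, mask), 6)))          # [5, 6, 8, 11, 15, 4]
print(list(islice(double_hashing(5, 6, mask), 6)))  # [5, 12, 3, 10, 1, 8]
print(list(islice(combined(5, 6, mask), 6)))        # [5, 6, 7, 8, 12, 13]
```

Linear and quadratic probing need no second hash at all, which is where the "slightly simpler" comes from; the combined variant only adds the inner loop and its parameter.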

Using stronger hash functions for PyObject/(u)int64 would look a lot like this change: 4e89fce#diff-7ac30c345bd6d38838a46337e4c6b5b6feae3e1fd5aea54b5d9e37d20054edf5R29

@realead realead force-pushed the gh_28303_float_hash branch from efae65c to dad1ede Compare October 16, 2020 20:35
Contributor

@jreback jreback left a comment

lgtm, some doc requests.

@jbrockmendel @TomAugspurger

doc/source/whatsnew/v1.2.0.rst (outdated)
pandas/_libs/src/klib/khash_python.h
@realead
Contributor Author

realead commented Oct 26, 2020

@jreback docs are updated.

@jbrockmendel @TomAugspurger bump:)

In the meantime I took a look at the implementation of CPython's set and dict (and also the notes); there are a lot of great insights!

I have ported their approach to khash (realead@160fcf3): it would solve the issue at hand (#28303) and seems to do better on average than this PR (see the timings at the end of this post). If you like, I could submit the other approach instead of this PR.

Overview of pros and cons:

Master:
    Pro: status quo
    Con: float64 is problematic, because higher hash bits aren't taken into account
This PR:
    Pro: most robust of the three
    Con: performance hit for some scenarios
CPython's approach:
    Pro: quite robust; same performance as master on average, but better for bad corner cases
    Con: less robust than this PR (first hash is weak)

Here is a selection of the most important/relevant (imo) insights for khash and this PR:

  1. Quote: "Use cases for sets differ considerably from dictionaries where looked-up keys are more likely to be present. In contrast, sets are primarily about membership testing where the presence of an element is not known in advance. Accordingly, the set implementation needs to optimize for both the found and not-found case."

After reading this, a shortcoming of the current use of khash becomes obvious: the same hash table is used as a dictionary (Index) and as a set (unique, isin), which makes optimization harder (not to mention the additional memory wasted in unique and isin).

This is also the experience I've had: optimizing the look-up for the case "key is contained" hurts the performance of the case "key is not contained" and vice versa. For the "key is contained" case, the most important part is not to have hash collisions; having long chains of occupied buckets isn't a problem at all if there are no collisions. For the "key is not contained" case, it is better to accept some collisions but not to have long chains of occupied buckets, which might mean catastrophic performance degradation.
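
A toy model of the found versus not-found trade-off, assuming an identity hash and plain linear probing (neither is what khash does; `build` and `probes` are made-up helper names). The keys 0..99 produce no collisions but form one chain of 100 occupied buckets: a contained key is found with a single probe, while a missing key that lands at the start of the chain has to walk all of it.

```python
def build(keys, n_buckets):
    """Insert keys into an open-addressing table using identity hash + linear probing."""
    table = [None] * n_buckets
    for k in keys:
        i = k % n_buckets
        while table[i] is not None:
            i = (i + 1) % n_buckets
        table[i] = k
    return table


def probes(table, key):
    """Number of buckets inspected until key is found or an empty bucket is hit."""
    n = len(table)
    i = key % n
    count = 1
    while table[i] is not None and table[i] != key:
        i = (i + 1) % n
        count += 1
    return count


table = build(range(100), 256)
print(probes(table, 50))   # 1: contained, no collision
print(probes(table, 256))  # 101: not contained, walks the whole occupied chain
```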

  2. dict uses a quite ingenious probing strategy (the rationale can be found in much detail here):
size_t perturb = (size_t)hash;
...
perturb >>= PERTURB_SHIFT;
i = mask & (i*5 + perturb + 1);

Why is it better than double hashing?

Double hashing has a problem: if the second hash (modulo the size of the table) is 1, 2, 3, 4, or another small number, it takes many steps to traverse a long chain of occupied buckets. The chances of such a bad hash aren't very high (smaller than 0.1% for 10^6 elements), but when it happens it really hurts performance. Due to the 5*i part, the step in CPython's dict doesn't stay constant, thus (similar to quadratic probing) such "occupied chains" are traversed much faster.
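
A small sketch of this effect, assuming a table of 4096 buckets whose first 1000 buckets are occupied. `cpython_probe` follows the recurrence quoted above with `PERTURB_SHIFT = 5` (the value used in CPython's dictobject.c); `constant_step_probe` stands in for double hashing with an unlucky second hash of 1. The helper names are made up for the example.

```python
PERTURB_SHIFT = 5


def cpython_probe(h, mask):
    """Probe sequence following i = mask & (i*5 + perturb + 1)."""
    perturb = h
    i = h & mask
    while True:
        yield i
        perturb >>= PERTURB_SHIFT
        i = mask & (i * 5 + perturb + 1)


def constant_step_probe(h, step, mask):
    """Double hashing with a fixed step derived from the second hash."""
    i = h & mask
    while True:
        yield i
        i = (i + step) & mask


def probes_until_empty(seq, occupied):
    """Count probes until the sequence hits a bucket that is not occupied."""
    for n, i in enumerate(seq, start=1):
        if i not in occupied:
            return n


mask = 4095
occupied = set(range(1000))  # one long chain of occupied buckets

print(probes_until_empty(constant_step_probe(0, 1, mask), occupied))  # 1001
print(probes_until_empty(cpython_probe(0, mask), occupied))           # 7
```

With a hash of 0 the perturbation term vanishes immediately, so the 7 probes come purely from the i*5 + 1 part leaving the chain geometrically.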

Why is it better than quadratic probing?

The problem with quadratic probing is that it doesn't take the higher bits of the hash into consideration, thus making the hash weaker than it is and leading to more hash collisions.
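
To make this concrete, a sketch with two hashes that share their low 8 bits but differ in the higher bits, in a table of 256 buckets. Under quadratic probing (triangular increments, used here as an assumption about the variant meant) both keys follow exactly the same path, whereas the perturb-based sequence separates them on the second probe. The function names are illustrative only.

```python
from itertools import islice

PERTURB_SHIFT = 5


def quadratic(h, mask):
    """Quadratic probing: only h & mask ever influences the sequence."""
    i, step = h & mask, 0
    while True:
        yield i
        step += 1
        i = (i + step) & mask


def perturbed(h, mask):
    """CPython-dict-style probing: higher bits of h are fed in via perturb."""
    perturb = h
    i = h & mask
    while True:
        yield i
        perturb >>= PERTURB_SHIFT
        i = mask & (i * 5 + perturb + 1)


mask = 255
h_a = 5
h_b = 5 + (3 << 8)  # same low 8 bits as h_a, different high bits

quad_a = list(islice(quadratic(h_a, mask), 8))
quad_b = list(islice(quadratic(h_b, mask), 8))
pert_a = list(islice(perturbed(h_a, mask), 8))
pert_b = list(islice(perturbed(h_b, mask), 8))

print(quad_a == quad_b)      # True: the keys collide on every probe
print(pert_a[:2], pert_b[:2])  # [5, 26] [5, 50]
```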

Why is it better than combined probing?

Theoretically, utilizing the cache better should be a good thing; however, the CPython people found that this actually hurts the performance for the scenarios in which maps are used. I have had the same experience (to my great surprise!), which is why I used only 3 linear steps in my "combined probing" implementation: after that, the performance in map scenarios started to deteriorate.

  3. CPython's set uses linear steps, utilizing the cache before jumping somewhere else. Here, this really improves the performance (just a reminder of the first great insight: "Use cases for sets differ considerably from dictionaries...").

  4. With all that in mind, one should also be aware that the scenarios for Python's dict (small dictionaries are quite important) and pandas hash tables (small dictionaries/sets don't play a big role, as the Python overhead will overshadow some additional ops) are different, so different trade-offs might make sense.
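
The set-style probing mentioned above (linear steps first, then a jump) can be sketched roughly as follows. This is loosely modelled on CPython's setobject.c as I understand it, not a copy of it: `LINEAR_PROBES = 9` and the rule of skipping the linear run near the end of the table are assumptions to verify against the CPython source, and `set_style_probe` is a made-up name.

```python
from itertools import islice

LINEAR_PROBES = 9  # assumed to match CPython's setobject.c
PERTURB_SHIFT = 5


def set_style_probe(h, mask):
    """Inspect a run of adjacent buckets (cache friendly), then jump via perturb."""
    perturb = h
    i = h & mask
    while True:
        # Only do the linear run if it fits before the end of the table.
        n_linear = LINEAR_PROBES if i + LINEAR_PROBES <= mask else 0
        for k in range(n_linear + 1):
            yield i + k
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + 1 + perturb) & mask


mask = 1023
print(list(islice(set_style_probe(3, mask), 12)))
# [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 16, 17]
print(list(islice(set_style_probe(1020, mask), 2)))
# [1020, 12]
```

The second call shows the edge case: starting at bucket 1020 there is no room for a linear run, so the sequence jumps right away.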


Timings CPython's probing (realead@160fcf3, after) vs this PR (before):

       before           after         ratio
     [11d97319]       [160fcf32]
+     2.30±0.05ms       3.68±0.3ms     1.60  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
+      12.6±0.6ms         16.6±2ms     1.32  hash_functions.UniqueAndFactorizeArange.time_unique(8)
+        27.7±2ms       35.6±0.4ms     1.28  hash_functions.Float64GroupIndex.time_groupby
+        18.4±1ms       23.3±0.9ms     1.27  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
+      16.4±0.2ms         20.4±1ms     1.25  hash_functions.UniqueAndFactorizeArange.time_factorize(10)
+     2.09±0.06ms      2.57±0.07ms     1.23  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
+        10.3±1μs         12.5±1μs     1.21  index_cached_properties.IndexCache.time_engine('TimedeltaIndex')
+      16.5±0.2ms       20.0±0.6ms     1.21  hash_functions.UniqueAndFactorizeArange.time_factorize(8)
+         482±4ms         561±10ms     1.16  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 750000)
+      1.42±0.1ms       1.64±0.3ms     1.15  index_cached_properties.IndexCache.time_engine('MultiIndex')
+         538±8ms         619±10ms     1.15  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 750000)
+        644±20ms         735±40ms     1.14  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 900000)
+      1.92±0.2μs       2.16±0.3μs     1.13  index_cached_properties.IndexCache.time_values('DatetimeIndex')
+      11.0±0.7μs         12.3±1μs     1.13  index_cached_properties.IndexCache.time_engine('UInt64Index')
-      14.5±0.6ms       13.0±0.4ms     0.90  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.uint64'>, 19)
-         311±4ms          278±4ms     0.89  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-     2.57±0.06ms      2.27±0.03ms     0.88  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 16)
-     1.37±0.02ms      1.20±0.03ms     0.88  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 15)
-         327±6ms          283±3ms     0.86  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-      5.72±0.1ms       4.82±0.1ms     0.84  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 17)
-     1.47±0.03ms      1.23±0.03ms     0.84  timeseries.DatetimeIndex.time_normalize('tz_naive')
-     2.73±0.06ms      2.29±0.02ms     0.84  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 16)
-        739±10μs          618±6μs     0.84  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 14)
-         116±3ms       97.4±0.8ms     0.84  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 900000)
-      24.8±0.7ms       20.7±0.1ms     0.83  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, -2)
-        736±40μs          612±6μs     0.83  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 8000)
-         124±3ms          103±1ms     0.83  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 900000)
-      71.7±0.7ms       59.5±0.9ms     0.83  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 750000)
-      2.42±0.1ms      1.99±0.04ms     0.82  timeseries.DatetimeIndex.time_add_timedelta('tz_naive')
-      24.3±0.7ms       19.9±0.3ms     0.82  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, 2)
-         252±4ms          206±6ms     0.82  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-         589±8ms          481±6ms     0.82  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
-         250±7ms          203±5ms     0.81  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     3.57±0.07ms      2.88±0.03ms     0.81  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'numpy.float64'>, 70000)
-        29.0±4ms       23.2±0.2ms     0.80  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, 0)
-      24.3±0.4ms       19.1±0.2ms     0.78  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 2)
-      3.41±0.6ms      2.68±0.03ms     0.78  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.int64'>, 100000)
-        71.9±7ms       55.8±0.4ms     0.78  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, 2)
-        25.4±1ms       19.7±0.3ms     0.78  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, -2)
-     1.25±0.03ms         970±40μs     0.78  algorithms.Factorize.time_factorize(True, False, 'boolean')
-        23.9±3ms       18.3±0.1ms     0.77  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, -2)
-      24.2±0.6ms       18.4±0.2ms     0.76  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, 2)
-      26.3±0.3ms       19.9±0.4ms     0.76  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 2)
-      26.0±0.5ms       19.6±0.4ms     0.75  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, -2)
-      16.2±0.4ms       12.0±0.3ms     0.74  hash_functions.UniqueAndFactorizeArange.time_factorize(15)
-        25.9±1ms      19.0±0.05ms     0.73  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, 2)
-        17.2±1ms       12.6±0.9ms     0.73  hash_functions.UniqueAndFactorizeArange.time_factorize(14)
-      25.0±0.4ms       18.2±0.2ms     0.73  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, 2)
-      25.8±0.7ms       18.7±0.1ms     0.72  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, -2)
-        75.1±1ms       54.0±0.6ms     0.72  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, -2)
-        28.3±2ms       20.2±0.3ms     0.71  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, -2)
-      25.8±0.5ms       18.2±0.4ms     0.71  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 19)
-        7.75±2ms      5.48±0.06ms     0.71  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 0)
-        47.9±1ms         33.9±1ms     0.71  series_methods.IsInFloat64.time_isin_nan_values
-      1.08±0.1ms         761±20μs     0.70  timeseries.DatetimeIndex.time_unique('repeated')
-        62.5±3ms         43.4±1ms     0.69  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 20)
-     5.74±0.08ms      3.95±0.09ms     0.69  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 100000)
-      24.1±0.5ms       16.6±0.6ms     0.69  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, -2)
-      7.82±0.6ms      5.38±0.04ms     0.69  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 0)
-      25.3±0.6ms       17.4±0.3ms     0.69  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 2)
-        48.5±3ms       33.2±0.5ms     0.69  series_methods.IsInFloat64.time_isin_few_different
-        75.2±3μs         50.9±3μs     0.68  indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     5.99±0.05ms      4.03±0.09ms     0.67  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 0)
-        361±60μs          240±3μs     0.67  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 2000)
-       707±200μs         471±10μs     0.67  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.uint64'>, 14)
-        28.0±6ms      18.3±0.09ms     0.65  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 2)
-      12.3±0.5ms       7.95±0.4ms     0.65  hash_functions.UniqueAndFactorizeArange.time_unique(15)
-      12.4±0.4ms       7.90±0.2ms     0.64  hash_functions.UniqueAndFactorizeArange.time_unique(14)
-      5.29±0.2ms      3.31±0.09ms     0.63  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, -2)
-      16.5±0.6ms       10.3±0.5ms     0.62  hash_functions.UniqueAndFactorizeArange.time_factorize(6)
-     2.87±0.02ms      1.77±0.04ms     0.62  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      3.14±0.1ms      1.93±0.07ms     0.62  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      9.22±0.9ms      5.66±0.09ms     0.61  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 8000, 0)
-      17.0±0.5ms       10.3±0.2ms     0.60  hash_functions.UniqueAndFactorizeArange.time_factorize(5)
-        5.39±2ms      3.24±0.04ms     0.60  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 80000)
-         100±2ms         59.1±1ms     0.59  hash_functions.IsinWithArange.time_isin(<class 'object'>, 8000, -2)
-        876±90μs          515±4μs     0.59  hash_functions.IsinWithRandomFloat.time_isin(<class 'numpy.float64'>, 7000)
-        511±20μs          301±8μs     0.59  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
-        501±10μs         290±10μs     0.58  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      12.3±0.5ms       6.80±0.1ms     0.55  hash_functions.UniqueAndFactorizeArange.time_unique(4)
-         162±3μs         87.3±3μs     0.54  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-        94.7±6ms         51.0±3ms     0.54  series_methods.IsInFloat64.time_isin_many_different
-      12.3±0.4ms       6.50±0.1ms     0.53  hash_functions.UniqueAndFactorizeArange.time_unique(5)
-        118±10ms       62.0±0.8ms     0.53  hash_functions.IsinWithArange.time_isin(<class 'object'>, 8000, 2)
-      12.5±0.4ms      6.49±0.04ms     0.52  hash_functions.UniqueAndFactorizeArange.time_unique(6)
-        9.11±3ms       4.66±0.1ms     0.51  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, -2)
-        8.67±1ms      4.28±0.02ms     0.49  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 8000, 0)
-         116±8ms       54.6±0.4ms     0.47  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, -2)
-         137±9ms         60.0±2ms     0.44  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 1000000)
-         148±6ms       56.8±0.7ms     0.38  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, 2)
-        92.3±3ms       9.01±0.3ms     0.10  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
-       96.1±20ms         338±20μs     0.00  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')

Timings CPython's probing (realead@160fcf3, after) vs master (before):

       before           after         ratio
     [9cb37237]       [160fcf32]

+        94.4±2μs         137±40μs     1.45  index_cached_properties.IndexCache.time_engine('RangeIndex')
+       206±0.7μs         272±80μs     1.32  timeseries.DatetimeIndex.time_unique('dst')
+     1.81±0.02ms       2.28±0.1ms     1.26  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+      7.28±0.2ms      9.06±0.05ms     1.25  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 18)
+         265±1μs         329±30μs     1.24  join_merge.JoinNonUnique.time_join_non_unique_equal
+      6.65±0.3μs       7.92±0.7μs     1.19  index_cached_properties.IndexCache.time_shape('TimedeltaIndex')
+     2.18±0.03ms       2.60±0.1ms     1.19  series_methods.IsInForObjects.time_isin_long_series_short_values
+     3.80±0.07ms      4.51±0.04ms     1.19  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 17)
+      6.57±0.5ms       7.77±0.3ms     1.18  algorithms.Duplicated.time_duplicated(False, False, 'datetime64[ns, tz]')
+     3.56±0.03μs       4.19±0.5μs     1.18  tslibs.fields.TimeGetStartEndField.time_get_start_end_field(1, 'end', 'month', None, 12)
+      43.2±0.5ms         50.7±3ms     1.17  series_methods.IsInFloat64.time_isin_many_different
+     10.9±0.06ms       12.8±0.1ms     1.17  categoricals.Isin.time_isin_categorical('int64')
+     1.89±0.02ms      2.21±0.02ms     1.17  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 16)
+     6.18±0.05ms       7.18±0.6ms     1.16  rolling.ExpandingMethods.time_expanding_groupby('DataFrame', 'float', 'sum')
+         273±4μs          315±8μs     1.15  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+     5.96±0.09ms       6.85±0.7ms     1.15  rolling.ExpandingMethods.time_expanding_groupby('DataFrame', 'int', 'kurt')
+      14.1±0.1ms       15.9±0.2ms     1.13  categoricals.Isin.time_isin_categorical('object')
+     6.15±0.04ms       6.91±0.7ms     1.12  rolling.ExpandingMethods.time_expanding_groupby('DataFrame', 'float', 'std')
+      2.12±0.1μs       2.37±0.1μs     1.12  index_cached_properties.IndexCache.time_is_all_dates('Int64Index')
+      23.3±0.6ms       26.0±0.5ms     1.12  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 19)
+      12.7±0.8μs       14.1±0.8μs     1.11  index_cached_properties.IndexCache.time_is_all_dates('Float64Index')
+         315±4μs         348±10μs     1.10  reindex.Fillna.time_reindexed('pad')
+     1.59±0.03ms      1.75±0.06ms     1.10  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+     6.21±0.05ms       6.80±0.2ms     1.10  rolling.ExpandingMethods.time_expanding_groupby('DataFrame', 'float', 'median')
+     2.17±0.09μs       2.38±0.1μs     1.10  index_cached_properties.IndexCache.time_is_all_dates('RangeIndex')
+     1.58±0.01ms      1.72±0.02ms     1.09  dtypes.SelectDtypes.time_select_dtype_int_exclude('UInt32')
+     3.01±0.02ms       3.28±0.3ms     1.09  timeseries.DatetimeIndex.time_unique('tz_local')
+     1.67±0.01ms       1.82±0.1ms     1.09  series_methods.IsIn.time_isin('uint64')
+     2.17±0.03ms      2.37±0.06ms     1.09  rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'float', 'mean')
+      16.6±0.1ms       18.1±0.7ms     1.09  join_merge.MergeAsof.time_on_int32('forward', 5)
+        900±50μs         980±30μs     1.09  ctors.SeriesConstructors.time_series_constructor(<function list_of_str at 0x7f437bf9f9d0>, True, 'int')
+     1.58±0.02ms      1.71±0.07ms     1.09  dtypes.SelectDtypes.time_select_dtype_int_exclude('float32')
+        833±10μs         905±30μs     1.09  dtypes.SelectDtypes.time_select_dtype_string_include(<class 'float'>)
+      12.9±0.5μs       14.0±0.6μs     1.09  index_cached_properties.IndexCache.time_is_all_dates('IntervalIndex')
+     5.20±0.03ms       5.62±0.1ms     1.08  timeseries.ToDatetimeCache.time_unique_seconds_and_unit(True)
+     8.53±0.03μs       9.18±0.3μs     1.08  tslibs.normalize.Normalize.time_normalize_i8_timestamps(1, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+     4.30±0.04μs       4.62±0.1μs     1.08  categoricals.SearchSorted.time_categorical_contains
+     4.33±0.04ms      4.66±0.06ms     1.07  stat_ops.SeriesMultiIndexOps.time_op(0, 'median')
+        882±10μs         946±20μs     1.07  dtypes.SelectDtypes.time_select_dtype_string_include('int64')
+     1.41±0.01ms      1.50±0.07ms     1.07  stat_ops.FrameOps.time_op('prod', 'int', 0)
+     4.12±0.05μs       4.39±0.5μs     1.06  tslibs.timedelta.TimedeltaConstructor.time_from_np_timedelta
+         269±8ns          286±8ns     1.06  timedelta.DatetimeAccessor.time_dt_accessor
+         879±5μs         932±20μs     1.06  dtypes.SelectDtypes.time_select_dtype_bool_include('int64')
+     1.32±0.04μs       1.40±0.1μs     1.05  index_cached_properties.IndexCache.time_is_monotonic_decreasing('Int64Index')
-      7.42±0.9μs       7.02±0.5μs     0.95  index_cached_properties.IndexCache.time_engine('DatetimeIndex')
-         316±3ms        298±0.8ms     0.94  hash_functions.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 5000000)
-     1.10±0.01ms      1.04±0.01ms     0.94  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 15)
-      5.98±0.5μs       5.61±0.5μs     0.94  index_cached_properties.IndexCache.time_shape('CategoricalIndex')
-      12.9±0.5μs         12.1±1μs     0.93  index_cached_properties.IndexCache.time_engine('TimedeltaIndex')
-      3.89±0.2μs       3.59±0.2μs     0.92  index_cached_properties.IndexCache.time_inferred_type('TimedeltaIndex')
-      21.3±0.3ms       19.6±0.2ms     0.92  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 2)
-      20.1±0.1ms       18.4±0.1ms     0.92  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 2)
-         654±6ms          597±4ms     0.91  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 5000000)
-      5.12±0.2ms      4.60±0.09ms     0.90  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 17)
-     5.90±0.04ms      5.29±0.05ms     0.90  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, 0)
-     22.5±0.08ms       20.1±0.2ms     0.89  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, -2)
-        739±70ns         658±30ns     0.89  index_cached_properties.IndexCache.time_is_unique('RangeIndex')
-      6.02±0.2ms      5.35±0.08ms     0.89  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 0)
-        805±80ns         709±50ns     0.88  index_cached_properties.IndexCache.time_inferred_type('RangeIndex')
-      21.0±0.1ms       18.5±0.2ms     0.88  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, -2)
-      38.6±0.5ms       33.9±0.8ms     0.88  series_methods.IsInFloat64.time_isin_few_different
-     4.55±0.04ms      3.99±0.07ms     0.88  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 1000, 0)
-      5.23±0.1ms      4.57±0.08ms     0.87  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 1000, -2)
-     4.55±0.03ms      3.96±0.06ms     0.87  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 0)
-      2.17±0.2μs       1.87±0.2μs     0.86  index_cached_properties.IndexCache.time_inferred_type('DatetimeIndex')
-      5.34±0.5μs       4.54±0.2μs     0.85  index_cached_properties.IndexCache.time_shape('MultiIndex')
-      7.96±0.2ms       6.73±0.2ms     0.84  join_merge.Merge.time_merge_dataframe_integer_2key(False)
-        7.34±1μs       6.15±0.4μs     0.84  index_cached_properties.IndexCache.time_engine('PeriodIndex')
-      22.4±0.3ms       18.7±0.2ms     0.84  hash_functions.IsinWithArange.time_isin(<class 'numpy.uint64'>, 2000, 2)
-        710±10μs          589±6μs     0.83  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 14)
-      20.8±0.2ms       17.2±0.2ms     0.83  hash_functions.IsinWithArange.time_isin(<class 'numpy.int64'>, 2000, 2)
-      2.25±0.6μs       1.83±0.2μs     0.82  index_cached_properties.IndexCache.time_inferred_type('PeriodIndex')
-         487±6μs          392±7μs     0.80  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 13)
-      11.5±0.2ms       9.17±0.2ms     0.80  index_object.IntervalIndexMethod.time_intersection_one_duplicate(100000)
-         948±8μs         732±10μs     0.77  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 14)
-      71.3±0.5ms       54.9±0.3ms     0.77  hash_functions.IsinWithArange.time_isin(<class 'object'>, 1000, 2)
-     3.00±0.06ms       2.27±0.2ms     0.76  series_methods.ValueCounts.time_value_counts('float')
-     1.54±0.01ms      1.16±0.02ms     0.75  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 15)
-     2.91±0.04ms      2.17±0.04ms     0.74  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 16)
-      25.0±0.5ms       18.1±0.3ms     0.72  hash_functions.UniqueAndFactorizeArange.time_factorize(7)
-        392±30μs          279±5μs     0.71  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 12)
-      29.3±0.7ms       20.3±0.1ms     0.69  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, -2)
-         683±7μs         473±60μs     0.69  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 13)
-      86.9±0.6ms       56.8±0.9ms     0.65  hash_functions.IsinWithArange.time_isin(<class 'object'>, 8000, -2)
-      37.5±0.7ms         24.2±1ms     0.64  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 8000, 0)
-        306±10μs          197±2μs     0.64  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 11)
-      23.0±0.3ms       14.3±0.4ms     0.62  hash_functions.UniqueAndFactorizeArange.time_unique(7)
-      25.9±0.2ms       15.9±0.3ms     0.61  hash_functions.UniqueAndFactorizeArange.time_factorize(12)
-         264±3μs          159±2μs     0.61  hash_functions.IsinAlmostFullWithRandomInt.time_isin_outside(<class 'numpy.float64'>, 10)
-     1.24±0.04ms          740±8μs     0.60  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 8000)
-        89.6±2ms       53.4±0.4ms     0.60  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, -2)
-      34.9±0.6ms       18.8±0.1ms     0.54  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, 2)
-         113±9ms       60.8±0.4ms     0.54  hash_functions.IsinWithArange.time_isin(<class 'object'>, 8000, 2)
-      23.0±0.6ms       12.3±0.4ms     0.53  hash_functions.UniqueAndFactorizeArange.time_unique(12)
-         533±7μs          284±2μs     0.53  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 12)
-         112±1ms       55.3±0.3ms     0.49  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, 2)
-      1.36±0.5μs         653±20ns     0.48  index_cached_properties.IndexCache.time_is_unique('Int64Index')
-         449±5μs          200±3μs     0.45  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 11)
-         372±4μs          166±4μs     0.44  hash_functions.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.float64'>, 10)
-         855±8μs          341±5μs     0.40  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 1300)
-     1.40±0.01ms          548±9μs     0.39  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 2000)
-     1.41±0.02ms          547±4μs     0.39  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 2000)
-        959±10μs          350±4μs     0.36  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 1300)
-      4.92±0.1ms      1.75±0.02ms     0.36  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 7000)
-     5.80±0.07ms      2.06±0.02ms     0.35  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 8000)
-     4.88±0.03ms      1.66±0.01ms     0.34  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 7000)
-      54.9±0.6ms       18.4±0.6ms     0.33  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, 2)
-       792±300μs          262±4μs     0.33  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 2000)
-     5.94±0.06ms      1.92±0.01ms     0.32  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 8000)
-      61.1±0.5ms       19.2±0.1ms     0.31  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, -2)
-        63.9±2ms       19.8±0.3ms     0.31  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 70000)
-        653±30μs         194±10μs     0.30  hash_functions.IsinWithArangeSorted.time_isin(<class 'numpy.float64'>, 1000)
-         3.08±0s         789±10ms     0.26  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 900000)
-        93.8±2ms       23.8±0.8ms     0.25  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 80000)
-      3.05±0.02s          715±5ms     0.23  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 900000)
-        76.0±1ms       17.8±0.2ms     0.23  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 2000, 0)
-        79.2±3ms       17.6±0.6ms     0.22  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 70000)
-         108±4ms       22.3±0.7ms     0.21  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 80000)
-      2.78±0.02s         542±10ms     0.19  hash_functions.IsinWithRandomFloat.time_isin(<class 'object'>, 750000)
-      3.48±0.01s          600±7ms     0.17  hash_functions.IsinWithRandomFloat.time_isin_outside(<class 'object'>, 750000)
-       101±0.9ms       16.3±0.1ms     0.16  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, -2)
-         131±2ms       19.0±0.7ms     0.14  hash_functions.UniqueAndFactorizeArange.time_factorize(8)
-         128±1ms       16.9±0.3ms     0.13  hash_functions.UniqueAndFactorizeArange.time_factorize(11)
-       124±0.5ms      15.0±0.08ms     0.12  hash_functions.IsinWithArange.time_isin(<class 'numpy.float64'>, 1000, 0)
-       127±0.8ms       15.3±0.2ms     0.12  hash_functions.UniqueAndFactorizeArange.time_unique(8)
-       123±0.9ms       13.2±0.5ms     0.11  hash_functions.UniqueAndFactorizeArange.time_unique(11)
-         552±2ms       18.1±0.1ms     0.03  hash_functions.UniqueAndFactorizeArange.time_factorize(10)
-         551±9ms       14.4±0.4ms     0.03  hash_functions.UniqueAndFactorizeArange.time_unique(10)
-      1.84±0.01s         35.5±2ms     0.02  hash_functions.Float64GroupIndex.time_groupby
-      1.05±0.02s       19.1±0.2ms     0.02  hash_functions.UniqueAndFactorizeArange.time_factorize(9)
-      1.12±0.03s       15.4±0.5ms     0.01  hash_functions.UniqueAndFactorizeArange.time_unique(9)

@jreback
Contributor

jreback commented Oct 31, 2020

@realead if you can rebase

These are the last entries on cpython vs your impl.

-         148±6ms       56.8±0.7ms     0.38  hash_functions.IsinWithArange.time_isin(<class 'object'>, 2000, 2)
-        92.3±3ms       9.01±0.3ms     0.10  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
-       96.1±20ms         338±20μs     0.00  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')

are these just degenerate?

@jbrockmendel
Member

needs rebase

@realead realead force-pushed the gh_28303_float_hash branch from 11d9731 to ac16c1e Compare November 3, 2020 05:10
@realead
Contributor Author

realead commented Nov 3, 2020

@jreback

are these just degenerate?

I cannot really explain these results; when I rerun the tests I get:

-      9.28±0.2ms       7.37±0.3ms     0.79  hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 1000000)
-         490±6μs          321±3μs     0.65  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')

However, in my opinion the "bad" measurements aren't just a fluke. A possible explanation: this PR is less "gentle" on the cache than the current state or CPython's approach, and thus more sensitive when some background process (my testing machine is a VM sharing hardware with others) claims part of the cache.

This shows why the CPython approach is tempting: it leaves the memory-access pattern intact while being more robust in corner cases (because it takes the higher bits of the hash into account, while the current implementation doesn't). On the other hand, this PR offers more safety against performance degeneration, but is less performant for some special series like 1,2,3,4,...
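A minimal sketch of this trade-off, not the code from this PR: it assumes the pre-PR hash has the shift-xor form `(k >> 33) ^ k ^ (k << 11)` applied to the int64 view of the double, and it uses a generic MurmurHash2-style mix. The function names, the seed, and the table size of 2048 buckets are illustrative assumptions.

```python
import struct

MASK32 = 0xFFFFFFFF
M = 0x5BD1E995     # MurmurHash2 multiplication constant
R = 24             # MurmurHash2 shift constant
SEED = 0xC70F6907  # arbitrary seed chosen for this sketch (assumption)


def float_bits(x: float) -> int:
    """Reinterpret a float64 as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def shift_xor_hash(x: float) -> int:
    """Assumed form of the simple hash: fold 64 bits into 32 by shift and xor."""
    k = float_bits(x)
    return ((k >> 33) ^ k ^ (k << 11)) & MASK32


def _mix_word(h: int, k: int) -> int:
    """Mix one 32-bit word k into the running hash h (MurmurHash2 round)."""
    k = (k * M) & MASK32
    k ^= k >> R
    k = (k * M) & MASK32
    h = (h * M) & MASK32
    return h ^ k


def murmur2_style_hash(x: float) -> int:
    """MurmurHash2-style 32-bit hash of the 8 bytes of a float64."""
    bits = float_bits(x)
    h = SEED ^ 8                      # seed xor input length in bytes
    h = _mix_word(h, bits & MASK32)   # low 32 bits
    h = _mix_word(h, bits >> 32)      # high 32 bits
    h ^= h >> 13                      # final avalanche steps
    h = (h * M) & MASK32
    h ^= h >> 15
    return h


def distinct_buckets(hash_func, keys, n_buckets: int) -> int:
    """Count how many buckets of a power-of-two table the keys occupy."""
    mask = n_buckets - 1
    return len({hash_func(key) & mask for key in keys})


keys = [float(i) for i in range(1024)]  # the series 0.0, 1.0, 2.0, ...
print("shift-xor:", distinct_buckets(shift_xor_hash, keys, 2048))
print("murmur2-style:", distinct_buckets(murmur2_style_hash, keys, 2048))
```

On this input the shift-xor fold collapses into very few buckets, because small integer-valued doubles have all-zero low mantissa bits. The mixed hash spreads the keys across the table, at the price of a scattered memory-access pattern.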

However, while CPython's approach makes the most sense for dicts, those probably aren't often used with more than 1e6 elements, and for 1e6 elements a running time of e.g. n*sqrt(n) is not pleasant but not the end of the world, so degeneration can be risked. For 1e8 elements, a running time of n*sqrt(n) is a showstopper, and its risk should probably be minimized.
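The arithmetic behind this comparison can be sketched as follows. The helper name `degenerate_cost` is hypothetical, and the figures are abstract operation counts, not measured timings.

```python
import math


def degenerate_cost(n: int) -> int:
    """Operation count n * sqrt(n) for the degraded case discussed above."""
    return n * math.isqrt(n)


for n in (10**6, 10**8):
    print(f"n = {n:.0e}: n*sqrt(n) = {degenerate_cost(n):.0e}")

# Growing n by a factor of 100 grows n*sqrt(n) by a factor of 1000.
print(degenerate_cost(10**8) // degenerate_cost(10**6))
```

So going from 1e6 to 1e8 elements multiplies the degraded cost by 1000 (from 1e9 to 1e12 operations), whereas a well-behaved linear cost would only grow by 100.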

@jreback
Contributor

jreback commented Nov 13, 2020

@realead

i am fine merging this with the current approach, small comment, pls merge master and ping on green.

@realead realead force-pushed the gh_28303_float_hash branch from 824869f to e743c79 Compare November 13, 2020 21:18
@realead
Contributor Author

realead commented Nov 13, 2020

@jreback

It is failing, but that seems to be unrelated to my changes (TestRegistration.test_pandas_plots_register should be unaffected by them).

... small comment ...

Sorry, I don't understand what you are referring to...

@jbrockmendel
Member

but seems to be unrelated to my changes

that's affecting all builds, don't worry about it.

Sorry, I don't understand what you are referring to [re "small comment"]

Best guess is @jreback was referring to the "are these just degenerate" question, which I think you answered.

I cannot really explain these results; when I rerun the tests I get [very different results, possibly bimodal around .1 and .8?]

Unfortunately, it isn't unusual for asv runs to include a lot of noise. I usually do multiple runs and successively narrow down consistent-looking results. That said, the results usually don't look bimodal, which these do if you squint.
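One way to narrow down consistent-looking results across runs can be sketched like this. The helper `is_consistent` and its 2x tolerance are hypothetical choices, not part of asv. The ratios are the two pairs quoted earlier in this thread.

```python
# after/before ratios quoted in this thread, one entry per asv run
runs = {
    "hash_functions.NumericSeriesIndexingShuffled.time_loc_slice": [0.10, 0.79],
    "indexing.NumericSeriesIndexing.time_getitem_slice": [0.00, 0.65],
}


def is_consistent(ratios, tolerance=2.0):
    """True if the largest and smallest ratio differ by at most `tolerance`x."""
    lo, hi = min(ratios), max(ratios)
    if lo == 0:
        # A ratio that rounds to 0.00 only agrees with other zeros.
        return hi == 0
    return hi / lo <= tolerance


for name, ratios in runs.items():
    verdict = "consistent" if is_consistent(ratios) else "rerun needed"
    print(f"{name}: {ratios} -> {verdict}")
```

Benchmarks flagged this way would be rerun in isolation before drawing conclusions from them.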

@@ -13,25 +13,31 @@
// is 64 bits the truncation causes collission issues. Given all that, we use our own
// simple hash, viewing the double bytes as an int64 and using khash's default
// hash for 64 bit integers.
// GH 13436
// GH 13436 showed that _Py_HashDouble doesn't work well with khash
// GH 28303 showed, that the simple xoring-version isn't good enough
Contributor


can you add a tiny bit more content here on why this approach vs the CPython approach.

Contributor


this was a late comment, pls update in a follow-on.

@jreback jreback merged commit 4cfa97a into pandas-dev:master Nov 14, 2020
@jreback
Contributor

jreback commented Nov 14, 2020

thanks @realead this was quite some analysis. :->

if you can rebase #36611 and re-run the asv's let's see how that looks.

Labels
Performance Memory or execution speed performance
Development

Successfully merging this pull request may close these issues.

Extreme performance difference between int and float factorization
5 participants