PERF: Always using panda's hashtable approach, dropping np.in1d #36611

Merged
merged 14 commits into from
Nov 17, 2020


realead
Contributor

@realead realead commented Sep 24, 2020

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

numpy's in1d performs each look-up in O(log n), compared to O(1) for pandas' hash table, so for a large number of unique elements in values the pandas solution becomes faster.

Here is the comparison from the performance benchmark:

       before           after         ratio
     [27aae225]       [796a7dbc]
+     11.4±0.07ms       48.8±0.3ms     4.29  series_methods.IsInLongSeries.time_isin('float64', 1)
+      19.4±0.2ms       59.5±0.6ms     3.06  series_methods.IsInLongSeries.time_isin('float64', 2)
+      13.1±0.1ms       39.2±0.5ms     2.98  series_methods.IsInLongSeries.time_isin('int64', 1)
+      26.9±0.4ms       64.0±0.7ms     2.38  series_methods.IsInLongSeries.time_isin('float32', 1)
+      34.5±0.2ms       74.9±0.9ms     2.17  series_methods.IsInLongSeries.time_isin('float32', 2)
+      23.2±0.1ms       44.5±0.3ms     1.92  series_methods.IsInLongSeries.time_isin('int64', 2)
+      28.1±0.4ms       53.8±0.4ms     1.91  series_methods.IsInLongSeries.time_isin('int32', 1)
+      42.7±0.2ms       70.3±0.3ms     1.64  series_methods.IsInLongSeries.time_isin('float64', 5)
+      38.2±0.3ms       59.1±0.5ms     1.55  series_methods.IsInLongSeries.time_isin('int32', 2)
+      57.4±0.3ms       85.3±0.5ms     1.49  series_methods.IsInLongSeries.time_isin('float32', 5)
-        83.2±1ms       76.6±0.5ms     0.92  series_methods.IsInLongSeries.time_isin('float64', 10)
-         118±6ms       69.8±0.2ms     0.59  series_methods.IsInLongSeries.time_isin('int32', 10)
-         102±1ms       54.9±0.3ms     0.54  series_methods.IsInLongSeries.time_isin('int64', 10)
-         515±3ms          216±1ms     0.42  series_methods.IsInLongSeries.time_isin('int32', 100000)
-         498±3ms          201±1ms     0.40  series_methods.IsInLongSeries.time_isin('int64', 100000)
-         529±2ms          154±7ms     0.29  series_methods.IsInLongSeries.time_isin('float32', 100000)
-         529±3ms        131±0.7ms     0.25  series_methods.IsInLongSeries.time_isin('float64', 100000)
-         519±5ms        107±0.6ms     0.21  series_methods.IsInLongSeries.time_isin('int32', 1000)
-         499±3ms       91.3±0.3ms     0.18  series_methods.IsInLongSeries.time_isin('int64', 1000)
-         534±4ms         87.6±3ms     0.16  series_methods.IsInLongSeries.time_isin('float32', 1000)
-         519±2ms       72.6±0.9ms     0.14  series_methods.IsInLongSeries.time_isin('float64', 1000)

It looks as if pandas' approach is faster once values has roughly 10 or more elements.

See also this comment (#22205 (comment)).

The question is whether it really makes sense to keep numpy's approach for fewer than 10 elements in values.
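To make the complexity argument concrete, here is a minimal pure-Python sketch of the two look-up strategies. It is only an illustration: `isin_sorted` and `isin_hashed` are hypothetical names, the binary search stands in for the sorted look-up attributed to in1d above, and neither function is NumPy's or pandas' actual code.

```python
from bisect import bisect_left


def isin_sorted(comps, values):
    """Membership test via binary search: O(log n) per look-up."""
    sorted_values = sorted(set(values))
    result = []
    for c in comps:
        # Find the left-most position where c could be inserted.
        i = bisect_left(sorted_values, c)
        result.append(i < len(sorted_values) and sorted_values[i] == c)
    return result


def isin_hashed(comps, values):
    """Membership test via a hash set: O(1) expected per look-up."""
    table = set(values)
    return [c in table for c in comps]


comps = [5, 1, 7, 3, 3, 10]
values = [3, 5, 5, 8]
print(isin_sorted(comps, values))
print(isin_hashed(comps, values))
```

Both functions return the same mask; they differ only in how the cost of one look-up grows with the number of unique elements in values.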

@WillAyd
Member

WillAyd commented Sep 24, 2020

Nice find. Not sure if we should just lower the limit in the condition to keep performance for a low number of elements to check, but I think this is OK

@WillAyd WillAyd added the Performance Memory or execution speed performance label Sep 24, 2020
Contributor

@jreback jreback left a comment


i am pretty sure we have some other isin benchmarks, can you run them.

This is very odd result, I am pretty sure this numpy approach at least when we added this was WAY faster.

could be wrong, but see if you can find the issues that changed this and make sure that this is not in error.

@realead
Contributor Author

realead commented Sep 25, 2020

@jreback

I have run all benchmarks which used isin, which resulted in

asv continuous -f 1.01 upstream/master HEAD -b ^series_methods -b ^categorial

The only tests with difference were:

+     4.44±0.03ms      4.77±0.08ms     1.08  series_methods.NanOps.time_func('var', 1000000, 'int8')
+        961±10μs      1.01±0.02ms     1.05  series_methods.NanOps.time_func('mean', 1000000, 'int8')
-     2.12±0.03ms      1.55±0.08ms     0.73  series_methods.NanOps.time_func('prod', 1000000, 'int8')
-     1.67±0.01ms       1.10±0.1ms     0.66  series_methods.NanOps.time_func('sum', 1000000, 'int8')

which I would say is an improvement overall.

However, I have also changed my introduced benchmarks because they had a bias in the first version: most of the elements from the series were not in the values, which has interesting implications:

  • for numpy's in1d, which uses binary search as its look-up strategy, there were almost no cache misses (which are more or less the bottleneck for big tables): most of the time we search for an element that is larger than the largest element in values, so the binary search always takes the same path, which stays hot in the cache.
  • for khash-table it should make no difference, but it does (and that is a strange thing).

Now the benchmark has a different bias: every element of the series is present in values. However, the elements of the series are random, so we can expect a lot of cache misses in numpy's in1d. The timings with this "improved" dataset are:

       before           after         ratio
     [e4086ec1]       [b93b1e46]
+      18.4±0.2ms        116±0.3ms     6.29  series_methods.IsInLongSeries.time_isin('float64', 2)
+      42.2±0.6ms          195±1ms     4.61  series_methods.IsInLongSeries.time_isin('float64', 5)
+      10.7±0.2ms       46.4±0.1ms     4.33  series_methods.IsInLongSeries.time_isin('float64', 1)
+      33.9±0.3ms        131±0.7ms     3.86  series_methods.IsInLongSeries.time_isin('float32', 2)
+      57.2±0.4ms          210±2ms     3.67  series_methods.IsInLongSeries.time_isin('float32', 5)
+      12.7±0.1ms       42.7±0.4ms     3.37  series_methods.IsInLongSeries.time_isin('int64', 1)
+      80.7±0.3ms        248±0.5ms     3.07  series_methods.IsInLongSeries.time_isin('float64', 10)
+      95.6±0.4ms          263±3ms     2.75  series_methods.IsInLongSeries.time_isin('float32', 10)
+      25.6±0.1ms       62.1±0.7ms     2.42  series_methods.IsInLongSeries.time_isin('float32', 1)
+      28.1±0.1ms         58.3±3ms     2.08  series_methods.IsInLongSeries.time_isin('int32', 1)
+      22.7±0.1ms       42.5±0.2ms     1.88  series_methods.IsInLongSeries.time_isin('int64', 2)
+      37.5±0.5ms       57.1±0.4ms     1.52  series_methods.IsInLongSeries.time_isin('int32', 2)
-        67.1±2ms       57.4±0.5ms     0.86  series_methods.IsInLongSeries.time_isin('int32', 5)
-         1.67±0s       1.26±0.01s     0.76  series_methods.IsInLongSeries.time_isin('float32', 1000)
-      1.66±0.01s          1.25±0s     0.75  series_methods.IsInLongSeries.time_isin('float64', 1000)
-       117±0.7ms       57.4±0.4ms     0.49  series_methods.IsInLongSeries.time_isin('int32', 10)
-       100±0.5ms       42.4±0.2ms     0.42  series_methods.IsInLongSeries.time_isin('int64', 10)
-      2.02±0.02s          286±1ms     0.14  series_methods.IsInLongSeries.time_isin('float32', 100000)
-         2.01±0s        271±0.9ms     0.13  series_methods.IsInLongSeries.time_isin('float64', 100000)
-         1.89±0s       75.5±0.2ms     0.04  series_methods.IsInLongSeries.time_isin('int32', 100000)
-         1.55±0s       57.1±0.3ms     0.04  series_methods.IsInLongSeries.time_isin('int32', 1000)
-         1.87±0s       60.8±0.3ms     0.03  series_methods.IsInLongSeries.time_isin('int64', 100000)
-      1.52±0.01s         42.8±1ms     0.03  series_methods.IsInLongSeries.time_isin('int64', 1000) 

Interesting differences from the original benchmark:

  • for many values, numpy's in1d becomes 3-4 times slower compared to the original benchmark: the cost of more cache misses.
  • while for int32 and int64 the runtime of the khash-table (pandas' approach) does not really depend on the number of elements in values (as expected), this is not the case for float32 and float64.
  • Curiously, series_methods.IsInLongSeries.time_isin('float64', 100000) (i.e. values has 10^5 unique elements) is about 5 times faster than series_methods.IsInLongSeries.time_isin('float64', 1000) (i.e. values has 1000 unique elements): 270ms vs 1250ms.

My assumption is that the hash function for float64 (

#define kh_float64_hash_func_0_NAN(key) (khint32_t)((asint64(key))>>33^(asint64(key))^(asint64(key))<<11)
) isn't that great and that 1.0, 2.0, 3.0 and so on are too often mapped onto the same hash value. But this needs further investigation.

@WillAyd
Member

WillAyd commented Sep 25, 2020

My assumption is that the hash-functions for float64 (

See also #28303

@realead
Contributor Author

realead commented Sep 26, 2020

Just to illustrate how bad/severe the issue is. Here is the currently used hash-function for float64 as a standalone function:

%%cython
cdef extern from *:
    """
    #include <string.h>

    inline int64_t f64_to_i64(double val){
          unsigned long long int res; 
          memcpy(&res, &val, sizeof(double)); 
          return res;
    } 


    #define kh_float64_hash_func(key) (unsigned int)((f64_to_i64(key))>>33^(f64_to_i64(key))^(f64_to_i64(key))<<11)
    """
    unsigned int kh_float64_hash_func(double val)
    
def hash_f64(double val):
    return kh_float64_hash_func(val)

For 1000 keys, a khash set/map will use 2048 buckets, so we need to look at the hashes modulo 2048:

n_buckets = 2048
f64hashes=[hash_f64(float(x))%n_buckets for x in range(1000)]
len(set(f64hashes)) # 2

This gives us only 2 different hashes, 0 and 1024, whereas for int64 there are 1000 different hashes. This means that almost all float64 keys collide, so a single look-up degrades to O(n) and filling the table to O(n^2).
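The collision count can also be reproduced without Cython. The following pure-Python sketch re-implements the macro quoted above with `struct`. The names `mix`, `kh_float64_hash` and `kh_int64_hash` are illustrative, masking to 32 bits is assumed to mirror the `(unsigned int)` cast, and the int64 variant is assumed to apply the same mixing expression directly to the key.

```python
import struct


def f64_to_i64(val):
    """Reinterpret the bits of a double as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", val))[0]


def mix(key):
    # Same expression as the macro; the mask emulates the 32-bit cast.
    return ((key >> 33) ^ key ^ (key << 11)) & 0xFFFFFFFF


def kh_float64_hash(val):
    return mix(f64_to_i64(val))


def kh_int64_hash(val):
    return mix(val)


n_buckets = 2048
f64_buckets = {kh_float64_hash(float(x)) % n_buckets for x in range(1000)}
i64_buckets = {kh_int64_hash(x) % n_buckets for x in range(1000)}

print(sorted(f64_buckets))  # [0, 1024]
print(len(i64_buckets))     # 1000
```

For small whole-number doubles only the exponent and the top mantissa bits are set, so after the shift by 33 almost nothing reaches the low 11 bits that select the bucket.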

@realead
Contributor Author

realead commented Sep 26, 2020

The most obvious solution would be to use _Py_HashDouble (as comments suggest "again"). I have tried quick and dirty for 64bit Python the following function:

khint32_t PANDAS_INLINE f64_hash(double val){
      if(val==0.0){
        return 0; //ZERO_HASH
      }
      if(val!=val){
       return 0; //NAN_HASH
      }
      Py_hash_t hash = _Py_HashDouble(val);
      return (khint32_t)((hash)>>33^(hash)^(hash)<<11);
}

This changed hash-function on top of this branch gave (only floats):

       before           after         ratio
     [e4086ec1]       [51986956]
+        28.8±1ms          343±4ms    11.88  series_methods.IsInLongSeries.time_isin('float64', 2)
+      52.4±0.9ms         375±10ms     7.15  series_methods.IsInLongSeries.time_isin('float32', 2)
+        64.5±2ms          355±4ms     5.51  series_methods.IsInLongSeries.time_isin('float64', 5)
+        88.6±3ms          382±3ms     4.31  series_methods.IsInLongSeries.time_isin('float32', 5)
+      17.0±0.5ms       68.0±0.3ms     4.00  series_methods.IsInLongSeries.time_isin('float64', 1)
+         121±4ms          356±4ms     2.96  series_methods.IsInLongSeries.time_isin('float64', 10)
+         149±7ms          385±3ms     2.58  series_methods.IsInLongSeries.time_isin('float32', 10)
+        41.5±1ms         98.0±8ms     2.36  series_methods.IsInLongSeries.time_isin('float32', 1)
-       2.82±0.2s          348±3ms     0.12  series_methods.IsInLongSeries.time_isin('float64', 1000)
-      3.14±0.06s          380±7ms     0.12  series_methods.IsInLongSeries.time_isin('float32', 1000)
-       3.78±0.3s         452±40ms     0.12  series_methods.IsInLongSeries.time_isin('float32', 100000)
-      3.51±0.03s         405±10ms     0.12  series_methods.IsInLongSeries.time_isin('float64', 100000)

The important results are:

  • we should not consider the case n=1; there is probably an optimization so that the hash isn't calculated at all.
  • otherwise the running times are about 400ms, independent of n (as expected)
  • looking at the results for series_methods.IsInLongSeries.time_isin('float64', 2), we can see that using _Py_HashDouble is about 3-5 times slower than the current implementation.

It looks promising, however _Py_HashDouble has a property that probably doesn't work well with khash:

  • hash(1.0)=1,
  • hash(2.0)=2
  • and so on
  • hash(1000.0)=1000

There are no hash collisions, and that is all Python's dict needs. However, khash has a different strategy, and such a regular hash function probably means problems (by the way, it means problems for the PyObject variant as well).
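The regularity described here is easy to check from the interpreter. The sketch below assumes CPython, where the built-in `hash` of a float is expected to go through the same routine, and it reuses the mixing expression from the `f64_hash` snippet above. `mix` is an illustrative name, and the 32-bit mask is assumed to mirror the `khint32_t` cast.

```python
def mix(h):
    # Mixing step from the f64_hash snippet; the mask emulates the 32-bit cast.
    return ((h >> 33) ^ h ^ (h << 11)) & 0xFFFFFFFF


# Whole-number floats hash to their integer value.
hashes_are_regular = all(hash(float(x)) == x for x in range(1, 1001))

# Bucket index for each key in a table with 2048 buckets.
n_buckets = 2048
buckets = [mix(hash(float(x))) % n_buckets for x in range(1000)]
buckets_are_consecutive = buckets == list(range(1000))

print(hashes_are_regular, buckets_are_consecutive)
```

Under these assumptions every key gets its own bucket, but the buckets are filled in consecutive order. Whether that pattern actually hurts khash's probing is exactly the open question raised in the comment, and this sketch does not answer it.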

Another alternative which should be considered/tested is murmur-hash (https://github.com/aappleby/smhasher) which is used in gcc's libstdc++ (https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/libsupc%2B%2B/hash_bytes.cc#L25) as well as in libc++ (https://github.com/llvm/llvm-project/blob/1cfde143e82aeb47cffba436ba7b5302d8e14193/libcxx/include/utility#L977).

@jreback
Contributor

jreback commented Sep 27, 2020

not averse to having a better float hash impl but would need some serious benchmarking

@realead
Contributor Author

realead commented Sep 29, 2020

@jreback

could be wrong, but see if you can find the issues that changed this and make sure that this is not in error.

It is hard to tell what brought the speed-up. My first suspect would be #13436, but I'm not sure.

However, the performance testing for this PR should probably wait until a judgement is passed for #36729.

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

param_names = ["dtype", "M"]

def setup(self, dtype, M):
    self.s = Series(np.random.randint(0, M, 10 ** 7)).astype(dtype)
Member


IIUC we can expect big perf differences if the values here are clustered, e.g. if they are all negative or all >M in the extreme cases. Do we need to measure any non-uniform cases?

Contributor Author


@jbrockmendel You are right, the test is a little bit naive. We should test at least (I have pushed them):

  • average (random) case with elements in the look-up-table
  • average (random) case with elements not in the look-up-table
  • monotone series with elements in the look-up-table (better cache utilization for numpy's approach)
  • monotone series with elements not in the look-up-table (better cache utilization for numpy's approach)

It looks as if float32/float64 are somewhat problematic; hopefully after #36729 it will become clearer what is going on.
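As a rough illustration of the four cases listed above, the following stdlib-only sketch builds one series per case for a look-up table assumed to hold the integers 0..m-1. The function name `make_series`, the seed, and the exact construction are assumptions for illustration and are not the benchmark code pushed to this PR.

```python
import random


def make_series(kind, m, n, seed=0):
    """Build n series elements for a look-up table holding 0..m-1."""
    rng = random.Random(seed)
    if kind == "random_hits":
        # Random order, every element is in the table.
        return [rng.randrange(m) for _ in range(n)]
    if kind == "random_misses":
        # Random order, every element is outside the table.
        return [m + rng.randrange(m) for _ in range(n)]
    if kind == "monotone_hits":
        # Non-decreasing, every element is in the table.
        return [i * m // n for i in range(n)]
    if kind == "monotone_misses":
        # Increasing, every element is outside the table.
        return [m + i for i in range(n)]
    raise ValueError(f"unknown kind: {kind}")


table = set(range(100))
for kind in ("random_hits", "random_misses", "monotone_hits", "monotone_misses"):
    series = make_series(kind, m=100, n=1000)
    print(kind, sum(x in table for x in series))
```

The monotone variants visit the table in order, which is the access pattern expected to favour a sorted look-up through better cache utilization.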

@realead
Contributor Author

realead commented Nov 13, 2020

Current time comparisons:

       before           after         ratio
     [1f42d456]       [6977afe8]
+      21.3±0.7ms          119±1ms     5.60  series_methods.IsInLongSeries.time_isin('float64', 2, 'random_hits')
+        14.1±4ms         71.0±5ms     5.02  series_methods.IsInLongSeries.time_isin('float64', 1, 'random_misses')
+      46.4±0.7ms          207±6ms     4.45  series_methods.IsInLongSeries.time_isin('float64', 5, 'random_hits')
+        20.8±1ms       85.5±0.9ms     4.11  series_methods.IsInLongSeries.time_isin('float64', 2, 'random_misses')
+      11.9±0.3ms         47.5±4ms     3.99  series_methods.IsInLongSeries.time_isin('float64', 1, 'random_hits')
+        37.8±3ms         142±10ms     3.75  series_methods.IsInLongSeries.time_isin('float32', 2, 'random_hits')
+      13.6±0.7ms         45.7±1ms     3.36  series_methods.IsInLongSeries.time_isin('int64', 1, 'random_hits')
+        65.3±6ms          215±5ms     3.30  series_methods.IsInLongSeries.time_isin('float32', 5, 'random_hits')
+      14.1±0.6ms         45.4±1ms     3.21  series_methods.IsInLongSeries.time_isin('int64', 1, 'monotone')
+      12.2±0.6ms       38.5±0.8ms     3.16  series_methods.IsInLongSeries.time_isin('float64', 1, 'monotone')
+        49.4±4ms          145±6ms     2.94  series_methods.IsInLongSeries.time_isin('float64', 5, 'random_misses')
+        32.4±3ms        89.4±10ms     2.76  series_methods.IsInLongSeries.time_isin('float32', 1, 'random_misses')
+        39.7±6ms          105±6ms     2.65  series_methods.IsInLongSeries.time_isin('float32', 2, 'random_misses')
+        15.2±1ms         38.8±2ms     2.54  series_methods.IsInLongSeries.time_isin('int64', 1, 'random_misses')
+        22.8±1ms       56.7±0.6ms     2.49  series_methods.IsInLongSeries.time_isin('float64', 2, 'monotone')
+        63.0±4ms          154±3ms     2.45  series_methods.IsInLongSeries.time_isin('float32', 5, 'random_misses')
+         116±7ms         274±10ms     2.35  series_methods.IsInLongSeries.time_isin('float32', 10, 'random_hits')
+        62.5±6ms          142±5ms     2.28  series_methods.IsInLongSeries.time_isin('int64', 5, 'random_misses')
+         108±6ms         233±20ms     2.14  series_methods.IsInLongSeries.time_isin('float32', 10, 'random_misses')
+         655±7ms       1.40±0.06s     2.14  series_methods.IsInLongSeries.time_isin('float32', 1000, 'monotone')
+        465±90ms         979±80ms     2.11  series_methods.IsInLongSeries.time_isin('float64', 50, 'random_misses')
+        105±40ms          216±6ms     2.05  series_methods.IsInLongSeries.time_isin('float64', 10, 'random_misses')
+        635±30ms       1.29±0.05s     2.03  series_methods.IsInLongSeries.time_isin('float64', 1000, 'monotone')
+        901±40ms       1.80±0.04s     2.00  series_methods.IsInLongSeries.time_isin('float32', 100, 'random_misses')
+        31.3±2ms         62.2±5ms     1.99  series_methods.IsInLongSeries.time_isin('int32', 1, 'monotone')
+        483±70ms         942±10ms     1.95  series_methods.IsInLongSeries.time_isin('float32', 50, 'random_misses')
+        38.0±2ms        71.4±10ms     1.88  series_methods.IsInLongSeries.time_isin('int32', 1, 'random_hits')
+      25.0±0.9ms         46.5±3ms     1.86  series_methods.IsInLongSeries.time_isin('int64', 2, 'random_hits')
+        24.3±1ms       44.8±0.8ms     1.84  series_methods.IsInLongSeries.time_isin('int64', 2, 'monotone')
+        39.7±2ms         72.0±1ms     1.81  series_methods.IsInLongSeries.time_isin('float32', 2, 'monotone')
+        156±40ms          263±8ms     1.68  series_methods.IsInLongSeries.time_isin('float64', 10, 'random_hits')
+        41.7±2ms         69.4±4ms     1.66  series_methods.IsInLongSeries.time_isin('int32', 2, 'monotone')
+        68.0±8ms          109±3ms     1.60  series_methods.IsInLongSeries.time_isin('float32', 5, 'monotone')
+      24.6±0.7ms       38.3±0.8ms     1.55  series_methods.IsInLongSeries.time_isin('int64', 2, 'random_misses')
+        35.6±1ms         55.2±2ms     1.55  series_methods.IsInLongSeries.time_isin('int32', 1, 'random_misses')
+      40.0±0.5ms       61.0±0.8ms     1.52  series_methods.IsInLongSeries.time_isin('int32', 2, 'random_hits')
+         110±5ms          154±9ms     1.41  series_methods.IsInLongSeries.time_isin('float32', 10, 'monotone')
+        41.7±2ms       55.0±0.9ms     1.32  series_methods.IsInLongSeries.time_isin('int32', 2, 'random_misses')
+         108±1ms          136±3ms     1.25  series_methods.IsInLongSeries.time_isin('int64', 10, 'random_misses')
-        78.5±7ms       62.0±0.2ms     0.79  series_methods.IsInLongSeries.time_isin('int32', 5, 'monotone')
-      2.04±0.06s       1.35±0.08s     0.66  series_methods.IsInLongSeries.time_isin('float32', 1000, 'random_hits')
-       2.11±0.1s       1.33±0.05s     0.63  series_methods.IsInLongSeries.time_isin('float64', 1000, 'random_hits')
-      2.09±0.07s       1.10±0.06s     0.53  series_methods.IsInLongSeries.time_isin('float32', 1000, 'random_misses')
-      2.06±0.06s       1.06±0.03s     0.52  series_methods.IsInLongSeries.time_isin('float64', 1000, 'random_misses')
-         125±3ms       60.9±0.9ms     0.49  series_methods.IsInLongSeries.time_isin('int32', 10, 'monotone')
-         112±5ms         47.0±4ms     0.42  series_methods.IsInLongSeries.time_isin('int64', 10, 'random_hits')
-         112±4ms       45.5±0.7ms     0.40  series_methods.IsInLongSeries.time_isin('int64', 10, 'monotone')
-        639±30ms          112±5ms     0.18  series_methods.IsInLongSeries.time_isin('float32', 100000, 'monotone')
-        626±10ms         97.9±5ms     0.16  series_methods.IsInLongSeries.time_isin('float64', 100000, 'monotone')
-      2.37±0.07s         304±60ms     0.13  series_methods.IsInLongSeries.time_isin('float32', 100000, 'random_hits')
-      2.22±0.08s          275±9ms     0.12  series_methods.IsInLongSeries.time_isin('int64', 100000, 'random_misses')
-       2.32±0.1s          284±8ms     0.12  series_methods.IsInLongSeries.time_isin('float64', 100000, 'random_hits')
-       2.41±0.1s         291±80ms     0.12  series_methods.IsInLongSeries.time_isin('float32', 100000, 'random_misses')
-         613±6ms         70.6±6ms     0.12  series_methods.IsInLongSeries.time_isin('int32', 100000, 'monotone')
-      2.29±0.05s          254±7ms     0.11  series_methods.IsInLongSeries.time_isin('float64', 100000, 'random_misses')
-       569±100ms       62.3±0.8ms     0.11  series_methods.IsInLongSeries.time_isin('int32', 50, 'monotone')
-        572±20ms         61.8±1ms     0.11  series_methods.IsInLongSeries.time_isin('int32', 50, 'random_hits')
-       2.75±0.2s         288±50ms     0.10  series_methods.IsInLongSeries.time_isin('int32', 100000, 'random_misses')
-        557±20ms         55.7±3ms     0.10  series_methods.IsInLongSeries.time_isin('int32', 50, 'random_misses')
-        570±30ms         53.8±2ms     0.09  series_methods.IsInLongSeries.time_isin('int64', 100000, 'monotone')
-        667±50ms         61.6±5ms     0.09  series_methods.IsInLongSeries.time_isin('int32', 1000, 'monotone')
-        532±10ms         45.2±2ms     0.08  series_methods.IsInLongSeries.time_isin('int64', 50, 'random_hits')
-        557±50ms         45.2±2ms     0.08  series_methods.IsInLongSeries.time_isin('int64', 1000, 'monotone')
-        546±10ms       44.4±0.7ms     0.08  series_methods.IsInLongSeries.time_isin('int64', 50, 'monotone')
-         534±9ms       38.1±0.3ms     0.07  series_methods.IsInLongSeries.time_isin('int64', 50, 'random_misses')
-       1.06±0.2s        65.2±10ms     0.06  series_methods.IsInLongSeries.time_isin('int32', 100, 'monotone')
-      1.08±0.05s       60.9±0.8ms     0.06  series_methods.IsInLongSeries.time_isin('int32', 100, 'random_hits')
-      1.07±0.02s         54.4±2ms     0.05  series_methods.IsInLongSeries.time_isin('int32', 100, 'random_misses')
-      1.10±0.06s         45.2±1ms     0.04  series_methods.IsInLongSeries.time_isin('int64', 100, 'random_hits')
-       1.23±0.2s         45.5±5ms     0.04  series_methods.IsInLongSeries.time_isin('int64', 100, 'monotone')
-       2.34±0.1s        83.9±20ms     0.04  series_methods.IsInLongSeries.time_isin('int32', 100000, 'random_hits')
-      1.06±0.01s       37.9±0.5ms     0.04  series_methods.IsInLongSeries.time_isin('int64', 100, 'random_misses')
-      1.85±0.04s        64.4±10ms     0.03  series_methods.IsInLongSeries.time_isin('int32', 1000, 'random_hits')
-       2.33±0.2s         78.3±8ms     0.03  series_methods.IsInLongSeries.time_isin('int64', 100000, 'random_hits')
-      1.89±0.05s        53.8±10ms     0.03  series_methods.IsInLongSeries.time_isin('int32', 1000, 'random_misses')
-       1.85±0.2s         49.6±8ms     0.03  series_methods.IsInLongSeries.time_isin('int64', 1000, 'random_hits')
-       1.89±0.1s         38.6±1ms     0.02  series_methods.IsInLongSeries.time_isin('int64', 1000, 'random_misses')

We should see how it looks once #36729 is resolved; right now float64/float32 seem to be problematic.

@jreback
Contributor

jreback commented Nov 14, 2020

this looks way better even w/o #36729, but if you can re-run and will see.

Contributor

@jreback jreback left a comment


also pls add a release note in performance.

else:
f = np.in1d
elif is_integer_dtype(comps):
if is_integer_dtype(comps.dtype):
Contributor


can you put some commentary / link here on why we are not using np.in1d

@jreback jreback added this to the 1.2 milestone Nov 14, 2020
@realead
Contributor Author

realead commented Nov 15, 2020

The timings are below (pls use asv continuous -f 1.01 upstream/master HEAD~1 -b ^series_methods.IsInLongSeries, because I've removed some of the tests in the last commit: it is nice to see where the cut could be made, but otherwise they take too much time). For comparison with earlier timings: IsInLongSeries was renamed to IsInLongSeriesLookUpDominates.

       before           after         ratio
     [8d1b8aba]       [ef49ca39]
+      11.1±0.1ms          130±1ms    11.70  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 1, 'monotone_misses')
+      18.9±0.2ms        203±0.6ms    10.73  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 2, 'monotone_misses')
+     18.9±0.08ms        176±0.3ms     9.34  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 2, 'random_misses')
+      11.2±0.1ms       90.3±0.4ms     8.10  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 1, 'random_misses')
+      19.0±0.2ms        126±0.6ms     6.66  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 2, 'random_hits')
+      34.6±0.2ms          224±3ms     6.49  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 2, 'monotone_misses')
+        43.3±2ms          265±2ms     6.13  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 5, 'monotone_misses')
+      43.0±0.5ms          249±1ms     5.78  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 5, 'random_misses')
+      34.6±0.2ms          193±2ms     5.58  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 2, 'random_misses')
+      26.2±0.7ms          145±1ms     5.52  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 1, 'monotone_misses')
+      42.7±0.5ms          218±2ms     5.10  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 5, 'random_hits')
+      57.8±0.3ms          280±1ms     4.85  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 5, 'monotone_misses')
+      11.0±0.2ms       51.0±0.8ms     4.65  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 1, 'monotone_hits')
+      57.6±0.1ms          265±2ms     4.60  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 5, 'random_misses')
+      11.3±0.1ms       50.9±0.4ms     4.50  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 1, 'random_hits')
+      23.0±0.2ms          102±2ms     4.42  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 2, 'monotone_misses')
+      13.1±0.2ms       53.0±0.2ms     4.06  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 1, 'random_hits')
+      13.3±0.3ms       53.1±0.2ms     4.00  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 1, 'monotone_hits')
+      19.0±0.4ms       74.6±0.3ms     3.93  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 2, 'monotone_hits')
+        36.3±3ms          141±2ms     3.89  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 2, 'random_hits')
+      27.3±0.9ms        106±0.7ms     3.88  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 1, 'random_misses')
+        60.7±1ms          234±2ms     3.86  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 5, 'random_hits')
+      13.2±0.4ms       50.0±0.7ms     3.80  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 1, 'monotone_misses')
+      12.8±0.1ms       47.7±0.3ms     3.73  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 1, 'random_misses')
+        84.1±1ms          273±1ms     3.25  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 10, 'random_misses')
+      83.6±0.9ms          271±1ms     3.25  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 10, 'monotone_misses')
+      83.0±0.5ms          248±2ms     2.99  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 10, 'random_hits')
+      38.8±0.9ms        115±0.9ms     2.96  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 2, 'monotone_misses')
+      98.5±0.7ms          287±1ms     2.91  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 10, 'monotone_misses')
+         100±2ms        287±0.9ms     2.87  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 10, 'random_misses')
+      42.7±0.8ms        116±0.7ms     2.72  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 5, 'monotone_hits')
+        98.5±1ms          262±2ms     2.66  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 10, 'random_hits')
+      34.4±0.4ms       90.5±0.9ms     2.63  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 2, 'monotone_hits')
+      53.0±0.3ms        137±0.4ms     2.59  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 5, 'random_misses')
+      26.2±0.3ms       66.5±0.6ms     2.54  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 1, 'monotone_hits')
+      26.6±0.2ms       66.4±0.4ms     2.49  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 1, 'random_hits')
+        53.2±1ms          133±1ms     2.49  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 5, 'monotone_misses')
+      28.5±0.2ms         69.3±1ms     2.43  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 1, 'monotone_hits')
+      29.0±0.2ms       68.0±0.7ms     2.35  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 1, 'random_hits')
+      23.0±0.2ms       52.9±0.4ms     2.30  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 2, 'monotone_hits')
+      23.2±0.2ms       52.7±0.2ms     2.27  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 2, 'random_hits')
+        58.6±1ms          133±1ms     2.27  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 5, 'monotone_hits')
+      28.6±0.6ms       64.7±0.5ms     2.26  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 1, 'monotone_misses')
+        68.3±2ms          153±1ms     2.24  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 5, 'random_misses')
+      28.5±0.4ms       61.5±0.7ms     2.16  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 1, 'random_misses')
+        70.2±2ms        149±0.8ms     2.13  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 5, 'monotone_misses')
+      23.0±0.1ms       47.2±0.8ms     2.05  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 2, 'random_misses')
+      38.3±0.5ms       68.9±0.8ms     1.80  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 2, 'monotone_hits')
+         132±3ms          231±2ms     1.75  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 16, 'monotone_misses')
+        38.8±1ms       67.6±0.5ms     1.74  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 2, 'random_hits')
+         144±1ms          246±2ms     1.70  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 16, 'monotone_misses')
+       103±0.7ms          173±2ms     1.68  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 10, 'monotone_misses')
+      38.6±0.5ms         61.7±1ms     1.60  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 2, 'random_misses')
+         117±1ms          187±1ms     1.59  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 10, 'monotone_misses')
+         132±1ms          200±1ms     1.52  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 16, 'random_misses')
+      83.8±0.8ms        124±0.6ms     1.48  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 10, 'monotone_hits')
+         147±1ms        213±0.3ms     1.45  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 16, 'random_misses')
+         131±1ms        189±0.7ms     1.45  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 16, 'random_hits')
+      97.7±0.6ms          139±1ms     1.43  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 10, 'monotone_hits')
+         147±3ms          205±2ms     1.39  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 16, 'random_hits')
+         374±3ms         518±10ms     1.39  series_methods.IsInLongSeriesValuesDominate.time_isin('float64', 'monotone')
+         390±4ms         506±10ms     1.30  series_methods.IsInLongSeriesValuesDominate.time_isin('float32', 'monotone')
+       102±0.3ms        118±0.9ms     1.16  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 10, 'random_misses')
-         147±1ms          130±1ms     0.88  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 16, 'monotone_hits')
-         177±3ms        155±0.5ms     0.87  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 16, 'monotone_misses')
-         133±1ms          115±1ms     0.87  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 16, 'monotone_hits')
-         162±3ms        139±0.9ms     0.86  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 16, 'monotone_misses')
-         342±5ms          277±1ms     0.81  series_methods.IsInLongSeriesValuesDominate.time_isin('int32', 'monotone')
-         326±4ms          264±2ms     0.81  series_methods.IsInLongSeriesValuesDominate.time_isin('int64', 'monotone')
-         542±3ms          357±4ms     0.66  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 100000, 'monotone_misses')
-         532±6ms          345±4ms     0.65  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 100000, 'monotone_misses')
-         522±2ms          298±1ms     0.57  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 100000, 'monotone_misses')
-         121±2ms       68.6±0.5ms     0.57  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 10, 'random_hits')
-         123±4ms       68.2±0.4ms     0.56  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 10, 'monotone_hits')
-         518±8ms          286±3ms     0.55  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 100000, 'monotone_misses')
-         104±2ms       53.5±0.6ms     0.52  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 10, 'monotone_hits')
-         104±1ms       52.8±0.1ms     0.51  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 10, 'random_hits')
-         418±5ms        200±0.9ms     0.48  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 50, 'monotone_misses')
-         536±4ms          246±2ms     0.46  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 1000, 'monotone_misses')
-         1.27±0s         577±20ms     0.45  series_methods.IsInLongSeriesValuesDominate.time_isin('float32', 'random')
-        408±30ms          183±1ms     0.45  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 50, 'monotone_misses')
-         418±4ms          187±1ms     0.45  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 50, 'random_misses')
-         535±7ms          229±2ms     0.43  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 1000, 'monotone_misses')
-         405±8ms        173±0.6ms     0.43  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 50, 'random_misses')
-         1.26±0s          528±7ms     0.42  series_methods.IsInLongSeriesValuesDominate.time_isin('float64', 'random')
-         180±2ms         69.2±2ms     0.38  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 16, 'monotone_hits')
-         179±2ms       67.9±0.2ms     0.38  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 16, 'random_hits')
-         1.09±0s         409±20ms     0.38  series_methods.IsInLongSeriesValuesDominate.time_isin('int32', 'random')
-         416±3ms          155±1ms     0.37  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 50, 'random_hits')
-      1.08±0.01s          391±5ms     0.36  series_methods.IsInLongSeriesValuesDominate.time_isin('int64', 'random')
-         399±3ms          142±2ms     0.35  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 50, 'random_hits')
-         181±1ms       61.4±0.3ms     0.34  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 16, 'random_misses')
-         436±2ms          147±2ms     0.34  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 100000, 'monotone_hits')
-       160±0.6ms       53.1±0.6ms     0.33  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 16, 'random_hits')
-       162±0.6ms       52.9±0.3ms     0.33  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 16, 'monotone_hits')
-         424±5ms          132±2ms     0.31  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 100000, 'monotone_hits')
-         514±2ms          158±2ms     0.31  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 1000, 'monotone_misses')
-         163±3ms       46.6±0.2ms     0.29  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 16, 'random_misses')
-         420±6ms        120±0.8ms     0.28  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 50, 'monotone_hits')
-         507±7ms        144±0.9ms     0.28  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 1000, 'monotone_misses')
-         442±3ms        124±0.4ms     0.28  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 1000, 'monotone_hits')
-         401±5ms        105±0.7ms     0.26  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 50, 'monotone_hits')
-         437±8ms        109±0.7ms     0.25  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 1000, 'monotone_hits')
-         812±3ms          201±1ms     0.25  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 100, 'random_misses')
-         815±4ms          197±1ms     0.24  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 100, 'monotone_misses')
-         796±7ms          187±2ms     0.23  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 100, 'random_misses')
-         519±5ms        120±0.6ms     0.23  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 50, 'monotone_misses')
-         803±8ms          184±2ms     0.23  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 100, 'monotone_misses')
-         499±8ms          105±1ms     0.21  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 50, 'monotone_misses')
-         816±4ms        151±0.6ms     0.18  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 100, 'random_hits')
-         405±2ms         73.7±1ms     0.18  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 100000, 'monotone_hits')
-         389±1ms       68.2±0.6ms     0.18  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 1000, 'monotone_hits')
-      2.07±0.02s          358±2ms     0.17  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 100000, 'random_misses')
-      1.94±0.01s          333±5ms     0.17  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 100000, 'random_misses')
-        796±10ms        136±0.3ms     0.17  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 100, 'random_hits')
-      2.04±0.02s          342±2ms     0.17  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 100000, 'random_misses')
-      1.94±0.03s          317±2ms     0.16  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 100000, 'random_misses')
-      2.06±0.01s          324±1ms     0.16  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 100000, 'random_hits')
-      2.04±0.01s          311±2ms     0.15  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 100000, 'random_hits')
-         814±5ms          120±1ms     0.15  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 100, 'monotone_hits')
-      1.71±0.01s        251±0.7ms     0.15  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 1000, 'random_misses')
-         399±9ms       56.9±0.4ms     0.14  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 100000, 'monotone_hits')
-         381±5ms       53.4±0.2ms     0.14  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 1000, 'monotone_hits')
-      1.71±0.02s          236±3ms     0.14  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 1000, 'random_misses')
-         800±7ms          106±1ms     0.13  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 100, 'monotone_hits')
-         522±4ms       68.5±0.5ms     0.13  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 50, 'monotone_hits')
-        532±20ms       67.9±0.2ms     0.13  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 50, 'random_hits')
-      1.01±0.01s          122±1ms     0.12  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 100, 'monotone_misses')
-         519±6ms       62.3±0.7ms     0.12  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 50, 'random_misses')
-         998±4ms          106±1ms     0.11  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 100, 'monotone_misses')
-         498±5ms       52.7±0.1ms     0.11  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 50, 'random_hits')
-         501±8ms       52.9±0.4ms     0.11  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 50, 'monotone_hits')
-      1.71±0.01s          168±1ms     0.10  series_methods.IsInLongSeriesLookUpDominates.time_isin('float32', 1000, 'random_hits')
-         500±7ms       46.8±0.7ms     0.09  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 50, 'random_misses')
-      1.70±0.01s          154±2ms     0.09  series_methods.IsInLongSeriesLookUpDominates.time_isin('float64', 1000, 'random_hits')
-      1.02±0.01s       68.4±0.6ms     0.07  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 100, 'random_hits')
-      1.02±0.01s         68.0±1ms     0.07  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 100, 'monotone_hits')
-      1.01±0.01s       61.5±0.6ms     0.06  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 100, 'random_misses')
-      1.00±0.01s       53.7±0.9ms     0.05  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 100, 'monotone_hits')
-         999±6ms       53.5±0.9ms     0.05  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 100, 'random_hits')
-         1.00±0s       47.1±0.5ms     0.05  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 100, 'random_misses')
-      1.92±0.01s         85.8±1ms     0.04  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 100000, 'random_hits')
-         1.59±0s       67.9±0.2ms     0.04  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 1000, 'random_hits')
-      1.58±0.01s       62.0±0.5ms     0.04  series_methods.IsInLongSeriesLookUpDominates.time_isin('int32', 1000, 'random_misses')
-      1.91±0.01s       70.0±0.3ms     0.04  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 100000, 'random_hits')
-      1.57±0.01s         55.6±2ms     0.04  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 1000, 'random_hits')
-      1.58±0.05s       47.0±0.2ms     0.03  series_methods.IsInLongSeriesLookUpDominates.time_isin('int64', 1000, 'random_misses')

When the look-up is dominated by the cost of computing the hash function (small n), we see the disadvantage of #36729: it is now costlier (by almost a factor of 3). However, already for n of about 100 we see the advantage of a more robust hash function: for some series we are almost 10 times faster (e.g. series_methods.IsInLongSeries.time_isin('float32', 100, 'random_misses'): 1.8s vs. 201±1ms).

The question is: is it worth keeping np.in1d for len(values) < 16 for the best possible performance, or should we drop it completely?

Seeing the numbers, I would say "Yes", even if it makes the code harder to understand/maintain. What is your opinion, @jreback @jbrockmendel @WillAyd?
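
A minimal sketch of the switchover being asked about, assuming the cut-off of 16 mentioned above. `isin_with_switchover` and `SMALL_VALUES_CUTOFF` are hypothetical names used only for illustration, not the code in this PR. `np.isin` stands in for `np.in1d` (the two give the same result for 1-D input), and the public `Series.isin` stands in for the hashtable path.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-off, taken from the len(values) < 16 figure discussed above;
# the value that ends up in the code base may differ.
SMALL_VALUES_CUTOFF = 16


def isin_with_switchover(comps: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Return a boolean mask marking which entries of `comps` occur in `values`."""
    if len(values) < SMALL_VALUES_CUTOFF:
        # Few values to look up: use NumPy and skip the fixed cost of
        # building a hash table.
        return np.isin(comps, values)
    # Many values: delegate to pandas' own isin (assumed here to take the
    # hashtable route for inputs like these).
    return pd.Series(comps).isin(values).to_numpy()


comps = np.array([1, 5, 9, 5, 2], dtype="int64")
print(isin_with_switchover(comps, np.array([5, 2])))   # small-values branch
print(isin_with_switchover(comps, np.arange(100)))     # hashtable branch
```

Both branches return the same kind of boolean mask, so callers would not see the switch; the only extra complexity is the single length check.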

@jreback
Contributor

jreback commented Nov 15, 2020

yeah, the question here is that hashing gives a good benefit once you have x values to look up, but with fewer than that the constant cost of creating the hashtable hits you

so not averse to having a switchover point for perf reasons at the expense of complexity

@jbrockmendel
Member

The maintenance burden for this optimization is pretty small, I'm OK with keeping the switchover

I think the bigger optimization available (that would also decrease the [non-cython] complexity) is getting hashtables with more dtypes so we don't have to cast to int64 xref #33287
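
To illustrate the cast mentioned above, here is a minimal NumPy-only sketch, assuming an int32 input that has to be widened to int64 before it can be fed to an int64 hashtable. The array size is arbitrary, and this is not pandas' internal code path.

```python
import numpy as np

# Arbitrary illustrative size, not the size used in the benchmarks.
arr32 = np.arange(1_000_000, dtype="int32")

# Widening to int64 allocates a new buffer twice as large and copies
# every element into it.
arr64 = arr32.astype("int64")

print(arr32.nbytes, arr64.nbytes)        # bytes held by each array
print(np.shares_memory(arr32, arr64))    # whether the cast reused the buffer
```

A hashtable that accepted int32 directly would avoid this extra allocation and copy.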

@jreback
Contributor

jreback commented Nov 17, 2020

@realead ping when ready here

@jreback jreback merged commit d4d2e19 into pandas-dev:master Nov 17, 2020
@jreback
Contributor

jreback commented Nov 17, 2020

thanks @realead very nice!

Labels
Performance Memory or execution speed performance Stale