Add try_unary, binary, try_binary kernels ~90% faster #2666

tustvold · 2022-09-06T13:20:50Z

Which issue does this PR close?

Closes #.

Rationale for this change

ArrayIter is great from an ergonomics perspective and is definitely something we should continue to provide to users. However, the way it interleaves null masks with values makes for poor performance. For primitive arrays, this can result in orders of magnitude more time spent handling the null mask than the values themselves.

divide                  time:   [75.202 us 75.276 us 75.359 us]                   
                        change: [-89.655% -89.644% -89.633%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

divide_nulls            time:   [92.458 us 92.489 us 92.522 us]                         
                        change: [-93.765% -93.754% -93.746%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe


modulo_nulls            time:   [218.79 us 218.85 us 218.92 us]                         
                        change: [-72.606% -72.571% -72.509%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
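
To illustrate the rationale above, here is a minimal sketch in plain Rust. It uses no arrow types, and `Column`, `add_interleaved`, and `add_split` are hypothetical names, not code from this PR. It contrasts an `Option`-per-element loop with one that processes the validity mask and the values in separate passes.

```rust
/// Interleaved style: every element carries its own validity,
/// so the loop has to branch on null-ness for each item.
fn add_interleaved(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b)
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(l.wrapping_add(*r)),
            _ => None,
        })
        .collect()
}

/// Columnar style (hypothetical type): values and validity live in separate buffers.
struct Column {
    values: Vec<i32>,
    validity: Vec<bool>,
}

fn add_split(a: &Column, b: &Column) -> Column {
    // Pass 1: a slot is valid only if it is valid in both inputs.
    let validity: Vec<bool> = a
        .validity
        .iter()
        .zip(&b.validity)
        .map(|(x, y)| *x && *y)
        .collect();
    // Pass 2: branch-free loop over the values. Null slots are computed too,
    // but their results are masked out by `validity`.
    let values: Vec<i32> = a
        .values
        .iter()
        .zip(&b.values)
        .map(|(x, y)| x.wrapping_add(*y))
        .collect();
    Column { values, validity }
}
```

The second form only works this simply for infallible operations, because null slots are evaluated as well. A fallible operation has to skip null slots instead.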

What changes are included in this PR?

Are there any user-facing changes?

FYI @viirya @liukun4515

tustvold · 2022-09-06T13:23:41Z

arrow/src/compute/kernels/arity.rs

 use std::sync::Arc;

+fn try_for_each_valid<F: FnMut(usize) -> Result<()>>(


This internal iteration style optimises much better than the external style when there are multiple variants. I may create a HybridBitIndexIterator that special-cases for_each and try_fold 🤔
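
For readers unfamiliar with the distinction, a rough sketch of the internal iteration style follows. It assumes a validity mask stored as `u64` words with bit 0 of word 0 as index 0. `try_for_each_set_bit` is a hypothetical helper, not the `try_for_each_valid` added in this PR.

```rust
/// Hypothetical helper: calls `f` with the index of every set bit in `mask`
/// that is below `len`, in ascending order, stopping at the first error.
/// The loop lives inside the helper (internal iteration), so the caller
/// passes a closure instead of driving an iterator with `next()`.
fn try_for_each_set_bit<E>(
    mask: &[u64],
    len: usize,
    mut f: impl FnMut(usize) -> Result<(), E>,
) -> Result<(), E> {
    for (word_idx, &word) in mask.iter().enumerate() {
        let mut bits = word;
        while bits != 0 {
            let idx = word_idx * 64 + bits.trailing_zeros() as usize;
            if idx >= len {
                // Indices only increase, so nothing further can be in range.
                return Ok(());
            }
            f(idx)?;
            // Clear the lowest set bit and continue with the next one.
            bits &= bits - 1;
        }
    }
    Ok(())
}
```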

liukun4515 · 2022-09-06T14:17:00Z

arrow/src/compute/kernels/arity.rs

@@ -198,6 +259,80 @@ where
    }
 }

+pub fn binary<A, B, F, O>(


liukun4515 · 2022-09-06T14:19:18Z

arrow/src/compute/kernels/arity.rs

 #[inline]
-fn into_primitive_array_data<I: ArrowPrimitiveType, O: ArrowPrimitiveType>(
-    array: &PrimitiveArray<I>,
+unsafe fn build_primitive_array<O: ArrowPrimitiveType>(


the new name makes more sense than before

liukun4515 · 2022-09-06T14:36:42Z

arrow/src/compute/kernels/arithmetic.rs

-            }
+    try_binary(left, right, |a, b| {
+        op(a, b).ok_or_else(|| {
+            ArrowError::ComputeError(format!("Overflow happened on: {:?}, {:?}", a, b))


Can we point out the op which caused the overflow?
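
One possible shape for this, as a sketch only: thread an operator name through to the error message. `KernelError` and `checked_with_op_name` are hypothetical stand-ins, not the arrow error type or a function from this PR.

```rust
/// Hypothetical error type standing in for the kernel's compute error.
#[derive(Debug, PartialEq)]
enum KernelError {
    Compute(String),
}

/// Applies a checked operation and, on overflow, reports which operator failed
/// together with both operands.
fn checked_with_op_name(
    op_name: &str,
    a: i32,
    b: i32,
    op: impl Fn(i32, i32) -> Option<i32>,
) -> Result<i32, KernelError> {
    op(a, b).ok_or_else(|| {
        KernelError::Compute(format!(
            "Overflow happened on: {:?} {} {:?}",
            a, op_name, b
        ))
    })
}
```

Because the message is built inside `ok_or_else`, the formatting cost is only paid on the error path.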

viirya · 2022-09-06T16:58:29Z

arrow/src/array/iterator.rs

@@ -24,8 +24,51 @@ use super::{
    PrimitiveArray,
 };

-/// an iterator that returns Some(T) or None, that can be used on any [`ArrayAccessor`]
-// Note: This implementation is based on std's [Vec]s' [IntoIter].
+/// An iterator that returns Some(T) or None, that can be used on any [`ArrayAccessor`]


Good explanation for this API. 👍

viirya · 2022-09-06T17:37:04Z

arrow/src/compute/kernels/arity.rs

+    let null_count = array.null_count();
+
+    let mut buffer = BufferBuilder::<O::Native>::new(len);
+    buffer.append_n_zeroed(array.len());


Hmm, do we need zero-initialized values for the buffer?

It is UB if we don't initialize all values in the buffer, even the null slots, so we must zero out the nulls. In the past I have found it is faster to zero-initialize everything and then overwrite the valid indexes than to interleave appending nulls and values.
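
A minimal sketch of the "zero everything, then overwrite the valid slots" approach described in this reply, using plain `Vec`s rather than arrow buffers. `unary_zeroed` is a hypothetical name.

```rust
/// Applies `op` to every valid slot. Null slots are left at zero, so every
/// element of the output is initialized even though `op` never runs on nulls.
fn unary_zeroed(values: &[i32], validity: &[bool], op: impl Fn(i32) -> i32) -> Vec<i32> {
    assert_eq!(values.len(), validity.len());
    // One bulk zero-fill up front instead of interleaving null and value appends.
    let mut out = vec![0; values.len()];
    for (i, valid) in validity.iter().enumerate() {
        if *valid {
            out[i] = op(values[i]);
        }
    }
    out
}
```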

viirya · 2022-09-06T22:30:19Z

arrow/src/compute/kernels/arity.rs

+        if selectivity > 0.8 {
+            BitSliceIterator::new(nulls.unwrap(), 0, len)
+                .flat_map(|(start, end)| start..end)
+                .try_for_each(f)
+        } else {
+            BitIndexIterator::new(nulls.unwrap(), 0, len).try_for_each(f)


Am I reading this incorrectly? I think higher selectivity here means more null values in the array? So I guess BitIndexIterator is more performant?

Oops, I think you're right. I copied this from the filter kernel and forgot to reverse this.
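
To make the intended direction of the check concrete, here is a sketch that assumes the threshold is applied to the fraction of valid slots: mostly-valid input is walked as contiguous runs, sparse input as individual indices. The helper names and the returned strategy label are hypothetical and exist only for illustration.

```rust
/// Contiguous half-open runs `(start, end)` of valid slots.
fn valid_runs(validity: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &v) in validity.iter().enumerate() {
        match (v, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                runs.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        runs.push((s, validity.len()));
    }
    runs
}

/// Calls `f` for each valid index and returns the name of the strategy used.
fn for_each_valid(validity: &[bool], mut f: impl FnMut(usize)) -> &'static str {
    let valid = validity.iter().filter(|v| **v).count();
    let valid_fraction = valid as f64 / validity.len().max(1) as f64;
    if valid_fraction > 0.8 {
        // Mostly valid: few long runs, so iterate ranges.
        for (start, end) in valid_runs(validity) {
            (start..end).for_each(&mut f);
        }
        "runs"
    } else {
        // Many nulls: visit the valid indices one at a time.
        validity
            .iter()
            .enumerate()
            .filter(|(_, v)| **v)
            .for_each(|(i, _)| f(i));
        "indices"
    }
}
```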

HaoYang670 · 2022-09-07T05:03:08Z

arrow/src/compute/kernels/arity.rs

@@ -198,6 +259,80 @@ where
    }
 }

+pub fn binary<A, B, F, O>(


I guess we need to add docs for binary and try_binary.

HaoYang670 · 2022-09-07T05:11:03Z

arrow/src/compute/kernels/arity.rs

+        return Ok(PrimitiveArray::from(ArrayData::new_empty(&O::DATA_TYPE)));
+    }
+
+    let null_buffer = combine_option_bitmap(&[a.data(), b.data()], len).unwrap();


Suggested change

-    let null_buffer = combine_option_bitmap(&[a.data(), b.data()], len).unwrap();
+    let null_buffer = combine_option_bitmap(&[a.data(), b.data()], len)?;
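
The difference the suggestion makes, sketched with a hypothetical stand-in: `combine_validity` below is not `combine_option_bitmap`, it only mimics the "fallible, optional mask" shape. With `?` the error reaches the caller as an `Err`, where `.unwrap()` would panic.

```rust
/// Hypothetical stand-in for a null-mask combiner: ANDs two optional validity
/// masks and fails if their lengths disagree.
fn combine_validity(
    a: Option<&[bool]>,
    b: Option<&[bool]>,
) -> Result<Option<Vec<bool>>, String> {
    match (a, b) {
        (None, None) => Ok(None),
        (Some(m), None) | (None, Some(m)) => Ok(Some(m.to_vec())),
        (Some(x), Some(y)) => {
            if x.len() != y.len() {
                return Err(format!("length mismatch: {} vs {}", x.len(), y.len()));
            }
            Ok(Some(x.iter().zip(y).map(|(l, r)| *l && *r).collect()))
        }
    }
}

/// Counts the null slots in the combined mask.
fn null_count(a: Option<&[bool]>, b: Option<&[bool]>) -> Result<usize, String> {
    // `?` forwards the error to the caller instead of panicking.
    let combined = combine_validity(a, b)?;
    Ok(combined.map_or(0, |m| m.iter().filter(|v| !**v).count()))
}
```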

HaoYang670 · 2022-09-07T05:16:56Z

arrow/src/compute/kernels/arithmetic.rs

    op: F,
-) -> Result<PrimitiveArray<T>>
+) -> Result<PrimitiveArray<LT>>


Why is the return type the same as the left type?

In this style, the types of left and right can be different.
Do we need to support this?

I believe it is required to allow adding a duration to a time.
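
As a sketch of why distinct input types are useful, here is a slice-based analogue of a binary kernel. This `binary` is a simplified, hypothetical version over plain slices, not the function from this PR, and it ignores null masks. It adds second-granularity durations to millisecond timestamps and returns the left-hand type.

```rust
/// Element-wise binary operation where the two inputs and the output may all
/// have different element types. Fails if the input lengths differ.
fn binary<A: Copy, B: Copy, O>(
    a: &[A],
    b: &[B],
    op: impl Fn(A, B) -> O,
) -> Result<Vec<O>, String> {
    if a.len() != b.len() {
        return Err(format!(
            "Cannot perform binary operation on arrays of different length: {} vs {}",
            a.len(),
            b.len()
        ));
    }
    Ok(a.iter().zip(b).map(|(l, r)| op(*l, *r)).collect())
}

/// Adds durations (whole seconds, `i32`) to timestamps (milliseconds, `i64`).
/// The result keeps the left-hand type.
fn add_durations(times_ms: &[i64], durations_s: &[i32]) -> Result<Vec<i64>, String> {
    binary(times_ms, durations_s, |t, d| t + i64::from(d) * 1_000)
}
```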

tustvold · 2022-09-09T10:28:49Z

Final benchmarks

add(0)                  time:   [11.227 us 11.246 us 11.266 us]                    
                        change: [+16.981% +17.118% +17.246%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

add_checked(0)          time:   [65.867 us 65.890 us 65.916 us]                           
                        change: [-92.090% -92.078% -92.060%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

add_scalar(0)           time:   [5.9757 us 5.9783 us 5.9812 us]                           
                        change: [-5.9046% -5.7997% -5.6941%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

subtract(0)             time:   [10.012 us 10.026 us 10.040 us]                         
                        change: [+2.7087% +2.8497% +2.9788%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

subtract_checked(0)     time:   [66.243 us 66.253 us 66.265 us]                                
                        change: [-92.047% -92.030% -92.010%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
  2 (2.00%) high severe

subtract_scalar(0)      time:   [6.0194 us 6.0281 us 6.0367 us]                                
                        change: [-5.6497% -5.4648% -5.2827%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

multiply(0)             time:   [9.0082 us 9.0199 us 9.0313 us]                         
                        change: [-7.6503% -7.4907% -7.3226%] (p = 0.00 < 0.05)
                        Performance has improved.

multiply_checked(0)     time:   [65.879 us 65.896 us 65.916 us]                                
                        change: [-92.134% -92.122% -92.109%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe

multiply_scalar(0)      time:   [5.9307 us 5.9356 us 5.9408 us]                                
                        change: [-6.8873% -6.7886% -6.6976%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

divide(0)               time:   [11.193 us 11.194 us 11.196 us]                       
                        change: [-0.2047% +0.0109% +0.1345%] (p = 0.92 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

divide_checked(0)       time:   [92.997 us 93.031 us 93.067 us]                              
                        change: [-88.829% -88.810% -88.769%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

divide_scalar(0)        time:   [11.121 us 11.122 us 11.124 us]                              
                        change: [-6.5334% -5.2295% -3.9311%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

modulo(0)               time:   [285.83 us 285.93 us 286.04 us]                      
                        change: [-38.958% -38.888% -38.775%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

Benchmarking modulo_scalar(0): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
modulo_scalar(0)        time:   [1.2745 ms 1.2751 ms 1.2759 ms]                              
                        change: [+0.5454% +0.6770% +0.8583%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

add(0.1)                time:   [10.884 us 10.943 us 11.019 us]                      
                        change: [-6.5805% -5.7599% -4.9249%] (p = 0.00 < 0.05)
                        Performance has improved.

add_checked(0.1)        time:   [152.23 us 152.30 us 152.38 us]                             
                        change: [-84.601% -84.578% -84.562%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

add_scalar(0.1)         time:   [6.3819 us 6.3834 us 6.3850 us]                             
                        change: [+7.1879% +7.2832% +7.3988%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

subtract(0.1)           time:   [10.841 us 10.850 us 10.858 us]                           
                        change: [-2.3402% -1.5446% -0.7289%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

subtract_checked(0.1)   time:   [150.19 us 150.25 us 150.32 us]                                  
                        change: [-84.988% -84.942% -84.915%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

subtract_scalar(0.1)    time:   [6.3538 us 6.3562 us 6.3591 us]                                  
                        change: [+7.4412% +7.5009% +7.5576%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

multiply(0.1)           time:   [11.025 us 11.042 us 11.062 us]                           
                        change: [-3.1599% -2.5328% -1.8726%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

multiply_checked(0.1)   time:   [150.59 us 150.61 us 150.63 us]                                  
                        change: [-84.775% -84.766% -84.758%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

multiply_scalar(0.1)    time:   [6.3859 us 6.3892 us 6.3931 us]                                  
                        change: [+7.7047% +7.7979% +7.8903%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

divide(0.1)             time:   [11.957 us 11.965 us 11.974 us]                         
                        change: [-0.6239% -0.3466% -0.0980%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) high mild
  4 (4.00%) high severe

divide_checked(0.1)     time:   [214.63 us 217.64 us 221.37 us]                                
                        change: [-77.104% -76.827% -76.523%] (p = 0.00 < 0.05)
                        Performance has improved.

divide_scalar(0.1)      time:   [11.122 us 11.125 us 11.128 us]                                
                        change: [-0.2726% +0.0005% +0.2795%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  8 (8.00%) high severe

modulo(0.1)             time:   [340.24 us 340.60 us 340.93 us]                        
                        change: [-47.233% -47.195% -47.154%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  10 (10.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

Benchmarking modulo_scalar(0.1): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
modulo_scalar(0.1)      time:   [1.2175 ms 1.2177 ms 1.2180 ms]                                
                        change: [-0.1447% +0.0582% +0.1750%] (p = 0.66 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

add(0.5)                time:   [9.7497 us 9.7623 us 9.7760 us]                      
                        change: [-15.134% -14.995% -14.848%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

add_checked(0.5)        time:   [88.050 us 88.075 us 88.104 us]                             
                        change: [-93.946% -93.935% -93.929%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

add_scalar(0.5)         time:   [5.9287 us 5.9329 us 5.9380 us]                             
                        change: [-8.3559% -8.1373% -7.9945%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

subtract(0.5)           time:   [9.8349 us 9.8428 us 9.8503 us]                           
                        change: [-17.554% -17.021% -16.534%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

subtract_checked(0.5)   time:   [86.171 us 86.188 us 86.208 us]                                  
                        change: [-94.066% -94.056% -94.050%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

subtract_scalar(0.5)    time:   [5.9195 us 5.9225 us 5.9269 us]                                  
                        change: [-8.2995% -8.2140% -8.1189%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

multiply(0.5)           time:   [9.6635 us 9.6777 us 9.6943 us]                           
                        change: [-15.146% -14.954% -14.764%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

multiply_checked(0.5)   time:   [85.098 us 85.126 us 85.159 us]                                  
                        change: [-94.132% -94.118% -94.102%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

multiply_scalar(0.5)    time:   [5.9200 us 5.9237 us 5.9278 us]                                  
                        change: [-8.2128% -8.1431% -8.0698%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

divide(0.5)             time:   [12.046 us 12.049 us 12.051 us]                         
                        change: [-0.4953% -0.2927% -0.1787%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

divide_checked(0.5)     time:   [91.931 us 91.972 us 92.018 us]                                
                        change: [-93.638% -93.635% -93.632%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

divide_scalar(0.5)      time:   [11.355 us 11.463 us 11.604 us]                                
                        change: [+6.9121% +8.4076% +9.9229%] (p = 0.00 < 0.05)
                        Performance has regressed.

modulo(0.5)             time:   [207.41 us 207.46 us 207.52 us]                        
                        change: [-72.652% -72.601% -72.572%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

modulo_scalar(0.5)      time:   [915.27 us 915.38 us 915.50 us]                               
                        change: [+0.1323% +0.1758% +0.2137%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  8 (8.00%) high severe

add(0.9)                time:   [11.666 us 11.671 us 11.676 us]                      
                        change: [-3.3882% -3.2702% -3.1398%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

add_checked(0.9)        time:   [27.665 us 27.675 us 27.686 us]                              
                        change: [-97.033% -97.032% -97.031%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

add_scalar(0.9)         time:   [6.3798 us 6.3827 us 6.3867 us]                             
                        change: [+7.7749% +7.8943% +8.0098%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe

subtract(0.9)           time:   [11.760 us 11.765 us 11.770 us]                           
                        change: [+13.484% +13.620% +13.752%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

subtract_checked(0.9)   time:   [26.822 us 26.834 us 26.848 us]                                   
                        change: [-97.131% -97.127% -97.124%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

subtract_scalar(0.9)    time:   [6.3898 us 6.3927 us 6.3960 us]                                  
                        change: [+7.9864% +8.0589% +8.1266%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

multiply(0.9)           time:   [11.780 us 11.792 us 11.804 us]                           
                        change: [+14.224% +14.359% +14.501%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low mild
  4 (4.00%) high mild

multiply_checked(0.9)   time:   [28.264 us 28.275 us 28.288 us]                                   
                        change: [-96.974% -96.968% -96.959%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

multiply_scalar(0.9)    time:   [6.3647 us 6.3662 us 6.3681 us]                                  
                        change: [+7.3504% +7.4791% +7.6189%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

divide(0.9)             time:   [11.957 us 11.960 us 11.964 us]                         
                        change: [-2.6038% -2.5495% -2.4895%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

divide_checked(0.9)     time:   [30.427 us 30.438 us 30.450 us]                                 
                        change: [-96.737% -96.733% -96.726%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

divide_scalar(0.9)      time:   [11.153 us 11.155 us 11.159 us]                                
                        change: [+0.0128% +0.2851% +0.5665%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

modulo(0.9)             time:   [54.611 us 54.641 us 54.675 us]                        
                        change: [-85.530% -85.520% -85.510%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  6 (6.00%) high mild
  9 (9.00%) high severe

modulo_scalar(0.9)      time:   [347.32 us 347.43 us 347.58 us]                               
                        change: [-0.4787% -0.4314% -0.3671%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

add(1)                  time:   [9.9205 us 9.9267 us 9.9329 us]                    
                        change: [-7.5383% -7.4679% -7.3991%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild

add_checked(1)          time:   [4.6835 us 4.7599 us 4.8586 us]                            
                        change: [-99.416% -99.408% -99.398%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe

add_scalar(1)           time:   [6.4029 us 6.4052 us 6.4078 us]                           
                        change: [+1.0775% +1.1667% +1.2739%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

subtract(1)             time:   [9.9767 us 9.9877 us 9.9982 us]                         
                        change: [-11.592% -11.492% -11.395%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

subtract_checked(1)     time:   [4.7278 us 4.8780 us 5.0544 us]                                  
                        change: [-99.434% -99.427% -99.419%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

subtract_scalar(1)      time:   [6.4270 us 6.4293 us 6.4321 us]                                
                        change: [+1.1474% +1.2054% +1.2601%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

multiply(1)             time:   [10.013 us 10.031 us 10.051 us]                         
                        change: [-8.6851% -8.4095% -8.1241%] (p = 0.00 < 0.05)
                        Performance has improved.

multiply_checked(1)     time:   [4.7410 us 4.8338 us 4.9462 us]                                 
                        change: [-99.411% -99.402% -99.392%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) high mild
  11 (11.00%) high severe

multiply_scalar(1)      time:   [6.4224 us 6.4264 us 6.4310 us]                                
                        change: [+1.2780% +1.3487% +1.4176%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild
  6 (6.00%) high severe

divide(1)               time:   [12.043 us 12.048 us 12.053 us]                       
                        change: [-0.2281% -0.1755% -0.1296%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

divide_checked(1)       time:   [4.7782 us 4.8815 us 5.0043 us]                               
                        change: [-99.405% -99.394% -99.381%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe

divide_scalar(1)        time:   [11.283 us 11.356 us 11.454 us]                              
                        change: [+6.4576% +7.8297% +9.3776%] (p = 0.00 < 0.05)
                        Performance has regressed.

modulo(1)               time:   [4.7719 us 4.9086 us 5.1167 us]                       
                        change: [-98.369% -98.349% -98.324%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

modulo_scalar(1)        time:   [217.53 us 217.57 us 217.61 us]                             
                        change: [-2.3072% -2.1226% -2.0109%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

So anything from 50% to 99% faster for the checked kernels 🎉. The unchecked kernels regularly vary by ±10%, but we're talking single-digit microseconds, so this isn't all that surprising.
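
For reading the `change:` lines as speedups, a small conversion sketch. `speedup_factor` is a hypothetical helper. It assumes the reported percentage is the relative change in run time, so -50% means half the time.

```rust
/// Converts a relative change in run time (in percent, negative = faster)
/// into a speedup factor: old_time / new_time.
fn speedup_factor(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}
```

Under that assumption a -50% change is a 2x speedup, a -99% change is roughly 100x, and any positive change gives a factor below 1.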

ursabot · 2022-09-11T06:41:24Z

Benchmark runs are scheduled for baseline = d88ed6a and contender = 2d28010. 2d28010 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

tustvold force-pushed the add-try-unary branch from 0815487 to 882a655 Compare September 6, 2022 13:22

tustvold mentioned this pull request Sep 6, 2022

Overflow-checking variant of arithmetic scalar kernels #2650

Merged

github-actions bot added the arrow Changes to the arrow crate label Sep 6, 2022

liukun4515 reviewed Sep 6, 2022

arrow/src/compute/kernels/arity.rs

@@ -198,6 +259,80 @@ where

}

}

pub fn binary<A, B, F, O>(

👍

liukun4515 reviewed Sep 6, 2022

viirya reviewed Sep 6, 2022

HaoYang670 reviewed Sep 7, 2022

tustvold force-pushed the add-try-unary branch from 90e12b1 to fb9ae62 Compare September 9, 2022 09:47

Add try_unary, binary, try_binary kernels

f05a3e6

tustvold force-pushed the add-try-unary branch from fdd3040 to f05a3e6 Compare September 9, 2022 10:29

tustvold marked this pull request as ready for review September 9, 2022 10:30

tustvold requested review from viirya, HaoYang670 and liukun4515 September 9, 2022 14:03

viirya approved these changes Sep 11, 2022

tustvold merged commit 2d28010 into apache:master Sep 11, 2022

alamb mentioned this pull request Sep 15, 2022

Upgrade to arrow 23.0.0 apache/datafusion#3483

Merged
