Add try_unary, binary, try_binary kernels ~90% faster #2666

Merged
merged 1 commit into from
Sep 11, 2022

@tustvold tustvold commented Sep 6, 2022

Which issue does this PR close?

Closes #.

Rationale for this change

ArrayIter is great from an ergonomics perspective and definitely something we should continue to provide to users. However, the way it interleaves null mask handling with value access makes for pretty poor performance. For primitive arrays, this can result in orders of magnitude more time spent handling the null mask than the values themselves.
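As a rough illustration of the difference (a std-only sketch, not the arrow-rs implementation; the slices of `bool` standing in for null masks and both function names are hypothetical), compare a loop that checks validity on every element with one that processes the values unconditionally and combines the masks in a separate pass:

```rust
/// Per-element style: every slot is an `Option`, so the null check is
/// interleaved with the arithmetic on each iteration.
fn add_interleaved(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(x.wrapping_add(*y)),
            _ => None,
        })
        .collect()
}

/// Columnar style: values and validity live in separate slices. The value
/// loop has no null checks, and the masks are combined in their own pass.
fn add_columnar(
    a_values: &[i32],
    a_valid: &[bool],
    b_values: &[i32],
    b_valid: &[bool],
) -> (Vec<i32>, Vec<bool>) {
    let values = a_values
        .iter()
        .zip(b_values.iter())
        .map(|(x, y)| x.wrapping_add(*y))
        .collect();
    let valid = a_valid
        .iter()
        .zip(b_valid.iter())
        .map(|(x, y)| *x && *y)
        .collect();
    (values, valid)
}
```

In the columnar version the slots that end up null still hold a computed value; it is simply ignored because the combined mask marks them invalid.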

divide                  time:   [75.202 us 75.276 us 75.359 us]                   
                        change: [-89.655% -89.644% -89.633%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

divide_nulls            time:   [92.458 us 92.489 us 92.522 us]                         
                        change: [-93.765% -93.754% -93.746%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe


modulo_nulls            time:   [218.79 us 218.85 us 218.92 us]                         
                        change: [-72.606% -72.571% -72.509%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

What changes are included in this PR?

Are there any user-facing changes?

FYI @viirya @liukun4515

use std::sync::Arc;

fn try_for_each_valid<F: FnMut(usize) -> Result<()>>(
Contributor Author

This internal iteration style optimises much better than the external style when there are multiple variants. I may create a HybridBitIndexIterator that special-cases for_each and try_fold 🤔
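To make the internal versus external distinction concrete, here is a small std-only sketch. The `Indices` enum and the `for_each_internal` method are hypothetical stand-ins, not the proposed HybridBitIndexIterator: with external iteration the variant is matched on every `next` call, while the internal form matches once and then runs a loop over a single concrete iterator.

```rust
/// Two ways of producing indices, standing in for "multiple variants".
enum Indices {
    All(std::ops::Range<usize>),
    Listed(std::vec::IntoIter<usize>),
}

/// External iteration: the caller pulls one item at a time, so the variant
/// is re-matched on every call to `next`.
impl Iterator for Indices {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        match self {
            Indices::All(r) => r.next(),
            Indices::Listed(v) => v.next(),
        }
    }
}

impl Indices {
    /// Internal iteration: the variant is matched once, then each arm runs
    /// a loop over one concrete iterator type.
    fn for_each_internal<F: FnMut(usize)>(self, f: F) {
        match self {
            Indices::All(r) => r.for_each(f),
            Indices::Listed(v) => v.for_each(f),
        }
    }
}

/// Sums indices by pulling them through `next` (external style).
fn sum_external(indices: Indices) -> usize {
    let mut total = 0;
    for i in indices {
        total += i;
    }
    total
}

/// Sums indices by handing a closure to the iterator (internal style).
fn sum_internal(indices: Indices) -> usize {
    let mut total = 0;
    indices.for_each_internal(|i| total += i);
    total
}
```

Both functions return the same result; the difference is only in where the `match` sits relative to the loop, which is what affects how well the loop body can be optimised.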

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 6, 2022
@@ -198,6 +259,80 @@ where
}
}

pub fn binary<A, B, F, O>(
Contributor

👍

#[inline]
fn into_primitive_array_data<I: ArrowPrimitiveType, O: ArrowPrimitiveType>(
array: &PrimitiveArray<I>,
unsafe fn build_primitive_array<O: ArrowPrimitiveType>(
Contributor

The new name makes more sense than the old one.

}
try_binary(left, right, |a, b| {
op(a, b).ok_or_else(|| {
ArrowError::ComputeError(format!("Overflow happened on: {:?}, {:?}", a, b))
Contributor

Can we point out the op which caused the overflow?
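One possible shape for this, sketched with std only: the helper `checked_with_name`, its `op_name` parameter, and the `String` error type are all assumptions for illustration (the PR itself uses `ArrowError::ComputeError`).

```rust
/// Hypothetical helper: runs a checked operation and, on overflow, reports
/// which operation failed along with its operands.
fn checked_with_name(
    op_name: &str,
    op: impl Fn(i32, i32) -> Option<i32>,
    a: i32,
    b: i32,
) -> Result<i32, String> {
    op(a, b).ok_or_else(|| format!("Overflow happened on: {:?} {} {:?}", a, op_name, b))
}
```

For example, `checked_with_name("*", i32::checked_mul, i32::MAX, 2)` returns an error whose message names the `*` operation and both operands.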

@@ -24,8 +24,51 @@ use super::{
PrimitiveArray,
};

/// an iterator that returns Some(T) or None, that can be used on any [`ArrayAccessor`]
// Note: This implementation is based on std's [Vec]s' [IntoIter].
/// An iterator that returns Some(T) or None, that can be used on any [`ArrayAccessor`]
Member

Good explanation for this API. 👍

let null_count = array.null_count();

let mut buffer = BufferBuilder::<O::Native>::new(len);
buffer.append_n_zeroed(array.len());
Member

Hmm, do we need zero-initialized values for the buffer?

Contributor Author

It is UB if we don't initialize all values in the buffer, even the null slots, so we must zero out the nulls. In the past I have found it is faster to zero-initialize everything and then overwrite the valid indexes than to interleave appending nulls and values.
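A minimal sketch of that pattern, using a plain `Vec` and a `&[bool]` validity slice instead of the arrow buffer types (the function name `unary_zeroed` is hypothetical):

```rust
/// Builds the output of a unary operation. `valid[i]` marks non-null slots.
/// Every slot is zero-initialized first, then only valid slots are
/// overwritten, so null slots hold a defined value (0) rather than
/// uninitialized memory.
fn unary_zeroed(values: &[i32], valid: &[bool], op: impl Fn(i32) -> i32) -> Vec<i32> {
    assert_eq!(values.len(), valid.len());
    let mut out = vec![0i32; values.len()];
    for (i, is_valid) in valid.iter().enumerate() {
        if *is_valid {
            out[i] = op(values[i]);
        }
    }
    out
}
```

Note that `op` is never called on a null slot, which matters for fallible or checked operations where the value behind a null could otherwise trigger a spurious error.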

Comment on lines 43 to 48
if selectivity > 0.8 {
BitSliceIterator::new(nulls.unwrap(), 0, len)
.flat_map(|(start, end)| start..end)
.try_for_each(f)
} else {
BitIndexIterator::new(nulls.unwrap(), 0, len).try_for_each(f)
Member

Am I reading this incorrectly? I think higher selectivity here means more null values in the array, so I guess BitIndexIterator would be more performant in that case?

Contributor Author

Oops, I think you're right. I copied this from the filter kernel and forgot to reverse it.
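A std-only sketch of the intended dispatch, under the assumption that selectivity should be the fraction of valid (non-null) slots. The `valid_runs` helper and the `&[bool]` mask are hypothetical stand-ins for `BitSliceIterator` and the packed null bitmap; only the `0.8` threshold comes from the snippet above.

```rust
/// Collects maximal runs `[start, end)` of valid slots.
fn valid_runs(valid: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start = None;
    for (i, v) in valid.iter().enumerate() {
        match (*v, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                runs.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        runs.push((s, valid.len()));
    }
    runs
}

/// Calls `f` for each valid index, stopping at the first error.
/// Assumption: "selectivity" is the fraction of valid (non-null) slots.
fn try_for_each_valid<E>(
    valid: &[bool],
    f: impl FnMut(usize) -> Result<(), E>,
) -> Result<(), E> {
    if valid.is_empty() {
        return Ok(());
    }
    let valid_count = valid.iter().filter(|v| **v).count();
    let selectivity = valid_count as f64 / valid.len() as f64;
    if selectivity > 0.8 {
        // Mostly valid: walk contiguous runs of valid slots.
        valid_runs(valid)
            .into_iter()
            .flat_map(|(start, end)| start..end)
            .try_for_each(f)
    } else {
        // Many nulls: visit the valid positions one at a time.
        valid
            .iter()
            .enumerate()
            .filter(|(_, v)| **v)
            .map(|(i, _)| i)
            .try_for_each(f)
    }
}
```

Both branches visit the same indices in the same order, so choosing the wrong one only costs performance, which is why a reversed condition is easy to miss.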

@@ -198,6 +259,80 @@ where
}
}

pub fn binary<A, B, F, O>(
Contributor

I guess we need to add docs for binary and try_binary.

return Ok(PrimitiveArray::from(ArrayData::new_empty(&O::DATA_TYPE)));
}

let null_buffer = combine_option_bitmap(&[a.data(), b.data()], len).unwrap();
Contributor

Suggested change
let null_buffer = combine_option_bitmap(&[a.data(), b.data()], len).unwrap();
let null_buffer = combine_option_bitmap(&[a.data(), b.data()], len)?;
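The effect of the suggestion can be sketched with std types. Here `combine_validity` is a hypothetical stand-in for `combine_option_bitmap`, using `&[bool]` masks and a `String` error, and the length check is an invented failure case for illustration: with `?` a failure is returned to the caller instead of panicking.

```rust
/// Hypothetical stand-in for combining two optional validity masks.
/// `None` means "no nulls"; a length mismatch is reported as an error.
fn combine_validity(
    a: Option<&[bool]>,
    b: Option<&[bool]>,
    len: usize,
) -> Result<Option<Vec<bool>>, String> {
    for mask in [a, b].into_iter().flatten() {
        if mask.len() != len {
            return Err(format!("mask length {} does not match {}", mask.len(), len));
        }
    }
    Ok(match (a, b) {
        (None, None) => None,
        (Some(m), None) | (None, Some(m)) => Some(m.to_vec()),
        (Some(x), Some(y)) => Some(x.iter().zip(y).map(|(p, q)| *p && *q).collect()),
    })
}

/// With `?`, a failure becomes the caller's `Err` instead of a panic.
fn null_count(a: Option<&[bool]>, b: Option<&[bool]>, len: usize) -> Result<usize, String> {
    let combined = combine_validity(a, b, len)?;
    Ok(combined.map_or(0, |m| m.iter().filter(|v| !**v).count()))
}
```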

op: F,
) -> Result<PrimitiveArray<T>>
) -> Result<PrimitiveArray<LT>>
Contributor

Why is the return type the same as the left type?

Contributor

In this style, the left and right types can be different.
Do we need to support this?

Contributor Author

I believe it is required to allow adding a duration to a time.
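A std-only sketch of why distinct left and right types are useful. The newtypes `TimeSeconds` and `DurationSeconds` and the function `try_binary_left` are hypothetical, not arrow-rs types; the point is only that the right operand has a different type while the result keeps the left type.

```rust
/// Hypothetical generic kernel: left and right element types may differ,
/// and the output uses the left type. Assumes both slices have equal length.
fn try_binary_left<L: Copy, R: Copy, E>(
    left: &[L],
    right: &[R],
    op: impl Fn(L, R) -> Result<L, E>,
) -> Result<Vec<L>, E> {
    left.iter()
        .zip(right.iter())
        .map(|(l, r)| op(*l, *r))
        .collect()
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct TimeSeconds(i64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct DurationSeconds(i64);

/// Time + duration yields a time, and fails on overflow.
fn add_duration(t: TimeSeconds, d: DurationSeconds) -> Result<TimeSeconds, String> {
    t.0.checked_add(d.0)
        .map(TimeSeconds)
        .ok_or_else(|| format!("Overflow happened on: {:?}, {:?}", t, d))
}
```

Calling `try_binary_left(&times, &durations, add_duration)` then type-checks with `L = TimeSeconds` and `R = DurationSeconds`, which a signature forcing both inputs to share one type would reject.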

@tustvold
Contributor Author

tustvold commented Sep 9, 2022

Final benchmarks
add(0)                  time:   [11.227 us 11.246 us 11.266 us]                    
                        change: [+16.981% +17.118% +17.246%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

add_checked(0)          time:   [65.867 us 65.890 us 65.916 us]                           
                        change: [-92.090% -92.078% -92.060%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

add_scalar(0)           time:   [5.9757 us 5.9783 us 5.9812 us]                           
                        change: [-5.9046% -5.7997% -5.6941%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

subtract(0)             time:   [10.012 us 10.026 us 10.040 us]                         
                        change: [+2.7087% +2.8497% +2.9788%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

subtract_checked(0)     time:   [66.243 us 66.253 us 66.265 us]                                
                        change: [-92.047% -92.030% -92.010%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
  2 (2.00%) high severe

subtract_scalar(0)      time:   [6.0194 us 6.0281 us 6.0367 us]                                
                        change: [-5.6497% -5.4648% -5.2827%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

multiply(0)             time:   [9.0082 us 9.0199 us 9.0313 us]                         
                        change: [-7.6503% -7.4907% -7.3226%] (p = 0.00 < 0.05)
                        Performance has improved.

multiply_checked(0)     time:   [65.879 us 65.896 us 65.916 us]                                
                        change: [-92.134% -92.122% -92.109%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe

multiply_scalar(0)      time:   [5.9307 us 5.9356 us 5.9408 us]                                
                        change: [-6.8873% -6.7886% -6.6976%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

divide(0)               time:   [11.193 us 11.194 us 11.196 us]                       
                        change: [-0.2047% +0.0109% +0.1345%] (p = 0.92 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

divide_checked(0)       time:   [92.997 us 93.031 us 93.067 us]                              
                        change: [-88.829% -88.810% -88.769%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

divide_scalar(0)        time:   [11.121 us 11.122 us 11.124 us]                              
                        change: [-6.5334% -5.2295% -3.9311%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

modulo(0)               time:   [285.83 us 285.93 us 286.04 us]                      
                        change: [-38.958% -38.888% -38.775%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

Benchmarking modulo_scalar(0): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
modulo_scalar(0)        time:   [1.2745 ms 1.2751 ms 1.2759 ms]                              
                        change: [+0.5454% +0.6770% +0.8583%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

add(0.1)                time:   [10.884 us 10.943 us 11.019 us]                      
                        change: [-6.5805% -5.7599% -4.9249%] (p = 0.00 < 0.05)
                        Performance has improved.

add_checked(0.1)        time:   [152.23 us 152.30 us 152.38 us]                             
                        change: [-84.601% -84.578% -84.562%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

add_scalar(0.1)         time:   [6.3819 us 6.3834 us 6.3850 us]                             
                        change: [+7.1879% +7.2832% +7.3988%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

subtract(0.1)           time:   [10.841 us 10.850 us 10.858 us]                           
                        change: [-2.3402% -1.5446% -0.7289%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

subtract_checked(0.1)   time:   [150.19 us 150.25 us 150.32 us]                                  
                        change: [-84.988% -84.942% -84.915%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

subtract_scalar(0.1)    time:   [6.3538 us 6.3562 us 6.3591 us]                                  
                        change: [+7.4412% +7.5009% +7.5576%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

multiply(0.1)           time:   [11.025 us 11.042 us 11.062 us]                           
                        change: [-3.1599% -2.5328% -1.8726%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

multiply_checked(0.1)   time:   [150.59 us 150.61 us 150.63 us]                                  
                        change: [-84.775% -84.766% -84.758%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

multiply_scalar(0.1)    time:   [6.3859 us 6.3892 us 6.3931 us]                                  
                        change: [+7.7047% +7.7979% +7.8903%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

divide(0.1)             time:   [11.957 us 11.965 us 11.974 us]                         
                        change: [-0.6239% -0.3466% -0.0980%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) high mild
  4 (4.00%) high severe

divide_checked(0.1)     time:   [214.63 us 217.64 us 221.37 us]                                
                        change: [-77.104% -76.827% -76.523%] (p = 0.00 < 0.05)
                        Performance has improved.

divide_scalar(0.1)      time:   [11.122 us 11.125 us 11.128 us]                                
                        change: [-0.2726% +0.0005% +0.2795%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  8 (8.00%) high severe

modulo(0.1)             time:   [340.24 us 340.60 us 340.93 us]                        
                        change: [-47.233% -47.195% -47.154%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  10 (10.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

Benchmarking modulo_scalar(0.1): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
modulo_scalar(0.1)      time:   [1.2175 ms 1.2177 ms 1.2180 ms]                                
                        change: [-0.1447% +0.0582% +0.1750%] (p = 0.66 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

add(0.5)                time:   [9.7497 us 9.7623 us 9.7760 us]                      
                        change: [-15.134% -14.995% -14.848%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

add_checked(0.5)        time:   [88.050 us 88.075 us 88.104 us]                             
                        change: [-93.946% -93.935% -93.929%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

add_scalar(0.5)         time:   [5.9287 us 5.9329 us 5.9380 us]                             
                        change: [-8.3559% -8.1373% -7.9945%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

subtract(0.5)           time:   [9.8349 us 9.8428 us 9.8503 us]                           
                        change: [-17.554% -17.021% -16.534%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

subtract_checked(0.5)   time:   [86.171 us 86.188 us 86.208 us]                                  
                        change: [-94.066% -94.056% -94.050%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

subtract_scalar(0.5)    time:   [5.9195 us 5.9225 us 5.9269 us]                                  
                        change: [-8.2995% -8.2140% -8.1189%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

multiply(0.5)           time:   [9.6635 us 9.6777 us 9.6943 us]                           
                        change: [-15.146% -14.954% -14.764%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

multiply_checked(0.5)   time:   [85.098 us 85.126 us 85.159 us]                                  
                        change: [-94.132% -94.118% -94.102%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

multiply_scalar(0.5)    time:   [5.9200 us 5.9237 us 5.9278 us]                                  
                        change: [-8.2128% -8.1431% -8.0698%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

divide(0.5)             time:   [12.046 us 12.049 us 12.051 us]                         
                        change: [-0.4953% -0.2927% -0.1787%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

divide_checked(0.5)     time:   [91.931 us 91.972 us 92.018 us]                                
                        change: [-93.638% -93.635% -93.632%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

divide_scalar(0.5)      time:   [11.355 us 11.463 us 11.604 us]                                
                        change: [+6.9121% +8.4076% +9.9229%] (p = 0.00 < 0.05)
                        Performance has regressed.

modulo(0.5)             time:   [207.41 us 207.46 us 207.52 us]                        
                        change: [-72.652% -72.601% -72.572%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

modulo_scalar(0.5)      time:   [915.27 us 915.38 us 915.50 us]                               
                        change: [+0.1323% +0.1758% +0.2137%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  8 (8.00%) high severe

add(0.9)                time:   [11.666 us 11.671 us 11.676 us]                      
                        change: [-3.3882% -3.2702% -3.1398%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

add_checked(0.9)        time:   [27.665 us 27.675 us 27.686 us]                              
                        change: [-97.033% -97.032% -97.031%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

add_scalar(0.9)         time:   [6.3798 us 6.3827 us 6.3867 us]                             
                        change: [+7.7749% +7.8943% +8.0098%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe

subtract(0.9)           time:   [11.760 us 11.765 us 11.770 us]                           
                        change: [+13.484% +13.620% +13.752%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

subtract_checked(0.9)   time:   [26.822 us 26.834 us 26.848 us]                                   
                        change: [-97.131% -97.127% -97.124%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

subtract_scalar(0.9)    time:   [6.3898 us 6.3927 us 6.3960 us]                                  
                        change: [+7.9864% +8.0589% +8.1266%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

multiply(0.9)           time:   [11.780 us 11.792 us 11.804 us]                           
                        change: [+14.224% +14.359% +14.501%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low mild
  4 (4.00%) high mild

multiply_checked(0.9)   time:   [28.264 us 28.275 us 28.288 us]                                   
                        change: [-96.974% -96.968% -96.959%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

multiply_scalar(0.9)    time:   [6.3647 us 6.3662 us 6.3681 us]                                  
                        change: [+7.3504% +7.4791% +7.6189%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

divide(0.9)             time:   [11.957 us 11.960 us 11.964 us]                         
                        change: [-2.6038% -2.5495% -2.4895%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

divide_checked(0.9)     time:   [30.427 us 30.438 us 30.450 us]                                 
                        change: [-96.737% -96.733% -96.726%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

divide_scalar(0.9)      time:   [11.153 us 11.155 us 11.159 us]                                
                        change: [+0.0128% +0.2851% +0.5665%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

modulo(0.9)             time:   [54.611 us 54.641 us 54.675 us]                        
                        change: [-85.530% -85.520% -85.510%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  6 (6.00%) high mild
  9 (9.00%) high severe

modulo_scalar(0.9)      time:   [347.32 us 347.43 us 347.58 us]                               
                        change: [-0.4787% -0.4314% -0.3671%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

add(1)                  time:   [9.9205 us 9.9267 us 9.9329 us]                    
                        change: [-7.5383% -7.4679% -7.3991%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild

add_checked(1)          time:   [4.6835 us 4.7599 us 4.8586 us]                            
                        change: [-99.416% -99.408% -99.398%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe

add_scalar(1)           time:   [6.4029 us 6.4052 us 6.4078 us]                           
                        change: [+1.0775% +1.1667% +1.2739%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

subtract(1)             time:   [9.9767 us 9.9877 us 9.9982 us]                         
                        change: [-11.592% -11.492% -11.395%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

subtract_checked(1)     time:   [4.7278 us 4.8780 us 5.0544 us]                                  
                        change: [-99.434% -99.427% -99.419%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

subtract_scalar(1)      time:   [6.4270 us 6.4293 us 6.4321 us]                                
                        change: [+1.1474% +1.2054% +1.2601%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

multiply(1)             time:   [10.013 us 10.031 us 10.051 us]                         
                        change: [-8.6851% -8.4095% -8.1241%] (p = 0.00 < 0.05)
                        Performance has improved.

multiply_checked(1)     time:   [4.7410 us 4.8338 us 4.9462 us]                                 
                        change: [-99.411% -99.402% -99.392%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) high mild
  11 (11.00%) high severe

multiply_scalar(1)      time:   [6.4224 us 6.4264 us 6.4310 us]                                
                        change: [+1.2780% +1.3487% +1.4176%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild
  6 (6.00%) high severe

divide(1)               time:   [12.043 us 12.048 us 12.053 us]                       
                        change: [-0.2281% -0.1755% -0.1296%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

divide_checked(1)       time:   [4.7782 us 4.8815 us 5.0043 us]                               
                        change: [-99.405% -99.394% -99.381%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe

divide_scalar(1)        time:   [11.283 us 11.356 us 11.454 us]                              
                        change: [+6.4576% +7.8297% +9.3776%] (p = 0.00 < 0.05)
                        Performance has regressed.

modulo(1)               time:   [4.7719 us 4.9086 us 5.1167 us]                       
                        change: [-98.369% -98.349% -98.324%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

modulo_scalar(1)        time:   [217.53 us 217.57 us 217.61 us]                             
                        change: [-2.3072% -2.1226% -2.0109%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

So anything from 50% to 99% faster for the checked kernels 🎉. The unchecked kernels regularly vary by ±10%, but we're talking single-digit microseconds, so this isn't all that surprising.
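For reading the numbers above: Criterion's `change` is the relative change in measured time, so it can be converted to a speedup factor as `1 / (1 + change / 100)`. A small sketch (the function name is hypothetical):

```rust
/// Converts a relative time change in percent (negative means faster)
/// into a speedup factor, e.g. -50.0 becomes 2.0.
fn speedup_factor(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}
```

Applied to the results above, `add_checked(0)` at -92.078% is roughly a 12.6x speedup, and `add_checked(1)` at -99.408% is roughly 169x.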

@tustvold tustvold merged commit 2d28010 into apache:master Sep 11, 2022
@ursabot

ursabot commented Sep 11, 2022

Benchmark runs are scheduled for baseline = d88ed6a and contender = 2d28010. 2d28010 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
