feat: Add ArrowBitsUnpackInt32() #278

Merged 4 commits on Aug 17, 2023

@paleolimbot paleolimbot (Member) commented Aug 17, 2023

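This is a follow-up to #276. The int32 version is useful because R uses 32-bit integers to represent boolean (i.e., logical) arrays, so the bits can be unpacked directly into the output. This results in a significant speedup in boolean conversion: in the benchmarks below, the median for converting 1e6 values drops from 749µs to 308µs for logical() and from 912µs to 310µs for integer()!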

@WillAyd: I updated a few things that you just added (Sorry! 😬 ):

  • I changed Bitmap -> Bits and removed Unsafe to make it more consistent with the other functions that accept const uint8_t* bits
  • I updated the test function so that it tests both the int32 and int8 types at once
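
For readers unfamiliar with the operation, unpacking a bitmap into 32-bit integers can be sketched as below. This is a naive illustration only, not the code added in this PR: the names `sketch_bit_get` and `sketch_bits_unpack_int32` are hypothetical, the parameter list is an assumption modelled on "functions that accept const uint8_t* bits", and the bit order assumed is the LSB-first packing used by Arrow bitmaps.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not part of nanoarrow): read bit i from a bitmap
// that is packed least-significant-bit first within each byte.
static inline int32_t sketch_bit_get(const uint8_t* bits, int64_t i) {
  return (bits[i / 8] >> (i % 8)) & 1;
}

// Hypothetical sketch: expand `length` bits, starting at bit `start_offset`,
// into one int32_t (0 or 1) per element. The caller must provide room for
// `length` values in `out`.
static inline void sketch_bits_unpack_int32(const uint8_t* bits, int64_t start_offset,
                                            int64_t length, int32_t* out) {
  assert(start_offset >= 0 && length >= 0);
  for (int64_t i = 0; i < length; i++) {
    out[i] = sketch_bit_get(bits, start_offset + i);
  }
}
```

Because the output element type matches the 32-bit integers that R uses for logical vectors, a routine of this shape can write straight into the destination buffer without an intermediate int8 copy. A faster version would typically handle whole bytes at a time rather than looping bit by bit.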

Before this PR:

library(nanoarrow)

lgls <- nanoarrow:::vec_gen(logical(), 1e6)
bool_array <- as_nanoarrow_array(lgls)
bool_array_arrow <- arrow::as_arrow_array(bool_array)

bench::mark(
  convert_array(bool_array, logical()),
  as.vector(bool_array_arrow),
  as.logical(lgls)
)
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 convert_array(bool_array, logical()) 556µs  749µs    1.33e3    3.82MB     156.
#> 2 as.vector(bool_array_arrow)          558µs  780µs    1.30e3    3.82MB     144.
#> 3 as.logical(lgls)                         0    1ns    2.28e8        0B       0

bench::mark(
  convert_array(bool_array, integer()),
  as.integer(lgls)
)
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 convert_array(bool_array, integer()) 733µs  912µs     1093.    3.81MB     167.
#> 2 as.integer(lgls)                     615µs  788µs     1273.    3.81MB     182.

After this PR:

library(nanoarrow)

lgls <- nanoarrow:::vec_gen(logical(), 1e6)
bool_array <- as_nanoarrow_array(lgls)
bool_array_arrow <- arrow::as_arrow_array(bool_array)

bench::mark(
  convert_array(bool_array, logical()),
  as.vector(bool_array_arrow),
  as.logical(lgls)
)
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 convert_array(bool_array, logical()) 105µs  308µs    3.21e3    3.83MB     367.
#> 2 as.vector(bool_array_arrow)          559µs  772µs    1.30e3    3.82MB     143.
#> 3 as.logical(lgls)                         0      0    5.87e8        0B       0

bench::mark(
  convert_array(bool_array, integer()),
  as.integer(lgls)
)
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 convert_array(bool_array, integer()) 104µs  310µs     3181.    3.81MB     423.
#> 2 as.integer(lgls)                     615µs  784µs     1278.    3.81MB     142.

Created on 2023-08-17 with reprex v2.0.2

@codecov-commenter

Codecov Report

Merging #278 (e685462) into main (e21cc98) will increase coverage by 1.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #278      +/-   ##
==========================================
+ Coverage   87.22%   88.23%   +1.01%     
==========================================
  Files          66        3      -63     
  Lines       10158      357    -9801     
==========================================
- Hits         8860      315    -8545     
+ Misses       1298       42    -1256     

see 63 files with indirect coverage changes

@paleolimbot paleolimbot changed the title from "feat: Add ArrowBitUnpackInt32Unsafe()" to "feat: Add ArrowBitsUnpackInt32()" Aug 17, 2023
@paleolimbot paleolimbot marked this pull request as ready for review August 17, 2023 14:54

@WillAyd WillAyd (Contributor) left a comment

No worries at all. I think these are all pretty reasonable. Nice work!

@paleolimbot paleolimbot merged commit ad83497 into apache:main Aug 17, 2023
27 checks passed
@paleolimbot paleolimbot deleted the unpack-int32 branch August 17, 2023 18:39