Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add enum ArrowType buffer_data_type member to struct ArrowLayout #207

Merged
merged 2 commits into from
May 30, 2023

Conversation

paleolimbot
Copy link
Member

The existing enum ArrowBufferType is useful to switch on for some things and the element_size_bits is useful for calculating byte offsets; however, the combination of those is still not sufficient to do endian-swapping as each Arrow type has its own rules. This concept has also come up when printing buffers in the R package and when exporting in Python via the buffer protocol, both of which had their own workarounds. Attaching a reasonable ArrowType to each buffer should provide a generic route to solving all of those.

@codecov-commenter
Copy link

Codecov Report

Merging #207 (a63265f) into main (dcb33f0) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #207      +/-   ##
==========================================
+ Coverage   87.90%   87.93%   +0.02%     
==========================================
  Files          60       60              
  Lines        9114     9164      +50     
==========================================
+ Hits         8012     8058      +46     
- Misses       1102     1106       +4     
Impacted Files Coverage Δ
src/nanoarrow/nanoarrow_types.h 92.75% <ø> (ø)
src/nanoarrow/utils.c 98.51% <100.00%> (-1.49%) ⬇️

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@paleolimbot paleolimbot merged commit 1e6ca7e into apache:main May 30, 2023
@paleolimbot paleolimbot deleted the buffer-type branch May 30, 2023 12:49
paleolimbot added a commit that referenced this pull request May 31, 2023
After #207, buffer data types are readily available. Before, the R
bindings did most of this on their own to get debug buffer printing.
This PR exploits the new feature to improve debug output and adds the
ability to convert buffers into R vectors along the way.

Before this PR:

``` r
library(nanoarrow)
as_nanoarrow_array(c(NA, stringr::words))
#> <nanoarrow_array string[981]>
#>  $ length    : int 981
#>  $ null_count: int 1
#>  $ offset    : int 0
#>  $ buffers   :List of 3
#>   ..$ :<nanoarrow_buffer_validity[123 b] at 0x13b71d8d0>
#>   ..$ :<nanoarrow_buffer_data_offset32[3928 b] at 0x13d87b600>
#>   ..$ :<nanoarrow_buffer_data_utf8[5126 b] at 0x13d87c600>
#>  $ dictionary: NULL
#>  $ children  : list()
```

<sup>Created on 2023-05-30 with [reprex
v2.0.2](https://reprex.tidyverse.org)</sup>

After this PR:

``` r
library(nanoarrow)
as_nanoarrow_array(c(NA, stringr::words))
#> <nanoarrow_array string[981]>
#>  $ length    : int 981
#>  $ null_count: int 1
#>  $ offset    : int 0
#>  $ buffers   :List of 3
#>   ..$ :<nanoarrow_buffer validity<bool>[984][123 b]> `FALSE TRUE TRUE TRUE T...`
#>   ..$ :<nanoarrow_buffer data_offset<int32>[982][3928 b]> `0 0 1 5 10 18 24 ...`
#>   ..$ :<nanoarrow_buffer data<string>[5126 b]> `aableaboutabsoluteacceptacco...`
#>  $ dictionary: NULL
#>  $ children  : list()
```

<sup>Created on 2023-05-30 with [reprex
v2.0.2](https://reprex.tidyverse.org)</sup>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants