feat(python): Add column-wise buffer builder #464
Conversation
A few quick comments!
python/src/nanoarrow/visitor.py
Outdated
```diff
@@ -49,7 +50,7 @@ def to_pylist(obj, schema=None) -> List:
     return ListBuilder.visit(obj, schema)


-def to_columns(obj, schema=None) -> Tuple[List[str], List[Sequence]]:
+def to_columns(obj, schema=None, handle_nulls=None) -> Tuple[List[str], List[Sequence]]:
     """Convert ``obj`` to a ``list()`` of sequences

     Converts a stream of struct arrays into its column-wise representation
```
For the line below: can you clarify when you get a buffer and when a list?
python/src/nanoarrow/visitor.py
Outdated
```python
    return handle


def nulls_as_masked_array():
```
I wonder if it would be more useful to actually return each array as a tuple of data and mask (the numpy masked array isn't used that much)
Good to know! I'll remove that then 🙂
I wonder if it would be more useful to actually return each array as a tuple of data and mask
Currently that would be `nulls_debug()`, mostly because I couldn't think of a good name (in a previous version it was `nulls_as_mask_and_data()`). Returning a tuple here sort of breaks the "every column is a sequence" guarantee. There could also be a `MaskedSequence` class that wraps the mask and data, but that seems like opening a can of worms.
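For what it's worth, a minimal sketch of what such a hypothetical `MaskedSequence` wrapper could look like (the name and semantics are illustrative only, not part of this PR):

```python
from collections.abc import Sequence


class MaskedSequence(Sequence):
    """Illustrative wrapper that keeps the 'every column is a sequence'
    guarantee while still exposing the raw mask and data separately."""

    def __init__(self, data, valid):
        self._data = data
        self._valid = valid  # True where the element is non-null

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        # Sketch: integer indexing only; a real implementation would
        # also need to handle slices.
        return self._data[i] if self._valid[i] else None

    @property
    def data(self):
        return self._data

    @property
    def valid(self):
        return self._valid
```

With this, `list(MaskedSequence([1, 2, 3], [True, False, True]))` gives `[1, None, 3]`, while `.data`/`.valid` still expose the raw pieces.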
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Very cool! The column builder code is quite nuanced; I need to do another pass over it.
```diff
 from nanoarrow.iterator import ArrayViewBaseIterator, PyIterator
 from nanoarrow.schema import Type


-def to_pylist(obj, schema=None) -> List:
-    """Convert ``obj`` to a ``list()`` of Python objects
+class ArrayViewVisitable:
```
Awesome, love this!
python/src/nanoarrow/visitor.py
Outdated
```python
>>> import nanoarrow as na
>>> import pyarrow as pa
>>> batch = pa.record_batch([pa.array([1, 2, 3])], names=["col1"])
```
nit: can we use two columns in the docstrings? I think it would better show the functionality.
python/src/nanoarrow/visitor.py
Outdated
```python
    return handle


def nulls_debug() -> Callable[[CBuffer, Sequence], Tuple[CBuffer, Sequence]]:
```
Should we mark this as an internal API? e.g. `_nulls_debug()`
I changed the name to `nulls_separate()` and exported it...from @jorisvandenbossche's comment it sounds like it would be useful (for the situation where the caller wants to handle nulls completely on their own).
`nulls_separate` sounds in any case like a better name! (and yes, I think it would be useful)
python/src/nanoarrow/visitor.py
Outdated
```python
    def begin(self, total_elements: Union[int, None] = None) -> None:
        for child_visitor in self._child_visitors:
            child_visitor.begin(total_elements)

    def visit_chunk_view(self, array_view: CArrayView) -> Any:
        if array_view.null_count > 0:
            raise ValueError("null_count > 0 encountered in ColumnsBuilder")
```
Why is this an error?
If there are any nulls here the column results will be bogus! (We have no way to propagate top-level nulls into the child columns.) At the "record batch" level nulls don't exist: typically record batches are exported as an explicitly non-nullable struct; however, at least one implementation doesn't do this, so we have to check for actual nulls while iterating instead of erroring when we see the schema.
Can you add a brief comment about this in the code?
```python
# Infer null count == 0 because of null data buffer
array = na.c_array_from_buffers(
    na.int32(), 3, (None, na.c_buffer([1, 2, 3], na.int32())), null_count=-1
)
```
Why do we need to add `null_count=-1` in these tests?
I added comments to each of these... `-1` is the sentinel for "not computed yet". Technically it's the default, but here it's exactly what we're testing.
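As a pure-Python illustration of the sentinel semantics (a sketch, not nanoarrow's actual implementation): `-1` means the null count hasn't been computed yet, a missing validity buffer implies zero nulls, and otherwise the count can be derived from the validity bitmap:

```python
UNKNOWN_NULL_COUNT = -1  # sentinel: "not computed yet"


def resolve_null_count(null_count, validity_bitmap, length):
    """Sketch of lazily resolving a null count from a validity bitmap."""
    if null_count != UNKNOWN_NULL_COUNT:
        # Already computed (or explicitly provided)
        return null_count
    # No validity buffer allocated -> no nulls by definition
    if validity_bitmap is None:
        return 0
    # Count valid (set) bits among the first `length` bits, LSB-first
    valid = sum((validity_bitmap[i // 8] >> (i % 8)) & 1 for i in range(length))
    return length - valid
```

For example, `resolve_null_count(-1, None, 3)` infers `0`, while `resolve_null_count(-1, b"\x05", 3)` (bits 0 and 2 set) infers `1`.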
First set of comments. And thanks for the nice docstrings!
python/src/nanoarrow/visitor.py
Outdated
```
handle_nulls : callable
    A function returning a sequence based on a validity bytemap and a
    contiguous buffer of values (e.g., the callable returned by
    :meth:`nulls_as_sentinel`).
```
Mention that the default is `nulls_forbid()`?
And maybe also list the built-in options
```
Converts a stream of arrays into a columnar representation
such that each column is either a contiguous buffer or a ``list()``.
Integer, float, and interval arrays are currently converted to their
```
What is the reason interval arrays are returned as buffer? (just because there is no obvious python object?)
And why are other primitive fixed-width types like timestamp not returned as buffers? (because they are not a c primitive type? And have an obvious python object to use?)
I can imagine that for certain purposes, you would actually want the integers behind a timestamp column, because that is much cheaper to work with than a list of python datetime.datetime objects.
Should we make this user configurable?
What is the reason interval arrays are returned as buffer? (just because there is no obvious python object?)
The technical answer is that `schema_view.buffer_format` returns `"iiq"` (i.e., not `None`)... I never really looked at what pyarrow does here, but I see now that it returns a named tuple when converted to a list. The last time I tried `to_numpy()` on a pyarrow interval I got a crash ( apache/arrow#41326 ). I'm pretty happy to do anything here (or make a breaking change later to handle it properly).
And why are other primitive fixed-width types like timestamp not returned as buffers?
If just the storage were returned it would be lossy (i.e., `pd.Series()` would do the wrong thing). There's no way to communicate what to do here without invoking numpy- or pandas-specific logic, so a list of Python objects is maybe a safer default.
Should we make this user configurable?
Totally (and also make Python object conversion configurable), but I'm not sure exactly how to do that yet. If somebody wanted to do this today, I'd suggest subclassing the visitor:

```python
import nanoarrow as na
import pyarrow as pa
from datetime import datetime
from nanoarrow.visitor import ColumnsBuilder, NullableColumnBuilder


class CustomColumnsBuilder(ColumnsBuilder):
    def _resolve_child_visitor(self, child_schema, child_array_view, handle_nulls):
        if na.Schema(child_schema).type == na.Type.TIMESTAMP:
            return NullableColumnBuilder(na.int64(), handle_nulls=handle_nulls)
        else:
            return super()._resolve_child_visitor(
                child_schema, child_array_view, handle_nulls
            )


batch = pa.record_batch({"ts": [datetime.now()]})
CustomColumnsBuilder.visit(batch)
#> (['ts'], [nanoarrow.c_lib.CBuffer(int64[8 b] 1715862463542238)])
```
python/src/nanoarrow/visitor.py
Outdated
```
schema : schema-like, optional
    An optional schema, passed to :func:`c_array_stream`.
```
This is not actually present in the signature?
python/src/nanoarrow/visitor.py
Outdated
```python
if len(is_valid) > 0:
    raise ValueError("Null present with null_handler=nulls_forbid()")
```
The presence of a validity buffer doesn't necessarily mean it includes any nulls? Or did you ensure that before? (e.g. by not converting the bitmap to a bytemap if the bitmap is all set?)
Yes...the bool buffer indicating validity is never allocated unless a null is actually present (and is `None` otherwise). I added this to the `handle_nulls` documentation!
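A pure-Python mock of that contract (assuming, as described above, that the validity bytemap is only non-empty when at least one null is actually present):

```python
def nulls_forbid():
    """Mock null handler mirroring the check quoted above: because the
    validity bytemap is only non-empty when a null is present, its mere
    presence is enough to raise."""

    def handle(is_valid, data):
        if len(is_valid) > 0:
            raise ValueError("Null present with null_handler=nulls_forbid()")
        return data

    return handle
```

Here `nulls_forbid()(b"", [1, 2, 3])` returns the data unchanged, while any non-empty bytemap raises.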
python/src/nanoarrow/visitor.py
Outdated
```
The default sentinel value will result in ``nan`` assigned to null
values in numeric and boolean outputs.
```
I am not entirely sure this is the case for boolean? (setting None into a boolean array sets False)
Ah, but you ensure to convert int/bool to float in case there are nulls, so that setting as NaN indeed works correctly. Maybe clarify that here
Yes, it's a bit of a side effect of how numpy handles `None` with `result_type()`. I added a comment clarifying this.

```python
import numpy as np

sentinel = None
arr = np.array([True, False, True, True], np.result_type(sentinel, np.bool_))
arr[2] = sentinel
arr
#> array([ 1.,  0., nan,  1.])
```
python/src/nanoarrow/visitor.py
Outdated
```python
>>> import nanoarrow as na
>>> na.Array([1, 2, 3], na.int32()).to_column(na.nulls_separate())
(nanoarrow.c_lib.CBuffer(uint8[0 b] ), nanoarrow.c_lib.CBuffer(int32[12 b] 1 2 3))
```
Should we return `None` here instead of an empty buffer? Because if you convert both outputs to numpy arrays, you can end up with numpy arrays of different lengths (of course, checking for None isn't much harder than checking for an empty/length-0 buffer).
Yes! I updated this to return `None`, which is less ambiguous. One could also update this such that `handle_nulls()` is never called unless there are actually nulls?
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
python/src/nanoarrow/visitor.py
Outdated
```diff
-    def finish(self) -> List:
-        return self._lst
+    def finish(self) -> Any:
+        return self._visitor.finish()


 class ColumnsBuilder(ArrayStreamVisitor):
```
FWIW, and I don't directly have a better idea, but I found the usage of "columns" somewhat confusing here (and also "Builder", because that sounds too similar to the builders used to build arrays/buffers in the python->arrow direction). Naming is hard .. ;)
It's been through a few iterations of naming 🙂 . Maybe `sequence` is better than `column`? The gist of it is "something that either numpy or pandas can understand" and/or "we dealt with all the hard parts of a non-contiguous or non-zero-offset array so you didn't have to". I could stick with `Visitor` instead of `Builder`, too (but these are internal for now, so at least that part can get renamed).
I'll add that I also had a bit of trouble comprehending the naming. I think it might be worthwhile to rename "column" to something else. Technically, it's just a contiguous array, but `array` is overloaded. Are these technically all concatenation helpers?
At Dane's suggestion these are now `to_pysequence()` and `to_columns_pysequence()`. I think these can/should get renamed when we see exactly how/if they are used (it may be that everybody will want custom logic and the high-level convert method will go unused).
```
    :func:`c_array_stream`.
schema : schema-like, optional
    An optional schema, passed to :func:`c_array_stream`.
handle_nulls : callable
```
This keyword is only used when the output is a buffer and not when it is a list?
Yes, and only for explicitly nullable fields. Maybe there should be a `handle_valid` and a `handle_nulls`? Or maybe just a `handle_result()`?
I kept this as is for now but I think there will eventually need to be some way to inject more customization into the process (including exactly when the null handler gets called).
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
python/src/nanoarrow/visitor.py
Outdated
```python
    """
    return ListBuilder.visit(self)


def to_column_list(self, handle_nulls=None) -> Tuple[List[str], List[Sequence]]:
```
Would `to_table` potentially be more descriptive?
Maybe not, because it truly is a pylist of na.arrays. But it is tabular in nature.
I went with `to_columns_pysequence()` for now...it's not quite a table, and I'm not sure inventing one here is the solution either. Unfortunately it's also the most useful function! Hopefully its usage will give some guidance as to what it should be named or whether it needs to be removed/live somewhere else.
python/src/nanoarrow/visitor.py
Outdated
```python
    """
    return ColumnsBuilder.visit(self, handle_nulls=handle_nulls)


def to_column(self, handle_nulls=None) -> Sequence:
```
Am I correct in thinking this is a concatenation function? e.g. chunked array -> array
Maybe `merge_chunks` or `consolidate_chunks` or `unify_chunks` could be a more descriptive name?
But it does concatenation + conversion to python lists at the same time, so just naming it about the first is also not descriptive enough
Yes, concatenation is a side effect of the fact that the non-Arrow pydata universe can't handle chunked data. This function is specifically trying to get data out of Arrow land. Your suggestion of `to_pysequence()` is much better than `to_column()`, so I went with that!
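Conceptually, the chunk-merging part of that conversion is just concatenation (a stdlib-only sketch; the real visitor additionally deals with buffers, offsets, and null handling):

```python
from itertools import chain


def concat_chunks(chunks):
    """Flatten already-converted chunks into one contiguous sequence,
    mirroring how a chunked stream collapses to a single column."""
    return list(chain.from_iterable(chunks))
```

For example, three chunks `[[1, 2], [3], [4, 5]]` collapse to one sequence `[1, 2, 3, 4, 5]`.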
python/src/nanoarrow/visitor.py
Outdated
```diff
-    def finish(self) -> List:
-        return self._lst
+    def finish(self) -> Any:
+        return self._visitor.finish()


 class ColumnsBuilder(ArrayStreamVisitor):
```
I'll add that I also had a bit of trouble comprehending the naming. I think it might be worthwhile to rename "column" to something else. Technically, it's just a contiguous array, but `array` is overloaded. Are these technically all concatenation helpers?
I hear everybody on the naming thing! "Column" is not great because it doesn't have a precedent here (the closest thing would be ...). There is precedent for the term "convert" in the R bindings ( https://arrow.apache.org/nanoarrow/latest/r/reference/convert_array_stream.html ) and Arrow C++ ( https://github.com/apache/arrow/blob/2dbc5e26dcbc6826b4eb7a330fa8090836f6b727/cpp/src/arrow/util/converter.h#L40 ), and so I gave that terminology a try in the last few commits. The crux of what these helpers are trying to do is to get a stream of arrays (possibly of indeterminate length) out of Arrow land to be represented by something else. The default something else has to be limited to the Python standard library because of the zero-dependencyness, which is "pybuffer or list". In the R bindings you can do things like:

```r
convert_array_stream(stream)  # default conversion
convert_array_stream(stream, tibble::tibble())  # explicit output prototype
```

Here,

```python
array.convert()  # default conversion
array.convert(np.int32)  # ...would get you an np.array with dtype int32
```

I will also mark these as "experimental" such that it's clear we're still settling on the terminology/behaviour/scope here.
I like the improvements! I did end up adding a more comments on naming. Let me know what you think.
python/src/nanoarrow/visitor.py
Outdated
```python
    """
    return ColumnListConverter.visit(self, handle_nulls=handle_nulls)


def convert(self, *, handle_nulls=None) -> Sequence:
```
I do like `convert`! My last recommendation would be `to_pysequence` instead. `to_pybuffer` would also work if pybuffer is split from pylist.
ehh, maybe not use `pysequence`, because that would mean the object implements the Python sequence protocol.
although the type hint does say a sequence is returned... do pybuffer and pylist both implement the sequence protocol?
They do!
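A quick check of that claim (using a `memoryview` as a stand-in for the pybuffer case): both support length, indexing, and iteration, which is the sequence-protocol behavior the type hint relies on.

```python
buf = memoryview(b"\x01\x02\x03")  # stand-in for a zero-copy buffer result
lst = [1, 2, 3]                    # stand-in for a converted pylist result

# Both expose length, integer indexing, and element-wise iteration
assert len(buf) == len(lst) == 3
assert buf[0] == lst[0] == 1
assert list(buf) == lst
```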
python/src/nanoarrow/visitor.py
Outdated
```python
    v.finish() for v in self._child_visitors
]


class ListConverter(ArrayStreamVisitor):
```
What if we named these all `ArrayStreamTo<Type>Converter`?
python/src/nanoarrow/visitor.py
Outdated
```diff
@@ -124,56 +248,209 @@ def finish(self) -> Any:
     return None


-class ListBuilder(ArrayStreamVisitor):
-    def __init__(self, schema, *, iterator_cls=PyIterator, array_view=None):
+class DispatchingConverter(ArrayStreamVisitor):
```
I think I would prefer `ArrayStreamToPySequenceConverter` here. Sorry to continue nitpicking the names; this is one of the last ones I would change, if that helps.
I went with `ToPySequenceConverter` (for brevity), but it's a great point that it's unclear whether an `XXXConverter` is converting to or from an `XXX`!
python/src/nanoarrow/visitor.py
Outdated
```diff
-class ColumnsBuilder(ArrayStreamVisitor):
-    def __init__(self, schema, *, array_view=None):
+class ColumnListConverter(ArrayStreamVisitor):
```
I still think naming this table or tabular would be more helpful to a user. Maybe `ArrayStreamToTabularConverter`, or perhaps we define a `Table` typedef wrapper and call it `ArrayStreamToTableConverter`?

```python
Table = namedtuple('table', ['column_names', 'columns'])
```

My main feedback about using "column" is that it's a subcomponent of a table or matrix, but there isn't any tabular object in nanoarrow right now. For example, pandas calls these "Series" and uses "column" terminology when the object is a dataframe.
I did keep the "column" terminology here...the intention with the return value of this function is that the list of sequences will be subcomponents of a table...we're just agnostic here to exactly which "table" they are going to end up in.
Overall LGTM! Thanks for being receptive to all of our feedback.
python/src/nanoarrow/visitor.py
Outdated
>>> columns | ||
[nanoarrow.c_lib.CBuffer(int64[24 b] 1 2 3), ['a', 'b', 'c']] | ||
""" | ||
return ToColumnListConverter.visit(self, handle_nulls=handle_nulls) |
optional nit: do we want to update this to be `ToColumnsPySequenceConverter` now?
This PR implements building columns buffer-wise for the types where this makes sense. It also implements a few other changes:

- `item_size` was renamed to `itemsize` to match the memoryview property name
- The conversion methods were moved to an `ArrayViewVisitable` mixin such that they are available in both the `Array` and `ArrayView` without duplicating documentation.

Functionally this means that the `Array` and `ArrayStream` now have `to_column()` and `to_column_list()` methods that do something that more closely matches what somebody would expect.

A quick demo:
This will basically get you data frame conversion:
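The demo output isn't reproduced here, but the shape of that data frame step could look roughly like this (a hedged sketch: `names`/`columns` stand in for the two parallel lists returned by `to_column_list()`, and the pandas call is an illustration, not part of this PR):

```python
# Hypothetical output of to_column_list(): parallel lists of names and columns
names = ["col1", "col2"]
columns = [[1, 2, 3], ["a", "b", "c"]]

# Pair names with their columns; a data frame is then one constructor
# call away, e.g.:
#   import pandas as pd
#   df = pd.DataFrame(records)
records = dict(zip(names, columns))
```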