GH-38325: [Python] Expand the Arrow PyCapsule Interface with C Device Data support #40708

jorisvandenbossche · 2024-03-21T13:59:29Z

Rationale for this change

We defined a protocol exposing the C Data Interface (schema, array and stream) in Python through PyCapsule objects and dunder methods __arrow_c_schema/array/stream__ (#35531 / #37797).

We also expanded the C Data Interface with device capabilities: https://arrow.apache.org/docs/dev/format/CDeviceDataInterface.html (#34972).

This expands the Python exposure of the interface with support for the newer Device structs.

What changes are included in this PR?

Update the specification to define two additional dunders:

* __arrow_c_device_array__ returns a pair of PyCapsules containing a C ArrowSchema and ArrowDeviceArray, where the latter uses "arrow_device_array" for the capsule name
* __arrow_c_device_stream__ returns a PyCapsule containing a C ArrowDeviceArrayStream, where the capsule must have a name of "arrow_device_array_stream"
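
As a rough illustration of the capsule naming above, a consumer might check the names before touching any pointers. This is a minimal sketch, not part of the specification: the helper names `capsule_name` and `check_device_array_export` are hypothetical, the "arrow_schema" name comes from the existing (CPU) PyCapsule spec, the capsules wrap placeholder memory instead of real Arrow structs, and the `ctypes.pythonapi` approach assumes CPython.

```python
import ctypes

# PyCapsule_New stores a pointer to the name instead of copying it, so these
# bytes objects must stay alive for as long as the capsules do.
SCHEMA_NAME = b"arrow_schema"
DEVICE_ARRAY_NAME = b"arrow_device_array"

# Bind the CPython capsule C API (CPython-only assumption).
_api = ctypes.pythonapi
_api.PyCapsule_New.restype = ctypes.py_object
_api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
_api.PyCapsule_GetName.restype = ctypes.c_char_p
_api.PyCapsule_GetName.argtypes = [ctypes.py_object]


def capsule_name(obj):
    """Return the capsule name as str, or None if obj is not a named capsule."""
    if type(obj).__name__ != "PyCapsule":
        return None
    name = _api.PyCapsule_GetName(obj)
    return name.decode() if name is not None else None


def check_device_array_export(result):
    """Validate the (schema, device array) pair returned by a producer."""
    schema_capsule, array_capsule = result
    if capsule_name(schema_capsule) != "arrow_schema":
        raise TypeError("first element must be an 'arrow_schema' capsule")
    if capsule_name(array_capsule) != "arrow_device_array":
        raise TypeError("second element must be an 'arrow_device_array' capsule")
    return schema_capsule, array_capsule


# Demonstration with placeholder memory rather than real Arrow structs.
_placeholder = ctypes.create_string_buffer(8)
_addr = ctypes.addressof(_placeholder)
schema_capsule = _api.PyCapsule_New(_addr, SCHEMA_NAME, None)
array_capsule = _api.PyCapsule_New(_addr, DEVICE_ARRAY_NAME, None)
checked = check_device_array_export((schema_capsule, array_capsule))
```

A real consumer would go on to extract the struct pointers in C or Cython; the name check only guards against mixing up the CPU and device capsules.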

Are these changes tested?

Spec-only change

GitHub Issue: [Python] Expose the device interface through the Arrow PyCapsule protocol #38325


github-actions · 2024-03-21T13:59:56Z

⚠️ GitHub issue #38325 has been automatically assigned in GitHub to PR creator.

pitrou · 2024-03-21T15:09:54Z

docs/source/format/CDataInterface/PyCapsuleInterface.rst

+method on those objects, which works the same as ``__arrow_c_array__`` except
+for returning a ArrowDeviceArray structure instead of a ArrowArray structure:
+
+.. py:method:: __arrow_c_device_array__(self, requested_schema: object | None = None) -> Tuple[object, object]


I see we already did it above, but it's not useful to add machine-oriented type annotations to a human-readable doc. The parameter types are described explicitly below.

The HTML rendering is not terrible but it's not great either, as the signature looks crowded: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html#arrowarray-export

cc @wjones127

I'm fine removing them. In general, I like having the type hints as they make signatures easy to understand from my human perspective. They can be less ambiguous than a description of a type. But given we don't have a PyCapsule type, I agree they don't add much value here.

We also still have the type hints version in the https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html#protocol-typehints section a bit below

Will remove them here.
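
For readers who prefer the type-hinted form mentioned in this thread, a structural type for the new device method could be sketched as follows. This is an illustration only: the class name `ArrowDeviceArrayExportable` and the `FakeProducer` class are assumptions, `object` stands in for the capsules because there is no PyCapsule type to annotate with, and `**kwargs` reflects the later commit in this PR that adds keyword handling.

```python
from typing import Any, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrowDeviceArrayExportable(Protocol):
    """Structural type for objects exposing the device array dunder."""

    def __arrow_c_device_array__(
        self, requested_schema: Optional[object] = None, **kwargs: Any
    ) -> Tuple[object, object]:
        ...


class FakeProducer:
    # Hypothetical producer; returns placeholder objects, not real PyCapsules.
    def __arrow_c_device_array__(self, requested_schema=None, **kwargs):
        return object(), object()


# runtime_checkable only checks that the method exists, not its signature.
is_exportable = isinstance(FakeProducer(), ArrowDeviceArrayExportable)
```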

pitrou · 2024-03-21T15:12:47Z

I don't think this is controversial, but perhaps this can be publicized on the ML to entice more feedback?

It could perhaps even be publicized towards the DLPack and Numba developers / communities.

paleolimbot

This looks great to me and I look forward to implementing in nanoarrow!

docs/source/format/CDataInterface/PyCapsuleInterface.rst

pitrou · 2024-03-26T14:25:14Z

docs/source/format/CDataInterface/PyCapsuleInterface.rst

+protocol (e.g. only add ``__arrow_c_device_array__``, and not add ``__arrow_c_array__``).
+Libraries that have data structures that can live both on CPU or non-CPU devices
+can implement both versions of the protocol (in that case, the standard methods
+should raise an error when trying to export non-CPU data).


Hmm, perhaps we should leave that up to the producer for the time being, instead of making a recommendation?

On the other hand, for a CPU-only consumer that only checks the __arrow_c_array__ version, it would be good if that consumer could be assured that the pointers it is going to interpret are valid CPU memory. Otherwise, how can this consumer avoid segfaults when passed a non-CPU object?

I mean the producer should be allowed to make a CPU copy.
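
The two options discussed in this thread (raise from the CPU-only method, or let the producer make a CPU copy) can be sketched with a toy producer. Everything here is hypothetical: the `HypotheticalBuffer` class, its `device` and `allow_cpu_copy` attributes, and the tuples standing in for real PyCapsules are illustration only, and raising `NotImplementedError` for unknown keywords is an assumed policy rather than something this conversation specifies.

```python
class HypotheticalBuffer:
    """Toy producer; attributes and return values are illustrative only."""

    def __init__(self, values, device="cpu", allow_cpu_copy=False):
        self.values = list(values)
        self.device = device
        self.allow_cpu_copy = allow_cpu_copy

    def _capsules(self, array_capsule_name):
        # Stand-in for real PyCapsules wrapping C structs.
        return ("arrow_schema", array_capsule_name)

    def __arrow_c_array__(self, requested_schema=None):
        # CPU-only protocol: the consumer will dereference these pointers on
        # the CPU, so non-CPU data is either copied first or rejected.
        if self.device != "cpu":
            if not self.allow_cpu_copy:
                raise ValueError(
                    f"data lives on {self.device!r}; use __arrow_c_device_array__"
                )
            cpu_copy = HypotheticalBuffer(self.values, device="cpu")
            return cpu_copy.__arrow_c_array__(requested_schema)
        return self._capsules("arrow_array")

    def __arrow_c_device_array__(self, requested_schema=None, **kwargs):
        # Device-aware protocol: usable for CPU and non-CPU data alike.
        if kwargs:
            raise NotImplementedError(f"unsupported keywords: {sorted(kwargs)}")
        return self._capsules("arrow_device_array")


gpu = HypotheticalBuffer([1, 2, 3], device="cuda")
device_export = gpu.__arrow_c_device_array__()
try:
    gpu.__arrow_c_array__()
    cpu_export_error = None
except ValueError as exc:
    cpu_export_error = str(exc)
copied_export = HypotheticalBuffer(
    [1, 2, 3], device="cuda", allow_cpu_copy=True
).__arrow_c_array__()
```

Either way, a CPU-only consumer that calls only `__arrow_c_array__` never receives pointers into device memory, which is the segfault concern raised above.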

pitrou · 2024-03-26T14:25:57Z

docs/source/format/CDataInterface/PyCapsuleInterface.rst

+non-CPU memory, it is recommeded to _only_ implement the device version of the
+protocol (e.g. only add ``__arrow_c_device_array__``, and not add ``__arrow_c_array__``).
+Libraries that have data structures that can live both on CPU or non-CPU devices
+can implement both versions of the protocol (in that case, the standard methods


"standard" is a bit misleading as all methods are part of the spec. Perhaps "CPU-only"?

Yes, I did "define" the standard methods as the CPU-only ones in the second paragraph, which lists the two sets of methods, so as not to have to constantly repeat "CPU-only" (e.g. the third paragraph, just above, also uses that terminology).
But maybe that's not helping, and I can certainly also just consistently use "CPU-only" and "device-aware" to refer to the different versions of the methods.


paleolimbot

I like these changes!


pitrou

This LGTM, modulo minor nits already pointed out by @zeroshade. Thank you!

zeroshade

LGTM after my previous comments are addressed


jorisvandenbossche · 2024-06-25T09:18:14Z

@github-actions crossbow submit preview-docs

github-actions · 2024-06-25T09:20:27Z

Revision: 182910e

Submitted crossbow builds: ursacomputing/crossbow @ actions-add7f4281d

Task	Status
preview-docs

jorisvandenbossche · 2024-06-25T10:32:31Z

The rendered changes look good to me: http://crossbow.voltrondata.com/pr_docs/40708/format/CDataInterface/PyCapsuleInterface.html

…yArrow (#40717)

### Rationale for this change

PyArrow implementation for the specification additions being proposed in #40708

### What changes are included in this PR?

New `__arrow_c_device_array__` method to `pyarrow.Array` and `pyarrow.RecordBatch`, and support in the `pyarrow.array(..)`, `pyarrow.record_batch(..)` and `pyarrow.table(..)` functions to consume objects that have those methods.

### Are these changes tested?

Yes (for CPU only for now, #40385 is a prerequisite to test this for CUDA)

* GitHub Issue: #38325

conbench-apache-arrow · 2024-06-26T17:39:29Z

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 9dec272.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.

…Device Data support (apache#40708)

### Rationale for this change

We defined a protocol exposing the C Data Interface (schema, array and stream) in Python through PyCapsule objects and dunder methods `__arrow_c_schema/array/stream__` (apache#35531 / apache#37797).

We also expanded the C Data Interface with device capabilities: https://arrow.apache.org/docs/dev/format/CDeviceDataInterface.html (apache#34972).

This expands the Python exposure of the interface with support for the newer Device structs.

### What changes are included in this PR?

Update the specification to define two additional dunders:

* `__arrow_c_device_array__` returns a pair of PyCapsules containing a C ArrowSchema and ArrowDeviceArray, where the latter uses "arrow_device_array" for the capsule name
* `__arrow_c_device_stream__` returns a PyCapsule containing a C ArrowDeviceArrayStream, where the capsule must have a name of "arrow_device_array_stream"

### Are these changes tested?

Spec-only change

* GitHub Issue: apache#38325

Lead-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>


apacheGH-38325: [Python] Expand the Arrow PyCapsule Interface with C …

6c2eaa6

…Device Data support

github-actions bot added Component: Documentation and awaiting committer review labels Mar 21, 2024

jorisvandenbossche mentioned this pull request Mar 21, 2024

[Python] Expose the device interface through the Arrow PyCapsule protocol #38325 (Closed)

pitrou reviewed Mar 21, 2024

paleolimbot approved these changes Mar 21, 2024

github-actions bot added awaiting merge, awaiting changes and removed awaiting committer review, awaiting merge labels Mar 21, 2024

jorisvandenbossche mentioned this pull request Mar 21, 2024

GH-38325: [Python] Implement PyCapsule interface for Device data in PyArrow #40717 (Merged)

add section on guidelines around device support

4e6f140

github-actions bot added awaiting change review and removed awaiting changes labels Mar 26, 2024

pitrou reviewed Mar 26, 2024 (seven review entries, six of them threads on docs/source/format/CDataInterface/PyCapsuleInterface.rst, all outdated and resolved)

Apply suggestions from code review

be14cf7

Co-authored-by: Antoine Pitrou <pitrou@free.fr>

github-actions bot added awaiting changes and removed awaiting change review labels Mar 26, 2024

jorisvandenbossche added 2 commits March 26, 2024 18:06

Merge remote-tracking branch 'upstream/main' into 38325-capsule-devic…

3235c7c

…e-spec

remove type hints in main spec

5ac1858

github-actions bot removed the awaiting changes label Mar 26, 2024

jorisvandenbossche added 3 commits June 20, 2024 17:33

Merge remote-tracking branch 'upstream/main' into 38325-capsule-devic…

e40e3ec

…e-spec

add kwargs handling for potential future keywords

0e94a00

clarify CPU-only version of the protocol in case of non-CPU data

4c5c8b5

paleolimbot reviewed Jun 21, 2024

docs/source/format/CDataInterface/PyCapsuleInterface.rst (outdated, resolved)

github-actions bot added awaiting changes and removed awaiting change review labels Jun 21, 2024

Update docs/source/format/CDataInterface/PyCapsuleInterface.rst

4949f88

Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>

github-actions bot added awaiting change review and removed awaiting changes labels Jun 21, 2024

zeroshade reviewed Jun 21, 2024

docs/source/format/CDataInterface/PyCapsuleInterface.rst (resolved)

github-actions bot added awaiting changes and removed awaiting change review labels Jun 21, 2024

zeroshade reviewed Jun 21, 2024

docs/source/format/CDataInterface/PyCapsuleInterface.rst (outdated, resolved)

pitrou approved these changes Jun 24, 2024

zeroshade approved these changes Jun 24, 2024

github-actions bot added awaiting merge and removed awaiting changes labels Jun 24, 2024

jorisvandenbossche and others added 3 commits June 25, 2024 11:11

Update docs/source/format/CDataInterface/PyCapsuleInterface.rst

fded4d4

Co-authored-by: Matt Topol <zotthewizard@gmail.com>

fix typo and use CPU-only instead of standard

a1ae082

Merge remote-tracking branch 'upstream/main' into 38325-capsule-devic…

182910e

…e-spec

jorisvandenbossche merged commit 9dec272 into apache:main Jun 26, 2024
7 checks passed

jorisvandenbossche removed the awaiting merge label Jun 26, 2024

jorisvandenbossche deleted the 38325-capsule-device-spec branch June 26, 2024 09:46
