-
Notifications
You must be signed in to change notification settings - Fork 921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add to_arrow_device
function to cudf interop using nanoarrow
#15047
Add to_arrow_device
function to cudf interop using nanoarrow
#15047
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nanoarrow notes to take or leave!
cpp/src/interop/to_arrow_device.cu
Outdated
ArrowArrayInitFromType(tmp.get(), NANOARROW_TYPE_STRUCT); | ||
|
||
ArrowArrayAllocateChildren(tmp.get(), table.num_columns()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may also want to use ArrowArrayInitFromSchema()
, which will initialize the whole tree of arrays up front. This might simplify some later code and lets ArrowArrayFinish()
with a validation level do its thing. I might be slower (I haven't checked).
JFYI I'll be on vacation for the next week or so. Feel free to proceed without my input, just letting y'all know that I will catch up when I get back. |
Co-authored-by: Jake Hemstad <jhemstad@nvidia.com>
@zeroshade thank you for building out this proposal. We will review and provide feedback on the draft. 🙏 |
Would it be possible to get some more reviews here? I've been attempting to keep this up to date but I haven't dug too much further into building this until I get some confirmation that everyone is okay with this approach and that it is worthwhile to continue in this pattern versus refactoring or changing the approach. |
Thank you @zeroshade for keeping this up to date. At the moment we are juggling GTC-related activities so please excuse the delay in reviews. One piece of feedback from the team is that we would prefer to pull in the nanoarrow headers at build time rather than adding the file to the source tree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the slow response. I came back from vacation to a long backlog and too much to juggle, and I didn't want to give this PR a cursory review. I spent a decent chunk of my review reading the nanoarrow API docs to make sure I understood the code here so I may have missed some things in the code, but overall I think this PR is a great start! The approach looks good and I think we should move forward.
I have a couple of bigger picture questions:
- I'm wondering how we want this to interoperate with the existing arrow interop in libcudf. In order to get libcudf working more directly off the C Data interface, do you think it would be possible to rewrite our existing
to_arrow
as basically (pseudocode)cudaMemcpy(to_arrow_device(table), DtoH)
? - Possibly also a question for @paleolimbot: is there any plan for working with scalars? Scalars are a bit of thorn in our side right now, especially in the Python/C++ interop layer because of how many edge cases run into issues with different pyarrow APIs. Do you foresee a way to rewrite our scalar interop in terms of the C Data interface/nanoarrow functions coming at some point? I would absolutely settle for a solution that boils down to "python scalar->python array->C++ array->C++ scalar" and in reverse (since I don't imagine there are imminent plans to add scalars directly to the interface), but a big part of why I wrote Enable direct ingestion and production of Arrow scalars #14121 was because even that first step "python scalar->python array" was lossy and sometimes raised errors in pyarrow. I know there are some nascent nanoarrow Python bindings, so maybe that's what we want to move towards in the long term.
As per Greg's comment, I think it would be good to get the nanoarrow files at build time. We typically do that with CPM.cmake via rapids-cmake. I'm not sure how familiar you are with that approach, but there are some good examples in rapids-cmake. The rmm and gtest are good starting points that don't have too many additional complexities baked in. That said, if that CMake is unfamiliar to you don't worry about it, we can change the vendoring strategies at a later point once the PR is more mature.
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
Yup unrelated. We pushed new CI images containing a conda update that removed a CLI option we relied on. We've temporarily downgraded and rebuilt our images to unblock things, but also the PR David linked is the long-term fix so that we can upgrade again.
That's a fair question. When the contained type is itself easily brace-initialized like in this case I believe you can get the same rvalue-like behavior by adding a second set of braces, which I believe will invoke the initializer list constructor of a |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the hard work here @zeroshade! I'm approving assuming you don't plan to touch from_arrow_device
in this PR (which I'm totally on board with, I think this PR is large and complicated enough as is).
@vyasr yea, I plan on trying to tackle |
Just a heads-up, NVIDIA employees in the US are off today and tomorrow so you probably won't see the next review until Monday. |
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
/ok to test |
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
/ok to test |
/merge |
Thanks @zeroshade! |
This appears to have broken libcudf static library builds. See #15434 |
Fixes debug build failures resulting from changes from #15047. Here are some of the errors reported by the compiler: ``` Building CXX object tests/CMakeFiles/INTEROP_TEST.dir/interop/to_arrow_device_test.cpp.o FAILED: tests/CMakeFiles/INTEROP_TEST.dir/interop/to_arrow_device_test.cpp.o /usr/local/bin/g++ -DFMT_HEADER_ONLY=1 -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -DNANOARROW_DEBUG -DSPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_INFO -DSPDLOG_FMT_EXTERNAL -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_DISABLE_ABI_NAMESPACE -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTHRUST_IGNORE_ABI_NAMESPACE_ERROR -I/cudf/cpp -I/cudf/cpp/src -I/cudf/cpp/build/_deps/dlpack-src/include -I/cudf/cpp/build/_deps/jitify-src -I/cudf/cpp/include -I/cudf/cpp/build/include -I/cudf/cpp/build/_deps/cccl-src/thrust/thrust/cmake/../.. -I/cudf/cpp/build/_deps/cccl-src/libcudacxx/lib/cmake/libcudacxx/../../../include -I/cudf/cpp/build/_deps/cccl-src/cub/cub/cmake/../.. -I/cudf/cpp/build/_deps/nvtx3-src/c/include -I/cudf/cpp/build/_deps/nanoarrow-src/src -I/cudf/cpp/build/_deps/nanoarrow-build/generated -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /conda/envs/rapids/include -fdiagnostics-color=always -I/conda/envs/rapids/targets/x86_64-linux/include -L/conda/envs/rapids/targets/x86_64-linux/lib -L/conda/envs/rapids/targets/x86_64-linux/lib/stubs -g -std=gnu++17 -fPIE -Wall -Werror -Wno-unknown-pragmas -Wno-error=deprecated-declarations -pthread -MD -MT tests/CMakeFiles/INTEROP_TEST.dir/interop/to_arrow_device_test.cpp.o -MF tests/CMakeFiles/INTEROP_TEST.dir/interop/to_arrow_device_test.cpp.o.d -o tests/CMakeFiles/INTEROP_TEST.dir/interop/to_arrow_device_test.cpp.o -c /cudf/cpp/tests/interop/to_arrow_device_test.cpp In file included from /cudf/cpp/tests/interop/to_arrow_device_test.cpp:17: /cudf/cpp/tests/interop/nanoarrow_utils.hpp: In function 'void populate_list_from_col(ArrowArray*, cudf::lists_column_view)': /cudf/cpp/tests/interop/nanoarrow_utils.hpp:220:26: error: ignoring return value of 'ArrowErrorCode ArrowBufferSetAllocator(ArrowBuffer*, ArrowBufferAllocator)' declared with attribute 'warn_unused_result' [-Werror=unused-result] 220 | ArrowBufferSetAllocator(ArrowArrayBuffer(arr, 0), noop_alloc); | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /cudf/cpp/tests/interop/nanoarrow_utils.hpp:224:26: error: ignoring return value of 'ArrowErrorCode ArrowBufferSetAllocator(ArrowBuffer*, ArrowBufferAllocator)' declared with attribute 'warn_unused_result' [-Werror=unused-result] 224 | ArrowBufferSetAllocator(ArrowArrayBuffer(arr, 1), noop_alloc); | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /cudf/cpp/tests/interop/to_arrow_device_test.cpp: In member function 'void BaseArrowFixture::compare_arrays(const ArrowSchema*, const ArrowArray*, const ArrowArray*)': /cudf/cpp/tests/interop/to_arrow_device_test.cpp:268:24: error: ignoring return value of 'ArrowErrorCode cudfArrowSchemaViewInit(ArrowSchemaView*, const ArrowSchema*, ArrowError*)' declared with attribute 'warn_unused_result' [-Werror=unused-result] 268 | ArrowSchemaViewInit(&schema_view, schema, nullptr); /cudf/cpp/tests/interop/to_arrow_device_test.cpp: In member function 'virtual void ToArrowDeviceTest_DateTimeTable_Test::TestBody()': /cudf/cpp/tests/interop/to_arrow_device_test.cpp:353:27: error: ignoring return value of 'ArrowErrorCode cudfArrowSchemaSetTypeStruct(ArrowSchema*, int64_t)' declared with attribute 'warn_unused_result' [-Werror=unused-result] 353 | ArrowSchemaSetTypeStruct(expected_schema.get(), 1); /cudf/cpp/tests/interop/to_arrow_device_test.cpp:355:29: error: ignoring return value of 'ArrowErrorCode cudfArrowSchemaSetTypeDateTime(ArrowSchema*, ArrowType, ArrowTimeUnit, const char*)' declared with attribute 'warn_unused_result' [-Werror=unused-result] (many more) ``` Warning are turned into errors so the build fails. Fix simply adds the `NANOARROW_THROW_NOT_OK` to the offending calls. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Vyas Ramasubramani (https://github.com/vyasr) - Mike Wilson (https://github.com/hyperbolic2346) URL: #15463
) Adding a corresponding `from_arrow_device` function following up from #15047. This continues the work towards addressing #14926. Authors: - Matt Topol (https://github.com/zeroshade) - Vyas Ramasubramani (https://github.com/vyasr) - David Wendt (https://github.com/davidwendt) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - David Wendt (https://github.com/davidwendt) URL: #15458
Description
Introduce new
to_arrow_device
andto_arrow_schema
functions to utilize theArrowDeviceArray
structure for zero-copy passing of libcudf::table.Add nanoarrow as a vendored lib and a script to update it.
Initial step towards addressing #14926
Checklist