-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): basic export through PyCapsules #320
feat(python): basic export through PyCapsules #320
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #320 +/- ##
==========================================
+ Coverage 86.99% 89.10% +2.11%
==========================================
Files 71 4 -67
Lines 10318 101 -10217
==========================================
- Hits 8976 90 -8886
+ Misses 1342 11 -1331 ☔ View full report in Codecov by Sentry. |
python/src/nanoarrow/_lib.pyx
Outdated
cdef ArrowSchema* c_schema_out = <ArrowSchema*> malloc(sizeof(ArrowSchema)) | ||
c_schema_out.release = NULL | ||
ArrowSchemaDeepCopy(self._ptr, c_schema_out) | ||
|
||
return PyCapsule_New(c_schema_out, 'arrow_schema', &pycapsule_schema_deleter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might want to reorder slightly for the (unlikely) case where ArrowDeepCopy() != NANOARROW_OK
and/or a future contributor adds code that returns or raises an exception.
malloc(sizeof ArrowSchema)
.release = NULL
PyCapsule_New() + set deleter
result = ArrowSchemaDeepCopy() +
Error.raise()if not
NANOARROW_OK`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I haven't really been handling errors.
""" | ||
if requested_schema is not None: | ||
raise NotImplementedError("requested_schema") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably self._assert_valid()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and we should do that for all three, and will add tests for it as well
python/src/nanoarrow/_lib.pyx
Outdated
# counted and can be released separately | ||
|
||
# shallow copy | ||
cdef ArrowArray* c_array_out = <ArrowArray*> malloc(sizeof(ArrowArray)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be clearest to create the capsule directly after allocating the array and setting release = NULL
(for those of us that aren't as good at knowing what Cython code can and cannot throw exceptions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(because if somebody added code that threw an exception before the capsule was created, the malloc'ed memory here will leak)
python/tests/test_capsules.py
Outdated
@@ -54,6 +54,13 @@ def test_schema_import(): | |||
assert schema.format == "+s" | |||
assert schema._to_string(recursive=True) == "struct<some_name: int32>" | |||
|
|||
# roundtrip | |||
pa_schema2 = pa.schema(schema) | |||
pa_schema2.equals(pa_schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pa_schema2.equals(pa_schema) | |
assert pa_schema2.equals(pa_schema) |
cdef ArrowArrayStream* c_stream_out = <ArrowArrayStream*> malloc( | ||
sizeof(ArrowArrayStream) | ||
) | ||
memcpy(c_stream_out, self._ptr, sizeof(ArrowArrayStream)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it seems trivial, but I think that the allocate -> set release to NULL -> create capsule with deleter is safest (i.e., minimize the number of lines where added code that throws an exception or returns will result in a leak).
Thanks for the feedback! I should have paid more attention to the implementation in pyarrow, because we did something similar there to as early as possible create the capsule, for exactly the reasons that you commented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thank you!
Follow-up on #318
Exports the different objects using different mechanism: