-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): Implement bindings to IPC writer #586
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Added some minor comments/questions, didn't go through everything, but already posting what I have for today
@@ -542,6 +542,29 @@ def __iter__(self): | |||
"to iterate over elements of this Array" | |||
) | |||
|
|||
def serialize(self, dst=None) -> Union[bytes, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose using bytes | None
is only allowed if not supporting older versions of python?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Optional
is OK but |
is 3.10 or higher (we should be using Optional everywhere but I didn't know about it when I started writing type hints. At some point we can go back and change them all).
python/src/nanoarrow/ipc.py
Outdated
@@ -236,9 +243,198 @@ def __repr__(self) -> str: | |||
return f"<{class_label} <invalid>>" | |||
|
|||
|
|||
class Writer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a class docstring for Writer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, to understand a bit the different classes here. The already existing Stream
object is essentially the "reader" equivalent?
And for this writer, does that only support the stream format or also the file format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stream
is a bit of a compromise...we can't just "subclass a record batch reader" like is done in Arrow C++/Pyarrow, but we need a place to put the code that deals with file IO...so the stream is sort of an internal stepping stone to eliminate anything dealing with files in nanoarrow.array_stream.ArrayStream
. It should have a better name now that its purpose is more clear.
Writer
is definitely a user-facing concept and should probably be a StreamWriter
(since it doesn't do files)
self._writer.write_end_of_stream() | ||
self.release() | ||
|
||
def write_stream(self, obj, schema=None, *, write_schema=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could in theory also be a write_array
(or maybe rather write_batch
) method that takes a single array to write to the stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be! An array can be implicitly converted to a stream and will work here, but I'll add one for symmetry (and some day it might be faster!)
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
This PR implements bindings to the IPC writer in the nanoarrow C library. This adds:
ipc.StreamWriter()
class roughly mirroring pyarrow'sipc.Stream()
Schema.serialize()
andArray.serialize()
to match pyarrow'sserialize()
methods.