Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Improve syntax for implementing ArrowArrayStream from C++ #336

Merged
merged 8 commits into from
Dec 19, 2023

Conversation

paleolimbot
Copy link
Member

An early commit implemented nanoarrow::EmptyArrayStream and nanoarrow::VectorArrayStream in the nanoarrow.hpp C++ helpers. The intention at the time was to make it "easy"/idiomatic to use a C++ object to back an ArrowArrayStream; however, it was in practice difficult to actually make work (I tried briefly and gave up when writing a dummy ADBC driver for this blog post.

This PR deprecates the original syntax and migrates it to the following.

Implementation:

class StreamImpl {
 public:
  // Public methods (e.g., constructor) used from C++ to initialize relevant data

  // Idiomatic exporter to move data + lifecycle responsibility to an instance
  // managed by the ArrowArrayStream callbacks
  void ToArrayStream(struct ArrowArrayStream* out) {
    ArrayStreamFactory<StreamImpl>::InitArrayStream(new StreamImpl(...), out);
  }

 private:
  // Make relevant methods available to the ArrayStreamFactory
  friend class ArrayStreamFactory<StreamImpl>;

  // Method implementations (called from C, not normally interacted with from C++)
  int GetSchema(struct ArrowSchema* schema) { return ENOTSUP; }
  int GetNext(struct ArrowArray* array) { return ENOTSUP; }
  const char* GetLastError() { nullptr; }
};

Usage:

// Call constructor and/or public methods to initialize relevant data
StreamImpl impl;

// Export to ArrowArrayStream after data are finalized
UniqueArrayStream stream;
impl.ToArrayStream(stream.get());

I'm open to suggestions on how to make that better! It might be that the ToArrayStream() bit is confusing (i.e., just use ArrayStreamFactory<>::InitArrayStream(new StreamImpl()) directory) but it also seemed better to keep the lines where a raw pointer was floating around to be entirely contained within the StreamImpl class.

It also fixes an issue with the XXX_pointer() functions, which because of the way they were declared, required that nanoarrow.hpp be confusingly included after nanoarrow_ipc.hpp. The correct way to do this (I think) was to declare the template and add implementations (rather than use overloads).

@codecov-commenter
Copy link

codecov-commenter commented Dec 13, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ef92588) 88.04% compared to head (912c9df) 88.07%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #336      +/-   ##
==========================================
+ Coverage   88.04%   88.07%   +0.02%     
==========================================
  Files          72       72              
  Lines       11370    11390      +20     
==========================================
+ Hits        10011    10032      +21     
+ Misses       1359     1358       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@paleolimbot paleolimbot marked this pull request as ready for review December 13, 2023 19:17
@paleolimbot paleolimbot merged commit 3e02822 into apache:main Dec 19, 2023
27 checks passed
@paleolimbot paleolimbot deleted the hpp-better branch December 19, 2023 16:41
@paleolimbot paleolimbot added this to the nanoarrow 0.4.0 milestone Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants