Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove automatic buffering in ipc::reader::FileReader for for consistent buffering #6132

Merged
merged 2 commits into from
Jul 28, 2024

Conversation

V0ldek
Copy link
Contributor

@V0ldek V0ldek commented Jul 26, 2024

Which issue does this PR close?

Closes #6099.

Rationale for this change

Current writer API automatically wraps the supplied std::io::Writer impl into a BufWriter.
It is cleaner and more idiomatic to have the default be using the supplied impl directly, as the user might already have a BufWriter or an impl that doesn't actually benefit from buffering at all.

StreamReader does a similar thing, but it also exposes a try_new_unbuffered that bypasses the internal wrap. These APIs are incosistent between reader/writer and also between Stream and File versions, but there's no strong reason for that.

What changes are included in this PR?

Here we propose a consistent and non-buffered by default API:

  • try_new does not wrap the passed reader/writer,
  • try_new_buffered is a convenience function that does wrap the reader/writer into a BufReader/BufWriter,
  • all four publicly exposed IPC reader/writers follow the above consistently, i.e. StreamReader, FileReader, StreamWriter, FileWriter.

An additional tweak: removed the generic type bounds from struct definitions on the four types, as that is the idiomatic Rust approach (see e.g. stdlib's HashMap that has no bounds on the struct definition, only the impl requires Hash + Eq).

Are there any user-facing changes?

  • [BREAKING] function arrow_ipc::reader::StreamReader<R>::try_new_unbuffered is deprecated to be removed in a future version;
  • [BREAKING] function arrow_ipc::reader::StreamReader<std::io::BufReader<R>>::try_new is renamed to try_new_buffered;
  • function arrow_ipc::reader::FileReader<std::io:::BufReader<R>>::try_new_buffered is added;
  • function arrow_ipc::writer::StreamWriter<std::io::BufWriter<W>>::try_new_buffered is added;
  • function arrow_ipc::writer::FileWriter<std::io::BufWriter<W>>::try_new_buffered is added.

The fact that try_new is now unbuffered is a silent change from the API perspective. A number of rustdocs explicitly referenced the internal buffering and have been reworded accordingly.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 26, 2024
@V0ldek
Copy link
Contributor Author

V0ldek commented Jul 26, 2024

This is a breaking change and should be labelled with api-change, but I can't add it as an external contributor :)

@alamb alamb added the api-change Changes to the arrow API label Jul 27, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @V0ldek -- I think this is a nice improvement. I had some small comment suggestions, but we could potentially do them as follow on PRs

arrow-ipc/src/reader.rs Show resolved Hide resolved
arrow-ipc/src/reader.rs Show resolved Hide resolved
arrow-ipc/src/writer.rs Show resolved Hide resolved
arrow-ipc/src/writer.rs Show resolved Hide resolved
arrow-ipc/src/writer.rs Show resolved Hide resolved
Current writer API automatically wraps the supplied std::io::Writer
impl into a BufWriter.
It is cleaner and more idiomatic to have the default be using the
supplied impl directly, as the user might already have a BufWriter
or an impl that doesn't actually benefit from buffering at all.

StreamReader does a similar thing, but it also exposes a `try_new_unbuffered`
that bypasses the internal wrap.

Here we propose a consistent and non-buffered by default API:
- `try_new` does not wrap the passed reader/writer,
- `try_new_buffered` is a convenience function that does wrap
  the reader/writer into a BufReader/BufWriter,
- all four publicly exposed IPC reader/writers follow the above consistently,
  i.e. `StreamReader`, `FileReader`, `StreamWriter`, `FileWriter`.

Those are breaking changes.

An additional tweak: removed the generic type bounds from struct definitions
on the four types, as that is the idiomatic Rust approach (see e.g. stdlib's
HashMap that has no bounds on the struct definition, only the impl requires
Hash + Eq).

See apache#6099 for the discussion.
Applied a few suggestions, made `Error` sections more consistent.
@V0ldek V0ldek force-pushed the v0ldek/ipc-consistent-reader-writer-api branch from f907e78 to c4f08a9 Compare July 27, 2024 18:43
@alamb alamb changed the title change ipc::reader and writer APIs for consistent buffering Remove automatic buffering in ipc::reader::FileReader for for consistent buffering Jul 28, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @V0ldek -- I think this is a very nice improvement

arrow-ipc/src/reader.rs Show resolved Hide resolved
@alamb alamb merged commit 5f5a82c into apache:master Jul 28, 2024
27 checks passed
@V0ldek V0ldek deleted the v0ldek/ipc-consistent-reader-writer-api branch July 28, 2024 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow flushing or non-buffered writes from arrow::ipc::writer::StreamWriter
2 participants