
Document ChunkReader (#4118) #4147

Closed · wants to merge 2 commits

Conversation

tustvold (Contributor):

Which issue does this PR close?

Closes #4118

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

pub trait ChunkReader: Length + Send + Sync {
    type T: Read + Send;

    /// Get a serially readable slice of the current reader
    /// This should fail if the slice exceeds the current bounds
tustvold (Contributor, Author):
This isn't actually true; the FileSource implementation doesn't do this.

github-actions bot added the parquet (Changes to the parquet crate) label — Apr 27, 2023
Comment on lines 61 to 62
/// Systems looking to mask high-IO latency through prefetching, such as encountered with
/// object storage, should consider fetching the relevant byte ranges into [`Bytes`]
Member:

Meaning that get_read could possibly read more than length bytes?

tustvold (Contributor, Author) — Apr 27, 2023:

I meant more that this should be handled outside of get_read, i.e. don't use ChunkReader for these use-cases 😅

Will see if I can't clarify the wording tomorrow

tustvold (Contributor, Author):

Done, PTAL

Comment on lines +61 to +64
/// Systems looking to mask high-IO latency through prefetching, such as encountered with
/// object storage, should consider fetching the relevant parts of the file into
/// [`Bytes`], and then feeding this into the synchronous APIs, instead of implementing
/// [`ChunkReader`] directly. Arrow users can make use of the [async_reader] which
Member:

Hmm, let me try to clarify this. So this basically means that systems that want prefetching should not rely on this "characteristic" of ChunkReader to implement it, but should instead handle prefetching themselves on top of the synchronous APIs?

tustvold (Contributor, Author) — Apr 28, 2023:

Pretty much, if you want prefetching you can either use async_reader which does it for you, or implement something yourself 😄

One could reasonably ask why ChunkReader exists at all then, to which the answer is that I've not removed it yet (#1163) 😆
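
As a rough sketch of the Bytes route described in the doc comment above (the file name and the prefetch step below are placeholders; in practice the fetch would typically be one or more object-store range requests), the already-buffered data can be handed straight to the synchronous reader, because Bytes itself implements ChunkReader:

```rust
use bytes::Bytes;
use parquet::errors::Result;
use parquet::file::reader::FileReader;
use parquet::file::serialized_reader::SerializedFileReader;

/// Placeholder for the prefetch step, e.g. an object-store GET of the
/// byte ranges the reader will need (here we just read a local file).
fn prefetch() -> std::io::Result<Bytes> {
    Ok(Bytes::from(std::fs::read("data.parquet")?))
}

fn main() -> Result<()> {
    // All IO (and whatever prefetching strategy you like) happens up front...
    let data = prefetch().expect("prefetch failed");

    // ...and the synchronous reader then operates purely on the in-memory
    // buffer, since Bytes implements ChunkReader.
    let reader = SerializedFileReader::new(data)?;
    println!("num rows: {}", reader.metadata().file_metadata().num_rows());
    Ok(())
}
```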

tustvold (Contributor, Author):

Perhaps we should just remove the length parameter? What do you think?

Member:

Yeah, the length parameter is a bit confusing. Other than serving as a hint, it doesn't look like it does much here.
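
For reference, an illustrative sketch (not an actual diff; the exact signatures are assumptions) of what dropping the length hint from get_read could look like, while get_bytes, whose callers know exactly how many bytes they need, keeps an explicit length:

```rust
use std::io::Read;

use bytes::Bytes;
use parquet::errors::Result;
use parquet::file::reader::Length;

/// Illustrative only: the ChunkReader trait with the length hint removed
/// from get_read.
pub trait ChunkReader: Length + Send + Sync {
    type T: Read + Send;

    /// Get a [`Read`] starting at the provided offset; callers read as much
    /// as they need rather than declaring a length up front.
    fn get_read(&self, start: u64) -> Result<Self::T>;

    /// Read `length` bytes starting at `start` into [`Bytes`].
    fn get_bytes(&self, start: u64, length: usize) -> Result<Bytes>;
}
```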

tustvold marked this pull request as draft — April 28, 2023 19:24
tustvold (Contributor, Author):

Closing in favour of #4156

tustvold closed this — Apr 28, 2023
Labels: parquet (Changes to the parquet crate)
Linked issue (may be closed by this PR): Improve Documentation of Parquet ChunkReader (#4118)
2 participants