Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GetEnumerator #244

Merged
merged 3 commits into from
Jan 14, 2022
Merged

Fix GetEnumerator #244

merged 3 commits into from
Jan 14, 2022

Conversation

philjdf
Copy link
Contributor

@philjdf philjdf commented Jan 11, 2022

Would close #242

@jgiannuzzi
Copy link
Member

Given the ABI change in this PR, and given that 5.0.0-beta3 has now been out and stable for a while, I would like to release 5.0.0 first, and then release 6.0.0-beta1 immediately after with the upgrade to Arrow 6.0.0 and this PR.

@philjdf @adamreeve @adeboyed what are your thoughts on that?

@adeboyed
Copy link
Contributor

adeboyed commented Jan 12, 2022

Could this PR be included in the 5.0.0 release?

Upgrading to Arrow 6 may potentially lead to extended testing to ensure that the new features are supported and any API changes are made.

@jgiannuzzi
Copy link
Member

Upgrading to Arrow 6 may potentially lead to extended testing to ensure that the new features are supported and any API changes are made.

If you are talking about ParquetSharp itself, then there are no API changes stemming from the upgrade to Arrow 6, and no new features. All the work has already been done in #230 and is just waiting for 5.0.0 stable to be released.
Let me know if you meant something else!

Could this PR be included in the 5.0.0 release?

If you mean 5.0.0 stable, then I would be quite reluctant in having an ABI change included without releasing another beta first. We could however include it in another 5.0.0 beta. This would push the work on the 6.0.0 series further however, which I'm not really keen on, especially as #68 and #229 would be fixed by it.

@jgiannuzzi
Copy link
Member

@adeboyed could you please let me know today if my comment above makes sense to you — if yes, I'll go ahead and release 5.0.0 and 6.0.0-beta1 (@philjdf and @adamreeve already agreed offline with this approach)

@adeboyed
Copy link
Contributor

Hi Jonathan, this approach makes sense to me.
As an aside, we should change the comment here:

In the 5.0.X beta versions, reading nested structures was introduced. However, nesting information about nulls is lost when reading columns with Repetition Level optional inside structs with Repetition Level optional. ParquetSharp does not provide information about whether the column or the enclosing struct is null.
to reflect that this logic has made it into a release version.

@jgiannuzzi
Copy link
Member

Good catch @adeboyed! I made that change in PR #246 and released 5.0.0 stable.

@jgiannuzzi
Copy link
Member

The upgrade to Arrow 6.0.1 has also been merged (PR #247).

@jgiannuzzi jgiannuzzi merged commit 43409ec into G-Research:master Jan 14, 2022
@jgiannuzzi
Copy link
Member

ParquetSharp 6.0.1-beta1 has been released with this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ParquetSharp cannot read arrays for large files.
4 participants