Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37582: [Go][Parquet] Implement Float16 logical type #37599

Merged
merged 11 commits into from
Nov 13, 2023

Conversation

benibus
Copy link
Collaborator

@benibus benibus commented Sep 6, 2023

Rationale for this change

There is an active proposal for a Float16 logical type in Parquet (apache/parquet-format#184) with C++/Python implementations in progress (#36073), so we should add one for Go as well.

What changes are included in this PR?

  • Adds LogicalType definitions and methods for Float16
  • Adds support for Float16 column statistics and comparators
  • Adds support for interchange between Parquet and Arrow's half-precision float

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

@github-actions
Copy link

github-actions bot commented Sep 6, 2023

⚠️ GitHub issue #37582 has been automatically assigned in GitHub to PR creator.

Copy link
Collaborator Author

@benibus benibus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should cover the base requirements, but LMK if I missed anything. There's a bit of weird special-casing involved with the column statistics since float16 requires an independent Statistics type despite not being physical - but maybe there's a better way.

go/parquet/metadata/statistics_types.tmpldata Outdated Show resolved Hide resolved
for {
f16 := float16.FromBits(uint16(r.Uint64n(math.MaxUint16 + 1)))
f64 := float64(f16.Float32())
if !math.IsNaN(f64) && !math.IsInf(f64, 0) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other randFloat functions don't exclude infinities. Should we change that? The checks for approximate equality will fail when comparing infinities of the same sign.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make more sense to leave the NaN's and infinities to ensure we're covering those cases for testing. Alternately we could just make sure we test those cases separately

@benibus benibus marked this pull request as ready for review September 14, 2023 18:21
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Sep 14, 2023
Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great, just a few nitpicks.

go/arrow/float16/float16.go Show resolved Hide resolved
go/parquet/file/column_writer_types.gen.go.tmpl Outdated Show resolved Hide resolved
go/parquet/file/column_writer_types.gen.go.tmpl Outdated Show resolved Hide resolved
for {
f16 := float16.FromBits(uint16(r.Uint64n(math.MaxUint16 + 1)))
f64 := float64(f16.Float32())
if !math.IsNaN(f64) && !math.IsInf(f64, 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make more sense to leave the NaN's and infinities to ensure we're covering those cases for testing. Alternately we could just make sure we test those cases separately

go/arrow/float16/float16.go Show resolved Hide resolved
go/parquet/schema/reflection_test.go Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Sep 27, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 18, 2023
@benibus
Copy link
Collaborator Author

benibus commented Oct 18, 2023

Note that the updated parquet.thrift file that this uses is currently contained in the C++ PR #36073, which will need to be synced with apache/parquet-format#184 once it's finalized.

So, fair warning: regenerating the thrift files from this branch alone will basically break everything (for now).

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Oct 18, 2023
@zeroshade
Copy link
Member

Overall this is looking good to me, though I won't merge this until after the format PR with the thrift changes gets finalized and merged.

@zeroshade
Copy link
Member

@benibus the fix for the failing CIs has been merged, can you rebase and update this so that we can get a clean CI on it? I believe the float16 format has been merged so once you rebase and we have all green i'll be happy to merge this!

@benibus
Copy link
Collaborator Author

benibus commented Oct 31, 2023

Rebased. A couple CI failures still, but they look unrelated.

@zeroshade
Copy link
Member

Can you please resolve the conflicts (and potentially rebase if necessary)? We can check the CI afterwards

@benibus
Copy link
Collaborator Author

benibus commented Nov 13, 2023

Alright, I've fixing the conflicts and rebased again. It looks like everything's green now.

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zeroshade zeroshade merged commit bff5fb9 into apache:main Nov 13, 2023
24 checks passed
@zeroshade zeroshade removed the awaiting change review Awaiting change review label Nov 13, 2023
@github-actions github-actions bot added the awaiting merge Awaiting merge label Nov 13, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…37599)

### Rationale for this change

There is an active proposal for a Float16 logical type in Parquet (apache/parquet-format#184) with C++/Python implementations in progress (apache#36073), so we should add one for Go as well.

### What changes are included in this PR?

- [x] Adds `LogicalType` definitions and methods for `Float16`
- [x] Adds support for `Float16` column statistics and comparators
- [x] Adds support for interchange between Parquet and Arrow's half-precision float

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes

* Closes: apache#37582

Authored-by: benibus <bpharks@gmx.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit bff5fb9.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…37599)

### Rationale for this change

There is an active proposal for a Float16 logical type in Parquet (apache/parquet-format#184) with C++/Python implementations in progress (apache#36073), so we should add one for Go as well.

### What changes are included in this PR?

- [x] Adds `LogicalType` definitions and methods for `Float16`
- [x] Adds support for `Float16` column statistics and comparators
- [x] Adds support for interchange between Parquet and Arrow's half-precision float

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes

* Closes: apache#37582

Authored-by: benibus <bpharks@gmx.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Go][Parquet] Implement Float16 logical type
2 participants