Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38718: [Go][Format][Integration] Add StringView/BinaryView to Go implementation #35769

Merged
merged 38 commits into from
Nov 14, 2023

Conversation

zeroshade
Copy link
Member

@zeroshade zeroshade commented May 25, 2023

Rationale for this change

See #35628 for the rationale and description of the StringView/BinaryView array types.

This change is adding Go as a second implementation of it.

What changes are included in this PR?

Add Array Types for StringView and BinaryView along with StringViewType and BinaryViewType and necessary enums and builders. These arrays can be round tripped through JSON and IPC.

Are these changes tested?

Yes, unit tests have been added and integration tests run

@zeroshade zeroshade requested review from lidavidm and bkietz May 25, 2023 17:58
@github-actions
Copy link

go/arrow/datatype_binary.go Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting committer review Awaiting committer review awaiting changes Awaiting changes labels May 25, 2023
go/arrow/datatype_binary.go Outdated Show resolved Hide resolved
Comment on lines +220 to +93
func (*BinaryViewType) binary() {}
func (*BinaryViewType) view() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convenience interfaces. in datatype.go there's BinaryDataType which defines an interface adding IsUtf8() bool which requires a binary() method so that you can easily write code that works for all of these binary interfaces via type switches or interfaces.

Then I added this view() method for a BinaryViewDataType so you can write code targeting both BinaryView and StringView without needing to duplicate code by just using the interface in the type switch.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels May 26, 2023
@zeroshade zeroshade marked this pull request as ready for review June 15, 2023 16:45
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 15, 2023
@zeroshade
Copy link
Member Author

CC @felipecrv @benibus

Copy link
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting optimization. I'm not entirely sure why these new types of builders/arrays are named XXXView, considering that, from what I understand, they store small strings inline in a fixed width data array and encode the index/offset for longer strings, with the remainder of the string being stored externally. However, I don't have a better name to suggest at the moment :)

In general, as a user of this library, I would find it useful to add more comments to methods whose usage isn't immediately clear from the name.

return &a.values[a.array.data.offset+i]
}

func (a *BinaryView) Value(i int) []byte {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is no compiler guarantee in Go that the slice returned by this method will remain unchanged (whether intentionally or unintentionally) by users of this API, I would suggest adding a comment to this method. This comment should clearly specify that it's unsafe to make any kind of changes to the returned slice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a concern present everywhere in Arrow though, so a comment here could be understood as implying that places without a comment like this allow buffers to be mutated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a reflex from my Rust experience where this slice will be immutable. In the context of Go, I agree with you that sporadically adding a comment might be counterproductive if we do not apply this globally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the top level, there do already exist comments that state that it is intended that all Arrow Arrays be immutable. I agree with the concern that adding a comment here specifically could be counterproductive. If you can think of a good place to put such a comment that would be more universal, I'd be more than happy to do so.

go/arrow/array/binarybuilder.go Show resolved Hide resolved
go/arrow/array/binary.go Show resolved Hide resolved
Copy link
Contributor

@candiduslynx candiduslynx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a first round

dev/archery/archery/integration/datagen.py Outdated Show resolved Hide resolved
format/Schema.fbs Outdated Show resolved Hide resolved
go/arrow/datatype.go Outdated Show resolved Hide resolved
go/arrow/datatype_binary.go Outdated Show resolved Hide resolved
go/arrow/datatype_binary.go Outdated Show resolved Hide resolved
func (sh *StringHeader) InlineData() (data string) {
debug.Assert(sh.IsInline(), "calling InlineData on non-inline StringHeader")

h := (*reflect.StringHeader)(unsafe.Pointer(&data))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any diff with go1.19 implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea. The Len in the StringHeader for tinygo is a uintptr instead of an int which is what go1.19 has. Compare line 33 of each file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how sad, but now I see why it's warranted.
However, a comment to note the diff for those implementations would be nice to have (not required, but nice to have nonetheless).

go/arrow/internal/flatbuf/BinaryView.go Outdated Show resolved Hide resolved
go/arrow/type_traits_string_view.go Outdated Show resolved Hide resolved
go/arrow/array/binary.go Show resolved Hide resolved
go/arrow/array/binary.go Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Jun 21, 2023
Co-authored-by: Alex Shcherbakov <candiduslynx@users.noreply.github.com>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 30, 2023
go/arrow/datatype_viewheader.go Outdated Show resolved Hide resolved
zeroshade and others added 2 commits November 13, 2023 12:01
Co-authored-by: Alex Shcherbakov <candiduslynx@users.noreply.github.com>
go/arrow/array/binary.go Outdated Show resolved Hide resolved
go/arrow/array/binary.go Show resolved Hide resolved
go/arrow/array/binary.go Show resolved Hide resolved
go/arrow/array/bufferbuilder.go Outdated Show resolved Hide resolved
go/arrow/datatype_viewheader.go Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Nov 13, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Nov 13, 2023
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Nov 14, 2023
@bkietz bkietz changed the title GH-35627: [Go][Format][Integration] Add StringView/BinaryView to Go implementation GH-38718: [Go][Format][Integration] Add StringView/BinaryView to Go implementation Nov 14, 2023
@bkietz bkietz merged commit 26149d9 into apache:main Nov 14, 2023
28 of 29 checks passed
@bkietz bkietz removed the awaiting merge Awaiting merge label Nov 14, 2023
@zeroshade zeroshade deleted the string-view branch November 14, 2023 20:55
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 26149d9.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…o Go implementation (apache#35769)

### Rationale for this change
See apache#35628 for the rationale and description of the StringView/BinaryView array types.

This change is adding Go as a second implementation of it.

### What changes are included in this PR?

Add Array Types for `StringView` and `BinaryView` along with `StringViewType` and `BinaryViewType` and necessary enums and builders. These arrays can be round tripped through JSON and IPC.

### Are these changes tested?
Yes, unit tests have been added and integration tests run

* Closes: [apache#38718](apache#38718)
* Closes: apache#38718

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: Alex Shcherbakov <candiduslynx@users.noreply.github.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Go][Integration] Add support for Utf8View to the go implementation
6 participants