Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ArrowBytesViewMap and ArrowBytesViewSet #11421

Closed

Conversation

XiangpengHao
Copy link
Contributor

@XiangpengHao XiangpengHao commented Jul 12, 2024

Which issue does this PR close?

Please wait until apache/arrow-rs#6047 is merged

Part of #11418. We can merge to main, if the string-view is merged to main first.

Rationale for this change

We need them to better support efficient string view aggregate.

What changes are included in this PR?

They are not much different from ArrowBytesMap, except that they are simpler, in terms of hash table entry layout.

This PR only implements the type, but did not use it anywhere, since the PR is already quite large imo.

Are these changes tested?

Are there any user-facing changes?

@XiangpengHao
Copy link
Contributor Author

Please hold for review. Reusing the underlying buffer turns out to be a bad idea because it will prevent the buffers from being released. I think it's better to copy the unique strings to new buffers.

@alamb alamb marked this pull request as draft July 15, 2024 10:51
@alamb
Copy link
Contributor

alamb commented Jul 15, 2024

Thanks @XiangpengHao

I have proposed merging the current string-view branch into main here: #11402

However, if we did that, it would only have access to changes that were released in 51.1.0 (for example, not apache/arrow-rs#6047)

Some options:

  1. Keep the existing string-view branch open for longer (and update it to point at the head of arrow-rs)
  2. Merge string-view to main and make a new branch (string-view2?) with more changes

🤔

@XiangpengHao
Copy link
Contributor Author

I think either works, and we do need a branch that tracks the head of arrow-rs, as I expect a few more changes to arrow will be necessary for us to land string view in DataFusion

@alamb
Copy link
Contributor

alamb commented Jul 15, 2024

I think either works, and we do need a branch that tracks the head of arrow-rs, as I expect a few more changes to arrow will be necessary for us to land string view in DataFusion

My preference then is to make a new branch (mostly to keep the change delta to a minimum). Let me try and get that PR merged shortly and we can make a new string-view2 branch to target

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants