Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove DictionaryArray::keys_array #391

Closed
jhorstmann opened this issue Jun 2, 2021 · 4 comments · Fixed by #419
Closed

Remove DictionaryArray::keys_array #391

jhorstmann opened this issue Jun 2, 2021 · 4 comments · Fixed by #419
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@jhorstmann
Copy link
Contributor

The keys_array method is a performance pitfall if used in inner loops like in a group by operation. At the time the method was added, the related keys method was returning an iterator, making it difficult to randomly access elements. Since then the code was refactored so that keys already returns a reference to an Array, which has much less overhead than the code in keys_array.

This means the keys_array method can be removed and its usages directly replaced with keys.

This would be a backwards incompatible change. For the stable release branch we can think of marking the method as deprecated and direct users to keys instead, which might updates to the next major release then less painful.

@jhorstmann jhorstmann added the enhancement Any new improvement worthy of a entry in the changelog label Jun 2, 2021
@jhorstmann
Copy link
Contributor Author

@alamb for the release process would it be best to create two consecutive PRs? The first marking the method as deprecated, which can be cherry picked into the branch, and then a followup one which deletes the method in the master branch. Or should we go with a longer deprecation period?

@alamb
Copy link
Contributor

alamb commented Jun 2, 2021

@jhorstmann my plan was just to release master as arrow-rs-5.0 (next major release) without going through any deprecation period.

I just don't want to push an incompatible API change to a release that cargo will automatically update (and thus break anyone using the arrow 4.x release line without them taking any action).

By releasing as part of 5.0 projects will have to explicitly update to arrow; 5.x in their Cargo.toml files and can handle whatever updates are necessary at that time (and on their schedule)

cc @jorgecarleitao @andygrove @nevi-me @Dandandan

@jorgecarleitao
Copy link
Member

AFAI understand, @jhorstmann 's point is that we can create two commits: a backward compatible one that adds a "deprecated", which can be released in 4.X, and a follow-up one backward incompatible one that removes the method.

This way, we can cherry-pick the first one into 4.X, informing users about the deprecation, and then remove the method in 5.X.

@alamb
Copy link
Contributor

alamb commented Jun 2, 2021

As long as whatever deprecation warning we add does not break CI for people (e.g. who have clippy set to make errors out of warnings) I think a PR to active_release marking the API as being changed would be fine.

I have had the unpleasant experience of my projects breaking due to some new release of a python package being published upstream and my CI picked it up automatically and started failing when we had made no changes to my code.

@alamb alamb added arrow Changes to the arrow crate api-change Changes to the arrow API labels Jul 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants