GH-35112: [Python] Expose keys_sorted in python MapType #35113
LGTM! I'll let someone with committer rights finalize this review. I believe you can ignore the AppVeyor error, since there have been issues on main with pytest runs timing out.
The binding looks good to me also, +1. Thank you for the contribution! I do have a general, probably silly, question about the keyword. Looking at the C++ and the tests, it is meant as a "metadata" keyword and not a "check" that the data is actually sorted, right? What I mean is, you can have:
>>> ty = pa.map_(pa.string(), pa.int8(), keys_sorted=True)
>>> v = [('b', 2), ('a', 1)]
>>> s = pa.scalar(v, type=ty)
>>> s
<pyarrow.MapScalar: [('b', 2), ('a', 1)]>
And Dane is correct, the failing tests are not connected to this PR.
Good catch, @AlenkaF. Is it possibly a bug that the underlying C++ implementation doesn't maintain a sorted order?
I am not sure. I think the type only has the
So maybe the question is: do we want to add a check for it in PyArrow Array and Scalar?
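As a rough illustration of what such a check could look like, here is a minimal pure-Python sketch. `keys_are_sorted` is a hypothetical helper, not part of PyArrow, and it assumes that "sorted" means non-decreasing key order within a single map value.

```python
from typing import Any, Iterable, Tuple


def keys_are_sorted(entries: Iterable[Tuple[Any, Any]]) -> bool:
    """Hypothetical check: True if the keys of (key, value) pairs are non-decreasing."""
    keys = [key for key, _ in entries]
    # Compare each key with its successor; an empty or single-entry map is sorted.
    return all(a <= b for a, b in zip(keys, keys[1:]))


entries = [("b", 2), ("a", 1)]
print(keys_are_sorted(entries))          # False
print(keys_are_sorted(sorted(entries)))  # True
```

A check like this would have to run over every map value in an array, which is one reason it may be better suited to an explicit validation step than to construction.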
Are we then saying that:
For context, I'm not really using that field. I just need to be able to access it in order to create slightly modified copies of schemas. For example, if I want to change the type of nested fields (int32 -> int64), then I need to make a copy of

> So maybe the question is do we want to add a check for it in PyArrow Array and Scalar?

Maybe we should create a follow-up issue to do this. It would involve making some change that may break some stuff at runtime (if someone was previously providing unsorted data with

As far as this MR is concerned, I think we should just improve the doc for that field (and probably update the doc in the C++ MapType class).
Yeah, the issue of
I agree that for this PR the aim is to expose the parameter as a property so we are able to get the information from the Data Type. My suggestion for the docs would be to only make it explicit in PyArrow, for example: "Should the entries be sorted according to keys."
@AlenkaF thanks for the suggestion, I've updated the comment.
Thanks!
Benchmark runs are scheduled for baseline = 9f852d4 and contender = 1deb740. 1deb740 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have high level of regressions.
…#35113)

### Rationale for this change

It is not possible to read `keys_sorted` in the Python API.

### What changes are included in this PR?

- Expose `keys_sorted` in `cdef class MapType` / types.pxi
- Add tests

### Are these changes tested?

Yes.

### Are there any user-facing changes?

We're exposing `keys_sorted`, but I guess the documentation will update itself from the `"""` pydoc (?)

This is not an API-breaking change.

* Closes: apache#35112

Authored-by: aandres <aandres@tradewelltech.co>
Signed-off-by: Alenka Frim <frim.alenka@gmail.com>