GH-40153: [Python] Avoid using np.take in Array.to_numpy() #40295
@github-actions crossbow submit -g python
Revision: 7fc1ec7 Submitted crossbow builds: ursacomputing/crossbow @ actions-a31e58e0d0
@@ -2515,6 +2515,8 @@ Status ConvertChunkedArrayToPandas(const PandasOptions& options,
                                   std::shared_ptr<ChunkedArray> arr, PyObject* py_ref,
                                   PyObject** out) {
  if (options.decode_dictionaries && arr->type()->id() == Type::DICTIONARY) {
    // XXX we should return an error as below if options.zero_copy_only
    // is true, but that would break compatibility with existing tests.
At the moment we essentially just ignore the `zero_copy_only` keyword in case of a DictionaryArray?
Yes.
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 5c4869d. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.
…che#40295)

### Rationale for this change

`Array.to_numpy` calls `np.take` to linearize dictionary arrays. This fails on 32-bit Numpy builds because we give Numpy 64-bit indices and Numpy would like to downcast them.

### What changes are included in this PR?

Avoid calling `np.take`, instead using our own dictionary decoding routine.

### Are these changes tested?

Yes. A test failure is fixed on 32-bit.

### Are there any user-facing changes?

No.

* GitHub Issue: apache#40153

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change

`Array.to_numpy` calls `np.take` to linearize dictionary arrays. This fails on 32-bit Numpy builds because we give Numpy 64-bit indices and Numpy would like to downcast them.

### What changes are included in this PR?

Avoid calling `np.take`, instead using our own dictionary decoding routine.

### Are these changes tested?

Yes. A test failure is fixed on 32-bit.

### Are there any user-facing changes?

No.
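
For readers unfamiliar with the term, here is a minimal sketch of what "dictionary decoding" means. It is plain Python over lists, not the Arrow C++ routine this PR switches to, and the helper name `decode_dictionary` is hypothetical. Each index is replaced by the dictionary value it points to, and nulls are passed through as `None`. Because the indices are ordinary Python integers, nothing has to be downcast to a platform-sized index type along the way.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def decode_dictionary(
    indices: Sequence[Optional[int]], dictionary: Sequence[T]
) -> List[Optional[T]]:
    """Hypothetical helper: expand dictionary-encoded data into plain values.

    `indices` holds positions into `dictionary`; `None` marks a null slot.
    """
    decoded: List[Optional[T]] = []
    for idx in indices:
        if idx is None:
            # A null index decodes to a null value.
            decoded.append(None)
            continue
        if not 0 <= idx < len(dictionary):
            raise IndexError(f"dictionary index {idx} out of bounds")
        decoded.append(dictionary[idx])
    return decoded


# Indices [0, 1, null, 0] over the dictionary ["a", "b"].
print(decode_dictionary([0, 1, None, 0], ["a", "b"]))
```

The decoded result always occupies freshly built storage, which is why decoding a dictionary array cannot be a zero-copy operation.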