-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python] Test failures on 32-bit x86 #40153
Comments
pandas counterpart: pandas-dev/pandas#57523 |
Perhaps you want to submit a PR for this? From what I can tell, this should be mostly a matter of fixing the tests. |
I can try but I can't promise I'll come up with anything that makes sense. I have almost no knowledge of numpy, and none of arrow/pandas, so it'll be all guesswork. |
…t platforms Use `uintptr_t` rather than `intptr_t` to fix `OverflowError`, visible e.g. when running `tests/interchange/test_conversion.py` tests on 32-bit platforms.
Also cc @jorisvandenbossche |
Looking at the other failures:
I haven't figured this out yet. It might be a bug in pandas. FWICS we're constructing two arrays, and they end up having different types inside.
This could be a bug in pyarrow (or arrow itself) — either it didn't build with Large File Support, or it doesn't use correct types somewhere in between.
I think the first one is because |
Update the size assumptions in tests to account for size differences on 32-bit platforms: `pd.object_` is 4 bytes rather than 8 bytes, and `pa.schema` is twice smaller.
Update the size assumptions in tests to account for size differences on 32-bit platforms: `pd.object_` is 4 bytes rather than 8 bytes, and `pa.schema` is twice smaller.
…forms (#40158) Use `uintptr_t` rather than `intptr_t` to fix `OverflowError`, visible e.g. when running `tests/interchange/test_conversion.py` tests on 32-bit platforms. ### Rationale for this change This fixes the `OverflowError`s from #40153, and makes `pyarrow/tests/interchange/` all pass on 32-bit x86. ### What changes are included in this PR? - change the type used to store pointer from `intptr_t` to `uintptr_t` to provide coverage for pointers above `0x80000000`. ### Are these changes tested? These changes are covered by the tests in `pyarrow/tests/interchange`. ### Are there any user-facing changes? It fixes `OverflowError` that can be triggered by working with pandas data types, possibly more (though I'm not sure if this qualifies as a "crash"). * Closes: #40153 Authored-by: Michał Górny <mgorny@gentoo.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
### What changes are included in this PR? Add a Debian-based i386 test build for Python, similar to the existing one for C++. ### Are these changes tested? Yes. The test suite step in the new build will fail until GH-40153 is entirely fixed. ### Are there any user-facing changes? No. * Closes: #40159 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
Should we reopen the bug for the remaining issues or should I file a new one? |
Oops, sorry, I didn't mean to close it. Let's reopen it. |
### Rationale for this change This fixes two tests on 32-bit platforms (tested on x86 specifically). ### What changes are included in this PR? - update the `pd.object_` size assumption to 4 bytes on 32-bit platforms - update the `pa.schema` size assumptions to be twice smaller on 32-bit platforms ### Are these changes tested? The changes fix tests. ### Are there any user-facing changes? Only test fixes. * Closes: #40153 Authored-by: Michał Górny <mgorny@gentoo.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
I've opened #40176 for the large file-related failures. |
By the way, I've gotten a reply that the two cases of |
…t platforms (apache#40158) Use `uintptr_t` rather than `intptr_t` to fix `OverflowError`, visible e.g. when running `tests/interchange/test_conversion.py` tests on 32-bit platforms. ### Rationale for this change This fixes the `OverflowError`s from apache#40153, and makes `pyarrow/tests/interchange/` all pass on 32-bit x86. ### What changes are included in this PR? - change the type used to store pointer from `intptr_t` to `uintptr_t` to provide coverage for pointers above `0x80000000`. ### Are these changes tested? These changes are covered by the tests in `pyarrow/tests/interchange`. ### Are there any user-facing changes? It fixes `OverflowError` that can be triggered by working with pandas data types, possibly more (though I'm not sure if this qualifies as a "crash"). * Closes: apache#40153 Authored-by: Michał Górny <mgorny@gentoo.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…che#40164) ### What changes are included in this PR? Add a Debian-based i386 test build for Python, similar to the existing one for C++. ### Are these changes tested? Yes. The test suite step in the new build will fail until apacheGH-40153 is entirely fixed. ### Are there any user-facing changes? No. * Closes: apache#40159 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…apache#40165) ### Rationale for this change This fixes two tests on 32-bit platforms (tested on x86 specifically). ### What changes are included in this PR? - update the `pd.object_` size assumption to 4 bytes on 32-bit platforms - update the `pa.schema` size assumptions to be twice smaller on 32-bit platforms ### Are these changes tested? The changes fix tests. ### Are there any user-facing changes? Only test fixes. * Closes: apache#40153 Authored-by: Michał Górny <mgorny@gentoo.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…n build (apache#40176) ### Rationale for this change Python large file tests fail on 32-bit platforms. ### What changes are included in this PR? 1. Fix passing `int64_t` position to the Python file methods when a Python file object is wrapped in an Arrow `RandomAccessFile` 2. Disallow creating a `MemoryMappedFile` spanning more than the `size_t` maximum, instead of silently truncating its length ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: apache#40153 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
Thanks a lot! I can confirm that they fix the remaining issues. |
Thank you for putting the time into this. |
### Rationale for this change `Array.to_numpy` calls `np.take` to linearize dictionary arrays. This fails on 32-bit Numpy builds because we give Numpy 64-bit indices and Numpy would like to downcast them. ### What changes are included in this PR? Avoid calling `np.take`, instead using our own dictionary decoding routine. ### Are these changes tested? Yes. A test failure is fixed on 32-bit. ### Are there any user-facing changes? No. * GitHub Issue: #40153 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ms (#40294) ### Rationale for this change `Tensor.__getbuffer__` would silently assume that `Py_ssize_t` is the same width as `int64_t`, which is true only on 64-bit platforms. ### What changes are included in this PR? Create an internal buffer of `Py_ssize_t` values mirroring a Tensor's shape and strides, to avoid relying on the aforementioned assumption. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: #40153 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
### Rationale for this change `test_gdb.py` would fail on 32-bit platforms because the gdb extension errors out when a timestamp value is larger than the platform's time_t. ### What changes are included in this PR? 1. Catch `OverflowError` from the Python datetime module when trying to format a timestamp 2. Tweak the expected test results on 32-bit ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: #40153 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
Issue resolved by pull request 40293 |
…latforms (apache#40294) ### Rationale for this change `Tensor.__getbuffer__` would silently assume that `Py_ssize_t` is the same width as `int64_t`, which is true only on 64-bit platforms. ### What changes are included in this PR? Create an internal buffer of `Py_ssize_t` values mirroring a Tensor's shape and strides, to avoid relying on the aforementioned assumption. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: apache#40153 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
…#40293) ### Rationale for this change `test_gdb.py` would fail on 32-bit platforms because the gdb extension errors out when a timestamp value is larger than the platform's time_t. ### What changes are included in this PR? 1. Catch `OverflowError` from the Python datetime module when trying to format a timestamp 2. Tweak the expected test results on 32-bit ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: apache#40153 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…t platforms (apache#40158) Use `uintptr_t` rather than `intptr_t` to fix `OverflowError`, visible e.g. when running `tests/interchange/test_conversion.py` tests on 32-bit platforms. ### Rationale for this change This fixes the `OverflowError`s from apache#40153, and makes `pyarrow/tests/interchange/` all pass on 32-bit x86. ### What changes are included in this PR? - change the type used to store pointer from `intptr_t` to `uintptr_t` to provide coverage for pointers above `0x80000000`. ### Are these changes tested? These changes are covered by the tests in `pyarrow/tests/interchange`. ### Are there any user-facing changes? It fixes `OverflowError` that can be triggered by working with pandas data types, possibly more (though I'm not sure if this qualifies as a "crash"). * Closes: apache#40153 Authored-by: Michał Górny <mgorny@gentoo.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…che#40164) ### What changes are included in this PR? Add a Debian-based i386 test build for Python, similar to the existing one for C++. ### Are these changes tested? Yes. The test suite step in the new build will fail until apacheGH-40153 is entirely fixed. ### Are there any user-facing changes? No. * Closes: apache#40159 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…apache#40165) ### Rationale for this change This fixes two tests on 32-bit platforms (tested on x86 specifically). ### What changes are included in this PR? - update the `pd.object_` size assumption to 4 bytes on 32-bit platforms - update the `pa.schema` size assumptions to be twice smaller on 32-bit platforms ### Are these changes tested? The changes fix tests. ### Are there any user-facing changes? Only test fixes. * Closes: apache#40153 Authored-by: Michał Górny <mgorny@gentoo.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…n build (apache#40176) ### Rationale for this change Python large file tests fail on 32-bit platforms. ### What changes are included in this PR? 1. Fix passing `int64_t` position to the Python file methods when a Python file object is wrapped in an Arrow `RandomAccessFile` 2. Disallow creating a `MemoryMappedFile` spanning more than the `size_t` maximum, instead of silently truncating its length ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: apache#40153 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…che#40295) ### Rationale for this change `Array.to_numpy` calls `np.take` to linearize dictionary arrays. This fails on 32-bit Numpy builds because we give Numpy 64-bit indices and Numpy would like to downcast them. ### What changes are included in this PR? Avoid calling `np.take`, instead using our own dictionary decoding routine. ### Are these changes tested? Yes. A test failure is fixed on 32-bit. ### Are there any user-facing changes? No. * GitHub Issue: apache#40153 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…latforms (apache#40294) ### Rationale for this change `Tensor.__getbuffer__` would silently assume that `Py_ssize_t` is the same width as `int64_t`, which is true only on 64-bit platforms. ### What changes are included in this PR? Create an internal buffer of `Py_ssize_t` values mirroring a Tensor's shape and strides, to avoid relying on the aforementioned assumption. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: apache#40153 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
…#40293) ### Rationale for this change `test_gdb.py` would fail on 32-bit platforms because the gdb extension errors out when a timestamp value is larger than the platform's time_t. ### What changes are included in this PR? 1. Catch `OverflowError` from the Python datetime module when trying to format a timestamp 2. Tweak the expected test results on 32-bit ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: apache#40153 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
Describe the bug, including details regarding any error messages, version, and platform.
When running the test suite on 32-bit x86, I'm getting the following test failures:
Tracebacks
Full build & test log (2.5M): pyarrow.txt
This is arrow 15.0.0 on Gentoo, with x86 systemd-nspawn container. I've used
-O2 -march=pentium-m -mfpmath=sse -pipe
as compiler flags, to rule out i387-specific issues.Some of these might be problems inside pandas. I'm going to file a bug about the test failures there in a minute, and link it here afterwards.
Component(s)
Python
The text was updated successfully, but these errors were encountered: