bug: UnexpectedError in Python Binding When Accessing S3 on aarch64 #5483
Comments
Hi, given that you are using a slim image, have you checked whether the CA certificates are installed? |
Hi Xuanwo, Thank you for your prompt response. I’ve verified that the ca-certificates package is installed:
Package: ca-certificates
Status: install ok installed
Priority: standard
Section: misc
Installed-Size: 384
Maintainer: Julien Cristau <jcristau@debian.org>
Architecture: all
Multi-Arch: foreign
Version: 20230311
Depends: openssl (>= 1.1.1), debconf (>= 0.5) | debconf-2.0
Breaks: ca-certificates-java (<< 20121112+nmu1)
Enhances: openssl
Description: Common CA certificates
Contains the certificate authorities shipped with Mozilla's browser to allow
SSL-based applications to check for the authenticity of SSL connections.
.
Please note that Debian can neither confirm nor deny whether the
certificate authorities whose certificates are included in this package
have in any way been audited for trustworthiness or RFC 3647 compliance.
Full responsibility to assess them belongs to the local system
administrator. |
Thank you for the detailed information. I will try reproducing it and figure it out. |
Hi @tacheng9502, I set up the environment as described in https://github.com/tacheng9502/opendal-docker-s3-bug but was unable to reproduce the issue. :)
docker run --env-file .env opendal-docker-bug
Error: NotFound (permanent) at stat, context: { uri: https://s3.ap-northeast-1.amazonaws.com/xxxxx/path/to/object.jpg, response: Parts { status: 404, version: HTTP/1.1, headers: {"x-amz-request-id": "K15FAYP2PAKA8H9Y", "x-amz-id-2": "fJ8JVJASq11pV9Htu+BsZpf4+/LqefYqJclEBlwF9g/3JjhBgQL/raTF+yiLWNY3cBBgPmJPuVQ=", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Tue, 31 Dec 2024 07:02:51 GMT", "server": "AmazonS3"} }, service: s3, path: path/to/object.jpg }
Could you share the output of?
|
I believe this issue occurs in the Python bindings for:
It does not occur on x86 Linux.
import opendal
kwargs = {
'region': 'us-west-2',
'access_key_id': 'XXXXXXXXXXXXXXXX',
'secret_access_key': 'ZZZZZZZZZZZZZZZZZZZZZZZZ',
'endpoint': 'https://s3.amazonaws.com',
'bucket': 'yyyyyyyyyyyyyyy',
}
op = opendal.Operator(scheme="s3", **kwargs)
res = op.stat(path='requirements.txt').mode.is_file()
print(res)
When running this on Amazon Linux 2023 t4g.medium (Arm64), the same error occurs. In both cases, On the other hand, if I build the Python bindings manually on Arm64 without using pip, it works correctly. I would appreciate it if you could look into this. |
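When comparing a failing pip-installed wheel against a working local build, it helps to record the same environment facts on each machine. The following is a minimal sketch using only the Python standard library; the helper name `describe_environment` is hypothetical and not part of opendal.

```python
import platform
import sys
from importlib import metadata


def describe_environment(package: str = "opendal") -> dict:
    """Collect facts that distinguish an aarch64 install from an x86_64 one."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # the package is not installed in this interpreter
    return {
        "machine": platform.machine(),   # e.g. "aarch64" or "x86_64"
        "system": platform.system(),     # e.g. "Linux"
        "libc": platform.libc_ver(),     # (name, version) tuple, may be empty
        "python": sys.version.split()[0],
        "package_version": version,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this inside the Docker container and on the host, then diffing the two outputs, narrows down whether the architecture, the libc, or the installed wheel differs.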
I will take care of this. |
@kikumoto @tacheng9502 would you mind installing https://opendal-infra-s3.s3.ap-northeast-1.amazonaws.com/opendal-0.45.13-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl and testing it? |
@Zheaoli on Amazon Linux 2023 t4g.medium
The content of demo1.py is the sample code I provided above. Thank you very much. |
OK fine, here's the result https://github.com/apache/opendal/blob/main/.github/workflows/release_python.yml#L66 I think we set I think we can change the environment to
export CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc
export AR_aarch64_unknown_linux_gnu=aarch64-linux-gnu-ar
It will work( |
OK, here's the debug result. I think this may be the final result. First, the ring library requires the Arm C Language Extensions (ACLE). https://github.com/briansmith/ring/blob/main/include/ring-core/arm_arch.h#L78-L84
// We require the ARM assembler provide |__ARM_ARCH| from Arm C Language
// Extensions (ACLE). This is supported in GCC 4.8+ and Clang 3.2+. MSVC does
// not implement ACLE, but we require Clang's assembler on Windows.
#if !defined(__ARM_ARCH)
#error "ARM assembler must define __ARM_ARCH"
#endif
OK, let's take a look at the build container now. We use the GCC in the manylinux-cross image. FYI https://github.com/rust-cross/manylinux-cross/blob/main/manylinux2014/x86_64/Dockerfile#L40 and https://github.com/crosstool-ng/crosstool-ng/archive/02d1503f6769be4ad8058b393d4245febced459f.tar.gz
I see someone saying that we can use
#ifndef __ARM_ARCH
# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
|| defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
|| defined(__ARM_ARCH_7EM__)
# define __ARM_ARCH 7
# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
|| defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) \
|| defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) \
|| defined(__ARM_ARCH_6M__)
# define __ARM_ARCH 6
# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5T__) \
|| defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) \
|| defined(__ARM_ARCH_5TEJ__)
# define __ARM_ARCH 5
# else
# define __ARM_ARCH 4
# endif
#endif
So if we pass an env like So, I think we may have two choices
|
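Whether a given cross compiler predefines `__ARM_ARCH` can be checked directly rather than inferred from its version. The sketch below assumes a GCC- or Clang-compatible driver that accepts `-dM -E` to dump predefined macros; the helper names are hypothetical, and `aarch64-linux-gnu-gcc` is simply the compiler name mentioned earlier in this thread.

```python
import subprocess


def parse_macros(dump: str) -> dict:
    """Parse `#define NAME VALUE` lines as printed by `cc -dM -E`."""
    macros = {}
    for line in dump.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            # Macros defined without a value map to an empty string.
            macros[parts[1]] = parts[2] if len(parts) == 3 else ""
    return macros


def predefined_macros(compiler: str) -> dict:
    """Ask the compiler for its predefined macros by preprocessing empty input."""
    result = subprocess.run(
        [compiler, "-dM", "-E", "-x", "c", "-"],
        input="",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_macros(result.stdout)


def defines_arm_arch(macros: dict) -> bool:
    """True if the ACLE macro that ring's arm_arch.h checks for is present."""
    return "__ARM_ARCH" in macros
```

For example, `defines_arm_arch(predefined_macros("aarch64-linux-gnu-gcc"))` run inside the build container would show whether that toolchain satisfies the `#if !defined(__ARM_ARCH)` check quoted above.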
Hi, @Zheaoli, thank you so much for the debugging! Upgrading the Python build image to manylinux_2_28 might introduce unexpected breaking changes, so I’m more interested in a solution that works with manylinux_2_17 instead. Could you provide further explanation on why setting |
Upgrade the GCC from 4.8.5 to 11.x |
Will this affect the glibc version we are using? |
Or perhaps it's a good time for us to upgrade to |
There is no guarantee that a system GCC 11 (cross) compiler will be compatible with a glibc version lower than the system glibc version, so it is very likely to affect the glibc version. The current aarch64 manylinux cross Docker image uses GCC 4.8.5 because I don't know how to build a Red Hat devtoolset GCC cross compiler for aarch64. In pypa/manylinux, the x86_64 version uses devtoolset GCC to achieve manylinux compatibility with a newer GCC version, so in theory you can also do that when cross compiling, as long as you can build a devtoolset-like GCC toolchain. https://git.centos.org/rpms/devtoolset-10/tree/c7 Unfortunately I have no idea how to build one that can compile to aarch64 on x86_64; see also pypa/manylinux#1012. |
Thank you @messense for providing more information. Cc everyone involved in this issue: @kikumoto, @Zheaoli and @tacheng9502 I have initiated a discussion about upgrading our toolchain to manylinux 2.28. Please share your thoughts and cast your vote there: #5521 |
We had the exact same issue in Daft on aarch64 + linux! The fix that worked for us was the following
which let us cross compile aarch64 on x64 for 2.24! Thanks @messense for maintaining these images! link to our yaml: https://github.com/Eventual-Inc/Daft/blob/main/.github/workflows/python-publish.yml#L78 |
Also, if you are running on Red Hat-based Linux + aarch64 and are using jemalloc, you also have to bump the page size up to 2^16. You can do that via:
|
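To check whether a host is affected, the kernel page size can be read at run time and converted to the base-2 logarithm that "2^16" refers to. This is only a diagnostic sketch using the standard library; it does not reproduce the build setting from the comment above, and the helper names are hypothetical.

```python
import os


def lg_page(page_size: int) -> int:
    """Return log2 of a page size, rejecting values that are not powers of two."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError(f"page size must be a positive power of two: {page_size}")
    return page_size.bit_length() - 1


def runtime_page_size() -> int:
    """Page size in bytes as reported by the running kernel (Unix only)."""
    return os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__":
    size = runtime_page_size()
    print(f"page size: {size} bytes (2^{lg_page(size)})")
```

A host reporting 65536 bytes uses 64 KiB pages, which is the 2^16 case described above; a host reporting 4096 bytes uses the more common 4 KiB pages.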
FYI, manylinux_2_24 is deprecated, so these images are not updated anymore. |
@messense Unfortunately we see many users on versions of Amazon Linux 2 running glibc 2.26. For many of these folks upgrading the AMI is out of scope for their organization. |
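The glibc constraint discussed here follows from how manylinux platform tags encode a minimum glibc version. The sketch below is a simplified illustration, not pip's actual resolver: it ignores the architecture part of the tag and only compares glibc versions, and the helper names are hypothetical. The legacy alias versions follow the manylinux PEPs.

```python
import re

# manylinux_<glibc major>_<glibc minor>_<arch>
_TAG = re.compile(r"manylinux_(\d+)_(\d+)_(\w+)")

# Legacy aliases and the glibc versions they correspond to.
_LEGACY = {"manylinux1": (2, 5), "manylinux2010": (2, 12), "manylinux2014": (2, 17)}


def required_glibc(platform_tag: str) -> tuple:
    """Return the minimum glibc (major, minor) implied by a manylinux tag."""
    match = _TAG.fullmatch(platform_tag)
    if match:
        return int(match.group(1)), int(match.group(2))
    for legacy, version in _LEGACY.items():
        if platform_tag.startswith(legacy + "_"):
            return version
    raise ValueError(f"not a manylinux platform tag: {platform_tag}")


def is_installable(platform_tag: str, system_glibc: tuple) -> bool:
    """True if the system glibc is at least what the tag requires."""
    return required_glibc(platform_tag) <= system_glibc


# Amazon Linux 2 ships glibc 2.26, per the comment above.
print(is_installable("manylinux_2_17_aarch64", (2, 26)))
print(is_installable("manylinux_2_28_aarch64", (2, 26)))
```

Under this simplified model, a wheel tagged for glibc 2.17 or 2.24 is acceptable on a glibc 2.26 system, while one tagged for 2.28 is not, which is why moving the build image forward affects these users.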
Yes, I am trying to build a newer GCC version on manylinux2014. |
Hi, @tacheng9502 and @kikumoto, could you help verify whether https://test.pypi.org/project/opendal/0.45.14/#files has resolved your issue?
pip install --index-url https://test.pypi.org/simple/ opendal |
@Xuanwo on Amazon Linux 2023 t4g.medium
Thank you very much. |
Bug Description
I encountered an error when using the Python binding within a Docker container. The error occurs while attempting to interact with an S3 bucket. The error message indicates an issue with sending an HTTP request.
Error Message
Reproducible Repository
https://github.com/tacheng9502/opendal-docker-s3-bug
Environment Details
OpenDAL Version: 0.45.13
Python Version: 3.10.16
Docker Base Image: python:3.10-slim, python:3.10, public.ecr.aws/lambda/python:3.10
Additional Context
The issue happens in both the Python slim/full image and AWS Lambda image. However, the same operation works fine outside the Docker container, so this issue seems to be specific to the Docker environment. I’ve verified that the S3 credentials and permissions are correctly set up.