bug: UnexpectedError in Python Binding When Accessing S3 on aarch64 #5483

Closed
Tracked by #5515
tacheng9502 opened this issue Dec 31, 2024 · 24 comments · Fixed by #5522
@tacheng9502

tacheng9502 commented Dec 31, 2024

Bug Description

I encountered an error when using the Python binding within a Docker container. The error occurs while attempting to interact with an S3 bucket. The error message indicates an issue with sending an HTTP request.

Error Message

opendal.UnexpectedError: Unexpected (temporary) at stat, context: { url: https://s3.my-region.amazonaws.com/mybucket/path/to/object.jpg, called: http_util::Client::send, service: s3, path: path/to/object.jpg } => send http request, source: error sending request for url (https://s3.my-region.amazonaws.com/mybucket/path/to/object.jpg)

Reproducible Repository

https://github.com/tacheng9502/opendal-docker-s3-bug

Environment Details

OpenDAL Version: 0.45.13
Python Version: 3.10.16
Docker Base Image: python:3.10-slim, python:3.10, public.ecr.aws/lambda/python:3.10

Additional Context

The issue happens in both the Python slim/full image and AWS Lambda image. However, the same operation works fine outside the Docker container, so this issue seems to be specific to the Docker environment. I’ve verified that the S3 credentials and permissions are correctly set up.

@tacheng9502 tacheng9502 changed the title Error in Python Binding When Accessing S3 from Docker bug: Error in Python Binding When Accessing S3 from Docker Dec 31, 2024
@tacheng9502 tacheng9502 changed the title bug: Error in Python Binding When Accessing S3 from Docker bug: UnexpectedError in Python Binding When Accessing S3 from Docker Dec 31, 2024
@Xuanwo
Member

Xuanwo commented Dec 31, 2024

Hi, given that you are using a slim image, have you checked whether CA certificates are installed?

@tacheng9502
Author

tacheng9502 commented Dec 31, 2024

> Hi, given that you are using a slim image, have you checked whether CA certificates are installed?

Hi Xuanwo,

Thank you for your prompt response.

I’ve verified that the ca-certificates package is installed in the Python slim image, and I also tested with the full base image. Unfortunately, I’m still encountering the same issue. The same issue also happened in the AWS Lambda image - public.ecr.aws/lambda/python:3.10.

Package: ca-certificates
Status: install ok installed
Priority: standard
Section: misc
Installed-Size: 384
Maintainer: Julien Cristau <jcristau@debian.org>
Architecture: all
Multi-Arch: foreign
Version: 20230311
Depends: openssl (>= 1.1.1), debconf (>= 0.5) | debconf-2.0
Breaks: ca-certificates-java (<< 20121112+nmu1)
Enhances: openssl
Description: Common CA certificates
 Contains the certificate authorities shipped with Mozilla's browser to allow
 SSL-based applications to check for the authenticity of SSL connections.
 .
 Please note that Debian can neither confirm nor deny whether the
 certificate authorities whose certificates are included in this package
 have in any way been audited for trustworthiness or RFC 3647 compliance.
 Full responsibility to assess them belongs to the local system
 administrator.
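
The same check can be made from Python inside the container. This is a minimal sketch using only the standard library `ssl` module; it reports where Python's OpenSSL build looks for CA certificates by default. It is an assumption that these paths matter here at all, since OpenDAL's HTTP client may use its own TLS stack and root store, so treat the result as a hint rather than a diagnosis. The function name `ca_bundle_report` is hypothetical.

```python
import os
import ssl


def ca_bundle_report() -> dict:
    """Report the CA file and directory that Python's ssl module uses by default."""
    paths = ssl.get_default_verify_paths()
    # `cafile`/`capath` are None when the resolved location does not exist,
    # so fall back to the compiled-in OpenSSL defaults for reporting.
    cafile = paths.cafile or paths.openssl_cafile
    capath = paths.capath or paths.openssl_capath
    return {
        "cafile": cafile,
        "cafile_exists": bool(cafile) and os.path.isfile(cafile),
        "capath": capath,
        "capath_exists": bool(capath) and os.path.isdir(capath),
    }


if __name__ == "__main__":
    for key, value in ca_bundle_report().items():
        print(f"{key}: {value}")
```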

@Xuanwo
Member

Xuanwo commented Dec 31, 2024

Thank you for the detailed information. I will try reproducing it and figure it out.

@Xuanwo
Member

Xuanwo commented Dec 31, 2024

Hi @tacheng9502, I set up the environment as described in https://github.com/tacheng9502/opendal-docker-s3-bug but was unable to reproduce the issue.

:) docker run --env-file .env opendal-docker-bug
Error: NotFound (permanent) at stat, context: { uri: https://s3.ap-northeast-1.amazonaws.com/xxxxx/path/to/object.jpg, response: Parts { status: 404, version: HTTP/1.1, headers: {"x-amz-request-id": "K15FAYP2PAKA8H9Y", "x-amz-id-2": "fJ8JVJASq11pV9Htu+BsZpf4+/LqefYqJclEBlwF9g/3JjhBgQL/raTF+yiLWNY3cBBgPmJPuVQ=", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Tue, 31 Dec 2024 07:02:51 GMT", "server": "AmazonS3"} }, service: s3, path: path/to/object.jpg }

Could you share the output of the following command?

curl -v https://s3.region.amazonaws.com/xxxxx/path/to/object.jpg
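
Note the difference between the two messages in this thread: the one above contains an HTTP response (status 404), so the request reached S3 and TLS worked, while the original report failed at "send http request" before any response arrived. A minimal sketch for telling the two apart from the message text follows. The message layout is inferred only from the two messages quoted here and is not a stable format; `parse_opendal_message` and `reached_server` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# "<Kind> (<status>) at <operation>", as seen in the two messages in this thread.
_HEAD = re.compile(r"(?P<kind>\w+) \((?P<status>\w+)\) at (?P<op>\w+)")
# "status: 404" appears only when an HTTP response was actually received.
_HTTP_STATUS = re.compile(r"status: (?P<code>\d{3})")


class ParsedError(NamedTuple):
    kind: str
    status: str
    operation: str
    http_status: Optional[int]


def parse_opendal_message(message: str) -> Optional[ParsedError]:
    """Extract kind, status, operation and HTTP status (if any) from a message."""
    head = _HEAD.search(message)
    if head is None:
        return None
    http = _HTTP_STATUS.search(message)
    code = int(http["code"]) if http else None
    return ParsedError(head["kind"], head["status"], head["op"], code)


def reached_server(message: str) -> bool:
    """True when the message carries an HTTP status, i.e. the server answered."""
    parsed = parse_opendal_message(message)
    return parsed is not None and parsed.http_status is not None
```

A message without an HTTP status points at the transport layer (DNS, TCP, TLS) rather than at credentials or the object path.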

@kikumoto

kikumoto commented Jan 5, 2025

@Xuanwo

I believe this issue occurs in the Python bindings for:

  • Linux
  • aarch64

It does not occur on x86 Linux.

import opendal

kwargs = {
    'region': 'us-west-2',
    'access_key_id': 'XXXXXXXXXXXXXXXX',
    'secret_access_key': 'ZZZZZZZZZZZZZZZZZZZZZZZZ',
    'endpoint': 'https://s3.amazonaws.com',
    'bucket': 'yyyyyyyyyyyyyyy',
}

op = opendal.Operator(scheme="s3", **kwargs)
res = op.stat(path='requirements.txt').mode.is_file()
print(res)

When running this on Amazon Linux 2023 t4g.medium (Arm64), the same error occurs.
However, it works correctly on t3.medium (x86).

In both cases, opendal was installed via pip.

On the other hand, if I build the Python bindings manually on Arm64 without using pip, it works correctly.
From this, I suspect there might be an issue in the cross-compilation process for aarch64 Linux in the Python bindings.

I would appreciate it if you could look into this.
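
For anyone adding a data point, a small sketch like the following prints the platform details that matter for this report. It uses only the standard library `platform` module; `platform_report` and `is_linux_aarch64` are hypothetical helper names, and treating `arm64` as equivalent to `aarch64` is an assumption about how some systems name the architecture.

```python
import platform


def platform_report() -> dict:
    """Collect OS, CPU architecture, Python version and libc version."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "libc": f"{libc_name} {libc_version}".strip(),
    }


def is_linux_aarch64(report: dict) -> bool:
    """True for the combination on which this issue was observed."""
    return report["system"] == "Linux" and report["machine"] in ("aarch64", "arm64")


if __name__ == "__main__":
    report = platform_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    print("linux + aarch64:", is_linux_aarch64(report))
```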

@Xuanwo
Member

Xuanwo commented Jan 6, 2025

Cc @Zheaoli & @messense, do you have any ideas?

@Zheaoli
Member

Zheaoli commented Jan 6, 2025

I will handle this.

@Zheaoli
Member

Zheaoli commented Jan 6, 2025

@kikumoto

kikumoto commented Jan 6, 2025

@Zheaoli
Thank you for your quick response.
I tried it, and it worked fine.

on Amazon Linux 2023 t4g.medium

$ python -m venv venv2
$ . venv2/bin/activate
$ pip install -U pip

$ pip list
Package Version
------- -------
pip     24.3.1

$ python -V
Python 3.12.4

$ pip install https://opendal-infra-s3.s3.ap-northeast-1.amazonaws.com/opendal-0.45.13-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Collecting opendal==0.45.13
  Downloading https://opendal-infra-s3.s3.ap-northeast-1.amazonaws.com/opendal-0.45.13-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (16.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.0/16.0 MB 13.1 MB/s eta 0:00:00
Installing collected packages: opendal
Successfully installed opendal-0.45.13

$ pip list
Package Version
------- -------
opendal 0.45.13
pip     24.3.1

$ python demo1.py
True

The content of demo1.py is the sample code I provided above.

Thank you very much.

@Zheaoli
Member

Zheaoli commented Jan 6, 2025

OK fine, here's the result

https://github.com/apache/opendal/blob/main/.github/workflows/release_python.yml#L66

We currently set -D__ARM_ARCH=8 there. I think this causes problems for some algorithms, and the ring author does not recommend it: briansmith/ring#1728 (comment)

I think we can change the environment to

export CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc
export AR_aarch64_unknown_linux_gnu=aarch64-linux-gnu-ar

That should work.

@Zheaoli
Member

Zheaoli commented Jan 7, 2025

OK, here are the debugging results. I think this is the final conclusion.

First, the ring library requires the Arm C Language Extensions (ACLE): https://github.com/briansmith/ring/blob/main/include/ring-core/arm_arch.h#L78-L84

// We require the ARM assembler provide |__ARM_ARCH| from Arm C Language
// Extensions (ACLE). This is supported in GCC 4.8+ and Clang 3.2+. MSVC does
// not implement ACLE, but we require Clang's assembler on Windows.
#if !defined(__ARM_ARCH)
#error "ARM assembler must define __ARM_ARCH"
#endif

Now let's look at the build container. We use the GCC from the manylinux-cross image; see https://github.com/rust-cross/manylinux-cross/blob/main/manylinux2014/x86_64/Dockerfile#L40 and https://github.com/crosstool-ng/crosstool-ng/archive/02d1503f6769be4ad8058b393d4245febced459f.tar.gz

CT_CONFIG_VERSION="3"
CT_OBSOLETE=y
CT_ARCH_ARM=y
CT_ARCH_64=y
CT_TARGET_VENDOR="ol7u9"
CT_KERNEL_LINUX=y
CT_LINUX_USE_ORACLE=y
CT_LINUX_ORACLE_V_4_14=y
CT_LINUX_ORACLE_VERSION="4.14.35-2025.400.8"
CT_BINUTILS_USE_ORACLE=y
CT_BINUTILS_ORACLE_V_2_27_44=y
CT_BINUTILS_LINKER_LD_GOLD=y
CT_BINUTILS_GOLD_THREADS=y
CT_BINUTILS_LD_WRAPPER=y
CT_BINUTILS_PLUGINS=y
CT_GLIBC_USE_ORACLE=y
CT_GLIBC_ORACLE_V_2_17_317_0_3=y
CT_GCC_USE_ORACLE=y
CT_GCC_ORACLE_V_4_8=y
CT_GCC_ORACLE_VERSION="4.8.5-44.0.5"
CT_CC_LANG_CXX=y

I have seen claims that __ARM_ARCH (ACLE) has been available since GCC 4.8.5, but that is not true. It actually arrived in GCC 5.2.0. FYI gcc-mirror/gcc@b1760f7

#ifndef __ARM_ARCH
# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
     || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
     || defined(__ARM_ARCH_7EM__)
#  define __ARM_ARCH 7
# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
        || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) \
        || defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) \
	|| defined(__ARM_ARCH_6M__)
#  define __ARM_ARCH 6
# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5T__) \
	|| defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) \
	|| defined(__ARM_ARCH_5TEJ__)
#  define __ARM_ARCH 5
# else
#  define __ARM_ARCH 4
# endif
#endif

So if we pass an environment variable such as CFLAGS_aarch64_unknown_linux_gnu: "-D__ARM_ARCH=8", ACLE still does not work in the final binary. This causes an unexpected error when rustls handles the TLS connection.

So I think we have two choices:

  1. Upgrade to manylinux_2_28, which ships a newer compiler
  2. Set CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc
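
To check whether a given toolchain defines __ARM_ARCH on its own, rather than relying on a -D flag, one option is to dump the compiler's predefined macros. A minimal sketch follows; it assumes a GCC- or Clang-compatible driver that accepts `-dM -E`, and the compiler name in the usage line is simply the one mentioned in this thread. `parse_defines` and `predefined_macros` are hypothetical helper names.

```python
import subprocess


def parse_defines(preprocessor_output: str) -> dict:
    """Turn '#define NAME VALUE' lines into a {NAME: VALUE} mapping."""
    defines = {}
    for line in preprocessor_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            defines[parts[1]] = parts[2] if len(parts) == 3 else ""
    return defines


def predefined_macros(cc: str) -> dict:
    """Ask the compiler for every macro it predefines for an empty C input."""
    result = subprocess.run(
        [cc, "-dM", "-E", "-x", "c", "-"],
        input="",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_defines(result.stdout)


if __name__ == "__main__":
    macros = predefined_macros("aarch64-linux-gnu-gcc")
    print("__ARM_ARCH =", macros.get("__ARM_ARCH", "<not defined>"))
```

If `__ARM_ARCH` is reported as not defined, the toolchain is in the situation described above.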

@Xuanwo
Member

Xuanwo commented Jan 7, 2025

Hi, @Zheaoli, thank you so much for the debugging!

Upgrading the Python build image to manylinux_2_28 might introduce unexpected breaking changes, so I’m more interested in a solution that works with manylinux_2_17 instead.

Could you provide further explanation on why setting CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc resolves this issue?

@Zheaoli
Member

Zheaoli commented Jan 7, 2025

> Could you provide further explanation on why setting CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc resolves this issue?

It switches the build from GCC 4.8.5 to GCC 11.x.

@Xuanwo
Member

Xuanwo commented Jan 7, 2025

> It switches the build from GCC 4.8.5 to GCC 11.x.

Will this affect the glibc version we are using?

@Xuanwo
Member

Xuanwo commented Jan 7, 2025

Or perhaps it's a good time for us to upgrade to manylinux_2_28. How many older distributions are still reliant on glibc 2.17?

@messense
Member

messense commented Jan 8, 2025

There is no guarantee that the system GCC 11 (cross) compiler will be compatible with a glibc version older than the system glibc, so it is very likely to affect the glibc version.

The current aarch64 manylinux cross docker image uses gcc 4.8.5 because I don't know how to build a redhat devtoolset gcc cross compiler for aarch64. In pypa/manylinux, the x86_64 version uses devtoolset gcc to achieve manylinux compatibility when using higher gcc version, so in theory you can also do that when cross compiling as long as you can build a devtoolset like gcc toolchain.

https://git.centos.org/rpms/devtoolset-10/tree/c7

Unfortunately I have no idea how to build one that can compile to aarch64 on x86_64, see also pypa/manylinux#1012.

@Xuanwo
Member

Xuanwo commented Jan 8, 2025

Thank you @messense for providing more information.

Cc everyone involved in this issue: @kikumoto, @Zheaoli and @tacheng9502

I have initiated a discussion about upgrading our toolchain to manylinux 2.28. Please share your thoughts and cast your vote there: #5521

@Xuanwo Xuanwo changed the title bug: UnexpectedError in Python Binding When Accessing S3 from Docker bug: UnexpectedError in Python Binding When Accessing S3 on aarch64 Jan 8, 2025
@samster25

We had the exact same issue in Daft on aarch64 + linux! The fix that worked for us was the following

        target: aarch64-unknown-linux-gnu
        manylinux: auto
        container: messense/manylinux_2_24-cross:aarch64

which let us cross compile aarch64 on x64 for 2.24!

Thanks @messense for maintaining these images!

link to our yaml:

https://github.com/Eventual-Inc/Daft/blob/main/.github/workflows/python-publish.yml#L78

@samster25

Also, if you are running on Red Hat-based Linux on aarch64 and using jemalloc, you also have to raise the page size to 2^16 bytes (64 KiB).

You can do that via:

export JEMALLOC_SYS_WITH_LG_PAGE=16
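
The value 16 is the base-2 logarithm of the page size in bytes. A minimal sketch of that arithmetic follows, together with a way to read the page size of the machine the script runs on. As far as I understand, the variable is a build-time setting, so what matters is the page size of the machines the wheel will run on, not of the build host. `lg_page` is a hypothetical helper name.

```python
import mmap


def lg_page(page_size: int) -> int:
    """Return log2 of a page size given in bytes; the size must be a power of two."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError(f"page size must be a positive power of two, got {page_size}")
    return page_size.bit_length() - 1


if __name__ == "__main__":
    # mmap.PAGESIZE is the memory page size of the current host.
    print(f"host page size: {mmap.PAGESIZE} bytes, lg_page = {lg_page(mmap.PAGESIZE)}")
```

A 64 KiB page gives `lg_page(65536) == 16`, and the more common 4 KiB page gives 12.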

@messense
Member

messense commented Jan 8, 2025

FYI, manylinux_2_24 is deprecated, so these images are no longer updated.

@samster25

@messense Unfortunately we see many users on versions of Amazon Linux 2 running glibc 2.26. For many of these folks upgrading the AMI is out of scope for their organization.
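
To make the constraint concrete, here is a minimal sketch of the version comparison involved: a manylinux_X_Y wheel needs a host glibc of at least X.Y. This only illustrates the comparison and is not how pip actually selects wheels, which considers more than the glibc version. `parse_version` and `can_install_manylinux` are hypothetical helper names.

```python
import platform
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Parse 'MAJOR.MINOR' (extra components are ignored) into a tuple of ints."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def can_install_manylinux(host_glibc: str, wheel_glibc: str) -> bool:
    """True when the host glibc is at least the version the wheel was built for."""
    return parse_version(host_glibc) >= parse_version(wheel_glibc)


if __name__ == "__main__":
    libc, version = platform.libc_ver()
    if libc == "glibc" and version:
        for floor in ("2.17", "2.24", "2.28"):
            print(f"manylinux_{floor.replace('.', '_')}:", can_install_manylinux(version, floor))
    else:
        print("glibc not detected on this host")
```

With glibc 2.26, as on the Amazon Linux 2 hosts mentioned above, the check passes for 2.17 and 2.24 but fails for 2.28.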

@Zheaoli
Member

Zheaoli commented Jan 8, 2025

> The current aarch64 manylinux cross docker image uses gcc 4.8.5 because I don't know how to build a redhat devtoolset gcc cross compiler for aarch64. In pypa/manylinux, the x86_64 version uses devtoolset gcc to achieve manylinux compatibility when using higher gcc version, so in theory you can also do that when cross compiling as long as you can build a devtoolset like gcc toolchain.

Yes, I am trying to build a newer GCC version for manylinux2014.

@Xuanwo
Member

Xuanwo commented Jan 8, 2025

Hi, @tacheng9502 and @kikumoto, could you help verify whether https://test.pypi.org/project/opendal/0.45.14/#files has resolved your issue?

pip install --index-url https://test.pypi.org/simple/ opendal

@kikumoto

kikumoto commented Jan 8, 2025

@Xuanwo
I tried it, and it worked fine.

on Amazon Linux 2023 t4g.medium

$ python -m venv venv3
$ . venv3/bin/activate
$ pip install -U pip

$ pip list
Package Version
------- -------
pip     24.3.1

$ pip install --index-url https://test.pypi.org/simple/ opendal
Looking in indexes: https://test.pypi.org/simple/
Collecting opendal
  Downloading https://test-files.pythonhosted.org/packages/29/ea/66243458f4c5feca24d2434265743fe16ba83e4a61dfb93726828e4646fd/opendal-0.45.14-cp311-abi3-manylinux_2_28_aarch64.whl.metadata (3.6 kB)
Downloading https://test-files.pythonhosted.org/packages/29/ea/66243458f4c5feca24d2434265743fe16ba83e4a61dfb93726828e4646fd/opendal-0.45.14-cp311-abi3-manylinux_2_28_aarch64.whl (16.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.0/16.0 MB 6.9 MB/s eta 0:00:00
Installing collected packages: opendal
Successfully installed opendal-0.45.14

$ pip list
Package Version
------- -------
opendal 0.45.14
pip     24.3.1

$ python demo1.py
True

Thank you very much.
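
The wheel file names in the two pip transcripts in this thread show the toolchain change: the earlier test wheel is tagged manylinux_2_17 (manylinux2014), while the 0.45.14 wheel is tagged manylinux_2_28. A minimal sketch for reading the glibc floor out of a wheel file name follows. It assumes a name without a build tag, as in the two wheels here; `platform_tags` and `glibc_floor` are hypothetical helper names.

```python
import re
from typing import List, Optional, Tuple

# manylinux_<glibc major>_<glibc minor>_<arch>
_MANYLINUX = re.compile(r"^manylinux_(\d+)_(\d+)_(.+)$")
# Legacy aliases and the glibc versions they correspond to.
_LEGACY = {"manylinux1": (2, 5), "manylinux2010": (2, 12), "manylinux2014": (2, 17)}


def platform_tags(wheel_filename: str) -> List[str]:
    """Return the platform tags from the last dash-separated field of a wheel name."""
    stem = wheel_filename[: -len(".whl")] if wheel_filename.endswith(".whl") else wheel_filename
    return stem.split("-")[-1].split(".")


def glibc_floor(wheel_filename: str) -> Optional[Tuple[int, int]]:
    """Return the lowest glibc version named by the wheel's manylinux tags, if any."""
    floors = []
    for tag in platform_tags(wheel_filename):
        match = _MANYLINUX.match(tag)
        if match:
            floors.append((int(match[1]), int(match[2])))
            continue
        for alias, version in _LEGACY.items():
            if tag.startswith(alias + "_"):
                floors.append(version)
    return min(floors) if floors else None
```

For the two wheels above this gives (2, 17) and (2, 28) respectively.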
