Distribute pip wheels for the architecture they are built for #1043

Closed
MatthiasKohl opened this issue Feb 6, 2024 · 13 comments

@MatthiasKohl

System Info

Any platform other than x86_64

Reproduction

On aarch64, for example, run pip install bitsandbytes.

Expected behavior

pip install bitsandbytes initially succeeds on aarch64 because the distribution on PyPI is tagged with the "any" architecture.
However, the distributed library is built for x86_64, so it cannot work on any other architecture.

The expected behavior is a separate pip distribution for each architecture supported by bitsandbytes.

@matthewdouglas
Member

Hi @MatthiasKohl,

At the moment the only supported platform for the PyPI wheels is manylinux1_x86_64. However, please see #997 and the ongoing discussion in #1032 where this is being addressed.

Status overview:

@rickardp
Contributor

rickardp commented Feb 6, 2024

@MatthiasKohl Can you try the wheels built here?

https://github.com/TimDettmers/bitsandbytes/actions/runs/7792455224

They are built for Linux aarch64, but I don't have an Arm device that can run CUDA, so I never tested them. It would be very valuable to get some feedback on how they run. I believe the source-level changes were tested on a Jetson device earlier.

@wkpark
Contributor

wkpark commented Feb 6, 2024

The Linux CUDA arm64 build is not arm64; it is x86_64. (The Linux CPU arm64 build is just fine.)

@rickardp
Contributor

rickardp commented Feb 6, 2024

> The Linux CUDA arm64 build is not arm64; it is x86_64. (The Linux CPU arm64 build is just fine.)

https://github.com/TimDettmers/bitsandbytes/blob/88ab630315d9a79973302182d79653b1dfa0918a/.github/workflows/python-package.yml#L138

The CUDA versions are indeed built for arm64. Or are you saying that the arm64 image produces x64 output?

@matthewdouglas
Member

@rickardp

> The CUDA versions are indeed built for arm64. Or are you saying that the arm64 image produces x64 output?

The wheel in the artifact bdist_wheel_ubuntu-latest_aarch64_3.11.zip is tagged as x86_64: bitsandbytes-0.43.0.dev0-cp311-cp311-linux_x86_64.whl. The CUDA binaries appear to be correct aarch64, but the included CPU binary (libbitsandbytes.so) is the x86_64 version.

I had suspected something was off on the aarch64 builds, so that's why I left that one as "unsure" status. Fixing this up is something I can try to take on.

@wkpark
Contributor

wkpark commented Feb 6, 2024

Can you point me to the link for the arm64 CUDA artifact for Linux?
All of the libraries I've tested from the CUDA build for Linux aarch64 are identified as x64 by the file command.

@matthewdouglas
Member

@wkpark

From this run: https://github.com/TimDettmers/bitsandbytes/actions/runs/7806181468

shared_library_cuda_ubuntu-latest_aarch64_12.1.0

bdist_wheel_ubuntu-latest_aarch64_3.11 - Note that this wheel filename is x86_64 but the CUDA binaries are aarch64.

$ file libbitsandbytes_cuda121.so
libbitsandbytes_cuda121.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, 
BuildID[sha1]=01e1a24531d5c7741f06093d1ae8b1183ccb16e3, not stripped

$ file libbitsandbytes_cpu.so
libbitsandbytes_cpu.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked,
BuildID[sha1]=ea67203ec181c91354b0b6b7aac681d26b957e20, not stripped

$ file libbitsandbytes_cuda121_nocublaslt.so
libbitsandbytes_cuda121_nocublaslt.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, 
BuildID[sha1]=2ad481a3d9d9e76d1331cc6bfb7504f0db624854, not stripped
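
Where the file command is not available (for example in a minimal CI image), the same architecture check can be approximated by reading the ELF header directly. This is a rough sketch with hypothetical function names: it only looks at the e_machine field and only knows the two machine codes relevant to this thread (62 for x86-64, 183 for AArch64).

```python
import struct

ELF_MAGIC = b"\x7fELF"

# e_machine values from the ELF specification; only the two relevant here.
MACHINES = {62: "x86_64", 183: "aarch64"}


def elf_machine_from_header(header: bytes) -> str:
    """Decode the target machine from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) is 1 for little-endian and 2 for big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF.
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return MACHINES.get(e_machine, f"unknown ({e_machine})")


def elf_machine(path: str) -> str:
    """Read a shared library from disk and report its target machine."""
    with open(path, "rb") as f:
        return elf_machine_from_header(f.read(20))
```

Running elf_machine over every .so file in an unpacked wheel and comparing the results against the wheel's platform tag would flag the mixed-architecture wheel shown above.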

@MatthiasKohl
Author

@matthewdouglas If I understand the build logs and the CI workflow correctly, the build-shared-libs job simply uses the native architecture of the runner, so it outputs x86 binaries even for the aarch64 target.
On the other hand, build-shared-libs-cuda uses Docker with QEMU to get a multi-platform image, and the target architecture is used correctly there.

So I think there are two solutions:

  1. Use native runners, e.g. CircleCI provides free ARM-based resources: https://circleci.com/product/features/resource-classes
  2. Use the multi-platform docker for the build-shared-libs job as well

Finally, I don't see the publish job here, and I don't see the different arch wheels on PyPI yet, so is that planned for the 0.43 release?
Thank you for looking into this!

@rickardp
Contributor

rickardp commented Feb 7, 2024

Yes, something has stopped working. This used to work before all the rebasing craziness.

> So I think there's two solutions

There's no need for native runners. The performant way is to simply cross compile, which is what I used to do. The docker-based approach would also work, but is slower. The latter is needed for CUDA only (as CUDA does not allow native cross compiling).

> Finally, I don't see the publish job here

Publishing to PyPI happens when you create a GitHub release. It's there but requires setting up the secrets. Possibly only @TimDettmers can do this?

@wkpark
Contributor

wkpark commented Feb 7, 2024

We use a cross compiler for CPU builds. In this case, the build should set, for example, -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++. For CUDA builds, we use the docker --platform linux/${{ matrix.arch }} option (which was missing in some old test workflows).
(Fixed in the PR #1035 artifacts: https://github.com/TimDettmers/bitsandbytes/actions/runs/7815757145)

@matthewdouglas
Member

> There's no need for native runners. The performant way is to simply cross compile, which is what I used to do. The docker-based approach would also work, but is slower. The latter is needed for CUDA only (as CUDA does not allow native cross compiling).

@rickardp I agree the aarch64 CPU build can be cross-compiled. Eventually, a solution is also needed for running the tests on both CPU and GPU.

> Finally, I don't see the publish job here, but I don't see the different arch wheels on PyPi yet, so is that planned for the 0.43 release? Thank you for looking into this!

@MatthiasKohl Everything for platforms other than Linux x86_64 is very much WIP. Eventually there are going to be release wheels, but it's still early in that process. The main branch is pretty active right now with PRs and merges.

@akx
Contributor

akx commented Feb 22, 2024

> Publish to PyPi happens when you create a GitHub release. It's there but requires setting up the secrets.

I would recommend setting up https://docs.pypi.org/trusted-publishers/ instead. See e.g. https://github.com/python-babel/babel/blob/40e60a1f6cf178d9f57fcc14f157ea1b2ab77361/.github/workflows/ci.yml#L83-L102 for an example.

@Titus-von-Koeller
Collaborator

Hey all,

Thanks very much for raising this; you were totally right. Since taking over maintenance, we have worked hard to get on top of the maintenance backlog. Thanks to the wonderful collaboration with @matthewdouglas @akx @rickardp @wkpark, we've finally managed to get this right with the most recent release. Therefore, I'm closing this issue. Thanks, everyone, for the patience and contributions!


Dear all,

Since the current release (last week, 8th of March) we now have official support for Windows 🎉 (which we did not have before) via

pip install "bitsandbytes>=0.43.0"

We're closing all old Windows issues and are asking everyone to try installing this new version as outlined above. Please validate the install with python -m bitsandbytes, which should print diagnostic output and then SUCCESS. Let us know if everything worked correctly in this new umbrella / catch-all issue. Thanks 🤗
