Distribute pip wheels for the architecture they are built for #1043

MatthiasKohl · 2024-02-06T15:34:44Z

System Info

Any platform other than x86_64

Reproduction

On aarch64, for example, run pip install bitsandbytes.

Expected behavior

pip install bitsandbytes initially succeeds on aarch64 because the distribution on PyPI is tagged with the "any" architecture.
However, the library it ships is built for x86_64, so it cannot work there.

I would expect a separate pip distribution for each architecture supported by bitsandbytes.

matthewdouglas · 2024-02-06T18:03:56Z

Hi @MatthiasKohl,

At the moment the only supported platform for the PyPI wheels is manylinux1_x86_64. However, please see #997 and the ongoing discussion in #1032 where this is being addressed.

Status overview:

Windows x86_64 + CUDA support is well underway. The main branch is buildable!
Linux aarch64 + CUDA builds are part of the GH Actions workflows, though I am not sure of status.
Support for macOS arm64 + MPS is discussed in [RFC] Cross-Platform Refactor: Mac M1 support #1020.
Support for CPU-only x86_64 (Windows/Linux) and aarch64 (Linux) is discussed in [RFC] Cross-Platform Refactor: CPU-only implementation #1021
Intel has been working on backends for their CPUs and GPUs: [RFC] Extend bitsandbytes to support Intel hardware platforms #894, Enable common device abstraction for 8bits/4bits #898
Draft for Linux x86_64 + AMD ROCm is Add ROCm support #756

rickardp · 2024-02-06T18:17:04Z

@MatthiasKohl Can you try the wheels built here?

https://github.com/TimDettmers/bitsandbytes/actions/runs/7792455224

They are built for Linux aarch64, but I don't have an Arm device that can run CUDA, so I never tested them. It would be very valuable to get some feedback on how they run. I believe the source-level changes were tested on a Jetson device earlier.

wkpark · 2024-02-06T19:03:52Z

The Linux CUDA arm64 build is not arm64; it is x86_64. (The Linux CPU arm64 build is fine.)

rickardp · 2024-02-06T21:42:39Z

The Linux CUDA arm64 build is not arm64; it is x86_64. (The Linux CPU arm64 build is fine.)

https://github.com/TimDettmers/bitsandbytes/blob/88ab630315d9a79973302182d79653b1dfa0918a/.github/workflows/python-package.yml#L138

The CUDA versions are indeed built for arm64. Or are you saying that the arm64 image produces x64 output?

matthewdouglas · 2024-02-06T23:17:08Z

@rickardp

The CUDA versions are indeed built for arm64. Or are you saying that the arm64 image produces x64 output?

The wheel in the artifact bdist_wheel_ubuntu-latest_aarch64_3.11.zip is tagged as x86_64: bitsandbytes-0.43.0.dev0-cp311-cp311-linux_x86_64.whl. The CUDA binaries appear to be correct aarch64, but the included CPU binary (libbitsandbytes.so) is the x86_64 version.

I had suspected something was off on the aarch64 builds, so that's why I left that one as "unsure" status. Fixing this up is something I can try to take on.
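The filename mismatch described above can be caught mechanically. The following is only a minimal sketch, assuming Linux-style platform tags; the helper names `wheel_platform_tag` and `tag_matches_machine` are hypothetical and not part of bitsandbytes or pip. It reads the platform tag from a wheel filename and compares it with a target machine name such as the one returned by `platform.machine()`.

```python
import platform


def wheel_platform_tag(filename: str) -> str:
    """Return the platform tag (the last dash-separated field) of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # Wheel names look like {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return parts[-1]


def tag_matches_machine(tag: str, machine: str) -> bool:
    """True if the tag is architecture-independent or names `machine`.

    Assumption: Linux-style tags ending in "_<machine>". Other platforms
    spell the machine differently, so this is not a general check.
    """
    if tag == "any":
        return True
    # Compressed tag sets are joined with ".", e.g. "manylinux_2_17_x86_64.manylinux2014_x86_64"
    return any(t.endswith("_" + machine) for t in tag.split("."))


if __name__ == "__main__":
    name = "bitsandbytes-0.43.0.dev0-cp311-cp311-linux_x86_64.whl"
    tag = wheel_platform_tag(name)
    print(tag, "matches this machine:", tag_matches_machine(tag, platform.machine()))
```

Note that a wheel tagged "any" passes this check on every machine, which is exactly why such a wheel installs on aarch64 even when the binaries inside are x86_64; the tag alone says nothing about the bundled libraries.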

wkpark · 2024-02-06T23:33:46Z

Can you point me to the link for the arm64 CUDA artifact for Linux?
All the libs I've tested from the CUDA build for Linux aarch64 are identified as x64 by the file command.

matthewdouglas · 2024-02-07T00:45:07Z

@wkpark

From this run: https://github.com/TimDettmers/bitsandbytes/actions/runs/7806181468

shared_library_cuda_ubuntu-latest_aarch64_12.1.0

bdist_wheel_ubuntu-latest_aarch64_3.11 - Note that this wheel filename is x86_64 but the CUDA binaries are aarch64.

$ file libbitsandbytes_cuda121.so
libbitsandbytes_cuda121.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, 
BuildID[sha1]=01e1a24531d5c7741f06093d1ae8b1183ccb16e3, not stripped

$ file libbitsandbytes_cpu.so
libbitsandbytes_cpu.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked,
BuildID[sha1]=ea67203ec181c91354b0b6b7aac681d26b957e20, not stripped

$ file libbitsandbytes_cuda121_nocublaslt.so
libbitsandbytes_cuda121_nocublaslt.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, 
BuildID[sha1]=2ad481a3d9d9e76d1331cc6bfb7504f0db624854, not stripped
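The same check that `file` performs above can be scripted, for example as a CI step that fails when a library has the wrong architecture. This is a minimal sketch rather than anything taken from the bitsandbytes workflows: the function names are hypothetical, and only the two machine codes relevant to this thread are mapped. It reads the `e_machine` field from the ELF header, which sits at byte offset 18 after the 16-byte identification block and the 2-byte type field.

```python
import struct

# e_machine values defined by the ELF specification (only the two relevant here)
ELF_MACHINES = {62: "x86-64", 183: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the architecture named by the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(e_machine, f"unknown ({e_machine})")


def elf_machine_of_file(path: str) -> str:
    """Read just enough of the file at `path` to identify its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Calling `elf_machine_of_file` on each `.so` inside an unpacked wheel and comparing the results against the intended target would flag a mixed-architecture wheel like the one shown here.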

MatthiasKohl · 2024-02-07T08:46:16Z

@matthewdouglas If I understand the build logs and the CI workflow here correctly, then for the build-shared-libs job, you're simply using the native architecture of the runner, so this will output x86 binaries even for aarch64 target.
OTOH, for build-shared-libs-cuda, you're using docker qemu to get a multi-platform image and then use the target arch correctly there.

So I think there's two solutions:

Use native runners, e.g. CircleCI provides free ARM-based resources: https://circleci.com/product/features/resource-classes
Use the multi-platform docker for the build-shared-libs job as well

Finally, I don't see the publish job here, but I don't see the different arch wheels on PyPi yet, so is that planned for the 0.43 release?
Thank you for looking into this!

rickardp · 2024-02-07T09:02:46Z

Yes, something has stopped working. This used to work before all the rebasing craziness.

So I think there's two solutions

There's no need for native runners. The performant way is to simply cross compile, which is what I used to do. The docker-based approach would also work, but is slower. The latter is needed for CUDA only (as CUDA does not allow native cross compiling).

Finally, I don't see the publish job here

Publish to PyPi happens when you create a GitHub release. It's there but requires setting up the secrets. Possibly only @TimDettmers can do this?

wkpark · 2024-02-07T11:14:00Z

We use a cross compiler for CPU builds. In that case the build has to select it explicitly, for example with -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++. For CUDA builds, we use the docker --platform linux/${{ matrix.arch }} option (which was missing from some old test workflows).
(Fixed in the PR #1035 artifacts: https://github.com/TimDettmers/bitsandbytes/actions/runs/7815757145)
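To make the two build paths concrete, here is a rough sketch of how a build script might choose its arguments per target architecture. It is an illustration under assumptions, not the project's actual workflow: the function names are hypothetical, the compiler names follow the Debian/Ubuntu `<arch>-linux-gnu-` naming seen in the flag above, and the Docker platform strings use Docker's canonical `amd64`/`arm64` spellings. `CMAKE_SYSTEM_NAME`, `CMAKE_SYSTEM_PROCESSOR`, `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER` are standard CMake variables.

```python
import platform


def cpu_build_cmake_args(target_arch, host_arch=None):
    """CMake arguments for the CPU library; empty when building natively."""
    host_arch = host_arch or platform.machine()
    if target_arch == host_arch:
        return []  # native build: the default compiler already targets this arch
    return [
        "-DCMAKE_SYSTEM_NAME=Linux",
        f"-DCMAKE_SYSTEM_PROCESSOR={target_arch}",
        # Assumed toolchain naming, as used by Debian/Ubuntu cross packages
        f"-DCMAKE_C_COMPILER={target_arch}-linux-gnu-gcc",
        f"-DCMAKE_CXX_COMPILER={target_arch}-linux-gnu-g++",
    ]


def cuda_build_docker_args(target_arch):
    """Docker arguments that run the CUDA build in an image for the target arch."""
    docker_arch = {"x86_64": "amd64", "aarch64": "arm64"}[target_arch]
    return ["--platform", f"linux/{docker_arch}"]


if __name__ == "__main__":
    print(cpu_build_cmake_args("aarch64", host_arch="x86_64"))
    print(cuda_build_docker_args("aarch64"))
```

The point of the split is the one made in this thread: the CPU library can be cross-compiled quickly on an x86_64 runner, while the CUDA libraries are built inside an emulated container for the target platform.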

matthewdouglas · 2024-02-07T14:39:07Z

There's no need for native runners. The performant way is to simply cross compile, which is what I used to do. The docker-based approach would also work, but is slower. The latter is needed for CUDA only (as CUDA does not allow native cross compiling).

@rickardp I agree the aarch64 CPU build can be cross-compiled. Eventually a solution is needed for running the tests both for CPU and GPU.

Finally, I don't see the publish job here, but I don't see the different arch wheels on PyPi yet, so is that planned for the 0.43 release? Thank you for looking into this!

@MatthiasKohl Everything for platforms other than Linux x86_64 is very much WIP. Eventually there's going to be release wheels, but it's still early in that process. The main branch is pretty active right now with PRs and merges.

akx · 2024-02-22T06:22:43Z

Publish to PyPi happens when you create a GitHub release. It's there but requires setting up the secrets.

I would recommend setting up https://docs.pypi.org/trusted-publishers/ instead. See e.g. https://github.com/python-babel/babel/blob/40e60a1f6cf178d9f57fcc14f157ea1b2ab77361/.github/workflows/ci.yml#L83-L102 for an example.

Titus-von-Koeller · 2024-03-15T19:10:38Z

Hey all,

thanks very much for raising this, you were totally right. Since taking over maintenance we have worked hard to get on top of the maintenance backlog. Thanks to the wonderful collab with @matthewdouglas @akx @rickardp @wkpark we've finally managed to get this right with the most recent release. Therefore, I'm closing this issue. Thanks everyone for the patience and contributions!

Dear all,

Since the current release (last week, 8th of March) we now have official support for Windows 🎉 (which we did not have before) via

pip install "bitsandbytes>=0.43.0"

We're closing all old Windows issues and are asking everyone to try installing with this new version as outlined above and validate the install with python -m bitsandbytes which should spit out a bunch of stuff and then SUCCESS. Please let us know if everything worked correctly in this new umbrella / catch-all issue. Thanks 🤗

rickardp mentioned this issue Feb 7, 2024

Fix cross compilation on linux #1050

Merged

Titus-von-Koeller closed this as completed Mar 15, 2024
