Support arm64 in cpubuilder dockerfile using multi-arch builds. #11

Merged
merged 3 commits into from
Sep 19, 2024

Conversation

ScottTodd

@ScottTodd ScottTodd commented Sep 19, 2024

This should let us replace https://github.com/iree-org/iree/blob/main/build_tools/docker/dockerfiles/base-arm64.Dockerfile with this cpubuilder dockerfile, making progress on iree-org/iree#15332. It uses a multi-architecture build rather than a fully forked file, which seems to work reasonably well with some local testing. I can even run the arm64 dockerfile on my x86_64 host.

Various related changes are included here:

@@ -73,7 +74,8 @@ jobs:
 uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
 with:
   context: .
-  file: dockerfiles/cpubuilder_ubuntu_jammy_ghr_x86_64.Dockerfile
+  file: dockerfiles/cpubuilder_ubuntu_jammy_ghr.Dockerfile
+  platforms: linux/amd64,linux/arm64
I'm expecting these multi-arch/platform builds to show up in github packages like so: https://github.com/myoung34/docker-github-actions-runner/pkgs/container/docker-github-actions-runner/275948281?tag=ubuntu-jammy


@ScottTodd

cc @MacDue @banach-space

@banach-space

> cc @MacDue @banach-space

Nice! I've skimmed through and all looks good to me. From what I can tell, the only platform-dependent part is the installation of ccache, right? I can also give this a spin tomorrow to double-check that this works on AArch64 hosts (pre-commit CI as a service 😅).

Btw, how can I add qemu locally?

@ScottTodd

> From what I can tell, the only platform-dependent part is the installation of ccache, right?

Yep! The rest leans on already published multi-arch packages and portable scripts. Turns out multi-arch builds work well when they are single stage. We made things overcomplicated on iree-org/iree#14372 with the multistage builds (base-[platform] --> feature-a-[platform] --> feature-a-and-b-[platform])
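
For the one platform-dependent step, a single-stage Dockerfile can branch on the target architecture inside a shared install script. A minimal sketch, assuming the script is handed BuildKit's `TARGETARCH` value (`amd64` or `arm64`, visible in a Dockerfile only after declaring `ARG TARGETARCH`); the helper name `arch_to_uname` is hypothetical and not taken from this PR:

```shell
#!/bin/sh
# Hypothetical helper: translate a Docker platform architecture
# (the value BuildKit exposes as TARGETARCH) into the name that
# `uname -m` reports on Linux.
arch_to_uname() {
  case "$1" in
    amd64) echo "x86_64" ;;
    arm64) echo "aarch64" ;;
    *)
      # Anything else is reported on stderr and treated as an error.
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}
```

An install script could call `arch_to_uname "$TARGETARCH"` to pick an architecture-specific download, or to decide that a source build is needed.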

> I can also give this a spin tomorrow to double check this works on Aarch64 hosts (pre-commit CI as a service 😅).

We still have an arm64 runner in the IREE repo for now (that's the only runner that Google is still providing), so I was planning on porting https://github.com/iree-org/iree/blob/main/.github/workflows/ci_linux_arm64_clang.yml after landing this and getting the first images published to GitHub's registry. Might need a bit of back and forth as I can't test the workflows and dockerfile changes in a single PR.

> Btw, how can I add qemu locally?

The install steps from
https://github.com/iree-org/iree/blob/782f372b070eadd593a727004cf61dc84aabc634/build_tools/docker/dockerfiles/base-arm64.Dockerfile#L75-L80
should still work, but I want to see if https://github.com/docker/setup-qemu-action solves the same problem, so that we can stop mirroring that file. Instructions for how to generate that file were posted on Discord:

Hey, an easy way I've found to build for AArch64 on x86_64, is to run the Dockerfile under QEMU (turtles all the way down :P).

First run:
(Note, see: https://github.com/multiarch/qemu-user-static for more details, this sets up some binfmt_misc magic on your host system to allow it to run AArch64 (and other) binaries directly via qemu-*-user).

docker run --rm --privileged multiarch/qemu-user-static:register

Then make a folder somewhere with this Dockerfile: https://gist.githubusercontent.com/MacDue/65e1351e2edfacc1a6c52f8b86e77a1d/raw/b9aeacc27149190e633358ca6b68b6a1349dfe46/Dockerfile
Also, download the QEMU sources into that same folder via wget https://download.qemu.org/qemu-8.2.0.tar.xz (the download must be called qemu-8.2.0.tar.xz).

Finally, run: DOCKER_BUILDKIT=1 docker build --network host --file Dockerfile --output out .

Once the build completes ./out/qemu-aarch64 will contain an AArch64 binary for qemu-aarch64.
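
After the registration step above, it can help to confirm that the handler is in place before starting a long emulated build. A minimal sketch, assuming the handler is registered under the name `qemu-aarch64` in the usual `binfmt_misc` mount point; `qemu_aarch64_registered` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical check: succeed if a binfmt_misc handler for AArch64 exists
# and is enabled. The directory argument defaults to the usual mount point
# and can be overridden (for example, when testing the function).
qemu_aarch64_registered() {
  binfmt_dir="${1:-/proc/sys/fs/binfmt_misc}"
  handler="$binfmt_dir/qemu-aarch64"
  # An active handler file begins with the line "enabled".
  [ -f "$handler" ] && grep -q '^enabled' "$handler"
}
```

If the check passes, running `docker run --rm --platform linux/arm64 ubuntu:22.04 uname -m` on an x86_64 host is a quick way to see whether emulation works end to end.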

@marbre marbre left a comment


This looks really nice!

@ScottTodd ScottTodd merged commit 38a7a45 into iree-org:main Sep 19, 2024
1 check passed
@ScottTodd ScottTodd deleted the cpubuilder-arm64 branch September 19, 2024 21:15
@@ -38,7 +38,8 @@ jobs:
 uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
 with:
   context: .
-  file: dockerfiles/cpubuilder_ubuntu_jammy_x86_64.Dockerfile
+  file: dockerfiles/cpubuilder_ubuntu_jammy.Dockerfile
+  platforms: linux/amd64,linux/arm64

https://github.com/iree-org/base-docker-images/actions/runs/10949036140/job/30401394373#step:5:135

 ERROR: Multi-platform build is not supported for the docker driver.
Switch to a different driver, or turn on the containerd image store, and try again.
Learn more at https://docs.docker.com/go/build-multi-platform/
Error: buildx failed with: Learn more at https://docs.docker.com/go/build-multi-platform/

Aww... I hit the same error locally while developing. The recommended steps in those docs worked for me, but I need to see what the docker/build-push-action needs.
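
The docs linked in the error describe switching to a builder backed by the `docker-container` driver. A minimal local sketch of that approach; the builder name `multiarch`, the `run` wrapper, and the `DRY_RUN` switch are illustrative assumptions, and in a workflow the `docker/setup-buildx-action` step can play a similar role:

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create (and select) a builder that uses the docker-container driver,
# then build one Dockerfile for both platforms.
# Note: `buildx create` fails if a builder with this name already exists.
build_multiarch() {
  dockerfile="$1"
  run docker buildx create --name multiarch --driver docker-container --use || return 1
  run docker buildx build \
    --platform linux/amd64,linux/arm64 \
    --file "$dockerfile" \
    .
}
```

Calling `DRY_RUN=1 build_multiarch dockerfiles/cpubuilder_ubuntu_jammy.Dockerfile` prints the two commands without running them. Without `--push` or an explicit output, a multi-platform result stays in the build cache only.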


Yep, successful build and push. The workflow run time went up from 2 minutes to 20 minutes, in large part due to the ccache build from source.

Before: https://github.com/iree-org/base-docker-images/actions/runs/10948990985
After: https://github.com/iree-org/base-docker-images/actions/runs/10949139831

New package: https://github.com/iree-org/base-docker-images/pkgs/container/cpubuilder_ubuntu_jammy
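
One way to confirm that the pushed package carries both platforms is to inspect its manifest list. A minimal sketch, assuming the image is reachable as `ghcr.io/iree-org/cpubuilder_ubuntu_jammy` (inferred from the package URL, tag not specified here) and that `docker buildx imagetools inspect` prints one `Platform:` line per manifest; `has_platform` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: read `docker buildx imagetools inspect` output on
# stdin and succeed if it lists the given platform (for example linux/arm64).
has_platform() {
  # Match the platform as a whole token so that "linux/arm" does not
  # accidentally match "linux/arm64".
  grep -Eq "Platform:[[:space:]]+$1([[:space:]]|\$)"
}
```

Usage would look like `docker buildx imagetools inspect ghcr.io/iree-org/cpubuilder_ubuntu_jammy | has_platform linux/arm64`, with whichever tag the workflow publishes appended to the image name.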

@ScottTodd

> I can also give this a spin tomorrow to double check this works on Aarch64 hosts (pre-commit CI as a service 😅).

> We still have an arm64 runner in the IREE repo for now (that's the only runner that Google is still providing), so I was planning on porting https://github.com/iree-org/iree/blob/main/.github/workflows/ci_linux_arm64_clang.yml after landing this and getting the first images published to GitHub's registry. Might need a bit of back and forth as I can't test the workflows and dockerfile changes in a single PR.

Starting to test at https://github.com/iree-org/iree/actions/runs/10949782485/job/30403753082 with this code: https://github.com/iree-org/iree/compare/users/scotttodd/ci-arm64?expand=1. setup-qemu-action did not do what I wanted :P

ScottTodd added a commit that referenced this pull request Sep 20, 2024
Follow-up to #11.

This emulator was originally installed in
iree-org/iree#16331. We've been carrying it
around since then.

I'm not thrilled with the file sitting in a cloud storage bucket (GCS or
Azure). The file is 5MB so maybe we could check it in here via git LFS?
Or we could build it from source via our automation.
ScottTodd added a commit to iree-org/iree that referenced this pull request Sep 23, 2024
Progress on #15332. This was the
last active use of
[`build_tools/docker/`](https://github.com/iree-org/iree/tree/main/build_tools/docker),
so we can now delete that directory:
#18566.

This uses the same "cpubuilder" dockerfile as the x86_64 builds, which
is now built for multiple architectures thanks to
iree-org/base-docker-images#11. As before, we
install a qemu binary in the dockerfile, this time using the approach in
iree-org/base-docker-images#13 instead of a
forked dockerfile.

Prior PRs for context:
* #14372
* #16331

Build time varies pretty wildly depending on cache hit rate and the
phase of the moon:

| Scenario | Cache hit rate | Time | Logs |
| -- | -- | -- | -- |
| Cold cache | 0% | 1h45m | [Logs](https://github.com/iree-org/iree/actions/runs/10962049593/job/30440393279) |
| Warm (?) cache | 61% | 48m | [Logs](https://github.com/iree-org/iree/actions/runs/10963546631/job/30445257323) |
| Warm (hot?) cache | 98% | 16m | [Logs](https://github.com/iree-org/iree/actions/runs/10964289304/job/30447618503?pr=18569) |

CI history
(https://github.com/iree-org/iree/actions/workflows/ci_linux_arm64_clang.yml?query=branch%3Amain)
shows that regular 97% cache hit rates and 17 minute job times are
possible. I'm not sure why one test run only got 61% cache hits. This
job only runs nightly, so that's not a super high priority to
investigate and fix.

If we migrate the arm64 runner off of GCP
(#18238) we can further simplify
this workflow by dropping its reliance on `gcloud auth
application-default print-access-token` and the `docker_run.sh` script.
Other workflows are now using `source setup_sccache.sh` and some other
code.
ScottTodd added a commit to iree-org/iree that referenced this pull request Sep 27, 2024
In iree-org/base-docker-images#11, I replaced
the single-architecture `cpubuilder_ubuntu_jammy_x86_64` dockerfile with
a multi-architecture `cpubuilder_ubuntu_jammy` dockerfile. This checks
that the `linux/amd64` platform build of the dockerfile still works for
our usage.