Support for macOS universal2 builds for ARM-based Macs #473
Looks like a new version of wheel with pypa/wheel#390 will be out soon. Once that's out, if we include that version and use Python 3.9.1, then we should be able to build a universal2 wheel (though I'm not sure how to enable it yet). We will need to download the universal2 installer instead of the regular one. It's got an 11.0 in the name, but it is supposed to work with 10.9+.
If I understand correctly, it may also need macOS 11.0 as the base system (maybe 10.15). If I remember correctly, there should be an Xcode version check before deciding which installer version to use.
Xcode 12, so yes.
The universal2 support in CPython currently requires building on macOS 11. That is not a hard system requirement though; the actual requirement is using Xcode 12 (or the command line tools for Xcode 12). The current code looks at the macOS version because that was easier to get going. I'll definitely look into replacing that code with something that tests for the compiler version instead of the macOS version. That said, I wouldn't mind if someone provided a PR for that ;-)
Being worked on in #484.
Just my 2 cents about building universal2 wheels. From a packager point of view (so, here, a user of cibuildwheel):
From an end-user point of view (I'm a Mac end-user if that matters):
This means if wheels are all packaged as universal2, […]. To sum up, am I against universal2? […] Reading material that might be of interest:
Thank you @mayeut. You know, your argument is making a lot of sense to me. cibuildwheel offers to 'build and test your wheels on all the platforms', but when making a universal2 wheel on x86_64, the arm64 part is completely untested. Adding onto that the fact that Apple have said Apple Silicon is the future of the Mac, IMO it is only a matter of time (probably a few months) before we see CI services offering macOS arm64 runners. Once we live in that world, it's clear that the best way to run cibuildwheel will be to run one x86_64 runner to build & test x86_64 wheels, and another arm64 runner for the arm64 wheels. So I believe we should treat this early universal2 support as a stop-gap - something that users might choose to opt in to, but we should save our ultimate API design for a world where Mac arm64 CI exists.
@mayeut I agree with you. But the problem is that Python provides […]. I think that now is a good time to go back to #317. Then it will be possible to test a universal wheel on both machines. I think that macOS x86_64 will have long support. I love the idea of having a choice between building a universal wheel or separate wheels per architecture. I still think that part of this job should be done in delocate, which should check whether the wheel is really universal.
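A sketch of what such a check could look like from the command line (illustrative only; the wheel filename is made up, while `lipo -archs` is the real Xcode tool for listing the slices of a binary):

```bash
# Unpack a supposedly universal2 wheel and list the architectures of
# every compiled extension module it contains.
unzip -o mypkg-1.0-cp39-cp39-macosx_10_9_universal2.whl -d wheel_contents
find wheel_contents -name '*.so' -exec lipo -archs {} \;
# Each line of output should read: x86_64 arm64
```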
Assuming any M1 runners have a full install of macOS, they can run both x86_64 and arm64 code. I've been using this feature to test my own projects (for a long time; the same mechanism works with older fat binaries), with steps along these lines:
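A reconstructed sketch of those steps (the original snippet was lost in formatting; the macOS `arch` command is real, the test command is illustrative):

```bash
# Run the test suite natively first (arm64 on an M1 machine)...
python -m pytest

# ...then run the x86_64 slice of the same universal2 interpreter
# under Rosetta 2 translation.
arch -x86_64 python -m pytest
```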
So if the package provides only […]. There are plans to provide pure […]. @ronaldoussoren Do you know how pip from […] will behave?
Did you have to make fat extension modules with the fat binaries before? (Python 3.5 is a fat binary, FYI.) So can't you make x86_64, arm64, and universal2 extensions from a Python universal2 install? I don't think there's as much point in providing separate Python installers; Python just isn't that big. But extensions can be huge. And if users provide an x86_64 wheel anyway, why not make the other one ARM only? The only benefit to a universal wheel is you can download it once and run on both archs. But how often is your disk connected to two different arch computers? Usually, you do a separate package directory for each runner even in HPC/cloud - and for personal computers, it's maybe useful if you share a folder via the cloud - but you shouldn't do this with environments in general. The one use case could be making zip apps, but those really don't get used for binary code already due to OS differences (honestly, I don't see them used much at all).
The problem is that it will be a while before we have M1 runners on CI. I don't think Apple provides an arm64 emulator for Intel...
Pip currently selects the most specific wheel. So you can put up a pure-Python universal wheel and a set of binary ones, and if the platform matches, you get the binary one; otherwise you get the universal one. So I would assume arm64 would match before universal2.
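To see the ranking a given pip will actually use, `pip debug` can help (pip itself warns that this command's output is unstable, so treat it as a diagnostic only):

```bash
# Prints the platform tags this interpreter is compatible with, in
# priority order; on an M1 Mac with a recent pip, macosx_*_arm64 tags
# should rank above the corresponding macosx_*_universal2 tags.
pip debug --verbose
```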
That's interesting. So once we have M1 runners, we can do both x86_64 and arm64 tests on the same machine. Still, I think there's an argument that separate x86_64 and arm64 build/test runs might be preferable (rather than building in two test steps for universal2 wheels), see @mayeut's other reasons above.
No, they do not. So we do need a universal2 solution, at least in the short term. But I'm happy for it to be a little imperfect (we probably won't be able to test arm64 portions, initially), so long as we know where we want to end up once Apple Silicon CI runners are available.
This matches my understanding, too. I think (somebody please correct me if I'm wrong) the wheels are chosen in the order of this list. Note that the native arch comes before universal2 in that ordering.
As far as I can tell, universal2 was added to that list 23 days ago. So that means that any copy of pip more than 23 days old will not be able to download and use a universal2 wheel on an Intel machine? Actually less than that, since it had to be updated via the vendoring and released. If that's the case, we really cannot stop making x86_64 wheels any time soon - so it really would be nice to be able to make an ARM wheel. I don't see technically why you couldn't compile an ARM-only wheel on an Intel machine if you can compile a universal2 one.
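A sketch of how that cross-compile might look (assuming the python.org universal2 CPython and the Xcode 12 toolchain; ARCHFLAGS is honoured by distutils on macOS, while the _PYTHON_HOST_PLATFORM override is more of an implementation detail and may not be needed in every setup):

```bash
# Build an arm64-only extension wheel on an Intel machine.
export ARCHFLAGS="-arch arm64"
# Possibly needed so the built wheel gets an arm64 platform tag.
export _PYTHON_HOST_PLATFORM="macosx-11.0-arm64"
pip wheel . -w wheelhouse/
```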
In #202 (dealing with tests for […]):
You can run the x86_64 half of a universal binary under Rosetta 2 with the `arch -x86_64` prefix shown above.
I agree; it might require tricks like the one done for Python 3.5 (and mentioned in #484 (comment)).
So, building on top of #482: […]
When we start getting Apple Silicon runners, then we can make sure the […]. Note 1: If you wanted to do native on Windows & Linux, but universal2 on macOS, it might be hard to nicely write this without specific environment variables like the one sketched below.
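A sketch of the sort of per-platform environment variable that could make this expressible (the variable name and value syntax here are illustrative, not a settled cibuildwheel API):

```bash
# Hypothetical: keep native archs on Windows & Linux, but ask the macOS
# build to produce universal2 wheels.
export CIBW_ARCHS_MACOS="universal2"
cibuildwheel --platform macos
```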
Yeah, I think that's pretty much bang on there, @henryiii. I think I agree with all of it. I'll just add: we have to consider what we do with build identifiers as well. I think once we have the […]. It's worth mentioning that people might be getting confused about how build identifiers and BUILD/SKIP differ from the […]
Yes, the one somewhat open point was that these mix. Say I don't want to do emulation (it's slow), but I do want to build Apple Silicon wheels. If I just do […]
I'd like to see packages start providing "universal2" wheels instead of architecture-specific ones.
Not at this time. The current plan is to drop the x86_64 installer for Python 3.10 and only have a universal2 installer for now.
I don't know for sure, but looking at the packaging code I'd say that pip will install a native package when available.
For a while, almost all libraries will have to produce both x86_64 and either universal2 or arm64. It makes more sense / saves bandwidth and disk space to produce x86_64 and arm64. Once pip 20.3 is very common, they could be combined - though since pip downloads the correct file automatically, I don't really see any advantage to universal2 unless you are building by hand - something like cibuildwheel is just as happy making both sets of wheels. Or possibly if you are including a […]. The built-in Python 3.8 and Homebrew's Python 3.9 on macOS 11 do not include pip 20.3 yet. Which is really bad, actually, since macOS 11 even on Intel requires 20.3 to download even regular wheels. Everything breaks immediately on trying to build NumPy. But at least I asked for it by updating to 11.0. :) But for 10.15 and before, there are a lot of older pips out there, and having things like NumPy fall back to building from source because there's only a universal2 wheel would be a disaster. So libraries have to provide two wheels for now.
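In practice that means telling macOS users to upgrade pip before installing anything with compiled extensions (pip 20.3 is the first release whose vendored packaging understands the macOS 11 / universal2 tags):

```bash
# Older pips fall back to building from the sdist on macOS 11,
# because they don't recognise the newer wheel platform tags.
python -m pip install --upgrade 'pip>=20.3'
```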
Since we can build all three from this one installer, maybe we should just switch to using this installer for 3.9 (like we only download the non-fat installers for all Pythons that support it), and then use our workaround to build x86_64 unless asked to do differently. Unless for some reason it can't build […]
The advantage of a universal2 wheel is that there's only a single wheel for macOS, not multiple. As I wrote earlier, it is possible to build and test both architectures in a single go when using an M1 builder. Building "universal2" wheels is pretty trivial; this works out of the box when you don't have to build C libraries, and most libraries I've wrapped myself build as universal out of the box as well (one exception to the rule is libraries that compile different sets of files for different architectures, such as OpenSSL). The users that really need this are those that redistribute wheels, in particular users of py2app, pyinstaller and the like. With universal2 wheels it is possible to build an application bundle that's a Universal Application. With per-architecture wheels this is close to impossible because those tools use the packages installed on the system. So, please provide an easy way to build "universal2" wheels.
Note that most software will have the same test results for both architectures. In the past the exception to this was low-level packages using architecture-specific code (for example by using libffi). This time there are some system-level changes as well, although the only ones I know of are (1) all arm64 code must be signed (the compiler will automatically add ad-hoc signatures for this), and (2) the low-level Mach timer APIs have a different resolution. To make testing fun: I've seen some reports that the Rosetta emulation software does not implement some vector instructions. That could affect testing some numeric code when optimising for the x86_64 CPUs in Apple hardware.
One of the biggest things to check in tests is verifying that all needed libraries are available. As mentioned above, the biggest problem with Intel wheels is the lack of some dependencies.
The time penalty for translation shouldn't be an issue for CI; even for interactive use the overhead of the initial translation isn't too bad (and that's on my DTK; M1 systems should be significantly faster). The primary issue with Rosetta 2 is that it is optional software, which means future M1 CI runners in the various public CI systems might not have it installed.
From what I've read (no references, sorry), using unsupported instructions will crash at runtime; testing for them at runtime (IIRC using CPUID) should work. That requires explicit support in software and likely isn't done (especially because clang and Mach-O don't support the GCC function attribute 'target_clones', which allows compiling a function for a number of CPUs with dynamic selection of the best variant). BTW, isn't "not being able to test all SIMD variants" an issue in general, unless you can arrange to run tests on a system that supports all those variants?
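For what it's worth, some of this can be probed from a shell on macOS (the sysctl keys below are real, though `sysctl.proc_translated` only exists on systems that have Rosetta 2):

```bash
# Prints 1 if the current process is running translated under Rosetta 2,
# 0 if it is running natively.
sysctl -n sysctl.proc_translated

# Lists the CPU feature flags visible to the process, e.g. to check
# whether AVX shows up when running under Rosetta 2.
sysctl -n machdep.cpu.features machdep.cpu.leaf7_features
```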
I can test, with the caveat that I only have access to a DTK system and not an M1 system. I'm not sure if that affects Rosetta 2 emulation. I have ordered an M1 laptop though, and that should arrive this year.
The code suggests as much. That's something I don't like at all, I'd prefer to get a "universal2" wheel when running a "universal2" python.
As mentioned elsewhere, I'd prefer to see "universal2" wheels everywhere and no architecture-specific wheels. I guess it is not possible to avoid building x86_64 wheels for now, because you need a pretty recent copy of pip for "universal2" support, but other than that, architecture-specific wheels have no clear advantages and do have a disadvantage: building a Universal Application using py2app or pyinstaller requires using "universal2" wheels.
Thanks for that feedback. Do you have estimates on translation times? Let's say with a first translation of […]
Yes, you're right. Even though I doubt we have, or will ever have, any way to test […]
Thanks, here are the steps (not sure […])
Looking forward to seeing if it works and how many tests are skipped.
Let me try to answer all those points.
As an end-user on macOS, I don't want my (costly) SSD running out of space because of things that are not needed. That means I would expect the installer to strip unneeded archs at the installation stage (not applicable for "portable" apps or more advanced usage, depending on your target audience), and that when I do […]. Given the use-case you propose, I would probably be in favor of having the 3 wheels built as a default setting (i.e. not choosing between architecture-specific wheels on one hand and universal2 on the other).
This would probably require some changes to occur in […]. End-users would get smaller wheels while […]. This would also probably require some changes in […]. Regarding […], I think that this issue raises points that are going far beyond the scope of cibuildwheel.
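A sketch of the thinning idea those changes would enable (purely illustrative; no installer does this for wheels today, and the filename is made up):

```bash
# Extract the native slice from a fat extension module, then replace the
# fat file with it, reclaiming the disk space of the unused architecture.
lipo mymodule.cpython-39-darwin.so -thin arm64 \
     -output mymodule.cpython-39-darwin.so.thin
mv mymodule.cpython-39-darwin.so.thin mymodule.cpython-39-darwin.so
```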
Good idea.
Chiming in on the way this gets added to […]:
So I'm hesitant to mix the two meanings of […]. But maybe this is a discussion that's more appropriate in #484, as these are for […]
I quite like that idea, if that's possible! One caveat is that I would expect/prefer to see exactly one flow that's advised; it's only going to be confusing if there are multiple ways of doing things without a good technical reason. At any rate, the […]
While it depends on how this shapes up, I'm strongly in favor of […]. Cross-compilation (still referring to Linux here) is harder. It's extremely useful - it lets you save most of that 10x speed penalty mentioned above. But it's hard - you have to set things up to target something you are not running on, and things like setuptools even seem to hard-code incorrect shebang lines when they make scripts if you are cross-compiling - the usual assumption that the running Python is the target Python no longer holds, which is harder to wrap one's head around. macOS is special - though it's much closer to cross-compiling than it is to emulation (at least, building AS on Intel is). Because Apple controls the toolchain, cross-compiling - and especially "universal" compiling, where both possibilities are compiled - is pretty easy and commonly supported. Many programs (for now) are shipping in only universal forms - like CMake (which is currently causing me pain due to the filename change in several places). And this is the direction that Python is moving. However, there's a big difference for Python packages - those are almost always downloaded by your package manager (pip), not the user. Other package managers universally (no pun intended) are not using universal downloads - Conda, Homebrew, etc. They (including pip) already have multiple downloads for different situations, and adding x86_64 and arm64 is not an issue at all; and the space and download savings are 2x! A large Python environment with something like PyTorch or TensorFlow can be over a GB when you factor in dependencies; if those packages only shipped universal wheels, venvs for both archs would double in size. PyPI's downloads for macOS would double, etc. Now if pip, for example, could strip a universal wheel when it unpacks it, the storage space issue would be solved - I have no idea what's possible for merging and splitting universal binaries/wheels. There are a few reasons to like universal wheels, yes, but most users creating an environment with pip will be adversely affected by being forced to download universal wheels when their package manager knows exactly what arch it's on. Imagine if we had universal wheels for Linux that packed i686, x86_64, ARM64, PowerPC, etc.? That would be a mess, and I don't see why universal-only wheels for macOS are much better. You only have a few copies of Python, so having those be universal is fine - if it's a few MB, it's not a big deal. (And I'd always get Python from Homebrew anyway - I don't think I have ever downloaded the one from python.org to one of my Macs.) Now, for cibuildwheel users, there are several possibilities: […]
Anyway, getting back to the topic at hand, armed with the points above: selecting the cross-compilation arch and the emulation arch are conceptually a bit irritating when mixed. If you have to add […]
For each selector, it is only enabled if supported. If both […]
Sadly, there is some platform overlap - if you want to emulate-build only arm64 Linux, but also want arm64 macOS builds, and we've added cross-compile support, you'd need separate runs of cibuildwheel to support that. But it's pretty minimal. Once we support building on Apple Silicon hosts, then […]
(Note: this is the design I'm thinking of; I'm not averse to others, but I haven't thought them through as much.)
While we are talking about macOS problems: does the new clang version support OpenMP? Or, if code needs it, is gcc still mandatory? Because I've met packages compiled with gcc which may not support universal2 compilation.
I've not thought that much in terms of implementation details or even actual user facing options yet (might have time to do that the week after next).
Well, if we want only 1 workflow, the "complex" one will always work. I can only see 2 issues with this flow:
It will require support, probably in […]. In the meantime, the […]
Yes, all the tools exist for this. I think the best place to integrate this would be in […]. I can think of 3 cases for dependencies if building using […]:
IMHO, given those 3 cases, I think build twice / merge once is the only option that works everywhere, and that building (rather than merging) […]. This flow might create some concerns once Apple Silicon runners are available in CI.
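A sketch of the "build twice, merge once" step (delocate does ship a `delocate-fuse` command for fusing two wheels; the wheel names here are illustrative):

```bash
# Fuse separately built x86_64 and arm64 wheels into one fat wheel.
delocate-fuse mypkg-1.0-cp39-cp39-macosx_10_9_x86_64.whl \
              mypkg-1.0-cp39-cp39-macosx_11_0_arm64.whl -w fused/
# The fused wheel keeps the first wheel's filename, so it still needs
# renaming to a universal2 platform tag afterwards.
```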
I feel we (cibuildwheel) probably shouldn't try too hard to force a particular workflow - exactly the same thing will not work across projects, especially at this early stage. I do think the simplest thing for most projects, and the path officially supported by Apple for applications (as I mentioned, I think packages in a package-managed system are slightly different), is universal binaries - it's quite possible the "simplest" workflow would be to build universal binaries, and then split off at least an x86_64 wheel. Only if a package cannot build universal should we have the workaround path of building separately (and, in the future, this path may end up becoming the main one if Apple Silicon runners become common, with a merge instead of a split). Building packages could be quite tricky without universal wheels for all dependencies. What happens if I build a package and it relies on libX via pyproject.toml? If setup.py imports it, I need the x86_64 or universal wheel; then if I build against it, I need the arm64 or universal wheel. Without universal wheels, pip has to know about what I'm trying to build to get the right package. Pip either has to be smart about what I'm doing, or I have to have a way to force universal wheels even when there's a better match (which is exactly the right behavior most of the time). One argument against […]
Clang has supported OpenMP for at least three years, maybe more? I know I first wrote about it for High Sierra. Apple doesn't build libomp for you, and therefore it's not as simple as […]
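For reference, the usual recipe on Apple clang looks something like this (a sketch assuming Homebrew's libomp package and a hypothetical hello_omp.c; Apple's clang understands the OpenMP pragmas, but you must point it at a libomp you installed yourself):

```bash
brew install libomp
clang -Xpreprocessor -fopenmp hello_omp.c \
      -I"$(brew --prefix libomp)/include" \
      -L"$(brew --prefix libomp)/lib" -lomp \
      -o hello_omp
```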
Phew. Great discussion. Lots of decisions to be made here. It seems that we've agreed that while […]. To move forward, we'll need an API design that works today (when M1 CI runners don't exist), but will grow to work when they arrive, and will fit into the […]. So to think about the interface, I like @henryiii's line of thinking as a starting point.
This problem goes away if we have an […]
I quite like this approach. This is certainly expressive enough to cover the possibility space and give the user enough control. I do wonder, though, to @YannickJadoul's point, if we might be over-complicating this a little. That said, if we can design good enough defaults, maybe most users wouldn't need to touch it. So, for defaults, let's try a scenario:
Option 1
So in this case, the x86_64 runner builds an x86_64 wheel, and universal2 is built there but not tested. The arm64 runner builds and tests […]. This kinda stinks, because the user has to manually configure somewhere in the CI which runner to get the universal2 wheel from, and the arm64 one is actually somehow 'better' because it's been tested. It's also overspecified, because having an arch in CROSS_COMPILE_ARCHS and ARCHS means cross-compile and emulate to test, but then what does having […] mean? So maybe rather than […]:
Option 2
Now, setting […]
Option 3
Then in the interim, before arm64 runners are available, we document that to get a universal2 wheel, set […].
Option 4
Or, the simplest option of all would be to not even use […]. Apologies for the long post. But it's been useful to run a few scenarios. I'm leaning more towards option 3, myself, because once arm64 runners arrive, it will provide the simplest config whose defaults do the 'most right' thing, and it doesn't seem crazy hard to understand. But option 4 also has some merits, in that it's simpler, there's less to understand, and since x86_64 will likely phase out in the long term, it's probably where we'll all end up. But it might take us 2-3 years to get to that point! Curious to hear opinions.
The reasons for having variables in the first place come down to these two points: […]
Universal vs. native is not in the list above - that's a bit of a special case. The issue that might come up is that […]. How about […]? PS: Note that "universal2" is not really an arch, and it's already macOS-specific, so adding it to ARCHS seems odd - the above avoids that. In regard to an earlier point: I think pip 20.3+ will be common on macOS well before it becomes common on Linux (CentOS 7 has pip 9), but Homebrew and Apple's command line tools both still provide < 20.3. In fact, cibuildwheel also does, until the current update PRs go in! :)
Checking back in here after some work has gone into #484. Currently, the strategy I'm working on is that we'll use the existing […]
I realise this isn't a perfect solution - universal2 isn't really an arch, and there's no distinction between cross-compiling and emulation in our API. But I think those concerns are mostly theoretical, and this provides a pragmatic way forward without increasing the API surface area too much. An aside: while working on this, I had some vague thoughts about our build selectors (#516).
There is a plan to change the installer from x86_64 to universal. Please see:
pypa/wheel#387 (comment)