Fix missing symbol error when running older versions of MacOS #122
I'll look at the diff later, but my recollection is CPython didn't do weak linking properly until a 3.9 point release. I believe things were finally implemented somewhat properly when Apple M1s became a thing and they were forced to use the macOS 11.0 SDK to target M1 and this exposed them to portability issues when x86-64 binaries were built with the 11.0 SDK. It took them a while to do weak linking properly, as there were definitely random symbols not doing it properly. And my recollection is some initial attempts at introducing weak linking didn't actually work properly. I think that's why there is code in this repo for force disabling specific symbols.

The code to support weak linking is scattered all over the CPython code base and much of it was backported. Since much of it was all tied up in M1 support and 3.8 didn't get those backports until much later, I wouldn't at all be surprised if 3.8 has bugs in this area.

As for this project, I very much dislike the fact that we prevent the use of some new symbols instead of using weak linking. This does have performance implications and I'd love to see us do the right thing.

A thing I struggle with is automated enforcement of symbol visibility. The Rust code in this repo for verifying some binary properties does detect some banned symbols. But it doesn't currently recognize - and allow - weakly linked symbols. I'd love to change this. I suspect there's even room to parse

Another issue here is presence of weak symbols impacts downstream projects linking the raw object files. We might need to annotate weak symbols or other linker metadata in the
Oh, and something else to keep in mind is that because this project doesn't (yet) produce universal/fat Mach-O binaries, we can get away with requiring macOS 11.0+ targeting for aarch64. So we likely don't need to disable symbols on aarch64 Darwin unless they were introduced past macOS 11.0.
I have some patches locally to look for weak references in the Mach-O files. Throwing it at the most recent distributions exposes the following weak references: 3.8 aarch64:
3.8 x86-64
3.9 aarch64:
3.9 x86-64
3.10 aarch64:
3.10 x86-64
I just thought this would be an interesting reference to have. In theory, all of the weak references should correspond to symbols in macOS 10.10+ (at least for x86-64 builds) because one would think CPython would only make the symbol weak when targeting an older macOS platform level than when the symbol was introduced. However, I'm not sure if this is actually true.
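The expectation described above (a reference should only be weak when the symbol is newer than the deployment target) can be written down as a small check. This is a minimal sketch, not code from this repo: the helper names are hypothetical, and the only version fact used is the one stated in this thread, that `getentropy` was added in 10.12.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "10.12" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def expect_weak_reference(introduced: str, deployment_target: str) -> bool:
    """True if a symbol is newer than the deployment target.

    In that case the linker is expected to emit a weak reference, and the
    caller needs a runtime guard before using the symbol.
    """
    return parse_version(introduced) > parse_version(deployment_target)


# Hypothetical helper usage; 10.12 for getentropy comes from this thread.
print(expect_weak_reference("10.12", "10.9"))  # x86-64 build targeting 10.9
print(expect_weak_reference("10.12", "11.0"))  # aarch64 build targeting 11.0
```

The same comparison covers the aarch64 point made earlier: with an 11.0 deployment target, only symbols introduced after 11.0 would need special handling.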
macOS SDKs have TBD files, which are YAML files that define library information, such as which symbols are exported. We can use these TBD files to validate that symbol references in our Mach-O files reference symbols that actually exist in the targeted SDK version. This commit implements that functionality. There are a few issues with this:

1. Older SDKs don't have `SDKSettings.json` files. So our SDK search is failing.
2. The 10.9 SDK (which we still target) doesn't ship TBD files. So in the current implementation we only validate down to the 10.14 SDK. This is not sufficient to detect the bug that #122 purportedly addresses.
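The cross-referencing idea in this commit message boils down to a set difference. The sketch below is illustrative only: the real validation lives in this repo's Rust code, the function name is hypothetical, and the symbol sets are made-up stand-ins for what would be read from a Mach-O file and an SDK's TBD files.

```python
from typing import Dict, List, Set


def find_missing_symbols(
    undefined: Dict[str, bool],
    sdk_exports: Set[str],
) -> List[str]:
    """Return strong undefined symbols that the SDK does not export.

    `undefined` maps each undefined symbol name to True if the reference is
    weak. Weak references are skipped because a missing weak symbol does not
    fail at load time; it needs a separate runtime-guard check.
    """
    return sorted(
        name
        for name, is_weak in undefined.items()
        if not is_weak and name not in sdk_exports
    )


# Hypothetical data: a real run would read these from the binary and the SDK.
undefined_symbols = {"_getentropy": False, "_open": False, "_clock_gettime": True}
sdk_exports = {"_open", "_close", "_read"}

print(find_missing_symbols(undefined_symbols, sdk_exports))
```

Running the check once per targeted SDK version would flag any strong reference that an older macOS cannot satisfy.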
Building on top of the previous commit, this commit teaches our Apple SDK validation to handle Apple SDKs with only an SDKSettings.plist, not an SDKSettings.json. This enables us to parse macOS SDKs 10.10+. 10.9 SDKs still don't validate since they lack .tbd files. With this change, I'm still not seeing any missing symbols. So if #122 is fixing something, it must be with the 10.9 SDK or there must be an error elsewhere, possibly in this validation code.
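The JSON-then-plist fallback described in this commit message could look roughly like the following. It is a sketch under stated assumptions rather than the repo's actual Rust implementation: the function name is hypothetical, and it assumes both settings files expose the SDK version under a top-level `Version` key.

```python
import json
import plistlib
from pathlib import Path
from typing import Optional


def read_sdk_version(sdk_root: Path) -> Optional[str]:
    """Return the SDK version, preferring JSON and falling back to plist."""
    json_path = sdk_root / "SDKSettings.json"
    plist_path = sdk_root / "SDKSettings.plist"

    if json_path.exists():
        settings = json.loads(json_path.read_text(encoding="utf-8"))
    elif plist_path.exists():
        with plist_path.open("rb") as fh:
            settings = plistlib.load(fh)
    else:
        # Neither settings file is present, so this is not a usable SDK.
        return None

    # Assumption: both formats store the version under a "Version" key.
    return settings.get("Version")
```

Returning `None` instead of raising lets an SDK search skip directories that are not SDKs without aborting the whole scan.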
I haven't forgotten about this PR! I've been wanting to add automation that validates macOS portability of the resulting binaries for a while now and this PR spurred me to do it. Partially because unwanted symbols have caused errors and regressions. Partially because I want to do the right thing and weakly link as much as possible so distributions run on modern macOS get the benefits the code is capable of.

Anyway, as the timeline shows I just pushed some commits that attempt to validate undefined Mach-O symbols against the state as advertised by the macOS SDKs. I think I've got non-weak symbol validation working against SDKs 10.10+. No 10.9 at the moment because 10.9 doesn't ship YAML

The code as written doesn't show any problems with missing symbols for current distributions for all SDK versions 10.10+. So now I just have more questions. Critically, why are you getting undefined symbols but my validation isn't saying they are missing? Either you are doing something causing an introduction of the non-weak symbols and/or my validation code isn't correct.

Could you please provide more details to help debug this? Which symbol(s) are undefined? And what are steps to reproduce the undefined symbol errors?
Thanks for the update! I have a small binary built with pyoxidizer that packages some pip dependencies and mostly acts as Python. When launching this executable on 10.10 and 10.11 I would get a linking error because of
Looking at the symbols, it's undefined but doesn't seem to be weakly linked, which would explain the above.
(the Let me try to first put together a small repro case so that I can also try it against your latest commits. |
ok here's a small repro case:
When run on a 10.10 machine:
And yep, this is a 3.8 issue, switching |
Tried building with a standalone Python 3.8 built off latest main and I'm getting the same linking error.
I want to say this is a bug in PyOxidizer then. But I'm the author of that project too, so we can continue discussing it here. On my Intel MBP running and using the 12.3 SDK to build targeting 10.9, Do you see that same output? ( If you don't, then that's a bug in the build reproducibility of PyOxidizer. Probably something in an environment variable or config file somewhere changing defaults. If you do see the same output, then it could be a bug in Python 3.8 code for dispatching to weak symbols. As I said before, my recollection is CPython didn't do weak linking correctly until 3.9. I believe those patches were backported to 3.8 and they may have made mistakes with the backports. e.g. there might be call sites where CPython isn't doing the runtime availability checks before calling into a symbol. |
ah you're right, I am also seeing |
Actually, I think I can save you some work: this is a bug in CPython 3.8 not doing runtime availability checks. Compare the following:
There are no availability attributes on 3.8. Without the guard in place, it tries to resolve the weak symbol at runtime, leading to failure on older macOS versions. I'll report the issue to CPython.
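To make the idea of a runtime availability guard concrete, here is an analogy written in Python with `ctypes`. It is not how CPython implements its guards (those live in C), and the helper names are hypothetical. The sketch assumes a Unix-like platform where `ctypes.CDLL(None)` gives access to symbols loaded into the current process: it looks the symbol up first and only calls it when it exists, falling back otherwise.

```python
import ctypes
import os


def lookup_symbol(name: str):
    """Return the C function `name` if the current process exports it, else None."""
    process = ctypes.CDLL(None)  # handle to the running process and its libraries
    try:
        return getattr(process, name)
    except AttributeError:
        # The symbol is absent on this OS, like a weak symbol left unresolved.
        return None


def random_bytes(count: int) -> bytes:
    """Use getentropy() when available, otherwise fall back to os.urandom()."""
    getentropy = lookup_symbol("getentropy")
    if getentropy is not None:
        buf = ctypes.create_string_buffer(count)
        # getentropy returns 0 on success and -1 on error (e.g. oversized request).
        if getentropy(buf, ctypes.c_size_t(count)) == 0:
            return buf.raw
    return os.urandom(count)


print(len(random_bytes(16)))
```

The failure discussed in this thread corresponds to skipping the `is not None` check and calling the function unconditionally.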
Ah! well that explains it :) |
Actually, I may not report the bug: python/cpython@b29d0a5 says that they intentionally omitted weak linking support from the 3.8 backport to reduce scope. There are references to this decision at python/cpython#85272 (comment). There's probably more discussion floating around. But I can't make sense of things after the migration of the Python issue tracker to GitHub. So, uh, I'm not sure where that leaves us. My knee-jerk reaction is that this means we need to avoid ~all weakly linked symbols coming from CPython 3.8 since they lack runtime availability guards? Note there does appear to be runtime checking for
The bane of weak symbols on macOS has come back to haunt us. (See indygreg/PyOxidizer#373 for previous battles.)

In #122 we tracked down a runtime failure to the fact that CPython 3.8 didn't properly backport weak symbol handling support. So, if you build with a modern SDK targeting an older SDK (which we do as of 63f13fb), the linker will insert a weak symbol. However, CPython doesn't have the runtime guards and will attempt to dereference it, causing a crash.

Up to this point, our strategy for handling this mess was to stop using symbols on all Python versions when we found one to be causing an issue. This was crude, but effective.

In recent commits, we implemented support for leveraging the macOS SDK .tbd files for validating symbol presence. We can now cross reference undefined symbols in our binaries against what the SDKs tell us is present and screen for missing symbols. This helps us detect strong symbols that aren't present on targeted SDK versions.

For weak symbols, I'm not sure if we can statically analyze the Mach-O to determine if a symbol is guarded. I _think_ the guard is a compiler built-in and gets converted to a function call, or maybe inline assembly. We _might_ have to disassemble if we wanted to catch unguarded weakly referenced symbols. Yeah, no.

In this commit, we effectively change our strategy for weak symbol handling. Knowing that CPython 3.9+ should have guarded weak symbols everywhere, we only ban symbol use on CPython 3.8, specifically x86-64 3.8 since the aarch64 build targets macOS SDK 11, which has the symbols we need.

We also remove the one-off validation check for 2 banned symbols. In its place we add validation that only a specific allow list of weak symbols is present on CPython 3.8 builds.

As part of developing this, I found yet more bugs in other programs. CPython had some pragmas forcing symbols to be weak but the pragmas weren't protected by an #if guard. This caused a compiler failure if we prevented the symbols from being defined.

libffi was also using mkostemp without runtime guards. I'm unsure if Python would ever call into a function that would attempt to resolve this symbol. But if it does it would crash on 10.9. So we disable that symbol for builds targeting 10.9.
Ah yep, I followed some of those patches from afar since a coworker was involved (mostly the work around supporting building 3.8 for arm64), but I hadn't realized that there were also discussions around backporting weak-linking fixes. So taking a step back, I proposed this change because it's what we were already doing for Regarding 3.8, I guess there's always the option of calling it a day and upgrading to 3.9, but that's not always practical, and people often find themselves stuck on an older version of CPython because they need to keep supporting old OS versions, which is exactly the use case that this is breaking. So maybe a middle ground here would be, as you said, to special-case 3.8 and ignore those weakly linked symbols. For those folks who happen to have to support an older CPython version and fairly old macOS versions, I feel like not having support for potentially better newer system calls is the least bad of two non-optimal situations. Also worth noting that while I included all symbols that could lead to similar issues (lazily picked from an internal codebase that had to deal with that over the years),
Just saw dc87b17, excellent! |
The bane of weak symbols on macOS has come back to haunt us. (See indygreg/PyOxidizer#373 for previous battles.)

In #122 we tracked down a runtime failure to the fact that CPython 3.8 didn't properly backport weak symbol handling support. So, if you build with a modern SDK targeting an older SDK (which we do as of 63f13fb), the linker will insert a weak symbol. However, CPython doesn't have the runtime guards and will attempt to dereference it, causing a crash.

Up to this point, our strategy for handling this mess was to stop using symbols on all Python versions when we found one to be causing an issue. This was crude, but effective.

In recent commits, we implemented support for leveraging the macOS SDK .tbd files for validating symbol presence. We can now cross reference undefined symbols in our binaries against what the SDKs tell us is present and screen for missing symbols. This helps us detect strong symbols that aren't present on targeted SDK versions.

For weak symbols, I'm not sure if we can statically analyze the Mach-O to determine if a symbol is guarded. I _think_ the guard is a compiler built-in and gets converted to a function call, or maybe inline assembly. We _might_ have to disassemble if we wanted to catch unguarded weakly referenced symbols. Yeah, no.

In this commit, we effectively change our strategy for weak symbol handling. Knowing that CPython 3.9+ should have guarded weak symbols everywhere, we only ban symbol use on CPython 3.8, specifically x86-64 3.8 since the aarch64 build targets macOS SDK 11, which has the symbols we need.

We also remove the one-off validation check for 2 banned symbols. In its place we add validation that only a specific allow list of weak symbols is present on CPython 3.8 builds.

As part of developing this, I discovered that libffi was also using mkostemp without runtime guards. I'm unsure if Python would ever call into a function that would attempt to resolve this symbol. But if it does it would crash on 10.9. So we disable that symbol for builds targeting 10.9.
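The allow-list validation described in this commit message can be sketched as follows. This is only an illustration of the policy, not the repo's Rust validation code: the function name, the allow-list contents, and the scan results are all hypothetical.

```python
from typing import Iterable, List, Set


def disallowed_weak_symbols(
    python_version: str,
    weak_symbols: Iterable[str],
    allowed: Set[str],
) -> List[str]:
    """Return weak symbols that violate the allow list.

    Only CPython 3.8 is checked, mirroring the policy in the commit message:
    3.9+ is assumed to guard its weak symbols at runtime.
    """
    if python_version != "3.8":
        return []
    return sorted(set(weak_symbols) - allowed)


# Hypothetical allow list and scan results, for illustration only.
ALLOWED_38 = {"_example_allowed_symbol"}
found = ["_example_allowed_symbol", "_getentropy"]

print(disallowed_weak_symbols("3.8", found, ALLOWED_38))
print(disallowed_weak_symbols("3.9", found, ALLOWED_38))
```

An allow list fails closed: any weak symbol that newly appears in a 3.8 build is reported until someone reviews it and adds it explicitly.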
In my mind 3.8 is the backwards compatible release supporting older machines. So changing just its build configuration to forgo modern macOS features/symbols in the name of runtime compatibility is the way to go. This commit I just pushed to It is unfortunate we have to prevent the >95% of users with access to the symbols from using them. But I suppose that is the price we have to pay if we want to support such old macOS. If any user is affected by this and really wants the symbols, they can upgrade to CPython 3.9 or 3.10, which continue to support macOS 10.9 on x86-64 and AFAIK don't have issues using weak symbols since CPython added runtime guards to all their weakly linked symbols. Anyway, with 52b35f9 landing on Thank you for helping me get to the bottom of this. What a thorny bug!
Oh, if you need this in a release, I might do a release over the weekend. We're a little bit behind on upgrading dependencies and I try to have this project track upstream pretty closely, as I know a growing number of projects are relying on these builds.
Thank you! I learned a bunch along the way so that was also educative :) |
Python (at least 3.8) has trouble dealing with weak symbols that are defined in macOS availability macros and ends up linking them strongly, which causes it to fail launching on older OSes since it can't find the symbols at runtime. I've noticed issues on 10.10 and 10.11 due to symbols added in 10.12 (mostly `getentropy`), but this PR attempts to deal with this in a slightly more generic manner. I know that Python has been attempting to fix this and maybe 3.9 doesn't have that issue, but I haven't looked too closely.