Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing symbol error when running older versions of MacOS #122

Closed
wants to merge 1 commit into from

Conversation

ddeville
Copy link
Contributor

@ddeville ddeville commented Apr 6, 2022

Python (at least 3.8) has trouble dealing with weak symbols that are defined in macos availability macros and ends up linking them strongly which causes it to fail launching on older OSes since it can't find the symbols at runtime. I've noticed issues on 10.10 and 10.11 due to symbols added in 10.12 (mostly getentropy) but this PR attempts to deal with this in a slightly more generic manner.

I know that Python has been attempting to fix this and maybe 3.9 doesn't have that issue but I haven't looked too closely.

@ddeville ddeville force-pushed the weak-symbols-macos branch from 4eccf3b to 4f39a0b Compare April 6, 2022 23:05
@indygreg
Copy link
Collaborator

indygreg commented Apr 7, 2022

I'll look at the diff later, but my recollection is CPython didn't do weak linking properly until a 3.9 point release. I believe things were finally implemented somewhat properly when Apple M1s became a thing and they were forced to use the macOS 11.0 SDK to target M1 and this exposed them to portability issues when x86-64 binaries were built with the 11.0 SDK.

It took them a while to do weak linking properly, as there were definitely random symbols not doing it properly. And my recollection is some initial attempts at introducing weak linking didn't actually work properly. I think that's why there is code in this repo for force disabling specific symbols.

The code to support weak linking is scattered all over the CPython code base and much of it was backported. Since much of it was all tied up in M1 support and 3.8 didn't get those backports until much later, I wouldn't at all be surprised if 3.8 has bugs in this area.

As for this project, I very much dislike the fact that we prevent the use of some new symbols instead of using weak linking. This does have performance implications and I'd love to see us do the right thing.

A thing I struggle with is automated enforcement of symbol visibility. The Rust code in this repo for verifying some binary properties does detect some banned symbols. But it doesn't currently recognize - and allow - weakly linked symbols. I'd love to change this. I suspect there's even room to parse .tbd files in macOS SDKs to verify symbol availability and require weak linking when appropriate. I even wrote a Rust crate (https://crates.io/crates/text-stub-library) for parsing .tbd files with this potential goal in mind.

Another issue here is presence of weak symbols impacts downstream projects linking the raw object files. We might need to annotate weak symbols or other linker metadata in the PYTHON.json file so weak symbols can be handled accordingly. Otherwise, I believe weak symbols can get promoted when linking the object files. So if we introduce weak symbols, we need to be sure PyOxidizer handles them correctly.

@indygreg
Copy link
Collaborator

indygreg commented Apr 7, 2022

Oh, and something else to keep in mind is that because this project doesn't (yet) produce universal/fat Mach-O binaries, we can get away with requiring macOS 11.0+ targeting for aarch64. So we likely don't need to disable symbols on aarch64 Darwin unless they were introduced past macOS 11.0.

@indygreg
Copy link
Collaborator

indygreg commented Apr 8, 2022

I have some patches locally to look for weak references in the Mach-O files. Throwing it at the most recent distributions exposes the following weak references:

3.8 aarch64:

python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/install/lib/libpython3.8.dylib _fstatvfs
python/install/lib/libpython3.8.dylib _inet_aton
python/install/lib/libpython3.8.dylib _lchown
python/install/lib/libpython3.8.dylib _statvfs

3.8 x86-64

python/build/lib/libffi.a:closures.o _mkostemp
python/build/lib/liblzma.a:liblzma_la-stream_encoder_mt.o _clock_gettime
python/build/lib/libtcl8.6.a:tclMacOSXNotify.o ___darwin_check_fd_set_overflow
python/build/lib/libtk8.6.a:tkMacOSXColor.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXWindowEvent.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXWm.o _NSAppearanceNameDarkAqua
python/install/lib/libpython3.8.dylib _NSAppearanceNameDarkAqua
python/install/lib/libpython3.8.dylib ___darwin_check_fd_set_overflow
python/install/lib/libpython3.8.dylib __dyld_shared_cache_contains_path
python/install/lib/libpython3.8.dylib _clock_getres
python/install/lib/libpython3.8.dylib _clock_gettime
python/install/lib/libpython3.8.dylib _clock_settime
python/install/lib/libpython3.8.dylib _dispatch_once_f
python/install/lib/libpython3.8.dylib _faccessat
python/install/lib/libpython3.8.dylib _fchmodat
python/install/lib/libpython3.8.dylib _fchownat
python/install/lib/libpython3.8.dylib _fdopendir$INODE64
python/install/lib/libpython3.8.dylib _fstatat$INODE64
python/install/lib/libpython3.8.dylib _fstatvfs
python/install/lib/libpython3.8.dylib _getentropy
python/install/lib/libpython3.8.dylib _inet_aton
python/install/lib/libpython3.8.dylib _lchown
python/install/lib/libpython3.8.dylib _linkat
python/install/lib/libpython3.8.dylib _mkdirat
python/install/lib/libpython3.8.dylib _mkostemp
python/install/lib/libpython3.8.dylib _openat
python/install/lib/libpython3.8.dylib _readlinkat
python/install/lib/libpython3.8.dylib _renameat
python/install/lib/libpython3.8.dylib _statvfs
python/install/lib/libpython3.8.dylib _symlinkat
python/install/lib/libpython3.8.dylib _unlinkat

3.9 aarch64:

python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/install/lib/libpython3.9.dylib _inet_aton

3.9 x86-64

python/build/lib/libffi.a:closures.o _mkostemp
python/build/lib/liblzma.a:liblzma_la-stream_encoder_mt.o _clock_gettime
python/build/lib/libtcl8.6.a:tclMacOSXNotify.o ___darwin_check_fd_set_overflow
python/build/lib/libtk8.6.a:tkMacOSXColor.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXWindowEvent.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXWm.o _NSAppearanceNameDarkAqua
python/install/lib/libpython3.9.dylib _NSAppearanceNameDarkAqua
python/install/lib/libpython3.9.dylib ___darwin_check_fd_set_overflow
python/install/lib/libpython3.9.dylib __dyld_shared_cache_contains_path
python/install/lib/libpython3.9.dylib _clock_getres
python/install/lib/libpython3.9.dylib _clock_gettime
python/install/lib/libpython3.9.dylib _clock_settime
python/install/lib/libpython3.9.dylib _dispatch_once_f
python/install/lib/libpython3.9.dylib _faccessat
python/install/lib/libpython3.9.dylib _fchmodat
python/install/lib/libpython3.9.dylib _fchownat
python/install/lib/libpython3.9.dylib _fdopendir$INODE64
python/install/lib/libpython3.9.dylib _fstatat$INODE64
python/install/lib/libpython3.9.dylib _futimens
python/install/lib/libpython3.9.dylib _getentropy
python/install/lib/libpython3.9.dylib _inet_aton
python/install/lib/libpython3.9.dylib _linkat
python/install/lib/libpython3.9.dylib _mkdirat
python/install/lib/libpython3.9.dylib _mkostemp
python/install/lib/libpython3.9.dylib _openat
python/install/lib/libpython3.9.dylib _readlinkat
python/install/lib/libpython3.9.dylib _renameat
python/install/lib/libpython3.9.dylib _symlinkat
python/install/lib/libpython3.9.dylib _unlinkat
python/install/lib/libpython3.9.dylib _utimensat

3.10 aarch64:

python/build/Modules/socketmodule.o _inet_aton
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/install/lib/libpython3.10.dylib _inet_aton
python/install/lib/python3.10/config-3.10-darwin/libpython3.10.a:socketmodule.o _inet_aton

3.10 x86-64

python/build/lib/libffi.a:closures.o _mkostemp
python/build/lib/liblzma.a:liblzma_la-stream_encoder_mt.o _clock_gettime
python/build/lib/libtcl8.6.a:tclMacOSXNotify.o ___darwin_check_fd_set_overflow
python/build/lib/libtk8.6.a:tkMacOSXColor.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXWindowEvent.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXWm.o _NSAppearanceNameDarkAqua
python/install/lib/libpython3.10.dylib _NSAppearanceNameDarkAqua
python/install/lib/libpython3.10.dylib ___darwin_check_fd_set_overflow
python/install/lib/libpython3.10.dylib __dyld_shared_cache_contains_path
python/install/lib/libpython3.10.dylib _clock_getres
python/install/lib/libpython3.10.dylib _clock_gettime
python/install/lib/libpython3.10.dylib _clock_settime
python/install/lib/libpython3.10.dylib _dispatch_once_f
python/install/lib/libpython3.10.dylib _faccessat
python/install/lib/libpython3.10.dylib _fchmodat
python/install/lib/libpython3.10.dylib _fchownat
python/install/lib/libpython3.10.dylib _fdopendir$INODE64
python/install/lib/libpython3.10.dylib _fstatat$INODE64
python/install/lib/libpython3.10.dylib _futimens
python/install/lib/libpython3.10.dylib _getentropy
python/install/lib/libpython3.10.dylib _inet_aton
python/install/lib/libpython3.10.dylib _linkat
python/install/lib/libpython3.10.dylib _mkdirat
python/install/lib/libpython3.10.dylib _mkostemp
python/install/lib/libpython3.10.dylib _openat
python/install/lib/libpython3.10.dylib _readlinkat
python/install/lib/libpython3.10.dylib _renameat
python/install/lib/libpython3.10.dylib _symlinkat
python/install/lib/libpython3.10.dylib _unlinkat
python/install/lib/libpython3.10.dylib _utimensat

I just thought this would be an interesting reference to have. In theory, all of the weak references should correspond to symbols in macOS 10.10+ (at least for x86-64 builds) because one would think CPython would only make the symbol weak when targeting an older macOS platform level than when the symbol was introduced. However, I'm not sure if this is actually true.

indygreg added a commit that referenced this pull request Apr 27, 2022
macOS SDKs have TBD files, which are YAML files that define library
information, such as which symbols are exported.

We can use these TBD files to validate that symbol references in
our Mach-O files reference symbols that actually exist in the
targeted SDK version.

This commit implements that functionality.

There are a few issues with this:

1. Older SDKs don't have `SDKSettings.json` files. So our SDK search
   is failing.
2. The 10.9 SDK (which we still target) doesn't ship TBD files.

So in the current implementation we only validate down to the
10.14 SDK. This is not sufficient to detect the bug that #122
purportedly addresses.
indygreg added a commit that referenced this pull request Apr 27, 2022
Building on top of the previous commit, this commit teaches our Apple
SDK validation to handle Apple SDKs with only an SDKSettings.plist, not
an SDKSettings.json. This enables us to parse macOS SDKs 10.10+.

10.9 SDKs still don't validate since they lack .tbd files.

With this change, I'm still not seeing any missing symbols. So
if #122 is fixing something, it must be with the 10.9 SDK or there
must be an error elsewhere, possibly in this validation code.
@indygreg
Copy link
Collaborator

I haven't forgotten about this PR!

I've been wanting to add automation that validates macOS portability of the resulting binaries for a while now and this PR spurred me to do it. Partially because unwanted symbols have caused errors and regressions. Partially because I want to do the right thing and weakly link as much as possible so distributions run on modern macOS get the benefits the code is capable of.

Anyway, as the timeline shows I just pushed some commits that attempt to validate undefined Mach-O symbols against the state as advertised by the macOS SDKs. I think I've got non-weak symbol validation working against SDKs 10.10+. No 10.9 at the moment because 10.9 doesn't ship YAML .tbd files and we'll have to parse actual Mach-O to do the symbol presence verification. And the checks aren't yet running in CI.

The code as written doesn't show any problems with missing symbols for current distributions for all SDK versions 10.10+.

So now I just have more questions.

Critically, why are you getting undefined symbols but my validation isn't saying they are missing? Either you are doing something causing an introduction of the non-weak symbols and/or my validation code isn't correct.

Could you please provide more details to help debug this?

Which symbol(s) are undefined? And what are steps to reproduce the undefined symbol errors?

@ddeville
Copy link
Contributor Author

Thanks for the update!

I have a small binary built with pyoxidizer that packages some pip dependencies and mostly acts as Python. When launching this executable on 10.10 and 10.11 I would get a linking error because of getentropy:

dyld: lazy symbol binding failed: Symbol not found: _getentropy
  Referenced from: /Users/vagrant/Desktop/pybox-2022-03-28-1648505173-darwin-x86_64/./pybox
  Expected in: /usr/lib/libSystem.B.dylib

Looking at the symbols, it's undefined but doesn't seem to be weakly linked, which would explain the above.

osx-10-10-a19b635d:pybox-2022-03-28-1648505173-darwin-x86_64 vagrant$ nm pybox | grep getentropy
000000010117a5e8 d __ZN3std3sys4unix4rand3imp21getentropy_fill_bytes5DLSYM17h08e5edb48162f7ffE
                 U _getentropy
0000000101197c84 b _py_getentropy.getentropy_works

(the py_getentropy.getentropy_works seems to imply that it has the right runtime detection but the symbol being strongly linked means that it can't even load the executable)

Let me try to first put together a small repro case so that I can also try it against your latest commits.

@ddeville
Copy link
Contributor Author

ddeville commented Apr 27, 2022

ok here's a small repro case:

$ cat pyoxidizer.bzl
def make_exe():
    dist = default_python_distribution(python_version="3.8")
    policy = dist.make_python_packaging_policy()
    python_config = dist.make_python_interpreter_config()
    python_config.config_profile = "python"
    exe = dist.to_python_executable(
        name="entronope",
        packaging_policy=policy,
        config=python_config,
    )
    return exe

register_target("exe", make_exe)
resolve_targets()

$ cat build.sh
#!/usr/bin/env bash
set -euo pipefail

pyoxidizer_url="https://github.com/indygreg/PyOxidizer/releases/download/pyoxidizer%2F0.20.0/PyOxidizer-0.20.0-exe-macos-universal.zip"
target_triple="x86_64-apple-darwin"

# This shouldn't be necessary but let's be explicit
export MACOSX_DEPLOYMENT_TARGET=10.10

mkdir -p build

pushd build
curl -L --fail $pyoxidizer_url --output pyoxidizer.zip
unzip -o pyoxidizer.zip
popd

./build/pyoxidizer build exe --release --target-triple $target_triple

When run on a 10.10 machine:

$ ./build/x86_64-apple-darwin/release/exe/entronope
dyld: lazy symbol binding failed: Symbol not found: _getentropy
  Referenced from: /Volumes/VMware Shared Folders/entronope/./build/x86_64-apple-darwin/release/exe/entronope
  Expected in: /usr/lib/libSystem.B.dylib

dyld: Symbol not found: _getentropy
  Referenced from: /Volumes/VMware Shared Folders/entronope/./build/x86_64-apple-darwin/release/exe/entronope
  Expected in: /usr/lib/libSystem.B.dylib

Trace/BPT trap: 5

And yep, this is a 3.8 issue, switching default_python_distribution to use Python 3.9 or 3.10 doesn't exhibit the issue anymore.

@ddeville
Copy link
Contributor Author

Tried building with a standalone python 3.8 built off latest main and getting the same linking error.

@indygreg
Copy link
Collaborator

I want to say this is a bug in PyOxidizer then. But I'm the author of that project too, so we can continue discussing it here.

On my Intel MBP running and using the 12.3 SDK to build targeting 10.9, nm -mg on the resulting executable says _getentropy is weakly linked: (undefined) weak external _getentropy (from libSystem).

Do you see that same output? (nm -mg path/to/exe | grep _getentropy).

If you don't, then that's a bug in the build reproducibility of PyOxidizer. Probably something in an environment variable or config file somewhere changing defaults.

If you do see the same output, then it could be a bug in Python 3.8 code for dispatching to weak symbols. As I said before, my recollection is CPython didn't do weak linking correctly until 3.9. I believe those patches were backported to 3.8 and they may have made mistakes with the backports. e.g. there might be call sites where CPython isn't doing the runtime availability checks before calling into a symbol.

@ddeville
Copy link
Contributor Author

ah you're right, I am also seeing (undefined) weak external _getentropy (from libSystem).

@indygreg
Copy link
Collaborator

Actually, I think I can save you some work: this is a bug in CPython 3.8 not doing runtime availability checks.

Compare the following:

There's no availability attributes on 3.8. Without the guard in place, it tries to resolve the weak symbol at runtime, leading to failure on older macOS versions. I'll report the issue to CPython.

@ddeville
Copy link
Contributor Author

Ah! well that explains it :)

@indygreg
Copy link
Collaborator

Actually, I may not report the bug: python/cpython@b29d0a5 says that they intentionally omitted weak linking support from the 3.8 backport to reduce scope.

There are references to this decision at python/cpython#85272 (comment). There's probably more discussion floating around. But I can't make sense of things after the migration of the Python issue tracker to GitHub.

So, uh, I'm not sure where that leaves us.

My knee jerk is I think that means we need to avoid ~all weakly linked symbols coming from CPython 3.8 since they lack runtime availability guards?

Note there does appear to be runtime checking for _dyld_shared_cache_contains_path, ffi_closure_free, ffi_closure_alloc, and ffi_prep_closure_loc. Although since we provide our own libffi on macOS, those ffi symbols may not matter since USING_APPLE_OS_LIBFFI shouldn't be defined.

indygreg added a commit that referenced this pull request Apr 28, 2022
The bane of weak symbols on macOS has come back to haunt us.
(See indygreg/PyOxidizer#373 for previous
battles.)

In #122
we tracked down a runtime failure to the fact that CPython 3.8
didn't properly backport weak symbol handling support. So, if
you build with a modern SDK targeting an older SDK (which we do as
of 63f13fb), the linker will
insert a weak symbol. However, CPython doesn't have the runtime
guards and will attempt to dereference it, causing a crash.

Up to this point, our strategy for handling this mess was to
stop using symbols on all Python versions when we found one to
be causing an issue. This was crude, but effective.

In recent commits, we implemented support for leveraging the
macOS SDK .tbd files for validating symbol presence. We can now
cross reference undefined symbols in our binaries against what
the SDKs tell us is present and screen for missing symbols. This
helps us detect strong symbols that aren't present on targeted
SDK versions.

For weak symbols, I'm not sure if we can statically analyze the
Mach-O to determine if a symbol is guarded. I _think_ the guard
is a compiler built-in and gets converted to a function call, or
maybe inline assembly. We _might_ have to disassemble if we wanted
to catch unguarded weakly referenced symbols. Yeah, no.

In this commit, we effectively change our strategy for weak symbol
handling.

Knowing that CPython 3.9+ should have guarded weak symbols everywhere,
we only ban symbol use on CPython 3.8, specifically x86-64 3.8 since
the aarch64 build targets macOS SDK 11, which has the symbols we
need.

We also remove the one-off validation check for 2 banned symbols. In
its place we add validation that only a specific allow list of weak
symbols is present on CPython 3.8 builds.

As part of developing this, I found yet more bugs in other programs.

CPython had some pragmas forcing symbols to be weak but the pragmas
weren't protected by an #if guard. This caused a compiler failure if
we prevented the symbols from being defined.

libffi was also using mkostemp without runtime guards. I'm unsure if
Python would ever call into a function that would attempt to resolve
this symbol. But if it does it would crash on 10.9. So we disable that
symbol for builds targeting 10.9.
indygreg added a commit that referenced this pull request Apr 28, 2022
The bane of weak symbols on macOS has come back to haunt us.
(See indygreg/PyOxidizer#373 for previous
battles.)

In #122
we tracked down a runtime failure to the fact that CPython 3.8
didn't properly backport weak symbol handling support. So, if
you build with a modern SDK targeting an older SDK (which we do as
of 63f13fb), the linker will
insert a weak symbol. However, CPython doesn't have the runtime
guards and will attempt to dereference it, causing a crash.

Up to this point, our strategy for handling this mess was to
stop using symbols on all Python versions when we found one to
be causing an issue. This was crude, but effective.

In recent commits, we implemented support for leveraging the
macOS SDK .tbd files for validating symbol presence. We can now
cross reference undefined symbols in our binaries against what
the SDKs tell us is present and screen for missing symbols. This
helps us detect strong symbols that aren't present on targeted
SDK versions.

For weak symbols, I'm not sure if we can statically analyze the
Mach-O to determine if a symbol is guarded. I _think_ the guard
is a compiler built-in and gets converted to a function call, or
maybe inline assembly. We _might_ have to disassemble if we wanted
to catch unguarded weakly referenced symbols. Yeah, no.

In this commit, we effectively change our strategy for weak symbol
handling.

Knowing that CPython 3.9+ should have guarded weak symbols everywhere,
we only ban symbol use on CPython 3.8, specifically x86-64 3.8 since
the aarch64 build targets macOS SDK 11, which has the symbols we
need.

We also remove the one-off validation check for 2 banned symbols. In
its place we add validation that only a specific allow list of weak
symbols is present on CPython 3.8 builds.

As part of developing this, I found yet more bugs in other programs.

CPython had some pragmas forcing symbols to be weak but the pragmas
weren't protected by an #if guard. This caused a compiler failure if
we prevented the symbols from being defined.

libffi was also using mkostemp without runtime guards. I'm unsure if
Python would ever call into a function that would attempt to resolve
this symbol. But if it does it would crash on 10.9. So we disable that
symbol for builds targeting 10.9.
@ddeville
Copy link
Contributor Author

Ah yep I followed some of those patches from afar since a coworker was involved (mostly the work around supporting building 3.8 for arm64) but I hadn't realized that there were also discussions around back porting weak-linking fixes.

So taking a step back, I proposed this change because it's what we were already doing for preadv and pwritev and that sounded like a nice generalization of that weak-linking issue. Obviously knowing that 3.9+ doesn't have this problem anymore it's absolutely not the right approach and we should just keep things as they are for the common case.

Regarding 3.8, I guess there's always the option of calling it a day and upgrading to 3.9 but it's not always practical and also people often find themselves stuck on an older version of cpython because of needing to keep support for old OS versions, which is exactly the use case that this is breaking.

So maybe a middle ground here would be - as you said - to special case 3.8 and ignore those weakly-linked symbol. For those folks who happen to have to support an older cpython version and fairly old macos versions, I feel like not having support for potentially better newer system calls is the least bad of two non-optimal situations.

Also worth noting that while I included all symbols that could lead to similar issues (lazily picked from an internal codebase that had to deal with that over the years), getentropy is really the only one that I've had issues with. I should go check whether the other ones actually do have availability guards but we might also not have to ignore a large set of symbols.

@ddeville
Copy link
Contributor Author

Just saw dc87b17, excellent!

indygreg added a commit that referenced this pull request Apr 28, 2022
The bane of weak symbols on macOS has come back to haunt us.
(See indygreg/PyOxidizer#373 for previous
battles.)

In #122
we tracked down a runtime failure to the fact that CPython 3.8
didn't properly backport weak symbol handling support. So, if
you build with a modern SDK targeting an older SDK (which we do as
of 63f13fb), the linker will
insert a weak symbol. However, CPython doesn't have the runtime
guards and will attempt to dereference it, causing a crash.

Up to this point, our strategy for handling this mess was to
stop using symbols on all Python versions when we found one to
be causing an issue. This was crude, but effective.

In recent commits, we implemented support for leveraging the
macOS SDK .tbd files for validating symbol presence. We can now
cross reference undefined symbols in our binaries against what
the SDKs tell us is present and screen for missing symbols. This
helps us detect strong symbols that aren't present on targeted
SDK versions.

For weak symbols, I'm not sure if we can statically analyze the
Mach-O to determine if a symbol is guarded. I _think_ the guard
is a compiler built-in and gets converted to a function call, or
maybe inline assembly. We _might_ have to disassemble if we wanted
to catch unguarded weakly referenced symbols. Yeah, no.

In this commit, we effectively change our strategy for weak symbol
handling.

Knowing that CPython 3.9+ should have guarded weak symbols everywhere,
we only ban symbol use on CPython 3.8, specifically x86-64 3.8 since
the aarch64 build targets macOS SDK 11, which has the symbols we
need.

We also remove the one-off validation check for 2 banned symbols. In
its place we add validation that only a specific allow list of weak
symbols is present on CPython 3.8 builds.

As part of developing this, I discovered that libffi was also using
mkostemp without runtime guards. I'm unsure if Python would ever call
into a function that would attempt to resolve this symbol. But if it
does it would crash on 10.9. So we disable that symbol for builds
targeting 10.9.
indygreg added a commit that referenced this pull request Apr 29, 2022
The bane of weak symbols on macOS has come back to haunt us.
(See indygreg/PyOxidizer#373 for previous
battles.)

In #122
we tracked down a runtime failure to the fact that CPython 3.8
didn't properly backport weak symbol handling support. So, if
you build with a modern SDK targeting an older SDK (which we do as
of 63f13fb), the linker will
insert a weak symbol. However, CPython doesn't have the runtime
guards and will attempt to dereference it, causing a crash.

Up to this point, our strategy for handling this mess was to
stop using symbols on all Python versions when we found one to
be causing an issue. This was crude, but effective.

In recent commits, we implemented support for leveraging the
macOS SDK .tbd files for validating symbol presence. We can now
cross reference undefined symbols in our binaries against what
the SDKs tell us is present and screen for missing symbols. This
helps us detect strong symbols that aren't present on targeted
SDK versions.

For weak symbols, I'm not sure if we can statically analyze the
Mach-O to determine if a symbol is guarded. I _think_ the guard
is a compiler built-in and gets converted to a function call, or
maybe inline assembly. We _might_ have to disassemble if we wanted
to catch unguarded weakly referenced symbols. Yeah, no.

In this commit, we effectively change our strategy for weak symbol
handling.

Knowing that CPython 3.9+ should have guarded weak symbols everywhere,
we only ban symbol use on CPython 3.8, specifically x86-64 3.8 since
the aarch64 build targets macOS SDK 11, which has the symbols we
need.

We also remove the one-off validation check for 2 banned symbols. In
its place we add validation that only a specific allow list of weak
symbols is present on CPython 3.8 builds.

As part of developing this, I discovered that libffi was also using
mkostemp without runtime guards. I'm unsure if Python would ever call
into a function that would attempt to resolve this symbol. But if it
does it would crash on 10.9. So we disable that symbol for builds
targeting 10.9.
@indygreg
Copy link
Collaborator

In my mind 3.8 is the backwards compatible release supporting older machines. So changing just its build configuration to forego modern macOS features/symbols in the name of runtime compatibility is the way to go.

This commit I just pushed to main (which you commented on earlier) does just this.

It is unfortunate we have to deprive >95% of users with access to the symbols from using them. But I suppose that is the price we have to pay if we want to support such old macOS.

If any user is affected by this and really wants the symbols, they can upgrade to CPython 3.9 or 3.10, which continue to support macOS 10.9 on x86-64 and AFAIK don't have issues using weak symbols since CPython added runtime guards to all their weakly linked symbols.

Anyway, with 52b35f9 landing on main, I suppose there's no longer a need for this PR. So closing it.

Thank you for helping me get to the bottom of this. What a thorny bug!

@indygreg indygreg closed this Apr 29, 2022
@indygreg
Copy link
Collaborator

Oh, if you need this in a release, I might do a release over the weekend. We're a little bit behind on upgrading dependencies and I try to having this project track upstream pretty closely, as I know a growing number of projects are relying on these builds.

@ddeville
Copy link
Contributor Author

Thank you! I learned a bunch along the way so that was also educative :)

@ddeville ddeville deleted the weak-symbols-macos branch April 29, 2022 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants