Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macOS: disable fixup chains when linking extension modules #4301

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

wjakob
Copy link
Member

@wjakob wjakob commented Oct 31, 2022

On macOS, extension library builds usually rely on the flag -undefined dynamic_lookup to get the linker to temporarily accept missing CPython API symbols. As the name implies, those symbols are then dynamically resolved when Python imports such an extension library. Binaries produced in this way are more portable since they don't contain a hardcoded path to the CPython library. This is critical when distributing binary wheels on PyPI, etc..

When targeting macOS>=12, XCode recently started to generate a somewhat ominous warning:

-undefined dynamic_lookup may not work with chained fixups

For further detail on what chained fixups are, see the following informative WWDC https://developer.apple.com/videos/play/wwdc2022/110362/ (the relevant part starts about 20mins into the video)

The message that the dynamic resolution functionality became broken. I reported this to Apple via feedback request FB11767124.

Apple investigated the request and updated the behavior of ld starting with XCode 14.3b1+: it now disables the fixup chain linker optimization whenever -undefined dynamic_lookup is specified. The feedback from Apple also stated that the parameter -Wl,-no_fixup_chains should be specified on pre-14.3 XCode versions to ensure correct behavior.

This commit realizes those changes by passing a flag to always disable fixup chains. Related discussion is available in the CPython repository (python/cpython#97524).

@wjakob wjakob requested a review from henryiii as a code owner October 31, 2022 21:18
@wjakob
Copy link
Member Author

wjakob commented Oct 31, 2022

FWIW this commit only fixes the pybind11 parts. If somebody uses the CMake linker function Python_add_library, then that will lead to the same issue. So some changes in CMake might be needed as well.

@henryiii
Copy link
Collaborator

henryiii commented Nov 1, 2022

@wjakob
Copy link
Member Author

wjakob commented Nov 1, 2022

If somebody on the Google or CMake team (or elsewhere) has an Apple support ticket to spend, this might be an excellent use for it.

I have many questions:

  • What are the implications of switching from -undefined dynamic_lookup to -undefined suppress? The pybind11 test suite to compiles and runs successfully, but what might conceivably be broken by the change? Why do these two options even exist?

  • In what way will breakage manifest (ld: warning: -undefined dynamic_lookup may not work with chained fixups) if one imports an extension library into a process that uses chained fixups? Will everything segfault? ..

  • Why is this all entangled with the choice of a two-level vs flat namespace? For example, if I only specify -undefined suppress, linking fails with the error message

    ld: can't use -undefined warning or suppress with -twolevel_namespace
    

    Only with the combination -undefined suppress -flat_namespace can I successfully compile extension libraries. From reading a bit about this, it seems to me that the two-level namespace feature is something quite specific to macOS, and that by switching to a one-level namespace, the symbol resolution behavior is becoming more linux-like. Is that understanding correct? (in that case, it doesn't seem that problematic)

  • Is -flat_namespace a reasonable linker flag to add to a Python extension library, in the sense that this will only affect symbol resolution locally? Or does this somehow spill over into the whole binary, potentially breaking a larger Cocoa application that happens to use Python and pybind11 internally?

I've not been able to find much concrete information on these very low level Darwin dynamic linker changes. It would be great to route this to a person at Apple who knows the nitty gritty details of the dynamic linker and can tell us what to do here.

(@rwgk)

@rwgk
Copy link
Collaborator

rwgk commented Nov 1, 2022

FYI: I just posted a question internally to ask about this. Waiting.

@wjakob
Copy link
Member Author

wjakob commented Nov 1, 2022

Issue on the CPython repository: python/cpython#97524

@carlocab
Copy link

carlocab commented Nov 1, 2022

Note that -flat_namespace is probably even more deprecated than -undefined dynamic_lookup, and can cause problems when you try to dlopen libraries built with it.

See https://developer.apple.com/forums/thread/689991, ocaml/ocaml#10723, Homebrew/brew#12225.

CC @aconchillo, since you might have more insight here.

@isuruf
Copy link
Contributor

isuruf commented Nov 1, 2022

Using flat_namespace will break conda packages where we ship our own libc++ to facilitate new C++ features on older macOS

@wjakob
Copy link
Member Author

wjakob commented Nov 2, 2022

Following the trail of experimentation, I made a nanobind version that uses a "ld response file" to more efficiently inform ld about all CPython API symbols via the -U flag (thanks for the hint about this on the CPython issue tracker @carlocab).

Link: wjakob/nanobind@ccb1ac5 (cabinet_of_curiosities branch)

The commit also contains a script collect-symbols.py that downloads symbol information for CPython 3.7-3.12 from GitHub and builds a combined list of symbols that one would consider to be part of the CPython API. That file is available here, in case anybody wants to play with it: https://github.com/wjakob/nanobind/blob/ccb1ac5cfa116ac425281b304efe4a41d064dbbb/cmake/darwin-ld.sym

With these changes, nanobind compiles without warnings and passes the test suite with a deployment target of OSX 12. I would expect the same behavior for pybind11. What remains unclear to me is whether the absence of a warning means that we are now safe of undefined behavior, or if somebody simply forgot to implement that same warning message for the -U flag...

@wjakob
Copy link
Member Author

wjakob commented Nov 2, 2022

One more update: one can inspect the rebase and binding tables using the command xcrun dyldinfo -rebase -bind <extension library>, which produces output of the following form:

rebase information (from local relocation records and indirect symbol table):
segment      section          address     type
__DATA_CONST __got            0x00008050  pointer
binding information (from external relocations and indirect symbol table):
segment      section          address        type   weak  addend dylib            symbol
__DATA_CONST __got            0x00008000    pointer           0 flat-namespace   _PyDict_Items
__DATA_CONST __got            0x00008008    pointer           0 flat-namespace   _PyErr_Clear
__DATA_CONST __got            0x00008010    pointer           0 flat-namespace   _PyIter_Check
__DATA_CONST __got            0x00008018    pointer           0 flat-namespace   _PyTuple_New
__DATA_CONST __got            0x00008020    pointer           0 flat-namespace   _PyUnicode_AsUTF8AndSize
__DATA_CONST __got            0x00008028    pointer           0 flat-namespace   _PyUnicode_FromStringAndSize
__DATA_CONST __got            0x00008030    pointer           0 flat-namespace   __Py_Dealloc
__DATA_CONST __got            0x00008038    pointer           0 flat-namespace   __Py_NoneStruct
__DATA_CONST __got            0x00008040    pointer           0 libSystem        __Unwind_Resume
__DATA_CONST __got            0x000080B0    pointer           0 libc++           __ZNKSt3__121__basic_string_commonILb1EE20__throw_length_errorEv
__DATA_CONST __got            0x000080B8    pointer           0 libc++           __ZNSt11logic_errorC2EPKc
__DATA_CONST __got            0x000080C0    pointer           0 libc++           __ZNSt12length_errorD1Ev
__DATA_CONST __got            0x000080C8    pointer           0 libc++           __ZNSt13runtime_errorD2Ev
__DATA_CONST __got            0x000080D0    pointer           0 libc++           __ZNSt3__112__next_primeEm
__DATA_CONST __got            0x000080D8    pointer           0 libc++           __ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1ERKS5_
__DATA_CONST __got            0x000080E0    pointer           0 libc++           __ZSt9terminatev
__DATA_CONST __got            0x000080F0    pointer           0 libc++           __ZTISt12length_error
__DATA_CONST __got            0x000080F8    pointer           0 libc++           __ZTVSt12length_error
__DATA_CONST __got            0x00008100    pointer           0 libc++           __ZdlPv
__DATA_CONST __got            0x00008108    pointer           0 libc++           __Znwm
__DATA_CONST __got            0x00008110    pointer           0 libc++           ___cxa_allocate_exception
__DATA_CONST __got            0x00008118    pointer           0 libc++           ___cxa_begin_catch
__DATA_CONST __got            0x00008120    pointer           0 libc++           ___cxa_free_exception
__DATA_CONST __got            0x00008128    pointer           0 libc++           ___cxa_throw
__DATA_CONST __got            0x00008130    pointer           0 libc++           ___gxx_personality_v0
__DATA_CONST __got            0x00008138    pointer           0 libSystem        _memcmp
__DATA_CONST __got            0x00008140    pointer           0 libSystem        _memcpy

In particular, note the dylib and symbol columns that are interesting. The output shown above is what you get with -undefined dynamic_resolve.

Three different styles of linker flags are under consideration at this point:

  1. The original method (-undefined dynamic_resolve) which now produces a warning about chained fixups.
  2. Specifying every single CPython API symbol as potentially undefined (using the -U parameter of ld along with an "ld response file" to efficiently pass lots of such arguments through a file). See my previous message and commit wjakob/nanobind@ccb1ac5 for details.
  3. The proposal in this PR (-undefined suppress -flat_namespace).

Preliminary analysis using dyldinfo shows that options #1 and #2 produce identical binding tables: the resulting shared libraries will resolve Python API symbols using a flat namespace, while all other symbols use the two-level namespace.

This means that the main benefit of approach #2 over #1 is that an error message would be generated if there are other undefined symbols besides CPython API functions. The current flag -undefined dynamic_resolve would defer their resolution without printing any kind of error message, which can be undesirable.

What I find curious is that option #1 (-undefined dynamic_resolve) produces the warning

ld: warning: -undefined dynamic_lookup may not work with chained fixups

while option #2 (specifying -U .. per symbol) does not.

Whether or not #1 and #2 will lead to undefined behavior together with chained fixups remains unclear. We will have to wait for Apple's response on whether this just a fluke.

In contrast, option #3 proposed in this PR (-undefined suppress -flat_namespace) causes the output of xcrun dyldinfo -rebase -bind <extension library> to change significantly. The resolving extension modules will resolve all symbols using the flat namespace, which is problematic as pointed out by @carlocab and @isuruf.

kpeeters pushed a commit to kpeeters/cadabra2 that referenced this pull request Dec 1, 2022
wjakob added a commit to wjakob/nanobind that referenced this pull request Feb 23, 2023
…space

On macOS, extension library builds usually rely on the flag ``-undefined
dynamic_lookup`` to get the linker to temporarily accept missing CPython
API symbols. As the name implies, those symbols are then dynamically
resolved when Python imports such an extension library. Binaries
produced in this way are more portable since they don't contain a
hardcoded path to the CPython library. This is critical when
distributing binary wheels on PyPI, etc..

When targeting macOS>=12, XCode recently started to generate a somewhat
ominous warning:

``-undefined dynamic_lookup may not work with chained fixups``

This suggested that the dynamic resolution functionality became broken.
I reported this to Apple via feedback request FB11767124 and switched
nanobind to an alternative set of flags (``-undefined suppress
-flat_namespace``) in the meantime. The flat namespace setup has various
limitations though, so this was not a satisfactory long-term solution.

Apple investigated the request and updated the behavior of ``ld``
starting with XCode 14.3b1+: it now disables the fixup chain linker
optimization whenever ``-undefined dynamic_lookup`` is specified. The
feedback from Apple also stated that the parameter
``-Wl,-no_fixup_chains`` should be specified on pre-14.3 XCode versions
to ensure correct behavior.

This commit realizes those changes by reverting to a two-level namespace
for the overall linking process, checking the XCode version, and
potentially manually disabling fixup chains when needed.

Related discussion is available in the pybind11
(pybind/pybind11#4301) and CPython repositories
(python/cpython#97524).
@wjakob wjakob changed the title Change macOS handling of undefined symbols macOS: disable fixup chains when linking extension modules Feb 27, 2023
@wjakob
Copy link
Member Author

wjakob commented Feb 27, 2023

I amended this commit to remove the -flat_namespace workaround that was found to be problematic. It now adopts the linker flags recommended by Apple (refer to the CPython issue for more details: python/cpython#97524).

On macOS, extension library builds usually rely on the flag ``-undefined
dynamic_lookup`` to get the linker to temporarily accept missing CPython
API symbols. As the name implies, those symbols are then dynamically
resolved when Python imports such an extension library. Binaries
produced in this way are more portable since they don't contain a
hardcoded path to the CPython library. This is critical when
distributing binary wheels on PyPI, etc..

When targeting macOS>=12, XCode recently started to generate a somewhat
ominous warning:

``-undefined dynamic_lookup may not work with chained fixups``

For further detail on what chained fixups are, see the following
informative WWDC
https://developer.apple.com/videos/play/wwdc2022/110362/ (the relevant
part starts about 20mins into the video)

The message that the dynamic resolution functionality became broken.
I reported this to Apple via feedback request FB11767124.

Apple investigated the request and updated the behavior of ``ld``
starting with XCode 14.3b1+: it now disables the fixup chain linker
optimization whenever ``-undefined dynamic_lookup`` is specified. The
feedback from Apple also stated that the parameter
``-Wl,-no_fixup_chains`` should be specified on pre-14.3 XCode versions
to ensure correct behavior.

This commit realizes those changes by passing a flag to *always* disable
fixup chains. Related discussion is available in the CPython repository
(python/cpython#97524).
Copy link

@carlocab carlocab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to check how older versions of Apple ld handle the -no_fixup_chains flag. Could produce errors for being an unrecognised flag.

@henryiii
Copy link
Collaborator

How much older? From what I understand, the latest version's don't require the flag (they default to this behavior to help dynamic languages that use this), and this just hides the warning for older versions. So (probably answering the question I posed above), I'd expect this would work on versions that produce the warning, so the thing to test is versions before this warning started showing up?

@carlocab
Copy link

How much older?

Anything older than Xcode 14, I think. We can check the output of ld -v:

❯ ld -v
@(#)PROGRAM:ld  PROJECT:ld64-820.1
BUILD 18:42:34 Sep 11 2022
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
LTO support using: LLVM version 14.0.0, (clang-1400.0.29.202) (static support for 29, runtime is 29)
TAPI support using: Apple TAPI version 14.0.0 (tapi-1400.0.11)

If it's ld64-819.6 or newer, then it knows the -no_fixup_chains flag. Older than that, passing -no_fixup_chains may produce an error.

Reference: https://en.wikipedia.org/wiki/Xcode#Xcode_11.0_-_14.x_(since_SwiftUI_framework)_2

@carlocab
Copy link

carlocab commented Feb 27, 2023

Actually, based on testing on GitHub runners, the flag seems to be supported as early as ld64-711 (which is in Xcode 13). It is not supported by ld64-609.8 (in Xcode 12.3).

Edit: Based on testing various versions of Xcode, you can pass -no_fixup_chains for Xcode 13 and newer, but shouldn't for older versions of Xcode. There are different ways to check, but the simplest might be to just parse the output of ld -v.

@henryiii
Copy link
Collaborator

So we should only add this flag if Xcode is between 13 and 14.2 (14.3+ always have this, so there's no need to add it there, I'd guess?) We can't really detect the linker version, but we could base it on the compiler version, as they should match.

@carlocab
Copy link

We can't really detect the linker version, but we could base it on the compiler version, as they should match.

Only if the user is using Apple Clang. That's probably the most common case, but it's not guaranteed (e.g. a user may have CC=gcc-13). You could probably detect the linker version by doing the equivalent of

$CC -Wl,-v

though.

@henryiii
Copy link
Collaborator

henryiii commented Mar 9, 2023

I think it's pretty safe to only add this check when using AppleClang between 13 and 14.2, as the version there is pre-cached and doesn't require us to run an extra shell command. If we miss adding this, then the user just sees a warning, so it's not that terrible, IMO, more of a minor quality of life improvement.

@wjakob
Copy link
Member Author

wjakob commented Mar 14, 2023

FYI, the source of the warning is finally understood -- please see this comment on the CPython tracker for details: python/cpython#97524 (comment)

To retain chained fixups while fixing the issue, I use the following solution in nanobind: wjakob/nanobind@2f29ec7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants