Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Building internal modules with the cross build is (slightly) broken #162

Closed
lazka opened this issue Nov 30, 2023 · 6 comments
Closed

Building internal modules with the cross build is (slightly) broken #162

lazka opened this issue Nov 30, 2023 · 6 comments

Comments

@lazka
Copy link
Member

lazka commented Nov 30, 2023

Looking at https://github.com/msys2-contrib/cpython-mingw/actions/runs/6802267164/job/18495001638

It tries to build _contextvars.cp311-mingw_x86_64_clang.pyd but fails with:

lld: error: unable to find library -lpython3.11

then tries again at a later point and succeeds (likely because the dependency then exists)

@lazka lazka changed the title Building internal modules is (slightly) broken Building internal modules with the cross build is (slightly) broken Nov 30, 2023
@mstorsjo
Copy link

mstorsjo commented Dec 3, 2023

I've had a look at this now, and I think I understand most aspects of it.

For some additional context - I started digging into this, because when I tested builds with llvm-mingw, where I used LTO for some base toolchain files, my builds of cpython-mingw crashed at runtime. I think I have an explanation to this.

When linking these shared modules, *.pyd, on Linux, they don't link against libpython at all - they're just linked with undefined refereces, which must be provided by the process that loads the *.pyd shared libraries at runtime. This is possible with ELF (as long as not linking with --no-undefined or --no-allow-shlib-undefines), but isn't possible with PE DLLs.

These shared libraries are built and linked by running setup.py from within make, which then on its own tries to compile and link them. This can be built at the same time as linking a potential libpython3.11.so or libpython3.11.a, because the modules won't try to link against them.

However for mingw targets, this is different, since each of the shared modules *.pyd are linked with -lpython3.11. As there's no dependency between these on the makefile level, they can be built at the same time. When this happens, the run of setup.py ends with a listing of modules that weren't built, listed with e.g. Failed to build these modules:. On the next run, it tries to build these again, succeeding this time.

I can avoid it when building python within llvm-mingw with a modification to https://github.com/mstorsjo/llvm-mingw/blob/master/build-python.sh like this:

diff --git a/build-python.sh b/build-python.sh
index f6ddf25..51b832d 100755
--- a/build-python.sh
+++ b/build-python.sh
@@ -155,6 +155,7 @@ export CXX=$HOST-g++
     --without-ensurepip         \
     --without-c-locale-coercion
 
+$MAKE -j$CORES libpython3.11.dll.a
 $MAKE -j$CORES
 $MAKE install
 rm -rf $PREFIX/lib/python*/test

This isn't very pretty though. It can also be worked around directly in cpython-mingw's makefile like this:

diff --git a/Makefile.pre.in b/Makefile.pre.in
index 93728d6224..32433e67e3 100644
--- a/Makefile.pre.in
+++ b/Makefile.pre.in
@@ -778,7 +778,7 @@ $(srcdir)/Modules/_blake2/blake2s_impl.c: $(srcdir)/Modules/_blake2/blake2b_impl
 # -s, --silent or --quiet is always the first char.
 # Under BSD make, MAKEFLAGS might be " -s -v x=y".
 # Ignore macros passed by GNU make, passed after --
-sharedmods: $(PYTHON_FOR_BUILD_DEPS) pybuilddir.txt @LIBMPDEC_INTERNAL@ @LIBEXPAT_INTERNAL@
+sharedmods: $(PYTHON_FOR_BUILD_DEPS) pybuilddir.txt $(LDLIBRARY) @LIBMPDEC_INTERNAL@ @LIBEXPAT_INTERNAL@
        @case "`echo X $$MAKEFLAGS | sed 's/^X //;s/ -- .*//'`" in \
            *\ -s*|s*) quiet="-q";; \
            *) quiet="";; \

I.e. add a dependency on $(LDLIBRARY) to the sharedmods target.

With regards to correctness, this is an unnecessary restriction if building for e.g. Linux, because $(LDLIBRARY) strictly doesn't need to exist before starting building this target - so I'm not sure if this is strictly correct to do. Hopefully it would be benign though (i.e. that there's no build configuration where this would break a build by requiring something that wouldn't be built otherwise).

So, in most cases, this has been a benign issue, that mostly has caused confusion when reading build logs. However, it can actually cause broken builds of python, that crash at runtime.

When linking a *.pyd, while the libpython3.11.* are being built, we can have these situations:

  • We have no libpython3.11.* yet, the module fails to be built, and gets relinked later
  • We do have libpython3.11.dll.a as expected, we link against it, and we get a correct *.pyd
  • We don't have libpython3.11.dll.a yet, but we do have libpython3.11.a. When linking with -lpython3.11, we pick up libpython3.11.a instead. For most modules, when linking with libpython3.11.a, the link fails e.g. with an error like this:
ld.lld: error: undefined symbol: __declspec(dllimport) send
>>> referenced by ../Modules/signalmodule.c:350
>>>               libpython3.11.a(signalmodule.o):(trip_signal)

ld.lld: error: undefined symbol: __declspec(dllimport) getsockopt
>>> referenced by ../Modules/signalmodule.c:799
>>>               libpython3.11.a(signalmodule.o):(signal_set_wakeup_fd)

ld.lld: error: undefined symbol: __declspec(dllimport) WSAGetLastError
>>> referenced by ../Modules/signalmodule.c:803
>>>               libpython3.11.a(signalmodule.o):(signal_set_wakeup_fd)
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Most modules fail to link with an error like this, but 4 modules actually link successfully; the modules select, _multiprocessing, _overlapped and _socket. But when linked against the static library instead of the .dll.a import library, these don't work as intended (crashing at runtime), even if the build succeeded - probably because they expected to be interacting with the state of the actual Python runtime within cpython itself, but only interact with a separate copy from the static library.

Thus, this build race condition, can in some cases cause a broken build, if we hit this condition:

  • libpython3.11.a exists but libpython3.11.dll.a doesn't exist yet
  • Linking any of the modules select, _multiprocessing, _overlapped or _socket within this time gap

In most cases, libpython3.11.a probably gets created after libpython3.11.dll.a (producing it gets started later), but if ends up existing before the other one, we can have this situation. This is probably what I've been seeing.

So by fixing this missing dependency, we would avoid this race condition and this build nondeterminism.

@lazka
Copy link
Member Author

lazka commented Dec 4, 2023

Thanks for the investigation! Should I create a PR based on it?

@mstorsjo
Copy link

mstorsjo commented Dec 4, 2023

Thanks for the investigation! Should I create a PR based on it?

I guess I could create one myself as well; the main question is, is it ok to add a dependency on $(LDLIBRARY) to the sharedmods make target? For Python overall (for the platforms upstream Python themselves support), it's not necessary but should be mostly harmless except for possibly serializing the build a little (although it seems like upstream Python supports Cygwin, and they would have the same issue?). For cpython-mingw it's probably ok in any case (ideally, any fix I make here should be OK for upstream as well IMO).

@lazka
Copy link
Member Author

lazka commented Dec 4, 2023

I think it's fine as long as the commit message mentions it. Our fork currently doesn't work for Linux (see #131) so many of the Makefile changes aren't portable right now and would need similar adjustments.

although it seems like upstream Python supports Cygwin

it doesn't, officially at least

@mstorsjo
Copy link

mstorsjo commented Dec 4, 2023

I think it's fine as long as the commit message mentions it. Our fork currently doesn't work for Linux (see #131) so many of the Makefile changes aren't portable right now and would need similar adjustments.

Ok, thanks, that's good context info. Yeah ideally things would be in an upstreamable form, in case upstream would care to accept such patches, but I understand the realities of things :-)

although it seems like upstream Python supports Cygwin

it doesn't, officially at least

Oh, ok. The upstream makefile at least has some mentions of it, but I guess it's not maintained: https://github.com/python/cpython/blob/v3.12.0/Makefile.pre.in#L868-L875

mstorsjo added a commit to mstorsjo/cpython-mingw that referenced this issue Dec 4, 2023
When building shared modules (*.pyd) for MinGW, they link against
the import library for the main python (e.g. -lpython3.11). When
linking shared modules on Linux, this doesn't happen, but those
shared libraries are linked with undefined references which
then are fulfilled by the host process when they are loaded.

Add a dependency to make sure that $(LDLIBRARY), e.g.
libpython$(LDVERSION).dll.a, exists before starting to build
the shared modules.

Previously, if libpython$(LDVERSION).dll.a didn't exist when
the shared modules were linked, some of them failed to link,
but this wasn't reported as an error in the build, and a
later invocation of setup.py would retry building them, which
then would succeed. E.g. there was a seemingly benign race
condition in the build.

However, in some rare cases, the race condition no longer
was benign. The build also produces a corresponding static library,
$(LIBRARY), e.g. libpython$(VERSION)$(ABIFLAGS).a - which in
the end would be named similarly, libpython3.11.a, when
$(LDLIBRARY) would be libpython3.11.dll.a.

If the static library exists but the import library doesn't, then
most modules would still fail to link, but a few ones wouldn't
(select, _multiprocessing, _overlapped and _socket). The ones
that did succeed linking with the static library would crash at
runtime though.

By making sure that the right, intended library to fulfill the
linker parameter "-lpython3.11" exists before building the shared
modules, we avoid the whole race condition that in some rare
cases could produce a Python build that crashes at runtime.

For Linux targets, this extra dependency is unnecessary, but should
not cause other problems ($(LDLIBRARY) is set to the same as
$(LIBRARY) if not building any shared library).

This fixes MSYS2 cpython-mingw issue msys2-contrib#162.
lazka pushed a commit that referenced this issue Dec 4, 2023
When building shared modules (*.pyd) for MinGW, they link against
the import library for the main python (e.g. -lpython3.11). When
linking shared modules on Linux, this doesn't happen, but those
shared libraries are linked with undefined references which
then are fulfilled by the host process when they are loaded.

Add a dependency to make sure that $(LDLIBRARY), e.g.
libpython$(LDVERSION).dll.a, exists before starting to build
the shared modules.

Previously, if libpython$(LDVERSION).dll.a didn't exist when
the shared modules were linked, some of them failed to link,
but this wasn't reported as an error in the build, and a
later invocation of setup.py would retry building them, which
then would succeed. E.g. there was a seemingly benign race
condition in the build.

However, in some rare cases, the race condition no longer
was benign. The build also produces a corresponding static library,
$(LIBRARY), e.g. libpython$(VERSION)$(ABIFLAGS).a - which in
the end would be named similarly, libpython3.11.a, when
$(LDLIBRARY) would be libpython3.11.dll.a.

If the static library exists but the import library doesn't, then
most modules would still fail to link, but a few ones wouldn't
(select, _multiprocessing, _overlapped and _socket). The ones
that did succeed linking with the static library would crash at
runtime though.

By making sure that the right, intended library to fulfill the
linker parameter "-lpython3.11" exists before building the shared
modules, we avoid the whole race condition that in some rare
cases could produce a Python build that crashes at runtime.

For Linux targets, this extra dependency is unnecessary, but should
not cause other problems ($(LDLIBRARY) is set to the same as
$(LIBRARY) if not building any shared library).

This fixes MSYS2 cpython-mingw issue #162.
@mstorsjo
Copy link

mstorsjo commented Dec 4, 2023

Thanks for the merge! I believe this issue can be closed now?

@lazka lazka closed this as completed Dec 4, 2023
lazka pushed a commit that referenced this issue Dec 6, 2023
When building shared modules (*.pyd) for MinGW, they link against
the import library for the main python (e.g. -lpython3.11). When
linking shared modules on Linux, this doesn't happen, but those
shared libraries are linked with undefined references which
then are fulfilled by the host process when they are loaded.

Add a dependency to make sure that $(LDLIBRARY), e.g.
libpython$(LDVERSION).dll.a, exists before starting to build
the shared modules.

Previously, if libpython$(LDVERSION).dll.a didn't exist when
the shared modules were linked, some of them failed to link,
but this wasn't reported as an error in the build, and a
later invocation of setup.py would retry building them, which
then would succeed. E.g. there was a seemingly benign race
condition in the build.

However, in some rare cases, the race condition no longer
was benign. The build also produces a corresponding static library,
$(LIBRARY), e.g. libpython$(VERSION)$(ABIFLAGS).a - which in
the end would be named similarly, libpython3.11.a, when
$(LDLIBRARY) would be libpython3.11.dll.a.

If the static library exists but the import library doesn't, then
most modules would still fail to link, but a few ones wouldn't
(select, _multiprocessing, _overlapped and _socket). The ones
that did succeed linking with the static library would crash at
runtime though.

By making sure that the right, intended library to fulfill the
linker parameter "-lpython3.11" exists before building the shared
modules, we avoid the whole race condition that in some rare
cases could produce a Python build that crashes at runtime.

For Linux targets, this extra dependency is unnecessary, but should
not cause other problems ($(LDLIBRARY) is set to the same as
$(LIBRARY) if not building any shared library).

This fixes MSYS2 cpython-mingw issue #162.
lazka pushed a commit that referenced this issue Dec 7, 2023
When building shared modules (*.pyd) for MinGW, they link against
the import library for the main python (e.g. -lpython3.11). When
linking shared modules on Linux, this doesn't happen, but those
shared libraries are linked with undefined references which
then are fulfilled by the host process when they are loaded.

Add a dependency to make sure that $(LDLIBRARY), e.g.
libpython$(LDVERSION).dll.a, exists before starting to build
the shared modules.

Previously, if libpython$(LDVERSION).dll.a didn't exist when
the shared modules were linked, some of them failed to link,
but this wasn't reported as an error in the build, and a
later invocation of setup.py would retry building them, which
then would succeed. E.g. there was a seemingly benign race
condition in the build.

However, in some rare cases, the race condition no longer
was benign. The build also produces a corresponding static library,
$(LIBRARY), e.g. libpython$(VERSION)$(ABIFLAGS).a - which in
the end would be named similarly, libpython3.11.a, when
$(LDLIBRARY) would be libpython3.11.dll.a.

If the static library exists but the import library doesn't, then
most modules would still fail to link, but a few ones wouldn't
(select, _multiprocessing, _overlapped and _socket). The ones
that did succeed linking with the static library would crash at
runtime though.

By making sure that the right, intended library to fulfill the
linker parameter "-lpython3.11" exists before building the shared
modules, we avoid the whole race condition that in some rare
cases could produce a Python build that crashes at runtime.

For Linux targets, this extra dependency is unnecessary, but should
not cause other problems ($(LDLIBRARY) is set to the same as
$(LIBRARY) if not building any shared library).

This fixes MSYS2 cpython-mingw issue #162.
lazka pushed a commit that referenced this issue Feb 10, 2024
When building shared modules (*.pyd) for MinGW, they link against
the import library for the main python (e.g. -lpython3.11). When
linking shared modules on Linux, this doesn't happen, but those
shared libraries are linked with undefined references which
then are fulfilled by the host process when they are loaded.

Add a dependency to make sure that $(LDLIBRARY), e.g.
libpython$(LDVERSION).dll.a, exists before starting to build
the shared modules.

Previously, if libpython$(LDVERSION).dll.a didn't exist when
the shared modules were linked, some of them failed to link,
but this wasn't reported as an error in the build, and a
later invocation of setup.py would retry building them, which
then would succeed. E.g. there was a seemingly benign race
condition in the build.

However, in some rare cases, the race condition no longer
was benign. The build also produces a corresponding static library,
$(LIBRARY), e.g. libpython$(VERSION)$(ABIFLAGS).a - which in
the end would be named similarly, libpython3.11.a, when
$(LDLIBRARY) would be libpython3.11.dll.a.

If the static library exists but the import library doesn't, then
most modules would still fail to link, but a few ones wouldn't
(select, _multiprocessing, _overlapped and _socket). The ones
that did succeed linking with the static library would crash at
runtime though.

By making sure that the right, intended library to fulfill the
linker parameter "-lpython3.11" exists before building the shared
modules, we avoid the whole race condition that in some rare
cases could produce a Python build that crashes at runtime.

For Linux targets, this extra dependency is unnecessary, but should
not cause other problems ($(LDLIBRARY) is set to the same as
$(LIBRARY) if not building any shared library).

This fixes MSYS2 cpython-mingw issue #162.
lazka pushed a commit that referenced this issue Feb 12, 2024
When building shared modules (*.pyd) for MinGW, they link against
the import library for the main python (e.g. -lpython3.11). When
linking shared modules on Linux, this doesn't happen, but those
shared libraries are linked with undefined references which
then are fulfilled by the host process when they are loaded.

Add a dependency to make sure that $(LDLIBRARY), e.g.
libpython$(LDVERSION).dll.a, exists before starting to build
the shared modules.

Previously, if libpython$(LDVERSION).dll.a didn't exist when
the shared modules were linked, some of them failed to link,
but this wasn't reported as an error in the build, and a
later invocation of setup.py would retry building them, which
then would succeed. E.g. there was a seemingly benign race
condition in the build.

However, in some rare cases, the race condition no longer
was benign. The build also produces a corresponding static library,
$(LIBRARY), e.g. libpython$(VERSION)$(ABIFLAGS).a - which in
the end would be named similarly, libpython3.11.a, when
$(LDLIBRARY) would be libpython3.11.dll.a.

If the static library exists but the import library doesn't, then
most modules would still fail to link, but a few ones wouldn't
(select, _multiprocessing, _overlapped and _socket). The ones
that did succeed linking with the static library would crash at
runtime though.

By making sure that the right, intended library to fulfill the
linker parameter "-lpython3.11" exists before building the shared
modules, we avoid the whole race condition that in some rare
cases could produce a Python build that crashes at runtime.

For Linux targets, this extra dependency is unnecessary, but should
not cause other problems ($(LDLIBRARY) is set to the same as
$(LIBRARY) if not building any shared library).

This fixes MSYS2 cpython-mingw issue #162.
lazka pushed a commit that referenced this issue Apr 11, 2024
When building shared modules (*.pyd) for MinGW, they link against
the import library for the main python (e.g. -lpython3.11). When
linking shared modules on Linux, this doesn't happen, but those
shared libraries are linked with undefined references which
then are fulfilled by the host process when they are loaded.

Add a dependency to make sure that $(LDLIBRARY), e.g.
libpython$(LDVERSION).dll.a, exists before starting to build
the shared modules.

Previously, if libpython$(LDVERSION).dll.a didn't exist when
the shared modules were linked, some of them failed to link,
but this wasn't reported as an error in the build, and a
later invocation of setup.py would retry building them, which
then would succeed. E.g. there was a seemingly benign race
condition in the build.

However, in some rare cases, the race condition no longer
was benign. The build also produces a corresponding static library,
$(LIBRARY), e.g. libpython$(VERSION)$(ABIFLAGS).a - which in
the end would be named similarly, libpython3.11.a, when
$(LDLIBRARY) would be libpython3.11.dll.a.

If the static library exists but the import library doesn't, then
most modules would still fail to link, but a few ones wouldn't
(select, _multiprocessing, _overlapped and _socket). The ones
that did succeed linking with the static library would crash at
runtime though.

By making sure that the right, intended library to fulfill the
linker parameter "-lpython3.11" exists before building the shared
modules, we avoid the whole race condition that in some rare
cases could produce a Python build that crashes at runtime.

For Linux targets, this extra dependency is unnecessary, but should
not cause other problems ($(LDLIBRARY) is set to the same as
$(LIBRARY) if not building any shared library).

This fixes MSYS2 cpython-mingw issue #162.
lazka pushed a commit that referenced this issue Apr 12, 2024
When building shared modules (*.pyd) for MinGW, they link against
the import library for the main python (e.g. -lpython3.11). When
linking shared modules on Linux, this doesn't happen, but those
shared libraries are linked with undefined references which
then are fulfilled by the host process when they are loaded.

Add a dependency to make sure that $(LDLIBRARY), e.g.
libpython$(LDVERSION).dll.a, exists before starting to build
the shared modules.

Previously, if libpython$(LDVERSION).dll.a didn't exist when
the shared modules were linked, some of them failed to link,
but this wasn't reported as an error in the build, and a
later invocation of setup.py would retry building them, which
then would succeed. E.g. there was a seemingly benign race
condition in the build.

However, in some rare cases, the race condition no longer
was benign. The build also produces a corresponding static library,
$(LIBRARY), e.g. libpython$(VERSION)$(ABIFLAGS).a - which in
the end would be named similarly, libpython3.11.a, when
$(LDLIBRARY) would be libpython3.11.dll.a.

If the static library exists but the import library doesn't, then
most modules would still fail to link, but a few ones wouldn't
(select, _multiprocessing, _overlapped and _socket). The ones
that did succeed linking with the static library would crash at
runtime though.

By making sure that the right, intended library to fulfill the
linker parameter "-lpython3.11" exists before building the shared
modules, we avoid the whole race condition that in some rare
cases could produce a Python build that crashes at runtime.

For Linux targets, this extra dependency is unnecessary, but should
not cause other problems ($(LDLIBRARY) is set to the same as
$(LIBRARY) if not building any shared library).

This fixes MSYS2 cpython-mingw issue #162.
lazka pushed a commit that referenced this issue Sep 10, 2024
When building shared modules (*.pyd) for MinGW, they link against
the import library for the main python (e.g. -lpython3.11). When
linking shared modules on Linux, this doesn't happen, but those
shared libraries are linked with undefined references which
then are fulfilled by the host process when they are loaded.

Add a dependency to make sure that $(LDLIBRARY), e.g.
libpython$(LDVERSION).dll.a, exists before starting to build
the shared modules.

Previously, if libpython$(LDVERSION).dll.a didn't exist when
the shared modules were linked, some of them failed to link,
but this wasn't reported as an error in the build, and a
later invocation of setup.py would retry building them, which
then would succeed. E.g. there was a seemingly benign race
condition in the build.

However, in some rare cases, the race condition no longer
was benign. The build also produces a corresponding static library,
$(LIBRARY), e.g. libpython$(VERSION)$(ABIFLAGS).a - which in
the end would be named similarly, libpython3.11.a, when
$(LDLIBRARY) would be libpython3.11.dll.a.

If the static library exists but the import library doesn't, then
most modules would still fail to link, but a few ones wouldn't
(select, _multiprocessing, _overlapped and _socket). The ones
that did succeed linking with the static library would crash at
runtime though.

By making sure that the right, intended library to fulfill the
linker parameter "-lpython3.11" exists before building the shared
modules, we avoid the whole race condition that in some rare
cases could produce a Python build that crashes at runtime.

For Linux targets, this extra dependency is unnecessary, but should
not cause other problems ($(LDLIBRARY) is set to the same as
$(LIBRARY) if not building any shared library).

This fixes MSYS2 cpython-mingw issue #162.
lazka pushed a commit that referenced this issue Sep 10, 2024
When building shared modules (*.pyd) for MinGW, they link against
the import library for the main python (e.g. -lpython3.11). When
linking shared modules on Linux, this doesn't happen, but those
shared libraries are linked with undefined references which
then are fulfilled by the host process when they are loaded.

Add a dependency to make sure that $(LDLIBRARY), e.g.
libpython$(LDVERSION).dll.a, exists before starting to build
the shared modules.

Previously, if libpython$(LDVERSION).dll.a didn't exist when
the shared modules were linked, some of them failed to link,
but this wasn't reported as an error in the build, and a
later invocation of setup.py would retry building them, which
then would succeed. E.g. there was a seemingly benign race
condition in the build.

However, in some rare cases, the race condition no longer
was benign. The build also produces a corresponding static library,
$(LIBRARY), e.g. libpython$(VERSION)$(ABIFLAGS).a - which in
the end would be named similarly, libpython3.11.a, when
$(LDLIBRARY) would be libpython3.11.dll.a.

If the static library exists but the import library doesn't, then
most modules would still fail to link, but a few ones wouldn't
(select, _multiprocessing, _overlapped and _socket). The ones
that did succeed linking with the static library would crash at
runtime though.

By making sure that the right, intended library to fulfill the
linker parameter "-lpython3.11" exists before building the shared
modules, we avoid the whole race condition that in some rare
cases could produce a Python build that crashes at runtime.

For Linux targets, this extra dependency is unnecessary, but should
not cause other problems ($(LDLIBRARY) is set to the same as
$(LIBRARY) if not building any shared library).

This fixes MSYS2 cpython-mingw issue #162.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants