Clarify docs around manylinux and c++ #732

Closed

AllSeeingEyeTolledEweSew

I wanted to clarify and expand on some docs here. In the current docs, I especially found it misleading that manylinux2010 and manylinux2014 "support all C++ standards (up to C++17)", since using newer standards can produce manylinux-incompatible builds.

@henryiii
Contributor

henryiii commented Jun 24, 2021

This is not correct. The manylinux1 through manylinux2014 images ship newer versions of GCC that Red Hat has patched to produce binaries compatible with the default libstdc++.so on that system. There might be some small issues in corner cases, but this works in most situations and is used across the ecosystem. The issue you've linked is about manylinux1 + C++14, which doesn't work, since Red Hat dropped support for CentOS 5's backports with GCC 4.9.

Manylinux2010 has GCC 8, and Manylinux2014 has GCC 9.

There is an open issue (pypa/manylinux#1012) to find or do something similar for the Debian-based manylinux_2_24 (and newer) images, but it's still unresolved. See:

RHEL dev toolset took care of that and also other libraries like libgcc_s so that binaries produced with devtoolset would still be compatible with the base image with no action.

With some edits and updated versions, this could be useful; there's some good information and links here.

@AllSeeingEyeTolledEweSew
Author

Fascinating!

Is there documentation about how devtoolset's patches work? If I use a new c++17 feature from devtoolset-9 where gcc would link to a symbol with version CXXABI_1.3.12 (from 9.x), what happens to it? Are all libstdc++.so-specific bugfixes and behaviors from 9.x backported?

I'm surprised and confused, so it would be nice to have a link to relevant info to include in the docs.

@AllSeeingEyeTolledEweSew
Author

Oh, I found:

Yes, build gcc 10 with the patches from https://git.centos.org/rpms/devtoolset-10-gcc/blob/c7/f/SOURCES.
In devtoolset, libstdc++.so is a linker script that points to a static libstdc++ for the new parts and shared libstdc++ for the older parts and the C++ ABI is set to GCC-4 ABI.
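
The split described in that quote can be modeled in a few lines of Python. This is only a conceptual sketch, not how the devtoolset linker script is implemented: the symbol names are placeholders, and the `BASE_MAX` limits are assumed to correspond to the libstdc++ shipped with CentOS 7's system GCC, so check them before relying on them.

```python
def parse_version(tag):
    """Turn a tag such as 'GLIBCXX_3.4.21' into ('GLIBCXX', (3, 4, 21))."""
    family, _, version = tag.partition("_")
    return family, tuple(int(part) for part in version.split("."))


def split_by_linkage(required, base_max):
    """Partition symbols into those the system libstdc++.so can satisfy
    (left dynamic) and those it cannot (pulled in from the static part)."""
    dynamic, static = [], []
    for symbol, tag in required.items():
        family, version = parse_version(tag)
        if version <= base_max[family]:
            dynamic.append(symbol)
        else:
            static.append(symbol)
    return sorted(dynamic), sorted(static)


# Assumed limits of the base image's libstdc++.so (illustrative only).
BASE_MAX = {"GLIBCXX": (3, 4, 19), "CXXABI": (1, 3, 7)}

# Placeholder symbol names with the version tag each one would need.
REQUIRED = {
    "old_symbol": "GLIBCXX_3.4",
    "new_symbol": "GLIBCXX_3.4.26",
    "unwind_symbol": "CXXABI_1.3",
}

dynamic, static = split_by_linkage(REQUIRED, BASE_MAX)
print("dynamic:", dynamic)
print("static:", static)
```

In this model only `new_symbol` ends up in the static group, which matches the idea that the resulting binary still depends only on symbol versions the base image already provides.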

I'll try to redo this PR

@henryiii
Contributor

Is there documentation about how devtoolset's patches work?

I would love this too, I'd like to see how hard it would be to apply for ubuntu.

Oh, I found:

Ahh, that's great, thanks! Where did you find it, out of curiosity? That's a clever way to handle it, actually.

@henryiii
Contributor

Also, what you have written is currently completely correct for the Debian-based builds, which do not do this trick.

@henryiii marked this pull request as draft June 24, 2021 19:13
@henryiii
Contributor

(I'm making this draft until you update it and un-mark it as draft, so I'll know to look at it again)

@AllSeeingEyeTolledEweSew
Author

pypa/manylinux#1012 (comment)

It is clever, but surprising to me as a user!

Since the centos images are default, cibuildwheel's default behavior is really an implicit and selective version of -static-libstdc++.

Given that, I guess I'm leaning toward doing a full -static-libstdc++ unless I can find a reason not to. Explicit is better than implicit, and all.

@henryiii
Contributor

It's really important to keep wheels small, since you could end up uploading dozens of them per release. And pretty much every other compiled package you use is likely depending on this - certainly all pybind11 packages like SciPy, PyTorch, etc. are shipping C++11 + manylinux1 wheels and have been for years.

How this will be handled in the future remains to be seen.

## manylinux1 and C++14
The default `manylinux1` image (based on CentOS 5) contains a version of GCC and libstdc++ that only supports C++11 and earlier standards. There are however ways to compile wheels with the C++14 standard (and later): https://github.com/pypa/manylinux/issues/118
## manylinux and C++
The `manylinux*` standards imply a limit on which C++ standard you can use. There are workarounds to this, and `cibuildwheel` uses some workarounds by default. You should be aware of them!
Contributor

Suggested change
The `manylinux*` standards imply a limit on which C++ standard you can use. There are workarounds to this, and `cibuildwheel` uses some workarounds by default. You should be aware of them!
The `manylinux*` standards imply a limit on which C++ standard you can dynamically link to. There is a workaround built into the manylinux1, manylinux2010, and manylinux2014 compilers described below that allows partial linking to just the portions that are allowed; this workaround is not available for the `manylinux_2_*` images yet.

For 95%+ of users, this makes no difference - it just works. No one compiles with old GCCs - 4.8 is already very old - C++11 is 10 years old this year. And it's not cibuildwheel's workaround at all; it's Red Hat's. One could maybe argue it's manylinux's (which is the way I've stated it above), but it's not cibuildwheel's.

Pybind11 would not exist (at least on Linux) if we were constrained to the original compilers. And statically linking everything would probably take down PyPI.

Consider that the `manylinux*` standards constrain _symbol versions_ in `libstdc++.so`. So when *dynamically linking* to `libstdc++.so`, the desired `manylinux*` standard constrains the gcc version, and thus constrains the C++ standard.
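
To make the symbol-version constraint concrete, here is a minimal Python sketch that scans text in the style of `objdump -T` output for `GLIBCXX_*` and `CXXABI_*` tags and reports which families exceed a given limit. The sample text is a hand-written excerpt for illustration, and the limits in `ASSUMED_LIMITS` are assumptions; the authoritative values are in the policy file of the `manylinux*` standard you target.

```python
import re

# Matches versioned symbol tags such as GLIBCXX_3.4.19 or CXXABI_1.3.7.
_VERSION_TAG = re.compile(r"\b(GLIBCXX|CXXABI)_(\d+(?:\.\d+)*)\b")


def max_symbol_versions(dump_text):
    """Return the highest version seen for each tag family in the text."""
    highest = {}
    for family, version in _VERSION_TAG.findall(dump_text):
        parsed = tuple(int(part) for part in version.split("."))
        if parsed > highest.get(family, ()):
            highest[family] = parsed
    return highest


def exceeds(found, limits):
    """List the families whose highest required version is above the limit."""
    return sorted(f for f, v in found.items() if f in limits and v > limits[f])


# Hand-written excerpt in the style of `objdump -T`, for illustration only.
SAMPLE = """
0000000000000000      DF *UND*  0000000000000000  GLIBCXX_3.4    _ZNSo3putEc
0000000000000000      DF *UND*  0000000000000000  GLIBCXX_3.4.21 _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1Ev
0000000000000000      DF *UND*  0000000000000000  CXXABI_1.3     __cxa_begin_catch
"""

# Assumed limits for illustration; check the policy file for real values.
ASSUMED_LIMITS = {"GLIBCXX": (3, 4, 19), "CXXABI": (1, 3, 7)}

found = max_symbol_versions(SAMPLE)
print(found)
print(exceeds(found, ASSUMED_LIMITS))
```

A shared object that needs a tag above the limit would not satisfy that `manylinux*` standard when `libstdc++.so` is linked dynamically, which is the constraint this paragraph describes.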

Cross-referencing [current manylinux standards](https://github.com/mayeut/pep600_compliance/blob/f86a7d7c153cc45aa3f2add6ffcf610c80501657/pep600_compliance/tools/policy.json) with [gcc's symbol versions](https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html) and [libstdc++'s language support](https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html), we have:
* `manylinux1`: gcc 4.1.x
Contributor

This should probably be relegated to a note. For almost every user, this doesn't matter. AFAIK, all packages dynamically link and go about their day. This is a nice note, but what most users want is the second table - what GCC version is actually used. manylinux1 is C++11 compatible, but not C++14 compatible, etc. This is good detail to have in a note just in case someone is debugging a really weird bug, or wondering how CentOS 5 has GCC 4.8.


([The first release of a complete and stable C++17 ABI is gcc 9.1](https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017), but as of writing there is no official `manylinux*` standard that supports this version. Prior standards like C++11 *are* supported, but gcc doesn't document the per-version support as clearly as C++14 and later)

We *can* use newer C++ standards and support older `manylinux*`, if we use *static linking*.
Contributor

I don't think we are trying to push anyone to static linking. The current system works fine. It's better if it's just mentioned as a detail.

@AllSeeingEyeTolledEweSew
Author

Reading between the lines, it seems like you'd prefer I put the most relevant info first in the section, whereas I wrote it starting from first principles and building to relevant info later.

So rather than tweak, I'll just rewrite it and we can go from there. Will un-draft when ready

@AllSeeingEyeTolledEweSew marked this pull request as draft June 26, 2021 18:48
@henryiii
Contributor

I would ask the question: what is the purpose of this section? The current version could be read as "convince users that the way we currently build wheels is wrong, and that they should either statically link or use GCC 4.1", or at least as putting worries and doubts in their minds. It wasn't intentional, but that is how it reads because of the path by which it came to be written. The purpose should be "inform users about what they can use, as well as what is happening behind the scenes".

You want users to be assured that the version of GCC they get is just fine to use as normal. Most packages just use it and are happy. "What C++ version can I use" is the high-level question, "what GCC version" is the slightly lower-level question, and most users should be happy to stop there. The remaining details are interesting for very advanced users, just in case a really weird bug occurs: which version of GCC is truly being dynamically linked (with anything newer than that being static), or why manylinux_2_* is "what you see is what you get" for GCC linking. This is also really hard information to find! Thanks for digging it up and collecting it!

@henryiii
Contributor

henryiii commented Jul 5, 2021

Also, I've got a long-term plan to improve the packaging.python.org docs, such as in pypa/packaging.python.org#911. Would you be interested in helping with that? I think a version of this page would work very well there as part of a new/rewritten section on building wheels for Linux.

@joerick
Contributor

joerick commented Apr 29, 2022

Closing due to inactivity. Feel free to reopen if you want to continue.

@joerick closed this Apr 29, 2022