-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MSVC MT/MD incompatibility in PYBIND11_BUILD_ABI #4953
base: master
Are you sure you want to change the base?
Conversation
Could you please explain the rationale in the PR description? IIUC, currently this PR undoes #4779 for Because ultimately we're sharing C++ pointers between Python extension modules. The C++ ABI must be compatible, independently of everything else. Falling back to I don't know a lot about Windows linking and the various options. I'd appreciate if you could take the time to explain carefully in the PR description so that people like myself can clearly see what is correct and why. (So that we don't have to come back here again.) |
Yes, the ABI is compatible between different versions of MSVC 14.* (2015 - 14.0, 2017 - 14.1, 2019 - 14.2, 2022 - 14.3). With However, if you are using |
Amazing.
Could you please clarify what the "internal structures" are? If you move your comment with that clarification to the PR description it'll look good to me. Please also point to this PR from the comment in internals.h ( @wjakob this looks like something we want to keep in sync between pybind11 and nanobind. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm glad you caught this before we made a release.
I am not so sure about this one @rwgk @isuruf . Consider the following scenario:
This is where we are leaving the world of C ABI compatibility world and entering the world of C++ ABI compatibility. The opener of the PR says:
This is precisely the issue. And it is also the issue in the other related PR (#3793) and the CondaForge issue (conda-forge/pybind11-feedstock#79), where I am surprised about the nonchalance in introducing an (in my opinion) dangerous change. After following the above scenario, each of the two pybind11 extensions will now want to read from and write to a shared internal data structure containing lots of nested STL types. It is a minefield -- even the slightest change in how those types are implemented will cause undefined behavior. Crashes in the simplest case, and silent data corruption on the worse side of the scale. This is just the pybind11 internals -- perhaps one could go through through the entire MSVC C++ library and confirm that types used to implement pybind11 itself ( But it does not stop there in my opinion: pybind11 has not just the job of protecting its own internals. It should also protect user data structures from ABI incompatibilities.The point of multiple extensions sharing a pybind11 "domain" is so that functions in these extensions can call each other with instances across libraries. Perhaps both users
Well, if user (case in point, C++20 features were exposed in MSVC but were not API stable yet microsoft/STL#1814 (comment)) I will say what I said earlier in the two other commits: most pybind11 extensions don't need to need type-level access to the internals of other pybind11 extensions. They can be isolated and everything works fine. It is just a small subset of extensions that have such an "intimate" relationship. I think it is reasonable to expect that such extensions are built with C++ libraries that declare themselves as being ABI-compatible. I know that this can sometimes be inconvenient. But weakening the pybind11 protection mechanism is not the right solution to this problem. |
Are there any pybind developer calls where we can discuss these issues? I feel like it would be easier to discuss these on a call and summarize the findings. |
I didn't think that through before. I'm trying to make that concrete for myself by looking at the sources. I think this is the door through which the potential for UB slips in:
Here extension module B is extracting a pointer to an object of this
If the Sounds convincing to me. |
We're in vastly different timezones. I think sorting this out via comments here will work best. (We didn't have a meeting in a long time.) |
@rwgk, that wouldn't happen as the allocation of internal structures happen in the same C runtime. If the external view of the object changes, then the stdlib developers would indicate this so that things wouldn't link at all. For eg: the |
This is a valid point. API stability in the case of experimental features is an issue in all compilers. |
That's a cool feature (I didn't know about it), but I don't think we have that protection: Here the pybind11/include/pybind11/detail/internals.h Line 535 in 4bb6163
And here the pybind11/include/pybind11/detail/internals.h Line 469 in 4bb6163
That by-passes any link checks. The extension modules are "connected" only at runtime. Is it possible to have a runtime check for e.g. |
You can use |
Is that documented to be unique to a given "external view"? (The mangled names I happened to see over the years didn't appear to encode any versioning information.) |
After turning this over in the back of my mind a little bit: I think it's clear that falling back to But it also sounds like using Could we find a middle ground? Maybe, @isuruf do you think you could add preprocessor-level functionality to map |
include/pybind11/detail/internals.h
Outdated
#ifndef PYBIND11_BUILD_ABI | ||
# if defined(__GXX_ABI_VERSION) | ||
# define PYBIND11_BUILD_ABI "_cxxabi" PYBIND11_TOSTRING(__GXX_ABI_VERSION) | ||
# elif defined(_MSC_VER) | ||
# define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER) | ||
# elif defined(_MSC_VER) && defined(_MT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if the assumption that all _MSC_VER >= 1900 && _MSC_VER < 2000
are fully binary compatible is valid, but I'm willing to take your word for it. I'll defer to @wjakob for make the final call on this one.
It would be ideal to reduce the redundancy and avoid the define PYBIND11_BUILD_ABI ""
fallback. (IMO it would be good to get rid of that fallback entirely, but not in this PR.) How about this?
# elif defined(_MSC_VER)
# if defined(_MT)
# define PYBIND11_BUILD_ABI "_mt_mscver" PYBIND11_TOSTRING(_MSC_VER)
# elif defined(_MD)
# if _MSC_VER >= 1900 && _MSC_VER < 2000
# define PYBIND11_BUILD_ABI "_md_mscver14"
# else
# define PYBIND11_BUILD_ABI "_md_mscver" PYBIND11_TOSTRING(_MSC_VER)
# else
# define PYBIND11_BUILD_ABI "_unkown_mscver" PYBIND11_TOSTRING(_MSC_VER)
# endif
# endif
I'm not sure about the _unkown_mscver
fallback. Maybe #error Unkown combination of MSVC macros
(or similar) would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using _MSC_VER/100
now to get the major version. See https://learn.microsoft.com/en-us/cpp/overview/compiler-versions?view=msvc-170 on how _MSC_VER is formatted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll ask around to find someone familiar with the MSVC ABI universe.
It's important that we get this right this time around, because everytime we make a change here evidently we're creating a big problem for conda-forge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to ask, did you consider the idea of preserving "_mscver"
for one branch of the #if defined
logic?
The idea is to minimize ABI breakages.
I'm just not sure which branch is the best one to pick.
I had a bit of time look a bit into C++ ABI stability in the last few days. I think that I am too paranoid after having been repeatedly bitten by this some years ago. It seems that basically all the three main compiler tools (GCC, Clang, MSVC) have started taking ABI stability seriously at some point in the last years and there haven't been major breaks since then. So I'm open to making a change here that allows interoperability between more extensions built with different compilers. But let's get it right and make something that we hopefully won't have to tweak retroactively because the criterion turns out to be too loose and, e.g., and ancient MSVC version turns out to not be ABI-compatible after all. One important think to keep in mind (@rwgk) is that a change here is itself an ABI break, of pybind11. Modifying the -Wenzel |
@isuruf: this PR was mentioned under conda-forge/pybind11-feedstock#95 This PR slipped my attention. We missed the pybind11 v2.13.0 release. @isuruf do you have an opinion when this is best merged? Would it make sense to tweak this PR, to avoid further ABI breaks? E.g. currently the diff under this PR is: -# define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER)
+# elif defined(_MSC_VER) && defined(_MT)
+# define PYBIND11_BUILD_ABI "_mt_mscver" PYBIND11_TOSTRING(_MSC_VER)
+# elif defined(_MSC_VER) && defined(_MD) && (_MSC_VER >= 1900) && (_MSC_VER < 2000)
+# define PYBIND11_BUILD_ABI "_md_mscver14"
+# elif defined(_MSC_VER) && defined(_MD)
+# define PYBIND11_BUILD_ABI "_md_mscver" PYBIND11_TOSTRING(_MSC_VER) could this (note the -# define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER)
+# elif defined(_MSC_VER) && defined(_MT)
+# define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER) <<<< LOOK HERE
+# elif defined(_MSC_VER) && defined(_MD) && (_MSC_VER >= 1900) && (_MSC_VER < 2000)
+# define PYBIND11_BUILD_ABI "_md_mscver14"
+# elif defined(_MSC_VER) && defined(_MD)
+# define PYBIND11_BUILD_ABI "_md_mscver" PYBIND11_TOSTRING(_MSC_VER) or this -# define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER)
+# elif defined(_MSC_VER) && defined(_MT)
+# define PYBIND11_BUILD_ABI "_mt_mscver" PYBIND11_TOSTRING(_MSC_VER)
+# elif defined(_MSC_VER) && defined(_MD) && (_MSC_VER >= 1900) && (_MSC_VER < 2000)
+# define PYBIND11_BUILD_ABI "_md_mscver14"
+# elif defined(_MSC_VER) && defined(_MD)
+# define PYBIND11_BUILD_ABI "_mscver" PYBIND11_TOSTRING(_MSC_VER) <<<< LOOK HERE be better? |
@henryiii @Skylion007 for visitility (see my other comment posted a minute ago) |
I'll add some tests tonight |
For added fun, msvc and clang-cl are also not (guaranteed to be) ABI-compatible on |
I was reading the diff only and didn't see how For completness, I did leave off the fact that |
Here's the new output after fixing a bug and expanding tests:
|
Do we want |
Fixed now. New values
|
include/pybind11/detail/internals.h
Outdated
/// On MSVC, mixing /MT and /MD will result in crashes. See (#4953) | ||
/// There is no macro for major version for MSVC, so we check for major version | ||
/// 19, 20, 21, 22 for now as major version 19 is MSVC 2015-2022 and we future | ||
/// proof for 3 major versions in the future |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better not to do that (future proof for 3 major versions), because the future is unpredictable. It's only important that we don't silently fall through to ""
like before. (You're doing that already.)
(I want to preserve some of the changes under my experimental #5411. Might take me a day or two to get back to that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't characterize it as unpredictable. The version has a predictable MMmm pattern and since there's no way to divide an integer and assign to another macro in the preprocessor, this is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant: It is likely but not certain that future MSVC versions will be ABI compatible if (_MSC_VER) / 100
are identical.
I think it's better to adjust/review here when those versions are released and we start testing. (We can backport to older pybind11 versions as needed to make them work. That's safer than having them assume ABI compatibility that may not exist.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the sentiment though. On other platforms like clang on macos, we assume that things work fine, but on certain platforms like msvc, we assume the worst.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's true.
We don't have anything like version / 100
anywhere else. (Please correct me if I'm missing something.)
Unless ABI compatibility for version / 100
is a documented feature for a given platform, I believe it's prudent not to jump to conclusions. Safer is better here in my judgement.
In the back of my mind:
-
Obviously the "it will be ok" fall-through has proven to be harmful. Let's do less of that in the future.
-
See feat: enable type-safe interoperability between different independent Python/C++ bindings systems. #5296 for where my interest in more strictly enforced correctness is coming from. The conduit feature could work between any pair of bindings systems, but it has to be safe to not get a bad name.
I think something went wrong between the two commits below. Some mixup between testing for My commit c3415a9 is almost certainly also flawed. I removed the commit 4c6c344
commit 1d2f952
|
@rwgk this is ready now. See https://learn.microsoft.com/en-us/cpp/build/reference/md-mt-ld-use-run-time-library?view=msvc-170#remarks for more details. Basically
|
Thanks @isuruf for the explanation! For quality assurance: I found this in the latest GHA logs (LGTM):
|
Hi @robertmaynard, is there a chance that you could help review the changes in include/pybind11/detail/internals.h? — It's <20 changed/added lines. To have a clear track record, it would be great if you could formally approve this PR if/when it looks good to you.
Beyond fixing up
|
NVHPC is actively maintained and new version come out every couple of months. |
Thanks! I just revised the comment (713b427). — Coincidentally I was working on revising the misleading comment before I saw your comment here, after asking my teammates about it. |
#ifndef PYBIND11_BUILD_ABI | ||
# if defined(__GXX_ABI_VERSION) | ||
# if defined(__GXX_ABI_VERSION) // Linux/OSX. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this PR is for msvc but this isn't the correct way to check for linux.
Background:
This macro value is the same across all version of clang ( always 1002 ) but changes anytime the gcc internal compiler ABI changes, which generally has no impact. You can find more details on the gcc abi changes here: https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html
Generally you can safely pass types across anything 1002
or newer as long as the _GLIBCXX_USE_CXX11_ABI
( https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html ) is the same.
So if you build two libraries with differing values of _GLIBCXX_USE_CXX11_ABI
they can't have the same abi:
~ $ clang++-16 -D_GLIBCXX_USE_CXX11_ABI=0 ./test.cpp -shared && nm ./a.out | grep foo | c++filt
00000000000011a0 T foo(std::basic_string<char, std::char_traits<char>, std::allocator<char> >)
~ $ clang++-16 -D_GLIBCXX_USE_CXX11_ABI=1 ./test.cpp -shared && nm ./a.out | grep foo | c++filt
00000000000011a0 T foo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
But that isn't captured in your ABI checks at all and is significantly more important compared to __GXX_ABI_VERSION
# else | ||
# error "Unknown combination of MSVC preprocessor macros: PLEASE REVISE THIS CODE." | ||
# endif | ||
# elif defined(__NVCOMPILER) // NVHPC (PGI-based). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For nvhpc you would check _GLIBCXX_USE_CXX11_ABI
like you would for gcc or clang. You can combo that with __GNUC__
and __GNUC_MINOR__
to get the gcc compatible version that nvhpc is targetting. IIRC the behavior of nvhpc is always -fabi-version=0
where the latest is defined by the gcc it is targetting ( e.g. GNUC , GNUC_MINOR defines ).
Co-authored-by: Robert Maynard <robertjmaynard@gmail.com>
Description
The ABI is compatible between different versions of MSVC 14.* (2015 - 14.0, 2017 - 14.1, 2019 - 14.2, 2022 - 14.3).
However, the internal structures are different between these versions. For eg: the internal details of how C++ structures like maps are implemented might be different. (I'm not saying they are, but how maps are implemented is an implementation detail)
With /MT you are statically linking in the C runtime and when allocating one structure in a DLL created with a different MSVC version, runtime linked statically, and trying to use it in another DLL linked to a different MSVC runtime (static or dynamic) results in crashes. That's why when you link with /MT and is crossing DLL boundaries you need them to have the same version.
However, if you are using /MD you are using only one C runtime and the internal structures are consistent. Therefore you don't need to check MSVC version.
See https://learn.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017?view=msvc-170
For increased safety, this PR also eliminates the fall-through case leaving
PYBIND11_BUILD_ABI
blank.Suggested changelog entry: