-
Notifications
You must be signed in to change notification settings - Fork 878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
incorrect logic in opal_config_asm.m4 for 128-wide atomic ops #5529
Comments
The current logic is correct, the code you pinpoint to handles both the support for and the cross compiling support for 128 bits atomics. If support for 128 bits atomics exists, it cannot be disabled as it is the safest (and most efficient way) to cope with some ABA issues. The conditions are written in a reversed way, but the logic itself holds. |
@flang-cavium George is correct. I'll close this issue; please feel free to reply if you have further questions. And here's why attaching an image is sometimes useful 😄: |
Ummm, no. George is not correct. Actually, George is 100% wrong. This has nothing to do with cross-compiling. Two-value logic: "yes" or "no". ! "yes" means "no". Oui? Non? I have already explained in the bug in detail. Care to actually take a look? |
Question: "ABA" -- American Banking Association? |
Actually I did look at your bug description and at your patch and the current code before writing my answer. Your conclusion doesn't stand. ABA stands for the ABA problem, a multi-threaded synchronization issue that arise when 2 variables must be both atomically updated. 128 bits wide atomic operations provides a simple and efficient solution. |
So, in your esteemed opinion, cmpxchg16b should always be enabled, regardless of whether or not the ISA supports it, or whether or not ./configure said --enable-cross-cmpset128=no and --enable-cross-cmpxchg16b=no? Is this what you are saying here? What does "128 bits wide atomic operations provides a simple and efficient solution" really mean? Not all ISA's suport 128-bit wide atomics. Not even all flavors of X86_64 support it. You still haven't explained what this has to do with cross-compiling. |
Not to mention that the ./configure test for 128-wide atomics when compiling with GCC is inherently broken:
So, when compiling with GCC, the test fails at link-time. GCC needs -latomic on link-line, which is absent in the ./configure test. This ends up making cmpxchg16b and the 128-wide atomics always disabled when compiling with GCC. Problem is, cmpxchg16b with GCC ends up being disabled for all the wrong reasons. 128-wide atomics with GCC will end up disabled even for those X86_64 micro-arch'es that actually support 128-wide atomics and cmpxchg16b. |
And yes, I am saying that 128 bits atomics should always be used when available. The only way to turn it off is to completely disable all builtin atomics via Anyway, you pinpointed a real problem: OMPI fails to detect support for 16 bits atomics on newer compilers that provide atomic support via the atomic library. Patch underway. |
We're getting there.
Exactly. 128-wide atomics are not available on every single ISA. What you are advocating here is an all-or-nothing proposition. Either we have 128-wide atomics, in which case we can use atomics, or we don't, in which case we can't use atomics. This is wrong. 32-wide and 64-bit atomics are available on ISA's that do not have 128-wide atomics. For example, ARM64. Which means we should be able to use 32-bit and 64-bit atomics even when the 128-wide atomics aren't available. Which is precisely what ./configure pretends it does, but lies about it: --enable-cross-cmpset128 If I pass "no" to either or both these ./configure arguments, the 128-wide atomics should be disabled. But they are not disabled. They are always enabled, although I am not even cross-compiling. Which means ./configure should - at a minimum - test whether or not I am cross-compiling or not. And if I am not cross-compiling, they should be disabled unless I am explicitly enabling them with --enable-cross-cmpset128=yes And if you would have taken the time to test the patch, instead of arbitrarily discarding it wtihout testing, you would have noticed that the patch does exactly what you want it to do, and cures the problem of having 128-wide atomics always enabled, even for the ISA's that do not support them. |
You have deeply misunderstood or misinterpreted every single comment I made. As I said before I read and test your patch, and it does not work. |
No, I don't think I misunderstood anything. I have tested my patch as well. And it works. I seriously doubt you tested it. |
Let me break it down for you:
Please take a look at #5546 to see if it addresses your issue. |
No. That is not what I am claiming. What I am claiming is this: When configuring OpenMPI from the released tarball, with a compiler OTHER than GCC -- namely clang -- the following line is ALWAYS generated in ${top_srcdir}/opal/include/opal_config.h: /* Whether the __atomic builtin atomic compare and swap is lock-free on This will happen with clang - for example - regardless of whether ./configure says --enable-cross-cmpset128=no --enable-cross-cmpxchg16b=no or --enable-cross-cmpset128=yes --enable-cross-cmpxchg16b=yes This is on ARM64, and it is wrong. ARM64 does not have 128-wide atomics. This happens with clang because clang returns 0 for the test for 128-wide atomics on ANY ISA, in spite of the fact that clang doesn't even have support for a int128_t or uint128_t type. You could argue that this is a clang bug, but that's besides the point. They aren't going to change this behavior anytime soon. The reason this does not happen with GCC is because of the lack of -latomic on link line for the GCC atomics test - which I have already described above. CAVEAT: If you add -latomic on link-line to the GCC test, the test will link. That does not mean that 128-wide atomics are valid on ARM64. They are not. There are FOUR bugs here:
If you apply my opal_config_asm.m4 patch and you pass: --enable-cross-cmpset128=yes --enable-cross-cmpxchg16b=yes AND you add -latomic on link-line to the atomic test for GCC in ./configure, then ./configure will enable the 128-wide atomics. If you pass --enable-cross-cmpset128=no --enable-cross-cmpxchg16b=no then ./configure won't enable them. So, if you really want to know what happens with 128-wide atomics on ISA's that don't support them, please do the following, as a test:
opal_pointer_array You can easily test this on an X86_64 CPU that doesn't support cmpxchg16b. For example a laptop with a Corei5 or Corei7 CPU. I can tell you what happened on my Corei7 laptop when I tested this with GCC 7.2: these three tests fail with a SEGV at run-time. |
It looks like #5546 is... complicated (per the discussion between @bosilca and @hjelmn). Portable software is hard! Ugh. @bosilca and I just got off the phone discussing this issue. I think this boils down to two issues: The configure test for 128 cmpsetYou made a remark in your last reply that finally made it click for us: our configure test is getting the wrong answer for clang+ARM64. This is the root cause of all the problems. We got lost in the discussion and our own biases because of the George is going to amend #5546 to strengthen the test to make sure that The
|
Thanks, the information about clang is very interesting. If that is the root of the issue we can certainly amend how we test for the existence of the 128-bits atomics to make it work despite the poor support on the compiler side. I don't have access to a ARM64 machine, but on my Core i7 laptop, neither with gcc7.3 not clang-902 (llvm 9.1.0) the default OMPI always disables the 128-bits atomics completely and with my patch the __sync support is used. All tests run to completion. I would like to better understand the statement:
According to the clang documentation they have support for some __sync and some __atomic operations for some POSIX types, but there is no mention about the 128 bits atomics. So what exactly returns 0 ? The linking of the small test for 128-bits atomics succeed but the __sync call returns 0 ? |
clang doesn't use GCC's libatomic. It has its own builtins for all the __atomic* ops. Also: clang support for __int128 (and int128_t/uint128_t) was added only very recently to clang: https://bugs.llvm.org/show_bug.cgi?id=36862 (March 2018). and it's definitely not available in clang/LLVM < 7.0, which isn't released yet. So, for clang versions prior to 7.0, we can safely assume that support for __int128 is non-existent. The clang bug is that it doesn't issue a diagnostic. Which, technically, it isn't required to, given that __int128 is a GNU extension. |
Github pro tip: mark long verbatim sections with three single tick marks: https://help.github.com/articles/basic-writing-and-formatting-syntax/#quoting-code |
Also - and I almost forgot: Even when __int128 becomes available in ARM64 clang, it may not be lock-free. |
@flang-cavium Can you also output:
|
There is no format specifier available for __int128. |
For those interested: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0801h/bzf1476202789613.html This is the ARM documentation for the ARM V8.1a LSE extensions - namely lock-free atomics. There's a whole bunch of instructions there: CASA/CASL/CASAL, LDADDA, LDADDL, LDADDAL, etc. As you can see, there are no 128-wide lock-free atomics defined in the ARM64 ISA Specification. Only 32-bit and 64-bit. Only the W[x] (32-wide) and the X[x] (64-wide) registers are defined. So, 128-wide lock-free atomics on ARM64 will not happen until ARM defines the ISA for them. |
Per lengthy discussion on open-mpi#5529 and open-mpi#5546, it seems that -- at least as of August 2018 -- on ARM64+clang, the __atomic_compare_exchange_n() intrinsic function does exist, but it does not behave the way that Open MPI expects it to. Strengthen our configure test to make sure that the function not only exists, but also behaves the way Open MPI expects it to. Many thanks to Stefan Teleman for the bug report. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This statement might not be 100% accurate #5546 (comment). |
@flang-cavium back to the initial problem description typically,
For the sake of completeness, it is possible to Bottom line, and regardless the bozo case, |
On 08/17/2018 02:42 AM, Gilles Gouaillardet wrote:
External Email
@flang-cavium <https://github.com/flang-cavium> back to the initial
problem description
typically, |$enable_cross_cmpset128| can have 3 values
* |yes| if |configure --enable-cross-cmpset128|
* |no| if |configure --disable-cross-cmpset128|
* empty if no option was passed to the |configure| command line
We have already established that this is not how it works in reality.
This is how it would be *supposed* to work, according to the
well-established semantics of autoconf's --enable-<FOO> |
--disable-<FOO>. But that's not what OpenMPI has done here.
For the sake of completeness, it is possible to |configure
--enable-cross-cmpset=FOO| and Open MPI does not handle this properly
(and I guess the reason is this is not a wise thing to do in the first
place).
Bottom line, and regardless the bozo case, |test !
"$enable_cross_cmpset128" = "yes"| is *not* equivalent to |test
"$enable_cross_cmpset128" = "no"| (since |$enable_cross_cmpset| is not
set in most use cases)
That's incorrect.
In a 2-Value Logic grammar system, that is exactly how it's supposed to
work: If it's not "yes", it's "no". I.e: !True == False.
What you are describing here is a 2-Value Logic grammar system with an N
number of possible valid values.
I don't know how this kind of grammar works. I suspect it doesn't work
at all.
…--
Stefan Teleman
Cavium
stefan.teleman@cavium.com
|
On 08/17/2018 09:45 AM, Gilles Gouaillardet wrote:
External Email
@flang-cavium <https://github.com/flang-cavium> please read
https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Package-Options.html
Please stop wasting my time with this.
If you don't understand the semantics of autoconf's --enable | --disable
options, there is nothing that I can do about that.
And it's not my job to explain autoconf to you, either.
…--
Stefan Teleman
Cavium
stefan.teleman@cavium.com
|
While we appreciate your bug report, your trolling is unwelcome. If whoever you represent cares about this issue, they better find someone else to pursue this discussion. Good speed. |
On 08/17/2018 10:34 AM, bosilca wrote:
External Email
While we appreciate your bug report, your trolling is unwelcome. If
whoever you represent cares about this issue, they better find someone
else to pursue this discussion. Good speed.
If there was any trolling here, it came mainly from you, and, to a
lesser degree, from your associates.
See the most recent trolling about C11 and __sync_bool_compare_and_swap.
Not to mention your constant stream of incorrect - but nevertheless
emphatic - statements.
…--
Stefan Teleman
Cavium
stefan.teleman@cavium.com
|
With @flang-cavium's help, I filed an upstream bug with llvm: https://bugs.llvm.org/show_bug.cgi?id=38621 |
Thanks for everyone's assistance. My team appreciates it. From now on, I will be handling any interactions with the OpenMPI project. |
@jjones-cavium Ok, great -- thanks! We got the PR merged on master; it's awaiting merge on the v3.0.x, v3.1.x, and v4.0.x branches (#5617, #5618, #5619, respectively). Once those 3 are merged, this issue will be closed. |
This has now been merged into master, v3.0.x, v3.1.x, and v4.0.x branches. Done! |
Thank you for taking the time to submit an issue!
Patch is based on OpenMPI 3.1.1 - released tarball.
Compiled and installed from source.
Operating system: Ubuntu 16.04 AArch64 (ARM64).
Compilers:
Problem description:
The logic for discovering and enabling 128-bit wide atomic ops at ./configure time is incorrect. Essentially, 128-wide atomic ops are enabled regardless of whether or not the Target actually supports them.
The file opal_config_asm.m4 contains the following:
The logical test
if test ! "$enable_cross_cmpset128" = "yes" ; then
is equivalent to
if test "$enable_cross_cmpset128" = "no" ; then
which means, cmpset128 will be enabled even though ./configure was told
--enable-cross-cmpset128=no
Further down, there is another test for $enable_cross_cmpset128:
if test ! "$enable_cross_cmpset128" = "yes" ; then
is the equivalent of saying
if test "$enable_cross_cmpset128" = "no" ; then
Again, same logic problem, but this time around making certain 128-wide CAS really gets enable no matter what ./configure says.
Also: trying to attach a patch named opal_config_asm.m4.patch results in the following error:
We don't support that file type. Try again with a GIF, JPEG, JPG, PNG, DOCX, GZ, LOG, PDF, PPTX, TXT, XLSX or ZIP.
I seriously doubt anyone would ever submit a GIF, JPEG, JPG or PNG patch. I had to rename the patch file from opal_config_asm.m4.patch to opal_config_asm.m4.patch.txt.
opal_config_asm.m4.patch.txt
The text was updated successfully, but these errors were encountered: