-
Notifications
You must be signed in to change notification settings - Fork 878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the detection of 128 bits atomics. #5546
Conversation
0e913ab
to
8c5de45
Compare
@jsquyres, @bwbarrett and @hppritcha once we converge on how we want to address this issue, we will need to push it to all stable releases. We can certainly address the cross-128 argument, but the purpose of this patch is to be minimally invasive and only address the detection of 128 atomic support. |
@bosilca The problem is often when the compare-exchange support for 128-bit atomics requires libatomic then the support is not lock-free (example gcc on x86-64). This is why I didn't add -latomic. Is there some case where libatomic is required and the resulting support is lock-free? |
To be fair I only tested that with icc and gcc. Both failed atomic_is_lock_free () for 128-bit atomics iff libatomic was required for atomic_compare_exchange_strong on 128-bit values. |
@hjelmn I confirm, on the machines where I tested (x86_64 with different flavors of gcc and llvm) the atomics from -latomic are not lock-free. However, if we don't test for it we discard the check for 128-bits __sync and we would not mix the __atomic and __sync for x86_64. As a result all 128 bits atomics are disabled, which is not what we want. We could modify the logic to always test for 128-bits sync if we fail to use the 128-bits __atomic. But, in the long term, compilers will fix their support for __atomic, and then with the current patch the detection will work seamlessly. |
@bosilca and I discussed this on the phone: see #5529 (comment). The root cause of the problem, as identified by @flang-cavium in #5529, is that our __int128 x = 0;
__sync_bool_compare_and_swap (&x, 0, 1); but we're not actually testing if that function actually worked -- we're just testing that that program doesn't abort. Apparently, on clang+ARM64, this function doesn't cause the program to abort, and therefore @bosilca is going to amend the tests to make sure that the function actually worked, and then hopefully we'll get a correct answer from this configure test on clang+ARM64. Additionally, I'm going to add a commit or two on this PR about the |
Thank you for this. OpenMPI is very important to us - Cavium. Our T99 (ARM64) chip is big in HPC/Super-Computing and OpenMPI is a very important part of that. |
@bosilca I though we already always checked for __sync. If that isn't working there is a problem. https://github.com/open-mpi/ompi/blob/master/opal/include/opal/sys/gcc_builtin/atomic.h#L191 |
And you are correct. A compiler may eventually have a lock-free CAS-128 in libatomic so there is no problem with this PR. |
@jsquyres Crazy that it doesn't abort. That is an unexpected result. What does it do? Give a wrong answer or something? |
@hjelmn George hasn't added commits yet to make sure that the 128 bit test actually works. So let's not merge yet! |
75e5632
to
3f69c40
Compare
well. from the user comment it looked like they somewhat worked. that seems to be a compiler bug |
we need to add the check if they are lock free to the __atomic tests too. we do it for c11 in the other PR.... which i should probably merge |
I think it might be worthwhile to use large values in the test that we run -- e.g., Worse would be if it actually works (e.g., via emulation), but not in an atomic fashion. 😲 |
I got access to an arm64, so I might be able to dig into this a little bit. But the more I read the discussion the more confused I get, about what is supported and what not and about what is available through system libraries. |
Here's what happens when testing with values other than 0 and 1:
Testing with 0 and 1 only is fraught with peril. Constant-folding to 0 and 1. Plus the 0 and 1 arguments are passed as integer constants to the __atomic* intrinsic call. So the compiler will not actually use 128 bits to represent 0 or 1. It will use as little as possible. |
GCC's libatomic implements atomics as LL/SC with additional POSIX threads locking. Here's a snippet from the ARM64 disassembly:
Also: LL/SC atomics can silently fail: |
Yeah. They can fail which is why there is usually a loop around ll/sc. |
But that is outside the scope of this discussion, as if __atomic_compare_exchange_n exists it's the implementors responsibility to make it follow the specifications. Otherwise, if they provide a function that has a broken implementation there is little we can do except preventing OMPI from being built with that specific compiler. Btw in your example above where did you find the__atomic_compare_exchange_n prototype ? The 4th argument is supposed to be a bool and not a __int128 isn't it ? |
I'd like to compare a configure output from current master to a configure output to this PR. I.e., why hasn't this been a problem with the XL Fortran compiler before? |
The check for - latomic was added in this PR.
…On Fri, Aug 24, 2018, 7:37 PM Jeff Squyres ***@***.***> wrote:
I'd like to compare a configure output from current master to a configure
output to this PR. I.e., why hasn't this been a problem with the XL Fortran
compiler before?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5546 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAnOjRo71jyolDKlIS0Vuq2rTavyByMJks5uUI5WgaJpZM4V9fcF>
.
|
Actually, come to think of it, if we need Meaning: the Fortran linker needs to be able to find IBM: please advise. |
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/7a36c9443b32c20d4c2a135db2e915c8 |
358eca8
to
bb65560
Compare
Thanks to Stefan Teleman for identifying this issue and providing a proof-of-concept patch. We ended up revamping the detection of 128-bit atomics to reduce duplicated code and be a slightly simpler -- albiet perhaps a bit more verbose -- approach: - Remove the --enable-cross-* options; they were confusing and unnecessary. - Always try to compile / link the compiler-intrinsic 128-bit atomic functions. - Strengthen the C tests we use to be more robust. - Use m4 to avoid duplicating the C tests multiple times in the .m4 source. - If not cross-compiling, try to run a short test and ensure that they actually work (as of Aug 2018, there's at least one platform where they don't: clang 6 on ARM64). If cross-compiling, just assume that they work. - Add more comments about what is going on with all the tests; it's tricky stuff. Our Future Selves will thank us. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Whitespace change only; no code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Thanks to George for finding/fixing these. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
bb65560
to
9194dbb
Compare
@jsquyres What was the problem? |
TL;DR: we weren't resetting CFLAGS/LIBS properly in one failure case. More detail: it's still weird that IBM's XL CI environment has FWIW, IBM may still want to report this to their XL dev team. Even though this has turned out to not be an issue for Open MPI, it can still be an issue when you need XLF to link a combined C/Fortran application. |
I still don't think it is needed for fortran. If the C code was linked with -latomic that is sufficient. Fortran can link to it without issue. |
@hjelmn we only have one |
@jsquyres 👍 . I can't approve the PR, github has a legit policy that prevents authors from approving their own patches. |
Before we merge, since there has been sooo many changes on this PR, I'd like to get a final round of confirmations from ARM people that this PR actually works for them. @shamisp @nSircombe Can you confirm that this PR works properly? |
Hi Jeff, I would like to run it by few more developers before we sign off.
Is it okay to wait until Monday COB ?
…On Sat, Aug 25, 2018 at 10:04 Jeff Squyres ***@***.***> wrote:
Before we merge, since there has been sooo many changes on this PR, I'd
like to get a final round of confirmations from ARM people that this PR
actually works for them.
@shamisp <https://github.com/shamisp> @nSircombe
<https://github.com/nSircombe> Can you confirm that this PR works
properly?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5546 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ACIe2Ahr-43A9ZgDIJa1NX939xQ7_D4yks5uUWeRgaJpZM4V9fcF>
.
|
@shamisp No problem. |
Some more digging with the HPC Compiler team clears things up a bit... The equivalent of @bosilca suggestion (#5546 (comment)) is implemented in clang on Armv8, on a pair of 64bit ints, in a loop. So the configure log all makes sense, and I think the configure logic is fine now. There are some issues for us that we may need to look at (Armv8.1, .2... implement this slightly differently, which appears to impact the __atomics. And the lse extension is clearly causing problems). But for this PR things look ok - if @shamisp is happy, then I'm happy too. |
@nSircombe Thanks! |
@jsquyres @nSircombe - I'm okay with existing check. For the record @hjelmn - considering OpenMPI use case I want to implement ompi internal version of 128bit AMOs using LDXRP/STXRP. |
Thanks to @flang-cavium for identifying this issue and providing a proof-of-concept patch and a very accurate description of the issue in #5529.
It turns out we were missing the check for -latomic to get support for 128-bits atomics. Second we need to unset the local variables (_save) to avoid a conflict in OPAL_CHECK_SYNC_BUILTIN_CSWAP_INT128.
This patch might conflict with #5445, they are complementary and should be merged.
Signed-off-by: George Bosilca bosilca@icl.utk.edu