-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make QMC_SIMD_ALIGNMENT configured via CMake #2981
Conversation
a10a66a
to
bd2e49e
Compare
75dc37f
to
0ad4e20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Please change QMC_CLINE to a self-describing and longer name. e.g. QMC_SIMD_ALIGNMENT. What CLINE means is not obvious.
-
Rename CheckSIMDAlignement.cmake ->CheckSIMDAlignment.cmake
CMakeLists.txt
Outdated
# Check SIMD alignment for CPU only | ||
# This is intentionally placed before adding OpenMP offload compile options | ||
# to avoid contamination from device compilation pass | ||
#--------------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an example of the potential problem, as detailed in your PR text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ye!
@prckent This should make the nightly test error on sulfur go away without need of passing "-march=skylake" as a workaround. |
Nightly scripts have been updated. |
Proposed changes
Renamed QMC_CLINE to QMC_SIMD_ALIGNMENT.
Set QMC_SIMD_ALIGNMENT (SIMD vector length in bytes, not necessarily cache line size) explicitly via CMake rather than depending on compiler defined macros.
The default value is determined by check
__AVX512F__
macro. If found, set 64 otherwise 32.Basically the compiler macro logic is moved to CMake and the value is configurable but checked in CMake.
Background:
The old code was fine when targeting only one architecture (CPU).
When OpenMP offload is used, the source code is compiled for two targets CPU and GPU.
The compiler transformed offload kernel carries C++ mangled function name foo with QMC_CLINE being part of it. For this reason, the caller (CPU pass) and the callee(GPU pass) doesn't match and kernels failed being loaded on GPU. https://bugs.llvm.org/show_bug.cgi?id=45401
What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
Bora, Summit, Tulip, Theta
Checklist