-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GNU Assembler bug workaround #196
Conversation
…AVX-512 bug. Needs testing on other platforms. Fixes #193.
…shell compatibility
I tested on I had added an
The option to disable AVX512F with icc is different, but I have not yet been able to figure out what that is. My attempts with
So the I do have another suggestion - will you please replace the magenta colour for the text for the message "Disabling AVX512" with red. This is an important enough message that users should pay attention. |
Hmm, that's too bad about |
@lgarrison I was using |
One solution could be to capture the output of |
Can you try this version? |
Yup - this works for me with intel 2018. Okay to go ahead and "squash + merge"? |
Yes! |
Just adding a note to document that @SandyYuan and I were able to get AVX-512 working on a Linux system that exhibited this bug by using conda-forge compilers. The steps were: $ conda create -n forge -c conda-forge python=3
$ conda activate forge
$ conda install -c conda-forge gcc_linux-64 binutils_linux-64 gsl
$ git clone https://github.com/manodeep/Corrfunc.git
$ cd Corrfunc/
$ # edit common.mk with CC := x86_64-conda_cos6-linux-gnu-gcc
$ pip install -e ./ In theory, mainline conda also has binutils>=2.32, but that wasn't working for us. And there's probably a syntax to pass |
The (in)ability to pass the compiler at the command-line when installing with |
Forgot to mention maybe the most pernicious bug: one should install GSL from conda when using conda's compilers, because if it picks up the system GSL, it will add |
Right - of course! You have to be entirely within the |
hi @lgarrison, this solution worked for me. In case it is interesting, i report the improvement in performance: Node details:
AVX
AVX512
The odd thing is how much performance with AVX512 decreases when 48 threads are used. |
Thanks for reporting back! I'm glad you did get a speedup after all. Definitely curious that you don't see a speedup until you use hyperthreading. Maybe the floating-point throughput is not the bottleneck initially, until you use enough threads to mask whatever memory/cache/backend latency is occurring. You may also be interested in seeing if PR #258 give you extra performance, if your use-case allows for evenly-spaced bins. |
This fixes #193 by querying the compiler to see if it is using a known-bad version of the GNU Assembler (GAS) under the hood. If so, AVX-512 is disabled. This works for
gcc
andicc
(both of which use GAS under the hood), andclang
if it is using GAS instead of its integrated assembler.If AVX-512 is disabled, a warning will be printed during compilation. I think the pip installation will still be silent though, which may not be what we want.
I also had to change
icc -xHost
toicc -march=native
. It doesn't seem possible to disable AVX-512F (with-mno-avx512f
, for example) with the former, only the latter. I don't think it will make a big difference.There was also a bug in the
icc
compiler options: we were using-xHost -xCORE-AVX512
indiscriminately, which will always generate AVX-512 instructions, even on non-AVX-512 machines!I also changed the compilation output coloring to use
tput
instead ofecho
and escape sequences. On Ubuntu, I was getting tons of-e
s in my output becausesh
is actuallydash
, andecho
indash
doesn't have a-e
flag (it's enabled by default).tput
seemed like a more robust solution than trying to detectdash
, but let me know what you think.