GNU Assembler bug workaround #196

lgarrison · 2019-10-03T14:59:38Z

This fixes #193 by querying the compiler to see if it is using a known-bad version of the GNU Assembler (GAS) under the hood. If so, AVX-512 is disabled. This works for gcc and icc (both of which use GAS under the hood), and clang if it is using GAS instead of its integrated assembler.

If AVX-512 is disabled, a warning will be printed during compilation. I think the pip installation will still be silent though, which may not be what we want.

I also had to change icc -xHost to icc -march=native. It doesn't seem possible to disable AVX-512F (with -mno-avx512f, for example) with the former, only the latter. I don't think it will make a big difference.

There was also a bug in the icc compiler options: we were using -xHost -xCORE-AVX512 indiscriminately, which will always generate AVX-512 instructions, even on non-AVX-512 machines!

I also changed the compilation output coloring to use tput instead of echo and escape sequences. On Ubuntu, I was getting tons of -es in my output because sh is actually dash, and echo in dash doesn't have a -e flag (it's enabled by default). tput seemed like a more robust solution than trying to detect dash, but let me know what you think.

…AVX-512 bug. Needs testing on other platforms. Fixes #193.

…shell compatibility

manodeep · 2019-10-06T02:30:57Z

I tested on ozstar, with binutils/2.30. While the fix worked with gcc (disabled AVX512F), it did not work for icc.

I had added an #error AVX512F should not be enabled line here to make sure that the compilation fails, if AVX512F was enabled. Here is the error generated:

icc -DDOUBLE_PREC  -DVERSION=\"2.3.1\" -DUSE_UNICODE -std=c99 -m64 -g -Wsign-compare -Wall -Wextra -Wshadow -Wunused -fPIC -D_POSIX_SOURCE=200809L -D_GNU_SOURCE -D_DARWIN_C_SOURCE -O3  -march=native -qopenmp -mno-avx512f -I../../io -I../../utils  -c countpairs_impl_double.c -o countpairs_impl_double.o
icc: command line warning #10159: invalid argument for option '-m'
In file included from countpairs_impl_double.c(21):
countpairs_kernels_double.c(30): error: #error directive: AVX512F should not be enabled
  #error AVX512F should not be enabled
   ^

The option to disable AVX512F with icc is different, but I have not yet been able to figure out what that is. My attempts with -mno-COMMON-AVX512 etc were fruitless. Here is how I was trying it:

[~/temp/test_corrfunc_gas_bug/Corrfunc @farnarkle2] icc -O3 -std=c99  -dM -E -xc -c /dev/null | grep -i avx
[~/temp/test_corrfunc_gas_bug/Corrfunc @farnarkle2] icc -march=native -O3 -std=c99  -dM -E -xc -c /dev/null | grep -i avx
#define __AVX__ 1
#define __AVX_I__ 1
#define __AVX2__ 1
#define __AVX512F__ 1
#define __AVX512CD__ 1
#define __AVX512DQ__ 1
#define __AVX512BW__ 1
#define __AVX512VL__ 1

So the -march=native is definitely affecting the compiler flags. Without the -march=native, the AVX* flags are not enabled.

I do have another suggestion - will you please replace the magenta colour for the text for the message "Disabling AVX512" with red. This is an important enough message that users should pay attention.

lgarrison · 2019-10-06T03:00:27Z

Hmm, that's too bad about icc. It works for me on icc 19, what version are you running? But we'll have to think of another solution that works for earlier icc too.

manodeep · 2019-10-06T08:54:06Z

@lgarrison I was using icc 18.0.1. Intel/2016 (the other version available on ozstar), does not support AVX512 - so the fix is not triggered.

manodeep · 2019-10-06T10:47:33Z

One solution could be to capture the output of $(CC) -march=native -dM -E -xc -c /dev/null and checking what instruction sets are supported. We can then filter-out the AVX512 flags when GAS 2.30/2.31 is detected

lgarrison · 2019-10-06T14:39:00Z

Can you try this version? -xCORE-AVX2 seems to override -march=native, even on older icc.

manodeep · 2019-10-07T11:07:28Z

Yup - this works for me with intel 2018. Okay to go ahead and "squash + merge"?

lgarrison · 2019-10-07T12:06:54Z

Yes!

lgarrison · 2021-05-10T19:57:49Z

Just adding a note to document that @SandyYuan and I were able to get AVX-512 working on a Linux system that exhibited this bug by using conda-forge compilers. The steps were:

$ conda create -n forge -c conda-forge python=3
$ conda activate forge
$ conda install -c conda-forge gcc_linux-64 binutils_linux-64 gsl
$ git clone https://github.com/manodeep/Corrfunc.git
$ cd Corrfunc/
$ # edit common.mk with CC := x86_64-conda_cos6-linux-gnu-gcc
$ pip install -e ./

In theory, mainline conda also has binutils>=2.32, but that wasn't working for us. And there's probably a syntax to pass CC to pip without cloning the repo to edit common.mk, but we didn't test that.

manodeep · 2021-05-10T23:47:05Z

The (in)ability to pass the compiler at the command-line when installing with pip has always bothered me. I have now figured it out -- the solution is to use python -m pip install --verbose -e . --install-option="CC=/path/to/custom/compiler"

Relevant SO link

lgarrison · 2021-05-12T16:13:50Z

Forgot to mention maybe the most pernicious bug: one should install GSL from conda when using conda's compilers, because if it picks up the system GSL, it will add -I/usr/include to the CFLAGS, which will intercept non-GSL #includes, loading a set of headers incompatible with conda's compilers.

manodeep · 2021-05-12T23:47:38Z

Right - of course! You have to be entirely within the conda ecosystem

valerio-marra · 2021-12-22T18:10:39Z

conda install -c conda-forge gcc_linux-64 binutils_linux-64 gsl

hi @lgarrison, this solution worked for me. In case it is interesting, i report the improvement in performance:

Node details:

4x12 Core Skylake Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
512 GB DDR3 1333 MHz

AVX

Running corrfunc with 20000000 particles on 48 threads
DDtheta_mocks took 519.1 seconds
Running corrfunc with 20000000 particles on 96 threads
DDtheta_mocks took 409.5 seconds

AVX512

Running corrfunc with 20000000 particles on 48 threads
DDtheta_mocks took 521.4 seconds
Running corrfunc with 20000000 particles on 96 threads
DDtheta_mocks took 332.8 seconds

The odd thing is how much performance with AVX512 decreases when 48 threads are used.

lgarrison · 2021-12-22T18:22:36Z

Thanks for reporting back! I'm glad you did get a speedup after all. Definitely curious that you don't see a speedup until you use hyperthreading. Maybe the floating-point throughput is not the bottleneck initially, until you use enough threads to mask whatever memory/cache/backend latency is occurring.

You may also be interested in seeing if PR #258 give you extra performance, if your use-case allows for evenly-spaced bins.

lgarrison added 4 commits October 2, 2019 19:30

Disable AVX-512 if using a version of the GNU Assembler with a known …

58e8c74

…AVX-512 bug. Needs testing on other platforms. Fixes #193.

Change escape sequences to use tput instead of echo for better cross-…

a789cd4

…shell compatibility

Fix GNU assembler version detection

87c70e7

Update GAS version detection to query runtime compiler

4817b83

lgarrison requested a review from manodeep October 3, 2019 14:59

lgarrison added this to the v2.3.2 milestone Oct 3, 2019

lgarrison added 2 commits October 3, 2019 15:01

Update changelog [ci skip]

7791b61

Fix grep for osx

adfb79f

Update message color [ci skip]

3f2ed46

Use -xCORE-AVX2 to disable AVX-512 for old Intel compilers

8bfcdeb

Fix sense of icc version test

7b3ef2d

manodeep merged commit 1b0038f into master Oct 7, 2019

manodeep deleted the gas_bug_workaround branch October 11, 2019 17:56

lgarrison mentioned this pull request Dec 21, 2021

script hangs when using unbuffered output #269

Closed

lgarrison mentioned this pull request Aug 12, 2022

Update GNU assembler bug detection #278

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GNU Assembler bug workaround #196

GNU Assembler bug workaround #196

lgarrison commented Oct 3, 2019

manodeep commented Oct 6, 2019

lgarrison commented Oct 6, 2019

manodeep commented Oct 6, 2019

manodeep commented Oct 6, 2019

lgarrison commented Oct 6, 2019

manodeep commented Oct 7, 2019

lgarrison commented Oct 7, 2019

lgarrison commented May 10, 2021 •

edited

Loading

manodeep commented May 10, 2021 •

edited

Loading

lgarrison commented May 12, 2021

manodeep commented May 12, 2021

valerio-marra commented Dec 22, 2021

lgarrison commented Dec 22, 2021

GNU Assembler bug workaround #196

GNU Assembler bug workaround #196

Conversation

lgarrison commented Oct 3, 2019

manodeep commented Oct 6, 2019

lgarrison commented Oct 6, 2019

manodeep commented Oct 6, 2019

manodeep commented Oct 6, 2019

lgarrison commented Oct 6, 2019

manodeep commented Oct 7, 2019

lgarrison commented Oct 7, 2019

lgarrison commented May 10, 2021 • edited Loading

manodeep commented May 10, 2021 • edited Loading

lgarrison commented May 12, 2021

manodeep commented May 12, 2021

valerio-marra commented Dec 22, 2021

lgarrison commented Dec 22, 2021

lgarrison commented May 10, 2021 •

edited

Loading

manodeep commented May 10, 2021 •

edited

Loading