Remove unconditional NO_AVX512=1 for flang builds #4789

Open
h-vetinari opened this issue Jul 8, 2024 · 13 comments

Comments

@h-vetinari
Contributor

I want to figure out what causes the test errors with flang when building with the default options on a platform that supports AVX512. This problem was already observed in #4016 (CC @mmuetzel), leading to work-arounds like the following:

# Compiling with Flang 16 seems to cause test errors on machines
# with AVX512 instructions. Revisit after MSYS2 distributes Flang 17.
no-avx512-flags: -DNO_AVX512=1

However, these problems still occur in conda-forge with the in-progress flang 19, almost 3 releases later (cf. #4768); more precisely, the errors are:

The following tests FAILED:
	  5 - sblas3 (Failed)
	  8 - dblas3 (Failed)
	 11 - cblas3 (Failed)
	 15 - zblas3 (Failed)

which matches what happened in #4016. There are some more detailed failure logs in that PR that I haven't yet tried to reproduce.
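For tracking which tests fail across CI runs, the summary above can be extracted mechanically. The following is a minimal sketch, assuming the log contains lines in the same `N - name (Failed)` shape as the excerpt; the function name `failed_tests` is made up for illustration and is not part of CTest or OpenBLAS.

```python
import re

# Matches summary lines such as "\t  5 - sblas3 (Failed)".
_FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(Failed\)", re.MULTILINE)


def failed_tests(ctest_output: str) -> list[tuple[int, str]]:
    """Return (test number, test name) pairs found in a CTest failure summary."""
    return [(int(num), name) for num, name in _FAILED_LINE.findall(ctest_output)]


# Sample text copied from the failure summary quoted in this issue.
SAMPLE = """The following tests FAILED:
\t  5 - sblas3 (Failed)
\t  8 - dblas3 (Failed)
\t 11 - cblas3 (Failed)
\t 15 - zblas3 (Failed)
"""

if __name__ == "__main__":
    for number, name in failed_tests(SAMPLE):
        print(number, name)
```

All four names end in `blas3`, i.e. the level-3 BLAS tests for each of the four precisions.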

Before raising an upstream bug report, I would first like to properly understand what's happening in OpenBLAS itself, because for now I haven't been able to construct the link between NO_AVX512 and any Fortran code.

Running on azure pipelines, we're getting skylakex agents regularly, which have some AVX512 instructions and thus fall into the above failures (there are still some non-AVX512 agents around; when I caught one, the tests passed). As hoped, adding NO_AVX512=1 does in fact cause the tests to pass, with the following difference in configuration:

 Running getarch
 GETARCH results:
-CORE=SKYLAKEX
-LIBCORE=skylakex
+CORE=HASWELL
+LIBCORE=haswell
 NUM_CORES=2
 HAVE_MMX=1
 HAVE_SSE=1
 HAVE_SSE2=1
 HAVE_SSE3=1
 HAVE_SSSE3=1
 HAVE_SSE4_1=1
 HAVE_SSE4_2=1
 HAVE_AVX=1
 HAVE_AVX2=1
-HAVE_AVX512VL=1
 HAVE_FMA3=1
 MAKEFLAGS += -j 2
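To compare configurations between CI runs without eyeballing the diff, the `KEY=VALUE` lines of the getarch output can be parsed and compared. This is a rough sketch under the assumption that the output keeps the shape shown above; `parse_getarch` and `changed_keys` are hypothetical helper names, and the two sample strings are abbreviated from the diff in this comment.

```python
def parse_getarch(text: str) -> dict[str, str]:
    """Collect KEY=VALUE lines; lines such as 'MAKEFLAGS += -j 2' are skipped."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        # A real key has no spaces; this filters out 'MAKEFLAGS +' and prose lines.
        if sep and key and " " not in key:
            result[key] = value
    return result


def changed_keys(before: dict[str, str], after: dict[str, str]) -> dict:
    """Map each differing key to its (before, after) values; None means absent."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


# Abbreviated from the configuration diff above.
DEFAULT_BUILD = """CORE=SKYLAKEX
LIBCORE=skylakex
HAVE_AVX2=1
HAVE_AVX512VL=1
HAVE_FMA3=1
MAKEFLAGS += -j 2
"""

NO_AVX512_BUILD = """CORE=HASWELL
LIBCORE=haswell
HAVE_AVX2=1
HAVE_FMA3=1
MAKEFLAGS += -j 2
"""

if __name__ == "__main__":
    diff = changed_keys(parse_getarch(DEFAULT_BUILD), parse_getarch(NO_AVX512_BUILD))
    for key, (old, new) in diff.items():
        print(f"{key}: {old} -> {new}")
```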

The macro HAVE_AVX512VL rarely appears outside of the config setup; AFAICT, basically the only usage is:

// distribute
#if defined(HAVE_AVX512VL) || defined(HAVE_AVX512BF16)
#include "intrin_avx512.h"

What I don't understand is how intrin_avx512.h influences any Fortran code.

@h-vetinari
Contributor Author

> Before raising an upstream bug report, I would first like to properly understand what's happening in OpenBLAS itself, because for now I haven't been able to construct the link between NO_AVX512 and any Fortran code.

To repeat, I'm aware that this most likely needs upstream fixes, which I want to help bring about. Still, this issue makes sense from my POV as figuring out the OpenBLAS side of things, as well as tracking the removal of those work-arounds.

@martin-frbg
Collaborator

This is probably not related to the use of AVX512 intrinsics enabled by the macro, but to the -march=skylake-avx512 option carrying over from CCOMMON_OPT to FCOMMON_OPT when the TARGET is SKYLAKEX.

@martin-frbg
Collaborator

(although the skylake-avx512 option should get filtered out since 52b71a1 (March 22 of this year), as it is no longer supported by more recent versions of flang-new)

@h-vetinari
Contributor Author

> as it is no longer supported by more recent versions of flang-new

It wasn't supported before either; AFAIU it's just that flang 18 started erroring on unknown flags. In any case, the builds in conda-forge/openblas-feedstock#115 are based on 0.3.27, so they contain 52b71a1.

I genuinely don't know if it's a bug/misconfiguration in OpenBLAS, or a compiler error in flang. In any case, if we can figure out (or someone can explain to me) how NO_AVX512 affects any Fortran targets, I can raise an upstream bug to involve the flang folks.

@martin-frbg
Collaborator

I must admit I do not see how NO_AVX512 can influence fortran targets - unless this is some weird register usage problem in their Fortran/C interoperability.
Note that we already build the GEMM kernels for AVX512-capable targets with the -fexhaustive-register-search option to clang to avoid a build failure ("inline assembly requires more registers than are available"); perhaps this changes register usage in ways not expected by the flang-new ABI (on Windows?). IIRC the errors observed in #4016 were suggestive of argument passing going wrong: the Fortran code was seeing GEMM results where the input arguments were supposedly trashed, while the equivalent CBLAS tests completed successfully.
Interestingly, Azure CI now always gives me AMD EPYC systems without AVX512 when I try to reproduce the old problem...
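Since reproducing the failure depends on which CPU the CI agent happens to have, it can help to log the AVX512 capability at the start of a job. The following is a minimal sketch that assumes a Linux-style `/proc/cpuinfo` text with a `flags` line (native Windows Python has no such file, so there the text would have to come from elsewhere); `avx512_flags` is a hypothetical helper name.

```python
from pathlib import Path


def avx512_flags(cpuinfo_text: str) -> set[str]:
    """Return the avx512* feature flags listed on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return {flag for flag in value.split() if flag.startswith("avx512")}
    return set()


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux-style procfs is available
    if cpuinfo.exists():
        found = avx512_flags(cpuinfo.read_text())
        print(sorted(found) if found else "no AVX512 flags reported")
    else:
        print("/proc/cpuinfo not available on this system")
```

An empty result would correspond to the agents on which the tests passed without NO_AVX512=1.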

@martin-frbg
Collaborator

reconfiguring my SkylakeX system for testing locally.

@h-vetinari
Contributor Author

Not sure if this'll play any role in this, but flang just gained support for -mtune: llvm/llvm-project@f1d3fe7

@martin-frbg
Collaborator

May be useful for future performance (or to introduce more FMA-related deviations in the LAPACK test results...). My local test hit an unexpected problem: some "#include"s of the actual sources in the cmake-generated files cannot be resolved by make, although the exact same absolute paths work for browsing the affected files. Can't remember getting this before with MSYS...

@martin-frbg
Collaborator

include problem solved (path apparently too long) but flang 18.1.8 fails when compiling LAPACK's slamch (which does divisions by huge and near-zero numbers to determine machine constants). need to check if this was already reported/fixed
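For readers unfamiliar with this kind of routine, the sort of arithmetic involved can be sketched as follows. This is only an illustration in double precision (Python floats), loosely modelled on how machine constants such as the relative precision and the safe minimum are usually derived; it is not the LAPACK source, and it does not reproduce the flang failure. The function name `machine_constants` is made up.

```python
import sys


def machine_constants() -> dict[str, float]:
    """Derive a few machine constants for IEEE double precision."""
    info = sys.float_info
    eps = info.epsilon * 0.5   # relative precision, assuming round-to-nearest
    sfmin = info.min           # smallest positive normalized number
    small = 1.0 / info.max     # reciprocal of the largest finite number
    if small >= sfmin:
        # Nudge sfmin up so that 1/sfmin cannot overflow.
        sfmin = small * (1.0 + eps)
    return {"eps": eps, "sfmin": sfmin, "huge": info.max}


if __name__ == "__main__":
    for name, value in machine_constants().items():
        print(f"{name} = {value!r}")
```

The division `1.0 / info.max` is the kind of operation on extreme values mentioned above; a compiler that folds such constants at build time has to handle them without overflow or underflow.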

@h-vetinari
Contributor Author

If you're able to use conda-forge compilers, you could install

conda install -c conda-forge/label/llvm_dev flang=19

which should give you a flang built off of llvm/llvm-project@3bb2563, so ~2 weeks old. LLVM 19 branches in about a week; I plan to have rc1 built soon after.

@martin-frbg
Collaborator

thank you. unfortunately I had a few problems with my miniconda installation - finally got a build that reproduces the blas3 test failures but have not found yet what causes them

@martin-frbg
Collaborator

martin-frbg commented Oct 21, 2024

Hmm, I'm trying to revisit this now with the current 19.1 binaries from conda-forge (conda install -c conda-forge flang=19, and cmake from there as well) in a VM based on Microsoft's Windows 11 for Developers (WinDev2407Eval) with VS2022. Strangely, I cannot even get cmake to recognize flang-new (cmake invokes it with -cc1 instead of -fc1, which fails; later on devenv throws a usage error as well). Have I somehow forgotten how to do things, is there a known incompatibility with VS2022/cmake-3.30.5, or is there something special about the conda-forge LLVM 19 package that I am not aware of?

NVM, I'm just too tired, forgot to add -G Ninja - and the "Anaconda Powershell" appears to be incompatible with MSVC's vcvarsall.bat so I had no mt.exe as well.

@martin-frbg
Collaborator

So the conda-forge LLVM 19.1 (used in conjunction with VS2022) appears to work correctly even with AVX512 enabled.

I am currently experiencing two problems with this setup though. With BUILD_STATIC_LIBS, all tests using CSCAL or ZSCAL fail to link due to an unresolved symbol __imp__fdclass (that is probably related to the use of isinf and isnan in these two kernels).
With BUILD_SHARED_LIBS, all BLAS tests for error exits fail while the numerical ones pass; this is probably caused by picking up the wrong version of XERBLA.
