0.3.8 tests returns fatal errors on Skylake with GCC 9.2.0 #2408

Closed
akesandgren opened this issue Feb 11, 2020 · 20 comments

@akesandgren

akesandgren commented Feb 11, 2020

We get multiple fatal errors when running the tests on Skylake systems.
CFLAGS = -O2 -ftree-vectorize -march=native -fno-math-errno

For instance:

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1      0.267033          0.267033    
       2     -0.335365         -0.335365    
       3     -0.308392         -0.308392    
       4      0.347952          0.347952    
       5     -0.476523E-01     -0.476523E-01
       6     -0.200500         -0.200500    
       7      0.276024          0.276024    
       8     -0.416284         -0.416284    
       9      0.419880          0.466533    
      10      0.383916          0.426573    
      11      0.410889          0.456543    
 ******* DGEMV  FAILED ON CALL NUMBER:
   2176: DGEMV ('N', 11,  7, 0.0, A, 12, X, 1, 0.9, Y, 2)         .

 DGBMV  PASSED THE TESTS OF ERROR-EXITS

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1      0.276024          0.276024    
       2     -0.416284         -0.416284    
       3      0.419880          0.419880    
       4      0.872128E-01      0.872128E-01
       5      0.383916          0.383916    
       6      0.168132          0.168132    
       7     -0.344356         -0.344356    
       8     -0.227473         -0.227473    
       9     -0.380320         -0.422577    
      10      0.962038E-01      0.106893    
      11      0.240060          0.266733    
 ******* DGBMV  FAILED ON CALL NUMBER:
   8656: DGBMV ('N', 11,  7,  0,  0, 0.0, A,  2, X, 1, 0.9, Y, 2) .

Any thoughts on that?

@akesandgren
Author

PS, a build on Broadwell with the same config settings works as it should

akesandgren changed the title from "0.3.8 tests returns fatal errors on Skylake" to "0.3.8 tests returns fatal errors on Skylake with GCC 9.2.0" on Feb 11, 2020
@akesandgren
Author

Currently rebuilding without vectorize...

@lexming

lexming commented Feb 11, 2020

I confirm the same error on an Intel Xeon Gold 6126 CPU using GCC 9.2.0.

The build command is make BINARY='64' CC='gcc -ftree-vectorize' FC='gfortran -ftree-vectorize' USE_OPENMP='1' USE_THREAD='1'
Specifically, the compilation option triggering those errors is -ftree-vectorize.

List of failed tests:

  • DGEMV
  • DGBMV
  • DSYMV
  • DSBMV
  • DSPMV
  • cblas_dgemv

@akesandgren
Author

confirmed, removing -ftree-vectorize makes the problem go away.

@akesandgren
Author

Would be nice if there was a proper fix for this.

@martin-frbg
Collaborator

Disabling the dgemv_n microkernel for SkylakeX in kernel/x86_64/dgemv_n_4.c would be my first bet. (Unless 0.3.7 or earlier worked with -ftree-vectorize - that file is unchanged from 0.3.4 or so)

@akesandgren
Author

It didn't work with vectorize before either. I'm just bringing this up again, since we accidentally forgot to turn it off in the first build.

@bartoldeman
Contributor

0.3.7 fails too, but not with GCC 8.3.0, only with GCC 9.2.0.

@Diazonium
Contributor

Similar issues happened before I think, when a newer GCC version started shuffling/clobbering different registers. @wjc404 did a lot of work on the AVX-512 assembly, maybe he has a better idea what goes wrong.

@bartoldeman
Contributor

The test fails with alpha=0, which means it's actually an issue with DSCAL I suspect.
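The numbers in the failure report seem to support this. With alpha=0 the call reduces to y := beta*y, and each wrong entry equals the expected value divided by beta=0.9 (for example 0.419880 / 0.9 ≈ 0.466533), which suggests the last three of the eleven entries were never scaled. A minimal sketch of that check follows; the helper name `count_unscaled` and the tolerance are assumptions made for illustration, and the data is copied from the DGEMV report above.

```c
#include <stddef.h>

/* Values copied from the DGEMV failure report (call 2176, beta = 0.9). */
static const double expected[11] = {
    0.267033, -0.335365, -0.308392, 0.347952, -0.476523E-01, -0.200500,
    0.276024, -0.416284, 0.419880, 0.383916, 0.410889
};
static const double computed[11] = {
    0.267033, -0.335365, -0.308392, 0.347952, -0.476523E-01, -0.200500,
    0.276024, -0.416284, 0.466533, 0.426573, 0.456543
};

static double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

/* Hypothetical helper: counts entries that differ from the expected value
 * but match it once multiplied by beta, i.e. entries that look as if the
 * beta scaling was skipped for them. */
static size_t count_unscaled(const double *want, const double *got,
                             size_t n, double beta, double tol)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        int mismatch = abs_diff(got[i], want[i]) > tol;
        int unscaled = abs_diff(got[i] * beta, want[i]) <= tol;
        if (mismatch && unscaled)
            hits++;
    }
    return hits;
}
```

Calling `count_unscaled(expected, computed, 11, 0.9, 1e-6)` flags the entries at positions 9 to 11 of the report. Eleven elements split into one block of eight plus a remainder of three, so this pattern would be consistent with a problem in how the remainder is handled after an 8-element kernel.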

@bartoldeman
Contributor

Indeed putting the generic
DSCALKERNEL = ../arm/scal.c
in kernel/x86_64/KERNEL.SKYLAKEX
fixes the failures. Now digging deeper.

@martin-frbg
Collaborator

martin-frbg commented Feb 11, 2020

Passes with a snapshot of gcc 10, so possibly a gcc 9 bug? Trying __attribute__(no_tree_vectorize) on the dscal_kernel_8 in dscal_microk_skylakex-2.c now... works.
Now why would the gcc tree vectorizer fall over
https://github.com/xianyi/OpenBLAS/blob/cb6ef49857719b64c5f882e32957f5de2fb1d302/kernel/x86_64/dscal_microk_skylakex-2.c#L35-L44 ?

@bartoldeman
Contributor

noinline does the trick too. What is weird is that the increment is 2, and if I read the code correctly dscal_kernel_8 should not even be invoked for that case.

@akesandgren
Author

dscal_kernel_8 noinline confirmed as a working solution.
I.e.,
static void dscal_kernel_8( BLASLONG n, FLOAT *alpha, FLOAT *x) __attribute__ ((noinline));

Using
static void dscal_kernel_8( BLASLONG n, FLOAT *alpha, FLOAT *x) __attribute__ ((no_tree_vectorize));
does not work.

@martin-frbg
Collaborator

Strange - no_tree_vectorize definitely worked for me, only difference is that I prepended it as
__attribute__((optimize("no-tree-vectorize"))) static void...

@akesandgren
Author

Ok, will check that way then...

@akesandgren
Author

Confirmed, the no-tree-vectorize works:
static void dscal_kernel_8( BLASLONG n, FLOAT *alpha, FLOAT *x) __attribute__ ((optimize("no-tree-vectorize")));
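For reference, the two attribute spellings that worked in this thread can be sketched as below. The functions `scale_noinline` and `scale_novec` and the `NO_TREE_VECTORIZE` macro are hypothetical stand-ins, not the OpenBLAS kernel, and the sketch assumes GCC or a GCC-compatible compiler. As far as I can tell, GCC documents `optimize("no-tree-vectorize")` but no bare `no_tree_vectorize` function attribute, so the latter is probably ignored with a warning, which may explain the differing results reported above.

```c
/* The optimize attribute is GCC-specific; other compilers (including
 * clang) get an empty macro so the sketch still compiles. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_TREE_VECTORIZE
#endif

/* Hypothetical stand-in for dscal_kernel_8: never inlined into callers,
 * so the compiler cannot carry register assumptions across the call. */
__attribute__((noinline))
static void scale_noinline(long n, const double *alpha, double *x)
{
    for (long i = 0; i < n; i++)
        x[i] *= *alpha;
}

/* Same loop, but with the tree vectorizer switched off for this one
 * function only (GCC); the rest of the file keeps its normal flags. */
NO_TREE_VECTORIZE
static void scale_novec(long n, const double *alpha, double *x)
{
    for (long i = 0; i < n; i++)
        x[i] *= *alpha;
}
```

Both attributes only change how the compiler treats the function. They hide a miscompilation or a latent bug rather than fix it, so they are best seen as diagnostic tools for narrowing down where the problem lies.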

@martin-frbg
Collaborator

As another datapoint, compilation with gcc 7.4 also shows no problem, so it looks as if it is only 9.x that miscompiles the dscal kernel.

@bartoldeman
Contributor

The issue is in dscal.c; so far OpenBLAS has been lucky that it hasn't caused problems before. Warning: the tabs in the diff below are corrupted. I'll file a PR:

--- dscal.c.orig        2020-02-12 13:53:48.831716193 -0000
+++ dscal.c     2020-02-12 13:55:28.026247370 -0000
@@ -137,10 +137,10 @@
        "jnz    1b                                          \n\t"
 
         :
-          "+r" (n)      // 0
+          "+r" (n),     // 0
+          "+r" (x),     // 1
+          "+r" (x1)     // 2
         :
-          "r" (x),      // 1
-          "r" (x1),     // 2
           "r" (alpha),  // 3
           "r" (inc_x),  // 4
           "r" (inc_x3)  // 5

@martin-frbg
Collaborator

martin-frbg commented Feb 12, 2020

Ouch. Here we go again... I guess the earlier clobber list should have warned me that it is not just n that needs to be flagged as input/output. The same bug is probably in at least some of the other scal kernels that got the (incomplete) fix from #2010 as well.

martin-frbg added a commit that referenced this issue Feb 12, 2020
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
marxin pushed a commit to marxin/OpenBLAS that referenced this issue Feb 17, 2020


The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
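
The rule described in this commit message can be illustrated with a much smaller loop. The following is a sketch, not the OpenBLAS kernel: `scale_strided` is a hypothetical function, it assumes x86-64 with GCC-style inline asm, and it falls back to plain C elsewhere. Because the asm advances the pointer register itself, `x` is declared as a read-write operand. The sketch uses `"+&r"` rather than the plain `"+r"` from the diff; the extra `&` (early clobber) is my own precaution to keep input operands out of those registers.

```c
#include <stddef.h>

/* Hypothetical example: multiply n doubles, spaced inc elements apart,
 * by alpha. Not taken from OpenBLAS. */
static void scale_strided(long n, double alpha, double *x, long inc)
{
#if defined(__x86_64__) && defined(__GNUC__)
    long stride_bytes = inc * (long)sizeof(double);
    if (n <= 0)
        return;                 /* the asm loop always runs at least once */
    __asm__ __volatile__(
        "1:                      \n\t"
        "movsd  (%1), %%xmm0     \n\t"   /* load *x                     */
        "mulsd  %2, %%xmm0       \n\t"   /* multiply by alpha           */
        "movsd  %%xmm0, (%1)     \n\t"   /* store back                  */
        "addq   %3, %1           \n\t"   /* the asm advances x itself   */
        "decq   %0               \n\t"   /* ... and counts n down       */
        "jnz    1b               \n\t"
        : "+&r"(n),             /* 0: modified, so read-write */
          "+&r"(x)              /* 1: modified, so read-write */
        : "x"(alpha),           /* 2: read only */
          "r"(stride_bytes)     /* 3: read only */
        : "cc", "memory", "xmm0");
#else
    /* Portable fallback for other targets or compilers. */
    for (long i = 0; i < n; i++)
        x[i * inc] *= alpha;
#endif
}
```

If `x` were listed as a plain `"r"` input here, the compiler would be entitled to assume the register still held the original pointer after the asm statement. Any later code that reused that register, for example a remainder loop after inlining, could then read or write the wrong addresses.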