
include -ftree-vectorize and -fno-math-errno in default compiler optimisation flags for GCC #2388

Merged
13 commits merged on Mar 31, 2018

Conversation

bartoldeman
Contributor

Since Intel auto-vectorizes at -O2 but not at -O1, and GCC
auto-vectorizes at -O3 but not at -O2, this makes the two compilers consistent.

damianam
damianam previously approved these changes Jan 31, 2018
Member

@damianam left a comment

LGTM, and a sensible thing to do. But I won't merge it yet; I'd like to check with Kenneth first, since this has the potential to break some builds (in theory, at least).

@damianam
Member

Oh, it breaks here: https://github.com/easybuilders/easybuild-framework/pull/2388/files#diff-3177e89e1afa2a8305974f5995c31c29R284

Simply because it checks that -v is not present in -O2 -ftree-vectorize -march=native, which is of course now false, because -v is a substring of -ftree-vectorize. We need to change this test as well.
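The failing check is a plain substring test, which a one-liner reproduces (a minimal sketch; the actual test lives in the framework's toolchain test suite):

```python
flags = "-O2 -ftree-vectorize -march=native"
# "-v" is found as a substring of "-ftree-vectorize", so a naive
# 'assert "-v" not in flags' style check now fails
print("-v" in flags)  # True
```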

Since Intel auto-vectorizes at -O2 but not at -O1, and GCC
auto-vectorizes at -O3 but not at -O2, this makes it consistent.
This avoids "-v" being matched inside "-ftree-vectorize".
Since -ftree-vectorize is now present by default, it confuses the test case.
@easybuilders easybuilders deleted a comment from boegelbot Feb 9, 2018
@easybuilders easybuilders deleted a comment from boegelbot Feb 9, 2018
@boegel
Member

boegel commented Feb 13, 2018

@bartoldeman Do you have any idea of the potential negative impact this may have, both in terms of performance and floating-point accuracy of the resulting binaries (or maybe even build problems)?

@boegel boegel added this to the 3.x milestone Feb 13, 2018
@boegel boegel added the change label Feb 13, 2018
@bartoldeman
Contributor Author

I've applied this at Compute Canada. So far we have seen no issues, but I can monitor it a bit.
Alternatively, this could perhaps be a config setting?

@victorusu
Contributor

victorusu commented Feb 16, 2018

The n-body Fortran example implemented here is a good test case for this flag. It can help improve performance, at least for well-written Fortran code.

The code is written in such a way that the compiler can auto-vectorize the double loop.

    m = 1
    do i = 1, nb
       do j = i + 1, nb
          v(:,i) = v(:,i) - r(:,m) * mass(j) * mag(m)
          v(:,j) = v(:,j) + r(:,m) * mass(i) * mag(m)
          m = m + 1
       end do
    end do

Results

Running this code on my laptop with an EasyBuild-built GCC 7.2.0, I get:

Using the current default EasyBuild flags (-O2 -march=native)

gfortran -O2 -march=native main.f90 -o main-f90.x
time ./main-f90.x 50000000
-0.169075164
-0.169059907

real	0m5.624s
user	0m5.561s
sys	0m0.017s

Adding -ftree-vectorize to the current default EasyBuild flags (-O2 -march=native -ftree-vectorize)

gfortran -O2 -march=native -ftree-vectorize main.f90 -o main-f90.x
time ./main-f90.x 50000000
-0.169075164
-0.169059907

real	0m4.265s
user	0m4.225s
sys	0m0.013s

Compiling with -O3 level (-O3 -march=native)

gfortran -O3 -march=native main.f90 -o main-f90.x
time ./main-f90.x 50000000
-0.169075164
-0.169059907

real	0m4.294s
user	0m4.256s
sys	0m0.011s

gcc details

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/apps/macos/software/Core/GCCcore/7.2.0/libexec/gcc/x86_64-apple-darwin17.4.0/7.2.0/lto-wrapper
Target: x86_64-apple-darwin17.4.0
Configured with: ../configure --enable-languages=c,c++,fortran --enable-checking=release --disable-multilib --enable-shared=yes --enable-static=yes --enable-threads=posix --enable-gold=default --enable-plugins --enable-ld --with-plugin-ld=ld.gold --enable-bootstrap --prefix=/opt/apps/macos/software/Core/GCCcore/7.2.0 --with-local-prefix=/opt/apps/macos/software/Core/GCCcore/7.2.0
Thread model: posix
gcc version 7.2.0 (GCC) 

@bartoldeman
Contributor Author

@boegel GCC is conservative by default (vs. Intel, which is loose by default, although EB's -fp-model source makes it much more precise).

As far as I understand, -ftree-vectorize without -ffast-math (or one of its sub-options: -funsafe-math-optimizations, -fassociative-math, etc.) will not cause any changes in FP results; that is, vectorized sum reductions are not enabled.

@Helios07

Helios07 commented Mar 1, 2018

I made some comparisons with a code that vectorizes quite well (Newtonian interaction of a few thousand particles). I tested on a dual-socket Xeon Skylake 6140 machine (36 cores, AVX2). Intel 2018a and GNU 6.4 were used.

icpc, -O3 -xHost: 10.6761s
icpc, -O3: 24.8094s
g++: -O2 -march=native: 89.756s
g++: -O2 -march=native -ftree-vectorize: 88.8382s
g++: -O3 -march=native -ftree-vectorize: 88.2167s
g++: -Ofast -march=native -ftree-vectorize: 13.4951s

It seems like some other flag is needed for the vectorization.

@victorusu
Contributor

According to the Auto-vectorization in GCC guide,

Vectorization is enabled by the flag -ftree-vectorize and by default at -O3.

@Helios07

Helios07 commented Mar 2, 2018

You are right, vectorization is enabled with -O3. But in this case, it does not explain the difference between the times measured with GNU and Intel. GNU seems to be more precise, but slower, unless the flag -ffast-math is used.
I repeated my measurement with slightly bigger parameters (they are not comparable to those above, but hopefully a bit more reliable) and added -ffast-math explicitly in one measurement:

icpc, -O3 -xHost: 21.4103s
icpc, -O3: 36.4158s
g++: -O2 -march=native: 119.651s
g++: -O2 -march=native -ftree-vectorize: 117.14s
g++: -O3 -march=native -ftree-vectorize: 119.059s
g++: -Ofast -march=native -ftree-vectorize: 23.0479s
g++: -O3 -march=native -ftree-vectorize -ffast-math: 25.0692s
g++: -O2 -march=native -ffast-math: 24.8151s

@ocaisa
Member

ocaisa commented Mar 2, 2018

@Helios07 What you've done is helpful and makes a lot of sense: if you're not allowed to reorder floating-point operations, then vectorisation is going to be difficult.

From GCC (7.2):

-ffast-math
  Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.

Do people think it is worthwhile cherry-picking from this set to be more conservative? Is -fassociative-math enough?

@bartoldeman
Contributor Author

bartoldeman commented Mar 2, 2018

@Helios07 that is a good case for the 'loose' and 'veryloose' precision settings.
Please try this too (the EasyBuild default for icpc):
icpc -O2 -ftz -fp-speculation=safe -fp-model source
(with 'loose' you get -fp-model fast=1, which is actually the default for icpc if you don't give it any options, and 'veryloose' gives -fp-model fast=2)

I think the GCC 'loose' and 'veryloose' settings should get some or all of the -ffast-math options, and not just -mrecip -mno-ieee-fp, but that needs some further testing.

What do you get with:
g++ -O2 -ftree-vectorize -mrecip -mno-ieee-fp
(present 'loose') and
g++ -O2 -ftree-vectorize -mrecip=all -mno-ieee-fp
(present 'veryloose')

Note that -fassociative-math allows the compiler to change the FP evaluation of a+b+c from (a+b)+c to a+(b+c), and the more aggressive options will even disregard parentheses put there by programmers (i.e. a+(b+c) => (a+b)+c), at least in Fortran (not sure about C/C++).
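This reassociation is not value-preserving in IEEE 754 arithmetic, which is why -fassociative-math can change results. A quick double-precision check (Python floats are IEEE 754 doubles, the same width the compilers operate on here) illustrates the point:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

# Reassociating the sum changes the rounded result, which is why a compiler
# honouring IEEE semantics will not reorder reductions without being told to.
print(left == right)  # False
```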

@Helios07

Helios07 commented Mar 2, 2018

Here you are:

icpc -O2 -ftz -fp-speculation=safe -fp-model source: 30.9678s
icpc -O2 -ftz -fp-speculation=safe -fp-model fast=1: 36.5549s
icpc -O2 -ftz -fp-speculation=safe -fp-model fast=2: 37.3925s
g++ -O2 -ftree-vectorize -mrecip -mno-ieee-fp: 118.231s
g++ -O2 -ftree-vectorize -mrecip=all -mno-ieee-fp: 118.291s

If you want to try this by yourself, I am using a slightly modified version of this code:
https://github.com/ryancoleman/lotsofcoresbook1code/tree/master/Pearls1_Chapter10/Pearl

@bartoldeman
Contributor Author

Thanks. However, I forgot a few things; to compare properly we need this:

icpc -xHost -O2 -ftz -fp-speculation=safe -fp-model source
icpc -xHost -O2 -fp-model fast=1
icpc -xHost -O2 -fp-model fast=2
g++ -march=native -O2 -ftree-vectorize -mrecip -mno-ieee-fp
g++ -march=native -O2 -ftree-vectorize -mrecip=all -mno-ieee-fp

@Helios07

Helios07 commented Mar 4, 2018

No problem.
icpc -xHost -O2 -ftz -fp-speculation=safe -fp-model source: 21.5741s
icpc -xHost -O2 -fp-model fast=1: 22.2876s
icpc -xHost -O2 -fp-model fast=2: 21.3196s
g++ -march=native -O2 -ftree-vectorize -mrecip -mno-ieee-fp: 117.894s
g++ -march=native -O2 -ftree-vectorize -mrecip=all -mno-ieee-fp: 118.806s

@Helios07

Helios07 commented Mar 6, 2018

@ocaisa I tried some cherry picking:

g++ -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno -funsafe-math-optimizations -ffinite-math-only -fno-rounding-math -fno-signaling-nans -fcx-limited-range (=-ffast-math according to gcc 6.4 which I used): 24.3423s
g++ -fopenmp -O2 -ftree-vectorize -march=native: 118.356s
g++ -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno: 92.9929s
g++ -fopenmp -O2 -ftree-vectorize -march=native -funsafe-math-optimizations: 118.294s
g++ -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno -funsafe-math-optimizations: 25.3898s
g++ -fopenmp -O2 -ftree-vectorize -march=native -funsafe-math-optimizations -ffinite-math-only -fno-rounding-math -fno-signaling-nans -fcx-limited-range: 119.785s
g++ -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno -ffinite-math-only -fno-rounding-math -fno-signaling-nans -fcx-limited-range: 93.9554s

It seems to me that the combination of the flags -fno-math-errno -funsafe-math-optimizations is the key here. Unfortunately, I have no idea how dangerous they are.

@bartoldeman
Contributor Author

The GCC manual states that -funsafe-math-optimizations enables '-fno-signed-zeros', '-fno-trapping-math', '-fassociative-math' and '-freciprocal-math', so perhaps one of those four already does the trick?
I'll have a look at the code myself too.

@bartoldeman
Contributor Author

-fno-math-errno -fno-signed-zeros -fno-trapping-math -fassociative-math is the minimal set that does the trick for me with GCC. (The GCC manual states that -fno-signed-zeros and -fno-trapping-math are needed for -fassociative-math.)

@bartoldeman
Contributor Author

bartoldeman commented Mar 6, 2018

The full story is as follows:

  • the code is somewhat special since it uses #pragma simd, which Intel understands as forcing vectorization. Nowadays we should use #pragma omp simd.

Now if you do NOT use #pragma omp simd or #pragma simd (Intel only)
icc -xHost -O2 -ftz -fp-speculation=safe -fp-model source
and
gcc -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno
will both NOT vectorize: you will need to use -fp-model fast (icc) or at least
-fno-math-errno -fno-signed-zeros -fno-trapping-math -fassociative-math for GCC.

If you DO use the following loop with #pragma omp simd

#pragma omp simd reduction(+:pi,axi,ayi,azi)
      for (int j=0; j<N; j++) {
        float dx = x[j] - xi;
        float dy = y[j] - yi;
        float dz = z[j] - zi;
        float R2 = dx * dx + dy * dy + dz * dz + EPS2;
        float invR = 1.0f / sqrtf(R2);
        float invR3 = m[j] * invR * invR * invR;
        pi += m[j] * invR;
        axi += dx * invR3;
        ayi += dy * invR3;
        azi += dz * invR3;
      }

then both GCC and Intel vectorize with the above options (no need for -fassociative-math etc.), however Intel does a better job (better than with the auto-vectorizer, in fact!). GCC gets to full speed with -funsafe-math-optimizations, but for some reason not with the other sub-options.

Anyway:

  • it would be good to consider -fno-math-errno in default GCC options, as icc does not enable it by default.
  • loose precision for gcc should at least include -fno-math-errno -fno-signed-zeros -fno-trapping-math -fassociative-math then; veryloose could use -ffast-math
  • the rest is all tuning of "omp simd"

@boegel boegel modified the milestones: 3.x, 3.6.0 Mar 14, 2018
@boegel
Member

boegel commented Mar 14, 2018

@bartoldeman As discussed during the EasyBuild conf call today:

  • Can you enhance this to also use -fno-math-errno by default, since that doesn't impact floating-point accuracy? Once you have, I'll look into rebuilding CP2K 5.1 with foss/2018a on top of this on Haswell & Skylake.

  • Do we need to take similar measures for Clang & PGI, or do they already vectorise by default at -O2? (cc @victorusu, @geimer)

  • For changing the loose and/or veryloose toolchain options, please follow up in a separate PR

@bartoldeman
Contributor Author

Thanks. Just wanted clarification so I can reproduce.

Now the only difference with your setup (if you used unmodified easyconfigs for foss-2018a) is probably that I had to build everything up to GCCcore on a Broadwell node, and only then switched to a Skylake node (which has no system gcc installed). I would not expect that where you build GCCcore makes a difference but you never know.

By the way, the one and only WRONG result I get is always Fist/regtest-3/2d_pot.inp.out.

@boegel
Member

boegel commented Mar 22, 2018

@bartoldeman Same WRONG test for me when only a single test fails.

13 tests that fail on Skylake (when only rebuilding CP2K with -ftree-vectorize):

  • Fist/regtest-3/2d_pot.inp.out
  • TMC/regtest/TMC_walltime.inp.out
  • TMC/regtest/TMC_1pot_H2ONH4.inp.out
  • TMC/regtest/TMC_prot_reorder.inp.out
  • TMC/regtest/TMC_atom_swap_test.inp.out
  • TMC/regtest/TMC_PT.inp.out
  • TMC/regtest/TMC_NPT.inp.out
  • TMC/regtest/TMC_NPT_2pot.inp.out
  • TMC/regtest/TMC_NPT_2pot_2.inp.out
  • TMC/regtest/TMC_NPT_2pot_PT.inp.out
  • TMC/regtest/TMC_test_restart_0.inp.out
  • TMC/regtest/TMC_test_restart_1.inp.out
  • TMC/regtest_ana_post_proc/TMC_ana_density.inp.out

For example (other are pretty similar):

/tmp/CP2K/5.1/foss-2018a/TEST-Linux-x86-64-foss-popt-2018-03-22_10-54-18/TMC/regtest/TMC_test_restart_1.inp.out :
 7605  Total energy: : ref = 0.55218050338752289 new = -999
 7606  relative error :   1.00055273e+00 >  numerical tolerance = 1.0E-14

I'll now try rebuilding the dependencies first with -ftree-vectorize, and then rebuilding CP2K on top of that...

@boegel
Member

boegel commented Mar 22, 2018

@bartoldeman Same 13 tests still fail after rebuilding Libint, libxc, libxsmm, FFTW, PLUMED and then CP2K on top of it...

I'll now rebuild the whole shebang, let's see if that helps...

@bartoldeman
Contributor Author

I'll have a look at those tests... the -999 looks rather suspicious.

@bartoldeman
Contributor Author

@boegel when I grep for -999 in cp2k I find this:

./cp2k-5.1/src/tmc/tmc_setup.F:            WRITE (output_unit, *) "TMC|NOTenoughProcessorsX= -999"
./cp2k-5.1/src/tmc/tmc_setup.F:            WRITE (output_unit, *) "TMC|NOTcalculatedTotal energy: -999"
./cp2k-5.1/src/tmc/tmc_setup.F:            WRITE (output_unit, *) "TMC|ANAtestOutputInitX= -999"

I wonder if the tests on the Skylake node have enough processors available. Could you have a look at your core pinning there?

@boegel
Member

boegel commented Mar 23, 2018

@bartoldeman Turns out I was re-testing CP2K in an environment with just a single core available, while the tests that were failing require at least 2 cores to be available...

I tested again on a full node with proper core pinning, and then I also saw just a single failing test (the one that fails consistently).

Conclusion: including -ftree-vectorize indeed doesn't do any harm to the CP2K test suite, which serves as a good canary imho.

@boegel
Member

boegel commented Mar 23, 2018

@bartoldeman Some minor tweaks in ComputeCanada#4, with that included this is good to go imho.

use DEFAULT_OPT_LEVEL constant + fix minor style issues
@bartoldeman
Contributor Author

Thanks, at least that mystery is solved then!

We can still disable vectorization by setting 'lowopt' -- do you still also want the vectorize toolchain option?

@boegel
Member

boegel commented Mar 23, 2018

@bartoldeman Ah, yes, thanks for reminding me... Having a toolchain option to selectively disable -ftree-vectorize would indeed make a lot of sense.

Disabling -ftree-vectorize at lowopt makes sense too.

@bartoldeman
Contributor Author

bartoldeman commented Mar 23, 2018

@boegel I am looking at the 'vectorize' toolchainopt but having trouble. I certainly won't be able to do it in an hour on a Friday afternoon.

If I put it in COMPILER_UNIQUE_OPTION_MAP, I can't map both a "True" and a "False" option without digging much more deeply, right? I.e.,

'vectorize': ['ftree-vectorize', 'fno-tree-vectorize'],

will not work. What is the best way here: adding a 'novectorize' option too? Or just doing it in code in _set_compiler_toolchainoptions for every compiler?

@bartoldeman
Contributor Author

One way I can think of is to use

'vectorize': ['ftree-vectorize'],
'novectorize': ['fno-tree-vectorize'],

in the mapping, but only expose 'vectorize' as a valid toolchain option. Then some code can set 'novectorize' to True if 'vectorize' == False.

If set to True or False, this will explicitly enable or disable
auto-vectorization. Otherwise (if left as None), noopt and lowopt
will not vectorize and other optimization levels will vectorize.
Modifying in place changes the original optflags, which messes up
further tests in the toolchain tests.
@@ -88,6 +88,8 @@ class Compiler(Toolchain):
'static': (False, "Build static library"),
'32bit': (False, "Compile 32bit target"), # LA, FFTW
'openmp': (False, "Enable OpenMP"),
'vectorize': (None, "Enable compiler auto-vectorization, default except for noopt and lowopt"),
'novectorize': (None, "Disable compiler auto-vectorization, default for noopt and lowopt"), # not set, only used to map
Member

@bartoldeman Why do we need both? Can't we detect True or False (vs None) for vectorize instead?

Contributor Author

The problem is in the mappings in gcc.py etc. I cannot map 'vectorize' to '-ftree-vectorize' for True and to '-fno-tree-vectorize' for False, unless those mappings get redesigned, of course.

Contributor Author

Hmm. I could use ['fno-tree-vectorize', 'ftree-vectorize'] with a bit of code to select the first or second. It's just inconsistent with the current use. Maybe use a dict to distinguish:
{False: 'fno-tree-vectorize', True: 'ftree-vectorize'}
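The dict idea can be sketched like this (a minimal model; the names and the surrounding option machinery are simplified stand-ins for the real framework classes, not the actual easybuild-framework code):

```python
# Hypothetical stand-in for an entry in COMPILER_UNIQUE_OPTION_MAP:
# a dict distinguishes the flag used for each boolean setting.
VECTORIZE_MAP = {False: 'fno-tree-vectorize', True: 'ftree-vectorize'}

def vectorize_flags(setting):
    """Tri-state: True/False pick a flag explicitly, None defers to the -O level."""
    if setting is None:
        return []
    return ['-' + VECTORIZE_MAP[setting]]

print(vectorize_flags(True))   # ['-ftree-vectorize']
print(vectorize_flags(False))  # ['-fno-tree-vectorize']
print(vectorize_flags(None))   # []
```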

@easybuilders easybuilders deleted a comment from boegelbot Mar 30, 2018
@@ -246,7 +247,17 @@ def _set_compiler_flags(self):

# 1st one is the one to use. add default at the end so len is at least 1
optflags = [self.options.option(x) for x in self.COMPILER_OPT_FLAGS if self.options.get(x, False)] + \
[self.options.option(default_opt_level)]
[self.options.option(default_opt_level)][:1]
Member

@bartoldeman To be correct, I think this should be:

optflags = ([...] + [...])[:1]

vectflags = self.options.option('vectorize')[self.options['vectorize']]
# avoid double use of such flags
if isinstance(optflags[0], list):
optflags[0] = [flag for flag in optflags[0] if flag not in self.options.option('vectorize').values()]
Member

flag not in vectflags?
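The intent of the snippet under review can be modelled in isolation (flag values are illustrative; the real code pulls them from the toolchain options object):

```python
# Simplified model of the optflags/vectflags interplay discussed above.
VECTORIZE_MAP = {False: 'fno-tree-vectorize', True: 'ftree-vectorize'}

def apply_vectorize(opt_level_flags, setting):
    # Drop any vectorization flag already baked into the optimization-level
    # flags. Filtering against all map values (rather than only the selected
    # flag, as "flag not in vectflags" would) also removes the opposite
    # setting if it happens to be present.
    cleaned = [f for f in opt_level_flags if f not in VECTORIZE_MAP.values()]
    if setting is not None:
        cleaned.append(VECTORIZE_MAP[setting])
    return cleaned

print(apply_vectorize(['O2', 'ftree-vectorize', 'march=native'], False))
# ['O2', 'march=native', 'fno-tree-vectorize']
```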

@boegel
Member

boegel commented Mar 31, 2018

lgtm, and works as expected after a quick test, so good to go

I'll try and do a partial regtest (foss/2017b + foss/2018a easyconfigs) on top of this to see if this causes any trouble...

@boegel
Member

boegel commented Jun 15, 2018

First example of this biting us in the ass: easybuilders/easybuild-easyconfigs#6461.

@akesandgren
Contributor

And the next one is GROMACS/2021 with foss/2020a getting numerical errors due to -fno-math-errno.

-fno-math-errno should NOT be turned on by default since it has numerical impact.

https://gromacs.bioexcel.eu/t/test-failure-in-fourcenter-tests-in-gromacs-2021/1633
