Benchmark results #16

Open
certik opened this issue Oct 18, 2023 · 8 comments

certik commented Oct 18, 2023

As of e7747e6 on Apple M1 Max and GFortran 11.3.0:

$ fpm test --profile=release --flag "-ffast-math -march=native" test_dft_schroed_fast --verbose
[...]
+ build/gfortran_565E65E7876A06C6/test/test_dft_schroed_fast
[...]
$ fpm test --profile=release --flag "-ffast-math -march=native" test_dft_dirac_fast --verbose
[...]
+ build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast 
[...]

And then benchmark using:

$ time build/gfortran_565E65E7876A06C6/test/test_dft_schroed_fast
 SCF convergence error:  0.28258872238814092     
 SCF convergence error:   8.2495113254026364E-003
 SCF convergence error:   5.2116623764959513E-003
 SCF convergence error:   2.1089267579554871E-003
 SCF convergence error:   1.6365563510589709E-005
 SCF convergence error:   8.5098749877943192E-006
 SCF convergence error:   4.4540042836160865E-006
 SCF convergence error:   1.6731351859533561E-008
 SCF convergence error:   7.0714598621179903E-009
 SCF convergence error:   4.5291272954273154E-009
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -25658.41788786 -25658.41788885  9.92E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -3689.35513954  -3689.35513984  2.99E-07
   2   -639.77872802   -639.77872809  7.10E-08
   3   -619.10855022   -619.10855018  3.79E-08
   4   -161.11807323   -161.11807321  1.58E-08
   5   -150.97898021   -150.97898016  4.68E-08
   6   -131.97735833   -131.97735828  4.30E-08
   7    -40.52808426    -40.52808425  1.17E-08
   8    -35.85332086    -35.85332083  2.56E-08
   9    -27.12321233    -27.12321230  3.27E-08
  10    -15.02746011    -15.02746007  4.23E-08
  11     -8.82408941     -8.82408940  1.32E-08
  12     -7.01809223     -7.01809220  2.35E-08
  13     -3.86617516     -3.86617513  3.00E-08
  14     -0.36654337     -0.36654335  1.55E-08
  15     -1.32597631     -1.32597632  1.11E-08
  16     -0.82253797     -0.82253797  2.62E-09
  17     -0.14319019     -0.14319018  4.86E-09
  18     -0.13094786     -0.13094786  5.40E-10
build/gfortran_565E65E7876A06C6/test/test_dft_schroed_fast  0.03s user 0.00s system 89% cpu 0.037 total
$ time build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220429767083260     
 SCF iteration:           8
 SCF convergence error:   5.9353570468374528E-003
 SCF iteration:           9
 SCF convergence error:   1.8809300891007297E-003
 SCF iteration:          10
 SCF convergence error:   1.0439264224260114E-004
 SCF iteration:          11
 SCF convergence error:   3.1819967261981219E-005
 SCF iteration:          12
 SCF convergence error:   1.1597509001148865E-005
 SCF iteration:          13
 SCF convergence error:   1.4384913811227307E-006
 SCF iteration:          14
 SCF convergence error:   1.2588679965119809E-006
 SCF iteration:          15
 SCF convergence error:   1.3697535905521363E-007
 SCF iteration:          16
 SCF convergence error:   1.8208083929494023E-008
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232560 -28001.13232549  1.17E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902054  -4223.41902046  8.12E-08
   2   -789.48978235   -789.48978233  2.00E-08
   3   -761.37447600   -761.37447597  2.56E-08
   4   -622.84809459   -622.84809456  2.05E-08
   5   -199.42980565   -199.42980564  5.06E-09
   6   -186.66371313   -186.66371312  7.93E-09
   7   -154.70102668   -154.70102667  4.47E-09
   8   -134.54118030   -134.54118029  8.25E-09
   9   -128.01665739   -128.01665738  7.53E-09
  10    -50.78894806    -50.78894806  4.06E-09
  11    -45.03717128    -45.03717129  3.42E-09
  12    -36.68861048    -36.68861049  4.16E-09
  13    -27.52930624    -27.52930624  3.80E-09
  14    -25.98542890    -25.98542891  3.84E-09
  15    -13.88951423    -13.88951423  4.44E-09
  16    -13.48546969    -13.48546969  4.49E-09
  17    -11.29558710    -11.29558710  1.76E-09
  18     -9.05796425     -9.05796425  1.16E-09
  19     -7.06929563     -7.06929563  4.20E-12
  20     -3.79741623     -3.79741623  1.40E-09
  21     -3.50121719     -3.50121718  1.86E-09
  22     -0.14678839     -0.14678838  5.78E-09
  23     -0.11604717     -0.11604717  5.88E-09
  24     -1.74803996     -1.74803995  7.41E-09
  25     -1.10111901     -1.10111900  7.85E-09
  26     -0.77578419     -0.77578418  7.87E-09
  27     -0.10304082     -0.10304082  5.31E-09
  28     -0.08480203     -0.08480202  4.84E-09
  29     -0.16094729     -0.16094728  3.27E-09
build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  0.78s user 0.04s system 100% cpu 0.806 total

Now apply the following patch:

$ git diff
diff --git a/src/dirac.f90 b/src/dirac.f90
index 0fc99c0..957aeb7 100644
--- a/src/dirac.f90
+++ b/src/dirac.f90
@@ -234,7 +234,7 @@ contains
     real(dp) :: E_dirac_shift
     integer :: idx
     logical :: accurate_eigensolver
-    accurate_eigensolver = .true.
+    accurate_eigensolver = .false.
     iter = iter + 1
     print *, "SCF iteration:", iter
     Vin = reshape(x, shape(Vin))

And rerun the benchmark:

$ time build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220430676850810     
 SCF iteration:           8
 SCF convergence error:   5.9354385557526257E-003
 SCF iteration:           9
 SCF convergence error:   1.8807012675097212E-003
 SCF iteration:          10
 SCF convergence error:   1.0478643525857478E-004
 SCF iteration:          11
 SCF convergence error:   3.1437355573871173E-005
 SCF iteration:          12
 SCF convergence error:   1.1697793524945155E-005
 SCF iteration:          13
 SCF convergence error:   1.4564047887688503E-006
 SCF iteration:          14
 SCF convergence error:   1.1532956705195829E-006
 SCF iteration:          15
 SCF convergence error:   4.6789864427410066E-007
 SCF iteration:          16
 SCF convergence error:   4.9017762648873031E-007
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232497 -28001.13232549  5.13E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902063  -4223.41902046  1.75E-07
   2   -789.48978203   -789.48978233  2.98E-07
   3   -761.37447602   -761.37447597  4.29E-08
   4   -622.84809451   -622.84809456  5.05E-08
   5   -199.42980565   -199.42980564  6.11E-09
   6   -186.66371313   -186.66371312  3.46E-09
   7   -154.70102667   -154.70102667  2.87E-09
   8   -134.54118030   -134.54118029  5.74E-09
   9   -128.01665739   -128.01665738  5.19E-09
  10    -50.78894806    -50.78894806  2.26E-10
  11    -45.03717128    -45.03717129  5.89E-09
  12    -36.68861048    -36.68861049  4.33E-09
  13    -27.52930624    -27.52930624  4.18E-09
  14    -25.98542890    -25.98542891  4.04E-09
  15    -13.88951423    -13.88951423  4.37E-09
  16    -13.48546969    -13.48546969  3.98E-09
  17    -11.29558710    -11.29558710  1.05E-09
  18     -9.05796425     -9.05796425  3.44E-09
  19     -7.06929563     -7.06929563  1.40E-09
  20     -3.79741623     -3.79741623  3.13E-10
  21     -3.50121719     -3.50121718  1.96E-09
  22     -0.14678839     -0.14678838  4.34E-09
  23     -0.11604717     -0.11604717  5.56E-09
  24     -1.74803996     -1.74803995  6.32E-09
  25     -1.10111901     -1.10111900  6.28E-09
  26     -0.77578419     -0.77578418  6.33E-09
  27     -0.10304082     -0.10304082  4.15E-09
  28     -0.08480203     -0.08480202  3.90E-09
  29     -0.16094728     -0.16094728  2.24E-09
build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  0.54s user 0.03s system 100% cpu 0.565 total

HaoZeke commented Oct 18, 2023

Machine

[image not reproduced]

Also:

Build type: native build
Project name: featom
Project version: 0.1.0
Fortran compiler for the host machine: gfortran (gcc 12.3.0 "GNU Fortran (conda-forge gcc 12.3.0-0) 12.3.0")
Fortran linker for the host machine: gfortran ld.bfd 2.40
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /home/rgoswami/micromamba/envs/fe/bin/pkg-config (0.29.2)
Run-time dependency lapack found: YES 3.9.0

meson build commands

FFLAGS='-ffast-math -march=native' meson setup bbdir --buildtype="release" -Dwith_tests=True
meson compile -C bbdir
time ./bbdir/testDftSchroedFast

DFT Schroedinger

./bbdir/testDftSchroedFast  0.02s user 0.00s system 98% cpu 0.021 total

DFT Dirac

These are based on #17 with the patch for the "accurate eigensolver".

# Lapack 3.9.0
./bbdir/testDftDiracFast  0.95s user 0.01s system 99% cpu 0.955 total
# mkl-dynamic-lp64-seq 2023.2
./bbdir/testDftDiracFast  0.67s user 0.01s system 99% cpu 0.679 total

Intel ifort

micromamba install -c hcc ifort_linux-64
FC=$(which ifort) FFLAGS="-O3 -xHost -ipo -no-prec-div -fp-model fast=2" meson setup bbdir -Dwith_tests=True --buildtype="release"

...

Fortran compiler for the host machine: /home/rgoswami/micromamba/envs/fe/bin/ifort (intel 2021.6.0 "ifort (IFORT) 2021.6.0 20220226")
Fortran linker for the host machine: /home/rgoswami/micromamba/envs/fe/bin/ifort ld.bfd 2.40
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /home/rgoswami/micromamba/envs/fe/bin/pkg-config (0.29.2)
Run-time dependency mkl-dynamic-lp64-seq found: YES 2023.2

./bbdir/testDftDiracFast  0.51s user 0.01s system 99% cpu 0.525 total

Which corresponds to:
[image not reproduced]
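To compare the three x86_64 Dirac runs above side by side, the zsh-style `time` summary lines can be parsed directly. The following is only a sketch: the labels in `runs` are informal descriptions added here, and the regular expression assumes the exact `user/system/cpu/total` layout shown in this thread.

```python
import re

# `time` summary lines copied from the runs above. The dictionary keys are
# informal labels introduced for this sketch, not names used by the project.
runs = {
    "gfortran, LAPACK 3.9.0": "./bbdir/testDftDiracFast  0.95s user 0.01s system 99% cpu 0.955 total",
    "gfortran, MKL 2023.2": "./bbdir/testDftDiracFast  0.67s user 0.01s system 99% cpu 0.679 total",
    "ifort, MKL 2023.2": "./bbdir/testDftDiracFast  0.51s user 0.01s system 99% cpu 0.525 total",
}

# Matches the "<u>s user <s>s system <p>% cpu <t> total" tail of a zsh time line.
TIME_LINE = re.compile(
    r"(?P<user>[\d.]+)s user (?P<system>[\d.]+)s system "
    r"(?P<cpu>\d+)% cpu (?P<total>[\d.]+) total"
)


def parse_time_line(line):
    """Return user, system, cpu and total from one `time` summary line as floats."""
    match = TIME_LINE.search(line)
    if match is None:
        raise ValueError(f"not a recognised time line: {line!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


parsed = {label: parse_time_line(line) for label, line in runs.items()}
slowest = max(p["total"] for p in parsed.values())
for label, p in parsed.items():
    # Ratio of the slowest wall-clock time to this run's wall-clock time.
    print(f"{label:24s} {p['total']:.3f} s  ({slowest / p['total']:.2f}x vs slowest)")
```

These are single runs, so differences of a few percent should not be read as significant.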


certik commented Oct 18, 2023

To use the Accelerate framework on macOS, one can use:

fpm test --profile=release --flag "-ffast-math -march=native -framework Accelerate" test_dft_dirac_fast --verbose

But I am getting similar timing:

$ time build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast
 SCF iteration:           1
[...]
  28     -0.08480203     -0.08480202  3.90E-09
  29     -0.16094728     -0.16094728  2.24E-09
build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast  0.54s user 0.03s system 100% cpu 0.565 total

It seems that on macOS, linking with just -lblas and -llapack already resolves to Accelerate by default.


certik commented Oct 18, 2023

The matrix dimension is about 240x240 for Dirac, and we only need 7 eigenvalues. Let's use a LAPACK interface that can return just those 7, or use a custom eigensolver that can do it.
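LAPACK's expert drivers such as `dsyevx` accept an index range so that only selected eigenvalues are computed. To illustrate why asking for 7 of about 240 eigenvalues can be cheaper, here is a minimal pure-Python sketch of one classical approach, bisection with Sturm-sequence counts. It is deliberately simplified: it assumes a symmetric tridiagonal standard eigenproblem, whereas the Dirac problem here is a dense one, and the function names are hypothetical rather than anything in featom.

```python
import math


def count_below(diag, off, x):
    """Count eigenvalues of a symmetric tridiagonal matrix that are < x.

    diag holds the n diagonal entries, off the n-1 off-diagonal entries.
    The number of negative pivots in the LDL^T factorisation of (T - x*I)
    equals the number of eigenvalues below x (Sylvester's law of inertia).
    """
    count = 0
    q = 1.0
    for i in range(len(diag)):
        coupling = off[i - 1] ** 2 if i > 0 else 0.0
        if q == 0.0:
            q = 1e-30  # nudge an exactly zero pivot to avoid dividing by zero
        q = diag[i] - x - coupling / q
        if q < 0.0:
            count += 1
    return count


def lowest_eigenvalues(diag, off, k, tol=1e-12):
    """Return the k smallest eigenvalues by bisection, skipping all others."""
    n = len(diag)
    # Gershgorin discs give an interval that contains the whole spectrum.
    radius = [0.0] * n
    for i in range(n - 1):
        radius[i] += abs(off[i])
        radius[i + 1] += abs(off[i])
    lo_bound = min(d - r for d, r in zip(diag, radius))
    hi_bound = max(d + r for d, r in zip(diag, radius))

    result = []
    for j in range(k):
        lo, hi = lo_bound, hi_bound
        # Eigenvalue j (0-based) is the point where the count first exceeds j.
        while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
            mid = 0.5 * (lo + hi)
            if count_below(diag, off, mid) > j:
                hi = mid
            else:
                lo = mid
        result.append(0.5 * (lo + hi))
    return result


# Demo on a 240x240 second-difference matrix, whose eigenvalues are known
# in closed form: 2 - 2*cos(m*pi/(n+1)) for m = 1..n.
n = 240
computed = lowest_eigenvalues([2.0] * n, [-1.0] * (n - 1), 7)
exact = [2.0 - 2.0 * math.cos(m * math.pi / (n + 1)) for m in range(1, 8)]
for c, e in zip(computed, exact):
    print(f"{c:.12f}  {e:.12f}  {abs(c - e):.2e}")
```

Each bisection step costs O(n), so the total work scales with the number of requested eigenvalues rather than with the full spectrum. A production code would call the LAPACK routine instead of reimplementing this.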


certik commented Oct 20, 2023

With #18 I get:

$ time build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220418857676123     
 SCF iteration:           8
 SCF convergence error:   5.9357534701121040E-003
 SCF iteration:           9
 SCF convergence error:   1.8806871048582252E-003
 SCF iteration:          10
 SCF convergence error:   1.0455984192958567E-004
 SCF iteration:          11
 SCF convergence error:   3.2101015676744282E-005
 SCF iteration:          12
 SCF convergence error:   1.1781998182414100E-005
 SCF iteration:          13
 SCF convergence error:   1.2883829185739160E-006
 SCF iteration:          14
 SCF convergence error:   8.7908847490325570E-007
 SCF iteration:          15
 SCF convergence error:   1.6529884305782616E-007
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232635 -28001.13232549  8.65E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902075  -4223.41902046  2.94E-07
   2   -789.48978246   -789.48978233  1.35E-07
   3   -761.37447601   -761.37447597  3.29E-08
   4   -622.84809461   -622.84809456  4.34E-08
   5   -199.42980567   -199.42980564  2.36E-08
   6   -186.66371312   -186.66371312  6.40E-09
   7   -154.70102667   -154.70102667  2.26E-09
   8   -134.54118029   -134.54118029  2.11E-09
   9   -128.01665738   -128.01665738  8.76E-10
  10    -50.78894805    -50.78894806  1.06E-08
  11    -45.03717127    -45.03717129  1.59E-08
  12    -36.68861047    -36.68861049  1.22E-08
  13    -27.52930623    -27.52930624  1.41E-08
  14    -25.98542889    -25.98542891  1.51E-08
  15    -13.88951422    -13.88951423  1.64E-08
  16    -13.48546968    -13.48546969  1.61E-08
  17    -11.29558710    -11.29558710  4.23E-10
  18     -9.05796425     -9.05796425  7.23E-10
  19     -7.06929564     -7.06929563  4.66E-09
  20     -3.79741624     -3.79741623  8.66E-09
  21     -3.50121719     -3.50121718  7.14E-09
  22     -0.14678840     -0.14678838  1.89E-08
  23     -0.11604718     -0.11604717  1.83E-08
  24     -1.74803998     -1.74803995  2.22E-08
  25     -1.10111902     -1.10111900  2.19E-08
  26     -0.77578420     -0.77578418  2.18E-08
  27     -0.10304083     -0.10304082  1.50E-08
  28     -0.08480204     -0.08480202  1.37E-08
  29     -0.16094729     -0.16094728  9.98E-09
build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  0.43s user 0.01s system 99% cpu 0.437 total
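For reference, the three Dirac wall-clock totals reported on the Apple M1 Max in this thread can be turned into speedups relative to the first run. This is just arithmetic on the `time` lines quoted above, and the labels are informal descriptions added here. Note that the run with #18 stops after 15 SCF iterations instead of 16, so not all of the gain is necessarily per-iteration cost.

```python
# Wall-clock totals in seconds, copied from the `time` lines in this thread
# (Apple M1 Max, GFortran 11.3.0). The labels are informal descriptions.
timings = [
    ("accurate_eigensolver = .true.", 0.806),
    ("accurate_eigensolver = .false.", 0.565),
    ("with #18", 0.437),
]


def speedups(rows):
    """Return (label, seconds, speedup) with speedup relative to the first row."""
    baseline = rows[0][1]
    return [(label, seconds, baseline / seconds) for label, seconds in rows]


for label, seconds, speedup in speedups(timings):
    saved = 100.0 * (1.0 - 1.0 / speedup)
    print(f"{label:32s} {seconds:.3f} s  {speedup:.2f}x  ({saved:.0f}% less time)")
```

These are single runs well under a second each, so repeating the measurement several times would give a more trustworthy comparison.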


HaoZeke commented Oct 23, 2023

❯ vtune -report hotspots -r r000hs -group-by module
vtune: Using result path `/home/rgoswami/Git/Github/Fortran/featom/r000hs'
vtune: Executing actions 75 % Generating a report
Module            CPU Time  CPU Time:Effective Time  CPU Time:Spin Time  CPU Time:Overhead Time  Module Path
----------------  --------  -----------------------  ------------------  ----------------------  -----------------------------------------------------------------
libmkl_core.so.2    0.230s                   0.230s                  0s                      0s  /home/rgoswami/micromamba/envs/fe/lib/libmkl_core.so.2           
libfeatom.so        0.120s                   0.120s                  0s                      0s  /home/rgoswami/Git/Github/Fortran/featom/bbdir/src/libfeatom.so  
libc.so.6           0.010s                   0.010s                  0s                      0s  /usr/lib/libc.so.6                                               
libc++abi.so        0.010s                   0.010s                  0s                      0s  /opt/intel/oneapi/vtune/2023.2.0/lib64/pinruntime/libc++abi.so   
libc-dynamic.so     0.010s                   0.010s                  0s                      0s  /opt/intel/oneapi/vtune/2023.2.0/lib64/pinruntime/libc-dynamic.so
testDftDiracFast    0.010s                   0.010s                  0s                      0s  /home/rgoswami/Git/Github/Fortran/featom/bbdir/testDftDiracFast  
vtune: Executing actions 100 % done                                            
❯ vtune -report hotspots -r r000hs
vtune: Using result path `/home/rgoswami/Git/Github/Fortran/featom/r000hs'
vtune: Executing actions 75 % Generating a report
Function                  CPU Time  CPU Time:Effective Time  CPU Time:Spin Time  CPU Time:Overhead Time  Module            Function (Full)              Source File  Start Address
------------------------  --------  -----------------------  ------------------  ----------------------  ----------------  ---------------------------  -----------  -------------
[MKL LAPACK]@dsyevx         0.150s                   0.150s                  0s                      0s  libmkl_core.so.2  mkl_lapack_dsyevx            [Unknown]    0x9c6470     
assemble_radial_dirac_sh    0.060s                   0.060s                  0s                      0s  libfeatom.so      assemble_radial_dirac_sh     fe.f90       0x16a70      
[MKL LAPACK]@dsygst         0.050s                   0.050s                  0s                      0s  libmkl_core.so.2  mkl_lapack_dsygst            [Unknown]    0x9c8230     
phih                        0.040s                   0.040s                  0s                      0s  libfeatom.so      phih                         feutils.f90  0x47f93      
dphih                       0.020s                   0.020s                  0s                      0s  libfeatom.so      dphih                        feutils.f90  0x50278      
MKL_Load_Lib_Ex             0.020s                   0.020s                  0s                      0s  libmkl_core.so.2  MKL_Load_Lib_Ex              [Unknown]    0x21ca50     
free                        0.010s                   0.010s                  0s                      0s  libc.so.6         free                         [Unknown]    0x9d2e0      
[MKL LAPACK]@xdgetrf        0.010s                   0.010s                  0s                      0s  libmkl_core.so.2  mkl_lapack_xdgetrf           [Unknown]    0xd8c050     
memmove                     0.010s                   0.010s                  0s                      0s  libc-dynamic.so   memmove                      [Unknown]    0x69e30      
operator new                0.010s                   0.010s                  0s                      0s  libc++abi.so      operator new(unsigned long)  [Unknown]    0x25000      
__intel_avx_rep_memcpy      0.010s                   0.010s                  0s                      0s  testDftDiracFast  __intel_avx_rep_memcpy       [Unknown]    0x4ac280     
vtune: Executing actions 100 % done                                            
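As a cross-check, the per-function CPU times in the hotspots report above can be summed by module and compared with the module-level report. The rows below are copied from that report. The script is only a sketch for reading the numbers and is not part of featom or VTune.

```python
from collections import defaultdict

# (function, CPU time in seconds, module), copied from the hotspots report.
hotspots = [
    ("[MKL LAPACK]@dsyevx", 0.150, "libmkl_core.so.2"),
    ("assemble_radial_dirac_sh", 0.060, "libfeatom.so"),
    ("[MKL LAPACK]@dsygst", 0.050, "libmkl_core.so.2"),
    ("phih", 0.040, "libfeatom.so"),
    ("dphih", 0.020, "libfeatom.so"),
    ("MKL_Load_Lib_Ex", 0.020, "libmkl_core.so.2"),
    ("free", 0.010, "libc.so.6"),
    ("[MKL LAPACK]@xdgetrf", 0.010, "libmkl_core.so.2"),
    ("memmove", 0.010, "libc-dynamic.so"),
    ("operator new", 0.010, "libc++abi.so"),
    ("__intel_avx_rep_memcpy", 0.010, "testDftDiracFast"),
]


def time_by_module(rows):
    """Sum CPU time per module."""
    totals = defaultdict(float)
    for _name, seconds, module in rows:
        totals[module] += seconds
    return dict(totals)


totals = time_by_module(hotspots)
grand_total = sum(totals.values())
for module, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{module:18s} {seconds:.3f} s  {100.0 * seconds / grand_total:5.1f}%")

# Share of all sampled CPU time spent in the dsyevx eigensolver itself.
dsyevx_share = hotspots[0][1] / grand_total
print(f"dsyevx share: {100.0 * dsyevx_share:.1f}%")
```

The sums reproduce the module report (0.230 s for `libmkl_core.so.2`, 0.120 s for `libfeatom.so`). With 10 ms sampling resolution and a total of about 0.39 s, the percentages are rough.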
❯ vtune -R callstacks -r r000hs -group-by callstack
vtune: Using result path `/home/rgoswami/Git/Github/Fortran/featom/r000hs'
vtune: Executing actions 75 % Generating a report
Function/Function Stack        CPU Time  Module                  Function (Full)                Source File              Start Address
-----------------------------  --------  ----------------------  -----------------------------  -----------------------  -------------
[MKL LAPACK]@dsyevx              0.140s  libmkl_core.so.2        mkl_lapack_dsyevx              [Unknown]                0x9c6470     
dsyevx_                              0s  libmkl_intel_lp64.so.2  dsyevx_                        [Unknown]                0x705190     
solve_eig_irange                     0s  libfeatom.so            solve_eig_irange               solvers.f90              0x4a940      
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
assemble_radial_dirac_sh         0.060s  libfeatom.so            assemble_radial_dirac_sh       fe.f90                   0x16a70      
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
phih                             0.040s  libfeatom.so            phih                           feutils.f90              0x47f93      
fe2quad                              0s  libfeatom.so            fe2quad                        feutils.f90              0x47f00      
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
[MKL LAPACK]@dsygst              0.030s  libmkl_core.so.2        mkl_lapack_dsygst              [Unknown]                0x9c8230     
DSYGST                               0s  libmkl_intel_lp64.so.2  DSYGST                         [Unknown]                0x706480     
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
dphih                            0.020s  libfeatom.so            dphih                          feutils.f90              0x50278      
assemble_poisson_gj                  0s  libfeatom.so            assemble_poisson_gj            hartree_screening.f90    0x4fe70      
hartree_potential_gj                 0s  libfeatom.so            hartree_potential_gj           hartree_screening.f90    0x4d130      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
MKL_Load_Lib_Ex                  0.020s  libmkl_core.so.2        MKL_Load_Lib_Ex                [Unknown]                0x21ca50     
__mkl_cpu_detect_and_load_dll        0s  libmkl_core.so.2        __mkl_cpu_detect_and_load_dll  [Unknown]                0x21be50     
[MKL LAPACK]@dsteqr                  0s  libmkl_core.so.2        mkl_lapack_dsteqr              [Unknown]                0x9ba900     
DSTEQR                               0s  libmkl_intel_lp64.so.2  DSTEQR                         [Unknown]                0x6ff3d0     
gauss_jacobi_gw                      0s  libgjp_gw.so            gauss_jacobi_gw                gjp_gw_single.f90        0x64e0       
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
[MKL LAPACK]@dsygst              0.020s  libmkl_core.so.2        mkl_lapack_dsygst              [Unknown]                0x9c8230     
DSYGST                               0s  libmkl_intel_lp64.so.2  DSYGST                         [Unknown]                0x706480     
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
[Unknown stack frame(s)]             0s  [Unknown]               [Unknown stack frame(s)]       [Unknown]                0            
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
free                             0.010s  libc.so.6               free                           [Unknown]                0x9d2e0      
for_dealloc_allocatable              0s  testDftDiracFast        for_dealloc_allocatable        [Unknown]                0x439650     
inv                                  0s  libfeatom.so            inv                            linalg.f90               0x1ea50      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
[MKL LAPACK]@xdgetrf             0.010s  libmkl_core.so.2        mkl_lapack_xdgetrf             [Unknown]                0xd8c050     
DGETRF                               0s  libmkl_intel_lp64.so.2  DGETRF                         [Unknown]                0x623590     
inv                                  0s  libfeatom.so            inv                            linalg.f90               0x1ea50      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
memmove                          0.010s  libc-dynamic.so         memmove                        [Unknown]                0x69e30      
memcpy                               0s  libc-dynamic.so         memcpy                         [Unknown]                0x69d30      
MKL_Load_Lib_Ex                      0s  libmkl_core.so.2        MKL_Load_Lib_Ex                [Unknown]                0x21ca50     
__mkl_cpu_detect_and_load_dll        0s  libmkl_core.so.2        __mkl_cpu_detect_and_load_dll  [Unknown]                0x21be50     
[MKL LAPACK]@dsteqr                  0s  libmkl_core.so.2        mkl_lapack_dsteqr              [Unknown]                0x9ba900     
DSTEQR                               0s  libmkl_intel_lp64.so.2  DSTEQR                         [Unknown]                0x6ff3d0     
gauss_jacobi_gw                      0s  libgjp_gw.so            gauss_jacobi_gw                gjp_gw_single.f90        0x64e0       
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
operator new                     0.010s  libc++abi.so            operator new(unsigned long)    [Unknown]                0x25000      
MKL_Load_Lib_Ex                      0s  libmkl_core.so.2        MKL_Load_Lib_Ex                [Unknown]                0x21ca50     
__mkl_cpu_detect_and_load_dll        0s  libmkl_core.so.2        __mkl_cpu_detect_and_load_dll  [Unknown]                0x21be50     
[MKL LAPACK]@dsteqr                  0s  libmkl_core.so.2        mkl_lapack_dsteqr              [Unknown]                0x9ba900     
DSTEQR                               0s  libmkl_intel_lp64.so.2  DSTEQR                         [Unknown]                0x6ff3d0     
gauss_jacobi_gw                      0s  libgjp_gw.so            gauss_jacobi_gw                gjp_gw_single.f90        0x64e0       
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
[MKL LAPACK]@dsyevx              0.010s  libmkl_core.so.2        mkl_lapack_dsyevx              [Unknown]                0x9c6470     
dsyevx_                              0s  libmkl_intel_lp64.so.2  dsyevx_                        [Unknown]                0x705190     
solve_eig_irange                     0s  libfeatom.so            solve_eig_irange               solvers.f90              0x4a940      
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
[Unknown stack frame(s)]             0s  [Unknown]               [Unknown stack frame(s)]       [Unknown]                0            
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
__intel_avx_rep_memcpy           0.010s  testDftDiracFast        __intel_avx_rep_memcpy         [Unknown]                0x4ac280     
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            
                                                                                                                                      
vtune: Executing actions 100 % done                                      

@certik
Contributor Author

certik commented Oct 25, 2023

With the latest commit I get:

$ time build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220424374379945     
 SCF iteration:           8
 SCF convergence error:   5.9356682395446114E-003
 SCF iteration:           9
 SCF convergence error:   1.8809154171322007E-003
 SCF iteration:          10
 SCF convergence error:   1.0443680002936162E-004
 SCF iteration:          11
 SCF convergence error:   3.1862826290307567E-005
 SCF iteration:          12
 SCF convergence error:   1.1619773431448266E-005
 SCF iteration:          13
 SCF convergence error:   1.4270481187850237E-006
 SCF iteration:          14
 SCF convergence error:   1.3223652786109596E-006
 SCF iteration:          15
 SCF convergence error:   2.4143082555383444E-007
 SCF iteration:          16
 SCF convergence error:   5.5670170695520937E-008
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232562 -28001.13232549  1.35E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902055  -4223.41902046  9.11E-08
   2   -789.48978235   -789.48978233  2.46E-08
   3   -761.37447600   -761.37447597  3.10E-08
   4   -622.84809459   -622.84809456  2.44E-08
   5   -199.42980565   -199.42980564  6.82E-09
   6   -186.66371313   -186.66371312  9.53E-09
   7   -154.70102668   -154.70102667  5.38E-09
   8   -134.54118030   -134.54118029  9.35E-09
   9   -128.01665739   -128.01665738  8.53E-09
  10    -50.78894806    -50.78894806  3.60E-09
  11    -45.03717129    -45.03717129  1.41E-09
  12    -36.68861048    -36.68861049  3.75E-09
  13    -27.52930624    -27.52930624  2.23E-09
  14    -25.98542890    -25.98542891  2.56E-09
  15    -13.88951423    -13.88951423  3.29E-09
  16    -13.48546969    -13.48546969  2.25E-09
  17    -11.29558710    -11.29558710  6.12E-10
  18     -9.05796425     -9.05796425  2.79E-10
  19     -7.06929564     -7.06929563  8.73E-10
  20     -3.79741623     -3.79741623  2.07E-09
  21     -3.50121719     -3.50121718  2.69E-09
  22     -0.14678839     -0.14678838  6.58E-09
  23     -0.11604717     -0.11604717  6.89E-09
  24     -1.74803996     -1.74803995  8.23E-09
  25     -1.10111901     -1.10111900  8.43E-09
  26     -0.77578419     -0.77578418  9.08E-09
  27     -0.10304082     -0.10304082  6.28E-09
  28     -0.08480203     -0.08480202  6.48E-09
  29     -0.16094729     -0.16094728  4.73E-09
build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  0.49s user 0.10s system 117% cpu 0.497 total

@certik
Contributor Author

certik commented Oct 25, 2023

featom using 310aeb8:

$ fpm test --profile=release --flag "-ffast-math -march=native -framework Accelerate " test_dft_dirac_fast --verbose
$ time build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220418857676123     
 SCF iteration:           8
 SCF convergence error:   5.9357534701121040E-003
 SCF iteration:           9
 SCF convergence error:   1.8806871048582252E-003
 SCF iteration:          10
 SCF convergence error:   1.0455984192958567E-004
 SCF iteration:          11
 SCF convergence error:   3.2101015676744282E-005
 SCF iteration:          12
 SCF convergence error:   1.1781998182414100E-005
 SCF iteration:          13
 SCF convergence error:   1.2883829185739160E-006
 SCF iteration:          14
 SCF convergence error:   8.7908847490325570E-007
 SCF iteration:          15
 SCF convergence error:   1.6529884305782616E-007
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232635 -28001.13232549  8.65E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902075  -4223.41902046  2.94E-07
   2   -789.48978246   -789.48978233  1.35E-07
   3   -761.37447601   -761.37447597  3.29E-08
   4   -622.84809461   -622.84809456  4.34E-08
   5   -199.42980567   -199.42980564  2.36E-08
   6   -186.66371312   -186.66371312  6.40E-09
   7   -154.70102667   -154.70102667  2.26E-09
   8   -134.54118029   -134.54118029  2.11E-09
   9   -128.01665738   -128.01665738  8.76E-10
  10    -50.78894805    -50.78894806  1.06E-08
  11    -45.03717127    -45.03717129  1.59E-08
  12    -36.68861047    -36.68861049  1.22E-08
  13    -27.52930623    -27.52930624  1.41E-08
  14    -25.98542889    -25.98542891  1.51E-08
  15    -13.88951422    -13.88951423  1.64E-08
  16    -13.48546968    -13.48546969  1.61E-08
  17    -11.29558710    -11.29558710  4.23E-10
  18     -9.05796425     -9.05796425  7.23E-10
  19     -7.06929564     -7.06929563  4.66E-09
  20     -3.79741624     -3.79741623  8.66E-09
  21     -3.50121719     -3.50121718  7.14E-09
  22     -0.14678840     -0.14678838  1.89E-08
  23     -0.11604718     -0.11604717  1.83E-08
  24     -1.74803998     -1.74803995  2.22E-08
  25     -1.10111902     -1.10111900  2.19E-08
  26     -0.77578420     -0.77578418  2.18E-08
  27     -0.10304083     -0.10304082  1.50E-08
  28     -0.08480204     -0.08480202  1.37E-08
  29     -0.16094729     -0.16094728  9.98E-09
build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast  0.41s user 0.01s system 99% cpu 0.420 total

dftatom:

$ time ./tests/atom_U/uraninum_rlda
 Test eps:   1.1999999999999999E-006
 Z=          92
 N=        5269
E_tot= -28001.13232639 E_tot_exact= -28001.13232549 error:  9.00E-07
 state    E            E_exact          error     occupancy
1s   -4223.41902044  -4223.41902046 -1.83E-08    2.000
2s    -789.48978232   -789.48978233 -1.16E-08    2.000
2p    -761.37447596   -761.37447597 -1.35E-08    2.000
2p    -622.84809453   -622.84809456 -3.60E-08    4.000
3s    -199.42980566   -199.42980564  1.07E-08    2.000
3p    -186.66371314   -186.66371312  1.15E-08    2.000
3p    -154.70102665   -154.70102667 -2.03E-08    4.000
3d    -134.54118027   -134.54118029 -1.93E-08    4.000
3d    -128.01665735   -128.01665738 -3.18E-08    6.000
4s     -50.78894808    -50.78894806  1.89E-08    2.000
4p     -45.03717131    -45.03717129  1.98E-08    2.000
4p     -36.68861048    -36.68861049 -4.93E-09    4.000
4d     -27.52930624    -27.52930624 -3.22E-09    4.000
4d     -25.98542889    -25.98542891 -1.85E-08    6.000
4f     -13.88951422    -13.88951423 -1.70E-08    6.000
4f     -13.48546967    -13.48546969 -2.00E-08    8.000
5s     -11.29558711    -11.29558710  1.37E-08    2.000
5p      -9.05796426     -9.05796425  1.32E-08    2.000
5p      -7.06929564     -7.06929563  7.97E-10    4.000
5d      -3.79741623     -3.79741623  1.17E-09    4.000
5d      -3.50121718     -3.50121718 -5.77E-09    6.000
5f      -0.14678838     -0.14678838 -2.39E-09    1.286
5f      -0.11604716     -0.11604717 -3.17E-09    1.714
6s      -1.74803996     -1.74803995  5.41E-09    2.000
6p      -1.10111900     -1.10111900  4.31E-09    2.000
6p      -0.77578418     -0.77578418  8.61E-10    4.000
6d      -0.10304082     -0.10304082  3.74E-10    0.400
6d      -0.08480202     -0.08480202 -2.54E-10    0.600
7s      -0.16094728     -0.16094728  1.06E-09    2.000
./tests/atom_U/uraninum_rlda  0.27s user 0.01s system 98% cpu 0.277 total

@certik
Contributor Author

certik commented Oct 25, 2023

With cedfa6a:

$ time build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220418857676123     
 SCF iteration:           8
 SCF convergence error:   5.9357534701121040E-003
 SCF iteration:           9
 SCF convergence error:   1.8806871048582252E-003
 SCF iteration:          10
 SCF convergence error:   1.0455984192958567E-004
 SCF iteration:          11
 SCF convergence error:   3.2101015676744282E-005
 SCF iteration:          12
 SCF convergence error:   1.1781998182414100E-005
 SCF iteration:          13
 SCF convergence error:   1.2883829185739160E-006
 SCF iteration:          14
 SCF convergence error:   8.7908847490325570E-007
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232613 -28001.13232549  6.45E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902078  -4223.41902046  3.21E-07
   2   -789.48978230   -789.48978233  3.07E-08
   3   -761.37447596   -761.37447597  1.65E-08
   4   -622.84809453   -622.84809456  3.38E-08
   5   -199.42980561   -199.42980564  3.01E-08
   6   -186.66371306   -186.66371312  6.59E-08
   7   -154.70102661   -154.70102667  6.81E-08
   8   -134.54118022   -134.54118029  6.72E-08
   9   -128.01665731   -128.01665738  7.02E-08
  10    -50.78894802    -50.78894806  4.95E-08
  11    -45.03717123    -45.03717129  5.97E-08
  12    -36.68861043    -36.68861049  5.96E-08
  13    -27.52930618    -27.52930624  5.95E-08
  14    -25.98542885    -25.98542891  5.75E-08
  15    -13.88951417    -13.88951423  6.03E-08
  16    -13.48546963    -13.48546969  6.33E-08
  17    -11.29558706    -11.29558710  4.17E-08
  18     -9.05796421     -9.05796425  4.43E-08
  19     -7.06929559     -7.06929563  4.12E-08
  20     -3.79741619     -3.79741623  3.41E-08
  21     -3.50121715     -3.50121718  3.41E-08
  22     -0.14678836     -0.14678838  2.99E-08
  23     -0.11604714     -0.11604717  2.92E-08
  24     -1.74803993     -1.74803995  2.87E-08
  25     -1.10111897     -1.10111900  3.12E-08
  26     -0.77578414     -0.77578418  3.45E-08
  27     -0.10304078     -0.10304082  3.04E-08
  28     -0.08480199     -0.08480202  3.05E-08
  29     -0.16094726     -0.16094728  2.65E-08
build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast  0.40s user 0.01s system 99% cpu 0.404 total
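
For a quick side-by-side of the Dirac timings posted on Oct 25, here is a small Python sketch. It parses the `time` summary lines copied verbatim from the runs above and prints each wall-clock total relative to dftatom. The dictionary labels and the helper name `total_seconds` are made up for this illustration, and the regex assumes the zsh-style `... cpu N total` summary format seen in this thread. Each number comes from a single run, and the first featom run used a different build directory, so treat the ratios as rough.

```python
import re

# `time` summary lines copied verbatim from the runs in this thread.
# The labels are illustrative only; they are not names used by featom or dftatom.
TIME_LINES = {
    "featom, latest commit": "build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  0.49s user 0.10s system 117% cpu 0.497 total",
    "featom 310aeb8": "build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast  0.41s user 0.01s system 99% cpu 0.420 total",
    "featom cedfa6a": "build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast  0.40s user 0.01s system 99% cpu 0.404 total",
    "dftatom": "./tests/atom_U/uraninum_rlda  0.27s user 0.01s system 98% cpu 0.277 total",
}


def total_seconds(line):
    """Return the wall-clock 'total' value (in seconds) from a zsh-style time line."""
    match = re.search(r"cpu\s+([0-9.]+)\s+total\s*$", line)
    if match is None:
        raise ValueError("unrecognized time summary: " + line)
    return float(match.group(1))


# Wall-clock totals per run, and each total divided by the dftatom total.
totals = {label: total_seconds(line) for label, line in TIME_LINES.items()}
baseline = totals["dftatom"]
ratios = {label: seconds / baseline for label, seconds in totals.items()}

for label, seconds in totals.items():
    print(f"{label:24s} {seconds:6.3f} s   {ratios[label]:5.2f}x dftatom")
```

Averaging several runs per commit before comparing would give a more reliable picture than these one-off totals.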
