Build fails with gfortran 11.1.0 #9468

Closed
marcfehling opened this issue Apr 29, 2021 · 49 comments
@marcfehling
Collaborator

I've been trying to build FDS with the latest GNU Fortran compiler 11.1.0 and the latest release of OpenMPI 4.1.1.

However, with this configuration, compilation fails on both the latest release 6.7.5 and the master branch, each with a different error:

  • 6.7.5
Building mpi_gnu_linux_64
[...]
mpifort -c -m64 -O2 -std=f2008 -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP=\"-\" -DGITDATE_PP=\""\"" -DBUILDDATE_PP=\""Apr 29, 2021  17:08:57\"" -DCOMPVER_PP=\""Gnu gfortran 11.1.0"\"   -fopenmp ../../Source/main.f90
../../Source/main.f90:2726:61:

  359 |       CALL MESH_EXCHANGE(2) ! Exchange radiation intensity at interpolated boundaries
      |                                                                                     2
......
 2726 |             IF (ICYC>1) ANG_INC_COUNTER = M%ANGLE_INC_COUNTER
      |                                                             1
Error: Index variable ‘ang_inc_counter’ redefined at (1) in procedure ‘mesh_exchange’ called from within DO loop at (2)
../../Source/main.f90:3103:59:

  359 |       CALL MESH_EXCHANGE(2) ! Exchange radiation intensity at interpolated boundaries
      |                                                                                     2
......
 3103 |          IF (ICYC>1) ANG_INC_COUNTER = M4%ANGLE_INC_COUNTER
      |                                                           1
Error: Index variable ‘ang_inc_counter’ redefined at (1) in procedure ‘mesh_exchange’ called from within DO loop at (2)
../../Source/main.f90:2726:61:

  806 |             CALL MESH_EXCHANGE(2)
      |                                 2                            
......
 2726 |             IF (ICYC>1) ANG_INC_COUNTER = M%ANGLE_INC_COUNTER
      |                                                             1
Error: Index variable ‘ang_inc_counter’ redefined at (1) in procedure ‘mesh_exchange’ called from within DO loop at (2)
../../Source/main.f90:3103:59:

  806 |             CALL MESH_EXCHANGE(2)
      |                                 2                          
......
 3103 |          IF (ICYC>1) ANG_INC_COUNTER = M4%ANGLE_INC_COUNTER
      |                                                           1
Error: Index variable ‘ang_inc_counter’ redefined at (1) in procedure ‘mesh_exchange’ called from within DO loop at (2)
../../Source/main.f90:2726:61:

  873 |          CALL MESH_EXCHANGE(2)
      |                              2                               
......
 2726 |             IF (ICYC>1) ANG_INC_COUNTER = M%ANGLE_INC_COUNTER
      |                                                             1
Error: Index variable ‘ang_inc_counter’ redefined at (1) in procedure ‘mesh_exchange’ called from within DO loop at (2)
../../Source/main.f90:3103:59:

  873 |          CALL MESH_EXCHANGE(2)
      |                              2                             
......
 3103 |          IF (ICYC>1) ANG_INC_COUNTER = M4%ANGLE_INC_COUNTER
      |                                                           1
Error: Index variable ‘ang_inc_counter’ redefined at (1) in procedure ‘mesh_exchange’ called from within DO loop at (2)
make: *** [main.o] Error 1
  • master
Building mpi_gnu_linux_64
mpifort -c -m64 -O2 -std=f2008 -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP=\"FDS6.7.5-1440-g9ca15b0-master\" -DGITDATE_PP=\""Thu Apr 29 18:57:09 2021 -0400\"" -DBUILDDATE_PP=\""Apr 29, 2021  17:17:33\"" -DCOMPVER_PP=\""Gnu gfortran 11.1.0"\"   ../../Source/prec.f90
../../Source/prec.f90:5:16:

    5 | IMPLICIT NONE (TYPE,EXTERNAL)
      |                1
Error: Fortran 2018: IMPORT NONE with spec list at (1)
make: *** [prec.o] Error 1
@marcfehling
Collaborator Author

What is the last known release of the GNU compiler that successfully builds FDS? Maybe it's just an issue with my machine. I'd be willing to try other versions.

@sbenkorichi
Collaborator

Version 10 should work; I used it not that long ago. There is also a wiki page on how to compile.
I'm not sure why you have an issue with the 11 release. Is it an official release? Can you provide more details about your distribution, system, etc.?

@marcfehling
Collaborator Author

Version 10 should work; I used it not that long ago. There is also a wiki page on how to compile.

Thanks. I will try that one out.

I'm not sure why you have an issue with the 11 release. Is it an official release? Can you provide more details about your distribution, system, etc.?

GCC 11 is an official release (fresh from two days ago), see https://gcc.gnu.org/.

I've just built gcc 11 along with openmpi manually on our server, which otherwise only provides gcc 4.8.5 via the OS. As far as I know FDS requires at least gcc 9 now. We use CentOS 7.

@marcfehling
Collaborator Author

marcfehling commented Apr 30, 2021

I can build FDS 6.7.5 with gfortran 10.3.0 successfully!

But I still get the same error when building the master branch. This seems to be a separate issue introduced with #9361.

@mcgratta mcgratta self-assigned this Apr 30, 2021
@mcgratta
Contributor

We have introduced coding conventions that are compatible with the Fortran 2018 standard. This line

IMPLICIT NONE (TYPE,EXTERNAL)

instructs the compiler not to assume anything about data types or external routines. We also started adhering to the Fortran 2008 MPI bindings (the MPI_F08 module).
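As a minimal sketch of what this statement requires (the module and routine names below are hypothetical and not part of FDS): with IMPLICIT NONE (TYPE,EXTERNAL), every variable needs a declared type, and every procedure that is called needs an explicit interface (for example through a module) or an explicit EXTERNAL declaration. The form with a spec list is a Fortran 2018 feature, which is why gfortran rejects it under -std=f2008.

```fortran
! Assumed compile command: gfortran -std=f2018 implicit_demo.f90
MODULE GEOMETRY_DEMO                 ! hypothetical module, for illustration only
IMPLICIT NONE (TYPE,EXTERNAL)
CONTAINS
PURE REAL FUNCTION CIRCLE_AREA(RADIUS)
   REAL, INTENT(IN) :: RADIUS
   CIRCLE_AREA = 3.14159265*RADIUS**2
END FUNCTION CIRCLE_AREA
END MODULE GEOMETRY_DEMO

PROGRAM IMPLICIT_DEMO
! The explicit interface of CIRCLE_AREA comes from the module
USE GEOMETRY_DEMO, ONLY: CIRCLE_AREA
IMPLICIT NONE (TYPE,EXTERNAL)
REAL :: R                            ! TYPE: no implicit typing, so R must be declared
R = 2.0
PRINT *, 'Area = ', CIRCLE_AREA(R)
! EXTERNAL: calling a routine that has neither an explicit interface nor an
! EXTERNAL declaration would be rejected at compile time, e.g.
!    CALL SOME_UNDECLARED_ROUTINE(R)
END PROGRAM IMPLICIT_DEMO
```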

You may want to try the new Intel oneAPI compilers. They are now free, and can be installed on all platforms.

@marcosvanella
Contributor

I updated the -std flag for gnu targets to 2018. That seems to take care of the IMPLICIT NONE (TYPE,EXTERNAL) error.

@marcfehling
Collaborator Author

marcfehling commented May 4, 2021

I updated the -std flag for gnu targets to 2018. That seems to take care of the IMPLICIT NONE (TYPE,EXTERNAL) error.

I can confirm that your change fixes the IMPLICIT NONE (TYPE,EXTERNAL) issue with gfortran 10.3.0. Thank you!

So fds can be built with gfortran 10.3.0 again.


I updated the -std flag as you did in my local copy of the 6.7.5 release and compiled with gfortran 11.1.0 again. I again get the MESH_EXCHANGE() -- ANGLE_INC_COUNTER error.


Further, I tried to compile the current master branch with gfortran 11.1.0 again and the error message changed. I'm greeted with an internal compiler error this time...

mpifort -c -m64 -O2 -std=f2018 -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP=\"FDS6.7.5-1457-g77a76ca-master\" -DGITDATE_PP=\""Mon May 3 16:52:28 2021 -0400\"" -DBUILDDATE_PP=\""May 03, 2021  18:35:16\"" -DCOMPVER_PP=\""Gnu gfortran 11.1.0"\"   ../../Source/scrc.f90
f951: internal compiler error: Segmentation fault
0xc2977f crash_signal
	/raid/fehling/packages/gcc-11.1.0/gcc/toplev.c:327
0x748ef0 gfc_sym_get_dummy_args(gfc_symbol*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/symbol.c:5256
0x7e5759 doloop_contained_procedure_code
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:2483
0x7ec7f9 gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:5320
0x7ee1c8 doloop_code
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:2627
0x7ec7f9 gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:5320
0x7ec90e gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:5644
0x7ec90e gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:5644
0x7ec90e gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:5644
0x7eda3b doloop_warn
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:3059
0x7edf2a gfc_run_passes(gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:156
0x711817 gfc_resolve(gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:17437
0x711817 gfc_resolve(gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:17410
0x711cd9 update_current_proc_array_outer_dependency
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:3123
0x71ccc7 resolve_call
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:3750
0x721d6f gfc_resolve_code(gfc_code*, gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:12041
0x724647 gfc_resolve_blocks(gfc_code*, gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:10842
0x721c36 gfc_resolve_code(gfc_code*, gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:11810
0x726147 resolve_codes
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:17396
0x72607e resolve_codes
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:17379
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make: *** [scrc.o] Error 1

As this error occurs in compiling scrc.f90 before main.f90 (which causes another issue), I assume that this is unrelated to the MESH_EXCHANGE() -- ANGLE_INC_COUNTER error...

@mcgratta
Contributor

mcgratta commented May 4, 2021

Yes, I think this is a bug. We are potentially changing a loop index within the loop. I'll fix. I'm surprised our Intel compiler did not pick this up.
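A minimal sketch of the pattern that gfortran 11 rejects, assuming the DO index is a host variable that a contained procedure assigns to while the loop is running. All names are hypothetical and this is not FDS code; the actual fix in the referenced commit may look different.

```fortran
PROGRAM LOOP_INDEX_DEMO
IMPLICIT NONE
INTEGER :: COUNTER, RESTART_VALUE, TOTAL

RESTART_VALUE = 3
TOTAL = 0

DO COUNTER = 1, 5
   CALL ACCUMULATE()                 ! called from within the DO loop
END DO
PRINT *, 'TOTAL = ', TOTAL

CONTAINS

SUBROUTINE ACCUMULATE()
   INTEGER :: LOCAL_COUNTER
   ! Rejected pattern: COUNTER is the index of the DO loop that calls this
   ! routine, so assigning to it here redefines the index of an active loop:
   !    COUNTER = RESTART_VALUE
   ! One possible alternative: read the index and work on a local copy.
   LOCAL_COUNTER = MAX(COUNTER, RESTART_VALUE)
   TOTAL = TOTAL + LOCAL_COUNTER
END SUBROUTINE ACCUMULATE

END PROGRAM LOOP_INDEX_DEMO
```

Reading the index inside the called procedure is fine; only assigning to it (directly or by passing it to an argument that gets modified) is the problem.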

mcgratta added a commit to mcgratta/fds that referenced this issue May 4, 2021
mcgratta added a commit that referenced this issue May 4, 2021
FDS Source: Issue #9468. Do not change increment variable within a loop
@mcgratta
Contributor

mcgratta commented May 4, 2021

I fixed what I believe was the problem. Can you try compiling again and let me know if it works.

marcfehling pushed a commit to marcfehling/fds that referenced this issue May 5, 2021
@marcfehling
Collaborator Author

marcfehling commented May 5, 2021

I checked out the 6.7.5 release and cherry-picked both of your commits c59cb75 and 0a816c0. You'll find my custom branch here. Compilation of this branch succeeds with gfortran 11.1.0.

It appears that the internal compiler error above was introduced by some commit made since the last release.

@mcgratta
Contributor

mcgratta commented May 5, 2021

I'm confused. If you checkout the latest source code from firemodels/fds and compile with gfortran 11.1.0, does it succeed? And if not, what is the error message?

@marcfehling
Collaborator Author

marcfehling commented May 5, 2021

Compilation on master fails with gfortran 11.1.0 with the internal compiler error from above.
I'll give you the terminal output on an empty folder fds/Build/mpi_gnu_linux_64.

$ ./make_fds.sh 
Building mpi_gnu_linux_64
...
mpifort -c -m64 -O2 -std=f2018 -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP=\"FDS6.7.5-1471-g6fa32e5-master\" -DGITDATE_PP=\""Wed May 5 16:47:21 2021 -0400\"" -DBUILDDATE_PP=\""May 05, 2021  16:22:29\"" -DCOMPVER_PP=\""Gnu gfortran 11.1.0"\"   ../../Source/scrc.f90
f951: internal compiler error: Segmentation fault
0xc2977f crash_signal
	/raid/fehling/packages/gcc-11.1.0/gcc/toplev.c:327
0x748ef0 gfc_sym_get_dummy_args(gfc_symbol*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/symbol.c:5256
0x7e5759 doloop_contained_procedure_code
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:2483
0x7ec7f9 gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:5320
0x7ee1c8 doloop_code
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:2627
0x7ec7f9 gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:5320
0x7ec90e gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:5644
0x7ec90e gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:5644
0x7ec90e gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:5644
0x7eda3b doloop_warn
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:3059
0x7edf2a gfc_run_passes(gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/frontend-passes.c:156
0x711817 gfc_resolve(gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:17437
0x711817 gfc_resolve(gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:17410
0x711cd9 update_current_proc_array_outer_dependency
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:3123
0x71ccc7 resolve_call
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:3750
0x721d6f gfc_resolve_code(gfc_code*, gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:12041
0x724647 gfc_resolve_blocks(gfc_code*, gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:10842
0x721c36 gfc_resolve_code(gfc_code*, gfc_namespace*)
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:11810
0x726147 resolve_codes
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:17396
0x72607e resolve_codes
	/raid/fehling/packages/gcc-11.1.0/gcc/fortran/resolve.c:17379
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make: *** [scrc.o] Error 1

@mcgratta
Contributor

mcgratta commented May 6, 2021

But you said that the latest code does compile with Gnu 10?

@marcfehling
Collaborator Author

marcfehling commented May 6, 2021

But you said that the latest code does compile with Gnu 10?

Yes, the latest code compiles with gfortran 10.


I ran a git bisect session to identify the commit at which the internal compiler error with gfortran 11 first appeared. It first appears in 684f900, introduced with #9211. This commit changed about 32k lines in scrc.f90, so it's not feasible to find the exact code snippet that triggers the error. We could submit a full bug report and hope that the error will be fixed in a future gcc release. (I don't have any experience in submitting such a bug report.)

@mcgratta
Contributor

mcgratta commented May 6, 2021

Susan -- a change that you made to the code causes the latest version of the GNU Fortran compiler to fail, but the error message does not indicate why. Could you look through the changes you made in 684f900 to see if there is something that might be "legal" but could still cause a compilation problem? Sometimes these kinds of errors point to bad programming practice.

@drjfloyd
Contributor

drjfloyd commented May 6, 2021

This commit also undid Randy's commits from the day before, was that intentional?

ff26958#diff-284759de040cf76249bb9a4fa31f70d7e3efd1e6c8bd70a11a3ff708290a89a4

@mcgratta
Contributor

mcgratta commented May 6, 2021

I would not think so. This is an on-going issue with Susan not updating enough.

@rmcdermo
Contributor

rmcdermo commented May 6, 2021

Jason,
I think we fixed that overwrite. I did notice it when Susan made the commit. See line 6123 in read.f90 of current source:

 6123:    IF (PART_ID/='null' .AND. SPRAY_PATTERN_TABLE == 'null' .AND. PDPA_RADIUS<TWO_EPSILON_EB) THEN

@rmcdermo
Contributor

rmcdermo commented May 6, 2021

The issue with overwrites is (was) because of two things:

  1. Susan uses two separate repositories because she likes to have scrc broken up into many different files for fast compiling.
  2. She had been waiting months to make commits. Hopefully, this is behind us.

@mcgratta
Contributor

mcgratta commented May 6, 2021 via email

@rmcdermo
Contributor

rmcdermo commented May 6, 2021

Susan's overwrite of my commit has been corrected, yes.

@rmcdermo
Contributor

rmcdermo commented May 6, 2021

I did not do it via a revert because it would have just added, then subtracted, then added again another 32k lines of code. Here is the commit that fixed things:
128c291

@SusanKilian
Collaborator

SusanKilian commented May 6, 2021 via email

@SusanKilian
Collaborator

I just had a look, but it seems that the package manager of my Arch Linux installation does not currently update to the new gfortran-11.1.0 compiler. As mentioned in the posts above, the old 10-version compiled the code without any problems, so I currently have no idea what the issue might be. Unfortunately, I won't manage to install the new 11-version in another quick way tonight. As soon as the 2nd day of the German FDS Usergroup meeting is over tomorrow, I will take care of the matter.

@marcfehling
Collaborator Author

A fellow Arch Linux user :) You can install the gcc-11 package from the AUR. This will compile it from source which may take a lot of time.

The server I was trying to compile FDS on only had gcc 4 installed, so I built gcc 11 by hand from https://gcc.gnu.org/. You may need to re-build your mpi libraries with this particular compiler (this seems to be a fortran requirement?). It would be reassuring if you could reproduce the error.

@SusanKilian
Collaborator

Hello Marc, shortly after I wrote the post, I just noticed that it is already available via AUR. I started the AUR update in the meantime, but as you say, it seems to take time. Either way, there doesn't seem to be a quick fix. I will also pay attention to the MPI libraries.

@SusanKilian
Collaborator

SusanKilian commented May 7, 2021

In the meantime I have installed gfortran 11.1.0, see

$ mpifort --version
GNU Fortran (GCC) 11.1.0
Copyright (C) 2021 Free Software Foundation, Inc.

and freshly compiled the code with it (after a fresh remote update to pull request #9491, 'add Contents line to PDF of Config Management Plan').

I also get the internal compiler error, though not while compiling scrc.f90 but after the compilation of fire.f90. Here is the output of my compilation.

mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP="FDS6.7.5-965-gfecb325ec-dirty-master" -DGITDATE_PP=""Thu Feb 11 14:58:42 2021 +0100"" -DBUILDDATE_PP=""May 07, 2021 20:12:23"" -DCOMPVER_PP=""Gnu gfortran 11.1.0"" ../../Source/scrc.f90
mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP="FDS6.7.5-965-gfecb325ec-dirty-master" -DGITDATE_PP=""Thu Feb 11 14:58:42 2021 +0100"" -DBUILDDATE_PP=""May 07, 2021 20:12:23"" -DCOMPVER_PP=""Gnu gfortran 11.1.0"" ../../Source/ccib.f90
mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP="FDS6.7.5-965-gfecb325ec-dirty-master" -DGITDATE_PP=""Thu Feb 11 14:58:42 2021 +0100"" -DBUILDDATE_PP=""May 07, 2021 20:12:23"" -DCOMPVER_PP=""Gnu gfortran 11.1.0"" -fopenmp ../../Source/radi.f90
mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP="FDS6.7.5-965-gfecb325ec-dirty-master" -DGITDATE_PP=""Thu Feb 11 14:58:42 2021 +0100"" -DBUILDDATE_PP=""May 07, 2021 20:12:23"" -DCOMPVER_PP=""Gnu gfortran 11.1.0"" ../../Source/part.f90
mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP="FDS6.7.5-965-gfecb325ec-dirty-master" -DGITDATE_PP=""Thu Feb 11 14:58:42 2021 +0100"" -DBUILDDATE_PP=""May 07, 2021 20:12:23"" -DCOMPVER_PP=""Gnu gfortran 11.1.0"" ../../Source/vege.f90
mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP="FDS6.7.5-965-gfecb325ec-dirty-master" -DGITDATE_PP=""Thu Feb 11 14:58:42 2021 +0100"" -DBUILDDATE_PP=""May 07, 2021 20:12:23"" -DCOMPVER_PP=""Gnu gfortran 11.1.0"" -fopenmp ../../Source/mass.f90
mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP="FDS6.7.5-965-gfecb325ec-dirty-master" -DGITDATE_PP=""Thu Feb 11 14:58:42 2021 +0100"" -DBUILDDATE_PP=""May 07, 2021 20:12:23"" -DCOMPVER_PP=""Gnu gfortran 11.1.0"" ../../Source/wall.f90
mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP="FDS6.7.5-965-gfecb325ec-dirty-master" -DGITDATE_PP=""Thu Feb 11 14:58:42 2021 +0100"" -DBUILDDATE_PP=""May 07, 2021 20:12:23"" -DCOMPVER_PP=""Gnu gfortran 11.1.0"" ../../Source/fire.f90
f951: internal compiler error: Segmentation fault
0x16db598 internal_error(char const*, ...)
???:0
0x760730 gfc_sym_get_dummy_args(gfc_symbol*)
???:0
0x8089c9 gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
???:0
0x8089c9 gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
???:0
0x808ade gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
???:0
0x808ade gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
???:0
0x808ade gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int (*)(gfc_expr**, int*, void*), void*)
???:0
0x80a16a gfc_run_passes(gfc_namespace*)
???:0
0x727363 gfc_resolve(gfc_namespace*)
???:0
0x73884f gfc_resolve_code(gfc_code*, gfc_namespace*)
???:0
0x73b147 gfc_resolve_blocks(gfc_code*, gfc_namespace*)
???:0
0x738716 gfc_resolve_code(gfc_code*, gfc_namespace*)
???:0
0x72733a gfc_resolve(gfc_namespace*)
???:0
0x719909 gfc_parse_file()
???:0
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See https://bugs.archlinux.org/ for instructions.

I really want to help, but I'm afraid I'm not quite clear on how to fix something here. The Intel compiler doesn't complain about anything related to the compilation of scrc.f90, and I also don't get any error message for scrc.f90 with gfortran 11.1.0.

Is it not possible that this is really still an internal compiler error that seems non-deterministic in some way?

Or rather, @marcfehling, how did you associate this error with scrc.f90 in the bisect mentioned above? Can you give me any more hints on how to nail down the problem?

@mcgratta
Contributor

mcgratta commented May 7, 2021

Susan or Marc

Could you comment out the lines with FINDLOC in scrc.f90 and soot.f90 and then try to compile again. I tried to compile the code with the beta Fortran compiler ifx that comes with oneAPI, and I get "internal errors" for lines with FINDLOC. I think the syntax of this line is OK, but maybe there is some problem with the function. It is a relatively new Fortran feature.

@SusanKilian
Collaborator

SusanKilian commented May 7, 2021 via email

@marcfehling
Collaborator Author

marcfehling commented May 7, 2021

Or rather, @marcfehling, how did you associate this error with scrc.f90 in the bisect mentioned above? Can you give me any more hints on how to nail down the problem?

I assume you used the make_fds.sh script to compile? The script compiles fds with 4 jobs, so multiple files may be compiled at once. The error message that you get does not necessarily correspond to the last call of mpifort.

Usually make will point you to the file that created the error. It's the line that immediately follows the output you posted. For me, it is:

make: *** [scrc.o] Error 1

Could you comment out the lines with FINDLOC in scrc.f90 and soot.f90 and then try to compile again. I tried to compile the code with the beta Fortran compiler ifx that comes with oneAPI, and I get "internal errors" for lines with FINDLOC. I think the syntax of this line is OK, but maybe there is some problem with the function. It is a relatively new Fortran feature.

I commented out these lines as requested in both files. The code might not make any sense then, but we just want to see if we can compile it. Anyways, I get the exact same internal compiler error, so the cause might be something different. Let's see if Susan can reproduce it.

@mcgratta
Contributor

mcgratta commented May 7, 2021 via email

@SusanKilian
Collaborator

I could implement the functionality of the FINDLOC by myself, if you like ...
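A minimal sketch of such a hand-written stand-in, assuming the FINDLOC calls in question search a rank-1 integer array for the first match (the thread does not show the actual call sites, so the array type and all names here are hypothetical):

```fortran
MODULE SEARCH_UTILS_DEMO             ! hypothetical helper module, not part of FDS
IMPLICIT NONE
CONTAINS
! Return the index of the first element equal to VAL, or 0 if there is none
! (the same convention as FINDLOC(ARRAY, VAL, DIM=1) for a rank-1 array).
PURE INTEGER FUNCTION FIRST_INDEX_OF(ARRAY, VAL) RESULT(IDX)
   INTEGER, INTENT(IN) :: ARRAY(:), VAL
   INTEGER :: I
   IDX = 0
   DO I = 1, SIZE(ARRAY)
      IF (ARRAY(I) == VAL) THEN
         IDX = I
         RETURN
      END IF
   END DO
END FUNCTION FIRST_INDEX_OF
END MODULE SEARCH_UTILS_DEMO

PROGRAM FINDLOC_DEMO
USE SEARCH_UTILS_DEMO, ONLY: FIRST_INDEX_OF
IMPLICIT NONE
INTEGER :: IDS(5) = [7, 3, 9, 3, 1]
! Intrinsic (Fortran 2008) and hand-written version side by side
PRINT *, FINDLOC(IDS, 3, DIM=1), FIRST_INDEX_OF(IDS, 3)
PRINT *, FINDLOC(IDS, 42, DIM=1), FIRST_INDEX_OF(IDS, 42)
END PROGRAM FINDLOC_DEMO
```

Both columns of the output should agree; the hand-written loop simply avoids relying on the compiler's FINDLOC implementation.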

Concerning the gfortran 11.1.0 compiler error: I've been trying to understand the problem for quite a while now, but unfortunately without success so far. For example, I split scrc into two parts that are compiled separately, and successively moved more and more modules into the first part to narrow down where the error first occurs (so that the compilation error only occurs in the second part). At first sight, this suggests that it happens in module scarc_methods. There, I successively reduced subroutines to empty stubs (commenting out their contents) to see at what point the second part compiles again. But everything I do here changes the result in a non-deterministic way. Depending on what I comment out, or even on the order in which the subroutines are listed, the point at which compilation succeeds again changes. So the error seems to occur sometimes in one routine and sometimes in another (among them routines that have been used unchanged for years). There is no clear picture yet; it seems more or less random.

In another test I commented out all MPI calls, but also without success.

I have to do some more thinking to come up with another testing strategy. Do you perhaps have another idea of how or what to test further? Maybe it would actually make sense if I sent a bug report to GNU, since the whole behavior seems very non-deterministic to me.

@mcgratta
Copy link
Contributor

mcgratta commented May 8, 2021 via email

@marcfehling
Collaborator Author

@SusanKilian Are you able to compile your separate ScaRC repository with gfortran 11? I see you separated the scarc.f90 file into a number of files here. The compiler would give you a hint as to which file it encounters the error in.

@SusanKilian
Collaborator

@marcfehling, that is a very good hint and I have already checked that. But it shows the same behavior, which is not surprising since scrc is nothing else than the merged version of these scarc_XXX routines in the appropriate order. It breaks in the same module as for scrc, but without a corresponding error message and the procedure described above (successively commenting out of individual segments) in this module shows the same non-deterministic and non-targeting behavior. It must be a more overarching effect that is not yet clear to me.

@SusanKilian
Collaborator

After very extensive testing (successively commenting out and re-inserting code segments), it now looks like I have a compilable version. On the one hand, I have made my USE statements stricter. On the other hand, I had to change the position of the routine SCARC_METHOD_COARSE within its module. To be honest, I still don't understand the exact reason; I'm still trying to figure it out by comparing against the now compilable version, so that it can hopefully be avoided in the future. I am actively working on it. I would also very much like to rerun my test cases to make sure that all these tests have not introduced new side effects.

Many effects during testing seemed to occur more or less randomly, and the results were very often non-deterministic. Perhaps the new compiler really is not yet completely stable, as the error message above also suggests. At the very least, it seems to be much pickier than the Intel compiler or its predecessor, version 10.

But I still have a question: Is there a particular reason that some routines (e.g. main) explicitly call 'USE MPI_F08', although it is already available through 'USE GLOBAL_CONSTANTS', which appears before it? I had some of these redundancies in scrc as well and have removed them in the currently running version.

@mcgratta
Contributor

mcgratta commented May 10, 2021 via email

@SusanKilian
Collaborator

SusanKilian commented May 11, 2021

Regarding the USE-statements: I think that it probably doesn't matter or that the compiler should resolve/optimize possible redundancies.
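To illustrate why a redundant USE is legal, here is a minimal sketch in which the intrinsic module ISO_FORTRAN_ENV stands in for MPI_F08 and a hypothetical CONSTANTS_DEMO module stands in for GLOBAL_CONSTANTS (neither substitution comes from the FDS source). Entities that a module obtains by USE are public by default, so they are re-exported, and reaching the same entity through two USE paths is allowed.

```fortran
MODULE CONSTANTS_DEMO                ! hypothetical stand-in for GLOBAL_CONSTANTS
USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: REAL64, OUTPUT_UNIT
IMPLICIT NONE
PUBLIC                               ! default: used entities are re-exported too
REAL(REAL64), PARAMETER :: PI = 3.141592653589793_REAL64
END MODULE CONSTANTS_DEMO

PROGRAM USE_DEMO
USE CONSTANTS_DEMO                   ! already provides REAL64 and OUTPUT_UNIT
! Redundant but legal: REAL64 is the same entity on both paths
USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: REAL64
IMPLICIT NONE
REAL(REAL64) :: CIRCUMFERENCE
CIRCUMFERENCE = 2.0_REAL64*PI
WRITE(OUTPUT_UNIT,*) 'Circumference of the unit circle: ', CIRCUMFERENCE
END PROGRAM USE_DEMO
```

Removing the second USE statement changes nothing for the program, which is consistent with the redundancy turning out not to be critical for compilation.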

If you want, I could upload the new version now. To summarize again, this one includes

  1. deleting redundant MPI_F08-use statements where it was already included via GLOBAL_CONSTANTS before (which, however, is apparently not critical for the compilation with the 11 version )
  2. the stricter application of some other USE statements (apparently also not critical)
  3. the removal of a supposed variable redefinition, which has not caused any problems with other compilers so far (critical here)
  4. moving a subroutine to a different location in the same module (critical here).

If you prefer, I can also distribute these changes in several PRs.

The necessity of action 4 is really not clear to me at all. I therefore still suspect that the new compiler version is not quite stable yet, and I would be interested to know whether this version (once uploaded) can be compiled by others using the new compiler.

@mcgratta
Contributor

mcgratta commented May 11, 2021 via email

@SusanKilian
Collaborator

Yes, probably that is exactly the reason. Although, as I said, this routine has been in place for years. Anyway. I actually consistently feel that the GNU compiler is more strict than Intel and helps parse the code even more strictly.

Meanwhile, I have compiled the code with Intel and Gnu (with and without MKL). It seems to work for me. My tests also seem to run. I'm about to send out the PR and hope that everything is now running with it.

@SusanKilian
Collaborator

I have sent the PR. @marcfehling, could you please check if the code also compiles for you?

@marcfehling
Collaborator Author

marcfehling commented May 11, 2021

I can confirm that #9500 allows me to successfully build the fds master branch with gfortran 11.1.0. Nice job Susan!

@SusanKilian
Collaborator

@marcfehling: I am relieved, thank you Marc for testing!

@mcgratta
Contributor

Thank you both for working on this issue. These compiler errors are tricky, but we think it is well worth the effort to ensure that FDS compiles on all platforms.

@marcfehling
Collaborator Author

marcfehling commented May 13, 2021

As a follow-up: I know that you are doing nightly builds for fds to check its integrity, but would it make sense to also think about Continuous Integration for multiple compilers and compiler versions?

github-actions is easy to set up and is even free for public repositories. Check out this webpage for further information.

As I see, you are already preparing smokeview to use this service: firemodels/smv#1051

@marcfehling
Collaborator Author

marcfehling commented May 13, 2021

Further, speaking from experience: Compiling code with more rigorous warning flags helps to increase code quality. Treating warnings as errors in your CI or nightly builds forces you to fix them. This increases code stability immensely, and may even prevent compiler problems such as this one in the future.

According to the Intel documentation, something like this could be a good start: -warn all,errors
For gfortran, this would translate to: -Wall -Wextra -Werror

(EDIT: Shortly after writing these lines I realized that you already enable warnings in the *_db targets in your makefile and also in your "Firebot". Still, considering them as errors if you set up CI is worth a thought.)
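As a small, hypothetical illustration of what these flags catch (this is not FDS code): gfortran's -Wextra enables -Wcompare-reals, which warns about equality comparisons between reals, and -Werror would turn that warning into a build failure. The sketch below keeps the flagged line in a comment and uses a tolerance-based comparison instead, so it should compile cleanly with the flags above.

```fortran
! Assumed compile command: gfortran -Wall -Wextra -Werror warn_demo.f90
PROGRAM WARN_DEMO
IMPLICIT NONE
REAL :: A, B
A = 0.1 + 0.2
B = 0.3
! Flagged by -Wextra (-Wcompare-reals), and an error under -Werror:
!    IF (A == B) PRINT *, 'equal'
! Tolerance-based comparison instead:
IF (ABS(A-B) <= 10.0*EPSILON(A)) THEN
   PRINT *, 'A and B agree to within the tolerance'
ELSE
   PRINT *, 'A and B differ'
END IF
END PROGRAM WARN_DEMO
```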

@gforney
Contributor

gforney commented May 13, 2021 via email

@emanuelegissi
Collaborator

emanuelegissi commented May 13, 2021 via email

@SusanKilian
Collaborator

Another good tip, Marc, thank you. I have been able to locate some further unclean lines via -Werror and -Wextra. The option -Wall was already included. Adding these two additional flags for testing also results in more messages in other routines.

9 participants