-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build fails with gfortran 11.1.0 #9468
Comments
What is the last known release of the GNU compiler that successfully builds FDS? Maybe it's just an issue with my machine. I'd be willing to try other versions. |
10 should work, not that long ago I used it. Also there is a wiki page on how to compile. |
Thanks. I will try that one out.
GCC 11 is an official release (fresh from two days ago), see https://gcc.gnu.org/. I've just built gcc 11 along with openmpi manually on our server, which otherwise only provides gcc 4.8.5 via the OS. As far as I know FDS requires at least gcc 9 now. We use CentOS 7. |
I can build FDS 6.7.5 with gfortran 10.3.0 successfully! But I still get the same error when building the master branch. This seems to be a separate issue introduced with #9361. |
We have introduced coding conventions that are compatible with the Fortran 2018 standard. This line
instructs the compile not to assume anything about the data types or external routines. We also started adhering to the MPI 2008 Fortran binding conventions. You may want to try the new Intel oneAPI compilers. They are now free, and can be installed on all platforms. |
I updated the -std flag for gnu targets to 2018. That seems to take care of the IMPLICIT NONE (TYPE,EXTERNAL) error. |
I can confirm that your change fixes the So I updated the -std flag as you did in my local copy of the Further, I tried to compile the current
As this error occurs in compiling |
Yes, I think this is a bug. We are potentially changing a loop index within the loop. I'll fix. I'm surprised our Intel compiler did not pick this up. |
FDS Source: Issue #9468. Do not change increment variable within a loop
I fixed what I believe was the problem. Can you try compiling again and let me know if it works. |
…ithin a loop Conflicts: Source/main.f90
I checked out the It appears that the above internal compiler error will be triggered by some commit since the last release. |
I'm confused. If you checkout the latest source code from firemodels/fds and compile with gfortran 11.1.0, does it succeed? And if not, what is the error message? |
Compilation on
|
But you said that the latest code does compile with Gnu 10? |
Yes, the latest code compiles with I ran a |
Susan -- a change that you made to the code caused the latest version of the Gnu Fortran compiler to fail, but the error message does not indicate why. Could you look through your changes made in 684f900 |
This commit also undid Randy's commits from the day before, was that intentional? ff26958#diff-284759de040cf76249bb9a4fa31f70d7e3efd1e6c8bd70a11a3ff708290a89a4 |
I would not think so. This is an on-going issue with Susan not updating enough. |
Jason,
|
The issue with overwrites is (was) because of two things:
|
Has this been corrected then?
… |
Susan's overwrite of my commit has been corrected, yes. |
I did not do it via a revert because it would have just added, then subtracted, then added again another 32k lines of code. Here is the commit that fixed things: |
I‘ll have a look as soon the FDS Usergroup meeting today will be over
Susan
… Am 06.05.2021 um 15:52 schrieb Kevin McGrattan ***@***.***>:
I would not think so. This is an on-going issue with Susan not updating enough.
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
I just had a look, but it seems that the package manager of my Arch Linux installation does not currently update to the new gfortran-11.1.0 compiler. As mentioned in the posts above, the old 10-version compiled the code without any problems, so I currently have no idea what the issue might be. Unfortunately, I won't manage to install the new 11-version in another quick way tonight. As soon as the 2nd day of the German FDS Usergroup meeting is over tomorrow, I will take care of the matter. |
A fellow Arch Linux user :) You can install the The server I was trying to compile FDS on only had |
Hello Marc, shortly after I wrote the post, I just noticed that it is already available via AUR. I started the AUR update in the meantime, but as you say, it seems to take time. Either way, there doesn't seem to be a quick fix. I will also pay attention to the MPI libraries. |
In the meantime I have installed gfortran 11.1.0, see $mpifort -- version and freshly compiled the code with it (after a fresh remote update to pull request #9491, 'add Contents line to PDF of Config Management Plan'). I also get the error message with the internal compiler error, but not with the compilation of the routine scrc.f90, but after the compilation of fire.f90. Here is the output of my compilation. mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -cpp -DGITHASH_PP="FDS6.7.5-965-gfecb325ec-dirty-master" -DGITDATE_PP=""Thu Feb 11 14:58:42 2021 +0100"" -DBUILDDATE_PP=""May 07, 2021 20:12:23"" -DCOMPVER_PP=""Gnu gfortran 11.1.0"" ../../Source/scrc.f90 I really want to help, but I'm afraid I'm not quite clear on how to fix something here. The Intel compiler doesn't complain about anything related to the compilation of scrc.f90 and I also don't get any error message with gfortran 11.1.0. Is it not possible that this is really still an internal compiler error that seems non-deterministic in some way? Or rather @marcfehling, in which way could you associate this error with scrc.f90 concerning the upper mentioned bisect? Can you give me any more hints on how to nail down the problem? |
Susan or Marc Could you comment out the lines with FINDLOC in scrc.f90 and soot.f90 and then try to compile again. I tried to compile the code with the beta Fortran compiler ifx that comes with oneAPI, and I get "internal errors" for lines with FINDLOC. I think the syntax of this line is OK, but maybe there is some problem with the function. It is a relatively new Fortran feature. |
I‘ll check that and let you know tomorrow,
Susan
… Am 07.05.2021 um 21:35 schrieb Kevin McGrattan ***@***.***>:
Susan or Marc
Could you comment out the lines with FINDLOC in scrc.f90 and soot.f90 and then try to compile again. I tried to compile the code with the beta Fortran compiler ifx that comes with oneAPI, and I get "internal errors" for lines with FINDLOC. I think the syntax of this line is OK, but maybe there is some problem with the function. It is a relatively new Fortran feature.
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
I assume you used the Usually
I commented out these lines as requested in both files. The code might not make any sense then, but we just want to see if we can compile it. Anyways, I get the exact same internal compiler error, so the cause might be something different. Let's see if Susan can reproduce it. |
OK, it is possible that the beta Intel Fortran compiler has an issue with
FINDLOC. If I comment them out, the code compiles. But this has nothing to
do with Gnu.
… |
I could implement the functionality of the FINDLOC by myself, if you like ... Concerning the gfortran 11.1.0 compiler error: I've been trying to understand the problem for quite a while now, but unfortunately without success so far: For example, I've split scrc into 2 routines, each compiled separately, successively packing more and more modules into the first routine to better narrow down the first occurrence of the error (such that the compilation error only occurs in routine 2). This suggests at first sight that it happens in Module scarc_methods. There again, I successively restricted subroutines to only their call (and commented away the contents) to see at what point even the second routine compiles again. But everything I do here changes the result in a non-deterministic way. Depending on what I'm commenting away, or even in what order each subroutine is listed, the result changes from when compilation is possible again. So it seems to happen sometimes in this or that routine (among them routines, which have been used for years in an unchanged way). So, there is no clear picture yet, but seems more or less random. In another test I commented away all MPI calls, but also without success. I have to do some more thinking to come up with another testing strategy. Do you probably have another idea, how/what to test more? Maybe it would actually make sense if I sent a bug report to Gnu, since the whole behavior seems very non-deterministic to me. |
Do not change the FINDLOC calls. I think there must be a bug in the beta
version of the Intel compiler, which we are not using. I see nothing wrong
with your FINDLOC calls. There must be something else that the Gnu compiler
is having trouble with.
… |
@SusanKilian Are you able to compile your separate ScaRC repository with |
@marcfehling, that is a very good hint and I have already checked that. But it shows the same behavior, which is not surprising since scrc is nothing else than the merged version of these scarc_XXX routines in the appropriate order. It breaks in the same module as for scrc, but without a corresponding error message and the procedure described above (successively commenting out of individual segments) in this module shows the same non-deterministic and non-targeting behavior. It must be a more overarching effect that is not yet clear to me. |
After a very extensive testing (with successive commenting away and re-inserting code segments) it now looks like I have a compilable version. On the one hand I have reworked my USE statements again more strictly. On the other hand I had to change the position of the routine SCARC_METHOD_COARSE in the corresponding module. To be honest, I still don't understand what the exact reason is, or rather I'm still trying to figure it out (comparing to the now compilable version), so that it can hopefully be avoided in the future. I am actively working on it. Also, I would very much like to check my test cases again to make sure that no new side effect has been created by all these tests. Very many effects during testing seemed to occur more or less randomly and the results were very often non-deterministic. Perhaps it is really the case that the new compiler is not yet completely stable as the upper error message also suggests. At least it seems to be much more picky than the Intel compiler or its 10 predecessor. But I still have a question: Is there a certain reason that in some routines (e.g. main) 'USE MPI_F08' is called explicitly, although it is also contained in 'USE GLOBAL_CONSTANTS', which was called before. I had some of these redundancies in scrc as well and have removed them in the currently running version. |
I do not know whether it is best to add USE statements at the very top of
the "tree" of calling routines, or to use it only within subroutines that
make MPI calls. Originally, I only had MPI calls within main.f90, and one
USE statement. Now there are MPI calls spread throughout the code, and we
have to add the USE statements in many places. I don't know the best
strategy.
… |
Regarding the USE-statements: I think that it probably doesn't matter or that the compiler should resolve/optimize possible redundancies. If you want, I could upload the new version now. To summarize again, this one includes
If you prefer, I can also distribute these changes in several PRs. The necessity of the action 4. is really not clear to me at all. Therefore I still suspect that the new compiler version is not quite stable yet. I would therefore be interested to know if this version (once uploaded) can be compiled by others using the new compiler. |
I have seen this sort of behavior many times with many compilers. I do not
know enough about compilers to understand why legitimate lines of Fortran
cause problems. As long as the changes needed to compile with Gnu 11 do not
cause numerical issues with your routines, I suggest you make them and make
sure that the new version of the code compiles with both Intel and Gnu.
Sometimes the more strict Gnu compiler points out potential problems with
our coding practices, often related to "scoping".
It could be that moving the subroutine is related to order of compilation.
Sometimes a module at the end of the file USEs a module at the start. Or
maybe the parallel compilation (like 4 files at once) is causing confusion.
… |
Yes, probably that is exactly the reason. Although, as I said, this routine has been in place for years. Anyway. I actually consistently feel that the GNU compiler is more strict than Intel and helps parse the code even more strictly. Meanwhile, I have compiled the code with Intel and Gnu (with and without MKL). It seems to work for me. My tests also seem to run. I'm about to send out the PR and hope that everything is now running with it. |
I have sent the PR. @marcfehling, could you please check if the code also compiles for you? |
I can confirm that #9500 allows me to successfully build the |
@marcfehling: I am relieved, thank you Marc for testing! |
Thank you both for working this issue. These compiler errors are tricky, but we think it is very worth the effort to ensure that FDS compiles on all platforms. |
As a follow-up: I know that you are doing nightly builds for
As I see, you are already preparing smokeview to use this service: firemodels/smv#1051 |
Further, talking from experience: Compiling code with more rigorous warning flags helps to increase code quality. Treating them as errors in your CI or nightly builds forces you to fix them. This increases code stability immensely, and maybe even prevents compiler problems as this one in the future. According to the Intel documentation, something like this could be a good start: (EDIT: Shortly after writing these lines I realized that you already enable warnings in the |
Nightly builds are only the last step after the code is checked with
scripts firebot and smokebot .
We use these scripts to test fds and smokeview. Firebot runs every night,
builds the software, runs over 500 test cases and in addition to verifying
that the code builds correctly, checks numerous results against known
values. Firebot takes about 7 hours to run on 20 or more compute nodes.
The nightly builds are generated from repo revisions from the last firebot
pass.
Smokebot is similar but runs whenever changes are pushed to our fds and smv
central repos. It focuses on smokeview verification. It takes about 1 hour
to run on the same number of processors.
…On Wed, May 12, 2021, 11:59 PM Marc Fehling ***@***.***> wrote:
As a follow-up: I know that you are doing nightly builds for fds to check
its integrity, but would it make sense to also think about Continuous
Integration?
github-actions is easy to set up and is even free for public
repositories. Check out this webpage <https://github.com/features/actions>
for further information.
As I see, you are already preparing smokeview to use this service:
firemodels/smv#1051 <firemodels/smv#1051>
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#9468 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AC6UCRRIHWY2WQ7L7IQDDCTTNNFATANCNFSM4325OHTQ>
.
|
They *have* continuous integration. And it is so good that allows the small
FDS team to give us (for free) the only state of the art, usable,
documented, validated, and stable tool for fire simulation.
Thank you again.
…On Thu, May 13, 2021, 11:45 gforney ***@***.***> wrote:
Nightly builds are only the last step after the code is checked with
scripts firebot and smokebot .
We use these scripts to test fds and smokeview. Firebot runs every night,
builds the software, runs over 500 test cases and in addition to verifying
that the code builds correctly, checks numerous results against known
values. Firebot takes about 7 hours to run on 20 or more compute nodes.
The nightly builds are generated from repo revisions from the last firebot
pass.
Smokebot is similar but runs whenever changes are pushed to our fds and smv
central repos. It focuses on smokeview verification. It takes about 1 hour
to run on the same number of processors.
On Wed, May 12, 2021, 11:59 PM Marc Fehling ***@***.***>
wrote:
> As a follow-up: I know that you are doing nightly builds for fds to check
> its integrity, but would it make sense to also think about Continuous
> Integration?
>
> github-actions is easy to set up and is even free for public
> repositories. Check out this webpage <
https://github.com/features/actions>
> for further information.
>
> As I see, you are already preparing smokeview to use this service:
> firemodels/smv#1051 <firemodels/smv#1051>
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub
> <#9468 (comment)>,
> or unsubscribe
> <
https://github.com/notifications/unsubscribe-auth/AC6UCRRIHWY2WQ7L7IQDDCTTNNFATANCNFSM4325OHTQ
>
> .
>
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#9468 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACXMVY2ILO3O37JA2E5SY53TNONTXANCNFSM4325OHTQ>
.
|
Another good tip, Marc, thank you. I have been able to locate some further unclean lines via -Werror and -Wextra. The option -Wall was already included. Adding these two additional flags for testing also results in more messages in other routines. |
I've been trying to build FDS with the latest GNU Fortran compiler 11.1.0 and the latest release of OpenMPI 4.4.1.
However, if building FDS with this configuration either on the latest release 6.7.5 or the master branch, compilation fails with different errors as follows:
6.7.5
master
The text was updated successfully, but these errors were encountered: