Fortran+C configure test fails when LTO LDFLAGS are specified #12674

Open
eli-schwartz opened this issue Jul 11, 2024 · 5 comments
@eli-schwartz

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

5.0.3 tarball

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Using Gentoo Portage (I am currently working to upgrade Gentoo from openmpi 4.1.6 to 5.0.3).


Details of the problem

I tried to build with the following *FLAGS to optimize the build: -flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing

Note that the -Werror=* flags are used to help detect cases where the compiler optimizes by assuming UB cannot exist in the source code. If UB does exist, the code would ordinarily be miscompiled, and these flags turn that miscompilation into a fatal error.

I wasn't able to get ./configure to finish successfully.

I got this error:

*** Fortran compiler
checking for x86_64-pc-linux-gnu-gfortran... x86_64-pc-linux-gnu-gfortran
checking whether the compiler supports GNU Fortran... yes
checking whether x86_64-pc-linux-gnu-gfortran accepts -g... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/x86_64-pc-linux-gnu-nm -B
checking the name lister (/usr/bin/x86_64-pc-linux-gnu-nm -B) interface... BSD nm
configure: WARNING: Open MPI now ignores the F77 and FFLAGS environment variables; only the FC and FCFLAGS environment variables are used.
checking whether ln -s works... yes
checking if Fortran compiler works... yes
checking for extra arguments to build a shared library... none needed
checking for x86_64-pc-linux-gnu-gfortran warnings flags... none
checking for Fortran flag to compile .f files... none
checking for Fortran flag to compile .f90 files... none
checking if Fortran compilers preprocess .F90 files without additional flag... yes
checking to see if Fortran compilers need additional linker flags... none
checking  external symbol convention... single underscore
checking if C and Fortran are link compatible... no
**********************************************************************
It appears that your Fortran compiler is unable to link against
object files created by your C compiler.  This typically indicates
one of a few possibilities:

  - A conflict between CFLAGS and FCFLAGS
  - A problem with your compiler installation(s)
  - Different default build options between compilers (e.g., C
    building for 32 bit and Fortran building for 64 bit)
  - Incompatible compilers

Such problems can usually be solved by picking compatible compilers
and/or CFLAGS and FCFLAGS.  More information (including exactly what
command was given to the compilers and what error resulted when the
commands were executed) is available in the config.log file in this
directory.
**********************************************************************
configure: error: C and Fortran compilers are not link compatible.  Can not continue.

Here is the relevant snippet from config.log:

configure:30053: checking if C and Fortran are link compatible
configure:30099: x86_64-pc-linux-gnu-gcc -c -DNDEBUG -pipe -march=native -fstack-protector-all -O2 -fdiagnostics-color=always -frecord-gcc-switches -flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing  -Wformat -Werror=format-security -Werror=implicit-function-declaration -Werror=implicit-int -Werror=int-conversion -Werror=incompatible-pointer-types -finline-functions  conftest_c.c
configure:30102: $? = 0
configure:30120: x86_64-pc-linux-gnu-gfortran -o conftest -pipe -march=native -fstack-protector-all -O2 -fdiagnostics-color=always -frecord-gcc-switches -flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -Wl,-O1 -Wl,--as-needed -Wl,-z,pack-relative-relocs -flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -Wl,--defsym=__gentoo_check_ldflags__=0  conftest.f90 conftest_c.o  >&5
conftest.f90:4:23: error: type of 'testfunc' does not match original declaration [-Werror=lto-type-mismatch]
    4 |        call testfunc(1)
      |                       ^
conftest_c.c:2:5: note: return value type mismatch
    2 | int testfunc_(int a) { return a; }
      |     ^
conftest_c.c:2:5: note: type 'int' should match type 'void'
conftest_c.c:2:5: note: 'testfunc_' was previously declared here
lto1: some warnings being treated as errors
lto-wrapper: fatal error: x86_64-pc-linux-gnu-gfortran returned 1 exit status
compilation terminated.
/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../x86_64-pc-linux-gnu/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
configure:30120: $? = 1
configure: failed program was:
|       program main
| 
|        external testfunc
|        call testfunc(1)
| 
|       end
configure:30147: result: no
configure:30173: error: C and Fortran compilers are not link compatible.  Can not continue.

Full build log: openmpi-5.0.3:20240711-181726.log
Contents of autotools' config.log: config.log
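The config.log excerpt shows where the mismatch comes from: `external testfunc` followed by `call testfunc(1)` tells gfortran that `testfunc` is a subroutine, so under LTO its return type is `void`, while conftest_c.c defines `int testfunc_(int a)`. Fortran also passes arguments by reference by default, so the C parameter type does not line up either. Below is a minimal sketch of a C half that agrees with that Fortran call site. It assumes gfortran's single-underscore mangling (which configure detected) and that default INTEGER corresponds to C `int`; `testfunc_last_arg` is a hypothetical variable added only so the call can be observed. This has not been verified against gfortran's LTO type checker, and the diff later in this thread takes the opposite approach of changing the Fortran side instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: records the argument so the call can be observed. */
static int testfunc_last_arg = 0;

/* Prototype first, then the definition. The shape matches how Fortran
 * sees "external testfunc" + "call testfunc(1)": a subroutine (no return
 * value, hence void) whose argument is passed by reference (hence int *).
 * The trailing underscore assumes gfortran's single-underscore mangling. */
void testfunc_(int *a);

void testfunc_(int *a)
{
    assert(a != NULL);      /* a by-reference argument should never be null */
    testfunc_last_arg = *a; /* read the value through the pointer */
}
```

From C, `int one = 1; testfunc_(&one);` performs the same call that `call testfunc(1)` makes from Fortran and leaves `testfunc_last_arg` equal to 1.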

@jsquyres
Member

We had a lengthy conversation about this over in openpmix/openpmix#3350.

The result of the conversation was that we went back and checked the MCA base code and validated that it is doing what we expect it to do, and we are convinced that the code is (still) correct. There may be some disagreement on that point from the OP, but we do not have the bandwidth to make the substantial changes that would be required to make LTO compilers be able to grok our tree; we're also not convinced that LTO would noticeably improve Open MPI's performance (especially since 1) the OMPI code base is already optimized down to single-digit microsecond -- and sometimes even just hundreds of nanoseconds -- overheads in critical code path, and also 2) since much of Open MPI's core functionality is invoked via function pointers that are dynamically determined at run time, there's not too much that an LTO could do, anyway). That being said, if Open MPI's code base ever breaks because a compiler determines that our code is wrong in this area, we give the OP the right to say "I told you so!" 😉

The conversation was fruitful and it was an excellent exercise to go re-validate that (we believe, at least, that) our code was written and working as intended.

PMIx ended up adding a configure-time check to see whether LTO flags were enabled and, if so, abort (since LTO will simply result in a link failure later). We might do the same here in OMPI.
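PMIx's actual check is configure (shell) logic and is not reproduced here. The decision such a check has to make can be sketched in C: scan a whitespace-separated flag string such as CFLAGS or LDFLAGS and decide whether LTO ends up enabled. `flags_request_lto` is a hypothetical name; the sketch assumes GCC/Clang spellings (`-flto`, `-flto=<n>`, `-fno-lto`, with the last one winning) and ignores shell quoting.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the flag string enables LTO, else 0.
 * Only "-flto" and "-flto=<arg>" enable it; "-fno-lto" disables it; a
 * later flag overrides an earlier one. Related options such as
 * "-flto-partition=..." are deliberately not treated as enabling LTO. */
int flags_request_lto(const char *flags)
{
    int enabled = 0;
    const char *p = flags;

    assert(flags != NULL);
    while (*p != '\0') {
        /* Skip whitespace to reach the start of the next token. */
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        const char *start = p;
        /* Advance to the end of the token. */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);

        if (len == 5 && strncmp(start, "-flto", 5) == 0)
            enabled = 1;                      /* plain -flto */
        else if (len > 5 && strncmp(start, "-flto=", 6) == 0)
            enabled = 1;                      /* -flto=4, -flto=auto, ... */
        else if (len == 8 && strncmp(start, "-fno-lto", 8) == 0)
            enabled = 0;                      /* explicit opt-out */
    }
    return enabled;
}
```

Applied to the flags on the gfortran link line in the config.log excerpt (which include `-flto=4`), this returns 1, so a check built this way could abort with a clear LTO message before any Fortran probe runs.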

Thanks, everyone, for the conversation.

@eli-schwartz
Author

> we're also not convinced that LTO would noticeably improve Open MPI's performance (especially since 1) the OMPI code base is already optimized down to single-digit microsecond -- and sometimes even just hundreds of nanoseconds -- overheads in critical code path, and also 2) since much of Open MPI's core functionality is invoked via function pointers that are dynamically determined at run time, there's not too much that an LTO could do, anyway

Yup -- my primary concern is actually just that people will have it enabled system-wide and then ompi gets issues.

That being said -- in this specific case it's actually a configure-time error caused by the configure probes that check for a working Fortran compiler, so getting those probes working shouldn't require a redesign of MCA or of ompi's function-pointer design. (I think it's just an issue in how the configure test passes information between Fortran and C. Basically, Fortran is claiming the function returns void.) Maybe it's not worth it if you aren't going to make the entire codebase support LTO anyway, but...

... if you're going to add a configure check and abort early, make sure to add it somewhere early in configure.ac, at least before checking for a Fortran compiler. :D The most confusing part of this is actually that it errored out and said "nope, sorry, your Fortran compiler is broken and cannot compile code".

@jsquyres
Member

Fair enough. I did actually start to investigate the Fortran compiler configure test failure yesterday and was able to replicate the issue. I ran out of time before figuring out the root cause (I'm not a Fortran expert). I'll keep poking at this, but to be honest: probably with only medium-level priority.

Suggestions for how to fix the test would be appreciated, if anyone else knows more about Fortran+C+LTO issues.

@jsquyres reopened this on Jul 23, 2024
@jsquyres changed the title from "Build fails with LTO" to "Fortran+C configure test fails when LTO LDFLAGS are specified" on Jul 23, 2024
@bosilca
Member

bosilca commented Jul 24, 2024

I took a stab at this, and after spending a few hours I reached the conclusion that this is a lot of grunt work for very little benefit, other than being able to build with LTO support. The problem is that with LTO enabled, the Fortran compiler will not simply accept an EXTERNAL function or subroutine; it needs the exact prototype. In general that could have been relatively easy to handle, until one starts messing around with arrays of CHARACTER, where things get messy, or the LTO checker decides that there is a mismatch between the Fortran and C types. I started to play around with the ISO_C types, but doing this defeats the original purpose of our configure checking.

If someone is interested in continuing this, I attached a sketch of what needs to be done to all Fortran checks. Good luck!

diff --git a/config/ompi_fortran_get_sizeof.m4 b/config/ompi_fortran_get_sizeof.m4
index e25d982c58..b0229f9740 100644
--- a/config/ompi_fortran_get_sizeof.m4
+++ b/config/ompi_fortran_get_sizeof.m4
@@ -32,7 +32,13 @@ AC_DEFUN([OMPI_FORTRAN_GET_SIZEOF],[
          cat > conftestf.f90 <<EOF
 program fsize
 $1
-   external size
+    interface
+       subroutine size(x, y)
+          $2 :: x
+          $2 :: y
+       end subroutine size
+   end interface
+
    $2 :: x(2)
    call size(x(1),x(2))
 end program
@@ -52,6 +58,7 @@ $ompi_conftest_h
 #ifdef __cplusplus
 extern "C" {
 #endif
+void $ompi_ac_size_fn(char *a, char *b);
 void $ompi_ac_size_fn(char *a, char *b)
 {
     int diff = (int) (b - a);
diff --git a/config/opal_lang_link_with_c.m4 b/config/opal_lang_link_with_c.m4
index 496081f4b0..8242a1a69e 100644
--- a/config/opal_lang_link_with_c.m4
+++ b/config/opal_lang_link_with_c.m4
@@ -40,8 +40,18 @@ EOF
         LIBS="conftest_c.o $LIBS"
         m4_if(ompi_lang_link_with_c_fortran, 1,
           [AC_LINK_IFELSE([AC_LANG_PROGRAM([], [
-       external testfunc
-       call testfunc(1)
+       interface
+          function testfunc(n) result(r)
+             use, intrinsic :: iso_c_binding, only: c_int, c_char
+             implicit none
+             integer(c_int) :: r
+             integer(c_int), value :: n
+           end function testfunc
+       end interface
+
+       integer outcome
+       outcome = testfunc(1)
+
 ])],
              [AS_VAR_SET(lang_var, ["yes"])], [AS_VAR_SET(lang_var, ["no"])])],
           [AC_LINK_IFELSE([AC_LANG_PROGRAM([
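The first hunk of the diff keeps the existing sizeof probe technique: Fortran passes two adjacent array elements, x(1) and x(2), by reference, and the C side measures the byte distance between them. Below is a standalone sketch of that C half. It assumes gfortran's single-underscore mangling, so the hypothetical name `size_` stands in for whatever `$ompi_ac_size_fn` expands to, and it stores the result in a hypothetical `measured_size` variable because the diff elides what the real probe does with `diff`. Like the diff, it declares the prototype before the definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: holds the measured element size for inspection. */
static int measured_size = -1;

/* Prototype before definition, mirroring the added line in the diff.
 * "size_" assumes gfortran's single-underscore mangling. */
void size_(char *a, char *b);

void size_(char *a, char *b)
{
    assert(a != NULL && b != NULL);
    /* a and b point at x(1) and x(2), adjacent elements of the same
     * array, so their byte distance is the size of one element. The
     * subtraction is only valid because both point into one object. */
    int diff = (int) (b - a);
    measured_size = diff;
}
```

Calling `size_((char *)&x[0], (char *)&x[1])` on a C `double x[2]` sets `measured_size` to `sizeof(double)`, which is the same measurement the Fortran side triggers with `call size(x(1),x(2))`.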

@rhc54
Contributor

rhc54 commented Jul 25, 2024

FWIW: I added configure logic to both PMIx and PRRTE to detect that LTO had been requested and error out due to incompatibility. So even if you got this to work, OMPI will still fail to build when it hits either of those packages.

I'd suggest following the earlier advice and just adding the "detect LTO and error out" logic before the Fortran check in OMPI, so we don't even attempt this configury.
