Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenMPI configure script wrongly recognizes which directive to use for ignoring tkr in case of the new LLVM Fortran compiler #12506

Closed
pawosm-arm opened this issue Apr 30, 2024 · 9 comments

Comments

@pawosm-arm
Copy link

Background information

OpenMPI configure script wrongly recognizes which directive to use for ignoring tkr in case of the new LLVM Fortran compiler. This is due to the lack of the knowledge about such compiler in the https://github.com/open-mpi/ompi/blob/main/config/ompi_fortran_check_ignore_tkr.m4 script.

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

git, main branch

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

from a git clone

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

 6f81bfd163f3275d2b0630974968c82759dd4439 3rd-party/openpmix (v1.1.3-3983-g6f81bfd1)
 4f27008906d96845e22df6502d6a9a29d98dec83 3rd-party/prrte (psrvr-v2.0.0rc1-4746-g4f27008906)
 dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (heads/main)

Please describe the system on which you are running

  • Operating system/version: Ubuntu 22
  • Computer hardware: AArch64
  • Network type: LAN

Details of the problem

Consider simple program (extracted from some real HPC workload):

subroutine bad(var)
  use mpi
  implicit none
  real, intent(inout) :: var(:, :, :)
  integer :: a, b

  call MPI_RECV_INIT( var(1,1,1), 1, 0, 0, 0, 0, a, b)
end subroutine

This used to compile with any of the Fortran compilers at my disposal with OpenMPI 4.1.6. Unfortunately, this does not compile with the latest development of the OpenMPI, due to the change in the way the Fortran modules are being prepared. It now throws this error: Element of assumed-shape array may not be associated with a dummy argument 'buf=' array.

Previously, it was the ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90 file which defined the MPI_Recv_init interface as such:

interface  MPI_Recv_init
subroutine MPI_Recv_init_f08(buf,count,datatype,source,tag,comm,request,ierror)
   use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request
   implicit none
   !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf
   !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf
   !$PRAGMA IGNORE_TKR buf
   !DIR$ IGNORE_TKR buf
   !IBM* IGNORE_TKR buf
   OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf
   INTEGER, INTENT(IN) :: count, source, tag
   TYPE(MPI_Datatype), INTENT(IN) :: datatype
   TYPE(MPI_Comm), INTENT(IN) :: comm
   TYPE(MPI_Request), INTENT(OUT) :: request
   INTEGER, OPTIONAL, INTENT(OUT) :: ierror
end subroutine MPI_Recv_init_f08
end interface  MPI_Recv_init

Now this file does not define that interface anymore, currently that interface is generated by the configure script from the following template held in the mpi-f08-interfaces.h.in file:

interface  MPI_Recv_init
subroutine MPI_Recv_init_f08(buf,count,datatype,source,tag,comm,request,ierror)
   use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request
   implicit none
   @OMPI_FORTRAN_IGNORE_TKR_PREDECL@ buf
   @OMPI_FORTRAN_IGNORE_TKR_TYPE@ OMPI_ASYNCHRONOUS :: buf
   INTEGER, INTENT(IN) :: count, source, tag
   TYPE(MPI_Datatype), INTENT(IN) :: datatype
   TYPE(MPI_Comm), INTENT(IN) :: comm
   TYPE(MPI_Request), INTENT(OUT) :: request
   INTEGER, OPTIONAL, INTENT(OUT) :: ierror
end subroutine MPI_Recv_init_f08
end interface  MPI_Recv_init

The IGNORE_TKR part is handled by the following lines of the config/ompi_fortran_check_ignore_tkr.m4 script:

AC_DEFUN([_OMPI_FORTRAN_CHECK_IGNORE_TKR], [
    OPAL_VAR_SCOPE_PUSH([internal_ignore_tkr_happy ompi_fortran_ignore_tkr_predecl ompi_fortran_ignore_tkr_type])

    # If we were called here, it means that the value was not cached,
    # so we need to check several different things.  Since CACHE_CHECK
    # puts up a MSG_CHECKING, we need to terminate it with a bogus
    # answer before doing the individual checks.
    AC_MSG_RESULT([not cached; checking variants])

    # Default values
    ompi_fortran_ignore_tkr_predecl=!
    ompi_fortran_ignore_tkr_type=real

    # Vendor-neutral, TYPE(*) syntax
    OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
         [!], [type(*)],
         [TYPE(*), DIMENSION(*)],
         [internal_ignore_tkr_happy=1], [internal_ignore_tkr_happy=0])

    # GCC compilers
    AS_IF([test $internal_ignore_tkr_happy -eq 0],
          [OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
              [!GCC\$ ATTRIBUTES NO_ARG_CHECK ::], [type(*), dimension(*)],
              [!GCC\$ ATTRIBUTES NO_ARG_CHECK],
              [internal_ignore_tkr_happy=1], [internal_ignore_tkr_happy=0])])
    # Intel compilers
    AS_IF([test $internal_ignore_tkr_happy -eq 0],
          [OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
              [!DEC\$ ATTRIBUTES NO_ARG_CHECK ::], [real, dimension(*)],
              [!DEC\$ ATTRIBUTES NO_ARG_CHECK],
              [internal_ignore_tkr_happy=1], [internal_ignore_tkr_happy=0])])
    # Solaris Studio compilers
    # Note that due to a compiler bug, we have been advised by Oracle to
    # use the "character(*)" type
    AS_IF([test $internal_ignore_tkr_happy -eq 0],
          [OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
              [!\$PRAGMA IGNORE_TKR], [character(*)],
              [!\$PRAGMA IGNORE_TKR],
              [internal_ignore_tkr_happy=1], [internal_ignore_tkr_happy=0])])
    # Cray compilers
    AS_IF([test $internal_ignore_tkr_happy -eq 0],
          [OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
              [!DIR\$ IGNORE_TKR], [real, dimension(*)],
              [!DIR\$ IGNORE_TKR],
              [internal_ignore_tkr_happy=1], [internal_ignore_tkr_happy=0])])
    # IBM compilers
    AS_IF([test $internal_ignore_tkr_happy -eq 0],
          [OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
              [!IBM* IGNORE_TKR], [real, dimension(*)],
              [!IBM* IGNORE_TKR],
              [internal_ignore_tkr_happy=1], [internal_ignore_tkr_happy=0])])

    AS_VAR_SET(fortran_ignore_tkr_data,
               [${internal_ignore_tkr_happy}:${ompi_fortran_ignore_tkr_type}:${ompi_fortran_ignore_tkr_predecl}])

    # Now put the original CACHE_CHECK MSG_CHECKING back so that it can
    # output the MSG_RESULT.
    AC_MSG_CHECKING([Fortran compiler ignore TKR syntax])
    OPAL_VAR_SCOPE_POP
])dnl

Sadly, this script is prone to fall into a false positive and the configure script output looks like this:

checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
checking for Fortran compiler support of !GCC$ ATTRIBUTES NO_ARG_CHECK... yes
checking Fortran compiler ignore TKR syntax... 1:type(*), dimension(*):!GCC$ ATTRIBUTES NO_ARG_CHECK ::

The new LLVM Fortran compiler does not support any of the !GCC$ directives, but also does not fail to compile the programs containing it. Whatever check is being done here is insufficient. But the real problem is that none of the other alternatives fits perfectly. The closest is Cray, but it specifies a wrong type for OMPI_FORTRAN_IGNORE_TKR_TYPE.

The easiest solution I could recommend is to add a check for LLVM Fortran compiler and place it BEFORE the GCC compilers, e.g.:

          [TYPE(*), DIMENSION(*)],
          [internal_ignore_tkr_happy=1], [internal_ignore_tkr_happy=0])

+    # LLVM compilers
+    AS_IF([test $internal_ignore_tkr_happy -eq 0],
+          [OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
+              [!DIR\$ IGNORE_TKR], [type(*), dimension(*)],
+              [!DIR\$ IGNORE_TKR],
+              [internal_ignore_tkr_happy=1], [internal_ignore_tkr_happy=0])])
     # GCC compilers
     AS_IF([test $internal_ignore_tkr_happy -eq 0],
           [OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(

After re-generating the configure script with autogen.pl, it generates the interface that does not prevent the compilation of the example Fortran program.

@pawosm-arm
Copy link
Author

pawosm-arm commented Apr 30, 2024

Note that the CMake scripts may be affected similar way.

The top-level OpenMPI project doesn't use CMake.

@ggouaillardet
Copy link
Contributor

Thanks for the report!

I am able to reproduce it and will have a look.

FWIW, with LLVM 18.1.4, I first faced an issue with ASYNCHRONOUS (accepted in interfaces but not subroutines (!))
the inline patch below helped me

Can you please confirm which LLVM version you are running and your Open MPI configure command line?

diff --git a/config/ompi_fortran_check_asynchronous.m4 b/config/ompi_fortran_check_asynchronous.m4
index 0cc3c84bfe..061511730b 100644
--- a/config/ompi_fortran_check_asynchronous.m4
+++ b/config/ompi_fortran_check_asynchronous.m4
@@ -35,6 +35,10 @@ SUBROUTINE binky(buf)
  REAL, DIMENSION(*), ASYNCHRONOUS :: buf
 END SUBROUTINE
 END INTERFACE
+CONTAINS
+SUBROUTINE wookie(buf)
+ REAL, DIMENSION(*), ASYNCHRONOUS :: buf
+END SUBROUTINE
 END MODULE asynch_mod]])],
              [AS_VAR_SET(asynchronous_var, yes)],
              [AS_VAR_SET(asynchronous_var, no)])

@pawosm-arm
Copy link
Author

pawosm-arm commented May 1, 2024

Thanks for your prompt response!

For my experiments, I'm using fairly recent LLVM built from the main branch:

$ mpifort --version
flang-new version 19.0.0git (https://github.com/llvm/llvm-project.git 3197146cc6ae1cefe0b21326f0509add3369decb)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir:
Build config: +unoptimized, +assertions

I guess the feature you've mentioned has been implemented by this commit:

commit f31ac3cb1ff7958fadf6e3e431790f2d668583b4
Author: Peter Klausler <35819229+klausler@users.noreply.github.com>
Date:   Fri Mar 1 14:43:31 2024 -0800

    [flang] Handle implied ASYNCHRONOUS attribute (#82638)

    The standard states that data objects involved in an asynchronous data
    transfer statement gain the ASYNCHRONOUS attribute implicitly in the
    surrounding subprogram or BLOCK scope. This attribute affects the checks
    in call semantics, as an ASYNCHRONOUS actual object associated with an
    ASYNCHRONOUS dummy argument must not require data copies in or out.

    (Most compilers don't implement implied ASYNCHRONOUS attributes
    correctly; XLF gets these right, and GNU is close.)

I've verified that the git log starting from the llvmorg-18.1.4 does NOT contain this commit.

@ggouaillardet
Copy link
Contributor

thanks, that makes more sense to me now!

I will rebuild LLVM and try again.

@pawosm-arm
Copy link
Author

I think I was too fast with my response. This commit refers to implied attribute, while your code explicitly name it.
If you still encounter the problem, can you paste the error message? I haven't seen it myself yet.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue May 2, 2024
as reported in open-mpi#12506, upcoming LLVM 19 can generate some
false positive that will make the mpi Fortran modules unusable.
Harden the test by using a module in order to fix that.

Thanks Paul Osmialowski for bringing this to our attention.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
@ggouaillardet
Copy link
Contributor

I issued #12512 to address the false positives and enable IGNORE_TKR with upcoming LLVM 19.

Can you please give it a try?

janjust pushed a commit to janjust/ompi that referenced this issue May 2, 2024
as reported in open-mpi#12506, upcoming LLVM 19 can generate some
false positive that will make the mpi Fortran modules unusable.
Harden the test by using a module in order to fix that.

Thanks Paul Osmialowski for bringing this to our attention.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 8c601e1)
@janjust
Copy link
Contributor

janjust commented May 2, 2024

fixed with: #12512

@janjust janjust closed this as completed May 2, 2024
@pawosm-arm
Copy link
Author

It works great,

checking Fortran compiler ignore TKR syntax... not cached; checking variants
checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
checking for Fortran compiler support of !GCC$ ATTRIBUTES NO_ARG_CHECK... no
checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
checking Fortran compiler ignore TKR syntax... 1:type(*):!DIR$ IGNORE_TKR
checking if Fortran compiler supports ISO_C_BINDING... yes
checking if building Fortran 'use mpi' bindings... yes

In the generated mpi-f08-interfaces.h file:

interface  MPI_Recv_init
subroutine MPI_Recv_init_f08(buf,count,datatype,source,tag,comm,request,ierror)
   use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request
   implicit none
   !DIR$ IGNORE_TKR buf
   type(*) OMPI_ASYNCHRONOUS :: buf
   INTEGER, INTENT(IN) :: count, source, tag
   TYPE(MPI_Datatype), INTENT(IN) :: datatype
   TYPE(MPI_Comm), INTENT(IN) :: comm
   TYPE(MPI_Request), INTENT(OUT) :: request
   INTEGER, OPTIONAL, INTENT(OUT) :: ierror
end subroutine MPI_Recv_init_f08
end interface  MPI_Recv_init

It took me a while, but I'd also needed to verify it works with spack and finally, I've managed to build all the workload I needed! Thanks for quick solving this problem! :)

@jsquyres
Copy link
Member

jsquyres commented May 2, 2024

FYI: This fix will be included in the next v5.0.x release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants