openmpi 4.0.0: test/usnic_btl_run_tests.c:42:30: error: expected ‘)’ before ‘OMPI_LIBMPI_NAME’ #6441

Closed
georgemarselis opened this issue Feb 27, 2019 · 13 comments

Comments

@georgemarselis

georgemarselis commented Feb 27, 2019

Thank you for taking the time to submit an issue!

Background information

Trying to install openmpi by hand for one of my scientific users.

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

v4.0.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

tarball from open-mpi website

Please describe the system on which you are running

  • Operating system/version: Linux Centos 7.5
  • Computer hardware: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, 64 cores total
  • Network type: cisco gbit

Details of the problem

Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.

download && check checksum && untarball && cd openmpi-4.0.0

./configure --prefix=/lsc/openmpi/4.0.0 --enable-branch-probabilities --enable-pretty-print-stacktrace --enable-pty-support --enable-weak-symbols --enable-dlopen --enable-show-load-errors-by-default --enable-heterogeneous --enable-binaries --enable-script-wrapper-compilers --enable-per-user-config-files --enable-ipv6 --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default --enable-mpi-interface-warning --enable-sparse-groups --enable-peruse --enable-mpi-fortran --enable-mpi-cxx --enable-mpi-cxx-seek --enable-mpi1-compatibility --enable-grequest-extensions --enable-spc --enable-shared --enable-static --enable-wrapper-rpath --enable-wrapper-runpath --enable-cxx-exceptions --enable-builtin-atomics --enable-openib-udcm --enable-openib-rdmacm --enable-openib-rdmacm-ibaddr --enable-btl-portals4-flow-control --enable-opal-btl-usnic-unit-tests --enable-event-evport --enable-event-debug --enable-hwloc-pci --enable-visibility --enable-memchecker --enable-install-libpmix --enable-pmix-timing --enable-ft --enable-mpi-ext=affinity,cuda,pcollreq --enable-visibility --enable-fast-install --with-libnl --with-devel-headers --with-max-processor-name=256 --with-max-error-string=256 --with-max-object-name=64 --with-max-info-key=36 --with-max-info-val=256 --with-max-port-name=1024 --with-max-datarep-string=128 --with-zlib-libdir=/lsc/zlib/lib --with-cuda=/lsc/nvidia/cuda/8.0-GA1 --with-pmix=external --with-pmix-libdir=/usr/lib64 --with-mpi-param-check=always --with-oshmem-param-check=always --with-jdk-dir=/usr/lib/jvm/java-1.8.0 --with-cs-fs --with-ofi-libdir=/usr/lib64 --with-xpmem --with-valgrind --with-pmi=/lsc/pmix/3.1 --with-slurm=/lsc/slurm/18.08.3 --with-tm --with-sge --with-moab --with-singularity --with-pvfs2 --with-psm --with-psm2 --with-ompi-pmix-rte --with-orte --with-treematch=/lsc/treematch/1.3 LDFLAGS='-L/lsc/pmix/3.1/lib' CFLAG='-I/lsc/pmix/3.1/include'

all other software precompiled already

make # not a parallel build

... wait a bit

make[2]: Entering directory '/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/btl/usnic'
  CC       test/usnic_btl_run_tests-usnic_btl_run_tests.o
test/usnic_btl_run_tests.c: In function ‘main’:
test/usnic_btl_run_tests.c:42:30: error: expected ‘)’ before ‘OMPI_LIBMPI_NAME’
     mpi_handle = dlopen("lib" OMPI_LIBMPI_NAME ".so", RTLD_NOW|RTLD_GLOBAL);
                              ^~~~~~~~~~~~~~~~
test/usnic_btl_run_tests.c:42:18: error: too few arguments to function ‘dlopen’
     mpi_handle = dlopen("lib"OMPI_LIBMPI_NAME".so", RTLD_NOW|RTLD_GLOBAL);
                  ^~~~~~
In file included from test/usnic_btl_run_tests.c:23:0:
/usr/include/dlfcn.h:57:14: note: declared here
 extern void *dlopen (const char *__file, int __mode) __THROW;
              ^~~~~~
make[2]: *** [Makefile:2062: test/usnic_btl_run_tests-usnic_btl_run_tests.o] Error 1
make[2]: Leaving directory '/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/btl/usnic'
make[1]: *** [Makefile:2378: all-recursive] Error 1
make[1]: Leaving directory '/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal'
make: *** [Makefile:1896: all-recursive] Error 1

I think the way the concatenation-via-substitution is attempted here is wrong. I am not familiar with the coding standards for openmpi, so I apologize in advance if I sound presumptuous, but I think that a strcat() before the dlopen() would solve things.

Thank you for your time.

@jsquyres jsquyres self-assigned this Feb 27, 2019
@jsquyres
Member

OMPI_LIBMPI_NAME is a macro, and it should have quotes around it. So it should resolve to something like:

mpi_handle = dlopen("lib" "mpi" ".so", RTLD_NOW|RTLD_GLOBAL);

which is valid C.

Can you run make V=1 in the /lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/btl/usnic directory? The build should be passing -DOMPI_LIBMPI_NAME=\"mpi\" on the compiler command line.
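
For reference, the adjacent-string-literal concatenation described above can be sketched in isolation. This is a minimal illustration, not the actual test source: the helper name libmpi_filename and the fallback definition are assumptions added here so that the snippet compiles without the -D flag.

```c
/* In the real build this macro comes from the compiler command line, e.g.
 *   -DOMPI_LIBMPI_NAME=\"mpi\"
 * The fallback below is an assumption made only so this sketch compiles
 * on its own. With no definition at all, the compiler sees a bare
 * identifier between two string literals and cannot parse the call. */
#ifndef OMPI_LIBMPI_NAME
#define OMPI_LIBMPI_NAME "mpi"
#endif

/* Hypothetical helper: adjacent string literals are merged at compile
 * time, so no strcat() or runtime buffer is needed. */
static const char *libmpi_filename(void)
{
    return "lib" OMPI_LIBMPI_NAME ".so";
}
```

When the macro expands to "mpi", libmpi_filename() returns "libmpi.so", which is the string the dlopen() call would receive.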

@jsquyres
Member

This is what you should see:

$ cd opal/mca/btl/usnic
$ rm btl_usnic_test.lo
$ make V=1 btl_usnic_test.lo
depbase=`echo btl_usnic_test.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
        /bin/sh ../../../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c  -DBTL_IN_OPAL=1 -I/home/jsquyres/libfabric-current/install/include -DOMPI_LIBMPI_NAME=\"mpi\" -I../../../.. -I../../../../orte/include -I/home/jsquyres/git/ompi/opal/mca/event/libevent2022/libevent -I/home/jsquyres/git/ompi/opal/mca/event/libevent2022/libevent/include -I/home/jsquyres/git/ompi/opal/mca/hwloc/hwloc201/hwloc/include    -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -mcx16 -pthread -MT btl_usnic_test.lo -MD -MP -MF $depbase.Tpo -c -o btl_usnic_test.lo btl_usnic_test.c &&\
        mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DBTL_IN_OPAL=1 -I/home/jsquyres/libfabric-current/install/include -DOMPI_LIBMPI_NAME=\"mpi\" -I../../../.. -I../../../../orte/include -I/home/jsquyres/git/ompi/opal/mca/event/libevent2022/libevent -I/home/jsquyres/git/ompi/opal/mca/event/libevent2022/libevent/include -I/home/jsquyres/git/ompi/opal/mca/hwloc/hwloc201/hwloc/include -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -mcx16 -pthread -MT btl_usnic_test.lo -MD -MP -MF .deps/btl_usnic_test.Tpo -c btl_usnic_test.c  -fPIC -DPIC -o .libs/btl_usnic_test.o

Note the -DOMPI_LIBMPI_NAME=\"mpi\" nestled in the middle there.

@georgemarselis
Author

georgemarselis commented Feb 28, 2019

Hey! Thank you for your time! I apologize if I came across as rude. You guys are indeed doing awesome work!

Anyway, here is what you asked:

Removing and then making the .lo object does produce the .lo object. The compiler exits cleanly. Here is the output from the above 3 commands you have:

[root@intaristotle usnic]# make V=1 btl_usnic_test.lo
depbase=`echo btl_usnic_test.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c  -DBTL_IN_OPAL=1  -DOMPI_LIBMPI_NAME=\"mpi\" -I../../../.. -I../../../../orte/include -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/event/libevent2022/libevent -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/event/libevent2022/libevent/include -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/hwloc/hwloc201/hwloc/include    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -fexceptions -mcx16 -pthread -MT btl_usnic_test.lo -MD -MP -MF $depbase.Tpo -c -o btl_usnic_test.lo btl_usnic_test.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DBTL_IN_OPAL=1 -DOMPI_LIBMPI_NAME=\"mpi\" -I../../../.. -I../../../../orte/include -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/event/libevent2022/libevent -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/event/libevent2022/libevent/include -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/hwloc/hwloc201/hwloc/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -fexceptions -mcx16 -pthread -MT btl_usnic_test.lo -MD -MP -MF .deps/btl_usnic_test.Tpo -c btl_usnic_test.c  -fPIC -DPIC -o .libs/btl_usnic_test.o
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DBTL_IN_OPAL=1 -DOMPI_LIBMPI_NAME=\"mpi\" -I../../../.. -I../../../../orte/include -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/event/libevent2022/libevent -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/event/libevent2022/libevent/include -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/hwloc/hwloc201/hwloc/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -fexceptions -mcx16 -pthread -MT btl_usnic_test.lo -MD -MP -MF .deps/btl_usnic_test.Tpo -c btl_usnic_test.c -o btl_usnic_test.o >/dev/null 2>&1

-DOMPI_LIBMPI_NAME=\"mpi\" does indeed show up in the middle.

Running make V=1 immediately after the three commands above produces the following:

[root@intaristotle usnic]# make V=1
gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c  -DBTL_USNIC_RUN_TESTS_SYMBOL=\"opal_btl_usnic_run_tests\" -I../../../.. -I../../../../orte/include -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/event/libevent2022/libevent -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/event/libevent2022/libevent/include -I/lsc/sources/openmpi/4.0.0/openmpi-4.0.0/opal/mca/hwloc/hwloc201/hwloc/include    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -fexceptions -mcx16 -pthread -MT test/usnic_btl_run_tests-usnic_btl_run_tests.o -MD -MP -MF test/.deps/usnic_btl_run_tests-usnic_btl_run_tests.Tpo -c -o test/usnic_btl_run_tests-usnic_btl_run_tests.o `test -f 'test/usnic_btl_run_tests.c' || echo './'`test/usnic_btl_run_tests.c
test/usnic_btl_run_tests.c: In function ‘main’:
test/usnic_btl_run_tests.c:42:31: error: expected ‘)’ before ‘OMPI_LIBMPI_NAME’
     mpi_handle = dlopen("lib" OMPI_LIBMPI_NAME ".so", RTLD_NOW|RTLD_GLOBAL);
                               ^~~~~~~~~~~~~~~~
test/usnic_btl_run_tests.c:42:18: error: too few arguments to function ‘dlopen’
     mpi_handle = dlopen("lib" OMPI_LIBMPI_NAME ".so", RTLD_NOW|RTLD_GLOBAL);
                  ^~~~~~
In file included from test/usnic_btl_run_tests.c:23:0:
/usr/include/dlfcn.h:57:14: note: declared here
 extern void *dlopen (const char *__file, int __mode) __THROW;
              ^~~~~~
make: *** [Makefile:2062: test/usnic_btl_run_tests-usnic_btl_run_tests.o] Error 1

So, yes, your C is correct and I apologize again if I sounded presumptuous. It looks like -DOMPI_LIBMPI_NAME is missing from this second compile command.
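
One way to confirm from inside the source that a -D flag never reached a particular compile is to probe the macro with the preprocessor. The sketch below is illustrative only; the helper names libmpi_name_defined and libmpi_name_as_seen are hypothetical and are not part of Open MPI.

```c
/* Two-level stringification: XSTR(FOO) gives the expansion of FOO as a
 * string if FOO is a macro, or the literal text "FOO" if it is not. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Hypothetical helper: returns 1 if OMPI_LIBMPI_NAME was defined for
 * this translation unit (for example via -DOMPI_LIBMPI_NAME=\"mpi\"),
 * and 0 otherwise. */
static int libmpi_name_defined(void)
{
#ifdef OMPI_LIBMPI_NAME
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: the text the compiler sees in place of the macro. */
static const char *libmpi_name_as_seen(void)
{
    return XSTR(OMPI_LIBMPI_NAME);
}
```

Compiled without the flag, libmpi_name_as_seen() returns the bare identifier text, which mirrors what the parser tripped over at line 42. In real code, a stricter alternative would be an #ifndef guard with #error so that the build stops with a clear message.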

@jsquyres
Member

No worries, you weren't rude or presumptuous at all.

But hmm -- my experiences are different than yours. When I make V=1, I get the rules from the Makefile.am, which inserts the -D... in there, and makes it all work out.

More telling is that your 2nd output directly invokes gcc -- it doesn't invoke libtool (which ultimately invokes gcc, but with a bunch of supplemental CLI arguments). Why is that?

Are you running with the default CentOS 7.5 make and gcc?

@georgemarselis
Author

georgemarselis commented Mar 1, 2019

I am using the devtoolset-7 from the software collections

[root@intaristotle ~]# make --version
GNU Make 4.2.1
Built for x86_64-redhat-linux-gnu
Copyright (C) 1988-2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.


[root@intaristotle ~]# gcc --version
gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

> But hmm -- my experiences are different than yours.

I got more weirdness coming your way, but I'll file another ticket for those. Here is a preview:

In Debian testing (buster), openmpi 4.0.0 builds with the default make/gcc (apart from the issue I reported above). openmpi 3.1.3, with the exact same long ./configure line except for the version in the paths, does not.

@jsquyres
Member

jsquyres commented Mar 1, 2019

You didn't, perchance, run autogen.pl on your Open MPI tarball, did you?

In the "bad" configuration, can you send the output from the following two things:

  1. Check your local system time vs. the file server time (if you're building locally, all the timestamps will match):
$ cd opal/mca/btl/usnic
$ date '+%Y-%m-%d %H:%M:%S.%N'; touch foo.txt; ls -l --full-time foo.txt; date '+%Y-%m-%d %H:%M:%S.%N'
  2. You may need to post the results of this in a gist or pastebin or the like:
$ cd opal/mca/btl/usnic
$ rm btl_usnic_test.lo
$ make -d V=1 btl_usnic_test.lo

Note the new -d in there. That should emit a LOT of output.

Also send the Makefile from that directory. I'd like to see why rules are somehow firing incorrectly for you.

@georgemarselis
Author

Good morning! After inserting a cup of coffee in my mug, I can answer:

> You didn't perchance, run autogen.pl on your Open MPI tarball, did you?

Nope. Negative. Absolutely not. Not that I remember.

> 1. Check your local system time vs. the file server time (if you're building locally, all the timestamps will match):

building locally, yes, but here is the output of the command you asked for:

[root@intaristotle openmpi-4.0.0]# cd opal/mca/btl/usnic
[root@intaristotle usnic]# date '+%Y-%m-%d %H:%M:%S.%N'; touch foo.txt; ls -l --full-time foo.txt; date '+%Y-%m-%d %H:%M:%S.%N'
2019-03-04 11:49:25.081457260
-rw-r--r--. 1 root root 0 2019-03-04 11:49:25.083114676 +0100 foo.txt
2019-03-04 11:49:25.087567153

> 2. You may need to post the results of this in a gist or pastebin or the like:

https://gist.github.com/georgemarselis/5df316c2b52ba5f5e9f6637c9bdb446e

> Also send the Makefile from that directory. I'd like to see why rules are somehow firing incorrectly for you.

Makefile.txt

there you go! thank you for looking into this!

@jsquyres
Member

jsquyres commented Mar 4, 2019

Hmm. The "make" output you sent shows a correct compile: it does not exhibit the failure (i.e., -D was on the command line, it used libtool, etc.). Could you capture output from make -d V=1 that shows the failure?

Given that the -d output is incredibly verbose, it would be most helpful if you could reproduce the issue with just a single file (e.g., btl_usnic_test.c).

@ggouaillardet
Contributor

@jsquyres I am afraid you are not looking in the right place.
The issue is not with btl_usnic_test.c but with test/usnic_btl_run_tests.c.

And indeed, in opal/mca/btl/usnic/Makefile.am, we first set

AM_CPPFLAGS = -DBTL_IN_OPAL=1 $(opal_ofi_CPPFLAGS) -DOMPI_LIBMPI_NAME=\"$(OMPI_LIBMPI_NAME)\"

but then we also set the per-target flags

usnic_btl_run_tests_CPPFLAGS = \
-DBTL_USNIC_RUN_TESTS_SYMBOL=\"opal_btl_usnic_run_tests\"

and I can only assume this overrides AM_CPPFLAGS for that target.

@georgemarselis can you please replace the last block with

usnic_btl_run_tests_CPPFLAGS = \
-DBTL_USNIC_RUN_TESTS_SYMBOL=\"opal_btl_usnic_run_tests\" \
-DOMPI_LIBMPI_NAME=\"$(OMPI_LIBMPI_NAME)\"

Since you installed from the tarball, you will first need autotools installed; then run
./autogen.pl --force, followed by the usual configure && make.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Mar 5, 2019
do define the OMPI_LIBMPI_NAME macro via the CPPFLAGS.
The issue occurs when Open MPI is configured with
--enable-opal-btl-usnic-unit-tests

Thanks George Marselis for reporting this issue

Refs. open-mpi#6441

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Mar 5, 2019
do define the OMPI_LIBMPI_NAME macro via the CPPFLAGS.
The issue occurs when Open MPI is configured with
--enable-opal-btl-usnic-unit-tests

Thanks George Marselis for reporting this issue

Refs. open-mpi#6441

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@b409762)
@georgemarselis
Author

yay! 2/2 bugs! I am batting a 4.0 😆

I think I may have a couple more for you now that it is spring break over here and things are quiet.

I am going to close this issue, since you guys have it under control.

@jsquyres
Member

jsquyres commented Mar 5, 2019

@ggouaillardet Thanks for catching my cognitive dissonance.

Re-opening this so that we can use it to track merging the fix to the release branches.

@georgemarselis
Author

ok sorry, my bad!

jsquyres pushed a commit to jsquyres/ompi that referenced this issue Mar 5, 2019
do define the OMPI_LIBMPI_NAME macro via the CPPFLAGS.
The issue occurs when Open MPI is configured with
--enable-opal-btl-usnic-unit-tests

Thanks George Marselis for reporting this issue

Refs. open-mpi#6441

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit b409762)
@jsquyres
Member

This has now been merged everywhere.

markalle pushed a commit to markalle/ompi that referenced this issue Sep 12, 2020
do define the OMPI_LIBMPI_NAME macro via the CPPFLAGS.
The issue occurs when Open MPI is configured with
--enable-opal-btl-usnic-unit-tests

Thanks George Marselis for reporting this issue

Refs. open-mpi#6441

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@b409762)