Skip to content

Conversation

pkovacs
Copy link
Contributor

@pkovacs pkovacs commented Jan 16, 2018

When using ./configure --with-package=/usr/include or --with-package=/usr/local/include we do not want those directories to appear in the CPPFLAGS since they are standard system directories.

./configure --with-pmix=/usr/include --with-libevent --with-hwloc
grep opal_pmix config.log | grep CPPFLAGS
opal_pmix_ext1x_CPPFLAGS=''
opal_pmix_ext2x_CPPFLAGS=''
opal_pmix_ext3x_CPPFLAGS=''
opal_pmix_pmix3x_CPPFLAGS=''

Signed-off-by: Philip Kovacs pkdevel@yahoo.com

…with-package=DIR

Signed-off-by: Philip Kovacs <pkdevel@yahoo.com>
@ompiteam-bot
Copy link

Can one of the admins verify this patch?

@rhc54
Copy link
Contributor

rhc54 commented Jan 16, 2018

ok to test

@rhc54 rhc54 merged commit 6b3cf6f into open-mpi:master Jan 17, 2018
@ggouaillardet
Copy link
Contributor

let's say foo is installed in both default locations and in /opt/foo, and i want to use /opt/foo so i configure --with-foo=/opt/foo
my understanding is that _OPAL_CHECK_PACKAGE_HEADER will first try to
AC_CHECK_HEADERS with CPPFLAGS=$CPPFLAGS -I/opt/foo and will incorrectly success (success because the headers are in /usr/include so #include works, and incorrect because headers are not in /opt/foo, but likely in /opt/foo/include)

generally speaking, Open MPI gives the enduser what they ask for, so if an end user decides to
configure --with-foo=/usr/include --with-foo-lib=/opt/foo/lib, then -I/usr/include will end up in the CPPFLAGS. In this case, correct usage is configure --with-foo-lib=/opt/foo/lib, so there is nothing we should do on the Open MPI side.

if OPAL_CHECK_PACKAGE is invoked internally with dir_prefix=/usr and imho, this is a bug that should be fixed in the caller.

Generally speaking, /usr/local is compiler dependent. On my CentOS 7 box, GNU compilers do try this by default, but Oracle compilers do not (and i do not know how to detect that correctly)

bottom line, i am not sure @alex-mikheev and @pkovacs are discussing the same issue.

@pkovacs pkovacs deleted the master-opal-check branch January 17, 2018 01:08
@pkovacs
Copy link
Contributor Author

pkovacs commented Jan 17, 2018

@ggouaillardet In your example the code operates correctly. The supplied prefix is not one of these four (/usr /usr/include /usr/local /usr/local/include) so you get what you asked for: /opt/foo. If you name the location, that's what you get. If you do not mention a prefix, then it will find the one in /usr/include first.

@ggouaillardet
Copy link
Contributor

are you talking of the first example ?
if yes, you are basically suggesting what configure --with-foo=/opt/foo might use headers in /usr/include and libs in /opt/foo/lib because this is what i asked for. in other words, i should have configure --with-foo=/opt/foo/include --with-foo-lib=/opt/foo/lib.

@pkovacs
Copy link
Contributor Author

pkovacs commented Jan 17, 2018

I was originally trying to address the issue that --with-foo=DIR did not work as it did not search DIR.

If there is a requirement that multiple installations of the same package, one in a standard location and another in a non-standard location be supported, well, I don't know what to say to that. Multiple installations should be installed with careful separation to avoid stuff like AC_CHECK_HEADERS finding the wrong one. e.g. /opt/foo/v1 and /opt/foo/v2 or using other separation techniques.

This issue is small and uninteresting -- is something broken now that wasn't broken before?

@ggouaillardet
Copy link
Contributor

That's what i have to say : we have to support the (not so uncommon imho) case when package foo is available both as a default package (e.g. with include files in /usr/include and libs in /usr/lib64) and as an alternate version in /opt (with include files in /opt/foo/include and libs in /opt/foo/lib).

@jsquyres @rhc54 if you disagree, please comment so we can end this conversation right away.

In order to use the alternate package, an end user used to configure --with-foo=/opt/foo and it did work as expected.

If i read correctly, now, configure (likely) successes but silently uses headers in /usr/include and libs in /opt/foo/lib. incidentally, it adds -I/opt/foo to the CPPFLAGS though there is no header file there.

Regardless how small and/or uninteresting it can be perceived, that qualifies as something broken now that wasn't broken before

In order to address the initial issue, i'd rather have --with-foo=DIR look for headers in one place only

  • DIR/include if that directory exists
  • DIR otherwise
    (we would miss a few corner cases, but as far as i am concerned, they are uncommon and the result of very poor admin practices)

@rhc54
Copy link
Contributor

rhc54 commented Jan 17, 2018

I think we are confusing two distinct questions that were addressed by two separate PRs. First, we discussed and agreed on the first PR that when --with-foo=DIR is given, we should always look in DIR first, and then in DIR/include if the headers were not first found in DIR. This was our practice in other areas of the configure logic - it was simply something we had missed in check_package.

Second, this PR addressed the problem of the user specifying a standard location for DIR. We have always avoided adding standard locations to the CPPFLAGS and friends as otherwise we can inadvertently pick up incorrect installations of other packages (e.g., hwloc and libevent).

The issue that was raised was that we already excluded /usr and /usr/local, but failed to exclude /usr/include and /usr/local/include - which are also standard locations. All this PR did was add those exclusions. It won't in any way interfere with the --with-foo=/opt/foo case. The check_package code will look in /opt/foo, not find the header, and then look in /opt/foo/include - where it will find the header, and hence add -I/opt/foo/include to CPPFLAGS. The only time that last step will not be taken is when the user specified /usr or /usr/local for DIR.

If I understand the originally expressed concern, @ggouaillardet was raising the question of executing AC_CHECK_HEADERS with CPPFLAGS not being set strictly to -IDIR - i.e., the standard locations remaining in CPPFLAGS. I concur - we need to change that to CPPFLAGS=-IDIR and then CPPFLAGS=-IDIR/include to avoid mistakenly picking up the headers from a standard location.

I fear that concern got lost in the confusion. It would seem a rather trivial change, but should better have been raised against the first PR (and not this one).

@ggouaillardet
Copy link
Contributor

@rhc54 i do not think you can tell the preprocessor (and hence nor the compiler nor AC_CHECK_HEADERS) not to use the default location, so if you CPPFLAGS=-IDIR, the compiler will look for headers in DIR and then /usr/include (and possibly /usr/local/include, your mileage may vary).
i just confirmed the issue on my VM with the hwloc rpm (headers in /usr/include) and my own hwloc (headers in /opt/hwloc/include). when i configure --with-hwloc=/opt/hwloc, i end up with CPPFLAGS=-I/opt/hwloc. and since there is no header in that directory, /usr/include/hwloc.h is used instead of /opt/hwloc/include/hwloc.h

@rhc54
Copy link
Contributor

rhc54 commented Jan 17, 2018

Well then, the user is hosed - I see no way to resolve that problem.

@ggouaillardet
Copy link
Contributor

strictly speaking, i concur, there is not a 100% bulletproof solution.
(iirc, we were already aware of that)

pragmatically speaking, i still think the approach i suggested previously is better and good enough

  • if DIR/include is a directory, then try AC_CHECK_HEADERS with CPPFLAGS=-IDIR/include
  • else, try AC_CHECK_HEADERS with CPPFLAGS=-IDIR

this is not ideal, for example

  • if the /opt/foo/include directory exists but the header file is /opt/foo/foo.h, then the user is hosed (e.g. /usr/include/foo.h is used)
  • if there is no /opt/foo/include and no /opt/foo/foo.h, then the user is hosed again (e.g. /usr/include/foo.h is used)

imho, these are two corner cases that are unlikely to be met in the real world, so i would not bother too much. also, this is still less intrusive than adding a --with-foo-incdir=DIR option everywhere.

on the bright side, we should be a bit faster since i expect most headers will be in DIR/include rather than DIR

by the way, the initial issue was there was no way to search for pmix headers in /usr/include/pmix.
i just realized, an inelegant (and untested) workaround is to configure CPPFLAGS=-I/usr/include/pmix.

@rhc54
Copy link
Contributor

rhc54 commented Jan 17, 2018

I confess - you have successfully made my head really hurt and totally confused me ☹️

We had a discussion about this before, and opted to go the direction that was implemented. You appear to be suggesting we go back again? It feels like we keep revisiting these issues ad nauseum, so I'm going to give up at this point - if @jsquyres feels like continuing this debate, I'll defer to him.

Just for @pkovacs sake - this entire discussion has absolutely nothing to do with this PR. It has to do with the prior one we committed last week.

@ggouaillardet
Copy link
Contributor

well, go back again would mean we only test -IDIR/include and never -IDIR.
And this is not what i am suggesting here.

for the records, this PR started as a fix of the one we committed last week per the report from @alex-mikheev . i have no issue with this PR itself, but i do not think it fixes the reported issue.

@alex-mikheev
Copy link
Contributor

As @ggouaillardet says this PR does not fix my original issue. His #4722 (comment) describes the problem very well. I run into the same problem when trying to build ompi with my private UCX library while having it also installed with /usr prefix.

What about testing if .h file actually exists in $dir_prefix and $dir_prefix/include before invoking AC_CHECK_HEADERS ? That way we can be reasonably sure that compiler will not pick the header file from a default location.

@pkovacs
Copy link
Contributor Author

pkovacs commented Jan 17, 2018

You guys are the owners and of course I defer to your final decision here. If you're trying to manage overlapping development library installations, one in a standard and one in non-standard location, I have to shake my gray hairs, however, and say: "You're doing it wrong."

Suppose for example, one program correctly compiles and links the non-standard library and it works properly. Great. Now suppose some other program on the same system uses dlopen to lift in the library. Which one did he load?

@bwbarrett
Copy link
Member

@pkovacs I might agree with you, except that every HPC system I've ever seen and most Linux systems at any corporate office I've ever seen are doing exactly that. There's a base library (libhwloc, for example) installed with your distro, which is ancient (because you're still running CentOS 6) and you need a recent version for anything to compile, so you just build it yourself. I agree it's insane, but it's also reality and at some point, I got too old to fight reality.

@alex-mikheev I thought we used to do that and stopped for some reason. It's probably worth a dive into the CHECK_PACKAGE commit history to make sure there isn't a comment about why. I could, of course be totally wrong there. Barring some lesson learned, that seems like a good idea. Even just checking if the directory exists would probably weed out the really common cases.

@rhc54
Copy link
Contributor

rhc54 commented Jan 17, 2018

The point that @pkovacs made is still correct, however - the fact that the directory and header file exist somewhere has no bearing on whether or not you got the corresponding library. So as was pointed out somewhere in this mess, you may well wind up with a mismatched header and library - which is why we went to check_package in place of just checking for header file existence in the prior commit.

@jsquyres
Copy link
Member

Let's take a step back.

I believe I am at fault here. Both this PR and #4683 stem from an incorrect comment that I made on #4683 (specifically: #4683 (comment)). I then got too busy and didn't notice where #4683 went until this morning. Sorry folks. 😢

Here's the situation:

  1. I incorrectly stated that:
$ export FOO_DIRr=/path/to/foo
$ ./configure \
    --with-FOO=$FOO_DIR/include \
    --with-FOO-libdir=$FOO_DIR/lib ...

should find headers in $FOO_DIR/include/pmix, thereby implying that we should be checking -I with exactly the value of $with_foo. This is wrong. As @ggouaillardet @bwbarrett and @alex-mikheev have tried to point out, we should not be checking -IDIR -- it leads to false positives and generally should not be expected behavior (IMHO).

  1. We've now gotten in a situation where we, the developers, a) are getting confused trying to explain what the current code does/does not do, and b) therefore have little hope of trying to explain the behavior to an end user. We should therefore take this as a signal to step back and re-examine our core assumptions and desired behavior.

I wonder if our current scheme is simply too complex, and the PMIX situation simply exposed a weakness that the current scheme is ill-equipped to handle without adding even more complexity. Perhaps we should move towards a simplification of our current scheme. Here's a proposal:

  1. Revert Fix DIR, DIR/include search for --with-pmix #4683 so that we get back to the old behavior / fix the cases that are now broken.
    1. The current PMIX situation can be solved thusly: ./configure CPPFLAGS=-I/usr/include/pmix --with-pmix ...
  2. Revert opal_check_package: filter /usr[/local]/include from CPPFLAGS  #4722. As Gilles cited, we do not have a robust way to discover what a given compiler's default include/library search paths are. We know that /usr is almost universal, so we always exclude that. But the other paths that were added are not universal, and we shouldn't exclude them.
    1. If someone can come up with a robust way to discover a given compiler's default include/library search paths -- awesome. Let's probe for them during configure and exclude those.
  3. Use --with[out]-FOO as an indicator as to whether the user wants FOO support (pretty much unchanged compared to how we use it today)
    1. If --with-FOO is specified, it means the user wants FOO support. If configure cannot provide FOO support, it should abort.
    2. If --without-FOO is specified, it means the user does not want FOO support.
    3. If neither --with-FOO nor --without-FOO is specified, then configure tries to include support for FOO if it finds it and skips it if it cannot.
  4. Add two new CLI options:
    1. --with-FOO-cppflags
    2. --with-FOO-ldflags
  5. Here's how OPAL_CHECK_PACKAGE's discovery will work:
    1. Search for <whatever.h>:
      1. If --with-FOO-cppflags was specified, add those to CPPFLAGS before checking for <whatever.h>
    2. Otherwise, if --with-FOO=DIR was specified, add -IDIR/include to CPPFLAGS before checking for <whatever.h>
    3. Otherwise, don't add anything to CPPFLAGS before checking for <whatever.h>
    4. Search for libwhatever:
      1. If --with-FOO-ldflags was specified, add those to LDFLAGS before checking for libwhatever.
    5. Otherwise, if --with-FOO=DIR was specified:
      1. Add -LDIR/lib to LDFLAGS and check for libwhatever.
      2. If not found, try again with -LDIR/lib64, instead.
    6. Otherwise, don't add anything to LDFLAGS before checking for libwhatever.
  6. Because we now have --with-FOO-cppflags and --with-FOO-ldflags, deprecate --with-FOO-libdir. I.e., if someone tries to use it, we might want to print a friendly error and abort (because they probably didn't realize we changed CLI options -- so do the friendly OMPI thing and tell them how to adapt to the new flags and abort in order to give them an opportunity to do so).

This does add new CLI options -- which is something we tried not to do a long time ago -- but I think we're in a complex-enough scenario these days that giving the override / "do exactly this" options is now the Right Thing to do when our -with-FOO=DIR processing isn't sufficient.

If we all agree, @bwbarrett has graciously volunteered to implement --with-FOO-cppflags and --with-FOO-ldflags (and deprecate --with-FOO-libdir) this week.

@rhc54
Copy link
Contributor

rhc54 commented Jan 17, 2018

👎 I honestly don't think what you propose is going to help the problem @pkovacs and I were trying to solve. However, I also don't have time to continue the debate.

I will revert the two commits tonight (have to run now), and would advise that you not make the proposed changes as it only further confuses things (though it is up to you as it is your code). Instead, once the reverts are complete, @pkovacs and I will implement a custom configure for external pmix instead of using check_package. This should let us resolve our problems independent of impacting everyone else and avoid the circular argument.

@ggouaillardet
Copy link
Contributor

+1 for reverting both commits

Then the problem @pkovacs and you are (will be ?) facing can be solved by

configure --with-pmix CPPFLAGS=-I/usr/include/pmix

It seems that in a kind of collective hallucination, we all jumped to the conclusion
that Open MPI could not be built with PMIx headers in /usr/include/pmix and had to be fixed, instead of
suggesting this simple solution.

if @jsquyres proposal is implemented, then

configure --with-pmix --with-pmix-cppflags=-I/usr/include/pmix

will solve the issue in a (imho) more elegant manner.

As far as i am concerned, we faced only one similar case and we handled it accordingly.
SLURM is known to install its headers in DIR/include/slurm instead of DIR/slurm, so OPAL_CHECK_PMI_LIB was designed/updated so it works out of the box.

I am not aware PMIx has any plans to install its headers in DIR/include/pmix (regardless it is a standard, a de facto standard, a given distro standard, some best practice, or whatever reason).
If it does, then i would advocate mimicking OPAL_CHECK_PMI_LIB.
Otherwise, and this is not a strong opinion, i am fine doing nothing in the generic configury part.
Obviously, you are free to do whatever you want in the external pmix configury part.

@rhc54
Copy link
Contributor

rhc54 commented Jan 18, 2018

Thank you for the input - we will do what is required so it only impacts PMIx...and get out of circular debates 😄

@pkovacs
Copy link
Contributor Author

pkovacs commented Jan 18, 2018

Using CPPFLAGS would work if the pmix check actually called AC_CHECK_HEADERS, but, once again, it just fiddles around with a manual ls command and fails miserably.

ls /usr/include/pmix
pmi2.h  pmi.h  pmix_common.h  pmix.h  pmix_rename.h  pmix_server.h  pmix_tool.h  pmix_version.h
./configure --with-pmix=external CPPFLAGS=-I/usr/include/pmix | grep -i pmix
checking if user requested external PMIx support(external)... yes
checking --with-external-pmix value... not found
configure: WARNING: Expected file /usr/include/pmix.h not found
./configure --with-pmix=/usr/include/pmix CPPFLAGS=-I/usr/include/pmix | grep -i pmix
checking if user requested external PMIx support(/usr/include/pmix)... yes
checking --with-external-pmix value... not found
configure: WARNING: Directory /usr/include/pmix/include not found

@rhc54
Copy link
Contributor

rhc54 commented Jan 18, 2018

Understood. Let's take this into a new PR that just works within the OPAL_CHECK_PMIX macro so we avoid having to worry about impacting anyone else. Would you like to create a new PR we can collaborate on? Or shall I create it?

@ggouaillardet
Copy link
Contributor

i will likely open a PR shortly (based on the initial PR)

@rhc54
Copy link
Contributor

rhc54 commented Jan 18, 2018

Please don't take offense, but it might be best if @pkovacs and I took this from here as the problem was first surfaced via the packagers and we are somewhat more familiar with the issues. This debate wound up being centered on only one of the issues, and probably the least important one.

@pkovacs
Copy link
Contributor Author

pkovacs commented Jan 19, 2018

The new approach outlined by @jsquyres in #4722 (comment) above is actually pretty darn good. It solves the problem generally. If a new m4 macro is implemented, i.e. a better version of OPAL_CHECK_PACKAGE, the package-specific checks could call this new macro with a package name, header file, library name, library function to find, cppflags, ldflags, etc. and get the results back. In the case of pmix, once we get the flags/status back from the new macro we can do the additional local checks to discern the pmix version etc.

The problem exists really for any of the dependent packages. I want to be able to install them anywhere and find them anywhere.

@jsquyres
Copy link
Member

@ggouaillardet @bwbarrett I was talking about this issue with @rhc54 today -- the issue is still unresolved, even though this PR is closed (without merging). I think we actually did come to more-or-less consensus on a new proposal for OPAL_CHECK_PACKAGE (and @rhc54 verbally agrees with me). Would either of you two have any time to make a PR based on #4722 (comment)?

@ggouaillardet
Copy link
Contributor

@jsquyres i will take a crack at it !

@jsquyres
Copy link
Member

Thanks @ggouaillardet!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants