Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable runtime-enabled configure CLI params (to pass to PRRTE) #7486

Closed
jsquyres opened this issue Feb 27, 2020 · 17 comments · Fixed by #8409
Closed

Re-enable runtime-enabled configure CLI params (to pass to PRRTE) #7486

jsquyres opened this issue Feb 27, 2020 · 17 comments · Fixed by #8409
Assignees

Comments

@jsquyres
Copy link
Member

The master README currently still describes the following configure CLI params:

  • --enable-orte-static-ports
  • --with-alps
  • --with-lsf[-libdir]
  • --with-pmi
  • --with-slurm
  • --with-sge
  • --with-tm

With the exception of --with-pmi (which is already handled) and --enable-orte-static-ports (see below), they others don't seem to exist in OMPI's ./configure --help output any more. Some plumbing needs to be put in to support this kind of functionality in some manner (e.g., just pass them directly through to PRRTE's configure, or ...something else...).

I don't know if PRRTE supports static ports like ORTE did, but if so, that one should probably be plumed through, too. If not, --enable-orte-static-ports should probably be added to ompi_deleted_options.m4.

In Slack discussions, @jjhursey volunteered to look at this.

@jjhursey
Copy link
Member

jjhursey commented May 5, 2020

I think this needs to be a blocker for v5. Otherwise, I don't know how you compile in support for things like Slurm and LSF into PRRTE from the OMPI configure.

This fell off my radar, but I'll try to get to it this week.

@gpaulsen
Copy link
Member

gpaulsen commented May 5, 2020

Brian said to wait until the branch he posts later today.

@jjhursey
Copy link
Member

Ref #8384
I think we might be ready to tackle this? I was waiting on an all clear from Biran. I'll bring it up on today's call.

@jjhursey
Copy link
Member

I got the all-clear from Brian that nothing more is expected. I have a branch in progress for this that I hope to have up this week.

@jjhursey
Copy link
Member

Problem

  • We need to automatically pull in all/most of the configure options for each of the 3rd party projects.
  • Hardcoding such a list is not sustainable for a variety of reasons.
  • The 3rd party might add a configure option that conflicts with an OMPI option or another 3rd party's configure option.
  • We cannot force a 3rd party to adjust their configure options based on our needs especially because we may be using already released versions of the 3rd party.
  • The user should know which options are passed to which 3rd party library where it is important.
  • Backwards compatibility with old configure options would be nice to have.

Solution

  1. Harvest the --enable/--disable and --with/--without configure options from the m4.
    1. There will be some options that OMPI knows that it should not harvest either because it cannot be well supported, is libtool specific, or it is handled in a special way internal to OMPI's configure for that 3rd party. So OMPI should have an excluded list of files and options that should be excluded from harvesting both in general and for each specific 3rd party project.
    2. OMPI will take care not to define an OMPI option that, when passed to a 3rd party, will confuse it. IN the case that is unavoidable it should be handled in OMPI's 3rd party specific m4 file before calling the 3rd party configure.
    3. While harvesting be sure to only collect the first two arguments (variable and help string) so as not to pick up any actions that might otherwise harm OMPI's configure (e.g., change global compiler flags).
    4. For each 3rd party configure option add a project specific prefix to the option. For example, for the project foo we would change it's --with-abc option to --with-foo-abc for OMPI's configure. Users would specify the latter (--with-foo-abc) to the OMPI level config so that they know they are setting it for that specific 3rd party.
    5. Automatically generate a m4 file for each 3rd party project with these harvested options and include the contents of that m4 in OMPI's configure.ac.
  2. At configure time, for each 3rd party project m4 processes the configure arguments to strip off the prefix before passing it down to the 3rd party's configure. Changing --with-foo-abc to --with-abc for that 3rd party.
    1. Note that all configure options are passed to the 3rd party not just those with the prefix. This allows OMPI to have some global configure options that all 3rd party projects will see the same way (for example --enable-debug).
    2. The true configure list (stored in ac_configure_args) is restored after the 3rd party is finished processing.

Left discuss/complete

  1. Do we want backwards compatibility with old OMPI configure options (e.g., --with-slurm)?
    • In this new model, for example, --with-slurm is a PRRTE configure option so the user would have to specify --with-prrte-slurm at the OMPI configure line which would still pass --with-slurm to PRRTE (so PRRTE does not need to change).

@jjhursey
Copy link
Member

I created a draft PR in #8409 with the design above.
The exclude file needs to be refined a bit more and hwloc/libevent need to be re-enabled in autogen.pl, but otherwise it is functional if you want to play around with it for PMIx/PRRTE.

@bwbarrett
Copy link
Member

I think we should kill the PR. I think Jeff's original issue is incorrect. THe right way to handle this in Autoconf is to use --help=recursive. All those options still have meaning to recursive packages, and there is an existing, Autoconf-approved way of seeing them today, without all the extra code.

I also really want to understand the renaming idea; that seems very bad, and we intentionally didn't do that when the third party patches were written, because they only lead to madness of package selection differing between 3rd party packages (and OMPI).

@rhc54
Copy link
Contributor

rhc54 commented Jan 23, 2021

I wouldn't do the renaming - I agree with that concern.

The problem is that the world out there (a) doesn't know or care that OMPI suddenly added submodules, and therefore (b) doesn't know that they now have to do something different than normal for discovering the configure options. We had questions coming in (some actually from Amazon!) about OMPI suddenly removing RTE options and concerns that we had arbitrarily dropped support for various HPC environments.

It's easy to understand why people would think that - if you do what you have done for nearly 20 years, those options are all gone. If you created any automation for configuring OMPI (and some people, like the labs and packagers, have done so), then that automation is suddenly failing due to lost configure options.

So I don't know the right answer here. I understand your objection, but that may not be what users will accept. Regardless, I don't really care as this is an OMPI issue - my PR was solely a stopgap to respond to the concerns coming (from Amazon as well!) about lost options and discontinued support.

@jjhursey
Copy link
Member

  • It's (very much) too bad that this objection was not raised during the community teleconf, when I updated the ticket to say I was working on it, nor when I asked Brian for the all-clear to start on it. It would have saved me a couple of valuable days of work that, in retrospect, I should have spent on other things. I'll factor that in the next time I take an issue from the community to work on. :(
  • The prefixing (renaming) of configure options was something that I added as an option to the perl script. We can drop the -m CLI option and it goes away. That was something I wanted to bring to the community for discussion for sure.
    • The assumption is that we cannot control the addition/modification/deletion of configure options from 3rd party packages. Additionally, a user can update the 3rd party package after checking out the ompi repo (developers already do that with submodule updates today) thus picking up options that the base OMPI logic doesn't know about.
    • It's possible/likely that we pick up a --enable-awesome-new-feature option from a 3rd party package that users will anticipate will alter the functionality of OMPI in general. When it doesn't they are confused (it is an OMPI option after all). Worse if this configure option conflicts with one of the same name in OMPI.
    • We could add aliases for the old configure options turning --with-slurm into --with-prrte-slurm in the background. This does not need to be a deprecation if OMPI wants to support it long term. It becomes somewhat brittle if the 3rd party package changes/removes that option, but it should be fairly manageable.
    • Those are the arguments for renaming. I understand the arguments against it as well. I'm fine either way, but the community should discuss this so they understand the pros/cons. We are at a major release point and changing a ton of other things that are way more user-facing (i.e., How you pass MCA options for PMIx/PRRTE/OMPI differently, dropping/renaming a fair number of the CLI options to mpirun). So if we wanted to make this change to configure now would be the time.
  • Telling users that they are using ./configure --help wrong seems strange to me. Is there a way to make -configure --help automatically translate to ./configure --help=recursive behind the scenes?

@bwbarrett
Copy link
Member

To be clear, I said that I was not working on anything in the 3rd party code right now, not that I thought this was a good idea. I thought I made it pretty clear I had no idea what was going on, other than seeing my name go by in a proposal.

I'm one person, who raised objections to the PR because it's not how Autoconf expects packages to work. Maybe we still should do the argument forwarding. I tend to disagree, but we should have that discussion, not throw up our arms.

I do have very strong feelings about argument renaming. While I understand your points, I think 1) that we need to be shooting for less third party packages over time (within 1-2 years, we should be able to remove libevent, for example), 2) that we do need to think about feature propagation carefully, outside of argument naming, and 3) that the risk of someone building with conflicting SLURM arguments because of our renaming far outweighs the small potential benefits of the feature.

We do need to look at your question about ./configure --help=recursive. I don't know off the top of my head. I do know that that is how Autoconf expects this situation to be handled.

@jjhursey
Copy link
Member

There are a small number of people familiar enough with the build system to do this work, and thus have opinions about how it should function. All I ask is that you take care in the future either directly or indirectly green lighting a ticket like this, especially if there is more context to add (i.e., your prior conversation with Ralph about this topic) so folks don't waste time. It was not clear to me that you had "no idea what was going on".

To aid in the discussion, there are two technical points being discussed here:

  1. Should we manually slurp in all 3rd party configure options to the OMPI level so they are visible in ./configure --help or should we tell users to just use ./configure --help=recursive
    1. Someone (not me) should investigate if we can make --help automatically mean --help=recursive
    2. The --help=recursive has the advantage that it adds a separate section for each 3rd party package (well only pmix and prrte right now - i don't know why hwloc and libevent do not show up. Someone, not me, should investigate). Their sections are prefixed in --help with a line like Configuration of prte gitclone:. In contrast, by slurping in the options they appear alongside all of the other OMPI options.
  2. Should we prefix the configure options from 3rd party packages (e.g., --with-slurm becomes --with-prrte-slurm) to distinguish them from main OMPI options? This has the potential benefit that only those options targeted at the 3rd party get passed to them (so an option for package X is not also passed to package Y).

Ultimately I don't care if the community doesn't want to take the direction seen in PR #8409 - the best solution should be taken no matter what. What the community must do is outline explicitly what they want to do here before someone else wastes effort on a solution.
@gpaulsen Please add this to the agenda for the teleconf - it might be good if the RMs discuss this in their meeting beforehand to come with ideas.

@jjhursey
Copy link
Member

jjhursey commented Feb 3, 2021

Per Teleconf Feb. 2, 2021:

  • Brian sent an email to the mailing list here
  • The consensus on the call was
    • Do not rename the configure options for the 3rd party libraries (e.g., --with-slurm becomes --with-prrte-slurm)
      • Rationale: This exposes the internal build structure of OMPI to users who should not care about this detail. If they do need to configure a 3rd party library differently than OMPI will by default (passing different configure options) then they can build them externally and point OMPI to them. Do not complicate the build system for this more advanced scenario.
    • Do slurp in the configure options from 3rd party libraries into the top level configure so they display by default with OMPI's ./configure --help
    • Do have an option to the slurp script to skip some files and options from specific packages from being picked up. This will help avoid duplicates and provide a mechanism to skip any option we want to handle differently at the OMPi level.
    • Investigate how configure handles duplicate options being registered (e.g., --disable-dlopen). Does it show up multiple times or just once.
    • (Aspire Investigation) There was some discussion of adding a (From PMIx) suffix, for example, to the help string for options we pick up from 3rd party libraries. No clear consensus on if we want that or not.
  • Rationale:
    • Users should not care that we reorganized the OMPI library by default.
    • Without slurping in these options, if the user passes --with-slurm they are likely to see configure warnings about an unrecognized option at the top level even though it has been recognized by the 3rd party library. This will confuse users. By making these 3rd party configure options valid at the OMPI level this will avoid that situation. Note that 3rd party libraries may still emit that warning when their specific configure is run since we pass the options to all of the 3rd party libraries equally.

@jjhursey
Copy link
Member

jjhursey commented Feb 9, 2021

Per Teleconf Feb. 9, 2021:

  • Do not add the (From PMIx) suffix. Since there might be options with the same name from multiple packages. Sorting out the duplication is more effort than it is worth right now. There are autogenerated .m4 files if we really need to track it down.
  • Investigate duplicate options. Some reports that options are shown multiple times, others think that autogen should be removing duplicates.
  • @jjhursey committed to finishing PR Extract 3rd party configure options for OMPI configure #8409 with these commands before we branch for v5 at the end of Feb.

@jjhursey
Copy link
Member

jjhursey commented Mar 3, 2021

I investigated the duplicate configure options. If a configure option is declared more than once then it appears more than once in the configure --help output.

For example (top from OMPI's config and bottom from OpenPMIx's config):

  --enable-debug          enable developer-level debugging code (not for
                          general MPI users!) (default: disabled)
...
  --enable-debug          enable developer-level debugging code (not for
                          general PMIx users!) (default: disabled)

I will need to work on some deduplication.

@bwbarrett
Copy link
Member

@jjhursey I was curious why this wasn't a problem with all the places we call OMPI_CHECK_PACKAGE for multiple components. You'd expect if we don't dedupe to see three --with-ucx help strings, for example.

Reading the autoconf m4 code, the dedupe is only if the entire help string is identical. In this case, Autoconf didn't dedupe because the description is different ("not for general MPI users" vs. "not for general PMIx users"). That probably makes your job harder, but that's what Autoconf does...

@jjhursey
Copy link
Member

jjhursey commented Mar 4, 2021

Yeah, I noticed it wasn't 100% of the time that it duplicated. For example PRTE's --enable-debug looks like:

  --enable-debug          enable developer-level debugging code (not for
                          general MPI users!) (default: disabled)

Which matched OMPI's (something we should fix in PRTE), but it did not show up twice. Only the OMPI and OpenPMIx versions. I did not deep dive into why though. Thanks for figuring that out.

Checking the help strings does make the parsing a bit trickier, but possible if needed. However, if we have multiple options marked with AC_ARG_ENABLE([debug] but with different help strings then we have a duplication regardless of the help string.

@bwbarrett
Copy link
Member

Yes, you're right, just deduping on the AC_ARG_ENABLE first option seems like the right thing to do here, particularly if that's easiest for you. I don't think the corner cases of descriptions really matters.

Mostly, your statement about deduping was an itch that I had to scratch :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants