Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about mca parameter passing #1731

Closed
wckzhang opened this issue Apr 11, 2023 · 30 comments
Closed

Question about mca parameter passing #1731

wckzhang opened this issue Apr 11, 2023 · 30 comments

Comments

@wckzhang
Copy link
Contributor

Using 20ee752 for PRRTE.

I've been messing around with the ompi schizo component in order to add some warning/aborts around the old --mca mca_base_param_files parameters. However it looks like the arg is dropped before we get to the mca parameter parsing. See my garbage parameters:

[ec2-user@ip-10-0-0-28 runtime]$ ~/tmp/ompi/install/bin/mpirun --np 2 -N 1 --mca mca_garbage_param 1 --mca opal_mca_garbage_param 1 --mca garbage_param 1 --hostfile ~/hostfile /bin/true
mca p1 is: opal_mca_garbage_param
mca p1 is: garbage_param

It looks like parameters prefixed with "mca" are being dropped. Do you have any idea what's happening here or if this is intended behavior?

@rhc54
Copy link
Contributor

rhc54 commented Apr 11, 2023

Don't look at me - I haven't looked at the ompi schizo component in a very long time. Command line parsing starts with your parse_cli function.

@wckzhang
Copy link
Contributor Author

wckzhang commented Apr 11, 2023

It looks like it's because the options becomes --pmixmca mca_garbage_param 1
It looks like the ompi_schizo component does not look at this prefix (pmixmca), I can add another loop for it

@wckzhang
Copy link
Contributor Author

So it turns out that --mca mca_base_param_files is not ignored, it was just that pmixmca prefixes were all ignored. The relevant code is all in place

@wckzhang
Copy link
Contributor Author

Actually let me ask a clarifying question: Should the parameter be prefixed with pmixmca and should this be read by the ompischizo component? (This prefixing seems to happen somewhere before parse_cli is called)

@rhc54
Copy link
Contributor

rhc54 commented Apr 11, 2023

You have a problem. PRRTE and PMIx share a common MCA "base", and so any generic MCA param on the cmd line gets interpreted as belonging to PMIx. Those get converted at the very beginning of prte as we have to process the PMIx and PRRTE params prior to calling their init functions.

Problem you have is that this means all generic MCA params that begin with mca are going to be converted to PMIx. Only solution I can see is to mandate that PMIx MCA params that belong to the MCA "base" must be prefixed with pmix or prte on the cmd line. I'll have to think about that as this feels more like an OMPI problem for failing to prefix their base params, but I suspect your side of the house will strenuously object to having to change.

@rhc54
Copy link
Contributor

rhc54 commented Apr 12, 2023

Tentative proposal: we reject all generic --mca params. All params must be prefixed by their intended project. It's the only solution I can come up with that doesn't unfairly penalize one project over another.

Alternative option: apply that only to params that start with mca_ as they are ambiguous.

@rhc54
Copy link
Contributor

rhc54 commented Apr 12, 2023

@jsquyres @bwbarrett Any thoughts? You two generally care the most about such things.

@rhc54
Copy link
Contributor

rhc54 commented Apr 20, 2023

Alternative option: we reject all generic params that start with mca_ as they are ambiguous.

In the absence of any feedback, I'm going to implement this option. Please note that this applies to more than just the mca_base_param_files option.

@naughtont3
Copy link
Contributor

naughtont3 commented Apr 20, 2023

@rhc54 There was discussion on this point during weekly ompi-dev call, but several key people were out due to travel/conflicts. So not sure if there was any consensus. I'll defer to @wckzhang & others , but just wanted to chime in just in case.

@wckzhang
Copy link
Contributor Author

Yes, I've been trying to get @jsquyres or @bwbarrett to chime in but they've been busy. I've seen push back from George and Edgar about the change as it would break lots of user scripts. George proposed everyone just check every argument, and there was some debate about why pmix prefixed an --mca mca_ as --pmixmca, we could leave it generic and have each component check to see if they can handle it.

I didn't want to make a decision either way without more consensus from our community, but I have not forgotten about this issue.

@rhc54
Copy link
Contributor

rhc54 commented Apr 21, 2023

about why pmix prefixed an --mca mca_ as --pmixmca, we could leave it generic and have each component check to see if they can handle it.

I'm afraid that isn't possible. We only change it for the MCA base params, and you cannot have those generally apply as the various layers can't discern which layer is the intended target. It winds up generating a great deal of confusion.

I'm probably not available this coming Tues, but I could participate on the OMPI call the week after if it would help.

@jsquyres
Copy link
Contributor

Sorry for the huge delay in replying to this; my bandwidth has been sucked up elsewhere. 😦

Hey @rhc54 Could we just add mca_base_param_files to the list of prefixes that are examined by OMPI schizo to know that this parameter belongs to OMPI? We already pass in a list of all of Open MPI's frameworks via the OMPI_MCA_PREFIXES env variable, which is then examined by

static void setup_ompi_frameworks(void)
{
if (ompi_frameworks_setup) {
return;
}
ompi_frameworks_setup = true;
char *env = getenv("OMPI_MCA_PREFIXES");
if (NULL == env) {
return;
}
// If we found the env variable, it will be a comma-delimited list
// of values. Split it into an argv-style array.
char **tmp = PMIX_ARGV_SPLIT_COMPAT(env, ',');
if (NULL != tmp) {
ompi_frameworks = tmp;
}
}
static bool check_generic(char *p1)
{
setup_ompi_frameworks();
/* See if the parameter we were passed belongs to one of the OMPI
frameworks or prefixes */
for (int j = 0; NULL != ompi_frameworks[j]; j++) {
if (0 == strncmp(p1, ompi_frameworks[j], strlen(ompi_frameworks[j]))) {
return true;
}
}
return false;
}

Having OMPI add mca_base_param_files to the comma-delimited list would effectively disambiguate that name and make it "owned" by OMPI (we can/probably should also do the same thing to mca_base_env_list, which is already special-cased -- and potentially any other mca_-prefixed OMPI variable...?).

However, I notice that by the time OMPI schizo is invoked, any --mca CLI tokens have been converted to --pmixmca, making at least the current logic ineffective

if (NULL != (opt = pmix_cmd_line_get_param(results, "mca"))) {

@rhc54
Copy link
Contributor

rhc54 commented Apr 25, 2023

The problem is that PMIx has a duplicate of every MCA base parameter, so you cannot disambiguate them. Root problem is that the MCA base doesn't prefix its params by project, and we now have multiple projects that contain an MCA base.

We do look for frameworks and convert them - no ambiguity there as we already ensure that the params are properly prefixed with the framework/component name. It is just the MCA base that is the problem, and only when the user doesn't prefix the --mca cmd line option.

One possible solution is to say that non-prefixed --mca shall be reserved for OMPI. I can kinda wrap my head around it from the standpoint that direct PRRTE users are outnumbered by pre-existing OMPI users (I don't worry about PMIx users as we are talking about prterun, a PRRTE tool here). Just concerned that it creates more confusion as you can use a non-prefixed cmd line option for PMIx/PRRTE frameworks/components - but now we say you can't for their MCA base.

Probably no ideal solution 🤷‍♂️

@bosilca
Copy link
Contributor

bosilca commented Apr 25, 2023

Depends on what perspective you are looking from. From a user perspective what is not ideal is to be forced to learn the internal naming scheme of a software I am using. I am passing an argument to OMPI, and everything behind shall obey it. If PRRTE wants to have their own parameter naming other than --mca their problem, but when I provide a file with the MCA parameters it shall be read and used by the entire software stack.

@rhc54
Copy link
Contributor

rhc54 commented Apr 25, 2023

So this problem has two parts:

  • how to deal with --mca on the cmd line? Like I said, we have no problem with framework/component level params. The issue is what to do with the MCA base params. You have to understand how the MCA system works - i.e., cmd line options get translated into envars, and it is only envars that get processed.
  • how to process param pre-existing param files? People specified ORTE params in those too, and now they will be ignored. Probably need to pass the contents of such files thru the param converter to get them properly prefixed.

Another issue, I suppose, is that we lost the "source" tag for a param - i.e., all our params get converted to envars, and so the MCA system really doesn't know how the envar got set. I'm not sure there is a simple way to resolve that one, but it might merit some thought as it can be useful to know how a param got set.

@rhc54
Copy link
Contributor

rhc54 commented Apr 25, 2023

There is a fourth problem: we don't currently intercept and translate envar params. For example, if someone sets OMPI_MCA_oob_tcp_if_include=foo in their environment, it will currently be ignored instead of transparently translated it to PRTE_MCA_oob_tcp_if_include. It does happen for the cmd line, but not for anything placed in the environment.

@jsquyres
Copy link
Contributor

We chatted on the OMPI webex about this today and came up with plans for this. I posted the info on the wiki: https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20230425#pmix-mca-parameter-issues

@rhc54 Let me know if my text accurately reflects what we discussed.

@rhc54
Copy link
Contributor

rhc54 commented Apr 28, 2023

Looks accurate, but incomplete - you missed the fourth problem, that we don't currently translate the environment variables. However, the fix for the param file problem will solve the envar one as well, so the text is probably fine as-is.

@qkoziol
Copy link
Contributor

qkoziol commented Jun 22, 2023

I've been reviewing the code, in order to address this issue. I see that the fix that @rhc54 merged into PMIx is exposing parameters that start with "mca_base_" and not parameters that start with just "mca_". I believe that the intention was to expose the latter (which would include the former). @jsquyres - Is that a correct reading of your notes on the wiki?

@qkoziol
Copy link
Contributor

qkoziol commented Jun 22, 2023

Looking into the tuned file aspect of it now

@rhc54
Copy link
Contributor

rhc54 commented Jun 22, 2023

Is that a correct reading of your notes on the wiki?

Sigh - I feel like this has become the never ending treadmill of explanations. Let's try one more time.

The MCA component parameters are already dealt with - there is no ambiguity there as we know what framework they belong to, and which project includes that framework. So we can trivially expose those and already did so.

The same is not true for the "mca_base_" params as they all refer to the MCA base, and we have multiple projects with their own MCA bases. So we needed a way of resolving them, which is what I provided in the referenced fix.

The tuned file support already had that implementation (minus the "mca_base_" problem), so it didn't need changing.

@qkoziol
Copy link
Contributor

qkoziol commented Jun 22, 2023

So, just the tuned files need updating?

@qkoziol
Copy link
Contributor

qkoziol commented Jun 22, 2023

Or, is this completely finished now, and I can close the issue?

@qkoziol
Copy link
Contributor

qkoziol commented Jun 22, 2023

This is the current behavior, for command line parameters:

dev-dsk-qkoziol-2b-7eef74db % ./mpirun --np 1 -N 1 --mca mca_garbage_param 1 --mca opal_mca_garbage_param 1 --mca garbage_param 1 --mca mca_base_garbage_param 1 --hostfile ../../hostfile /bin/true
parse_cli:438 - pargv[1] = '--np'
parse_cli:438 - pargv[2] = '1'
parse_cli:438 - pargv[3] = '-N'
parse_cli:438 - pargv[4] = '1'
parse_cli:438 - pargv[5] = '--pmixmca'
parse_cli:438 - pargv[6] = 'mca_garbage_param'
parse_cli:438 - pargv[7] = '1'
parse_cli:438 - pargv[8] = '--mca'
parse_cli:438 - pargv[9] = 'opal_mca_garbage_param'
parse_cli:438 - pargv[10] = '1'
parse_cli:438 - pargv[11] = '--mca'
parse_cli:438 - pargv[12] = 'garbage_param'
parse_cli:438 - pargv[13] = '1'
parse_cli:438 - pargv[14] = '--mca'
parse_cli:438 - pargv[15] = 'mca_base_garbage_param'
parse_cli:438 - pargv[16] = '1'
parse_cli:438 - pargv[17] = '--hostfile'
parse_cli:438 - pargv[18] = '../../hostfile'
parse_cli:438 - pargv[19] = '/bin/true'
parse_cli:438 - pargv[1] = '--np'
parse_cli:438 - pargv[2] = '1'
parse_cli:438 - pargv[3] = '-N'
parse_cli:438 - pargv[4] = '1'
parse_cli:438 - pargv[5] = '--pmixmca'
parse_cli:438 - pargv[6] = 'mca_garbage_param'
parse_cli:438 - pargv[7] = '1'
parse_cli:438 - pargv[8] = '--mca'
parse_cli:438 - pargv[9] = 'opal_mca_garbage_param'
parse_cli:438 - pargv[10] = '1'
parse_cli:438 - pargv[11] = '--mca'
parse_cli:438 - pargv[12] = 'garbage_param'
parse_cli:438 - pargv[13] = '1'
parse_cli:438 - pargv[14] = '--mca'
parse_cli:438 - pargv[15] = 'mca_base_garbage_param'
parse_cli:438 - pargv[16] = '1'
parse_cli:438 - pargv[17] = '--hostfile'
parse_cli:438 - pargv[18] = '../../hostfile'
parse_cli:438 - pargv[19] = '/bin/true'

@qkoziol
Copy link
Contributor

qkoziol commented Jun 22, 2023

The first MCA parameter ("mca_garbage_param") is getting the "pmixmca" component prefix, but the others are not. Is this correct behavior?

@qkoziol
Copy link
Contributor

qkoziol commented Jun 22, 2023

Also, is there a document that describes the overall (intended) behavior?

@rhc54
Copy link
Contributor

rhc54 commented Jun 22, 2023

What @jsquyres wrote was an accurate description of what they wanted to do - much of it was already implemented. The referenced PR addressed the remaining pieces. The only piece not yet addressed is what to do with envars set by the user prior to invoking an executable.

@rhc54
Copy link
Contributor

rhc54 commented Jun 22, 2023

@qkoziol What the blazes are you talking about? Are you quoting the current behavior on OMPI v5??? If so, try updating the submodule pointers before generating outdated reports.

@qkoziol
Copy link
Contributor

qkoziol commented Jun 22, 2023

@qkoziol What the blazes are you talking about? Are you quoting the current behavior on OMPI v5??? If so, try updating the submodule pointers before generating outdated reports.

That output is from a fresh checkout.

@rhc54
Copy link
Contributor

rhc54 commented Jun 22, 2023

That output is from a fresh checkout.

OF WHAT BRANCH?

Truly, I'm going to close this issue as it has passed beyond being useful and is just an irritation.

@rhc54 rhc54 closed this as completed Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants