MPI_Comm_spawn inherit job options from the original job #5376
Comments
Yeah, this has always been a debatable point. Here is the relevant code (recall that …):

    /* if this app gave no ppr, fall back to the job-level MCA param */
    if (NULL == jdata->map->ppr && NULL != orte_rmaps_base.ppr) {
        jdata->map->ppr = strdup(orte_rmaps_base.ppr);
    }
    if (NULL != jdata->map->ppr) {
        /* get the procs/object */
        ppx = strtoul(jdata->map->ppr, NULL, 10);
        if (NULL != strstr(jdata->map->ppr, "node")) {
            pernode = true;
        } else {
            pernode = false;
        }
    } else {
        /* no ppr at all: fall back to the pernode/npernode/npersocket params */
        if (orte_rmaps_base_pernode) {
            ppx = 1;
            pernode = true;
        } else if (0 < orte_rmaps_base_n_pernode) {
            ppx = orte_rmaps_base_n_pernode;
            pernode = true;
        } else if (0 < orte_rmaps_base_n_persocket) {
            ppx = orte_rmaps_base_n_persocket;
            persocket = true;
        }
    }
    /* inherit cpus-per-rank unless this app set its own */
    if (0 == jdata->map->cpus_per_rank) {
        jdata->map->cpus_per_rank = orte_rmaps_base.cpus_per_rank;
    }

You can see that we apply the MCA params given at the start of the job unless you override them. However, it is a one-to-one process; that is, you can change the value of a specific MCA param directive, but you can't turn it "off". So I guess the questions are: Do MCA params apply only to the initial launch? Is that true for all MCA params (e.g., does it include BTL directives)? If only some, which ones? Does the user decide, and if so, how do they tell us?
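To make the "can't turn it off" point concrete, here is a minimal model of the fallback pattern in the code quoted above. The type and function names are hypothetical and this is not ORTE code; it only assumes, as the snippet does, that NULL or 0 means "not set by this app".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the ORTE mapping options (not the
 * real ORTE structure). NULL / 0 means "not set by this app". */
typedef struct {
    const char *ppr;        /* procs-per-resource string, e.g. "1:node" */
    unsigned cpus_per_rank; /* 0 means "not set" */
} map_opts_t;

/* Mirrors the fallback in the quoted code: any field the app left unset
 * is filled in from the job-level MCA defaults. */
static void apply_job_defaults(map_opts_t *app, const map_opts_t *job)
{
    assert(NULL != app && NULL != job);
    if (NULL == app->ppr && NULL != job->ppr) {
        app->ppr = job->ppr;
    }
    if (0 == app->cpus_per_rank) {
        app->cpus_per_rank = job->cpus_per_rank;
    }
}

/* Same check the quoted code uses to decide whether a ppr is per-node. */
static bool ppr_is_pernode(const char *ppr)
{
    return NULL != ppr && NULL != strstr(ppr, "node");
}
```

Because "not set" and "off" share the same sentinel, a spawned app that leaves `ppr` unset (intending "no per-node constraint") still ends up with the job's `"1:node"`, whereas an app that supplies its own value (say `"4:socket"`) keeps it. Overriding is possible; switching the directive off is not.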
Let's assume my app is created by 2 different services and that I need to start it in 2 phases: first a set of processes (one per node) that act as managers, and then additional processes divided equitably among the available nodes. The npernode option of the original mpirun is convenient, because I don't need to know how many nodes my allocation has (they are extracted automatically from the RM). But then I can't figure out how to start my second set of processes without handling HWLOC information in my application and then messing around with MPI's view of the resources.

I see your concern about the scope of the original mpirun parameters. Personally, I think the mpirun parameters should apply only to the original app, while those in the MCA configuration file must be global. More precisely, we should treat all MCA parameters that do not come from configuration files as equal, and provide a means either to clean the environment for spawned applications, so that the user can populate the new environment with only the information necessary for the new job, or to inherit all MCA parameters from the original job.

Going one step further, when spawning new processes we should be able not only to add more processes to the current allocation (the current behavior), but also to request a new allocation and spawn the new processes directly there. I am not sure yet how we can mix these two together, but if we want to provide generic dynamic process support, we need to support all cases.
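Without npernode-style help, the second phase of this scenario means the application itself must split the workers across nodes once it has learned the node count. A minimal sketch of that arithmetic follows; `distribute_evenly` is a hypothetical helper, not an Open MPI API, and it assumes the node count has already been obtained somehow (from the RM or hwloc).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split nprocs as evenly as possible over nnodes.
 * Every node gets nprocs / nnodes processes, and the first
 * nprocs % nnodes nodes get one extra. counts must hold nnodes entries. */
static void distribute_evenly(unsigned nprocs, size_t nnodes, unsigned counts[])
{
    assert(nnodes > 0);
    unsigned base = nprocs / (unsigned)nnodes;
    unsigned extra = nprocs % (unsigned)nnodes;
    for (size_t i = 0; i < nnodes; i++) {
        counts[i] = base + (i < extra ? 1u : 0u);
    }
}
```

For example, 10 workers on 4 nodes yields counts of 3, 3, 2, 2. One plausible way to use such counts is one entry per host in an `MPI_Comm_spawn_multiple` call with the standard `host` info key, but that is exactly the kind of resource bookkeeping the request above would rather leave to the runtime.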
I grok your suggestion about the mpirun cmd line params, and it would be relatively easy to remove those from the environment passed to the child processes. Solving your immediate problem, however, only requires that we not apply params related to launch (mapping, ranking, etc.) to dynamically spawned apps unless directed to do so. This would be a trivial change. How do we get the community to bless it? I agree with your "one step further", and PMIx v2 supports that request. The problem is that we don't yet have an RM that supports the …
The part of the community that would be impacted by such a change is small, as there are very few users of the dynamic process capabilities right now. We can bring this up during one of our weekly calls to see what the rest of the community thinks about it. For the addition of …
We talked about this on today's telecon and decided on a "first step" for OMPI v4.0, which branches at the end of this week. I'll add a new MCA param and cmd line option to indicate whether launch directives are to be inherited (default: not inherited) and then modify ORTE accordingly. This will affect the map-by, rank-by, bind-to, npernode, pernode, npersocket, persocket, and cpus-per-rank directives. I'll review the code and report any others I can identify that fit in this category. The broader issue of inheritance got too thorny to resolve in time for the OMPI v4.0 branch; we'll deal with it later.
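The agreed first step can be modeled as a single switch applied when resolving a spawned job's launch directives. This is a sketch under stated assumptions: the thread does not name the new MCA param or option, so `inherit`, `launch_directives_t`, and `resolve_child` are hypothetical, and pernode/persocket are represented as npernode/npersocket == 1, matching the `ppx = 1` handling in the code quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the launch directives listed above.
 * NULL / 0 means "not set"; pernode is npernode == 1, persocket is
 * npersocket == 1. */
typedef struct {
    const char *map_by;
    const char *rank_by;
    const char *bind_to;
    unsigned npernode;
    unsigned npersocket;
    unsigned cpus_per_rank;
} launch_directives_t;

/* Resolve the directives for a dynamically spawned job. With
 * inherit == false (the proposed default) the child keeps only what it
 * set itself; with inherit == true, unset fields fall back to the
 * parent job's values. */
static launch_directives_t resolve_child(const launch_directives_t *parent,
                                         const launch_directives_t *child,
                                         bool inherit)
{
    assert(NULL != child);
    launch_directives_t out = *child;
    if (!inherit || NULL == parent) {
        return out;
    }
    if (NULL == out.map_by)     out.map_by = parent->map_by;
    if (NULL == out.rank_by)    out.rank_by = parent->rank_by;
    if (NULL == out.bind_to)    out.bind_to = parent->bind_to;
    if (0 == out.npernode)      out.npernode = parent->npernode;
    if (0 == out.npersocket)    out.npersocket = parent->npersocket;
    if (0 == out.cpus_per_rank) out.cpus_per_rank = parent->cpus_per_rank;
    return out;
}
```

With the default (`inherit == false`), a job launched with `-npernode 1` no longer forces that constraint onto `MPI_Comm_spawn` children. Making it a single switch covering only launch directives keeps the change small, while the broader inheritance question (BTL and other MCA params) stays open.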
👍
Oops, sorry for closing.
In looking at it, I wonder if the …
Committed to v4.0.x.
Thank you for taking the time to submit an issue!
Background information
Processes started with MPI_Comm_spawn inherit [all] parameters from the original job. For example, if the original job was launched with "-npernode 1", all future dynamic processes carry the same constraint. I could not figure out a way, using MPI info keys, to clean the environment and remove these constraints.
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
The issue was originally discovered in master and can be replicated on all 3.x branches (and probably in older versions, though I haven't checked).
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Developer installation, i.e., git clone followed by configure (with --enable-debug) and make.
Please describe the system on which you are running
The issue can be replicated in multiple environments, with and without a resource manager (RM), and with multiple networks.