Command line issues #10698

Closed
2 of 14 tasks
awlauria opened this issue Aug 22, 2022 · 29 comments

@awlauria
Contributor

awlauria commented Aug 22, 2022

Here's an overview of issues @gpaulsen found when testing ompi command line options.

NOTE: Boxes should be checked as complete when the fixes reach their release branches (openpmix v4.2/prrte v3.0).
As they come in, feel free to edit links to PRs into this comment.

      mpirun --ppr 1:core --host hostA ./hw.x                        
      --------------------------------------------------------------------------
      There are not enough slots available in the system to satisfy the 44
      slots that were requested by the application:

        ./hw.x

It works if --host is removed, or if hostA:44 is specified.
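
The arithmetic behind this failure can be sketched as follows. This is a toy model, not PRRTE code: it assumes hostA has 44 cores (inferred from the 44 slots named in the error) and that a host passed to --host without a :N suffix counts as a single slot. All function names are hypothetical.

```python
def requested_procs(ppr_count, objects_per_node, num_nodes=1):
    """Processes implied by --ppr <count>:<object> on num_nodes nodes."""
    return ppr_count * objects_per_node * num_nodes


def parse_host_slots(host_spec, default_slots=1):
    """'hostA:44' -> ('hostA', 44); 'hostA' -> ('hostA', default_slots).

    default_slots=1 is an assumption about how a bare host name is counted.
    """
    name, sep, slots = host_spec.partition(":")
    if sep:
        return name, int(slots)
    return name, default_slots


def enough_slots(host_spec, needed):
    """True if the host entry provides at least `needed` slots."""
    _, slots = parse_host_slots(host_spec)
    return slots >= needed


needed = requested_procs(ppr_count=1, objects_per_node=44)  # --ppr 1:core
print(needed, enough_slots("hostA", needed), enough_slots("hostA:44", needed))
```

Under these assumptions the bare `hostA` entry falls short of the 44 requested slots, while `hostA:44` satisfies them, which matches the behavior reported above.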

  • rankfile - some known issues already reported, needs verification
  • --cpu-set/--cpu-list works, somewhat. If you ask for more ranks than CPUs
    you get:
    PRTE ERROR: Unable to map job in file rmaps_rr.c at line 184
    even with --oversubscribe
    
@rhc54
Contributor

rhc54 commented Aug 23, 2022

--display-devel-allocation, should map to --display devel-map

Actually, it should map to --display alloc-devel. The simpler allocation output is --display alloc

--display-topo should map to --display topo

Correct - should work. I can check it

--merge-stderr-to-stdout does not seem to work as expected.

I'll check it - it was working last I checked. Maps to --output merge

--do-not-launch does not work as expected (processes are launched)

Hmmm...I've been using it extensively and it works fine, so probably just a translation issue. Maps to --map-by :donotlaunch, but may get moved to the new --controls xxx option as that is a better fit.

--show-progress, doesn't seem to do anything. Removal candidate?

The code is still there, but I'll have to look at why it isn't working

--stream-buffering - Can't tell if this is working as expected

Code is unchanged, so I assume it is still working. Perhaps @jjhursey can check (since he wrote it)?

--ppr - not well documented.

Yeah, that option is way old. I suspect it isn't being mapped correctly.

--cpu-set/--cpu-list Works - somewhat. If you ask for more ranks than cpus you get...

Actually, cpu-list seems to be working just fine if correctly translated. I suspect the problem is in translation. Remember, --cpu-list is now mapped to --map-by pe-list=a,b,c and there is the ordered qualifier to take into account. Don't know about cpu-set - I'm not sure PRRTE has an equivalent to that one other than pe-list. Perhaps the difference lies in the qualifier (cpu-set => nonordered and cpu-list => ordered)?
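
A minimal sketch of the translations described in this comment, written as a lookup table. It is not the actual Open MPI or PRRTE translation code; the `translate` function and the table name are hypothetical, and the mappings are only those stated above, as of this point in the thread. The `ordered` qualifier and `--cpu-set` are left out because their handling is still an open question here.

```python
# Deprecated option -> replacement, as stated in the comment above.
SIMPLE_TRANSLATIONS = {
    "--display-devel-allocation": ["--display", "alloc-devel"],
    "--display-topo": ["--display", "topo"],
    "--merge-stderr-to-stdout": ["--output", "merge"],
    "--do-not-launch": ["--map-by", ":donotlaunch"],
}


def translate(argv):
    """Rewrite deprecated options in argv; leave everything else untouched."""
    out = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in SIMPLE_TRANSLATIONS:
            out.extend(SIMPLE_TRANSLATIONS[arg])
        elif arg == "--cpu-list" and i + 1 < len(argv):
            # --cpu-list a,b,c -> --map-by pe-list=a,b,c
            out.extend(["--map-by", "pe-list=" + argv[i + 1]])
            i += 1  # skip the consumed CPU list
        else:
            out.append(arg)
        i += 1
    return out


print(translate(["mpirun", "--cpu-list", "0,1,2", "./hw.x"]))
```

Note that this sketch can emit more than one --map-by when several deprecated options are combined; a real translator would have to merge them into a single directive.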

@awlauria
Contributor Author

--display-devel-allocation, should map to --display devel-map

Actually, it should map to --display alloc-devel. The simpler allocation output is --display alloc

It appears this option no longer exists:

--------------------------------------------------------------------------
The specified display directive is not recognized:

  Directive: alloc-devel
  Valid directives: allocation:map:bind:map-devel:topo

does it need to be re-implemented or should it just be removed?
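
For illustration, the rejection can be reproduced with a small check against the colon-delimited list printed in the error. This is a hypothetical helper, not the PRRTE parser; only the list of valid directives is taken from the output above.

```python
# Colon-delimited list copied from the error message above.
VALID_DIRECTIVES = "allocation:map:bind:map-devel:topo"


def is_valid_directive(directive, valid=VALID_DIRECTIVES):
    """Return True if `directive` appears in the colon-delimited list."""
    return directive in valid.split(":")


for name in ("alloc-devel", "allocation", "map-devel"):
    print(name, is_valid_directive(name))
```

The check makes the gap visible: `map-devel` is accepted, but there is no matching `alloc-devel` entry next to `allocation`.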

@jjhursey
Member

We should also double check the $HOME/.prte/mca-params.conf and $HOME/.openmpi/mca-params.conf per the note below:

@rhc54
Contributor

rhc54 commented Aug 23, 2022

does it need to be re-implemented or should it just be removed?

I'll take a look - might just be an error.

@rhc54
Contributor

rhc54 commented Aug 23, 2022

We should also double check the $HOME/.prte/mca-params.conf and $HOME/.openmpi/mca-params.conf

The PRRTE default param file is definitely working as I am using it and all get set as specified. I cannot speak to the OMPI one so that might need checking - it should get picked up by the pml/ompi component, but it might not be implemented there yet? Probably should first check that the OMPI personality is getting passed to PMIx (pretty sure that is happening, but worth checking).
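
One way to start that check is to list what each default param file actually contains before looking at which layer reads it. The sketch below assumes the files use `name = value` lines with `#` comments; the helper names are hypothetical, and it only reads the files, it does not show whether PRRTE or OMPI picked the values up.

```python
import os

# The two per-user files named in this thread.
PARAM_FILES = ["~/.prte/mca-params.conf", "~/.openmpi/mca-params.conf"]


def read_mca_params(path):
    """Parse 'name = value' lines, skipping blank lines and '#' comments."""
    params = {}
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()
            if "=" not in line:
                continue
            name, value = line.split("=", 1)
            params[name.strip()] = value.strip()
    return params


def collect_params(files=PARAM_FILES):
    """Return {expanded_path: params} for each param file that exists."""
    found = {}
    for entry in files:
        path = os.path.expanduser(entry)
        if os.path.isfile(path):
            found[path] = read_mca_params(path)
    return found


for path, params in collect_params().items():
    print(path, params)
```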

awlauria added this to the v5.0.0 milestone Aug 23, 2022

@rhc54
Contributor

rhc54 commented Aug 23, 2022

For the problem listed in #10702 (comment), what MCA param is he trying to set? I suspect it is either an incorrect param (due to it being PRRTE and not ORTE) or something that no longer exists as a param due to PRRTE treating things as job-specific instead (so maybe it needs to be restored as setting a default policy that is overridden by cmd line directives)

@rhc54
Contributor

rhc54 commented Aug 23, 2022

For the problem listed in #10702 (comment), what MCA param is he trying to set? I suspect it is either an incorrect param (due to it being PRRTE and not ORTE) or something that no longer exists as a param due to PRRTE treating things as job-specific instead (so maybe it needs to be restored as setting a default policy that is overridden by cmd line directives)

I looked further at the description there and saw where he set the param - it looks okay. I suspect the problem is the same as was just reported to me off-list by the DDT folks - he isn't hitting a slot limit, but rather the overload limit. Setting oversubscribe isn't turning off the check on CPUs. I'll have to fix that separately.
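
To illustrate the distinction, here is a toy model with two independent limits, one on slots and one on CPUs. The function and its parameters are hypothetical and do not mirror the PRRTE mapper; the `oversubscribe_lifts_cpu_check` switch exists only to contrast the reported behavior with the intended one.

```python
def can_map(nprocs, slots, cpus, oversubscribe, oversubscribe_lifts_cpu_check):
    """Return True if a job of nprocs passes both the slot and the CPU check."""
    # Slot limit: oversubscribe always lifts it.
    slot_ok = oversubscribe or nprocs <= slots
    # Overload limit: lifted by oversubscribe only if the switch is on.
    cpu_ok = nprocs <= cpus or (oversubscribe and oversubscribe_lifts_cpu_check)
    return slot_ok and cpu_ok


# 8 ranks on 4 slots / 4 CPUs with oversubscribe requested.
print(can_map(8, 4, 4, True, oversubscribe_lifts_cpu_check=False))  # reported
print(can_map(8, 4, 4, True, oversubscribe_lifts_cpu_check=True))   # intended
```

In this model the job is rejected by the CPU check even though the slot check passes, which is the failure mode described above.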

@rhc54
Contributor

rhc54 commented Aug 23, 2022

Should be fixed in openpmix/prrte#1463

@rhc54
Contributor

rhc54 commented Aug 23, 2022

It appears this option no longer exists:


The specified display directive is not recognized:

Directive: alloc-devel
Valid directives: allocation:map:bind:map-devel:topo

does it need to be re-implemented or should it just be removed?

Fixed in openpmix/prrte#1464

@rhc54
Contributor

rhc54 commented Aug 24, 2022

I'm sensing some confusion over the user-facing options vs what PRRTE accepts, particularly with regard to the --map-by CLI. After looking into it a bit, I can understand why that would be happening as we overloaded --map-by with display options - probably a bad idea.

So please give me a bit to split all this out. Should be done by end of week, but I need some flexibility on the timing as my back has really flared up and so my time-at-keyboard is becoming hit/miss.

@rhc54
Contributor

rhc54 commented Aug 24, 2022

I have implemented the --display topo=<list> option, where <list> is a comma-delimited list of the hosts whose topology you wish to have output. Now working on resolving the confusion mentioned above.
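
As a rough sketch of the argument shape described here, the following splits a single `topo=<list>` value into the directive name and its hosts. It is a hypothetical helper, not the PRRTE parser, and it assumes exactly one directive per value with a comma-delimited host list as stated above.

```python
def parse_display_value(value):
    """Split one --display value into (directive, hosts).

    'topo=hostA,hostB' -> ('topo', ['hostA', 'hostB'])
    'topo'             -> ('topo', [])
    """
    name, sep, host_list = value.partition("=")
    if not sep:
        return name, []
    # Drop empty entries so a trailing comma does not produce a blank host.
    hosts = [h for h in host_list.split(",") if h]
    return name, hosts


print(parse_display_value("topo=hostA,hostB"))
print(parse_display_value("topo"))
```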

@rhc54
Contributor

rhc54 commented Aug 24, 2022

--merge-stderr-to-stdout does not seem to work as expected.

Just to keep track of things here: @awlauria implemented the fix for this (which has been committed)

@awlauria
Contributor Author

Thanks @rhc54. Feel free to update the top-level comment with links to PRs. I won't mark any of them as complete until they make their way to release branches.

@rhc54
Contributor

rhc54 commented Aug 24, 2022

Afraid I lack permissions to check off your boxes, so please feel free to do so when you have time and see them as complete.

@rhc54
Contributor

rhc54 commented Aug 24, 2022

I have backported all the fixes to date to the PMIx and PRRTE release branches. Note that PRRTE v3.0 now requires a minimum of PMIx v4.2.1 due to all the fixes on both sides.

I'm most of the way done with the refactor - will update here when complete.

@rhc54
Contributor

rhc54 commented Aug 24, 2022

openpmix/prrte#1468 completes the refactoring. Still need to decide what cmd line options to move into the --runtime-options directives as opposed to being directly on the cmd line, but that is under your control for OMPI. Just remember to translate them across so they are seen inside PRRTE.

I'll go back to looking into some of the above operations to ensure they are functional.

@rhc54
Contributor

rhc54 commented Aug 24, 2022

-N doesn't appear in --help, possible removal candidate. Maps to --map-by ppr:1:node, so when used with --map-by this is confusing.

$ prterun --help N
Specify number of application processes per node to be started

Probably not in the schizo/ompi help file. Note that it is the OMPI community that required this option - IIRC, it might be part of the standard?

@rhc54
Contributor

rhc54 commented Aug 24, 2022

The only one left in your list that I can resolve is --show-progress. I took a quick peek and see the problem. I'll fix it tomorrow morning, and then complete the work on the --runtime-options directive.

I don't have a way of testing --stack-traces at scale, and the --stream-buffering option is best left to @jjhursey as I have no idea how to test it. So I believe once I finish the runtime options work, my role in this issue is complete.

@jjhursey
Member

I'll look at the stream-buffering option. Once upon a time (way back in 1.7 in 4dd9f89a9) it was an MCA option (ess_base_stream_buffering). I'm not sure when it became a CLI option. I'll take a look and sort it out

@awlauria
Contributor Author

@rhc54 the scale was fairly small with --get-stack-traces; a single node should reproduce PMIx errors like this:

./exports/bin/mpirun --timeout 1 --np 10 --get-stack-traces ./hello_sleep 
--------------------------------------------------------------------------
The user-provided time limit for job execution has been reached:

  Timeout: 1 seconds

The job will now be aborted.  Please check your code and/or
adjust/remove the job execution time limit (as specified by --timeout
command line option or MPIEXEC_TIMEOUT environment variable).
--------------------------------------------------------------------------
Waiting for stack traces (this may take a few moments)...
[c685f8n02:3468407] PMIX bfrop:unpack: got type 27 when expecting type 3
STACK TRACE FOR PROC [prterun-c685f8n02-3468407@1,0] (c685f8n02, PID 3468424)
	Thread 3 (Thread 0x20001140ef80 (LWP 3468452)):
	#0  0x00002000008082ac in epoll_wait () from /lib64/glibc-hwcaps/power9/libc-2.28.so
	#1  0x00002000034e2e88 in ucs_event_set_wait () from /lib64/libucs.so.0
	#2  0x00002000034c9814 in ucs_async_thread_func () from /lib64/libucs.so.0
	#3  0x0000200000689678 in start_thread () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
	#4  0x0000200000807dd8 in clone () from /lib64/glibc-hwcaps/power9/libc-2.28.so
	Thread 2 (Thread 0x2000021fef80 (LWP 3468441)):
	#0  0x00002000008082ac in epoll_wait () from /lib64/glibc-hwcaps/power9/libc-2.28.so
	#1  0x00002000012ea288 in epoll_dispatch () from /lib64/libevent_core-2.1.so.6
	#2  0x00002000012dad00 in event_base_loop () from /lib64/libevent_core-2.1.so.6
	#3  0x0000200000fe05a4 in progress_engine (obj=0x32ca02d0) at runtime/pmix_progress_threads.c:228
	#4  0x0000200000689678 in start_thread () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
	#5  0x0000200000807dd8 in clone () from /lib64/glibc-hwcaps/power9/libc-2.28.so
	Thread 1 (Thread 0x200001da0010 (LWP 3468424)):
	#0  0x0000200000697ea0 in write () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
	#1  0x0000200000be6ab8 in ibv_exp_cmd_modify_qp () from /lib64/libibverbs.so.1
	#2  0x0000200003258684 in mlx5_modify_qp_ex () from /lib64/libmlx5-rdmav2.so
	#3  0x0000200000bdbe30 in __ibv_exp_modify_qp () from /lib64/libibverbs.so.1
	#4  0x00002000039bbba0 in uct_dc_mlx5_iface_dci_connect () from /lib64/ucx/libuct_ib.so.0
	#5  0x00002000039bc67c in uct_dc_mlx5_iface_t_init () from /lib64/ucx/libuct_ib.so.0
	#6  0x00002000039bc9e8 in uct_dc_mlx5_iface_t_new () from /lib64/ucx/libuct_ib.so.0
	#7  0x00002000033e01cc in uct_iface_open () from /lib64/libuct.so.0
	#8  0x0000200003366d80 in ucp_worker_iface_open () from /lib64/libucp.so.0
	#9  0x00002000033674e8 in ucp_worker_add_resource_ifaces () from /lib64/libucp.so.0
	#10 0x000020000336883c in ucp_worker_create () from /lib64/libucp.so.0
	#11 0x0000200003838458 in hmca_bcol_ucx_p2p_init_query.part () from /opt/mellanox/hcoll/lib/hcoll/hmca_bcol_ucx_p2p.so
	#12 0x00002000009b6418 in hmca_bcol_base_init () from /opt/mellanox/hcoll/lib/libhcoll.so.1
	#13 0x0000200000931964 in hmca_coll_ml_init_query () from /opt/mellanox/hcoll/lib/libhcoll.so.1
	#14 0x00002000009a786c in hcoll_init_with_opts () from /opt/mellanox/hcoll/lib/libhcoll.so.1
	#15 0x00002000002c3974 in mca_coll_hcoll_comm_query (comm=0x20000061da68 <ompi_mpi_comm_world>, priority=0x7fffe711e5b4) at coll_hcoll_module.c:305
	#16 0x00002000002903e4 in query_2_4_0 (component=0x200000610730 <mca_coll_hcoll_component>, comm=0x20000061da68 <ompi_mpi_comm_world>, priority=0x7fffe711e5b4, module=0x7fffe711e6e0) at base/coll_base_comm_select.c:544
	#17 0x0000200000290368 in query (component=0x200000610730 <mca_coll_hcoll_component>, comm=0x20000061da68 <ompi_mpi_comm_world>, priority=0x7fffe711e5b4, module=0x7fffe711e6e0) at base/coll_base_comm_select.c:527
	#18 0x00002000002901e0 in check_one_component (comm=0x20000061da68 <ompi_mpi_comm_world>, component=0x200000610730 <mca_coll_hcoll_component>, module=0x7fffe711e6e0) at base/coll_base_comm_select.c:492
	#19 0x000020000028fd28 in check_components (components=0x20000060f750 <ompi_coll_base_framework+80>, comm=0x20000061da68 <ompi_mpi_comm_world>) at base/coll_base_comm_select.c:412
	#20 0x000020000028449c in mca_coll_base_comm_select (comm=0x20000061da68 <ompi_mpi_comm_world>) at base/coll_base_comm_select.c:114
	#21 0x000020000015d790 in ompi_mpi_init (argc=1, argv=0x7fffe711fbd8, requested=0, provided=0x7fffe711f610, reinit_ok=false) at runtime/ompi_mpi_init.c:550
	#22 0x00002000001f727c in PMPI_Init (argc=0x7fffe711f7b0, argv=0x7fffe711f7b8) at init.c:67
	#23 0x00000000100009fc in main (argc=1, argv=0x7fffe711fbd8) at [c685f8n02:3468407] 
pmix_ptl_base: send_msg: write failed: Connection reset by peer (104) [sd = 43]

Specifically the PMIX bfrop:unpack: got type 27 when expecting type 3 and pmix_ptl_base: send_msg: write failed: Connection reset by peer (104) [sd = 43] errors.
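
When rerunning the reproducer, the two messages can be picked out of the noisy output with a small filter. This is a sketch of a hypothetical triage helper; the patterns are copied from the two error lines quoted above and may need adjusting if the message text changes.

```python
import re

# Patterns built from the two error messages quoted above.
SIGNATURES = {
    "unpack_type_mismatch": re.compile(
        r"PMIX bfrop:unpack: got type (\d+) when expecting type (\d+)"
    ),
    "send_failed": re.compile(
        r"pmix_ptl_base: send_msg: write failed: (.+?) \((\d+)\)"
    ),
}


def find_pmix_errors(text):
    """Return (line_number, signature_name, captured_groups) for each match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                hits.append((lineno, name, match.groups()))
    return hits


sample = (
    "[c685f8n02:3468407] PMIX bfrop:unpack: got type 27 when expecting type 3\n"
    "pmix_ptl_base: send_msg: write failed: Connection reset by peer (104) [sd = 43]\n"
)
for hit in find_pmix_errors(sample):
    print(hit)
```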

@rhc54
Contributor

rhc54 commented Aug 25, 2022

The "show_progress" option has been re-enabled in openpmix/prrte#1471

Note that it is no longer an MCA param and is now under the --runtime-options CLI

@rhc54
Contributor

rhc54 commented Aug 25, 2022

Specifically the PMIX bfrop:unpack: got type 27 when expecting type 3 and pmix_ptl_base: send_msg: write failed: Connection reset by peer (104) [sd = 43] errors.

Okay, I can look at that one. Should be easy to fix.

@rhc54
Contributor

rhc54 commented Aug 25, 2022

I'll look at the stream-buffering option. Once upon a time (way back in 1.7 in 4dd9f89a9) it was an MCA option (ess_base_stream_buffering). I'm not sure when it became a CLI option. I'll take a look and sort it out

I believe it still is an MCA param - I'm not sure you can make it a per-job cmd line option, can you?

@rhc54
Contributor

rhc54 commented Aug 25, 2022

The stack trace issue is fixed in openpmix/prrte#1472

@awlauria
Contributor Author

Thanks!

@awlauria
Contributor Author

awlauria commented Aug 30, 2022

I retested all of these using prrte/openpmix latest main this morning:

--display-devel-allocation, should map to --display devel-map
Need to remove the devel-allocation option.
--display-topo/--display topo, does nothing. Should be removed? --display-topo should map to --display topo
mpirun mapping for display-topo needs adjustment
--merge-stderr-to-stdout does not seem to work as expected. fixes in master: openpmix/openpmix#2709 + openpmix/prrte#1462
WORKS
--do-not-launch does not work as expected (processes are launched) Fixed in main: openpmix/prrte#1459
seems broken on master:

$ ./exports/bin/mpirun --do-not-launch --np 2 ./hello
--------------------------------------------------------------------------
The map-by directive contains an unrecognized qualifier:

 Qualifier: donotlaunch
 Valid qualifiers: pe=,span,oversubscribe,nooversubscribe,nolocal,hwtcpus,corecpus,inherit,noinherit,file=,ordered

Please check for a typo or ensure that the qualifier is a supported one.
--------------------------------------------------------------------------
$ 

--show-progress, doesn't seem to do anything. Removal candidate?
no noticeable change
--stream-buffering - Can't tell if this is working as expected
no noticeable change
--app, deprecated translation is not working
can't get this to work

$ ./exports/bin/prterun --app ./appfile
--------------------------------------------------------------------------
No executable was specified on the prterun command line.

Aborting.
--------------------------------------------------------------------------
$ 

-N doesn't appear in --help, possible removal candidate. Maps to --map-by ppr:1:node, so when used with --map-by this is confusing.
still there - should we just remove this??
--get-stack-traces seems to fail at larger scales via a hang or crash
still shows various pmix errors
--stop-in-app shouldn't require an argument. Fixed in main: openpmix/prrte#1458
does not stop in app now...
--ppr - not well documented. Slot detection when used in conjunction with --host foo (without specifying slots) seems broken, for
still an issue

rankfile - some known issues already reported, needs verification
--cpu-set/--cpu-list Works - somewhat. If you ask for more ranks than cpus
still prints error like so

PRTE ERROR: Unable to map job in file rmaps_rr.c at line 184

@jjhursey
Member

jjhursey commented Sep 1, 2022

Re: --stream-buffering: Two PRs to adjust the handling to be more Open MPI centric and less PRRTE dependent.

We can now set it via an MCA parameter, instead of only being able to set it from the PRRTE CLI. This restores some functionality from when it was originally introduced in 4dd9f89a9

@jjhursey
Member

Cross-reference Issue #10705

@awlauria
Contributor Author

Closing this in favor of #10705
