How to delete a manager? #2

Open
einzigsue opened this issue Sep 10, 2017 · 0 comments
Hi All,

I just installed MPI.jl and found that I cannot shut down the MPIManager cleanly. Is there a way to gracefully shut down an MPIManager in MPI_ON_WORKERS mode?

When I do

julia> using MPI
julia> manager = MPIManager(np=4)
julia> workers = addprocs(manager)
julia> @parallel (+) for i in 1:4 rand(Bool) end
julia> exit()

I receive the following error message:

WARNING: Forcibly interrupting busy workers
INFO: INFO: INFO: pid=6516 id=3 op=interrupt
pid=6516 id=4 op=interrupt
pid=6516 id=5 op=interrupt
CompositeException(Any[CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)]), CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)]), CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)])])

I then tried calling rmprocs(workers, waitfor=60.0) before exit(), but on Julia v0.6.0 it fails with "ERROR: UndefVarError: set_worker_state not defined".
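For completeness, here is a minimal sketch of that attempted shutdown, assuming the same session as above (MPIManager from MPI.jl, workers returned by addprocs); on Julia v0.6.0 the rmprocs call is where the UndefVarError surfaces:

```julia
using MPI

manager = MPIManager(np=4)      # launches 4 MPI worker processes
workers = addprocs(manager)     # same session as in the report above

# Attempted clean shutdown: remove the MPI workers before exiting.
# On Julia v0.6.0 this raised "ERROR: UndefVarError: set_worker_state not defined".
rmprocs(workers; waitfor=60.0)
exit()
```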

If I instead call MPI.Finalize() on the head process before exit(), Julia terminates with the following error:

*** The MPI_Finalize() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.

I then tried calling MPI.Finalize() on each worker before exit():

@everywhere using MPI
for w in workers
    @spawnat w MPI.Finalize()
end

This produces errors like the following:

From worker 2: [(null):21884] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 3: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 3: *** This is disallowed by the MPI standard.
From worker 3: *** Your MPI job will now abort.
From worker 3: [(null):21886] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 5: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 5: *** This is disallowed by the MPI standard.
From worker 5: *** Your MPI job will now abort.
From worker 5: [(null):21891] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 4: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 4: *** This is disallowed by the MPI standard.
From worker 4: *** Your MPI job will now abort.
From worker 4: [(null):21888] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
INFO: pid=21626 id=2 op=deregister
INFO: INFO: INFO: pid=21626 id=3 op=deregister
pid=21626 id=4 op=deregister
pid=21626 id=5 op=deregister
Worker 3 terminated.
Worker 4 terminated.ERROR (unhandled task failure): EOFError: read end of file

Worker 5 terminated.ERROR (unhandled task failure): EOFError: read end of file

ERROR (unhandled task failure): EOFError: read end of file

How is an MPIManager in MPI_ON_WORKERS mode meant to be shut down?

Cheers
Yue

@simonbyrne simonbyrne transferred this issue from JuliaParallel/MPI.jl Aug 6, 2019