btl tcp: Add workaround for "dropped connection" issue
Work around a race condition in the TCP BTL's proc setup code.
The Cisco MTT results have been failing on TCP tests due to a
"dropped connection" message some percentage of the time.
Some digging shows that the issue happens in a combination of
multiple NICs and multiple threads.  The race is detailed in
#3035 (comment).

This patch doesn't fix the race, but avoids it by forcing
the MPI layer to complete all calls to add_procs across the
entire job before any process leaves MPI_INIT.  This reduces
the scalability of the TCP BTL by increasing start-up time,
but that is better than hanging.

The long-term fix is to do all endpoint setup in the first
call to add_procs for a given remote proc, removing the
race.  This patch is a workaround until that fix can be
developed.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
bwbarrett committed Oct 17, 2018
1 parent 37a3f32 commit 2acc4b7
Showing 1 changed file with 18 additions and 0 deletions.
18 changes: 18 additions & 0 deletions opal/mca/btl/tcp/btl_tcp_component.c
@@ -1300,6 +1300,24 @@ mca_btl_base_module_t** mca_btl_tcp_component_init(int *num_btl_modules,
         }
     }
 
+    /* Avoid a race in wire-up when using threads (progress or user)
+       and multiple BTL modules.  The details of the race are in
+       https://github.com/open-mpi/ompi/issues/3035#issuecomment-429500032,
+       but the summary is that the lookup code in
+       component_recv_handler() below assumes that add_procs() is
+       atomic across all active TCP BTL modules, but in multi-threaded
+       code, that isn't guaranteed, because the locking is inside
+       add_procs(), and add_procs() is called once per module.  This
+       isn't a proper fix, but will solve the "dropped connection"
+       problem until we can come up with a more complete fix to how we
+       initialize procs, endpoints, and modules in the TCP BTL. */
+    if (mca_btl_tcp_component.tcp_num_btls > 1 &&
+        (enable_mpi_threads || 0 < mca_btl_tcp_progress_thread_trigger)) {
+        for( i = 0; i < mca_btl_tcp_component.tcp_num_btls; i++) {
+            mca_btl_tcp_component.tcp_btls[i]->super.btl_flags |= MCA_BTL_FLAGS_SINGLE_ADD_PROCS;
+        }
+    }
+
 #if OPAL_CUDA_SUPPORT
     mca_common_cuda_stage_one_init();
 #endif /* OPAL_CUDA_SUPPORT */
