-
Notifications
You must be signed in to change notification settings - Fork 876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Neighborhood Collectives #2324
Comments
Oh, sorry. The problem is just in |
@dalcinl can you please point to the test case ? i briefly check the source code, and https://github.com/open-mpi/ompi/blob/v2.x/ompi/mca/coll/libnbc/nbc_ineighbor_allgatherv.c nor https://github.com/open-mpi/ompi/blob/v2.x/ompi/mca/coll/basic/coll_basic_neighbor_alltoallv.c use i do not think Neighbor collective work on intercomm. |
Are are attempting a neighborhood collective on a communicator without a topology? |
@hjelmn this erroneous case is tested when checking params if (! OMPI_COMM_IS_TOPO(comm)) {
OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_TOPOLOGY, FUNC_NAME); |
@ggouaillardet That is only enabled if the mpi parameter check is turned on. That is not on by default. |
Hate that close and comment button. |
@hjelmn I don't think you're right -- I believe the MPI param check is set to "runtime" by default, and is enabled at runtime. |
A reproducer is a little hard to write. Moreover, I'm still not sure why my Python tests are failing, maybe there are additional issues. Graph topologies seem to work, but I get errors with Cartesian topologies. I think there is a bug in Open MPI in the case of Cartesian topologies, but I have to investigate further and open a new issue. Meanwhile, please see the following, you still have to fix neighbor allgatherv for reads past the end of the recvcount array. See the C reproducer at the end. The code is a bit contrived, but it is attempting to send and receive zero length messages. I get the following running under valgrind control. Indeed, valgrind complains in the loop checking for the recvcounts array.
#include <mpi.h>
#include <stdlib.h>
#include <assert.h>
int main(int argc, char *argv[])
{
MPI_Init(&argc, &argv);
int size,rank;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int left = rank!=0 ? rank-1 : size-1;
int right = rank!=size-1 ? rank+1 : 0;
MPI_Comm comm;
int indegree = 1, sources[] = { left };
int outdegree = 1, destinations[] = { right };
MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
indegree, sources, MPI_UNWEIGHTED,
outdegree, destinations, MPI_UNWEIGHTED,
MPI_INFO_NULL, 0, &comm);
int sendbuf[2] = { rank, -1 };
int sendcount = 0;
int recvbuf[2] = { -1, -1};
int *recvcounts = malloc(1*sizeof(int)); recvcounts[0] = 0;
int *recvdispls = malloc(1*sizeof(int)); recvdispls[0] = 0;
MPI_Neighbor_allgatherv(sendbuf, sendcount, MPI_INT,
recvbuf, recvcounts, recvdispls, MPI_INT,
comm);
assert(recvbuf[0] == -1);
assert(recvbuf[1] == -1);
MPI_Comm_free(&comm);
MPI_Finalize();
return 0;
} |
Ah, ok. This is a bug up in the binding itself. Looks like a C&P error. |
Shows how well-used these functions are :D. |
This commit adds a helper function to get the inbound and outbound neighbor count and updates the neighbor_allgatherv bindings to use the correct count when checking the input parameters. Fixes open-mpi#2324 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
See Open MPI issue #2324 (open-mpi/ompi#2324)
This commit adds a helper function to get the inbound and outbound neighbor count and updates the neighbor_allgatherv bindings to use the correct count when checking the input parameters. Fixes open-mpi#2324 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds a helper function to get the inbound and outbound neighbor count and updates the neighbor_allgatherv bindings to use the correct count when checking the input parameters. Fixes open-mpi#2324 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds a helper function to get the inbound and outbound neighbor count and updates the neighbor_allgatherv bindings to use the correct count when checking the input parameters. Fixes #2324 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds a helper function to get the inbound and outbound neighbor count and updates the neighbor_allgatherv bindings to use the correct count when checking the input parameters. Fixes open-mpi#2324 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (cherry picked from commit 3c0e94a) Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit adds a helper function to get the inbound and outbound neighbor count and updates the neighbor_allgatherv bindings to use the correct count when checking the input parameters. Fixes open-mpi#2324 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (cherry picked from commit 3c0e94a) Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit adds a helper function to get the inbound and outbound neighbor count and updates the neighbor_allgatherv bindings to use the correct count when checking the input parameters. Fixes open-mpi#2324 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (cherry picked from commit 3c0e94a) Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit adds a helper function to get the inbound and outbound neighbor count and updates the neighbor_allgatherv bindings to use the correct count when checking the input parameters. Fixes open-mpi#2324 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (cherry picked from commit 3c0e94a) Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit adds a helper function to get the inbound and outbound neighbor count and updates the neighbor_allgatherv bindings to use the correct count when checking the input parameters. Fixes open-mpi#2324 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (cherry picked from commit 3c0e94a) Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit adds a helper function to get the inbound and outbound neighbor count and updates the neighbor_allgatherv bindings to use the correct count when checking the input parameters. Fixes open-mpi#2324 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (cherry picked from commit 3c0e94a) Signed-off-by: Nathan Hjelm <hjelmn@me.com>
@jsquyres I've found serious issues in the implementation of neighborhood collectives. These issues showed up after a memory allocation optimization I've implemented in mpi4py in commit c2eaa292, which failed badly with Open MPI 2.0.1 in this Bitbucket Pipelines build.
After checking the source code in ompi/v2.x, I found a major issue for the case of intracommunicators with a topology. The code simply do not check the topology to determine the number or sources and destinations, and use
ompi_comm_size()
instead. This leads to wrong checks and reads after end of arrays.In mpi4py, I'm using the following code to determine the number of sources and destinations. This piece of code only handles intracommunicators, but could be extended to intercommunicators.
BTW, the wording of the MPI standard seems to require a topology for neighborhood collectives, although supporting intracomms and intercomms seems possible and the result should be equivalent to regular allgather and alltoall. Or maybe the right thing to do is to just error in case of intercomms or intracomms with no topology.
Am I missing something?
PS: MPICH seems to implement neighbor collectives just for Cartesian/graph topologies, the helper routine
MPIR_Topo_canon_nhb_count
is quite similar to what I'm using in mpi4py (though the MPICH one fails in case of no topology). @roblatham00 Any comments to you competitors :-) about this?The text was updated successfully, but these errors were encountered: