-
Notifications
You must be signed in to change notification settings - Fork 876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opal/datatype: fix handling of unused opt_desc #3439
opal/datatype: fix handling of unused opt_desc #3439
Conversation
@hppritcha this issue was reported on the ML so i put the "blocker" label. i have no objection you update the milestone if you want to release a new version asap |
@bosilca could you please review this (related to #3438) here is a reproducer #include <stdio.h>
#include <mpi.h>
int
main (int argc, char **argv)
{
int rank;
int bsize = 1024 * 1024;
MPI_Datatype vec_type;
MPI_Datatype sub_type;
MPI_Datatype dup_type;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (0 != rank) bsize *= 1024;
MPI_Type_contiguous(bsize, MPI_BYTE, &vec_type);
MPI_Type_commit(&vec_type);
int sizes[1] = {4};
int subsizes[1] = {4};
int starts[1] = { 0 };
MPI_Type_create_subarray(1, sizes, subsizes, starts, MPI_ORDER_C,
vec_type, &sub_type);
MPI_Type_commit(&sub_type);
MPI_Type_dup(sub_type, &dup_type);
MPI_Type_commit(&dup_type);
MPI_Type_free(&vec_type);
MPI_Type_free(&sub_type);
MPI_Type_free(&dup_type);
MPI_Finalize();
} currently, the program fails on non zero rank. |
@ggouaillardet the original design always initialized the desc (it is basically built while we construct that datatype, and should always be used in a heterogenous case). As a result, the desc was also always supposed to be released. The opt_desc should only be constructed if there is any benefit of building one. Moreover, it is set to NULL during the OBJ_NEW, so there should be no need to set it to NULL in other part of the code. |
@bosilca, so that means that the first commit is an overkill, right ?
|
Thanks George for the guidance Refs open-mpi#3438 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
4358233
to
83706c0
Compare
I think the problem is a little deeper than that. If the datatype is contiguous, then there is no point of having an optimized description, as the default one (the desc) should be good enough. Thus, we should have the opt_desc point back to the desc, instead to pointing to an allocated but empty description. I'm trying to see where we miss the detection of the contiguous data to fix it there. |
In addition to the previous comment, as your example is only using subarray all ranks should generate exactly the same datatype, thus they are all supposed to segfault. |
in my example, |
I see now the problem. I missed the fact that the length was different on the other ranks. |
No description provided.