
ch4/coll: Fix reduce composition alpha with non-zero root #6543

Merged (1 commit) on Jun 5, 2023

Conversation

@raffenet (Contributor) commented May 30, 2023

Pull Request Description

We need to handle the case where a non-zero root uses
MPI_IN_PLACE. Otherwise we could try reading from a bad address and
crash. Fixes #6540.

NOTE: For a single-node reduce operation with a non-zero root, this
composition incurs an unnecessary extra copy.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

@raffenet (Contributor, Author):

test:mpich/ch4/most
test:mpich/ch3/most

@raffenet (Contributor, Author) commented May 30, 2023

This patch is wrong. The issue is the root rank is using MPI_IN_PLACE but an internal collective is treating the root as a regular sender rank and trying to copy from a bad address. Still working on a fix.

@raffenet (Contributor, Author):

test:mpich/ch4/most
test:mpich/ch3/most

@raffenet raffenet changed the title ch4/coll: Fix reduce compositions with non-zero root ch4/coll: Fix reduce composition alpha with non-zero root May 30, 2023
@raffenet raffenet requested a review from hzhou May 31, 2023 15:12
@hzhou (Contributor) commented May 31, 2023

I was trying to enhance the test to cover this bug, with the following patch:

diff --git a/test/mpi/coll/reduce.c b/test/mpi/coll/reduce.c
index 5c61a85150..97fcf104de 100644
--- a/test/mpi/coll/reduce.c
+++ b/test/mpi/coll/reduce.c
@@ -81,9 +81,25 @@ static int test_reduce(mtest_mem_type_e oddmem, mtest_mem_type_e evenmem)
                 MTestCopyContent(recvbuf_h, recvbuf, count * sizeof(int), memtype);

                 MPI_Reduce(sendbuf, recvbuf, count, MPI_INT, MPI_SUM, root, comm);
-                MTestCopyContent(recvbuf, recvbuf_h, count * sizeof(int), memtype);

                 if (rank == root) {
+                    MTestCopyContent(recvbuf, recvbuf_h, count * sizeof(int), memtype);
+                    check_buf(rank, size, count, &errs, recvbuf_h);
+                }
+
+                /* test again using MPI_IN_PLACE */
+                if (rank == root) {
+                    set_send_buf(count, recvbuf_h);
+                    MTestCopyContent(recvbuf_h, recvbuf, count * sizeof(int), memtype);
+                    MPI_Reduce(MPI_IN_PLACE, recvbuf, count, MPI_INT, MPI_SUM, root, comm);
+                } else {
+                    set_send_buf(count, sendbuf_h);
+                    MTestCopyContent(sendbuf_h, sendbuf, count * sizeof(int), memtype);
+                    MPI_Reduce(sendbuf, NULL, count, MPI_INT, MPI_SUM, root, comm);
+                }
+
+                if (rank == root) {
+                    MTestCopyContent(recvbuf, recvbuf_h, count * sizeof(int), memtype);
                     check_buf(rank, size, count, &errs, recvbuf_h);
                 }
             }

But strangely, the enhanced test passes even without your fix. It appears that the posix release gather algorithm works even with the wrong usage of MPI_IN_PLACE (on non-root). I have confirmed it segfaults if we set MPIR_CVAR_REDUCE_POSIX_INTRA_ALGORITHM=mpir. I would like to understand more clearly how the selection works, and fix the release gather algorithm (i.e. it should error out on the wrong usage of MPI_IN_PLACE).
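To force the failing path described above, the CVAR can be set in the environment when launching the test; a sketch of the invocation (the binary name and process count here are assumptions, only the CVAR name and value come from the discussion):

```shell
# Select the mpir fallback instead of the posix release_gather algorithm,
# which makes the bad MPI_IN_PLACE read at a non-zero root segfault.
MPIR_CVAR_REDUCE_POSIX_INTRA_ALGORITHM=mpir mpiexec -n 2 ./reduce
```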

@raffenet (Contributor, Author):

test:mpich/ch4/most

@raffenet (Contributor, Author):

> But strangely, the enhanced test passes even without your fix. It appears that the posix release gather algorithm works even with the wrong usage of MPI_IN_PLACE (on non-root).

I was wondering why red3.c or red4.c didn't trigger the bug, but yeah, the selection is complicated. We could always add the reporter's reproducer, with permission. The latest push finally works with that example.

@hzhou (Contributor) commented May 31, 2023

> I was wondering why red3.c or red4.c didn't trigger the bug

Both red3.c and red4.c are for non-commutative op. We need the coverage for commutative (built-in) op.

@raffenet (Contributor, Author) commented Jun 5, 2023

@hzhou did you figure out a consistent test config for this one?

@hzhou (Contributor) commented Jun 5, 2023

I haven't had time yet. I'll approve the PR and add the test in a future PR.

@hzhou (Contributor) left a review:

LGTM

TODO: add the test in a future PR

We need to handle the case where a non-zero root uses
MPI_IN_PLACE. Otherwise we could try reading from a bad address and
crash. Fixes pmodels#6540.

NOTE: For single node reduce operation with non-zero root, this
composition incurs an extra copy from rank 0->root.
@raffenet raffenet merged commit 8451885 into pmodels:main Jun 5, 2023
@raffenet raffenet deleted the ch4-reduce-root branch June 5, 2023 19:08
@@ -594,14 +594,19 @@ MPL_STATIC_INLINE_PREFIX int MPIDI_Reduce_intra_composition_alpha(const void *se
recvbuf = (void *) ((char *) recvbuf - true_lb);
}

/* non-zero root needs to send from recvbuf if using MPI_IN_PLACE */
intra_sendbuf = (sendbuf == MPI_IN_PLACE && root != 0) ? recvbuf : sendbuf;

I found why the release_gather code was immune to the bug. It is due to:

if (send_buf == MPI_IN_PLACE) {
    send_buf = recv_buf;
}

Essentially it was doing the same recvbuf replacement inside the algorithm.

In general, are we allowed to have sendbuf and recvbuf overlap? Apparently all our reduce algorithms work with this replacement, but I wonder whether this is a potential hazard.

Successfully merging this pull request may close these issues.

Crash with MPI_Reduce( MPI_IN_PLACE, ...) when destination rank > 0 and two processes on same host