Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-blocking collectives: retain MPI_op and MPI_Datatype(s) #2154

Merged

Conversation

ggouaillardet
Copy link
Contributor

No description provided.

@ggouaillardet ggouaillardet force-pushed the topic/retain_op_and_datatypes branch from 5b1cc1d to a525940 Compare October 4, 2016 04:16
@ggouaillardet
Copy link
Contributor Author

@bosilca can you please review this ?

when a non blocking collective is invoked, user operators and datatypes (e.g. MPI_Op and MPI_Datatype) are OBJ_RETAIN'ed, and then OBJ_RELEASE'd when the request completes.

this solves issues when user op/datatype are free'd by the app before a non-blocking collective completes. i previously made a prototype but at the coll/libnbc level. these patches do it at the ompi level.

there cannot be any issues with intrinsic operators nor predefined datatypes, so i did not bother retain/release them. my understanding is there is a plan to stop doing that for non blocking point-to-point (e.g. MPI_Isend, MPI_Irecv), so i think that is fine.

i implemented that by "inserting" a completion callback (req_complete_cb) in a non blocking request.
ideally, i guess that should be done in the req_free callback (or maybe in both req_complete_cb and req_free ?) to properly handle cancelled requests.
a pointer is required to release op/datatype(s), and currently, only req_complete_cb_data is available, so it was easier to fix req_complete_cb

@jsquyres @hjelmn FYI

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a Signed-off-by line to this PR's commit.

@jjhursey
Copy link
Member

bot:ibm:retest

@jjhursey jjhursey added the bug label Nov 18, 2016
@jjhursey jjhursey added this to the v2.0.2 milestone Nov 18, 2016
@jjhursey
Copy link
Member

@ggouaillardet Is this PR is supposed to fix Issue #2151?

@ggouaillardet
Copy link
Contributor Author

ggouaillardet commented Nov 18, 2016

@jjhursey yes
@bosilca this is the PR we briefly discussed yesterday
Note i might rework it a bit after #2393 is merged

@bosilca
Copy link
Member

bosilca commented Nov 22, 2016

@ggouaillardet as we briefly discussed during SC, I think the best approach will be to stop doing any refcount inside the library and instead move them all up into the MPI layer. This will allow us to release/retain all the non-predefined types and ops only once per operation.

@hppritcha
Copy link
Member

moving to 2.0.3

@hppritcha hppritcha modified the milestones: v2.0.3, v2.0.2 Dec 7, 2016
@ggouaillardet
Copy link
Contributor Author

@bosilca this PR can be seen as a first step.

it handles ddt/ops for non blocking collectives at the MPI level.
next step is to move ddt handling for pt2pt to the MPI level

meanwhile, do you have any comments on this PR ?

@bosilca
Copy link
Member

bosilca commented Dec 8, 2016

This patch looks extremely big, a lot of code for a little benefit. I wonder why can't we simply add the datatype and op release code next to the calls to ompi_request_complete ?

Aren't we supposed to retain the datatype (ompi_coll_base_retain_datatypes) and op (ompi_coll_base_retain_op) before the call to the non-blocking functions themselves ?

@ibm-ompi
Copy link

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/3a4218f168345072c128648ce107aeef

@hppritcha hppritcha modified the milestone: v2.0.3 Jun 1, 2017
@ggouaillardet ggouaillardet force-pushed the topic/retain_op_and_datatypes branch from c9b7ac4 to e776fab Compare September 1, 2017 07:51
@ibm-ompi
Copy link

ibm-ompi commented Sep 1, 2017

The IBM CI (GNU Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/2d5c1a28af517e6eef3a24c202f7cce9

@ibm-ompi
Copy link

ibm-ompi commented Sep 1, 2017

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/c7c9ec868e4ece21fece1f8e503c878c

@ibm-ompi
Copy link

ibm-ompi commented Sep 1, 2017

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/3b47b61d703cc97ac46275531114cc15

@ggouaillardet ggouaillardet force-pushed the topic/retain_op_and_datatypes branch from e776fab to 4fe431b Compare September 1, 2017 08:08
@ibm-ompi
Copy link

ibm-ompi commented Sep 1, 2017

The IBM CI (GNU Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/e0d9e4795c9d80a35b550f2ed3ae84ea

@ggouaillardet
Copy link
Contributor Author

:bot:aws:retest

@ggouaillardet ggouaillardet force-pushed the topic/retain_op_and_datatypes branch from 8f6a839 to 646391b Compare April 9, 2019 07:12
@ggouaillardet
Copy link
Contributor Author

@bosilca this is quite an old PR we discussed way before it was issued.
Currently, onesided operations retain/release the datatype in the pml component.
I initially wrote some code (PR ?) that retain/release the datatypes and operator in the coll component (e.g. coll/libnbc) but you argued it would be better do this for all components at the ompi level, so I ended up with this PR, that will work for components other than coll/libnbc but that is a bit heavy (e.g. we end up chaining two callbacks when completing a request). We might be able to avoid this chaining by moving some data into the ompi_request_t struct (and I wanted to avoid this in the first place)

@jjhursey
Copy link
Member

jjhursey commented May 7, 2019

Someone asked me about the state of this PR today. Just checking in to see where things are standing? This seems to fix Issue #2151

@ggouaillardet
Copy link
Contributor Author

I am waiting for directions from @bosilca since he objected this PR.

@ggouaillardet
Copy link
Contributor Author

@jjhursey note I did not test this PR vs persistent non blocking collectives.

@jsquyres
Copy link
Member

jsquyres commented Jul 3, 2019

@bosilca Thoughts on this PR?

@bosilca
Copy link
Member

bosilca commented Jul 3, 2019

Here are few questions and suggestions:

  • Create a specialized class of collective requests that contains the retain_op_data in the request (save a memory allocation/deallocation)
  • In ompi_coll_base_retain_datatypes_w do you really need to make a copy of the datatype arrays ? I could not find anything in the MPI standard that define the life-expectancy of the array argyments for the non-blocking calls. But it would make sense if they were expected to exists for the entire duration of the collective. Anyway, the answer to this question might change the implementation of ompi_coll_base_retain_datatypes_w.
  • I can see why we want to have distinct functions for all possible combinations of ops and datatype. However, in these functions we only use the opal_object_t capabilities of the different objects, so we could reduce the number of callbacks to 2 (2 opal_objects and 1 array of opal_objects).

@ggouaillardet
Copy link
Contributor Author

Thanks @bosilca, I will update the PR based on your feedback.

The MPI standard is explicit about the life expectancy of the array arguments for the non blocking calls, and this makes things even easier as you suggested.
From MPI 3.1 chapter 5.12 page 197

Once initiated, all associated send buffers and buffers associated with input arguments (such
as arrays of counts, displacements, or datatypes in the vector versions of the collectives)
should not be modified, and all associated receive buffers should not be accessed, until the
collective operation completes.

@bosilca
Copy link
Member

bosilca commented Jul 4, 2019

Thanks @ggouaillardet, I've been looking for that sentence for hours.

@ggouaillardet ggouaillardet force-pushed the topic/retain_op_and_datatypes branch from 646391b to e4dcf83 Compare July 4, 2019 06:24
@ggouaillardet
Copy link
Contributor Author

I pushed a new commit to address the first point

@ggouaillardet ggouaillardet force-pushed the topic/retain_op_and_datatypes branch from 541a868 to a578e45 Compare July 8, 2019 04:38
@ggouaillardet
Copy link
Contributor Author

@bosilca I addressed all your points. Can you please give this PR a final review ?
@jjhursey FYI

Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good. I had few minor suggestions for improvement, mostly to make the code more readable. If you make them, feel free to merge directly.

ompi/mca/coll/base/coll_base_util.c Show resolved Hide resolved

static void release_vecs_callback(ompi_coll_base_nbc_request_t *request) {
ompi_communicator_t *comm = request->super.req_mpi_object.comm;
int count = OMPI_COMM_IS_INTER(comm)?ompi_comm_remote_size(comm):ompi_comm_size(comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work for persistent neighbor collectives ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you meant non blocking neighbor alltoallw (regardless this is a persistent operation or not), right ? in this case, this is a good catch I fixed in the latest commit.

ompi/mca/coll/libnbc/nbc.c Outdated Show resolved Hide resolved
ompi/mca/coll/retain.diff Outdated Show resolved Hide resolved
@gpaulsen
Copy link
Member

Two questions:

  1. Is this at all dependent on A big refresh of the datatype engine #6695 ?
  2. How difficult would it be to port this to v4.0.x?

@ggouaillardet
Copy link
Contributor Author

@gpaulsen

@ggouaillardet
Copy link
Contributor Author

@bosilca I made the requested changes. Unless some changes are requested, I will squash the commits and merge this PR tomorrow.

@ggouaillardet
Copy link
Contributor Author

@gpaulsen fwiw, this PR applies nicely on the v4.0.x branch once 65660e5 has been backported.

MPI standard states a user MPI_Op and/or user MPI_Datatype can be free'd
after a call to a non blocking collective and before the non-blocking
collective completes.
Retain user (only) MPI_Op and MPI_Datatype when the non blocking call is
invoked, and set a request callback so they are free'd when the MPI_Request
completes.

Thanks Thomas Ponweiser for reporting this

Fixes open-mpi#2151
Fixes open-mpi#1304

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
@gpaulsen
Copy link
Member

gpaulsen commented Nov 4, 2020

@ggouaillardet @hppritcha, This functionality (along with 65660e5) was never backported to either the v4.0.x, or v4.1.x release streams.

My opinion is that this is quite a bit of change for this late in the release streams, and that we should probably wait for v5.0.0 for this functionality. What are your thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants