Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the leak of fragments for persistent sends (issue #6565) #6621

Merged
merged 1 commit into from
May 3, 2019

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented Apr 27, 2019

Fix for #6565.

The rdma_frag attached to the send request was not correctly released
upon request completion, leaking until MPI_Finalize. A quick solution
would have been to add RDMA_FRAG_RETURN at different locations on the
send request completion, but it would have unnecessarily made the
sendreq completion path more complex. Instead, I added the length to
the RDMA fragment so that it can be completed during the remote ack.

Be more explicit on the comment.

The rdma_frag can only be freed once when the peer forced a protocol
change (from RDMA GET to send/recv). Otherwise the fragment will be
returned once all data pertaining to it has been trasnferred.

Signed-off-by: George Bosilca bosilca@icl.utk.edu

The rdma_frag attached to the send request was not correctly released
upon request completion, leaking until MPI_Finalize. A quick solution
would have been to add RDMA_FRAG_RETURN at different locations on the
send request completion, but it would have unnecessarily made the
sendreq completion path more complex. Instead, I added the length to
the RDMA fragment so that it can be completed during the remote ack.

Be more explicit on the comment.

The rdma_frag can only be freed once when the peer forced a protocol
change (from RDMA GET to send/recv). Otherwise the fragment will be
returned once all data pertaining to it has been trasnferred.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@bosilca bosilca force-pushed the topic/persistent_req_leak branch from 172c7f2 to a16cf0e Compare May 2, 2019 13:40
@jsquyres
Copy link
Member

jsquyres commented May 2, 2019

FWIW, I verified that this PR actually does fix the memory leak.

@kawashima-fj If you're happy, let's merge this PR (and cherry pick it to the release branches).

@kawashima-fj kawashima-fj merged commit dabad08 into open-mpi:master May 3, 2019
@kawashima-fj
Copy link
Member

@bosilca Thanks for your explanation.

I also verified the leak fix yesterday. I cannot access a terminal until the next week (because it is ten-day holiday in Japan now!). I'll cherry-pick next Tuesday or so. If you cannot wait, please cherry-pick instead of me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants