-
Notifications
You must be signed in to change notification settings - Fork 868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create file for file backed shared memory in process job session dir. #3739
Create file for file backed shared memory in process job session dir. #3739
Conversation
Prevents file collisions and can also be cleaned by orte-clean properly. Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
@edgargabriel something for you to review when you're back from vacation |
I'll open PRs against the release branches once this is merged in to master. |
I liked originally the idea behind this commit, and I can confirm that it works as expected. My hesitation comes from a more theoretical scenario, and maybe I don't understand the session directories well enough, so somebody please feel free to correct me. Basically, the scenario that could cause trouble when using this patch is: a) an MPI job connecting with an MPI job, b) using Intercomm_merge to create an intra communicator that contains processes from the two different jobs (i.e. they will have different session directories if I understand correctly). I understand that this is a very theoretical scenario, but it would not work with this patch. Could we bcast the session directory of rank zero, such that everybody uses that one, similarly to how we deal with the jobid? |
Your basic understanding of the session directories is correct. I think the critical question here is: what is supposed to happen to this file when the two jobs disconnect? Does the file get removed during the disconnect procedure? If so, then it can exist in one job's session directory since neither job can leave before disconnecting (else it's an error and the file gets dumped anyway). If one job is supposed to inherit the file, then it gets a little trickier. If we know at the beginning which job is to "own" the file, then we put it there. Otherwise, we'd have to figure out how to deal with it. If multiple jobs connect, then we get the same question: if one job disconnects, does the shared file go away? Or does it stay so long as any of the jobs remains alive? |
thanks for the explanations @rhc54 . Yes, the file is supposed to be removed whenever the file has been closed and the communicator that contains the processes from the connected jobs is automatically freed during the MPI_File_close. Maybe we can talk briefly about that at the devel meeting next week. |
I added it to the agenda |
I think its probably a 5 min conversation between you and me, since I really have mostly some administrative questions on how to approach this. |
Agreed - just wanted a reminder 😄 |
We had a conversation with @rhc54 at the OMPI meeting, and I think the consensus is to not worry about dynamically connected jobs. Shared memory components are also are not being used for communication in dynamically connected jobs since they run into exactly the same problems that we would run into here, namely where to place the backend shared memory file, how to ensure that processes from two different jobs have read/write permission to the different session directories (which might have been launched by different user ids) etc. I will have one minor change request to this pr, and will make a modification to disable the component for dynamically connected communicators in the component query function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the session directory is unique on a per-job basis, we do not need to have the jobid in the filename anymore, so I would suggest to remove that part of the code.
We could add instead the communicator id instead of the job-id as part of the filename. This would allow to support the scenario, where a code opens the same file multiple times and have unique shared file pointers per opening instance. Since Open MPI does a comm-dup on every file open, the communicator id for each file open operation is unique and could be used for the identification.
…terjobid. Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thanks @cniethammer
bot:retest I think this got caught in a different bug in master; the failures are related to using MPI headers in non-MPI code. Let's make sure... |
bot:mellanox:retest |
Prevent file collisions and allow cleaning of temporary files by orte-clean used for shared file pointer management: Create file for file backed shared memory in process job session dir instead in base /tmp.