DAOS-16722 client: to intercept PMPI_Init() in libpil4dfs #15336
Features: pil4dfs Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <lei.huang@intel.com>
Ticket title is 'Hang in zeInit in pil4dfs interception library when preloading darshan'
Features: pil4dfs Signed-off-by: Lei Huang <lei.huang@intel.com>
All CI tests finished without issues.
int rc;

if (next_pmpi_init == NULL) {
	next_pmpi_init = dlsym(RTLD_NEXT, "PMPI_Init");
As I understand it, we expect to get a reference to the original PMPI_Init function.
However, I do not understand how we can be sure which function reference we will get if several LD_PRELOAD libraries define it. For example, if Darshan redefines it, could we get a reference to the Darshan one instead?
The deadlock is caused by calling daos_init() inside the MPI runtime library's MPI_Init()/PMPI_Init(). We can avoid it as long as the PMPI_Init() from libpil4dfs is executed before the PMPI_Init() implemented in the MPI runtime library. It does not matter whether the PMPI_Init() from libpil4dfs or the PMPI_Init() from darshan is executed first.
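The ordering argument above can be illustrated with a minimal sketch of the interposition pattern. This is not the code from this PR: `early_init_once()`, `early_init_done`, and the `-1` error value are hypothetical stand-ins, and the sketch only assumes that the real library performs its DAOS initialization before forwarding the call. `dlsym(RTLD_NEXT, ...)` returns the next definition of the symbol in the search order, which may be another preloaded wrapper (such as darshan's) or the MPI runtime's own; either way the local initialization has already run before the MPI runtime's PMPI_Init() starts.

```c
#define _GNU_SOURCE /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag and helper: stand-ins for the library's own early
 * initialization (the place where a real library would call daos_init()). */
static bool early_init_done;

static void early_init_once(void)
{
	if (!early_init_done) {
		/* assumption: the real initialization would happen here */
		early_init_done = true;
	}
}

typedef int (*pmpi_init_fn)(int *, char ***);
static pmpi_init_fn next_pmpi_init;

/* Interposed PMPI_Init(): initialize first, then forward to the next
 * definition in the symbol search order. */
int PMPI_Init(int *argc, char ***argv)
{
	early_init_once();

	if (next_pmpi_init == NULL) {
		next_pmpi_init = (pmpi_init_fn)dlsym(RTLD_NEXT, "PMPI_Init");
		if (next_pmpi_init == NULL)
			return -1; /* hypothetical error value: no further PMPI_Init found */
	}
	return next_pmpi_init(argc, argv);
}
```

Built as a shared object and listed in LD_PRELOAD, a wrapper of this shape runs its initialization regardless of whether another preloaded library sits before or after it in the chain, as long as both come ahead of the MPI runtime library.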
OK, thanks for the explanation.
Let's hold off on merging until I test on Aurora, since this is no longer extremely time critical.
Thank you very much!
Intercept PMPI_Init() to avoid calling daos_init() if MPI_Init() is intercepted by another library (such as darshan or mpip).
Features: pil4dfs
Required-githooks: true
Skipped-githooks: codespell
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: