-
Notifications
You must be signed in to change notification settings - Fork 876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to minimize the external dependencies for OMPI wrappers. #9893
Conversation
There is one point that needs to be clarified: why we need to register our output files with PMIX. |
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/da05bde09b142e948898e3900c5ef795 |
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/e5e90c92a9e9e7e1e307bd6c5cb5d223 |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/63423e6ee0481c5803b7bc1aaab2e605 |
@@ -782,9 +782,13 @@ static int open_file(int i) | |||
free(filename); /* release the filename in all cases */ | |||
return OPAL_ERR_IN_ERRNO; | |||
} | |||
|
|||
#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prevents PMIx from deleting these files during cleanup (since they all end up in the PMIx directory in many situations). We don't want to lose this functionality, but it makes the util / non-util OPAL split harder. I think the right answer is yet another function pointer. have an opal_pmix_opal_output_handler() function in pmix base that is set as the function pointer in the util/output.c code when the opal_init() call occurs (as opposed to the opal_init_util() cal). We don't need this for utilities that aren't going to use PMIx, agree there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that is not enough, for any software that explicitly calls opal_util_init
before opal_init
. We could build a temporary list and once the RTE is initialized migrate all files there, but this seems extremely invasive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we didn't have libopen_pal.la include libopen_pal_util.la? We can have libopen_pal.la include all the same source files as libopenpal_util.la, except for a single file which provided opal_pmix_register_cleanup() as an empty function? A better solution would be weak symbols, but since Darwin doesn't support weak symbols, that's out.
@@ -171,6 +171,6 @@ OPAL_DECLSPEC extern struct opal_proc_t *(*opal_proc_for_name)(const opal_proces | |||
* our own. This is to be used by all BTLs so we don't retrieve hostnames | |||
* unless needed. The returned value MUST NOT be free'd as it is | |||
* owned by the proc_t */ | |||
OPAL_DECLSPEC char *opal_get_proc_hostname(const opal_proc_t *proc); | |||
OPAL_DECLSPEC extern char *(*opal_get_proc_hostname)(const opal_proc_t *proc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it feels like the proc code shouldn't be in the util library. I'm guessing there's a dependency I'm missing, but that would certainly be my preference on a split point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that, but there is no easy way because util being used via SUBDIRS, it generates a convenience library. If we want to split this, we will have to generate 2 convenience libraries, and include them accordingly at the upper level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally leaning that way (splitting the util directory). I'm out of time for a bit, but tonight I'll try to see if we can get rid of the subdirs for the util directory. It's either because of the lex file (harder to get rid of) or the LTDLINCL, which shouldn't be needed at this point.
860f6dc
to
8636d39
Compare
The solution put forward here consist of creating a convenience library, opal-tiny, that holds all the basic things we need to have a working app with minimal external dependencies. More detailed discussion in open-mpi#9869. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
8636d39
to
3a4bf18
Compare
@@ -782,9 +782,13 @@ static int open_file(int i) | |||
free(filename); /* release the filename in all cases */ | |||
return OPAL_ERR_IN_ERRNO; | |||
} | |||
|
|||
#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we didn't have libopen_pal.la include libopen_pal_util.la? We can have libopen_pal.la include all the same source files as libopenpal_util.la, except for a single file which provided opal_pmix_register_cleanup() as an empty function? A better solution would be weak symbols, but since Darwin doesn't support weak symbols, that's out.
@@ -171,6 +171,6 @@ OPAL_DECLSPEC extern struct opal_proc_t *(*opal_proc_for_name)(const opal_proces | |||
* our own. This is to be used by all BTLs so we don't retrieve hostnames | |||
* unless needed. The returned value MUST NOT be free'd as it is | |||
* owned by the proc_t */ | |||
OPAL_DECLSPEC char *opal_get_proc_hostname(const opal_proc_t *proc); | |||
OPAL_DECLSPEC extern char *(*opal_get_proc_hostname)(const opal_proc_t *proc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally leaning that way (splitting the util directory). I'm out of time for a bit, but tonight I'll try to see if we can get rid of the subdirs for the util directory. It's either because of the lex file (harder to get rid of) or the LTDLINCL, which shouldn't be needed at this point.
Typo in shmem_posix_module.c Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
I linked this to issue #9869. Is that correct? |
What's the status on this PR, are we agreed that this is the correct fix for #9869 ? |
I started https://github.com/bwbarrett/ompi/tree/bugfix/mpicc-dependencies to make this a bit more clean to merge into main, but have run out of time and I'm not sure when I'll be able to come back to it. I think most of the hard problems are solved in that branch, but there's definitely some cleanup work in all the patches before it can become a PR. |
The solution put forward here consist of creating a convenience library,
opal-tiny, that holds all the basic things we need to have a working
app with minimal external dependencies.
More detailed discussion in #9869.
Signed-off-by: George Bosilca bosilca@icl.utk.edu