Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to minimize the external dependencies for OMPI wrappers. #9893

Closed
wants to merge 2 commits into from

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented Jan 19, 2022

The solution put forward here consist of creating a convenience library,
opal-tiny, that holds all the basic things we need to have a working
app with minimal external dependencies.

More detailed discussion in #9869.

Signed-off-by: George Bosilca bosilca@icl.utk.edu

@bosilca bosilca added this to the v5.0.0 milestone Jan 19, 2022
@bosilca bosilca requested review from jsquyres and bwbarrett January 19, 2022 23:05
@bosilca
Copy link
Member Author

bosilca commented Jan 19, 2022

There is one point that needs to be clarified: why we need to register our output files with PMIX.

@ibm-ompi
Copy link

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/da05bde09b142e948898e3900c5ef795

@ibm-ompi
Copy link

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/e5e90c92a9e9e7e1e307bd6c5cb5d223

@ibm-ompi
Copy link

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/63423e6ee0481c5803b7bc1aaab2e605

opal/util/proc.c Outdated Show resolved Hide resolved
@@ -782,9 +782,13 @@ static int open_file(int i)
free(filename); /* release the filename in all cases */
return OPAL_ERR_IN_ERRNO;
}

#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prevents PMIx from deleting these files during cleanup (since they all end up in the PMIx directory in many situations). We don't want to lose this functionality, but it makes the util / non-util OPAL split harder. I think the right answer is yet another function pointer. have an opal_pmix_opal_output_handler() function in pmix base that is set as the function pointer in the util/output.c code when the opal_init() call occurs (as opposed to the opal_init_util() cal). We don't need this for utilities that aren't going to use PMIx, agree there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that is not enough, for any software that explicitly calls opal_util_init before opal_init. We could build a temporary list and once the RTE is initialized migrate all files there, but this seems extremely invasive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we didn't have libopen_pal.la include libopen_pal_util.la? We can have libopen_pal.la include all the same source files as libopenpal_util.la, except for a single file which provided opal_pmix_register_cleanup() as an empty function? A better solution would be weak symbols, but since Darwin doesn't support weak symbols, that's out.

opal/tools/wrappers/Makefile.am Outdated Show resolved Hide resolved
@@ -171,6 +171,6 @@ OPAL_DECLSPEC extern struct opal_proc_t *(*opal_proc_for_name)(const opal_proces
* our own. This is to be used by all BTLs so we don't retrieve hostnames
* unless needed. The returned value MUST NOT be free'd as it is
* owned by the proc_t */
OPAL_DECLSPEC char *opal_get_proc_hostname(const opal_proc_t *proc);
OPAL_DECLSPEC extern char *(*opal_get_proc_hostname)(const opal_proc_t *proc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels like the proc code shouldn't be in the util library. I'm guessing there's a dependency I'm missing, but that would certainly be my preference on a split point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that, but there is no easy way because util being used via SUBDIRS, it generates a convenience library. If we want to split this, we will have to generate 2 convenience libraries, and include them accordingly at the upper level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally leaning that way (splitting the util directory). I'm out of time for a bit, but tonight I'll try to see if we can get rid of the subdirs for the util directory. It's either because of the lex file (harder to get rid of) or the LTDLINCL, which shouldn't be needed at this point.

opal/Makefile.am Outdated Show resolved Hide resolved
opal/mca/pmix/base/pmix_base_fns.c Outdated Show resolved Hide resolved
The solution put forward here consist of creating a convenience library,
opal-tiny, that holds all the basic things we need to have a working
app with minimal external dependencies.

More detailed discussion in open-mpi#9869.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
opal/Makefile.am Outdated Show resolved Hide resolved
@@ -782,9 +782,13 @@ static int open_file(int i)
free(filename); /* release the filename in all cases */
return OPAL_ERR_IN_ERRNO;
}

#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we didn't have libopen_pal.la include libopen_pal_util.la? We can have libopen_pal.la include all the same source files as libopenpal_util.la, except for a single file which provided opal_pmix_register_cleanup() as an empty function? A better solution would be weak symbols, but since Darwin doesn't support weak symbols, that's out.

@@ -171,6 +171,6 @@ OPAL_DECLSPEC extern struct opal_proc_t *(*opal_proc_for_name)(const opal_proces
* our own. This is to be used by all BTLs so we don't retrieve hostnames
* unless needed. The returned value MUST NOT be free'd as it is
* owned by the proc_t */
OPAL_DECLSPEC char *opal_get_proc_hostname(const opal_proc_t *proc);
OPAL_DECLSPEC extern char *(*opal_get_proc_hostname)(const opal_proc_t *proc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally leaning that way (splitting the util directory). I'm out of time for a bit, but tonight I'll try to see if we can get rid of the subdirs for the util directory. It's either because of the lex file (harder to get rid of) or the LTDLINCL, which shouldn't be needed at this point.

Typo in shmem_posix_module.c

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@gpaulsen gpaulsen linked an issue Jan 27, 2022 that may be closed by this pull request
@gpaulsen
Copy link
Member

I linked this to issue #9869. Is that correct?

@wckzhang
Copy link
Contributor

wckzhang commented Jul 6, 2022

What's the status on this PR, are we agreed that this is the correct fix for #9869 ?

@bwbarrett
Copy link
Member

I started https://github.com/bwbarrett/ompi/tree/bugfix/mpicc-dependencies to make this a bit more clean to merge into main, but have run out of time and I'm not sure when I'll be able to come back to it. I think most of the hard problems are solved in that branch, but there's definitely some cleanup work in all the patches before it can become a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mpicc / mpirun / etc. carry unnecessary external dependencies
6 participants