-
Notifications
You must be signed in to change notification settings - Fork 877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MPI_*_get_info/set_info() #2941
Conversation
Ref PR #1308 |
bot:ibm:pgi:retest |
Build Failed with PGI compiler! Please review the log, and get in touch if you have questions. Gist: https://gist.github.com/0f51d7bc06a4ab54bd6a7adeed9f5ded |
I originally thought that the PGI CI failure was a local system issue. However, it seems legit, as best I can tell.
|
Yeah it looks like romio needs to be updated in this PR too. I'm not sure why the PGI compiler is the sensitive one here, but it's a good catch. The CI configures with: ./configure --without-x CC=pgcc CXX=pgc++ FC=pgfortran @markalle If you need access to the PGI compiler just ping me and I'll let you know where they are on the ppc64le dev machines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only took a very cursory glance at this PR.
If it wasn't done so, I'm guessing @dsolt's original commit needs to be rebased to the top of master
. That might prevent some "you need to upgrade ROMIO" issues (i.e., the ROMIO at the top master
is working fine with the PGI compiler; there's no reason this PR should be regressing ROMIO).
Also, the signed-off-by checker failed. All commits need to be signed off.
AUTHORS
Outdated
@@ -81,6 +80,8 @@ Dave Goodell, Cisco | |||
dgoodell@cisco.com | |||
David Daniel, Los Alamos National Laboratory | |||
ddd@lanl.gov | |||
David Solt, IBM | |||
dsolt@us.ibm.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that AUTHORS
is generated automatically these days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted it anyway, so dave's checkin now adds a line, and my checkin removes it so the two commits together should be a no-op on AUTHORS
ompi/communicator/comm.c
Outdated
@@ -22,6 +22,7 @@ | |||
* and Technology (RIST). All rights reserved. | |||
* Copyright (c) 2014-2015 Intel, Inc. All rights reserved. | |||
* Copyright (c) 2015 Mellanox Technologies. All rights reserved. | |||
* Copyright (c) 2016 IBM Corp. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these copyright updates be 2017 now?
// I don't think there's any advantage to most of the code being switched | ||
// to opal_info_t, which is causing the unnecessary extra complexity here: | ||
ompi_info_t *ompi_info; | ||
ompi_info = OBJ_NEW(ompi_info_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check for NULL here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I made a change for checking that NULL
// I'd be open to reverting most of it back to ompi_info_t and using | ||
// opal_info_* calls on the opal_info_t that's inside the ompi_info_t. | ||
// I don't think there's any advantage to most of the code being switched | ||
// to opal_info_t, which is causing the unnecessary extra complexity here: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this comment should be addressed...? (a similar comment is repeated several times)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a decision needs made for what is the predominant design. Dave has ompi_info_t as the larger class that contains an opal_info_t plus a tiny bit extra (like a fortran int id). All the top level MPI calls that use info come in with an ompi_info_t*.
Dave's overall design is to migrate toward the code from that point down being largely opal_info_t. I don't really disagree, although it does end up with an awful lot of
some_function(mpiinfo)
becoming some_function(&mpiinfo->super)
// to get the opal_info_t
and equivalently, if you just want to pass MPI_INFO_NULL to some routine, but it wants an opal_info_t*, you have to give it &MPI_INFO_NULL->super.
And there are a tiny number of places where the lower level code calls back into MPI, and thus needs the larger ompi_info_t when all it has is an opal_info_t, thus having the extra code you quoted that's making another copy of the info that isn't logically necessary. That bothers me, but it's only 3 places... so I guess I'm leaning toward reverting my comment, and sticking with the design of having the lower level code predominantly using opal_info_t.
2b5f6c3
to
336f389
Compare
Build Failed with PGI compiler! Please review the log, and get in touch if you have questions. Gist: https://gist.github.com/56af0cc8bee2c10cc19913afd2737780 |
ffaf595
to
f9f4779
Compare
PGI compiler issues is fixes, Jeff's review comments have been updated. We think this is ready to merge to master. Can we merge it in, so that @hjelmn can begin using this? If the MPI Forum changes their minds on open questions about this, we can address those in a separate PR. |
bot:ibm:retest |
@jsquyres can you please re-review? |
7a4e195
to
dd9c108
Compare
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/7a1f378bca3c4e57a1bf40f857338b69 |
We'd like to come to a conclusion on this discussion and close this out. |
bot:lanl:retest |
bot:ibm:retest |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/b318cdde19cffc6796c960dce0b5196c |
Absolutely. The current theory is that after a rebase it might get resolved. But time will tell. |
@gpaulsen Ok. Then let's do the rebase before the next review (otherwise, it'll just have to be reviewed again, right?). |
dd9c108
to
2c95427
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first glance, I get a bunch of compiler warnings:
CC info_subscriber.lo
info_subscriber.c: In function 'opal_infosubscribe_testregister':
info_subscriber.c:175:11: warning: unused variable 'next_key' [-Wunused-variable]
char *next_key;
^
info_subscriber.c:174:11: warning: unused variable 'node' [-Wunused-variable]
void *node = NULL;
^
info_subscriber.c:173:12: warning: unused variable 'key_size' [-Wunused-variable]
size_t key_size;
^
info_subscriber.c:172:9: warning: unused variable 'err' [-Wunused-variable]
int err;
^
info_subscriber.c: In function 'opal_infosubscribe_change_info':
info_subscriber.c:300:18: warning: unused variable 'list' [-Wunused-variable]
opal_list_t *list = NULL;
^
info_subscriber.c:299:32: warning: unused variable 'item' [-Wunused-variable]
opal_callback_list_item_t *item;
^
info_subscriber.c:298:24: warning: unused variable 'table' [-Wunused-variable]
opal_hash_table_t *table = &object->s_subscriber_table;
^
info_subscriber.c:297:11: warning: unused variable 'next_key' [-Wunused-variable]
char *next_key;
^
info_subscriber.c:296:11: warning: unused variable 'node' [-Wunused-variable]
void *node = NULL;
^
info_subscriber.c:293:9: warning: unused variable 'flag' [-Wunused-variable]
int flag;
^
info_subscriber.c:292:12: warning: unused variable 'key_size' [-Wunused-variable]
size_t key_size;
^
And also some new common symbols (which should be avoided):
$ make install-exec-hook
WARNING! Common symbols found:
info.o: 0000000000000100 C ompi_mpi_info_env
info.o: 0000000000000100 C ompi_mpi_info_null
Also, the 30_get
test failed in ompi-tests/info
:
Running test: mpirun -np 4 ./30_get
[**ERROR**]: MPI_COMM_WORLD rank 2, file 30_get.c:68:
Info_get obtained value "valu", expected up to 5 characters of "value1"
[**ERROR**]: MPI_COMM_WORLD rank 0, file 30_get.c:68:
Info_get obtained value "valu", expected up to 5 characters of "value1"
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[**ERROR**]: MPI_COMM_WORLD rank 1, file 30_get.c:68:
Info_get obtained value "valu", expected up to 5 characters of "value1"
[**ERROR**]: MPI_COMM_WORLD rank 3, file 30_get.c:68:
Info_get obtained value "valu", expected up to 5 characters of "value1"
[savbu-usnic-a:10885] 3 more processes have sent help message help-mpi-api.txt / mpi-abort
Exit status: 1
FAIL 30_get (exit status: 1)
Also, if the 2nd commit on this PR is just fixes for the first commit, might as well squash them down into a single commit. |
ompi_communicator_t, ompi_win_t, ompi_file_t all have a super class of type opal_infosubscriber_t instead of a base/super type of opal_object_t (in previous code comm used c_base, but file used super). It may be a bit bold to say that being a subscriber of MPI_Info is the foundational piece that ties these three things together, but if you object, then I would prefer to turn infosubscriber into a more general name that encompasses other common features rather than create a different super class. The key here is that we want to be able to pass comm, win and file objects as if they were opal_infosubscriber_t, so that one routine can heandle all 3 types of objects being passed to it. MPI_INFO_NULL is still an ompi_predefined_info_t type since an MPI_Info is part of ompi but the internal details of the underlying information concept is part of opal. An ompi_info_t type still exists for exposure to the user, but it is simply a wrapper for the opal object. Routines such as ompi_info_dup, etc have all been moved to opal_info_dup and related to the opal directory. Fortran to C translation tables are only used for MPI_Info that is exposed to the application and are therefore part of the ompi_info_t and not the opal_info_t The data structure changes are primarily in the following files: communicator/communicator.h ompi/info/info.h ompi/win/win.h ompi/file/file.h The following new files were created: opal/util/info.h opal/util/info.c opal/util/info_subscriber.h opal/util/info_subscriber.c This infosubscriber concept is that communicators, files and windows can have subscribers that subscribe to any changes in the info associated with the comm/file/window. When xxx_set_info is called, the new info is presented to each subscriber who can modify the info in any way they want. The new value is presented to the next subscriber and so on until all subscribers have had a chance to modify the value. Therefore, the order of subscribers can make a difference but we hope that there is generally only one subscriber that cares or modifies any given key/value pair. The final info is then stored and returned by a call to xxx_get_info. The new model can be seen in the following files: ompi/mpi/c/comm_get_info.c ompi/mpi/c/comm_set_info.c ompi/mpi/c/file_get_info.c ompi/mpi/c/file_set_info.c ompi/mpi/c/win_get_info.c ompi/mpi/c/win_set_info.c The current subscribers where changed as follows: mca/io/ompio/io_ompio_file_open.c mca/io/ompio/io_ompio_module.c mca/osc/rmda/osc_rdma_component.c (This one actually subscribes to "no_locks") mca/osc/sm/osc_sm_component.c (This one actually subscribes to "blocking_fence" and "alloc_shared_contig") Signed-off-by: Mark Allen <markalle@us.ibm.com> Conflicts: AUTHORS ompi/communicator/comm.c ompi/debuggers/ompi_mpihandles_dll.c ompi/file/file.c ompi/file/file.h ompi/info/info.c ompi/mca/io/ompio/io_ompio.h ompi/mca/io/ompio/io_ompio_file_open.c ompi/mca/io/ompio/io_ompio_file_set_view.c ompi/mca/osc/pt2pt/osc_pt2pt.h ompi/mca/sharedfp/addproc/sharedfp_addproc.h ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c ompi/mpi/c/lookup_name.c ompi/mpi/c/publish_name.c ompi/mpi/c/unpublish_name.c opal/mca/mpool/base/mpool_base_alloc.c opal/util/Makefile.am
c0fba40
to
7eb4a3e
Compare
Okay, I removed the unused variables. I don't think the two commons are really additions:
they were previously And they're used from mpi.h as On the length I agree, I introduced that bug and I can't see why I made that change, I reverted it. |
Oh, for the squash, yeah I'm okay with that. It's just since Dave did the initial work it had been convenient for me to be able to distinguish what he did from what I changed. |
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/fe5ed39c3b59af6a683826aa06d86eda |
you can squash selectively - so you can squash yours down to one, and his down to another |
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/ab258c38d7e8e3fdf7a9f13d9ac24c66 |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/0ec3785261b70f70f047bbf895c89c2b |
7eb4a3e
to
9b11854
Compare
The last rebase brought in a need for interlib.c to #include "mpi.h" so it could see MPI_THREAD_SINGLE. I don't know what changed in the chain of includes that used to let it see that, but that's what made the previous CI test fail, so I added that change and re-pushed. |
I've been keeping it squashed to two commits: one from dave, one from me. I think @jsquyres was proposing squashing dave's and mine together. Having them separated was useful for example when Jeff noted that test 30_get.c was failing and I could see whether that bug was my fault or Dave's (it was mine). But it's not critical, so I can squash to one if you want. |
nah - tell @jsquyres to get a life and leave it as two 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's only a little left:
- The copyrights are still a bit weird. You can do whatever you want there, but IMHO, it's a little weird to introduce new code in May of 2017 with a copyright of "2016" or "2016-17". I know that the code has sat around since 2016, but my point is that the first commit that actually gets into master will be in 2017.
- There was at least one copyright addition on a file with no other changes.
- The only real issue left is the common symbols:
$ make install-exec-hook
WARNING! Common symbols found:
info.o: 0000000000000100 C ompi_mpi_info_env
info.o: 0000000000000100 C ompi_mpi_info_null
Those should be fixed.
@@ -20,7 +20,7 @@ | |||
* Copyright (c) 2014-2015 Intel, Inc. All rights reserved. | |||
* Copyright (c) 2015 Research Organization for Information Science | |||
* and Technology (RIST). All rights reserved. | |||
* Copyright (c) 2017 IBM Corporation. All rights reserved. | |||
* Copyright (c) 2016-2017 IBM Corporation. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This retroactive copyright addition seems odd...?
ompi/dpm/dpm.c
Outdated
@@ -18,6 +18,7 @@ | |||
* Copyright (c) 2013-2016 Intel, Inc. All rights reserved. | |||
* Copyright (c) 2014-2017 Research Organization for Information Science | |||
* and Technology (RIST). All rights reserved. | |||
* Copyright (c) 2016-2017 IBM Corporation. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for this copyright addition?
ompi/file/file.h
Outdated
@@ -15,6 +15,7 @@ | |||
* Copyright (c) 2015 Research Organization for Information Science | |||
* and Technology (RIST). All rights reserved. | |||
* Copyright (c) 2016 University of Houston. All rights reserved. | |||
* Copyright (c) 2016 IBM Corp. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a new copyright, shouldn't it be 2017?
@@ -10,6 +10,7 @@ | |||
* Copyright (c) 2004-2005 The Regents of the University of California. | |||
* All rights reserved. | |||
* Copyright (c) 2008-2011 University of Houston. All rights reserved. | |||
* Copyright (c) 2016 IBM Corp. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another "new" 2016 copyright.
@@ -12,6 +12,7 @@ | |||
* Copyright (c) 2008-2015 University of Houston. All rights reserved. | |||
* Copyright (c) 2015 Research Organization for Information Science | |||
* and Technology (RIST). All rights reserved. | |||
* Copyright (c) 2016 IBM Corp. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..another "new" 2016 copyright. I'll stop noting these. 😄
@jsquyres They were previously And they're used from mpi.h as |
I'm guessing the copyright additions that say 2016 started back then, like you said. And the unmodified files with copyright additions had some change we later reverted. I guess removing all 2016 additions wouldn't be that hard. |
I don't know - I think that's getting into a little gray area. True, those changes weren't merged until 2017, but that isn't when the copyright applies. The copyright attests to when those changes were written, not when they became public. So I think @jsquyres is incorrect here and those copyrights should stay the way they are. That's why your IP lawyers always insist you enter dates into your "notebooks" - the legal origination of the idea is the date in the notebook, not the date of publication. |
@markalle Yes, I saw your explanations. But one way or another, we should not be introducing new common symbols. 😄 I guess I'm asking: shouldn't these new |
9b11854
to
c209196
Compare
I pushed again, with another pass through the copyrights, removing about a dozen files where the copyright was the only change, and changing a bunch of 2016 to 2017 |
As I said, I honestly don't believe the changes from 2016 to 2017 are legally correct - but I honestly also don't think it's important as the copyright notices don't really mean anything in this context. |
The expected sequence of events for processing info during object creation is that if there's an incoming info arg, it is opal_info_dup()ed into the obj at obj->s_info first. Then interested components register callbacks for keys they want to know about using opal_infosubscribe_infosubscribe(). Inside info_subscribe_subscribe() the specified callback() is called with whatever matching k/v is in the object's info, or with the default. The return string from the callback goes into the new k/v stored in info, and the input k/v is saved as __IN_<key>/<val>. It's saved the same way whether the input came from info or whether it was a default. A null return from the callback indicates an ignored key/val, and no k/v is stored for it, but an __IN_<key>/<val> is still kept so we still have access to the original. At MPI_*_set_info() time, opal_infosubscribe_change_info() is used. That function calls the registered callbacks for each item in the provided info. If the callback returns non-null, the info is updated with that k/v, or if the callback returns null, that key is deleted from info. An __IN_<key>/<val> is saved either way, and overwrites any previously saved value. When MPI_*_get_info() is called, opal_info_dup_mpistandard() is used, which allows relatively easy changes in interpretation of the standard, by looking at both the <key>/<val> and __IN_<key>/<val> in info. Right now it does 1. includes system extras, eg k/v defaults not expliclty set by the user 2. omits ignored keys 3. shows input values, not callback modifications, eg not the internal values Currently the callbacks are doing things like return some_condition ? "true" : "false" that is, returning static strings that are not to be freed. If the return strings start becoming more dynamic in the future I don't see how unallocated strings could support that, so I'd propose a change for the future that the callback()s registered with info_subscribe_subscribe() do a strdup on their return, and we change the callers of callback() to free the strings it returns (there are only two callers). Rough outline of the smaller changes spread over the less central files: comm.c initialize comm->super.s_info to NULL copy into comm->super.s_info in comm creation calls that provide info OBJ_RELEASE comm->super.s_info at free time comm_init.c initialize comm->super.s_info to NULL file.c copy into file->super.s_info if file creation provides info OBJ_RELEASE file->super.s_info at free time win.c copy into win->super.s_info if win creation provides info OBJ_RELEASE win->super.s_info at free time comm_get_info.c file_get_info.c win_get_info.c change_info() if there's no info attached (shouldn't happen if callbacks are registered) copy the info for the user The other category of change is generally addressing compiler warnings where ompi_info_t and opal_info_t were being used a little too interchangably. An ompi_info_t* contains an opal_info_t*, at &(ompi_info->super) Also this commit updates the copyrights. Signed-off-by: Mark Allen <markalle@us.ibm.com>
c209196
to
482d84b
Compare
Well darn, I didn't see Ralph's copyright comments until I had already made the change. But fortunately I still had a list of every file my previous script had run on, so for every file that used to contain 2016 I switched it back to 2016-2017. And I finally removed the commons. I figured there must have been some reason why Dave had removed the {{{{0}}}} initializer that used to be there, but I can't fathom any reason not to initialize it so I'm guessing it was just the inconvenience of the syntax of figuring out how many braces it should have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm that the common symbols are gone.
Thanks for you patience, @markalle
:bot:help |
:retest: |
:bot:retest: |
This is built off of Dave Solt's earlier pull request #1308.
Roughly what Dave did is:
ompi_{communicator,win,file}_t
to haveopal_infosubscriber_t
as their.super
classompi_info_t
to haveopal_info_t
as its.super
classompi_info_*
calls toopal_info_*
I fixed memory leaks, compiler warnings, and changed the semantics to support the current interpretation of the MPI standard.
The main functions are
opal_infosubscribe_infosubscribe()
: register a callback fn for a particular keyopal_infosubscribe_change_info()
: used atMPI_*_set_info()
timeopal_info_dup_mpistandard()
: extracts the k/v as defined forMPI_*_get_info()