Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

communicator: add errhandler_type back #11818

Merged
merged 1 commit into from
Jul 13, 2023

Conversation

wzamazon
Copy link
Contributor

@wzamazon wzamazon commented Jul 11, 2023

Previous commit 2d68804 removed "errhandler_type" from communicator, and replaced it with
"errhandler->eh_mpi_object_type".

However, for an errhandler to be invoked on a communicator, errhandler_type must always be OMPI_ERRHANDLER_TYPE_COMM. But, errhandler->eh_mpi_object_type can be
OMPI_ERRHANDLER_TYPE_PREDEFINED for predefined error handlers such as MPI_ERRORS_ARE_FATAL.

This patch added "errhandler_type" back to communicator to address the issue.

Refs #11817

@@ -71,6 +71,7 @@ int main(int argc, char **argv) {
#else
GAP_CHECK("error_handler", test_comm, error_handler, c_f_to_c_index, 1);
#endif
GAP_CHECK("errhandler_type", test_comm, errhandler_type, error_handler, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is another line that needs to be added back (in fact, looking at it now I am not sure the code was correct before, but I am not completely sure I understand how the debugger interfaces work). It seems that we are always checking for the distance from the previous element of the structure.
GAP_CHECK("c_pml_comm", test_comm, c_pml_comm, errhandler_type, 1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

added in new revision

@@ -125,7 +126,7 @@ int main(int argc, char **argv) {
GAP_CHECK("w_keyhash", test_win, w_keyhash, w_flags, 1);
GAP_CHECK("w_f_to_c_index", test_win, w_f_to_c_index, w_keyhash, 1);
GAP_CHECK("error_handler", test_win, error_handler, w_f_to_c_index, 1);

GAP_CHECK("errhandler_type", test_win, errhandler_type, error_handler, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well:
GAP_CHECK("w_osc_module", test_win, w_osc_module, errhandler_type, 1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -145,6 +146,7 @@ int main(int argc, char **argv) {
GAP_CHECK("f_flags", test_file, f_flags, f_amode, 1);
GAP_CHECK("f_f_to_c_index", test_file, f_f_to_c_index, f_flags, 1);
GAP_CHECK("error_handler", test_file, error_handler, f_f_to_c_index, 1);
GAP_CHECK("errhandler_type", test_file, errhandler_type, error_handler, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here too
GAP_CHECK("f_io_version", test_file, f_io_version, errhandler_type, 1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@wzamazon wzamazon force-pushed the communicator_add_errhandler_type branch from c6b8751 to 1022d0e Compare July 11, 2023 14:49
@wzamazon wzamazon requested a review from edgargabriel July 11, 2023 14:49
Copy link
Member

@edgargabriel edgargabriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good as far as I can tell

@bosilca
Copy link
Member

bosilca commented Jul 11, 2023

Wait a little, I don't think we need to bring this back. Let me check something in the code and I'll get back to you later today.

@jsquyres
Copy link
Member

Even though it's probably obvious, let me state here for the record that this also affects 5.0 as well.

@jsquyres
Copy link
Member

@wzamazon You cited in #11817 that this issue stemmed from 2d68804, where we made some adjustments to the communicator struct. One of the foals of 2d68804 was:

This allows us to reduce the size of PREDEFINED_COMMUNICATOR_PAD back to 512 to maintain backwards compatibility with the ompi 4.1.x release series.

Does adding the errhandler_type back into the communicator struct jeopardize the ability to keep PREDEFINED_COMMUNICATOR_PAD at 512?

@wzamazon
Copy link
Contributor Author

Does adding the errhandler_type back into the communicator struct jeopardize the ability to keep PREDEFINED_COMMUNICATOR_PAD at 512?

No. The original commit 2d68804 removed 2 fields from ompi_communicator_t to make it below 512. One is c_name which is 64 bytes, the other is errhandler_type.

After 2d68804, sizeof(ompi_communicator_t) is 448 byte.

The proposed commit increased the sizeof(ompi_communicator_t) to 456 bytes, which is still below 512.

Previous commit 2d68804 removed "errhandler_type"
from communicator, and replaced it with
"errhandler->eh_mpi_object_type".

However, for an errhandler to be invoked on a communicator,
errhandler_type must always be OMPI_ERRHANDLER_TYPE_COMM.
But, errhandler->eh_mpi_object_type can be
OMPI_ERRHANDLER_TYPE_PREDEFINED for predefined error handlers
such as MPI_ERRORS_ARE_FATAL.

This patch added "errhandler_type" back to communicator
to address the issue.

Signed-off-by: Wei Zhang <wzam@amazon.com>
@wzamazon wzamazon force-pushed the communicator_add_errhandler_type branch from 1022d0e to 216c221 Compare July 12, 2023 13:04
@jsquyres
Copy link
Member

Wait a little, I don't think we need to bring this back. Let me check something in the code and I'll get back to you later today.

@bosilca Thoughts?

@bosilca
Copy link
Member

bosilca commented Jul 12, 2023

Yes, it took me a little longer to come up with a patch. Here is what I propose instead of this PR:

diff --git a/ompi/errhandler/errhandler.c b/ompi/errhandler/errhandler.c
index bcdd2f345c..33346f1574 100644
--- a/ompi/errhandler/errhandler.c
+++ b/ompi/errhandler/errhandler.c
@@ -282,7 +282,7 @@ ompi_errhandler_t *ompi_errhandler_create(ompi_errhandler_type_t object_type,
 
             new_errhandler->eh_mpi_object_type = object_type;
             new_errhandler->eh_lang = lang;
-            switch (object_type ) {
+            switch (object_type & OMPI_ERRHANDLER_TYPE_OBJECT) {
             case OMPI_ERRHANDLER_TYPE_COMM:
                 new_errhandler->eh_comm_fn = (MPI_Comm_errhandler_function *)func;
                 break;
@@ -388,10 +388,10 @@ int ompi_errhandler_proc_failed_internal(ompi_proc_t* ompi_proc, int status, boo
                              OMPI_NAME_PRINT(&ompi_proc->super.proc_name),
                              ompi_comm_print_cid(comm),
                              proc_rank,
-                             (OMPI_ERRHANDLER_TYPE_PREDEFINED == comm->error_handler->eh_mpi_object_type ? "P" :
-                              (OMPI_ERRHANDLER_TYPE_COMM == comm->error_handler->eh_mpi_object_type ? "C" :
-                               (OMPI_ERRHANDLER_TYPE_WIN == comm->error_handler->eh_mpi_object_type ? "W" :
-                                (OMPI_ERRHANDLER_TYPE_FILE == comm->error_handler->eh_mpi_object_type ? "F" : "U") ) ) )
+                             (ompi_errhandler_is_of_type(comm->error_handler, OMPI_ERRHANDLER_TYPE_PREDEFINED) ? "P" :
+                              (ompi_errhandler_is_of_type(comm->error_handler, OMPI_ERRHANDLER_TYPE_COMM) ? "C" :
+                               (ompi_errhandler_is_of_type(comm->error_handler, OMPI_ERRHANDLER_TYPE_WIN) ? "W" :
+                                (ompi_errhandler_is_of_type(comm->error_handler, OMPI_ERRHANDLER_TYPE_FILE) ? "F" : "U") ) ) )
                              ));
     }
 
diff --git a/ompi/errhandler/errhandler.h b/ompi/errhandler/errhandler.h
index 519a84d996..a26e5e1c89 100644
--- a/ompi/errhandler/errhandler.h
+++ b/ompi/errhandler/errhandler.h
@@ -82,11 +82,15 @@ typedef enum ompi_errhandler_lang_t ompi_errhandler_lang_t;
  * Enum used to describe what kind MPI object an error handler is used for
  */
 enum ompi_errhandler_type_t {
-    OMPI_ERRHANDLER_TYPE_PREDEFINED,
-    OMPI_ERRHANDLER_TYPE_COMM,
-    OMPI_ERRHANDLER_TYPE_WIN,
-    OMPI_ERRHANDLER_TYPE_FILE,
-    OMPI_ERRHANDLER_TYPE_INSTANCE,
+    OMPI_ERRHANDLER_TYPE_PREDEFINED = 0x0001,
+    OMPI_ERRHANDLER_TYPE_COMM       = 0x0010,
+    OMPI_ERRHANDLER_TYPE_WIN        = 0x0020,
+    OMPI_ERRHANDLER_TYPE_FILE       = 0x0040,
+    OMPI_ERRHANDLER_TYPE_INSTANCE   = 0x0080,
+    OMPI_ERRHANDLER_TYPE_OBJECT     = OMPI_ERRHANDLER_TYPE_COMM |
+                                      OMPI_ERRHANDLER_TYPE_WIN |
+                                      OMPI_ERRHANDLER_TYPE_FILE |
+                                      OMPI_ERRHANDLER_TYPE_INSTANCE
 };
 typedef enum ompi_errhandler_type_t ompi_errhandler_type_t;
 
@@ -442,10 +446,22 @@ OMPI_DECLSPEC void ompi_errhandler_registration_callback(int status,
  */
 static inline bool ompi_errhandler_is_intrinsic(ompi_errhandler_t *errhandler)
 {
-    if ( OMPI_ERRHANDLER_TYPE_PREDEFINED == errhandler->eh_mpi_object_type )
-	return true;
+    return (OMPI_ERRHANDLER_TYPE_PREDEFINED & errhandler->eh_mpi_object_type);
+}
 
-    return false;
+/**
+ * Check to see if the errorhandler is of the specified type
+ *
+ * @param errhandle The errhandler to check
+ * @param type      The type to be checked upon
+ *
+ * @return true  If the errhandler is of the specified type
+ * @return false otherwise
+ */
+static inline bool ompi_errhandler_is_of_type(ompi_errhandler_t *errhandler,
+                                              ompi_errhandler_type_t type)
+{
+    return (type & errhandler->eh_mpi_object_type);
 }
 
 #if OPAL_ENABLE_FT_MPI
diff --git a/ompi/errhandler/errhandler_invoke.c b/ompi/errhandler/errhandler_invoke.c
index fc9a0818e1..42ba2abaea 100644
--- a/ompi/errhandler/errhandler_invoke.c
+++ b/ompi/errhandler/errhandler_invoke.c
@@ -79,7 +79,7 @@ int ompi_errhandler_invoke(ompi_errhandler_t *errhandler, void *mpi_object,
     /* Figure out what kind of errhandler it is, figure out if it's
        fortran or C, and then invoke it */
 
-    switch (object_type) {
+    switch (object_type & OMPI_ERRHANDLER_TYPE_OBJECT) {
     case OMPI_ERRHANDLER_TYPE_COMM:
         comm = (ompi_communicator_t *) mpi_object;
         switch (errhandler->eh_lang) {
diff --git a/ompi/mpi/c/comm_set_errhandler.c b/ompi/mpi/c/comm_set_errhandler.c
index 0be31b7640..1c865845e5 100644
--- a/ompi/mpi/c/comm_set_errhandler.c
+++ b/ompi/mpi/c/comm_set_errhandler.c
@@ -57,10 +57,9 @@ int MPI_Comm_set_errhandler(MPI_Comm comm, MPI_Errhandler errhandler)
                                           FUNC_NAME);
         } else if (NULL == errhandler ||
                    MPI_ERRHANDLER_NULL == errhandler ||
-                   ( OMPI_ERRHANDLER_TYPE_COMM != errhandler->eh_mpi_object_type &&
-                     OMPI_ERRHANDLER_TYPE_PREDEFINED != errhandler->eh_mpi_object_type) ) {
-            return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_ARG,
-                                          FUNC_NAME);
+                   (!ompi_errhandler_is_of_type(errhandler, OMPI_ERRHANDLER_TYPE_COMM) &&
+                    !ompi_errhandler_is_intrinsic(errhandler)) ) {
+            return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_ARG, FUNC_NAME);
         }
     }
 
diff --git a/ompi/mpi/c/file_set_errhandler.c b/ompi/mpi/c/file_set_errhandler.c
index 7590ab6c85..9dd95f8c7a 100644
--- a/ompi/mpi/c/file_set_errhandler.c
+++ b/ompi/mpi/c/file_set_errhandler.c
@@ -57,8 +57,8 @@ int MPI_File_set_errhandler( MPI_File file, MPI_Errhandler errhandler)
                                           FUNC_NAME);
         } else if (NULL == errhandler ||
                    MPI_ERRHANDLER_NULL == errhandler ||
-                   (OMPI_ERRHANDLER_TYPE_FILE != errhandler->eh_mpi_object_type &&
-		    OMPI_ERRHANDLER_TYPE_PREDEFINED != errhandler->eh_mpi_object_type) ) {
+                   (!ompi_errhandler_is_of_type(errhandler, OMPI_ERRHANDLER_TYPE_FILE)  &&
+                    !ompi_errhandler_is_intrinsic(errhandler)) ) {
             return OMPI_ERRHANDLER_INVOKE(file, MPI_ERR_ARG, FUNC_NAME);
         }
     }
diff --git a/ompi/mpi/c/session_set_errhandler.c b/ompi/mpi/c/session_set_errhandler.c
index bb6c12af62..8b9da3adb8 100644
--- a/ompi/mpi/c/session_set_errhandler.c
+++ b/ompi/mpi/c/session_set_errhandler.c
@@ -52,7 +52,7 @@ int MPI_Session_set_errhandler(MPI_Session session, MPI_Errhandler errhandler)
         } else if (NULL == errhandler ||
                    MPI_ERRHANDLER_NULL == errhandler ||
                    ( OMPI_ERRHANDLER_TYPE_INSTANCE != errhandler->eh_mpi_object_type &&
-                     OMPI_ERRHANDLER_TYPE_PREDEFINED != errhandler->eh_mpi_object_type) ) {
+                     !ompi_errhandler_is_intrinsic(errhandler)) ) {
             return OMPI_ERRHANDLER_INVOKE(session, MPI_ERR_ARG,
                                           FUNC_NAME);
         }
diff --git a/ompi/mpi/c/win_set_errhandler.c b/ompi/mpi/c/win_set_errhandler.c
index 820ba51b33..ffd0884ed7 100644
--- a/ompi/mpi/c/win_set_errhandler.c
+++ b/ompi/mpi/c/win_set_errhandler.c
@@ -52,8 +52,8 @@ int MPI_Win_set_errhandler(MPI_Win win, MPI_Errhandler errhandler)
                                           FUNC_NAME);
         } else if (NULL == errhandler ||
                    MPI_ERRHANDLER_NULL == errhandler ||
-                   (OMPI_ERRHANDLER_TYPE_WIN != errhandler->eh_mpi_object_type &&
-                    OMPI_ERRHANDLER_TYPE_PREDEFINED != errhandler->eh_mpi_object_type) ) {
+                   (!ompi_errhandler_is_of_type(errhandler, OMPI_ERRHANDLER_TYPE_WIN) &&
+                    !ompi_errhandler_is_intrinsic(errhandler)) ) {
             return OMPI_ERRHANDLER_INVOKE(win, MPI_ERR_ARG, FUNC_NAME);
         }
     }

@wzamazon
Copy link
Contributor Author

wzamazon commented Jul 12, 2023

@bosilca

Can you explain how your patch solved the problem?

The issue is that OMPI_ERRHANDLER_INVOKE did not pass the correct errhandler_type to ompi_errhandler_invoke.

I do not understand how your patch solved the issue without modifying OMPI_ERRHANDLER_INVOKE

@bosilca
Copy link
Member

bosilca commented Jul 12, 2023

predefined error handlers are not tagged with a type, and can be called on any objects (communicator, windows and files). We just need to allow a distinction between these. My patch does modify ompi_errhandler_invoke by making sure we are testing with the correct part of the eh_mpi_object_type.

Do you have a test that still fails ?

@wzamazon
Copy link
Contributor Author

Do you have a test that still fails ?

Let me test

@wzamazon
Copy link
Contributor Author

wzamazon commented Jul 12, 2023

@bosilca

I applied your patch and tested using ompi-tests/intel_tests/MPI_Errhandler_fatal_c, and the test still fails.

From what I can see.

Whe predefined error handler was used, the object_type passed from OMPI_ERRHANDLER_INVOKE to ompi_errhander_invoke is 1, which is OMPI_ERRHANDLER_TYPE_PREDEFINED.

In function ompi_errhandler_invoke, there is a switch case loop like the following

switch(object_type & OMPI_ERRHANDLER_TYPE_OBJECT) {
case OMPI_ERRHANDLER_TYPE_COMM:
  invoke comm_fn;
  break;
case OMPI_ERRHANDLER_TYPE_WIN:
  invoke win_fn;
  break;
case OMPI_ERRHANDLER_TYPE_FILE:
  invoke win_fn;
  break;
}

For the code to work, the result of object_type & OMPI_ERRHANDLER_TYPE_OBJECT must be OMPI_ERRHANDLER_TYPE_COMM (which is 0x10) when object is a communicator.

But because object_type is 1, the value of object_type & OMPI_ERRHANDLER_TYPE_OBJECT is 0, so comm_fn was not called.

@wzamazon
Copy link
Contributor Author

On a higher level, because the predefined error handler can be applied to 3 types of object: communicator, win, file. There is no way we can use a variable inside the err handler to determine the MPI object type. Unless, we make a clone of the predefined errhandler each time it is used in MPI_Errhandler_set, and set the errhandler_type of the cloned error handler.

@bosilca
Copy link
Member

bosilca commented Jul 12, 2023

ok, I misunderstood this whole issue here. You are correct we need to have the type of the error handler associated with each object, to know how to manipulate the error handler when necessary.

@bosilca
Copy link
Member

bosilca commented Jul 12, 2023

To conclude this we are using a field in the MPI object to avoid having to specialize the call to the OMPI_ERRHANDLER_INVOKE macro. Laziness at its best.

@wzamazon
Copy link
Contributor Author

@bosilca

Thank you for the summarization!

@jsquyres Can you help with merging?

@bosilca bosilca merged commit a8fbb4d into open-mpi:main Jul 13, 2023
@gpaulsen
Copy link
Member

We need this in v5.0.x

@wenduwan
Copy link
Contributor

wenduwan commented Aug 1, 2023

@gpaulsen @edgargabriel Backported in 1f98cd3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants