-
Notifications
You must be signed in to change notification settings - Fork 876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Topic/datatype #3441
Topic/datatype #3441
Conversation
@ggouaillardet can you try this in a heterogeneous environment ? |
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/805cc76376493b5d175dca9ac64833a2 |
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/d227a5018dfdd6cb9d3336835b9338bd |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/30256b2e57e9a3a40fc8ef555877a110 |
@bosilca this PR build but all tests crash |
I haven't checked yet, but I think this patch would allow us to drastically reduce the size of the predefined datatypes. If we implement the reduction, we will change the ABI, so this might only be good for 3.0. For the others I will take a look at having a slimmed down version. |
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/2fce5a06ccb4313d246c882baaeb5239 |
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/90bfae8e90e611662bc9637eb7aca130 |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/cc4639d8b917a0e8d3bac6fd2ed0dbd2 |
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/6fd8c492a06afe924bcdeedd43ed6d5c |
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/f4a14ddc5fca9a636699d7d67d76ce26 |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/4d387051d59b342b93d537dea980d4c3 |
It turns out that in addition of being a bug fix, this patch also provide a performance boost. We have shred few nsecs from our shared memory latency from 95.450 nsecs to 91.865 nsecs on a E5-2650 v3 @ 2.30GHz. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datatype/getel
and datatype/darray-pack
from the ibm
test suite fail. possible fixes are in the inline comments
@@ -216,9 +216,14 @@ int32_t ompi_datatype_create_darray(int size, | |||
} | |||
|
|||
/* Build up array */ | |||
displs[0] = st_offsets[start_loop]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you really mean to commit changes to this file ?
before entering the loop, the st_offsets
array is uninitialized, and this commit causes a failure in the datatype/darray-pack
test from the ibm
test suite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was not supposed to get pushed. It is from another patch, but it is wrong and incomplete. I will remove.
* when we use get_element_count). Thus, we will pay the cost once per | ||
* datatype, but we will only update this array if/when needed. | ||
*/ | ||
int opal_datatype_compute_ptypes( opal_datatype_t* datatype ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the datatype/getel
test from the ibm
test suite loops forever.
i got things working with the patch below.
opal_datatype_get_element_count
might have to be updated as well, though i could not write a program that causes the same endless loop
diff --git a/opal/datatype/opal_datatype_get_count.c b/opal/datatype/opal_datatype_get_count.c
index a860d5f..c7c87ea 100644
--- a/opal/datatype/opal_datatype_get_count.c
+++ b/opal/datatype/opal_datatype_get_count.c
@@ -169,10 +169,13 @@ int opal_datatype_compute_ptypes( opal_datatype_t* datatype )
while( 1 ) { /* loop forever the exit condition is on the last OPAL_DATATYPE_END_LOOP */
if( OPAL_DATATYPE_END_LOOP == pElems[pos_desc].elem.common.type ) { /* end of the current loop */
if( --(pStack->count) == 0 ) { /* end of loop */
- stack_pos--; pStack--;
- if( stack_pos == -1 ) return 0; /* completed */
+ if( stack_pos == 0 ) return 0; /* completed */
+ stack_pos--;
+ pStack--;
+ pos_desc++;
+ } else {
+ pos_desc = pStack->index + 1;
}
- pos_desc = pStack->index + 1;
continue;
}
if( OPAL_DATATYPE_LOOP == pElems[pos_desc].elem.common.type ) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing. The same pattern appears in several location, I fixed them all. Please review the updated version.
Change the type of the count to be a size_t (it does not alter the total size of the internal structures, so has no impact on the ABI). Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
The internal array of counts of predefined types is now only created when needed, which is either in a heterogeneous environment, or when one call get_elements. It saves space and makes the convertor creation a little faster in some cases. Rearrange the fields in the datatype description structs. The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the static array was only partially created. All predefined types should have the ptypes array created and initialized. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
As Gilles suggested on open-mpi#2535 the opal_unpack_general_function was unpacking based on the requested count and not on the amount of packed data provided. Fixes open-mpi#2535. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@bosilca this looks good to me |
bot:ompi:retest |
Note that per discussions, this PR caused master to become ABI-incompatible with the v3.x (which will shortly be renamed to be v3.0.x) branch. That's not a problem -- it just means that the next release series with be v4.0.x (vs. v3.1.x.). |
We have not released yet the 3.0. Why not including this change before the official release? This PR is not only a performance fix, but it addresses a real issue (identified by Gilles in #3439). |
@bwbarrett @hppritcha What say you for v3.0.0? |
I'm not happy about a big DDT patch 2 weeks before we do our first RC. Would have been better to fix the bug without all the other work, but I suppose I'm not going to have a choice, am I? |
I like your pragmatism ;) In the defense of the patch, it looks big because it changes all the accesses to a datatype field (from ".btypes" into "->ptypes"), but the logical changes are rather small. Technically it is possible to make a patch for the 3.x that does not have the dynamic ptypes array. This will address the problem Gilles found, but: 1) the code will diverge between 3.x and master (and additional issues might be difficult to fix); 2) it will also require 47 * sizeof(size_t) extra bytes per datatype (including the predefined); and 3) the homogeneous case will not be as streamlined as with this patch. Your call. |
I tried to apply this patch to v3.x and it doesn't patch cleanly.
|
* Don't overflow the internal datatype count. Change the type of the count to be a size_t (it does not alter the total size of the internal structures, so has no impact on the ABI). Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Optimize the datatype creation. The internal array of counts of predefined types is now only created when needed, which is either in a heterogeneous environment, or when one call get_elements. It saves space and makes the convertor creation a little faster in some cases. Rearrange the fields in the datatype description structs. The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the static array was only partially created. All predefined types should have the ptypes array created and initialized. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Fix the boundary computation. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * test/datatype: add test for short unpack on heteregeneous cluster Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Trying to reduce the cost of creating a convertor. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Respect the unpack boundaries. As Gilles suggested on open-mpi#2535 the opal_unpack_general_function was unpacking based on the requested count and not on the amount of packed data provided. Fixes open-mpi#2535. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Indeed, we didn't made the change from OPAL_PTRDIFF_TYPE to ptrdiff_t in master. I have prepared a PR for you. #3504 |
* Don't overflow the internal datatype count. Change the type of the count to be a size_t (it does not alter the total size of the internal structures, so has no impact on the ABI). Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Optimize the datatype creation. The internal array of counts of predefined types is now only created when needed, which is either in a heterogeneous environment, or when one call get_elements. It saves space and makes the convertor creation a little faster in some cases. Rearrange the fields in the datatype description structs. The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the static array was only partially created. All predefined types should have the ptypes array created and initialized. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Fix the boundary computation. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * test/datatype: add test for short unpack on heteregeneous cluster Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Trying to reduce the cost of creating a convertor. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Respect the unpack boundaries. As Gilles suggested on open-mpi#2535 the opal_unpack_general_function was unpacking based on the requested count and not on the amount of packed data provided. Fixes open-mpi#2535. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> (cherry picked from commit open-mpi/ompi@cbf03b3)
* Don't overflow the internal datatype count. Change the type of the count to be a size_t (it does not alter the total size of the internal structures, so has no impact on the ABI). Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Optimize the datatype creation. The internal array of counts of predefined types is now only created when needed, which is either in a heterogeneous environment, or when one call get_elements. It saves space and makes the convertor creation a little faster in some cases. Rearrange the fields in the datatype description structs. The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the static array was only partially created. All predefined types should have the ptypes array created and initialized. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Fix the boundary computation. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * test/datatype: add test for short unpack on heteregeneous cluster Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Trying to reduce the cost of creating a convertor. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Respect the unpack boundaries. As Gilles suggested on open-mpi#2535 the opal_unpack_general_function was unpacking based on the requested count and not on the amount of packed data provided. Fixes open-mpi#2535. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> (cherry picked from commit open-mpi/ompi@cbf03b3)
This started as a fix for #3439 but then evolved into a complete redesign of the handling of some datatype and convertor internal structures (mainly the array of counts for predefined types).