Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topic/datatype #3441

Merged
merged 6 commits into from
May 9, 2017
Merged

Topic/datatype #3441

merged 6 commits into from
May 9, 2017

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented May 2, 2017

This started as a fix for #3439 but then evolved into a complete redesign of the handling of some datatype and convertor internal structures (mainly the array of counts for predefined types).

@bosilca bosilca requested a review from ggouaillardet May 2, 2017 06:00
@bosilca bosilca added this to the v2.1.2 milestone May 2, 2017
@bosilca
Copy link
Member Author

bosilca commented May 2, 2017

@ggouaillardet can you try this in a heterogeneous environment ?

@ibm-ompi
Copy link

ibm-ompi commented May 2, 2017

The IBM CI (GNU Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/805cc76376493b5d175dca9ac64833a2

@ibm-ompi
Copy link

ibm-ompi commented May 2, 2017

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/d227a5018dfdd6cb9d3336835b9338bd

@ibm-ompi
Copy link

ibm-ompi commented May 2, 2017

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/30256b2e57e9a3a40fc8ef555877a110

@ggouaillardet
Copy link
Contributor

@bosilca this PR build but all tests crash
i will be off until Monday, and i will unlikely have any chance to review this before.
once this is fixed, do you plan to backport it to the release branches ? or make a simpler fix just for the release branches ?

@bosilca
Copy link
Member Author

bosilca commented May 2, 2017

I haven't checked yet, but I think this patch would allow us to drastically reduce the size of the predefined datatypes. If we implement the reduction, we will change the ABI, so this might only be good for 3.0. For the others I will take a look at having a slimmed down version.

@ibm-ompi
Copy link

ibm-ompi commented May 3, 2017

The IBM CI (GNU Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/2fce5a06ccb4313d246c882baaeb5239

@ibm-ompi
Copy link

ibm-ompi commented May 3, 2017

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/90bfae8e90e611662bc9637eb7aca130

@ibm-ompi
Copy link

ibm-ompi commented May 3, 2017

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/cc4639d8b917a0e8d3bac6fd2ed0dbd2

@ibm-ompi
Copy link

ibm-ompi commented May 3, 2017

The IBM CI (GNU Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/6fd8c492a06afe924bcdeedd43ed6d5c

@ibm-ompi
Copy link

ibm-ompi commented May 3, 2017

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/f4a14ddc5fca9a636699d7d67d76ce26

@ibm-ompi
Copy link

ibm-ompi commented May 3, 2017

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/4d387051d59b342b93d537dea980d4c3

@bosilca
Copy link
Member Author

bosilca commented May 4, 2017

It turns out that in addition of being a bug fix, this patch also provide a performance boost. We have shred few nsecs from our shared memory latency from 95.450 nsecs to 91.865 nsecs on a E5-2650 v3 @ 2.30GHz.

Copy link
Contributor

@ggouaillardet ggouaillardet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datatype/getel and datatype/darray-pack from the ibm test suite fail. possible fixes are in the inline comments

@@ -216,9 +216,14 @@ int32_t ompi_datatype_create_darray(int size,
}

/* Build up array */
displs[0] = st_offsets[start_loop];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you really mean to commit changes to this file ?
before entering the loop, the st_offsets array is uninitialized, and this commit causes a failure in the datatype/darray-pack test from the ibm test suite

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was not supposed to get pushed. It is from another patch, but it is wrong and incomplete. I will remove.

* when we use get_element_count). Thus, we will pay the cost once per
* datatype, but we will only update this array if/when needed.
*/
int opal_datatype_compute_ptypes( opal_datatype_t* datatype )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the datatype/getel test from the ibm test suite loops forever.

i got things working with the patch below.
opal_datatype_get_element_count might have to be updated as well, though i could not write a program that causes the same endless loop

diff --git a/opal/datatype/opal_datatype_get_count.c b/opal/datatype/opal_datatype_get_count.c
index a860d5f..c7c87ea 100644
--- a/opal/datatype/opal_datatype_get_count.c
+++ b/opal/datatype/opal_datatype_get_count.c
@@ -169,10 +169,13 @@ int opal_datatype_compute_ptypes( opal_datatype_t* datatype )
     while( 1 ) {  /* loop forever the exit condition is on the last OPAL_DATATYPE_END_LOOP */
         if( OPAL_DATATYPE_END_LOOP == pElems[pos_desc].elem.common.type ) { /* end of the current loop */
             if( --(pStack->count) == 0 ) { /* end of loop */
-                stack_pos--; pStack--;
-                if( stack_pos == -1 ) return 0;  /* completed */
+                if( stack_pos == 0 ) return 0;  /* completed */
+                stack_pos--;
+                pStack--;
+                pos_desc++;
+            } else {
+                pos_desc = pStack->index + 1;
             }
-            pos_desc = pStack->index + 1;
             continue;
         }
         if( OPAL_DATATYPE_LOOP == pElems[pos_desc].elem.common.type ) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing. The same pattern appears in several location, I fixed them all. Please review the updated version.

bosilca added 6 commits May 8, 2017 23:31
Change the type of the count to be a size_t (it does not alter the total
size of the internal structures, so has no impact on the ABI).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
The internal array of counts of predefined types is now only created
when needed, which is either in a heterogeneous environment, or when
one call get_elements. It saves space and makes the convertor creation a
little faster in some cases.

Rearrange the fields in the datatype description structs.

The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the
static array was only partially created. All predefined types should
have the ptypes array created and initialized.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
As Gilles suggested on open-mpi#2535 the opal_unpack_general_function was
unpacking based on the requested count and not on the amount of packed
data provided.
Fixes open-mpi#2535.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@ggouaillardet
Copy link
Contributor

@bosilca this looks good to me
i also tried in an hetero environment and could not find any issue

@jsquyres
Copy link
Member

jsquyres commented May 9, 2017

bot:ompi:retest

@bosilca bosilca merged commit cbf03b3 into open-mpi:master May 9, 2017
@bosilca bosilca deleted the topic/datatype branch May 9, 2017 13:31
@jsquyres
Copy link
Member

jsquyres commented May 9, 2017

Note that per discussions, this PR caused master to become ABI-incompatible with the v3.x (which will shortly be renamed to be v3.0.x) branch. That's not a problem -- it just means that the next release series with be v4.0.x (vs. v3.1.x.).

@bosilca
Copy link
Member Author

bosilca commented May 9, 2017

We have not released yet the 3.0. Why not including this change before the official release? This PR is not only a performance fix, but it addresses a real issue (identified by Gilles in #3439).

@jsquyres
Copy link
Member

jsquyres commented May 9, 2017

@bwbarrett @hppritcha What say you for v3.0.0?

@jsquyres jsquyres removed this from the v2.1.2 milestone May 9, 2017
@bwbarrett
Copy link
Member

I'm not happy about a big DDT patch 2 weeks before we do our first RC. Would have been better to fix the bug without all the other work, but I suppose I'm not going to have a choice, am I?

@bosilca
Copy link
Member Author

bosilca commented May 9, 2017

I like your pragmatism ;) In the defense of the patch, it looks big because it changes all the accesses to a datatype field (from ".btypes" into "->ptypes"), but the logical changes are rather small.

Technically it is possible to make a patch for the 3.x that does not have the dynamic ptypes array. This will address the problem Gilles found, but: 1) the code will diverge between 3.x and master (and additional issues might be difficult to fix); 2) it will also require 47 * sizeof(size_t) extra bytes per datatype (including the predefined); and 3) the homogeneous case will not be as streamlined as with this patch. Your call.

@hppritcha
Copy link
Member

hppritcha commented May 10, 2017

I tried to apply this patch to v3.x and it doesn't patch cleanly.

pn1249323:~/ompi (v3.x)$ git am patch_file
Applying: Don't overflow the internal datatype count. Change the type of the count to be a size_t (it does not alter the total size of the internal structures, so has no impact on the ABI).
error: patch failed: opal/datatype/opal_datatype_internal.h:155
error: opal/datatype/opal_datatype_internal.h: patch does not apply
error: patch failed: opal/datatype/opal_datatype_optimize.c:50
error: opal/datatype/opal_datatype_optimize.c: patch does not apply
Patch failed at 0001 Don't overflow the internal datatype count. Change the type of the count to be a size_t (it does not alter the total size of the internal structures, so has no impact on the ABI).
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
pn1249323:~/ompi (v3.x|AM 1/6)$ git am --abort
pn1249323:~/ompi (v3.x)$ git show HEAD

bosilca added a commit to bosilca/ompi that referenced this pull request May 10, 2017
* Don't overflow the internal datatype count.
Change the type of the count to be a size_t (it does not alter the total
size of the internal structures, so has no impact on the ABI).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Optimize the datatype creation.
The internal array of counts of predefined types is now only created
when needed, which is either in a heterogeneous environment, or when
one call get_elements. It saves space and makes the convertor creation a
little faster in some cases.

Rearrange the fields in the datatype description structs.

The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the
static array was only partially created. All predefined types should
have the ptypes array created and initialized.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Fix the boundary computation.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* test/datatype: add test for short unpack on heteregeneous cluster

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Trying to reduce the cost of creating a convertor.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Respect the unpack boundaries.
As Gilles suggested on open-mpi#2535 the opal_unpack_general_function was
unpacking based on the requested count and not on the amount of packed
data provided.
Fixes open-mpi#2535.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@bosilca
Copy link
Member Author

bosilca commented May 10, 2017

Indeed, we didn't made the change from OPAL_PTRDIFF_TYPE to ptrdiff_t in master. I have prepared a PR for you. #3504

ggouaillardet pushed a commit to ggouaillardet/ompi that referenced this pull request May 24, 2017
* Don't overflow the internal datatype count.
Change the type of the count to be a size_t (it does not alter the total
size of the internal structures, so has no impact on the ABI).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Optimize the datatype creation.
The internal array of counts of predefined types is now only created
when needed, which is either in a heterogeneous environment, or when
one call get_elements. It saves space and makes the convertor creation a
little faster in some cases.

Rearrange the fields in the datatype description structs.

The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the
static array was only partially created. All predefined types should
have the ptypes array created and initialized.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Fix the boundary computation.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* test/datatype: add test for short unpack on heteregeneous cluster

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Trying to reduce the cost of creating a convertor.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Respect the unpack boundaries.
As Gilles suggested on open-mpi#2535 the opal_unpack_general_function was
unpacking based on the requested count and not on the amount of packed
data provided.
Fixes open-mpi#2535.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

(cherry picked from commit open-mpi/ompi@cbf03b3)
karasevb pushed a commit to karasevb/ompi that referenced this pull request Jun 8, 2017
* Don't overflow the internal datatype count.
Change the type of the count to be a size_t (it does not alter the total
size of the internal structures, so has no impact on the ABI).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Optimize the datatype creation.
The internal array of counts of predefined types is now only created
when needed, which is either in a heterogeneous environment, or when
one call get_elements. It saves space and makes the convertor creation a
little faster in some cases.

Rearrange the fields in the datatype description structs.

The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the
static array was only partially created. All predefined types should
have the ptypes array created and initialized.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Fix the boundary computation.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* test/datatype: add test for short unpack on heteregeneous cluster

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Trying to reduce the cost of creating a convertor.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Respect the unpack boundaries.
As Gilles suggested on open-mpi#2535 the opal_unpack_general_function was
unpacking based on the requested count and not on the amount of packed
data provided.
Fixes open-mpi#2535.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

(cherry picked from commit open-mpi/ompi@cbf03b3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants