Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Romio fixes from mpich v41x #8371

Merged
merged 5 commits into from
Oct 11, 2021

Conversation

markalle
Copy link
Contributor

@markalle markalle commented Jan 12, 2021

[from mpich] romio: avoid freeing NULLs in gpfs

mpich has two commits about freeing NULLs in GPFS, titled:

"romio gpfs: avoid freeing NULL"
and
"ROMIO: gpfs: avoid freeing NULL in more places"

The code isn't a perfect match vs the romio we have here
in ompi version 4.1.x, but it's essentially turning a
handful of "ADIOI_Free(x)" into "if (x) { ADIOI_Free(x); }".
So that's what this commit does too.

Signed-off-by: Mark Allen markalle@us.ibm.com

[from mpich] romio GPFS: missing initialization

Merging a romio commit that was accepted by mpich:

Romio GPFS: missing initialization

Romio GPFS had a runtime problem due to a .mem_ptrs field that
was uninitialized that it was potentially trying to free.

Signed-off-by: Mark Allen markalle@us.ibm.com

[from mpich] darray (romio) needs a resize in the block()/cyclic() su…

…btype creation

Merging a romio commit that was accepted by mpich:

darray (romio) needs a resize in the block()/cyclic() subtype creation

When Type_create_darray() uses the block()/cyclic() function to
construct a subtype for whatever dimension it's processing, it
needs to resize the resulting type before it goes into the type
construction as the input for processing the next dimension.

The same problem is in the main path's type creation, and in romio's
mirroring of it.

Gist for a testcase:
https://gist.github.com/markalle/940de93d64fd779e304ee124855b8e6a

The darray_bug_romio.c testcase creates a darray using

  • 4 ranks in a 2x2 grid
  • looking at the type made for rank 0
  • inner dimension: 4 ints distributed block over 2 ranks with 2 items each
  • outer dimension: 6 of the above distributed cyclic over 2 ranks with 2 items each

The type created by processing the first dimension should look like this
[ x x . . ]

And then distributing those for the second dimension becomes

[ x x x x ]
[ . . . . ]
[ . . . . ]
[ . . . . ]
[ x x x x ]
[ . . . . ]

Going to the MPI standard to justify why the first layout is right,
it's where the definiton of the cyclic() function has a ub_marker
of gsize*ex, eg 4 ints for that first type.

Signed-off-by: Mark Allen markalle@us.ibm.com

[from mpich] romio flatten has flat->count consistency problems

Merging a romio commit that was accepted by mpich:

romio flatten has flat->count consistency problems

ADIOI_Count_contiguous_blocks and ADIOI_Flatten are very similar functions,
but they diverge a lot around the edges and in degenerate cases. In Spectrum
MPI I spent some time making them consistent with each other but
found that to be a losing battle.

So the approach I used here is to not have Count() be as definitive,
and rather let Flatten() have the last word on what the final flat->count
really is. Eg Flatten's *curr_index is the real count.

The changes I made are

  1. Fix a couple places in Flatten where *curr_index was updated out
    of sync with what was actually being added to flat->indices[] and
    flat->blocklens[]. That's one of the important book-keeping rules
    Flatten should follow. There were a couple places (when counts[] arrays
    have 0s) where that wasn't the case (see the "nonzeroth" var in this
    commit).

  2. The main change I made was to reset flat->count based on
    Flatten's curr_index. This is because the divergence between the
    two functions is too great to reliably fix.

  3. A third change is just a safety valve, using a
    flatlist_node_grow() function just in case the
    Count function returns a smaller value than Flatten ends up
    trying to write into the array, and related to this I
    got rid of the various --(flat->count) lines, since that var now
    represents the allocated size of the array, until right after
    the Flatten function when it's reset to represent the data written
    to the array like it used to be.

I don't think we even need ADIOI_Count_contiguous_blocks() anymore,
but I didn't remove it.

Signed-off-by: Mark Allen markalle@us.ibm.com

[from mpich] make ADIOI_GEN_WriteStrided not step on itself

Merging a romio commit that was accepted by mpich:
make ADIOI_GEN_WriteStrided not step on itself

The ADIOI_GEN_WriteStrided funcion uses data sieving on non-contiguous
types. That is, if it wants to write data at locations
[W...X...Y...Z...]
it reads the whole buffer
[dddddddddddddddd]
changes the locations it wants to write to
[WdddXdddYdddZddd]
then writes the whole thing back. It uses locks to make this safe, but
the problem is this only protects against other parts of the product that
are using locks. And without this PR a peer who is simultaneously making
a simple non-contiguous write wouldn't have locked.

A testcase to demonstrate the original problem is here:
https://gist.github.com/markalle/d7da240c19e57f095c5d1b13240dae24

% mpicc -o x romio_write_timing.c
% mpirun -np 4 ./x

Note: you need to use a filesystem that uses ADIOI_GEN_WriteStrided to
hit the bug. I was using GPFS.

This commit is pulled from wkliao after discussing where to put the
new lock. It adds locks to contiguous writes in independent write
functions when data sieving write is not disabled

Signed-off-by: Mark Allen markalle@us.ibm.com

bot:notacherrypick

@jsquyres
Copy link
Member

jsquyres commented Feb 1, 2021

As of 1 Feb, 2 of these commits are merged upstream in MPICH, 2 are not yet merged.

Are you following upstream to get these other 2 commits merged upstream?

@markalle
Copy link
Contributor Author

markalle commented Feb 3, 2021

Yeah, I'm sure they'll accept this one:
pmodels/mpich#4995
because it already had a previous version reviewed and discussed and this is their re-worked version of it.

And this one I'm still waiting for somebody to look at:
pmodels/mpich#4948

I haven't really been pushing though, more just waiting. So I'll add a comment and @ a couple people to see if we can get them in (I don't have permission to add a reviewer over there)

@jsquyres
Copy link
Member

jsquyres commented Feb 4, 2021

Hey @roblatham00 -- FYI. 😄

@markalle
Copy link
Contributor Author

Just to update: I've made small changes based on mpich's comments on the above mpich romio PRs 4995 and 4948, so once those go in I'll update both of these v4.* merges to match

@gpaulsen
Copy link
Member

Note that the v4.0.x has already been merged #8370, to be released in v4.0.6....

@gpaulsen
Copy link
Member

gpaulsen commented Mar 4, 2021

As of March 4, only one outstanding upstream commit: pmodels/mpich#4948
I pinged them again today.

@rajachan
Copy link
Member

Looks like we are still waiting for resolution on pmodels/mpich#5101, is that right @markalle?

@jsquyres jsquyres modified the milestones: v4.1.1, v4.1.2 Apr 24, 2021
@gpaulsen
Copy link
Member

@markalle ?

@markalle
Copy link
Contributor Author

It's still the case that v41x isn't getting a full romio refresh like v5 right? If that's the case, then I think this PR is ready as is. The 5101 from mpich ended up getting replaced by a different solution to make sure there was consistency in the check for whether aligned_alloc() was declared. But neither the old nor the new solution is needed over here in the v41x branch (and it isn't present in the 4 commits making up this PR).

@gpaulsen
Copy link
Member

gpaulsen commented Aug 9, 2021

@ggouaillardet Can you please review?

@markalle Has this been accepted upstream ROMIO? Any changes needed to this PR?

mpich has two commits about freeing NULLs in GPFS, titled:

"romio gpfs: avoid freeing NULL"
and
"ROMIO: gpfs: avoid freeing NULL in more places"

The code isn't a perfect match vs the romio we have here
in ompi version 4.1.x, but it's essentially turning a
handful of "ADIOI_Free(x)" into "if (x) { ADIOI_Free(x); }".
So that's what this commit does too.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
Merging a romio commit that was accepted by mpich:

> romio GPFS: missing initialization
>
> Romio GPFS had a runtime problem due to a .mem_ptrs field that
> was uninitialized that it was potentially trying to free.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
…btype creation

Merging a romio commit that was accepted by mpich:

> darray (romio) needs a resize in the block()/cyclic() subtype creation
>
> When Type_create_darray() uses the block()/cyclic() function to
> construct a subtype for whatever dimension it's processing, it
> needs to resize the resulting type before it goes into the type
> construction as the input for processing the next dimension.
>
> The same problem is in the main path's type creation, and in romio's
> mirroring of it.
>
> Gist for a testcase:
> https://gist.github.com/markalle/940de93d64fd779e304ee124855b8e6a
>
> The darray_bug_romio.c testcase creates a darray using
> * 4 ranks in a 2x2 grid
> * looking at the type made for rank 0
> * inner dimension: 4 ints distributed block over 2 ranks with 2 items each
> * outer dimension: 6 of the above distributed cyclic over 2 ranks with 2 items each
>
> The type created by processing the first dimension should look like this
> [ x x . . ]
>
> And then distributing those for the second dimension becomes
>
> [ x x x x ]
> [ . . . . ]
> [ . . . . ]
> [ . . . . ]
> [ x x x x ]
> [ . . . . ]
>
> Going to the MPI standard to justify why the first layout is right,
> it's where the definiton of the cyclic() function has a ub_marker
> of gsize*ex, eg 4 ints for that first type.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
Merging a romio commit that was accepted by mpich:

> romio flatten has flat->count consistency problems
>
> ADIOI_Count_contiguous_blocks and ADIOI_Flatten are very similar functions,
> but they diverge a lot around the edges and in degenerate cases.  In Spectrum
> MPI I spent some time making them consistent with each other but
> found that to be a losing battle.
>
> So the approach I used here is to not have Count() be as definitive,
> and rather let Flatten() have the last word on what the final flat->count
> really is.  Eg Flatten's *curr_index is the real count.
>
> The changes I made are
>
> 1. Fix a couple places in Flatten where *curr_index was updated out
> of sync with what was actually being added to flat->indices[] and
> flat->blocklens[].  That's one of the important book-keeping rules
> Flatten should follow.  There were a couple places (when counts[] arrays
> have 0s) where that wasn't the case (see the "nonzeroth" var in this
> commit).
>
> 2. The main change I made was to reset flat->count based on
> Flatten's curr_index.  This is because the divergence between the
> two functions is too great to reliably fix.
>
> 3. A third change is just a safety valve, using a
> flatlist_node_grow() function just in case the
> Count function returns a smaller value than Flatten ends up
> trying to write into the array, and related to this I
> got rid of the various --(flat->count) lines, since that var now
> represents the allocated size of the array, until right after
> the Flatten function when it's reset to represent the data written
> to the array like it used to be.
>
> I don't think we even need ADIOI_Count_contiguous_blocks() anymore,
> but I didn't remove it.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
Merging a romio commit that was accepted by mpich:

> make ADIOI_GEN_WriteStrided not step on itself
>
> The ADIOI_GEN_WriteStrided funcion uses data sieving on non-contiguous
> types. That is, if it wants to write data at locations
>     [W...X...Y...Z...]
> it reads the whole buffer
>     [dddddddddddddddd]
> changes the locations it wants to write to
>     [WdddXdddYdddZddd]
> then writes the whole thing back. It uses locks to make this safe, but
> the problem is this only protects against other parts of the product that
> are using locks. And without this PR a peer who is simultaneously making
> a simple non-contiguous write wouldn't have locked.
>
> A testcase to demonstrate the original problem is here:
> https://gist.github.com/markalle/d7da240c19e57f095c5d1b13240dae24
>
> % mpicc -o x romio_write_timing.c
> % mpirun -np 4 ./x
>
> Note: you need to use a filesystem that uses ADIOI_GEN_WriteStrided to
> hit the bug. I was using GPFS.
>
> This commit is pulled from wkliao after discussing where to put the
> new lock.  It adds locks to contiguous writes in independent write
> functions when data sieving write is not disabled

Signed-off-by: Mark Allen <markalle@us.ibm.com>
@markalle
Copy link
Contributor Author

I just re-reviewed the status of the mpich PRs and made minor updates here. Most of the PRs are accepted in mpich and are merged here with no changes

A minor commit that turns some "ADIOI_Free(x);" into "if (x) { ADIOI_Free(x); }" is modified since the mpich current code is a little different than the version we have here.

There is still an outstanding mpich PR 5460 which I don't think we need to worry about. In OMPI we had already put a version of that fix in long ago, so we don't really need to do anything here. If anything, I'd say once it's officially in, the old fix we already have in OMPI could be replaced by the new mpich fix. But if we just continue using our already-fixed romio on this branch indefinitely I think that would be acceptable too.

And mpich did later make another commit "romio: properly free my_req and others_req" that touches some of these same files and is a more extensive reorganization of some gpfs code vs the simple "if (x) { ADIOI_Free(x); }" found in these commits. But it's pretty big and the newer romio is fairly dis-similar from the romio we have here in ompi v4.1.x so I don't think we need to pull in that commit, and should just go with the simpler part included here.

Copy link
Member

@gpaulsen gpaulsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, is this different than what’s in OMPI master or v5.0.x? I’m surprised these commits aren’t cherry-picks from there.

@markalle
Copy link
Contributor Author

But this branch is romio321 based, while master either moved or is moving to the new romio isn't it? I wouldn't advocate for making a big jump to the new romio over here in v4.1.x.

@jsquyres
Copy link
Member

@ggouaillardet Please review.

@jsquyres
Copy link
Member

@markalle Please fix the cherry pick checker CI fail. Thanks.

@markalle
Copy link
Contributor Author

bot:notacherrypick

I think it's fair to say this wasn't a cherry-pick. Most of these commits were directly from mpich, but I think the way I pulled it in was git format-patch in mpich, editing the paths inside the patch, then merging that. So conceptually it kind of was a cherry-pick out of a different repo, but isn't this check for literal cherry-picks and it would want to see a hash for a commit that's not in some completely separate mpich repo?

@awlauria
Copy link
Contributor

awlauria commented Sep 7, 2021

@jsquyres should this get in before a v4.1. rc?

@gpaulsen
Copy link
Member

@ggouaillardet can you review please?

@gpaulsen gpaulsen requested review from gpaulsen and removed request for gpaulsen September 13, 2021 21:42
@jsquyres
Copy link
Member

@ggouaillardet Can you review?

@markalle
Copy link
Contributor Author

Here's an update about where each of the commits here are located in mpich, they're all in, just the first one isn't an exact match:

*** [from mpich] romio: avoid freeing NULLs in gpfs
in this OMPI PR : c9c1ec6

This isn't an exact match for the new mpich code since we're
still in romio321-based code over here. But conceptually this
is the same as two commits that were accepted into romio master:
in mpich : 313289a82ac7fa14081d292a16b4b11d76c7baed
in mpich : 3662196ece110cc8982a327f1fd5e4cd6aadcc7a

*** [from mpich] romio GPFS: missing initialization
in this OMPI PR : 4de4fd3
in mpich : 83bbb827dd172d1751b03dc3dd080e1a3805136c

*** [from mpich] darray (romio) needs a resize in the block()/cyclic()
in this OMPI PR : 3e317d7
in mpich : 73a3eba9a1eaa141cfce85b70f4a230a62f2c594

*** [from mpich] romio flatten has flat->count consistency problems
in this OMPI PR : adda190
in mpich : c4b57625c06afb59df756a3bb1fd60031d3ee23e

*** [from mpich] make ADIOI_GEN_WriteStrided not step on itself
in this OMPI PR : ecd3e05
in mpich : bbb52104e7be43100d12edfeea9465ac79980a0d

@gpaulsen
Copy link
Member

@ggouaillardet can you please review so we can get this into v4.1.2?

Copy link
Contributor

@ggouaillardet ggouaillardet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been very busy for a while, so my apologies for the late and short review.

At first glance, the changes look good to me.

Copy link
Member

@gpaulsen gpaulsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ready to go!

@jsquyres jsquyres merged commit eea6092 into open-mpi:v4.1.x Oct 11, 2021
@ggouaillardet
Copy link
Contributor

I do no expect so.
master has the same issue (some stuff is not implemented) and hence it will crash ...

I do not have much bandwidth to dig this right now, and the "best" I can do is have these subroutine return MPI_ERR_INTERN and/or MPI_Abort() since the default is not to abort on MPI-IO errors.

IIRC, ROM-IO non blocking collectives depend on MPICH extensions to MPI_Grequest.
I think I issued a PR years ago but it was faced with push back from the developers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants