Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Romio fixes from mpich #8370

Merged
merged 4 commits into from
Jan 29, 2021
Merged

Conversation

markalle
Copy link
Contributor

These are the fixes from MPICH's romio related to this issue with HDF5:
#6871

I think it's really just the darray + flatten portions of the fix that HDF5 needs, but this includes a couple other small fixes that seem safe and that I wasn't completely sure weren't also hit by HDF5 or similar apps.

The sieving fix isn't merged into mpich at the present time (#6871), but it's gone through some reviews already and I think it's about to get merged.

…btype creation

Mpich PR 4943 has a commit titled
darray (romio) needs a resize in the block()/cyclic() subtype creation

When Type_create_darray() uses the block()/cyclic() function to
construct a subtype for whatever dimension it's processing, it
needs to resize the resulting type before it goes into the type
construction as the input for processing the next dimension.

The same problem is in the main path's type creation, and in romio's
mirroring of it.

Gist for a testcase:
https://gist.github.com/markalle/940de93d64fd779e304ee124855b8e6a

The darray_bug_romio.c testcase creates a darray using
* 4 ranks in a 2x2 grid
* looking at the type made for rank 0
* inner dimension: 4 ints distributed block over 2 ranks with 2 items each
* outer dimension: 6 of the above distributed cyclic over 2 ranks with 2 items each

The type created by processing the first dimension should look like this
[ x x . . ]

And then distributing those for the second dimension becomes

[ x x x x ]
[ . . . . ]
[ . . . . ]
[ . . . . ]
[ x x x x ]
[ . . . . ]

Going to the MPI standard to justify why the first layout is right,
it's where the definiton of the cyclic() function has a ub_marker
of gsize*ex, eg 4 ints for that first type.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
Mpich PR 4943 has a commit titled
romio flatten has flat->count consistency problems

ADIOI_Count_contiguous_blocks and ADIOI_Flatten are very similar functions,
but they diverge a lot around the edges and in degenerate cases.  In Spectrum
MPI I spent some time making them consistent with each other but
found that to be a losing battle.

So the approach I used here is to not have Count() be as definitive,
and rather let Flatten() have the last word on what the final flat->count
really is.  Eg Flatten's *curr_index is the real count.

The changes I made are

1. Fix a couple places in Flatten where *curr_index was updated out
of sync with what was actually being added to flat->indices[] and
flat->blocklens[].  That's one of the important book-keeping rules
Flatten should follow.  There were a couple places (when counts[] arrays
have 0s) where that wasn't the case (see the "nonzeroth" var in this
commit).

2. The main change I made was to reset flat->count based on
Flatten's curr_index.  This is because the divergence between the
two functions is too great to reliably fix.

3. A third change is just a safety valve, using a
flatlist_node_grow() function just in case the
Count function returns a smaller value than Flatten ends up
trying to write into the array, and related to this I
got rid of the various --(flat->count) lines, since that var now
represents the allocated size of the array, until right after
the Flatten function when it's reset to represent the data written
to the array like it used to be.

I don't think we even need ADIOI_Count_contiguous_blocks() anymore,
but I didn't remove it.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
…to mpich

Mpich PR 4948 has a commit titled
porting an old fix we made in OMPI's copy of romio back to mpich

We already have most of that commit in OMPI, but the mpich version had
a small initialization of a pointer to NULL as well, so I took that part.

There was a problem with the Alltoallv in ADIOI_GPFS_Calc_others_req().
It was using sendbuf and sdisps[] to reference the various
my_req[x].offsets, but those are all malloced separately and could
thus be further apart than the integers in sdisps[] would allow.
So this PR copies all that data into a new send buf before the
Alltoallv, then copies it back out of a new recv buf.

I don't have a testcase that demonstrates the failure, but I did
at least run a GPFS test that called MPI_File_write_all() and
uses the Alltoallv().

Signed-off-by: Mark Allen <markalle@us.ibm.com>
… is not disabled

Mpich PR pmodels/mpich#4995 has a commit titled
lock contiguous indpendent write when data sieving write is not disabled

This is a fix for the testcase
    https://gist.github.com/markalle/d7da240c19e57f095c5d1b13240dae24

% mpicc -o x romio_write_timing.c
% mpirun -np 4 ./x

Note: you need to use a filesystem that uses ADIOI_GEN_WriteStrided to
hit the bug. I was using GPFS.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
@hppritcha
Copy link
Member

I'll check this on one of our systems with luste fs

@hppritcha
Copy link
Member

@markalle this works for me on a couple of systems with lustre fs.

@gpaulsen
Copy link
Member

So, @hppritcha, We need to decide if we want to take this patch on top of our ROMIO, or if we want to wait for MPICH/ROMIO to release with these fixes (if they ever will).
My understanding is that they've taken 1 of these 3 commits, but are possibly still iterating on the others.
If we wait, we can decide if we hold v4.0.6 for this, or pick it up in a future v4.0.x patch, or only pick it up in v4.1.x, or v5 streams (since it would be a new ROMIO update).

@ggouaillardet
Copy link
Contributor

fwiw, I finalized #8343 (ROMIO 3.4 refresh) for v5.

@markalle
Copy link
Contributor Author

For v5 I like the goal of having a straight drop of romio directly into ompi. Even though I'm skeptical about it ever being that clean it's a nice direction to attempt and we can certainly get closer to it. But for v4 I figured there wouldn't be major revision of romio so we whould just keep putting fixes into the version we have

@gpaulsen gpaulsen requested review from gpaulsen and removed request for ggouaillardet January 29, 2021 16:45
Copy link
Member

@gpaulsen gpaulsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been through IBM Specturm MPI on GPFS, and through Open-MPI on Lustre with HDF5 make check.

@hppritcha hppritcha merged commit 8f0ca4d into open-mpi:v4.0.x Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants