Romio fixes from mpich #8370

markalle · 2021-01-12T22:36:23Z

These are the fixes from MPICH's romio related to this issue with HDF5:
#6871

I think it's really just the darray + flatten portions of the fix that HDF5 needs, but this includes a couple other small fixes that seem safe and that I wasn't completely sure weren't also hit by HDF5 or similar apps.

The sieving fix isn't merged into mpich at the present time (#6871), but it's gone through some reviews already and I think it's about to get merged.

…btype creation Mpich PR 4943 has a commit titled darray (romio) needs a resize in the block()/cyclic() subtype creation When Type_create_darray() uses the block()/cyclic() function to construct a subtype for whatever dimension it's processing, it needs to resize the resulting type before it goes into the type construction as the input for processing the next dimension. The same problem is in the main path's type creation, and in romio's mirroring of it. Gist for a testcase: https://gist.github.com/markalle/940de93d64fd779e304ee124855b8e6a The darray_bug_romio.c testcase creates a darray using * 4 ranks in a 2x2 grid * looking at the type made for rank 0 * inner dimension: 4 ints distributed block over 2 ranks with 2 items each * outer dimension: 6 of the above distributed cyclic over 2 ranks with 2 items each The type created by processing the first dimension should look like this [ x x . . ] And then distributing those for the second dimension becomes [ x x x x ] [ . . . . ] [ . . . . ] [ . . . . ] [ x x x x ] [ . . . . ] Going to the MPI standard to justify why the first layout is right, it's where the definiton of the cyclic() function has a ub_marker of gsize*ex, eg 4 ints for that first type. Signed-off-by: Mark Allen <markalle@us.ibm.com>

Mpich PR 4943 has a commit titled romio flatten has flat->count consistency problems ADIOI_Count_contiguous_blocks and ADIOI_Flatten are very similar functions, but they diverge a lot around the edges and in degenerate cases. In Spectrum MPI I spent some time making them consistent with each other but found that to be a losing battle. So the approach I used here is to not have Count() be as definitive, and rather let Flatten() have the last word on what the final flat->count really is. Eg Flatten's *curr_index is the real count. The changes I made are 1. Fix a couple places in Flatten where *curr_index was updated out of sync with what was actually being added to flat->indices[] and flat->blocklens[]. That's one of the important book-keeping rules Flatten should follow. There were a couple places (when counts[] arrays have 0s) where that wasn't the case (see the "nonzeroth" var in this commit). 2. The main change I made was to reset flat->count based on Flatten's curr_index. This is because the divergence between the two functions is too great to reliably fix. 3. A third change is just a safety valve, using a flatlist_node_grow() function just in case the Count function returns a smaller value than Flatten ends up trying to write into the array, and related to this I got rid of the various --(flat->count) lines, since that var now represents the allocated size of the array, until right after the Flatten function when it's reset to represent the data written to the array like it used to be. I don't think we even need ADIOI_Count_contiguous_blocks() anymore, but I didn't remove it. Signed-off-by: Mark Allen <markalle@us.ibm.com>

…to mpich Mpich PR 4948 has a commit titled porting an old fix we made in OMPI's copy of romio back to mpich We already have most of that commit in OMPI, but the mpich version had a small initialization of a pointer to NULL as well, so I took that part. There was a problem with the Alltoallv in ADIOI_GPFS_Calc_others_req(). It was using sendbuf and sdisps[] to reference the various my_req[x].offsets, but those are all malloced separately and could thus be further apart than the integers in sdisps[] would allow. So this PR copies all that data into a new send buf before the Alltoallv, then copies it back out of a new recv buf. I don't have a testcase that demonstrates the failure, but I did at least run a GPFS test that called MPI_File_write_all() and uses the Alltoallv(). Signed-off-by: Mark Allen <markalle@us.ibm.com>

… is not disabled Mpich PR pmodels/mpich#4995 has a commit titled lock contiguous indpendent write when data sieving write is not disabled This is a fix for the testcase https://gist.github.com/markalle/d7da240c19e57f095c5d1b13240dae24 % mpicc -o x romio_write_timing.c % mpirun -np 4 ./x Note: you need to use a filesystem that uses ADIOI_GEN_WriteStrided to hit the bug. I was using GPFS. Signed-off-by: Mark Allen <markalle@us.ibm.com>

hppritcha · 2021-01-12T22:37:57Z

I'll check this on one of our systems with luste fs

hppritcha · 2021-01-13T19:53:29Z

@markalle this works for me on a couple of systems with lustre fs.

gpaulsen · 2021-01-13T20:15:18Z

So, @hppritcha, We need to decide if we want to take this patch on top of our ROMIO, or if we want to wait for MPICH/ROMIO to release with these fixes (if they ever will).
My understanding is that they've taken 1 of these 3 commits, but are possibly still iterating on the others.
If we wait, we can decide if we hold v4.0.6 for this, or pick it up in a future v4.0.x patch, or only pick it up in v4.1.x, or v5 streams (since it would be a new ROMIO update).

ggouaillardet · 2021-01-16T06:40:44Z

fwiw, I finalized #8343 (ROMIO 3.4 refresh) for v5.

markalle · 2021-01-21T18:00:56Z

For v5 I like the goal of having a straight drop of romio directly into ompi. Even though I'm skeptical about it ever being that clean it's a nice direction to attempt and we can certainly get closer to it. But for v4 I figured there wouldn't be major revision of romio so we whould just keep putting fixes into the version we have

gpaulsen

This has been through IBM Specturm MPI on GPFS, and through Open-MPI on Lustre with HDF5 make check.

markalle added 4 commits December 16, 2020 19:04

markalle mentioned this pull request Jan 12, 2021

HDF5 parallel test suite on Lustre hangs with OMPI 4.0.5 #6871

Closed

hppritcha requested a review from ggouaillardet January 13, 2021 19:54

hppritcha added Target: v4.0.x NEWS labels Jan 13, 2021

gpaulsen requested review from gpaulsen and removed request for ggouaillardet January 29, 2021 16:45

gpaulsen approved these changes Jan 29, 2021

View reviewed changes

hppritcha merged commit 8f0ca4d into open-mpi:v4.0.x Jan 29, 2021

gpaulsen mentioned this pull request Feb 11, 2021

Romio fixes from mpich v41x #8371

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Romio fixes from mpich #8370

Romio fixes from mpich #8370

markalle commented Jan 12, 2021

hppritcha commented Jan 12, 2021

hppritcha commented Jan 13, 2021

gpaulsen commented Jan 13, 2021

ggouaillardet commented Jan 16, 2021

markalle commented Jan 21, 2021

gpaulsen left a comment

Romio fixes from mpich #8370

Romio fixes from mpich #8370

Conversation

markalle commented Jan 12, 2021

hppritcha commented Jan 12, 2021

hppritcha commented Jan 13, 2021

gpaulsen commented Jan 13, 2021

ggouaillardet commented Jan 16, 2021

markalle commented Jan 21, 2021

gpaulsen left a comment

Choose a reason for hiding this comment