Romio fixes from mpich #8370
Conversation
Mpich PR 4943 has a commit titled "darray (romio) needs a resize in the block()/cyclic() subtype creation".

When Type_create_darray() uses the block()/cyclic() function to construct a subtype for whatever dimension it's processing, it needs to resize the resulting type before that type goes into the type construction as the input for processing the next dimension. The same problem exists in the main path's type creation and in romio's mirroring of it.

Gist for a testcase: https://gist.github.com/markalle/940de93d64fd779e304ee124855b8e6a

The darray_bug_romio.c testcase creates a darray using
* 4 ranks in a 2x2 grid
* looking at the type made for rank 0
* inner dimension: 4 ints distributed block over 2 ranks with 2 items each
* outer dimension: 6 of the above distributed cyclic over 2 ranks with 2 items each

The type created by processing the first dimension should look like this

    [ x x . . ]

And then distributing those for the second dimension becomes

    [ x x x x ]
    [ . . . . ]
    [ . . . . ]
    [ . . . . ]
    [ x x x x ]
    [ . . . . ]

Going to the MPI standard to justify why the first layout is right: the definition of the cyclic() function has a ub_marker of gsize*ex, e.g. 4 ints for that first type.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
Mpich PR 4943 has a commit titled "romio flatten has flat->count consistency problems".

ADIOI_Count_contiguous_blocks and ADIOI_Flatten are very similar functions, but they diverge a lot around the edges and in degenerate cases. In Spectrum MPI I spent some time making them consistent with each other but found that to be a losing battle. So the approach used here is to stop treating Count() as definitive and instead let Flatten() have the last word on what the final flat->count really is, e.g. Flatten's *curr_index is the real count. The changes are:

1. Fix a couple of places in Flatten where *curr_index was updated out of sync with what was actually being added to flat->indices[] and flat->blocklens[]. That's one of the important book-keeping rules Flatten should follow, and there were a couple of places (when counts[] arrays have 0s) where it wasn't followed (see the "nonzeroth" var in this commit).
2. The main change: reset flat->count based on Flatten's curr_index, because the divergence between the two functions is too great to reliably fix.
3. A safety valve: a flatlist_node_grow() function grows the arrays in case the Count function returns a smaller value than Flatten ends up trying to write into them. Related to this, the various --(flat->count) lines are gone, since that variable now represents the allocated size of the array until right after the Flatten function, when it's reset to represent the data written to the array as it used to be.

I don't think we even need ADIOI_Count_contiguous_blocks() anymore, but I didn't remove it.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
Mpich PR 4948 has a commit titled "porting an old fix we made in OMPI's copy of romio back to mpich".

We already have most of that commit in OMPI, but the mpich version also had a small initialization of a pointer to NULL, so I took that part.

There was a problem with the Alltoallv in ADIOI_GPFS_Calc_others_req(). It was using sendbuf and sdisps[] to reference the various my_req[x].offsets, but those are all malloced separately and could thus be further apart than the integers in sdisps[] would allow. So this PR copies all that data into a new send buf before the Alltoallv, then copies it back out of a new recv buf.

I don't have a testcase that demonstrates the failure, but I did at least run a GPFS test that called MPI_File_write_all() and exercises the Alltoallv().

Signed-off-by: Mark Allen <markalle@us.ibm.com>
Mpich PR pmodels/mpich#4995 has a commit titled "lock contiguous independent write when data sieving write is not disabled".

This is a fix for the testcase https://gist.github.com/markalle/d7da240c19e57f095c5d1b13240dae24

    % mpicc -o x romio_write_timing.c
    % mpirun -np 4 ./x

Note: you need to use a filesystem that uses ADIOI_GEN_WriteStrided to hit the bug. I was using GPFS.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
I'll check this on one of our systems with lustre fs.
@markalle this works for me on a couple of systems with lustre fs.
So, @hppritcha, we need to decide if we want to take this patch on top of our ROMIO, or if we want to wait for MPICH/ROMIO to release with these fixes (if they ever will).
fwiw, I finalized #8343 (ROMIO 3.4 refresh) for v5. |
For v5 I like the goal of having a straight drop of romio directly into ompi. Even though I'm skeptical about it ever being that clean, it's a nice direction to attempt and we can certainly get closer to it. But for v4 I figured there wouldn't be a major revision of romio, so we should just keep putting fixes into the version we have.
This has been through IBM Spectrum MPI on GPFS, and through Open MPI on Lustre with HDF5 make check.
These are the fixes from MPICH's romio related to this issue with HDF5:
#6871
I think it's really just the darray and flatten portions of the fix that HDF5 needs, but this includes a couple of other small fixes that seem safe and that might also be hit by HDF5 or similar apps.
The sieving fix isn't merged into mpich at the present time (#6871), but it's gone through some reviews already and I think it's about to get merged.