-
Notifications
You must be signed in to change notification settings - Fork 858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Romio fixes from mpich v41x #8371
Romio fixes from mpich v41x #8371
Commits on Aug 11, 2021
-
[from mpich] romio: avoid freeing NULLs in gpfs
mpich has two commits about freeing NULLs in GPFS, titled: "romio gpfs: avoid freeing NULL" and "ROMIO: gpfs: avoid freeing NULL in more places" The code isn't a perfect match vs the romio we have here in ompi version 4.1.x, but it's essentially turning a handful of "ADIOI_Free(x)" into "if (x) { ADIOI_Free(x); }". So that's what this commit does too. Signed-off-by: Mark Allen <markalle@us.ibm.com>
Configuration menu - View commit details
-
Copy full SHA for c9c1ec6 - Browse repository at this point
Copy the full SHA c9c1ec6View commit details -
[from mpich] romio GPFS: missing initialization
Merging a romio commit that was accepted by mpich: > romio GPFS: missing initialization > > Romio GPFS had a runtime problem due to a .mem_ptrs field that > was uninitialized that it was potentially trying to free. Signed-off-by: Mark Allen <markalle@us.ibm.com>
Configuration menu - View commit details
-
Copy full SHA for 4de4fd3 - Browse repository at this point
Copy the full SHA 4de4fd3View commit details -
[from mpich] darray (romio) needs a resize in the block()/cyclic() su…
…btype creation Merging a romio commit that was accepted by mpich: > darray (romio) needs a resize in the block()/cyclic() subtype creation > > When Type_create_darray() uses the block()/cyclic() function to > construct a subtype for whatever dimension it's processing, it > needs to resize the resulting type before it goes into the type > construction as the input for processing the next dimension. > > The same problem is in the main path's type creation, and in romio's > mirroring of it. > > Gist for a testcase: > https://gist.github.com/markalle/940de93d64fd779e304ee124855b8e6a > > The darray_bug_romio.c testcase creates a darray using > * 4 ranks in a 2x2 grid > * looking at the type made for rank 0 > * inner dimension: 4 ints distributed block over 2 ranks with 2 items each > * outer dimension: 6 of the above distributed cyclic over 2 ranks with 2 items each > > The type created by processing the first dimension should look like this > [ x x . . ] > > And then distributing those for the second dimension becomes > > [ x x x x ] > [ . . . . ] > [ . . . . ] > [ . . . . ] > [ x x x x ] > [ . . . . ] > > Going to the MPI standard to justify why the first layout is right, > it's where the definiton of the cyclic() function has a ub_marker > of gsize*ex, eg 4 ints for that first type. Signed-off-by: Mark Allen <markalle@us.ibm.com>
Configuration menu - View commit details
-
Copy full SHA for 3e317d7 - Browse repository at this point
Copy the full SHA 3e317d7View commit details -
[from mpich] romio flatten has flat->count consistency problems
Merging a romio commit that was accepted by mpich: > romio flatten has flat->count consistency problems > > ADIOI_Count_contiguous_blocks and ADIOI_Flatten are very similar functions, > but they diverge a lot around the edges and in degenerate cases. In Spectrum > MPI I spent some time making them consistent with each other but > found that to be a losing battle. > > So the approach I used here is to not have Count() be as definitive, > and rather let Flatten() have the last word on what the final flat->count > really is. Eg Flatten's *curr_index is the real count. > > The changes I made are > > 1. Fix a couple places in Flatten where *curr_index was updated out > of sync with what was actually being added to flat->indices[] and > flat->blocklens[]. That's one of the important book-keeping rules > Flatten should follow. There were a couple places (when counts[] arrays > have 0s) where that wasn't the case (see the "nonzeroth" var in this > commit). > > 2. The main change I made was to reset flat->count based on > Flatten's curr_index. This is because the divergence between the > two functions is too great to reliably fix. > > 3. A third change is just a safety valve, using a > flatlist_node_grow() function just in case the > Count function returns a smaller value than Flatten ends up > trying to write into the array, and related to this I > got rid of the various --(flat->count) lines, since that var now > represents the allocated size of the array, until right after > the Flatten function when it's reset to represent the data written > to the array like it used to be. > > I don't think we even need ADIOI_Count_contiguous_blocks() anymore, > but I didn't remove it. Signed-off-by: Mark Allen <markalle@us.ibm.com>
Configuration menu - View commit details
-
Copy full SHA for adda190 - Browse repository at this point
Copy the full SHA adda190View commit details -
[from mpich] make ADIOI_GEN_WriteStrided not step on itself
Merging a romio commit that was accepted by mpich: > make ADIOI_GEN_WriteStrided not step on itself > > The ADIOI_GEN_WriteStrided funcion uses data sieving on non-contiguous > types. That is, if it wants to write data at locations > [W...X...Y...Z...] > it reads the whole buffer > [dddddddddddddddd] > changes the locations it wants to write to > [WdddXdddYdddZddd] > then writes the whole thing back. It uses locks to make this safe, but > the problem is this only protects against other parts of the product that > are using locks. And without this PR a peer who is simultaneously making > a simple non-contiguous write wouldn't have locked. > > A testcase to demonstrate the original problem is here: > https://gist.github.com/markalle/d7da240c19e57f095c5d1b13240dae24 > > % mpicc -o x romio_write_timing.c > % mpirun -np 4 ./x > > Note: you need to use a filesystem that uses ADIOI_GEN_WriteStrided to > hit the bug. I was using GPFS. > > This commit is pulled from wkliao after discussing where to put the > new lock. It adds locks to contiguous writes in independent write > functions when data sieving write is not disabled Signed-off-by: Mark Allen <markalle@us.ibm.com>
Configuration menu - View commit details
-
Copy full SHA for ecd3e05 - Browse repository at this point
Copy the full SHA ecd3e05View commit details