osc/rdma: performance improvements and bug fixes #4918

Merged
merged 1 commit into from
Mar 15, 2018

hjelmn (Member) commented Mar 15, 2018

This commit is a large update to the osc/rdma component. Included in
this commit:

  • Add support for using hardware atomics for fetch-and-op and single
    count accumulate when using the accumulate lock. This will improve
    the performance of these operations even when not setting the
    single intrinsic info key.

  • Rework how large accumulates are done. They now block on the get
    operation to fix some bugs discovered by an IBM one-sided test. I
    may roll back some of the changes if the underlying bug in the
    original design is discovered. There appears to be no real
    difference in performance (on the hardware this was tested with),
    so it's probably a non-issue. References #2530 (osc rdma hang with
    multiple windows).

  • Add support for an additional lock-all algorithm: on-demand. The
    on-demand algorithm will attempt to acquire the peer lock when
    starting an RMA operation. The lock algorithm default has not
    changed. The algorithm can be selected by setting the
    osc_rdma_locking_mode MCA variable. The valid values are two_level
    and on_demand.

  • Make use of the btl_flush function if available. This can improve
    performance with some BTLs.

  • When using btl_flush do not keep track of the number of put
    operations. This reduces the number of atomic operations in the
    critical path.

  • Make the window buffers more friendly to multi-threaded
    applications. This was done by dropping support for multiple
    buffers per MPI window. I intend to re-add that support once the
    underlying performance bug under the old buffering scheme is
    fixed.

  • Fix a bug in request completion in the accumulate, get, and put
    paths. This also helps with #2530 (osc rdma hang with multiple
    windows).

  • General code cleanup and fixes.
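
To illustrate the first item in the list, here is a minimal sketch of why a single hardware atomic can be cheaper than a lock-protected read-modify-write for a single-count fetch-and-op. It is not the osc/rdma code: it uses plain C11 atomics on local memory, every toy_* name is hypothetical, and it only contrasts the two general strategies. How the component actually combines hardware atomics with the accumulate lock is not shown here.

```c
/* Toy comparison of a lock-protected fetch-and-add with a single atomic
 * fetch-and-add. All names are hypothetical; this is local memory only. */
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    atomic_flag     acc_lock; /* stand-in for an accumulate lock */
    _Atomic int64_t value;    /* stand-in for one element of target memory */
} toy_target_t;

static void toy_target_init(toy_target_t *t, int64_t initial)
{
    atomic_flag_clear(&t->acc_lock);
    atomic_init(&t->value, initial);
}

static int64_t toy_target_read(toy_target_t *t)
{
    return atomic_load(&t->value);
}

/* Lock path: acquire the lock, read ("get"), apply the op, write ("put"),
 * then release the lock. */
static int64_t toy_fetch_add_locked(toy_target_t *t, int64_t operand)
{
    while (atomic_flag_test_and_set_explicit(&t->acc_lock,
                                             memory_order_acquire)) {
        /* spin until the lock is free */
    }
    int64_t old = atomic_load_explicit(&t->value, memory_order_relaxed);
    atomic_store_explicit(&t->value, old + operand, memory_order_relaxed);
    atomic_flag_clear_explicit(&t->acc_lock, memory_order_release);
    return old;
}

/* Atomic path: one atomic read-modify-write does the whole job. */
static int64_t toy_fetch_add_atomic(toy_target_t *t, int64_t operand)
{
    return atomic_fetch_add(&t->value, operand);
}
```

Both functions return the previous value and leave the same result in memory. The atomic path gets there in one operation instead of four.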

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
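
The on-demand lock-all mode described above can be selected at run time with something like `mpirun --mca osc_rdma_locking_mode on_demand ...`. The sketch below is a toy model of the idea as described: starting the epoch takes no peer locks, and a peer's lock is taken the first time an RMA operation targets that peer. It is not the component's implementation. The toy_* names are hypothetical, the "lock" is a local flag rather than a remote lock, and releasing only the locks actually taken at the end of the epoch is an assumption.

```c
/* Toy model of lazy ("on-demand") peer locking. Not the osc/rdma code:
 * every name here is hypothetical and the "lock" is just a local flag. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PEERS 8

typedef struct {
    bool in_lock_all;                /* inside a lock-all epoch? */
    bool peer_locked[TOY_MAX_PEERS]; /* peer locks currently held */
    int  lock_acquisitions;          /* peer locks taken this epoch */
} toy_window_t;

/* Starting the epoch takes no peer locks in this model. */
static void toy_lock_all(toy_window_t *win)
{
    win->in_lock_all = true;
    win->lock_acquisitions = 0;
    for (size_t i = 0; i < TOY_MAX_PEERS; ++i) {
        win->peer_locked[i] = false;
    }
}

/* Any RMA operation first makes sure the target peer's lock is held. */
static void toy_rma_op(toy_window_t *win, int peer)
{
    assert(win->in_lock_all && peer >= 0 && peer < TOY_MAX_PEERS);
    if (!win->peer_locked[peer]) {
        win->peer_locked[peer] = true; /* stand-in for a remote lock acquire */
        ++win->lock_acquisitions;
    }
    /* ... the put/get/accumulate itself would be issued here ... */
}

/* Ending the epoch releases only the locks that were actually taken
 * (an assumption of this sketch) and reports how many there were. */
static int toy_unlock_all(toy_window_t *win)
{
    int released = 0;
    for (size_t i = 0; i < TOY_MAX_PEERS; ++i) {
        if (win->peer_locked[i]) {
            win->peer_locked[i] = false;
            ++released;
        }
    }
    win->in_lock_all = false;
    return released;
}
```

In this model an epoch that only touches a few peers pays for a few lock acquisitions, at the cost of a check on every operation.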

hjelmn (Member, Author) commented Mar 15, 2018

Oops, need to fix one small issue with the merge.

@hjelmn hjelmn merged commit 7f4872d into open-mpi:master Mar 15, 2018