Fix intercommunicator overflow with big payload collectives #9942

Merged
jjhursey merged 1 commit into open-mpi:master from big-payload-inter-coll
Feb 7, 2022

Conversation

jjhursey
Member

  • The 'inter' collective component multiplied the int count by the
    int size of the communicator, and the product can overflow an int.
    • Solution: preserve the full size_t value in the computation,
      which the PML supports.
  • allgather, gather, and scatter all overflowed in a multiply.
    • Preserve the full size_t value in the multiply.
    • allgather needed extra code to handle the bcast of the result.
  • allgatherv, gatherv, and scatterv all overflowed a total variable
    that accumulated over the count array.
    • Change the type of total so it preserves the full size_t value.
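
To illustrate the multiply overflow described above, here is a minimal sketch in C. It is not the code from this PR: the helper names `int_product_overflows` and `total_elements` are hypothetical, and the sketch assumes a platform where `size_t` is 64 bits wide. It only shows why widening each operand before the multiply keeps the full value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the PR): returns 1 when count * comm_size
 * does not fit in an int. The check is done in 64-bit arithmetic so the
 * check itself cannot overflow. */
static int int_product_overflows(int count, int comm_size)
{
    int64_t wide = (int64_t)count * (int64_t)comm_size;
    return wide > INT_MAX || wide < INT_MIN;
}

/* Hypothetical helper (not from the PR): total element count with both
 * operands widened to size_t BEFORE the multiply. Writing
 * (size_t)(count * comm_size) would be too late, because the int
 * multiply has already overflowed by then. */
static size_t total_elements(int count, int comm_size)
{
    assert(count >= 0 && comm_size >= 0);
    return (size_t)count * (size_t)comm_size;
}
```

For example, with a count of `1 << 30` and a communicator size of 4, the product is 2^32, which does not fit in a 32-bit `int` but is represented exactly by a 64-bit `size_t`.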

 * The 'inter' collective component was multiplying the `int` count by
   the `int` size of the communicator which can overflow the integer.
   - Solution is to preserve the full `size_t` value in the computation
     which the PML supports.
 * `allgather`, `gather`, `scatter` all overflowed in a multiply
    - Preserve the full `size_t` value in the multiply
    - allgather needed extra code to handle the bcast of the result
 * `allgatherv`, `gatherv`, `scatterv` all overflowed a `total` variable
    that accumulated over the count array.
    - Preserve the full `size_t` value in `total` type

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
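
The `total` accumulation issue in the `v` variants can be sketched the same way. This is an illustrative example, not the PR's code: `sum_counts` is a hypothetical helper, and the sketch assumes non-negative counts and a 64-bit `size_t`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not from the PR): sum an array of int counts into
 * a size_t accumulator. With an int accumulator, the sum could overflow
 * even though every individual count fits in an int. */
static size_t sum_counts(const int *counts, int n)
{
    size_t total = 0;
    for (int i = 0; i < n; ++i) {
        assert(counts[i] >= 0);      /* assumption: counts are non-negative */
        total += (size_t)counts[i];  /* widen each count before adding */
    }
    return total;
}
```

Summing the three counts `INT_MAX`, `INT_MAX`, and 2 gives 2^32 when `int` is 32 bits, which is larger than any `int` can hold but fits in a 64-bit `size_t` accumulator.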
@jjhursey
Member Author

jjhursey commented Feb 4, 2022

@bosilca when you have a chance can you re-review?

@jjhursey jjhursey merged commit acbe7b0 into open-mpi:master Feb 7, 2022
@jjhursey jjhursey deleted the big-payload-inter-coll branch February 7, 2022 16:17
2 participants