Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big Count Collectives Test Suite #15

Merged
merged 1 commit into from
Mar 1, 2022

Conversation

jjhursey
Copy link
Member

  • Run collectives with a count parameter as close to INT_MAX as possible.
    • This test suite often highlights cases where the underlying algorithm
      assumes that the payload (roughly count x sizeof(datatype)) is
      and int when it should be handled as a size_t.
  • Includes:
    • Test with int and double _Complex primitive data types
    • Correctness checks
    • Mechanism to control (as best as we can) the amount of memory consumed
      per node.
  • Assumes:
    • Roughly the same amount of memory per node
    • Same number of processes per node
  • See README.md for details

Signed-off-by: Joshua Hursey jhursey@us.ibm.com

@jjhursey jjhursey added the enhancement New feature or request label Feb 22, 2022
@jjhursey jjhursey requested a review from gpaulsen February 22, 2022 20:33
@jjhursey jjhursey force-pushed the big-count-coll-testing branch from db878fb to d73c295 Compare February 22, 2022 20:44
@jjhursey jjhursey marked this pull request as ready for review February 22, 2022 20:44
Copy link
Member

@gpaulsen gpaulsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a great PR. I commented on a few minor nits.

A few enhancements for future features could include:
o testing other reduction ops other than MPI_SUM
o Test MPI_Reduce_Local.
o Test new embigening in MPI-4.0 standard.

collective-big-count/Makefile Outdated Show resolved Hide resolved
# count = TEST_PAYLOAD_SIZE / sizeof(datatype)
######################################################################
# INT_MAX : == 2 GB so guard will not trip
TEST_PAYLOAD_SIZE=2147483647
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want this TEST_PAYLOAD_SIZE to be BIGGER than TEST_UNIFORM_COUNT? say 4 GB or 4GB + a few KB?
README.md (at the top) seems to suggest that the default TEST_PAYLOAD_SIZE would be UINT_MAX ???
See: https://github.com/open-mpi/ompi-tests-public/pull/15/files#diff-26cf9c4e5070bbfe5af943597e4ee73210cb78749e416e5bcb00fb204331f490R98

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can, but it should be relative to the datatype. So we don't want to set it to INT_MAX x 8 because then the count would be too big for an int datatype. I left it as a default of INT_MAX and folks can change it as they see fit. The TEST_UNIFORM_COUNT is probably the more useful of the two variants.

collective-big-count/common.h Show resolved Hide resolved
collective-big-count/common.h Outdated Show resolved Hide resolved
collective-big-count/common.h Outdated Show resolved Hide resolved
collective-big-count/common.h Outdated Show resolved Hide resolved
collective-big-count/common.h Outdated Show resolved Hide resolved
collective-big-count/test_allgather.c Show resolved Hide resolved
collective-big-count/test_allgatherv.c Show resolved Hide resolved
collective-big-count/test_reduce.c Show resolved Hide resolved
 * Run collectives with a `count` parameter as close to `INT_MAX` as possible.
   - This test suite often highlights cases where the underlying algorithm
     assumes that the payload (roughly `count x sizeof(datatype)`) is
     and `int` when it should be handled as a `size_t`.
 * Includes:
   - Test with `int` and `double _Complex` primitive data types
   - Correctness checks
   - Mechanism to control (as best as we can) the amount of memory consumed
     per node.
 * Assumes:
   - Roughly the same amount of memory per node
   - Same number of processes per node
 * See `README.md` for details

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
@jjhursey jjhursey force-pushed the big-count-coll-testing branch from d73c295 to 6332a0e Compare February 23, 2022 22:21
@jjhursey
Copy link
Member Author

I just force pushed the resolved changes from the review.

@jjhursey
Copy link
Member Author

jjhursey commented Mar 1, 2022

Discussed on March 1, 2022 teleconf. Ok to merge.

@jjhursey jjhursey merged commit 6944d97 into open-mpi:master Mar 1, 2022
@jjhursey jjhursey deleted the big-count-coll-testing branch March 1, 2022 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants