Add cuda::device::barrier_arrive_tx #358

Merged: 68 commits, Sep 12, 2023

Commits on Sep 12, 2023

  1. Add barrier::arrive_tx

    This PR adds barrier<thread_scope_block>::arrive_tx, a token-returning
    member function, for SM90+ device code.
    ahendriksen committed Sep 12, 2023
    5747b51
  2. Implement review feedback

    ahendriksen committed Sep 12, 2023
    dcb64a8
  3. Add test

    ahendriksen committed Sep 12, 2023
    8df2664
  4. b007800
  5. Implement review feedback

    - Simplify test
    - Discard transaction count update on previous architectures
    ahendriksen committed Sep 12, 2023
    8f88451
  6. 4ed5686
  7. Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp

    Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
    ahendriksen and miscco committed Sep 12, 2023
    01edfd2
  8. Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp

    Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
    ahendriksen and miscco committed Sep 12, 2023
    4a0ae94
  9. Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp

    Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
    ahendriksen and miscco committed Sep 12, 2023
    89b8a33
  10. Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp

    Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
    ahendriksen and miscco committed Sep 12, 2023
    7c91f12
  11. Saved by [no_discard](!)

    ahendriksen committed Sep 12, 2023
    889bea5
  12. 2769530
  13. Split arrive_tx test up

    Our tests currently run with a specific number of threads on device. Consequently, we need to split them up if we want to test against different numbers of threads.
    miscco authored and ahendriksen committed Sep 12, 2023
    3f36503
  14. fe358cd
  15. 104d739
  16. Change feature test macro name

    - Remove "has"
    - Make available on all architectures (also for host code)
    ahendriksen committed Sep 12, 2023
    7f06ca4
  17. Actually change the macro name

    Change to __cccl_lib_local_barrier_arrive_tx
    ahendriksen committed Sep 12, 2023
    edfbba4
  18. 6d4a7c7
  19. Implement review feedback

    - Guard feature test macro with architecture
    - Consolidate tests in single header
    - Make sure arrive_tx behaves appropriately on shared, distributed
      shared, and global memory (following structure of init(..))
    - Make arrive_tx a device + host function.  (inline visibility)
    ahendriksen committed Sep 12, 2023
    c97f318
  20. Fix feature flag test

    ahendriksen committed Sep 12, 2023
    e04db94
  21. Invert feature test macro availability

    Previously, it was only available pre-SM70. Now it is available from
    SM70 onwards.
    ahendriksen committed Sep 12, 2023
    4107b77
  22. Fix license headers

    ahendriksen committed Sep 12, 2023
    1756350
  23. Update libcudacxx/.upstream-tests/test/cuda/barrier/arrive_tx.h

    Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
    ahendriksen and miscco committed Sep 12, 2023
    9447f36
  24. Update libcudacxx/.upstream-tests/test/cuda/barrier/arrive_tx.h

    Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
    ahendriksen and miscco committed Sep 12, 2023
    3cc5f24
  25. 6f2bcdf
  26. Address more review feedback

    miscco authored and ahendriksen committed Sep 12, 2023
    61744ce
  27. f35972b
  28. 1b647c0
  29. Cleanup _LIBCUDACXX_DEBUG_ASSERT

    The only difference from a "normal" assert is that the debug version is disabled during constant evaluation, which has no effect on barrier.

    Also, we should enable it consistently regardless of the debug mode.
    miscco authored and ahendriksen committed Sep 12, 2023
    a9026b5
  30. Fix asserts

    ahendriksen committed Sep 12, 2023
    a045347
  31. Fix nvrtc tests

    ahendriksen committed Sep 12, 2023
    4d4c1ea
  32. c2fec0c
  33. 228e072
  34. 0e0c661
  35. Remove barrier::arrive_tx

    ahendriksen committed Sep 12, 2023
    f83e7ca
  36. Add docs

    ahendriksen committed Sep 12, 2023
    c092f56
  37. Update CC sentence

    ahendriksen committed Sep 12, 2023
    3bff8e9
  38. 7fa3ad2
  39. Document feature flag

    ahendriksen committed Sep 12, 2023
    4eb18f9
  40. 6e7d312
  41. 3b9a184
  42. c0513bf
  43. 514e330
  44. f96f20a
  45. arrive_tx: disallow arrive_count == 0

    After discussion with Gonzalo, it was decided to disallow an arrival
    count of zero:
    - arrive with zero count is undefined in PTX
    - we need an arrival token to return
    - we want the .release semantics from the PTX arrive for all allowed
      parameter combinations
    
    We might want to add a cuda::device::expect_tx function with .release
    semantics instead.
    ahendriksen committed Sep 12, 2023
    901ab2a
  46. Simplify state space handling

    This allows developer tools to catch errors in release mode.
    ahendriksen committed Sep 12, 2023
    3407e30
  47. 31b0de9
  48. Update arch check test

    ahendriksen committed Sep 12, 2023
    ee9db4a
  49. Add completion function template parameter for ABI

    However, no template argument other than the default completion
    function is supported.
    ahendriksen committed Sep 12, 2023
    d56cd4e
  50. c7c4484
  51. db23c52
  52. c280a68
  53. 1401d4a
  54. 3b8503f
  55. Fix hiding of arrive_tx

    ahendriksen committed Sep 12, 2023
    c6535a2
  56. f2b38e7
  57. 490b166
  58. Markup cluster test

    ahendriksen committed Sep 12, 2023
    1739496
  59. 53403d2
  60. c17b3ed
  61. c41fd9e
  62. 1443a90
  63. Update docs

    ahendriksen committed Sep 12, 2023
    331ed7c
  64. Replace phase completion section

    Instead of using the PTX docs as a reference, use the C++ barrier docs
    as a reference.
    ahendriksen committed Sep 12, 2023
    8e2f2ef
  65. 41303da
  66. ce919a2
  67. 2040fa6
  68. bb989b1
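
The commits above describe three rules: the arrive count must be at least one, every call returns an arrival token, and a transaction count is updated alongside the arrival. Those rules can be sketched with a small host-side C++ model. This is an illustrative toy under stated assumptions, not the libcu++ implementation. ToyTxBarrier, complete_tx, and phase_done are hypothetical names. The sketch also assumes that a phase completes only once both the pending arrivals and the pending transaction count reach zero. The real cuda::device::barrier_arrive_tx is device code that operates on a cuda::barrier<cuda::thread_scope_block>.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical single-threaded model of a barrier with a transaction count.
// It mirrors the bookkeeping only; it performs no synchronization.
struct ToyTxBarrier {
  std::ptrdiff_t expected;          // arrivals required per phase
  std::ptrdiff_t pending_arrivals;  // arrivals still missing in this phase
  std::ptrdiff_t pending_tx = 0;    // transaction units (e.g. bytes) in flight
  std::size_t phase = 0;            // index of the current phase

  using arrival_token = std::size_t;  // the phase the caller arrived in

  explicit ToyTxBarrier(std::ptrdiff_t expected_count)
      : expected(expected_count), pending_arrivals(expected_count) {}

  // Models arrive_tx: register expected transactions, then arrive.
  arrival_token arrive_tx(std::ptrdiff_t arrive_count, std::ptrdiff_t tx_count) {
    assert(arrive_count >= 1 && "an arrive count of zero is disallowed");
    assert(arrive_count <= pending_arrivals);
    assert(tx_count >= 0);
    const arrival_token token = phase;
    pending_tx += tx_count;            // add transactions before the arrival
    pending_arrivals -= arrive_count;  // then count the arrival itself
    try_complete();
    return token;
  }

  // Models an asynchronous operation reporting completed transaction units.
  void complete_tx(std::ptrdiff_t tx_count) {
    assert(tx_count >= 0);
    pending_tx -= tx_count;
    try_complete();
  }

  // True once the phase identified by the token has completed.
  bool phase_done(arrival_token token) const { return phase > token; }

 private:
  // Advance the phase only when both counters have reached zero.
  void try_complete() {
    if (pending_arrivals == 0 && pending_tx == 0) {
      ++phase;
      pending_arrivals = expected;  // reset for the next phase
    }
  }
};
```

In this model, with two expected arrivals, arrive_tx(1, 64) followed by arrive_tx(1, 0) leaves the phase open until complete_tx(64) is called. This shows why a returned token is needed: it is what later lets the caller wait on the phase it arrived in.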