Add cuda::device::barrier_arrive tx #358
Commits on Sep 12, 2023
This PR adds the barrier<thread_scope_block>::arrive_tx "token-returning arrive_tx member function" for SM90+ device code.
- Simplify test
- Discard transaction count update on previous architectures
Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Our tests currently run with a specific number of threads on the device. Consequently, we need to split them up if we want to test with different numbers of threads.
Change feature test macro name
- Remove "has"
- Make available on all architectures (also for host code)
Actually change the macro name
Change to __cccl_lib_local_barrier_arrive_tx
- Guard feature test macro with architecture
- Consolidate tests in single header
- Make sure arrive_tx behaves appropriately on shared, distributed shared, and global memory (following structure of init(..))
- Make arrive_tx a device + host function (inline visibility)
Invert feature test macro availability
Previously, it was only available pre-SM70. Now it is available from SM70 onwards.
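A caller would typically branch on the feature test macro at compile time. The following is only a minimal sketch: the macro name `__cccl_lib_local_barrier_arrive_tx` is taken from the commits above, while the helper `arrive_tx_available` is a hypothetical name introduced for illustration. The `<cuda/barrier>` include is deliberately left out so that the snippet builds with any host compiler, in which case the macro is undefined and the fallback branch is taken.

```cpp
// In real code, include <cuda/barrier> first so the library can define the macro.
// It is omitted here (assumption) to keep the sketch buildable without the CUDA toolkit.

// Hypothetical helper: reports whether arrive_tx was advertised at compile time.
constexpr bool arrive_tx_available() {
#if defined(__cccl_lib_local_barrier_arrive_tx)
  return true;   // macro defined: the arrive_tx code path may be compiled
#else
  return false;  // macro absent: use a code path that does not rely on arrive_tx
#endif
}
```

Guarding on the macro rather than on an architecture number keeps user code independent of which architectures the library decides to enable.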
Update libcudacxx/.upstream-tests/test/cuda/barrier/arrive_tx.h
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Update libcudacxx/.upstream-tests/test/cuda/barrier/arrive_tx.h
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Cleanup _LIBCUDACXX_DEBUG_ASSERT
The only difference from a "normal" assert is that the debug version is disabled during constant evaluation, which has no effect on barrier. Also, we should enable it consistently regardless of the debug mode.
arrive_tx: disallow arrive_count == 0
After discussion with Gonzalo, it was decided to disallow an arrival count of zero:
- arrive with zero count is undefined in PTX
- we need an arrival token to return
- we want the .release semantics from the PTX arrive for all allowed parameter combinations
We might want to add a cuda::device::expect_tx function with .release semantics instead.
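The precondition described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `toy_barrier`, `toy_token`, and `toy_arrive_tx` are hypothetical names, the model is single-threaded, and it is not the libcudacxx implementation. It assumes a phase completes once both the pending arrivals and the pending transaction count reach zero.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of a barrier with a transaction count.
struct toy_barrier {
  std::ptrdiff_t expected;    // arrivals required per phase
  std::ptrdiff_t pending;     // arrivals still outstanding in the current phase
  std::ptrdiff_t tx_pending;  // outstanding transaction count
  unsigned phase;             // index of the current phase
};

// Hypothetical token: records the phase in which the arrival happened.
using toy_token = unsigned;

inline toy_token toy_arrive_tx(toy_barrier& b,
                               std::ptrdiff_t arrive_count,
                               std::ptrdiff_t tx_count) {
  // An arrival count of zero is disallowed, so a token can always be returned.
  assert(arrive_count >= 1);
  assert(arrive_count <= b.pending);

  const toy_token token = b.phase;  // token refers to the phase being arrived at
  b.tx_pending += tx_count;         // register the expected transactions
  b.pending -= arrive_count;        // then count the arrival(s)

  // Assumed completion rule: no pending arrivals and no pending transactions.
  if (b.pending == 0 && b.tx_pending == 0) {
    ++b.phase;
    b.pending = b.expected;
  }
  return token;
}
```

With `toy_barrier b{2, 2, 0, 0}`, two calls to `toy_arrive_tx(b, 1, 0)` both return a token for phase 0, and the second call advances the model to phase 1.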
This allows developer tools to catch errors in release mode.
Add completion function template parameter for ABI
But do not support any template parameter that is not equal to the default completion function.
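One common way to accept a template parameter in the signature while rejecting everything but the default is a `static_assert`. The sketch below assumes this approach; `default_completion`, `sketch_barrier`, and `sketch_arrive_tx` are hypothetical stand-ins and not the libcudacxx types or functions.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the default completion function type.
struct default_completion {};

// Hypothetical barrier type parameterized on its completion function.
template <class CompletionF = default_completion>
struct sketch_barrier {
  int pending;  // arrivals still outstanding
};

// The completion function is part of the signature, so the function's symbol
// already accounts for it; only the default is accepted for now.
template <class CompletionF>
int sketch_arrive_tx(sketch_barrier<CompletionF>& bar, int arrive_count) {
  static_assert(std::is_same<CompletionF, default_completion>::value,
                "only the default completion function is supported");
  assert(arrive_count >= 1);
  bar.pending -= arrive_count;
  return bar.pending;  // remaining arrivals, returned for illustration only
}
```

Calling `sketch_arrive_tx` on a `sketch_barrier<>` compiles, whereas instantiating it with any other completion type fails at compile time with the message above.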
Replace phase completion section
Instead of using the PTX docs as a reference, use the C++ barrier docs as a reference.