Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cuda::device::barrier_arrive tx #358

Merged
merged 68 commits into from
Sep 12, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
68 commits
Select commit Hold shift + click to select a range
5747b51
Add barrier::arrive_tx
ahendriksen Aug 18, 2023
dcb64a8
Implement review feedback
ahendriksen Aug 18, 2023
8df2664
Add test
ahendriksen Aug 18, 2023
b007800
arrive_tx: Trap on older architectures
ahendriksen Aug 18, 2023
8f88451
Implement review feedback
ahendriksen Aug 21, 2023
4ed5686
Remove includes that are unavailable in nvrtc
ahendriksen Aug 21, 2023
01edfd2
Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp
ahendriksen Aug 23, 2023
4a0ae94
Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp
ahendriksen Aug 23, 2023
89b8a33
Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp
ahendriksen Aug 23, 2023
7c91f12
Update libcudacxx/.upstream-tests/test/cuda/barrier_arrive_tx.pass.cpp
ahendriksen Aug 23, 2023
889bea5
Saved by [no_discard](!)
ahendriksen Aug 23, 2023
2769530
Move barrier tests into their own subfolder
miscco Aug 25, 2023
3f36503
Split arrive_tx test up
miscco Aug 25, 2023
fe358cd
Ensure that we silence "unused variable" warnings
miscco Aug 25, 2023
104d739
Add a feature test macro for cp_async_exposure
miscco Aug 25, 2023
7f06ca4
Change feature test macro name
ahendriksen Aug 28, 2023
edfbba4
Actually change the macro name
ahendriksen Aug 28, 2023
6d4a7c7
Fix typo in feature test macro
ahendriksen Aug 28, 2023
c97f318
Implement review feedback
ahendriksen Aug 29, 2023
e04db94
Fix feature flag test
ahendriksen Aug 29, 2023
4107b77
Invert feature test macro availability
ahendriksen Aug 29, 2023
1756350
Fix license headers
ahendriksen Aug 29, 2023
9447f36
Update libcudacxx/.upstream-tests/test/cuda/barrier/arrive_tx.h
ahendriksen Aug 30, 2023
3cc5f24
Update libcudacxx/.upstream-tests/test/cuda/barrier/arrive_tx.h
ahendriksen Aug 30, 2023
6f2bcdf
Use more descriptive header guard
ahendriksen Aug 30, 2023
61744ce
Address more review feedback
miscco Aug 30, 2023
f35972b
Make tests work on the host
ahendriksen Aug 30, 2023
1b647c0
Add arrive_tx to non-thread-scoped barriers
ahendriksen Aug 30, 2023
a9026b5
Cleanup `_LIBCUDACXX_DEBUG_ASSERT`
miscco Aug 31, 2023
a045347
Fix asserts
ahendriksen Aug 31, 2023
4d4c1ea
Fix nvrtc tests
ahendriksen Aug 31, 2023
c2fec0c
Revert to _LIBCUDACXX_DEBUG_ASSERT
ahendriksen Aug 31, 2023
228e072
Remove debug assert from constexpr init
ahendriksen Aug 31, 2023
0e0c661
Add and test cuda::device::arrive_tx
ahendriksen Sep 4, 2023
f83e7ca
Remove barrier::arrive_tx
ahendriksen Sep 4, 2023
c092f56
Add docs
ahendriksen Sep 4, 2023
3bff8e9
Update CC sentence
ahendriksen Sep 4, 2023
7fa3ad2
Run arrive_tx tests from SM90 onwards
ahendriksen Sep 5, 2023
4eb18f9
Document feature flag
ahendriksen Sep 5, 2023
6e7d312
Make feature flag available in host compilation trajectory
ahendriksen Sep 5, 2023
3b9a184
Add pre-conditions, effects discussion
ahendriksen Sep 7, 2023
c0513bf
Improve architecture checking
ahendriksen Sep 7, 2023
514e330
Add back assert for expected unit
ahendriksen Sep 7, 2023
f96f20a
Add __syncthreads() to example
ahendriksen Sep 7, 2023
901ab2a
arrive_tx: disallow arrive_count == 0
ahendriksen Sep 7, 2023
3407e30
Simplify state space handling
ahendriksen Sep 7, 2023
31b0de9
Check arrive_count_update upper limit
ahendriksen Sep 7, 2023
ee9db4a
Update arch check test
ahendriksen Sep 7, 2023
d56cd4e
Add completion function template parameter for ABI
ahendriksen Sep 7, 2023
c7c4484
Test that arrive_tx fails on cluster and device memory
ahendriksen Sep 7, 2023
db23c52
Test that arrive_tx fails with completion function
ahendriksen Sep 7, 2023
c280a68
Remove expected unit assert again
ahendriksen Sep 8, 2023
1401d4a
Hide arrive_tx on older architectures
ahendriksen Sep 8, 2023
3b8503f
Pull out barrier_native_handle into variable
ahendriksen Sep 8, 2023
c6535a2
Fix hiding of arrive_tx
ahendriksen Sep 8, 2023
f2b38e7
Rename arrive_tx => barrier_arrive_tx
ahendriksen Sep 8, 2023
490b166
Add docs for barrier with transaction count
ahendriksen Sep 8, 2023
1739496
Markup cluster test
ahendriksen Sep 11, 2023
53403d2
Remove completion function template param
ahendriksen Sep 11, 2023
c17b3ed
Trap in complete_tx in arrive_tx tests
ahendriksen Sep 11, 2023
c41fd9e
Exclude failing nvrtc test
ahendriksen Sep 11, 2023
1443a90
Remove ref to UB in effects section
ahendriksen Sep 12, 2023
331ed7c
Update docs
ahendriksen Sep 11, 2023
8e2f2ef
Replace phase completion section
ahendriksen Sep 12, 2023
41303da
Pull out __cvta_generic_to_shared
ahendriksen Sep 12, 2023
ce919a2
Add back debug assert in constexpr
ahendriksen Sep 12, 2023
2040fa6
Disable debug assert on C++11
ahendriksen Sep 12, 2023
bb989b1
[skip-tests] Use same format for standard lib text modification
ahendriksen Sep 12, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
83 changes: 83 additions & 0 deletions libcudacxx/.upstream-tests/test/cuda/barrier/arrive_tx.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
#ifndef TEST_ARRIVE_TX_H_
#define TEST_ARRIVE_TX_H_

#include <cuda/barrier>

#include <cuda/std/utility>

#include "concurrent_agents.h"
#include "cuda_space_selector.h"
#include "test_macros.h"

// Suppress warning about barrier in shared memory
TEST_NV_DIAG_SUPPRESS(static_var_with_dynamic_init)

template<typename Barrier>
inline __device__
void mbarrier_complete_tx(
Barrier &b, int transaction_count)
{
NV_DISPATCH_TARGET(
NV_PROVIDES_SM_90, (
if (__isShared(cuda::device::barrier_native_handle(b))) {
asm volatile(
"mbarrier.complete_tx.relaxed.cta.shared::cta.b64 [%0], %1;"
:
: "r"((unsigned int) __cvta_generic_to_shared(cuda::device::barrier_native_handle(b))),
"r"(transaction_count)
: "memory");
} else {
__trap();
}
), NV_ANY_TARGET, (
// On architectures pre-SM90 (and on host), we drop the transaction count
// update. The barriers do not keep track of transaction counts.
__trap();
)
);
}

template<typename Barrier>
__device__
void thread(Barrier& b, int arrives_per_thread)
{
constexpr int tx_count = 1;
auto tok = cuda::device::barrier_arrive_tx(b, arrives_per_thread, tx_count);
// Manually increase the transaction count of the barrier.
mbarrier_complete_tx(b, tx_count);
ahendriksen marked this conversation as resolved.
Show resolved Hide resolved

b.wait(cuda::std::move(tok));
}

__device__
void test()
{
NV_DISPATCH_TARGET(
NV_IS_DEVICE, (
// Run all threads, each arriving with arrival count 1
constexpr auto block = cuda::thread_scope_block;

__shared__ cuda::barrier<block> bar_1;
init(&bar_1, (int) blockDim.x);
__syncthreads();
thread(bar_1, 1);

// Run all threads, each arriving with arrival count 2
__shared__ cuda::barrier<block> bar_2;
init(&bar_2, (int) 2 * blockDim.x);
__syncthreads();
thread(bar_2, 2);
)
);
}

#endif // TEST_ARRIVE_TX_H_
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
//
// UNSUPPORTED: libcpp-has-no-threads
// UNSUPPORTED: pre-sm-90

// <cuda/barrier>

#include <cooperative_groups.h>
#include <cuda/barrier>
#include "test_macros.h"

// Suppress warning about barrier in shared memory
TEST_NV_DIAG_SUPPRESS(static_var_with_dynamic_init)

int main(int, char**){
NV_DISPATCH_TARGET(
NV_IS_HOST, (
// When PR #416 is merged, uncomment this line:
// cuda_cluster_size = 2;
),
NV_IS_DEVICE, (
__shared__ cuda::barrier<cuda::thread_scope_block> bar;

if (threadIdx.x == 0) {
init(&bar, blockDim.x);
}
namespace cg = cooperative_groups;
auto cluster = cg::this_cluster();

cluster.sync();

// This test currently fails at this point because support for
// clusters has not yet been added.
cuda::barrier<cuda::thread_scope_block> *remote_bar;
remote_bar = cluster.map_shared_rank(&bar, cluster.block_rank() ^ 1);

// When PR #416 is merged, this should fail here because the barrier
// is in device memory.
auto token = cuda::device::barrier_arrive_tx(*remote_bar, 1, 0);
));
return 0;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
//
// UNSUPPORTED: libcpp-has-no-threads
// UNSUPPORTED: pre-sm-90

// <cuda/barrier>

#include "arrive_tx.h"

int main(int, char**)
{
NV_DISPATCH_TARGET(
NV_IS_HOST, (
// Required by concurrent_agents_launch to know how many we're
// launching. This can only be an int, because the nvrtc tests use grep
// to figure out how many threads to launch.
cuda_thread_count = 256;
),
NV_IS_DEVICE, (
test();
)
);

return 0;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
//
// UNSUPPORTED: libcpp-has-no-threads
// UNSUPPORTED: pre-sm-90
ahendriksen marked this conversation as resolved.
Show resolved Hide resolved

// <cuda/barrier>

#include <cuda/barrier>
#include "test_macros.h"

// Suppress warning about barrier in shared memory
TEST_NV_DIAG_SUPPRESS(static_var_with_dynamic_init)

__device__ uint64_t bar_storage;

int main(int, char**){
NV_IF_TARGET(
NV_IS_DEVICE, (
cuda::barrier<cuda::thread_scope_block> *bar_ptr;
bar_ptr = reinterpret_cast<cuda::barrier<cuda::thread_scope_block> *>(bar_storage);

if (threadIdx.x == 0) {
init(bar_ptr, blockDim.x);
}
__syncthreads();

// Should fail because the barrier is in device memory.
auto token = cuda::device::barrier_arrive_tx(*bar_ptr, 1, 0);
));
return 0;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
//
// UNSUPPORTED: libcpp-has-no-threads
// UNSUPPORTED: pre-sm-90

// <cuda/barrier>

#include <cuda/barrier>

#ifndef __cccl_lib_local_barrier_arrive_tx
static_assert(false, "should define __cccl_lib_local_barrier_arrive_tx");
#endif // __cccl_lib_local_barrier_arrive_tx

int main(int, char**){
return 0;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
//
// UNSUPPORTED: libcpp-has-no-threads
// UNSUPPORTED: nvrtc
// UNSUPPORTED: pre-sm-70

// <cuda/barrier>

#include <cuda/barrier>

int main(int, char**){
NV_IF_TARGET(
NV_IS_DEVICE, (
__shared__ cuda::barrier<cuda::thread_scope_block> bar;
if (threadIdx.x == 0) {
init(&bar, blockDim.x);
}
__syncthreads();

// barrier_arrive_tx should fail on SM70 and SM80, because it is hidden.
auto token = cuda::device::barrier_arrive_tx(bar, 1, 0);

#if defined(__CUDA_MINIMUM_ARCH__) && 900 <= __CUDA_MINIMUM_ARCH__
static_assert(false, "Fail manually for SM90 and up.");
#endif // __CUDA_MINIMUM_ARCH__
));
return 0;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
//
// UNSUPPORTED: libcpp-has-no-threads
// UNSUPPORTED: pre-sm-90

// <cuda/barrier>

#include "arrive_tx.h"

int main(int, char**)
{
NV_DISPATCH_TARGET(
NV_IS_HOST, (
// Required by concurrent_agents_launch to know how many we're
// launching. This can only be an int, because the nvrtc tests use grep
// to figure out how many threads to launch.
cuda_thread_count = 2;
),
NV_IS_DEVICE, (
test();
)
);

return 0;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
//
// UNSUPPORTED: libcpp-has-no-threads
// UNSUPPORTED: pre-sm-90

// <cuda/barrier>

#include "arrive_tx.h"

int main(int, char**)
{
NV_DISPATCH_TARGET(
NV_IS_HOST, (
// Required by concurrent_agents_launch to know how many we're
// launching. This can only be an int, because the nvrtc tests use grep
// to figure out how many threads to launch.
cuda_thread_count = 32;
),
NV_IS_DEVICE, (
test();
)
);

return 0;
}
Loading
Loading