-
Notifications
You must be signed in to change notification settings - Fork 449
Fix block radix rank for blocks with non multiple of 32 threads #564
Fix block radix rank for blocks with non multiple of 32 threads #564
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Some suggestions inline.
RankAlgorithm == cub::RadixRankAlgorithm::RADIX_RANK_MATCH_EARLY_COUNTS_ANY || | ||
RankAlgorithm == cub::RadixRankAlgorithm::RADIX_RANK_MATCH_EARLY_COUNTS_ATOMIC_OR; | ||
|
||
if (uses_warp_striped_arrangement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CUB_IF_CONSTEXPR
(in <cub/detail/cpp_compatibility.cuh>
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's a test that's probably optimized by nvcc in any way, I don't see a reason to deteriorate readability. Is there anything I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to merge the PR. If there's a reason to change this, please let me know and I'll update the code in a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some compilers will issue a warning when a constant conditional is evaluated in a regular if (...)
. We should be using the if constexpr
macros in public headers for these cases. Since this is just a test, we can fix it later if it becomes an issue.
cub::BFEDigitExtractor<Key> extractor(0, RadixBits); | ||
block_radix_rank(temp_storage).RankKeys(keys, ranks, extractor); | ||
|
||
if (uses_warp_striped_arrangement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
4735117
to
1b19a77
Compare
This PR addresses the following issue by masking inactive threads in match any. The code generation hasn't changed for thread blocks with
(BlockThreads % 32) = 0
.