Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Fix block radix rank for blocks with non multiple of 32 threads #564

Merged

Conversation

gevtushenko
Copy link
Collaborator

This PR addresses the following issue by masking inactive threads in match any. The code generation hasn't changed for thread blocks with (BlockThreads % 32) = 0.

@gevtushenko gevtushenko added this to the 2.1.0 milestone Aug 31, 2022
@gevtushenko gevtushenko added the type: bug: functional Does not work as intended. label Aug 31, 2022
gevtushenko added a commit to gevtushenko/thrust that referenced this pull request Aug 31, 2022
@gevtushenko gevtushenko added the testing: gpuCI in progress Started gpuCI testing. label Aug 31, 2022
Copy link
Collaborator

@alliepiper alliepiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Some suggestions inline.

cub/block/block_radix_rank.cuh Show resolved Hide resolved
cub/block/block_radix_rank.cuh Outdated Show resolved Hide resolved
cub/util_ptx.cuh Show resolved Hide resolved
RankAlgorithm == cub::RadixRankAlgorithm::RADIX_RANK_MATCH_EARLY_COUNTS_ANY ||
RankAlgorithm == cub::RadixRankAlgorithm::RADIX_RANK_MATCH_EARLY_COUNTS_ATOMIC_OR;

if (uses_warp_striped_arrangement)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUB_IF_CONSTEXPR

(in <cub/detail/cpp_compatibility.cuh>)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a test that's probably optimized by nvcc in any way, I don't see a reason to deteriorate readability. Is there anything I'm missing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge the PR. If there's a reason to change this, please let me know and I'll update the code in a follow up PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some compilers will issue a warning when a constant conditional is evaluated in a regular if (...). We should be using the if constexpr macros in public headers for these cases. Since this is just a test, we can fix it later if it becomes an issue.

cub::BFEDigitExtractor<Key> extractor(0, RadixBits);
block_radix_rank(temp_storage).RankKeys(keys, ranks, extractor);

if (uses_warp_striped_arrangement)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

test/test_block_radix_rank.cu Show resolved Hide resolved
test/test_block_radix_rank.cu Show resolved Hide resolved
@gevtushenko gevtushenko force-pushed the fix-main/github/block_radix_rank branch from 4735117 to 1b19a77 Compare September 1, 2022 13:51
gevtushenko added a commit to gevtushenko/thrust that referenced this pull request Sep 1, 2022
@gevtushenko gevtushenko merged commit eee0ca9 into NVIDIA:main Sep 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
testing: gpuCI in progress Started gpuCI testing. type: bug: functional Does not work as intended.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

BlockRadixRankMatch produces invalid results when warp size does not divide block size
3 participants