Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Update CDP support macros for if-target compatibility #486

Merged
merged 5 commits into from
Jun 29, 2022

Conversation

alliepiper
Copy link
Collaborator

@alliepiper alliepiper commented May 18, 2022

Goes with NVIDIA/thrust#1661. Updates the CDP/CUDART macros to be consistent across host/device passes and thus compatible with NV_IF_TARGET.

All device algorithm tests with existing CDP tests have been updated to exercise this functionality. Some device algorithm tests do not have CDP tests and will need to be updated in a future PR.

Pre-written Release Notes

Breaking Changes

  • Update CDP support macros for if-target compatibility #486: CUB’s CUDA Runtime support macros have been updated to support NV_IF_TARGET. They are now defined consistently across all host/device compilation passes. This should not affect most usages of these macros, but may require changes for some edge cases.
    • CUB_RUNTIME_FUNCTION: Execution space annotations for functions that invoke CUDA Runtime APIs.
      • Old behavior:
        • RDC enabled: Defined to __host__ __device__
        • RDC not enabled:
          • NVCC host pass: Defined to __host__ __device__
          • NVCC device pass: Defined to __host__
      • New behavior:
        • RDC enabled: Defined to __host__ __device__
        • RDC not enabled: Defined to __host__
    • CUB_RUNTIME_ENABLED: No change in behavior, but no longer used in CUB. Provided for legacy support only. Legacy behavior:
      • RDC enabled: Macro is defined.
      • RDC not enabled:
        • NVCC host pass: Macro is defined.
        • NVCC device pass: Macro is not defined.
    • CUB_RDC_ENABLED: New macro, may be combined with NV_IF_TARGET to replace most usages of CUB_RUNTIME_ENABLED. Behavior:
      • RDC enabled: Macro is defined.
      • RDC not enabled: Macro is not defined.

Other Enhancements

@alliepiper alliepiper added the P0: must have Absolutely necessary. Critical issue, major blocker, etc. label May 18, 2022
@alliepiper alliepiper added this to the 2.0.0 milestone May 18, 2022
@alliepiper alliepiper marked this pull request as draft May 18, 2022 16:34
@alliepiper alliepiper added the release: breaking change Include in "Breaking Changes" section of release notes. label May 18, 2022
@alliepiper alliepiper changed the title WIP if-target CDP refactor Update CDP support macros for if-target compatibility May 19, 2022
@alliepiper alliepiper requested a review from gevtushenko May 19, 2022 14:58
@alliepiper alliepiper marked this pull request as ready for review May 19, 2022 14:58
@alliepiper alliepiper added the release: notes PR description contains pre-written release notes. label May 19, 2022
Copy link
Collaborator

@gevtushenko gevtushenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a significant improvement and a huge work. Thank you for your efforts! There are but a few minor comments below.

cub/device/dispatch/dispatch_segmented_sort.cuh Outdated Show resolved Hide resolved
test/test_device_radix_sort.cu Outdated Show resolved Hide resolved
test/test_device_reduce.cu Outdated Show resolved Hide resolved
The definition of CUB_RUNTIME_FUNCTION is modified to be
consistently defined across all NVCC compilation passes. Rather
than depending on __CUDA_ARCH__, its definition now only depends
on whether or not RDC is enabled.

A new CUB_RDC_ENABLED macro has been added as a replacement for
CUB_RUNTIME_ENABLED. CUB_RUNTIME_ENABLED also depends on the
definition of __CUDA_ARCH__ and should no longer be used
in CUB code. It is provided for legacy purposes only. Usages should
be replaced by NV_IF_TARGET and the new CUB_RDC_ENABLED macro.

To summarize the differences between the old and new macros:

- `CUB_RUNTIME_FUNCTION`
  - Old:
    - RDC enabled: Defined to `__host__ __device__`.
    - RDC not enabled:
      - NVCC host pass: Defined to `__host__ __device__`.
      - NVCC device pass: Defined to `__host__`.
  - New:
    - RDC enabled: Defined to `__host__ __device__`.
    - RDC not enabled: Defined to `__host__`.
- `CUB_*_ENABLED`
  - `RUNTIME` (old)
    - RDC enabled: Macro is defined.
    - RDC not enabled:
      - NVCC host pass: Macro is defined.
      - NVCC device pass: Macro is not defined.
  - `RDC` (new)
    - RDC enabled: Macro is defined.
    - RDC not enabled: Macro is not defined.

The most common pattern used with the old macros looked like this:

```
CUB_RUNTIME_FUNCTION
cudaError_t SomeFunction()
{
  return cudaErrorNotSupported;
  // Actual implementation here
  return cudaSuccess;
}
```

This pattern can now be simplified to:

```
CUB_RUNTIME_FUNCTION
cudaError_t SomeFunction()
{
  // Actual implementation here
  return cudaSuccess;
}
```
Some test already had some support for CDP testing, this commit uses the
%PARAM% system to enable these.

The radix sort tests saw some addition changes that remove some
excessive testing. More work could be done here.
`SyncStream` should continue to be used for required synchronizations.

`DebugSyncStream` should only be used for sync that are non-essential
and only used to handle the `debug_synchronous` flag of the device
algorithms.
@alliepiper alliepiper merged commit a634b91 into NVIDIA:main Jun 29, 2022
@alliepiper alliepiper deleted the if_target_cdp branch June 29, 2022 16:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P0: must have Absolutely necessary. Critical issue, major blocker, etc. release: breaking change Include in "Breaking Changes" section of release notes. release: notes PR description contains pre-written release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants