Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

P2322R6 accumulator types for reduce #509

Conversation

gevtushenko
Copy link
Collaborator

@gevtushenko gevtushenko commented Jun 13, 2022

This PR addresses the following issue. I also found that we use copy assignment operator on uninitialized memory, which might lead to issues for not primitive types.

⚠️ Breaking changes:

  1. AgentReduce:
    1. now has extra template parameter representing the accumulator type.
    2. ConsumeTile uses accumulator type for thread aggregate instead of output iterator value type.
    3. ConsumeTile returns accumulator type instead of output iterator value type.
  2. DeviceReduceKernel doesn't accept output iterator as a template parameter. Apart from that, it now accepts accumulator type.
  3. DeviceReduceSingleTileKernel now accepts accumulator type.
  4. DeviceSegmentedReduceKernel now accepts accumulator type.
  5. DeviceReducePolicy now accepts accumulator type instead of input iterator value type. It also doesn't accept output iterator value type now.
  6. DispatchReduce:
    1. now accepts accumulator type as last parameter.
    2. now accepts initializer type instead of output iterator value type.
    3. doesn't work with device extended lambdas now.
    4. Constructor now accepts init as initial type instead of output iterator value type.
  7. DispatchSegmentedReduce:
    1. accepts accumulator type as last parameter.
    2. accepts initializer type instead of output iterator value type.
  8. Thread operators now accept parameters using different types: Equality, Inequality, InequalityWrapper, Sum, Difference, Division, Max, ArgMax, Min, ArgMin.
  9. ThreadReduce now accepts accumulator type and use different type for prefix
  10. Default accumulator type is now selected by introspecting return type of the reduce operator.

@gevtushenko gevtushenko requested a review from alliepiper June 13, 2022 12:40
@gevtushenko gevtushenko added testing: gpuCI in progress Started gpuCI testing. type: bug: functional Does not work as intended. labels Jun 13, 2022
@gevtushenko gevtushenko added this to the 2.0.0 milestone Jun 13, 2022
@gevtushenko
Copy link
Collaborator Author

This PR should be merged after libcu++ supports device lambdas in invoke result: NVIDIA/libcudacxx#284

@gevtushenko gevtushenko added testing: gpuCI passed Passed gpuCI testing. and removed testing: gpuCI in progress Started gpuCI testing. labels Jun 13, 2022
@alliepiper alliepiper added the P1: should have Necessary, but not critical. label Jun 22, 2022
cub::detail::non_void_value_t<
OutputIteratorT,
cub::detail::value_t<InputIteratorT>>,
typename AccumT =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the release: breaking change label and put a description of these changes to the Dispatch interface in the PR description? That way I'll make sure to call these changes out in the release notes.

Same goes for all of the accumulator-type changes in behavior -- I check that label when I'm building relnotes from the list of PRs.

@gevtushenko gevtushenko added the release: breaking change Include in "Breaking Changes" section of release notes. label Jun 24, 2022
@alliepiper alliepiper self-assigned this Jul 25, 2022
Copy link
Collaborator

@alliepiper alliepiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need a couple of things before merging:

  • Rebase for debug_synchronous/DebugSyncStream/CDP changes
  • Add summary of breaking changes to PR description.

@alliepiper alliepiper removed their assignment Jul 25, 2022
@gevtushenko gevtushenko force-pushed the fix-main/github/reduce_intermediate_type branch from a064b12 to 921885f Compare July 28, 2022 09:10
Copy link
Collaborator

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I am yet qualified to really approve it but I can complain

cub/agent/agent_reduce.cuh Show resolved Hide resolved
cub/agent/agent_reduce.cuh Show resolved Hide resolved
cub/agent/agent_reduce.cuh Show resolved Hide resolved
cub/agent/agent_reduce.cuh Show resolved Hide resolved
cub/device/dispatch/dispatch_reduce.cuh Show resolved Hide resolved
cub/device/dispatch/dispatch_reduce.cuh Show resolved Hide resolved
cub/device/dispatch/dispatch_reduce.cuh Show resolved Hide resolved
cub/device/dispatch/dispatch_reduce.cuh Show resolved Hide resolved
cub/thread/thread_operators.cuh Show resolved Hide resolved
@gevtushenko
Copy link
Collaborator Author

⚠️ Breaking changes:

  1. AgentReduce:
    1. now has extra template parameter representing the accumulator type.
    2. ConsumeTile uses accumulator type for thread aggregate instead of output iterator value type.
    3. ConsumeTile returns accumulator type instead of output iterator value type.
  2. DeviceReduceKernel doesn't accept output iterator as a template parameter. Apart from that, it now accepts accumulator type.
  3. DeviceReduceSingleTileKernel now accepts accumulator type.
  4. DeviceSegmentedReduceKernel now accepts accumulator type.
  5. DeviceReducePolicy now accepts accumulator type instead of input iterator value type. It also doesn't accept output iterator value type now.
  6. DispatchReduce:
    1. now accepts accumulator type as last parameter.
    2. now accepts initializer type instead of output iterator value type.
    3. doesn't work with device extended lambdas now.
    4. Constructor now accepts init as initial type instead of output iterator value type.
  7. DispatchSegmentedReduce:
    1. accepts accumulator type as last parameter.
    2. accepts initializer type instead of output iterator value type.
  8. Thread operators now accept parameters using different types: Equality, Inequality, InequalityWrapper, Sum, Difference, Division, Max, ArgMax, Min, ArgMin.
  9. ThreadReduce now accepts accumulator type and use different type for prefix
  10. Default accumulator type is now selected by introspecting return type of the reduce operator.

@gevtushenko gevtushenko force-pushed the fix-main/github/reduce_intermediate_type branch from a87316b to 1e98f6f Compare August 1, 2022 17:00
@gevtushenko gevtushenko merged commit 81a96c9 into NVIDIA:main Aug 1, 2022
@alliepiper
Copy link
Collaborator

Breaking changes

Thanks for writing this up! Can you edit the PR description/first comment and add this to it? I'm less likely to overlook it that way :)

@gevtushenko
Copy link
Collaborator Author

Breaking changes

Thanks for writing this up! Can you edit the PR description/first comment and add this to it? I'm less likely to overlook it that way :)

Sorry, didn't think about this aspect 😄 I'll definitely update the description.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P1: should have Necessary, but not critical. release: breaking change Include in "Breaking Changes" section of release notes. testing: gpuCI passed Passed gpuCI testing. type: bug: functional Does not work as intended.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants