P2322R6 accumulator types for reduce #509
This PR should be merged after libcu++ supports device lambdas in `invoke_result`: NVIDIA/libcudacxx#284
```cpp
cub::detail::non_void_value_t<
  OutputIteratorT,
  cub::detail::value_t<InputIteratorT>>,
typename AccumT =
```
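The truncated fragment above computes a default type from the output iterator's value type, presumably falling back to the input iterator's value type when the former is `void` (as the name `non_void_value_t` suggests). A minimal host-side sketch of that fallback, assuming standard `std::iterator_traits` behavior; `value_t`, `non_void_value_t`, and `default_init_t` below are hypothetical re-implementations for illustration, not CUB's actual definitions:

```cpp
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical re-implementations for illustration; not CUB's actual definitions.
// value_t<It>: the value type an iterator refers to.
template <typename It>
using value_t = typename std::iterator_traits<It>::value_type;

// non_void_value_t<It, Fallback>: It's value type, or Fallback when that is void,
// as it is for output-only iterators such as std::back_insert_iterator.
template <typename It, typename Fallback>
using non_void_value_t =
    std::conditional_t<std::is_void<value_t<It>>::value, Fallback, value_t<It>>;

// A default built like the fragment: output value type, else input value type.
template <typename InputIt, typename OutputIt>
using default_init_t = non_void_value_t<OutputIt, value_t<InputIt>>;

// Usage: a double* output keeps double; a back_insert_iterator falls back to float.
static_assert(std::is_same<default_init_t<const float*, double*>, double>::value, "");
static_assert(std::is_same<default_init_t<const float*,
                               std::back_insert_iterator<std::vector<int>>>,
                           float>::value, "");
```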
Can you add the `release: breaking change` label and put a description of these changes to the Dispatch interface in the PR description? That way I'll make sure to call these changes out in the release notes.
The same goes for all of the accumulator-type changes in behavior: I check that label when I'm building release notes from the list of PRs.
This will need a couple of things before merging:
- Rebase for the `debug_synchronous`/`DebugSyncStream`/CDP changes.
- Add a summary of breaking changes to the PR description.
I am not sure I am qualified to really approve it yet, but I can complain.
Thanks for writing this up! Can you edit the PR description/first comment and add this to it? I'm less likely to overlook it that way :)
Sorry, I didn't think about this aspect 😄 I'll definitely update the description.
This PR addresses the following issue. I also found that we use the copy assignment operator on uninitialized memory, which might lead to issues for non-primitive types.
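To illustrate the uninitialized-memory problem: assigning into raw storage calls `operator=` on an object whose lifetime never began, so for a type like `std::string` it may read or free a garbage internal pointer, while for primitive types it is usually harmless in practice. A minimal host-side sketch of the usual fix, constructing the object in place with placement new; `UninitializedSlot`, `destroy_in_place`, and `copy_string_into_slot` are hypothetical helpers, not the code this PR changes:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical helper: suitably aligned raw bytes for one T, with no T constructed yet
// (similar in spirit to an uninitialized shared-memory or register slot).
template <typename T>
struct UninitializedSlot {
  alignas(T) unsigned char bytes[sizeof(T)];

  // WRONG for non-trivial T (shown for contrast, never called):
  //   *reinterpret_cast<T*>(bytes) = value;
  // operator= assumes a live T already exists in `bytes`.

  // Correct: begin the object's lifetime by copy-constructing it in place,
  // and hand back the pointer returned by placement new.
  T* construct(const T& value) { return ::new (static_cast<void*>(bytes)) T(value); }
};

// Hypothetical helper: ends the lifetime of an object created by construct().
template <typename T>
void destroy_in_place(T* p) { p->~T(); }

// Usage: copy a heap-allocating std::string into raw storage, use it, then destroy it.
inline std::size_t copy_string_into_slot() {
  const std::string source(100, 'x');
  UninitializedSlot<std::string> slot;
  std::string* copy = slot.construct(source);
  assert(*copy == source);  // a fully constructed, independent copy
  const std::size_t n = copy->size();
  destroy_in_place(copy);
  return n;
}
```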
- `AgentReduce`: `ConsumeTile` uses the accumulator type for the thread aggregate instead of the output iterator value type.
- `ConsumeTile` returns the accumulator type instead of the output iterator value type.
- `DeviceReduceKernel` no longer accepts the output iterator as a template parameter. Apart from that, it now accepts the accumulator type.
- `DeviceReduceSingleTileKernel` now accepts the accumulator type.
- `DeviceSegmentedReduceKernel` now accepts the accumulator type.
- `DeviceReducePolicy` now accepts the accumulator type instead of the input iterator value type. It also no longer accepts the output iterator value type.
- `DispatchReduce`: `init` as initial type instead of output iterator value type.
- `DispatchSegmentedReduce`: `Equality`, `Inequality`, `InequalityWrapper`, `Sum`, `Difference`, `Division`, `Max`, `ArgMax`, `Min`, `ArgMin`.
- `ThreadReduce` now accepts the accumulator type and uses a different type for the prefix.