…ions. This partially reverts 7ff227a and fixes NVIDIA#1250. I'm not sure why changing these broke the tests, but since these usages are just testing details that are being refactored by NVIDIA#1251, let's just revert the change for now. The test failures only happened on GCC; MSVC was fine with both versions of these functions, so it may be a compiler issue.
45c8380 to 87ae049
…ions. This partially reverts 7ff227a and fixes #1250. I'm not sure why changing these broke the tests, but since these usages are just testing details that are being refactored by #1251, let's just revert the change for now. The test failures only happened on GCC; MSVC was fine with both versions of these functions, so it may be a compiler issue.
d6ae713 to 7caa692
ab507ce to 9e7dfe2
9e7dfe2 to c953231
c953231 to c236ed8
The "large indices" tests are currently failing because temporary memory allocation fails for the larger problem sizes. This doesn't happen for the synchronous scan (which uses a custom implementation instead of just calling
c236ed8 to 3ff9d84
This is ready for review, but will need NVIDIA/cub#213 merged first so we can test without hitting timeouts.
The "files changed" code volume is quite large. Is there an algorithm and design summary, so that aspect can be reviewed separately from the code details?
They're mostly changes to Thrust's internal testing infrastructure. The actual API changes here are very small and follow the existing conventions and design plan for Thrust's existing async algorithms. There's a design doc from a couple years ago floating around for the async design. If you need to see it for some reason I can dig it up.
We discussed this in a call, but to update the PR review:
The input can be initialized on the host and copied to the device if a host-only strategy is needed. Since the input is more frequently accessed on the device (4 device invocations vs 1 host invocation), this reduces the amount of transfer overhead in the common case that inputs can be generated on the device.
Me too, but I'd also like to not require copyability for the generic
Can do. I'll see about adapting it to run on the device, too -- we have device-side rng facilities, so hopefully it'll be trivial.
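For readers following along, here is a minimal sketch of the host-initialize-then-copy strategy discussed above. It is not the PR's test code: `std::vector` stands in for both `thrust::host_vector` and `thrust::device_vector` so the example builds without CUDA, and `make_host_input`, `make_inputs`, `scan_inputs`, and `inclusive_sum` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// ASSUMPTION: std::vector is a stand-in for thrust::host_vector and
// thrust::device_vector, so this sketch compiles without CUDA.
template <class T> using host_vector = std::vector<T>;
template <class T> using device_vector = std::vector<T>;

// Hypothetical host-only generator: fills the input with small random ints.
inline host_vector<int> make_host_input(std::size_t n, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> dist(0, 9);
  host_vector<int> v(n);
  for (auto& x : v) x = dist(rng);
  return v;
}

// Hypothetical fixture holding both copies of the same input.
struct scan_inputs {
  host_vector<int> host;     // read once, for the host reference result
  device_vector<int> device; // reused by every device-side invocation
};

// Generate on the host, then perform a single host-to-device copy.
inline scan_inputs make_inputs(std::size_t n, unsigned seed) {
  scan_inputs in;
  in.host = make_host_input(n, seed);
  in.device = device_vector<int>(in.host.begin(), in.host.end());
  return in;
}

// Stand-in for a scan under test: an inclusive prefix sum.
inline std::vector<int> inclusive_sum(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  std::partial_sum(in.begin(), in.end(), out.begin());
  return out;
}
```

The point of the layout is that the copy happens once in `make_inputs`, after which each device-side invocation reads `device` directly and only the reference computation touches `host`.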
bf3c701 to 56c0eec
Addressed feedback. The
DVS CL: 29371528
The latest round of DVS runs shows the following issues on Clang:
56c0eec to a071c17
run tests
- iterator_value_t
- iterator_pointer_t
- iterator_reference_t
- iterator_difference_t
- iterator_system_t
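As an illustration of what `_t` aliases like these typically look like, here is a minimal sketch built on `std::iterator_traits`. This is an assumption made for the example: the aliases in this PR are presumably defined in terms of Thrust's own iterator traits, and the `sketch` namespace is hypothetical. `iterator_system_t` is left out because the standard library has no equivalent notion of an iterator's system.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not Thrust's

// Each alias saves callers from writing `typename ...::type` by hand.
template <class Iterator>
using iterator_value_t = typename std::iterator_traits<Iterator>::value_type;

template <class Iterator>
using iterator_pointer_t = typename std::iterator_traits<Iterator>::pointer;

template <class Iterator>
using iterator_reference_t = typename std::iterator_traits<Iterator>::reference;

template <class Iterator>
using iterator_difference_t =
    typename std::iterator_traits<Iterator>::difference_type;

}  // namespace sketch

// Compile-time checks against a raw pointer iterator.
static_assert(std::is_same<sketch::iterator_value_t<const int*>, int>::value,
              "value_type drops cv-qualifiers");
static_assert(std::is_same<sketch::iterator_reference_t<int*>, int&>::value,
              "reference of int* is int&");
```

With aliases like these, generic code can write `iterator_value_t<InputIt>` where it would otherwise need the longer `typename ...::value_type` spelling.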
a071c17 to 888512b
run tests
888512b to e1b3caa
run tests
DVS CL: 29473661
Prerequisite: NVIDIA/cub#210