Skip to content
This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Add async scan algorithms #1251

Merged
merged 2 commits into from
Jan 21, 2021
Merged

Conversation

alliepiper
Copy link
Collaborator

@alliepiper alliepiper commented Aug 6, 2020

Prerequisite: NVIDIA/cub#210

@alliepiper alliepiper marked this pull request as draft August 6, 2020 01:24
alliepiper added a commit to alliepiper/thrust that referenced this pull request Aug 6, 2020
…ions.

This partially reverts 7ff227a and
fixes NVIDIA#1250.

I'm not sure why changing these broke the tests, but since these
usages are just testing details that are being refactored by NVIDIA#1251
let's just revert the change for now.

The test failures were only happening on GCC, MSVC was fine with both
versions of these functions, so it may be a compiler issue.
@alliepiper alliepiper force-pushed the feature/async_scan_algos branch from 45c8380 to 87ae049 Compare August 7, 2020 01:12
@brycelelbach brycelelbach added this to the 1.10.0 milestone Aug 7, 2020
alliepiper added a commit that referenced this pull request Aug 12, 2020
…ions.

This partially reverts 7ff227a and
fixes #1250.

I'm not sure why changing these broke the tests, but since these
usages are just testing details that are being refactored by #1251
let's just revert the change for now.

The test failures were only happening on GCC, MSVC was fine with both
versions of these functions, so it may be a compiler issue.
@alliepiper alliepiper force-pushed the feature/async_scan_algos branch 2 times, most recently from d6ae713 to 7caa692 Compare August 27, 2020 20:58
@alliepiper alliepiper self-assigned this Sep 3, 2020
@alliepiper alliepiper force-pushed the feature/async_scan_algos branch 4 times, most recently from ab507ce to 9e7dfe2 Compare September 10, 2020 18:12
@brycelelbach brycelelbach modified the milestones: 1.10.0, 1.11.0 Sep 16, 2020
@alliepiper alliepiper force-pushed the feature/async_scan_algos branch from 9e7dfe2 to c953231 Compare October 1, 2020 23:04
@alliepiper alliepiper force-pushed the feature/async_scan_algos branch from c953231 to c236ed8 Compare October 2, 2020 21:26
@alliepiper alliepiper marked this pull request as ready for review October 5, 2020 18:49
@alliepiper
Copy link
Collaborator Author

The "large indices" tests are currently failing because temporary memory allocation fails for the larger problem sizes. This doesn't happen for the synchronous scan (which uses a custom implementation instead of just calling cub::DeviceScan). I'll pull those tests out before merging if I can't find a solution/workaround.

@alliepiper alliepiper removed their assignment Oct 5, 2020
@alliepiper alliepiper force-pushed the feature/async_scan_algos branch from c236ed8 to 3ff9d84 Compare October 14, 2020 20:57
@alliepiper alliepiper added the blocked Cannot make progress. label Oct 14, 2020
@alliepiper
Copy link
Collaborator Author

This is ready for review, but will need NVIDIA/cub#213 merged first so we can test without hitting timeouts.

@alliepiper alliepiper added testing: gpuCI in progress Started gpuCI testing. and removed testing: gpuCI in progress Started gpuCI testing. blocked Cannot make progress. labels Oct 19, 2020
@hcedwar
Copy link

hcedwar commented Nov 9, 2020

The "files changed" code volume is quite large. Where is an algorithm & design summary to review this aspect vs. code details?

@alliepiper
Copy link
Collaborator Author

They're mostly changes to Thrust's internal testing infrastructure. The actual API changes here are very small and follow the existing conventions and design plan for Thrust's existing async algorithms.

There's a design doc from a couple years ago floating around for the async design. If you need to see it for some reason I can dig it up.

@alliepiper alliepiper modified the milestones: 1.11.0, 1.12.0 Nov 11, 2020
@alliepiper
Copy link
Collaborator Author

We discussed this in a call, but to update the PR review:

  • I'd like this to support input initialization on the host because there are some initialization strategies that are host only, like unittest::random_integer.

The input can be initialized on the host and copied to the device if a host-only strategy is needed. Since the input is more frequently accessed on the device (4 device invocations vs 1 host invocation), this reduces the amount of transfer overhead in the common case that inputs can be generated on the device.

  • I'd prefer to only generate input once and then copy.

Me too, but I'd also like to not require copyability for the generic input_type concept. If this ends up being problematic in practice I'm happy to revisit this.

  • I'd like to retain the unittest::random_integer tests for async algorithms.

Can do. I'll see about adapting it to run on the device, too -- we have device-side rng facilities, so hopefully it'll be trivial.

@alliepiper alliepiper force-pushed the feature/async_scan_algos branch from bf3c701 to 56c0eec Compare November 20, 2020 22:01
@alliepiper
Copy link
Collaborator Author

Addressed feedback. The mixed_types tests are failing now, I'll need to investigate this. We also discussed shortening the namespace for some of the device-specific test mixins, I'll fix that when I take a look at the new test failures.

@alliepiper
Copy link
Collaborator Author

mixed_types test should still be failing, but running DVS to try to repro build timeouts.

DVS CL: 29371528

@alliepiper alliepiper removed the testing: gpuCI passed Passed gpuCI testing. label Nov 30, 2020
@alliepiper
Copy link
Collaborator Author

Latest round of DVS runs show the following issues on Clang:

thrust/system/cuda/detail/async/exclusive_scan.h:160:8: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
thrust/system/cuda/detail/async/inclusive_scan.h:158:8: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]

@alliepiper alliepiper force-pushed the feature/async_scan_algos branch from 56c0eec to a071c17 Compare December 8, 2020 23:03
@alliepiper
Copy link
Collaborator Author

run tests

1 similar comment
@brycelelbach
Copy link
Collaborator

run tests

- iterator_value_t
- iterator_pointer_t
- iterator_reference_t
- iterator_difference_t
- iterator_system_t
@alliepiper alliepiper force-pushed the feature/async_scan_algos branch from a071c17 to 888512b Compare January 6, 2021 00:18
@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper alliepiper force-pushed the feature/async_scan_algos branch from 888512b to e1b3caa Compare January 6, 2021 02:20
@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper
Copy link
Collaborator Author

DVS CL: 29473661

@alliepiper alliepiper added testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI in progress Started gpuCI testing. testing: gpuCI passed Passed gpuCI testing. testing: internal ci passed Passed internal NVIDIA CI (DVS). and removed testing: gpuCI in progress Started gpuCI testing. testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). labels Jan 6, 2021
@alliepiper alliepiper merged commit 0d51e20 into NVIDIA:main Jan 21, 2021
@alliepiper alliepiper deleted the feature/async_scan_algos branch January 21, 2021 20:45
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
testing: gpuCI passed Passed gpuCI testing. testing: internal ci passed Passed internal NVIDIA CI (DVS).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants