-
Notifications
You must be signed in to change notification settings - Fork 448
Break debug_synchronous
usage
#525
Comments
I strongly support this. It would address several issues we've had with these flags, especially with the CDP changes introduced in CTK 11.6. Having a CI build running with the extra assertions would also add a significant boost to our testing! It may be useful to introduce a debug level that just prints the debug info without syncing, something like:
Re: I'm thinking something like this:
(grep Thrust for |
Another question this opens up is how we expose this in Thrust. We currently have the On one hand, the logging/syncing stuff is completely in the realm of CUB. These describe kernel launches, and we want all CUDA kernel-related code to (eventually) live in CUB. But the assertions may bleed over into Thrust -- for instance, making the The documentation for these macros should explicitly specify that they must be used consistently across all translation units and linked libraries that used CUB, otherwise results are undefined. This should be reasonable as these features shouldn't be needed in released production code. Also, we may want to make the Adding some convenience macros to simplify checks and make them more robust against typos would be useful as well:
|
|
A few details that arose:
|
Currently, each device-scope algorithm in CUB has a
debug_synchronous
parameter which is defaulted totrue
. Being a runtime variable, it leads to a few issues:-rdc
compilation, there's a higher memory footprint (issue)._CubLog
has to inject code into binary, which might affect performanceTHRUST_DEBUG_SYNC
macro instead of a runtime variable. It'd be better to have consistency between libraries.Since 2.0 is a breaking release, I suggest we break
debug_synchronous
support. Deprecation would lead to a code bloat, because we'd have to introduce new template parameters into dispatch/agent layers and still maintain the new scheme (see below). In general, the option should only be used for CUB debugging, so I don't think that it's a big issue. To inform users about the breaking change I suggest we leave the API when possible, but use a static assert to tell about the replacing macro:To replace the parameter, I suggest we introduce a
CUB_DEBUG_LEVEL
macro with various convenience aliases. When set to1
it'll lead to the same behaviour we have in the case ofdebug_synchronous=true
: device synchronization after each kernel invocation, logging of kernel launches. When set to2
it'll lead to precondition checks. For instance, we can check that pointers are device-accessible, that segments in segmented sort don't overlap etc. More importantly, this approach would allow us to embed precondition checks into kernels with no overhead.The convenience macros might be:
The text was updated successfully, but these errors were encountered: