Break `debug_synchronous` usage #525

gevtushenko · 2022-07-08T10:39:28Z

Currently, each device-scope algorithm in CUB has a debug_synchronous parameter which is defaulted to true. Being a runtime variable, it leads to a few issues:

In case of -rdc compilation, there's a higher memory footprint (issue).
_CubLog has to inject code into binary, which might affect performance
Thrust uses a THRUST_DEBUG_SYNC macro instead of a runtime variable. It'd be better to have consistency between libraries.

Since 2.0 is a breaking release, I suggest we break debug_synchronous support. Deprecation would lead to a code bloat, because we'd have to introduce new template parameters into dispatch/agent layers and still maintain the new scheme (see below). In general, the option should only be used for CUB debugging, so I don't think that it's a big issue. To inform users about the breaking change I suggest we leave the API when possible, but use a static assert to tell about the replacing macro:

cub::Device::API(...) { dispatch(...); }
cub::Device::API(..., bool debug_synchronous) { static_assert(false, "Use new approach"); }

To replace the parameter, I suggest we introduce a CUB_DEBUG_LEVEL macro with various convenience aliases. When set to 1 it'll lead to the same behaviour we have in the case of debug_synchronous=true: device synchronization after each kernel invocation, logging of kernel launches. When set to 2 it'll lead to precondition checks. For instance, we can check that pointers are device-accessible, that segments in segmented sort don't overlap etc. More importantly, this approach would allow us to embed precondition checks into kernels with no overhead.

The convenience macros might be:

#ifdef CUB_DEBUG_SYNC
#define CUB_DEBUG_LEVEL 1
#endif

#ifdef CUB_DEBUG_ASSERTIONS
#define CUB_DEBUG_LEVEL 2
#endif

The text was updated successfully, but these errors were encountered:

alliepiper · 2022-07-08T17:34:51Z

I strongly support this. It would address several issues we've had with these flags, especially with the CDP changes introduced in CTK 11.6. Having a CI build running with the extra assertions would also add a significant boost to our testing!

It may be useful to introduce a debug level that just prints the debug info without syncing, something like:

// Extra logging, no syncs
#ifdef CUB_DEBUG_LOG
#define CUB_DEBUG_LEVEL 1
#endif

// Logging + syncs
#ifdef CUB_DEBUG_SYNC
#define CUB_DEBUG_LEVEL 2
#endif

// Logging + syncs + assertions
#ifdef CUB_DEBUG_ASSERTIONS
#define CUB_DEBUG_LEVEL 3
#endif

Re: static_asserting in the Device... APIs, this seems reasonable, but I'd like to give users an escape hatch in case they aren't able to update their code immediately for CUB 2.0.

I'm thinking something like this:

cub::Device::API(...) { dispatch(...); }

cub::Device::API(..., bool debug_synchronous)
{
#ifndef CUB_IGNORE_DEPRECATED_RUNTIME_DEBUG_SYNC
  static_assert(cub::detail::dependent_false<T>::value, "Use new approach"); 
#else
  cub::Device::API(...);
#endif
}

(grep Thrust for dependent_false for an implementation of that helper -- it'll be needed to make the static asserts fire only when the relevant APIs are used)

alliepiper · 2022-07-08T18:46:42Z

Another question this opens up is how we expose this in Thrust. We currently have the THRUST_DEBUG_SYNC macro, would we add the others, or drop the Thrust macro and just use the CUB ones for simplicity's sake? Or just keep the existing Thrust macro for legacy's sake, but don't expose the new ones?

On one hand, the logging/syncing stuff is completely in the realm of CUB. These describe kernel launches, and we want all CUDA kernel-related code to (eventually) live in CUB.

But the assertions may bleed over into Thrust -- for instance, making the ASSERTIONS debug level enable checked iterators in Thrust could be useful.

The documentation for these macros should explicitly specify that they must be used consistently across all translation units and linked libraries that used CUB, otherwise results are undefined. This should be reasonable as these features shouldn't be needed in released production code.

Also, we may want to make the CUB_DEBUG_LEVEL an internal macro and use symbolic constants, in case we need to change the numeric values to add levels, etc.

Adding some convenience macros to simplify checks and make them more robust against typos would be useful as well:

/*
 * `CUB_DETAIL_DEBUG_LEVEL_*`: Implementation details, internal use only:
 */

#define CUB_DETAIL_DEBUG_LEVEL_NONE 0
#define CUB_DETAIL_DEBUG_LEVEL_LOG 1
#define CUB_DETAIL_DEBUG_LEVEL_SYNC 2
#define CUB_DETAIL_DEBUG_LEVEL_ASSERTIONS 3

/*
 * `CUB_DEBUG_*`: User interfaces:
 */

// Extra logging, no syncs
#ifdef CUB_DEBUG_LOG
#define CUB_DETAIL_DEBUG_LEVEL CUB_DETAIL_DEBUG_LEVEL_LOG
#endif

// Logging + syncs
#ifdef CUB_DEBUG_SYNC
#define CUB_DETAIL_DEBUG_LEVEL CUB_DETAIL_DEBUG_LEVEL_SYNC
#endif

// Logging + syncs + assertions
#ifdef CUB_DEBUG_ASSERTIONS
#define CUB_DETAIL_DEBUG_LEVEL CUB_DETAIL_DEBUG_LEVEL_ASSERTIONS
#endif

// Default case, no extra debugging:
#ifndef CUB_DETAIL_DEBUG_LEVEL
#define CUB_DETAIL_DEBUG_LEVEL CUB_DETAIL_DEBUG_LEVEL_NONE
#endif

/*
 * `CUB_DETAIL_DEBUG_ENABLE_*`:
 * Internal implementation details, used for testing enabled debug features:
 */

#if CUB_DETAIL_DEBUG_LEVEL >= CUB_DETAIL_DEBUG_LEVEL_LOG
#define CUB_DETAIL_DEBUG_ENABLE_LOG
#endif

#if CUB_DETAIL_DEBUG_LEVEL >= CUB_DETAIL_DEBUG_LEVEL_SYNC
#define CUB_DETAIL_DEBUG_ENABLE_SYNC
#endif

#if CUB_DETAIL_DEBUG_LEVEL >= CUB_DETAIL_DEBUG_LEVEL_ASSERTIONS
#define CUB_DETAIL_DEBUG_ENABLE_ASSERTIONS
#endif

/*
 * Usage pseudocode example:
 */

// Usage:
void some_algo()
{
#ifdef CUB_DETAIL_DEBUG_ENABLE_ASSERTIONS
  validate_inputs();
#endif

#ifdef CUB_DETAIL_DEBUG_ENABLE_LOG
  print_log();
#endif

  launch_kernel();

#ifdef CUB_DETAIL_DEBUG_ENABLE_SYNC
  sync();
#endif
}

gevtushenko · 2022-07-08T20:47:31Z

separation of log and sync - this is a great idea, I agree
CUB_IGNORE_DEPRECATED_RUNTIME_DEBUG_SYNC also a good catch, if someone is using a third-party header only library, the'll be able to affect the error
THRUST_DEBUG_SYNC - I'd prefer to keep THRUST_DEBUG_SYNC and define a proper CUB macro based on it. If thrust assertions has to be checked, we have to introduce THRUST_DEBUG_ASSERTIONS and check them at thrust side.
symbolic constants - love the *_ENABLE part, will do this way

gevtushenko · 2022-07-11T16:46:43Z

A few details that arose:

We'd like to introduce CUB_DEBUG_DEVICE_ASSERTIONS in compliment to CUB_DEBUG_{HOST_?}ASSERTIONS to distinguish between precondition checks on the host and device.
We'd like to select a default for CUB_DEBUG_ASSERTIONS based on NDEBUG but have a way to opt it out
We'll only provide a way to opt in CUB_DEBUG_DEVICE_ASSERTIONS. Device-side assertions are off by default.

gevtushenko self-assigned this Jul 8, 2022

alliepiper added this to the 2.0.0 milestone Jul 8, 2022

alliepiper added type: enhancement New feature or request. P0: must have Absolutely necessary. Critical issue, major blocker, etc. release: breaking change Include in "Breaking Changes" section of release notes. labels Jul 8, 2022

alliepiper mentioned this issue Jul 8, 2022

Fix DebugSyncStream #523

Closed

This was referenced Jul 15, 2022

Deprecate debug synchronous #527

Merged

Fix main/GitHub/deprecate debug synchronous NVIDIA/thrust#1741

Merged

gevtushenko linked a pull request Jul 15, 2022 that will close this issue

Deprecate debug synchronous #527

Merged

gevtushenko closed this as completed in #527 Jul 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Break `debug_synchronous` usage #525

Break `debug_synchronous` usage #525

gevtushenko commented Jul 8, 2022

alliepiper commented Jul 8, 2022 •

edited

Loading

alliepiper commented Jul 8, 2022

gevtushenko commented Jul 8, 2022

gevtushenko commented Jul 11, 2022

Break debug_synchronous usage #525

Break debug_synchronous usage #525

Comments

gevtushenko commented Jul 8, 2022

alliepiper commented Jul 8, 2022 • edited Loading

alliepiper commented Jul 8, 2022

gevtushenko commented Jul 8, 2022

gevtushenko commented Jul 11, 2022

Break `debug_synchronous` usage #525

Break `debug_synchronous` usage #525

alliepiper commented Jul 8, 2022 •

edited

Loading