Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Break debug_synchronous usage #525

Closed
gevtushenko opened this issue Jul 8, 2022 · 4 comments · Fixed by #527
Closed

Break debug_synchronous usage #525

gevtushenko opened this issue Jul 8, 2022 · 4 comments · Fixed by #527
Assignees
Labels
P0: must have Absolutely necessary. Critical issue, major blocker, etc. release: breaking change Include in "Breaking Changes" section of release notes. type: enhancement New feature or request.
Milestone

Comments

@gevtushenko
Copy link
Collaborator

Currently, each device-scope algorithm in CUB has a debug_synchronous parameter which is defaulted to true. Being a runtime variable, it leads to a few issues:

  1. In case of -rdc compilation, there's a higher memory footprint (issue).
  2. _CubLog has to inject code into binary, which might affect performance
  3. Thrust uses a THRUST_DEBUG_SYNC macro instead of a runtime variable. It'd be better to have consistency between libraries.

Since 2.0 is a breaking release, I suggest we break debug_synchronous support. Deprecation would lead to a code bloat, because we'd have to introduce new template parameters into dispatch/agent layers and still maintain the new scheme (see below). In general, the option should only be used for CUB debugging, so I don't think that it's a big issue. To inform users about the breaking change I suggest we leave the API when possible, but use a static assert to tell about the replacing macro:

cub::Device::API(...) { dispatch(...); }
cub::Device::API(..., bool debug_synchronous) { static_assert(false, "Use new approach"); }

To replace the parameter, I suggest we introduce a CUB_DEBUG_LEVEL macro with various convenience aliases. When set to 1 it'll lead to the same behaviour we have in the case of debug_synchronous=true: device synchronization after each kernel invocation, logging of kernel launches. When set to 2 it'll lead to precondition checks. For instance, we can check that pointers are device-accessible, that segments in segmented sort don't overlap etc. More importantly, this approach would allow us to embed precondition checks into kernels with no overhead.

The convenience macros might be:

#ifdef CUB_DEBUG_SYNC
#define CUB_DEBUG_LEVEL 1
#endif

#ifdef CUB_DEBUG_ASSERTIONS
#define CUB_DEBUG_LEVEL 2
#endif
@gevtushenko gevtushenko self-assigned this Jul 8, 2022
@alliepiper
Copy link
Collaborator

alliepiper commented Jul 8, 2022

I strongly support this. It would address several issues we've had with these flags, especially with the CDP changes introduced in CTK 11.6. Having a CI build running with the extra assertions would also add a significant boost to our testing!

It may be useful to introduce a debug level that just prints the debug info without syncing, something like:

// Extra logging, no syncs
#ifdef CUB_DEBUG_LOG
#define CUB_DEBUG_LEVEL 1
#endif

// Logging + syncs
#ifdef CUB_DEBUG_SYNC
#define CUB_DEBUG_LEVEL 2
#endif

// Logging + syncs + assertions
#ifdef CUB_DEBUG_ASSERTIONS
#define CUB_DEBUG_LEVEL 3
#endif

Re: static_asserting in the Device... APIs, this seems reasonable, but I'd like to give users an escape hatch in case they aren't able to update their code immediately for CUB 2.0.

I'm thinking something like this:

cub::Device::API(...) { dispatch(...); }

cub::Device::API(..., bool debug_synchronous)
{
#ifndef CUB_IGNORE_DEPRECATED_RUNTIME_DEBUG_SYNC
  static_assert(cub::detail::dependent_false<T>::value, "Use new approach"); 
#else
  cub::Device::API(...);
#endif
}

(grep Thrust for dependent_false for an implementation of that helper -- it'll be needed to make the static asserts fire only when the relevant APIs are used)

@alliepiper alliepiper added this to the 2.0.0 milestone Jul 8, 2022
@alliepiper alliepiper added type: enhancement New feature or request. P0: must have Absolutely necessary. Critical issue, major blocker, etc. release: breaking change Include in "Breaking Changes" section of release notes. labels Jul 8, 2022
@alliepiper
Copy link
Collaborator

Another question this opens up is how we expose this in Thrust. We currently have the THRUST_DEBUG_SYNC macro, would we add the others, or drop the Thrust macro and just use the CUB ones for simplicity's sake? Or just keep the existing Thrust macro for legacy's sake, but don't expose the new ones?

On one hand, the logging/syncing stuff is completely in the realm of CUB. These describe kernel launches, and we want all CUDA kernel-related code to (eventually) live in CUB.

But the assertions may bleed over into Thrust -- for instance, making the ASSERTIONS debug level enable checked iterators in Thrust could be useful.


The documentation for these macros should explicitly specify that they must be used consistently across all translation units and linked libraries that used CUB, otherwise results are undefined. This should be reasonable as these features shouldn't be needed in released production code.


Also, we may want to make the CUB_DEBUG_LEVEL an internal macro and use symbolic constants, in case we need to change the numeric values to add levels, etc.

Adding some convenience macros to simplify checks and make them more robust against typos would be useful as well:

/*
 * `CUB_DETAIL_DEBUG_LEVEL_*`: Implementation details, internal use only:
 */

#define CUB_DETAIL_DEBUG_LEVEL_NONE 0
#define CUB_DETAIL_DEBUG_LEVEL_LOG 1
#define CUB_DETAIL_DEBUG_LEVEL_SYNC 2
#define CUB_DETAIL_DEBUG_LEVEL_ASSERTIONS 3

/*
 * `CUB_DEBUG_*`: User interfaces:
 */

// Extra logging, no syncs
#ifdef CUB_DEBUG_LOG
#define CUB_DETAIL_DEBUG_LEVEL CUB_DETAIL_DEBUG_LEVEL_LOG
#endif

// Logging + syncs
#ifdef CUB_DEBUG_SYNC
#define CUB_DETAIL_DEBUG_LEVEL CUB_DETAIL_DEBUG_LEVEL_SYNC
#endif

// Logging + syncs + assertions
#ifdef CUB_DEBUG_ASSERTIONS
#define CUB_DETAIL_DEBUG_LEVEL CUB_DETAIL_DEBUG_LEVEL_ASSERTIONS
#endif

// Default case, no extra debugging:
#ifndef CUB_DETAIL_DEBUG_LEVEL
#define CUB_DETAIL_DEBUG_LEVEL CUB_DETAIL_DEBUG_LEVEL_NONE
#endif

/*
 * `CUB_DETAIL_DEBUG_ENABLE_*`:
 * Internal implementation details, used for testing enabled debug features:
 */

#if CUB_DETAIL_DEBUG_LEVEL >= CUB_DETAIL_DEBUG_LEVEL_LOG
#define CUB_DETAIL_DEBUG_ENABLE_LOG
#endif

#if CUB_DETAIL_DEBUG_LEVEL >= CUB_DETAIL_DEBUG_LEVEL_SYNC
#define CUB_DETAIL_DEBUG_ENABLE_SYNC
#endif

#if CUB_DETAIL_DEBUG_LEVEL >= CUB_DETAIL_DEBUG_LEVEL_ASSERTIONS
#define CUB_DETAIL_DEBUG_ENABLE_ASSERTIONS
#endif

/*
 * Usage pseudocode example:
 */

// Usage:
void some_algo()
{
#ifdef CUB_DETAIL_DEBUG_ENABLE_ASSERTIONS
  validate_inputs();
#endif

#ifdef CUB_DETAIL_DEBUG_ENABLE_LOG
  print_log();
#endif

  launch_kernel();

#ifdef CUB_DETAIL_DEBUG_ENABLE_SYNC
  sync();
#endif
}

@gevtushenko
Copy link
Collaborator Author

  1. separation of log and sync - this is a great idea, I agree
  2. CUB_IGNORE_DEPRECATED_RUNTIME_DEBUG_SYNC also a good catch, if someone is using a third-party header only library, the'll be able to affect the error
  3. THRUST_DEBUG_SYNC - I'd prefer to keep THRUST_DEBUG_SYNC and define a proper CUB macro based on it. If thrust assertions has to be checked, we have to introduce THRUST_DEBUG_ASSERTIONS and check them at thrust side.
  4. symbolic constants - love the *_ENABLE part, will do this way

@gevtushenko
Copy link
Collaborator Author

A few details that arose:

  1. We'd like to introduce CUB_DEBUG_DEVICE_ASSERTIONS in compliment to CUB_DEBUG_{HOST_?}ASSERTIONS to distinguish between precondition checks on the host and device.
  2. We'd like to select a default for CUB_DEBUG_ASSERTIONS based on NDEBUG but have a way to opt it out
  3. We'll only provide a way to opt in CUB_DEBUG_DEVICE_ASSERTIONS. Device-side assertions are off by default.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P0: must have Absolutely necessary. Critical issue, major blocker, etc. release: breaking change Include in "Breaking Changes" section of release notes. type: enhancement New feature or request.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants