Releases: brycelelbach/cub_historical_2019_2020

CUB 1.0.2

19 May 07:28

Summary

CUB 1.0.2 is a minor release.

Bug Fixes

  • Corrections to code snippet examples for cub::BlockLoad, cub::BlockStore, and cub::BlockDiscontinuity.
  • Cleaned up unnecessary/missing header includes. You can now safely include a specific .cuh (instead of cub.cuh).
  • Bug/compilation fixes for cub::BlockHistogram.
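
As a rough illustration of including only the headers a kernel needs, the sketch below pulls in just the cub::BlockLoad and cub::BlockStore headers. It assumes the header layout and template signatures of the CUB that ships with recent CUDA toolkits, which may differ from this historical release. The names CopyTileKernel, CopyTileRoundTrips, kThreads, and kItems are hypothetical, and CUDA error checking is omitted for brevity.

```cuda
// Only the two headers this kernel needs, rather than the umbrella <cub/cub.cuh>.
#include <cub/block/block_load.cuh>
#include <cub/block/block_store.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical tile shape chosen for illustration.
constexpr int kThreads = 128;
constexpr int kItems   = 4;

__global__ void CopyTileKernel(const int* d_in, int* d_out)
{
    typedef cub::BlockLoad<int, kThreads, kItems, cub::BLOCK_LOAD_TRANSPOSE>   BlockLoad;
    typedef cub::BlockStore<int, kThreads, kItems, cub::BLOCK_STORE_TRANSPOSE> BlockStore;

    // The load and store phases do not overlap, so their scratch space can alias.
    __shared__ union {
        BlockLoad::TempStorage  load;
        BlockStore::TempStorage store;
    } temp_storage;

    int items[kItems];
    BlockLoad(temp_storage.load).Load(d_in, items);
    __syncthreads();  // barrier before the aliased shared memory is reused
    BlockStore(temp_storage.store).Store(d_out, items);
}

// Host helper: copies one tile through the kernel and reports whether it round-trips.
bool CopyTileRoundTrips()
{
    const int n = kThreads * kItems;
    std::vector<int> h_in(n), h_out(n, -1);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    CopyTileKernel<<<1, kThreads>>>(d_in, d_out);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_in == h_out;
}
```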

CUB 1.0.1

19 May 07:27

Summary

CUB 1.0.1 adds cub::DeviceRadixSort and cub::DeviceScan. Numerous other performance and correctness fixes are included.

Breaking Changes

  • New collective interface idiom (specialize/construct/invoke).
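
A minimal sketch of the specialize/construct/invoke idiom, using cub::BlockRadixSort as the collective. It assumes the interface of the CUB bundled with recent CUDA toolkits, which may not match this release exactly. SortTileKernel, TileSortsCorrectly, kBlockThreads, and kItemsPerThread are hypothetical names, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Hypothetical tile shape chosen for illustration.
constexpr int kBlockThreads   = 64;
constexpr int kItemsPerThread = 4;

__global__ void SortTileKernel(int* d_keys)
{
    // 1. Specialize: fix the key type, block size, and items per thread.
    typedef cub::BlockRadixSort<int, kBlockThreads, kItemsPerThread> BlockRadixSort;

    // The collective's scratch space lives in shared memory owned by the caller.
    __shared__ BlockRadixSort::TempStorage temp_storage;

    // Each thread holds kItemsPerThread consecutive keys (blocked arrangement).
    int keys[kItemsPerThread];
    for (int i = 0; i < kItemsPerThread; ++i)
        keys[i] = d_keys[threadIdx.x * kItemsPerThread + i];

    // 2. Construct a temporary bound to the storage, then 3. invoke the operation.
    BlockRadixSort(temp_storage).Sort(keys);

    for (int i = 0; i < kItemsPerThread; ++i)
        d_keys[threadIdx.x * kItemsPerThread + i] = keys[i];
}

// Host helper: sorts one descending tile and reports whether it comes back ascending.
bool TileSortsCorrectly()
{
    const int n = kBlockThreads * kItemsPerThread;
    std::vector<int> h_keys(n);
    for (int i = 0; i < n; ++i) h_keys[i] = n - i;

    int* d_keys = nullptr;
    cudaMalloc(&d_keys, n * sizeof(int));
    cudaMemcpy(d_keys, h_keys.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    SortTileKernel<<<1, kBlockThreads>>>(d_keys);

    cudaMemcpy(h_keys.data(), d_keys, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_keys);
    return std::is_sorted(h_keys.begin(), h_keys.end());
}
```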

New Features

  • cub::DeviceRadixSort. Implements short-circuiting for homogeneous digit passes.
  • cub::DeviceScan. Implements single-pass "adaptive-lookback" strategy.
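
The two device-wide algorithms above might be driven from the host roughly as follows. This is a sketch that assumes the two-phase temporary-storage convention and signatures of the CUB shipped with recent CUDA toolkits; the entrypoints in this release may have differed. SortThenExclusiveSum is a hypothetical helper, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: sorts the input on the device, then returns the
// exclusive prefix sum of the sorted keys.
std::vector<int> SortThenExclusiveSum(const std::vector<int>& h_in)
{
    const int n = static_cast<int>(h_in.size());
    int *d_in = nullptr, *d_sorted = nullptr, *d_scan = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_sorted, n * sizeof(int));
    cudaMalloc(&d_scan, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // First call with a null pointer only reports the scratch size required.
    void*  d_temp     = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_sorted, n);
    cudaMalloc(&d_temp, temp_bytes);
    // Second call performs the sort.
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_sorted, n);
    cudaFree(d_temp);

    // Same size-query-then-run pattern for the scan.
    d_temp     = nullptr;
    temp_bytes = 0;
    cub::DeviceScan::ExclusiveSum(d_temp, temp_bytes, d_sorted, d_scan, n);
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceScan::ExclusiveSum(d_temp, temp_bytes, d_sorted, d_scan, n);
    cudaFree(d_temp);

    std::vector<int> h_out(n);
    cudaMemcpy(h_out.data(), d_scan, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_sorted);
    cudaFree(d_scan);
    return h_out;
}
```

For example, an input of {3, 1, 2} is sorted to {1, 2, 3}, whose exclusive prefix sum is {0, 1, 3}.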

Other Enhancements

  • Significantly improved documentation (with example code snippets).
  • More extensive regression test suite for aggressively testing collective variants.
  • Allow non-trivially-constructed types (previously unions had prevented aliasing temporary storage of those types).
  • Improved support for SM3x SHFL (collective ops now use SHFL for types larger than 32 bits).
  • Better code generation for 64-bit addressing within cub::BlockLoad/cub::BlockStore.
  • cub::DeviceHistogram now supports histograms with an arbitrary number of bins.
  • Updates to accommodate CUDA 5.5 dynamic parallelism.
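
As an illustration of a histogram with a bin count other than 256, the sketch below calls cub::DeviceHistogram::HistogramEven. It assumes the signature found in the CUB bundled with recent CUDA toolkits, not necessarily the one in this release. HistogramEvenBins is a hypothetical helper, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: counts samples into num_bins equal-width bins
// spanning [lower, upper).
std::vector<int> HistogramEvenBins(const std::vector<float>& h_samples,
                                   int num_bins, float lower, float upper)
{
    const int n = static_cast<int>(h_samples.size());
    float* d_samples = nullptr;
    int*   d_hist    = nullptr;
    cudaMalloc(&d_samples, n * sizeof(float));
    cudaMalloc(&d_hist, num_bins * sizeof(int));
    cudaMemcpy(d_samples, h_samples.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_hist, 0, num_bins * sizeof(int));

    // The API takes the number of bin boundaries, which is one more than the bin count.
    const int num_levels = num_bins + 1;

    // Size query first, then the real run.
    void*  d_temp     = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceHistogram::HistogramEven(d_temp, temp_bytes, d_samples, d_hist,
                                        num_levels, lower, upper, n);
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceHistogram::HistogramEven(d_temp, temp_bytes, d_samples, d_hist,
                                        num_levels, lower, upper, n);

    std::vector<int> h_hist(num_bins);
    cudaMemcpy(h_hist.data(), d_hist, num_bins * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_temp);
    cudaFree(d_samples);
    cudaFree(d_hist);
    return h_hist;
}
```

With five bins over [0, 10), each bin is two units wide, so the samples {0.5, 1.5, 2.5, 3.5, 5.2, 9.1, 9.7} fall into bins 0, 0, 1, 1, 2, 4, 4.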

Bug Fixes

  • Workarounds for SM10 codegen issues in uncommonly-used cub::WarpScan/cub::WarpReduce specializations.

CUB 0.9.4

19 May 07:25

Summary

CUB 0.9.4 is a minor release.

Enhancements

  • Various documentation updates and corrections.

Bug Fixes

  • Fixed compilation errors for SM1x.
  • Fixed compilation errors for some WarpScan entrypoints on SM3x and up.

CUB 0.9.3

19 May 08:23

Summary

CUB 0.9.3 adds histogram algorithms and work management utility descriptors.

New Features

  • cub::DeviceHistogram256.
  • cub::BlockHistogram256.
  • cub::BlockScan algorithm variant BLOCK_SCAN_RAKING_MEMOIZE, which trades more register consumption for less shared memory I/O.
  • cub::GridQueue, cub::GridEvenShare, work management utility descriptors.
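
Selecting the BLOCK_SCAN_RAKING_MEMOIZE variant might look like the sketch below, where the algorithm is chosen through a template argument. It assumes the cub::BlockScan template signature of the CUB in recent CUDA toolkits. PrefixSumKernel, MemoizedScanOfOnes, and kScanThreads are hypothetical names, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size chosen for illustration.
constexpr int kScanThreads = 128;

__global__ void PrefixSumKernel(const int* d_in, int* d_out)
{
    // The third template argument selects the scan algorithm variant.
    typedef cub::BlockScan<int, kScanThreads, cub::BLOCK_SCAN_RAKING_MEMOIZE> BlockScan;
    __shared__ BlockScan::TempStorage temp_storage;

    int value = d_in[threadIdx.x];
    int prefix;
    BlockScan(temp_storage).ExclusiveSum(value, prefix);
    d_out[threadIdx.x] = prefix;
}

// Host helper: scans a block of ones; thread i should receive prefix i.
bool MemoizedScanOfOnes()
{
    std::vector<int> h_in(kScanThreads, 1), h_out(kScanThreads, -1);
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, kScanThreads * sizeof(int));
    cudaMalloc(&d_out, kScanThreads * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), kScanThreads * sizeof(int), cudaMemcpyHostToDevice);

    PrefixSumKernel<<<1, kScanThreads>>>(d_in, d_out);

    cudaMemcpy(h_out.data(), d_out, kScanThreads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    for (int i = 0; i < kScanThreads; ++i)
        if (h_out[i] != i) return false;
    return true;
}
```

The result is the same for every variant; only the register and shared memory trade-off changes.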

Other Enhancements

  • Updates to cub::BlockRadixRank to use cub::BlockScan, which improves performance on SM3x by using SHFL.
  • Allow types other than builtin types to be used in cub::WarpScan::*Sum methods if they only have operator+ overloaded. Previously they were also required to support assignment from int(0).
  • Update cub::BlockReduce's BLOCK_REDUCE_WARP_REDUCTIONS algorithm to work even when block size is not an even multiple of warp size.
  • Refactoring of cub::DeviceAllocator interface and cub::CachingDeviceAllocator implementation.

CUB 0.9.2

19 May 08:22

Summary

CUB 0.9.2 adds cub::WarpReduce.

New Features

  • cub::WarpReduce, which uses the SHFL instruction when applicable. cub::BlockReduce now uses this cub::WarpReduce instead of implementing its own.
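
A minimal sketch of a per-warp sum with cub::WarpReduce, assuming the interface of the CUB in recent CUDA toolkits and a 32-thread warp. WarpSumKernel, WarpSums, and kWarpsPerBlock are hypothetical names, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical launch shape: two 32-thread warps per block.
constexpr int kWarpsPerBlock = 2;

__global__ void WarpSumKernel(const int* d_in, int* d_out)
{
    typedef cub::WarpReduce<int> WarpReduce;

    // One scratch slot per warp; it may go unused when SHFL is available.
    __shared__ WarpReduce::TempStorage temp_storage[kWarpsPerBlock];

    const int warp_id = threadIdx.x / 32;
    const int value   = d_in[threadIdx.x];

    // The aggregate is only defined in lane 0 of each warp.
    const int warp_sum = WarpReduce(temp_storage[warp_id]).Sum(value);
    if (threadIdx.x % 32 == 0) d_out[warp_id] = warp_sum;
}

// Host helper: feeds 0..63 to two warps and returns one sum per warp.
std::vector<int> WarpSums()
{
    const int n = kWarpsPerBlock * 32;
    std::vector<int> h_in(n), h_out(kWarpsPerBlock, -1);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, kWarpsPerBlock * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    WarpSumKernel<<<1, n>>>(d_in, d_out);

    cudaMemcpy(h_out.data(), d_out, kWarpsPerBlock * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Here the first warp sums 0 through 31 (496) and the second sums 32 through 63 (1520).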

Enhancements

  • Documentation updates and corrections.

Bug Fixes

  • Fixes for 64-bit Linux compilation warnings and errors.

CUB 0.9.1

19 May 08:21

Summary

CUB 0.9.1 is a minor release.

Bug Fixes

  • Fix for ambiguity in cub::BlockReduce::Reduce between generic reduction and summation. Summation entrypoints are now called ::Sum(), similar to the convention in cub::BlockScan.
  • Small edits to documentation and download tracking.
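
The split between summation and generic reduction can be sketched with cub::BlockReduce as follows: Sum() takes no operator, while Reduce() takes a caller-supplied binary functor. This assumes the interface of the CUB in recent CUDA toolkits. MaxOp, SumAndMaxKernel, BlockSumAndMax, and kReduceThreads are hypothetical names, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// Hypothetical block size chosen for illustration.
constexpr int kReduceThreads = 128;

// Hypothetical binary functor for the generic reduction entrypoint.
struct MaxOp
{
    __host__ __device__ int operator()(int a, int b) const { return a > b ? a : b; }
};

__global__ void SumAndMaxKernel(const int* d_in, int* d_sum, int* d_max)
{
    typedef cub::BlockReduce<int, kReduceThreads> BlockReduce;
    __shared__ BlockReduce::TempStorage temp_storage;

    const int value = d_in[threadIdx.x];

    // Summation entrypoint: no operator argument.
    const int block_sum = BlockReduce(temp_storage).Sum(value);
    __syncthreads();  // barrier before the same scratch space is reused
    // Generic entrypoint: the reduction operator is passed explicitly.
    const int block_max = BlockReduce(temp_storage).Reduce(value, MaxOp());

    // Both aggregates are only defined in thread 0.
    if (threadIdx.x == 0) { *d_sum = block_sum; *d_max = block_max; }
}

// Host helper: reduces 1..128 and returns {sum, max}.
std::pair<int, int> BlockSumAndMax()
{
    std::vector<int> h_in(kReduceThreads);
    for (int i = 0; i < kReduceThreads; ++i) h_in[i] = i + 1;

    int *d_in = nullptr, *d_result = nullptr;
    cudaMalloc(&d_in, kReduceThreads * sizeof(int));
    cudaMalloc(&d_result, 2 * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), kReduceThreads * sizeof(int), cudaMemcpyHostToDevice);

    SumAndMaxKernel<<<1, kReduceThreads>>>(d_in, d_result, d_result + 1);

    int h_result[2] = {0, 0};
    cudaMemcpy(h_result, d_result, 2 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_result);
    return std::make_pair(h_result[0], h_result[1]);
}
```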

CUB 0.9.0

19 May 08:20

Summary

Initial preview release. CUB is the first durable, high-performance library of cooperative block-level, warp-level, and thread-level primitives for CUDA kernel programming.