Releases: brycelelbach/cub_historical_2019_2020
CUB 1.0.2
Summary
CUB 1.0.2 is a minor release.
Bug Fixes
- Corrections to code snippet examples for cub::BlockLoad, cub::BlockStore, and cub::BlockDiscontinuity.
- Cleaned up unnecessary/missing header includes. You can now safely include a specific .cuh (instead of cub.cuh).
- Bug/compilation fixes for cub::BlockHistogram.
CUB 1.0.1
Summary
CUB 1.0.1 adds cub::DeviceRadixSort and cub::DeviceScan. Numerous other performance and correctness fixes are included.
Breaking Changes
- New collective interface idiom (specialize/construct/invoke).
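The specialize/construct/invoke idiom has three steps. In CUB device code it typically looks like this:

1. A typedef specializes the collective, for example cub::BlockReduce<int, 128>.
2. A __shared__ instance of its nested TempStorage type is declared.
3. A temporary collective object is constructed from that storage, and a method such as Sum() is invoked on it.

The host-side C++ analog below mirrors those steps. BlockSumModel and its members are hypothetical stand-ins, not CUB API. "Threads" are array indices, and the reduction is serial.

```cpp
#include <cassert>

// Hypothetical host-side analog of the specialize/construct/invoke idiom.
// Nothing here is CUB API; "threads" are modeled as array indices.
template <typename T, int BLOCK_THREADS>
class BlockSumModel {
public:
    // Specialize: the template parameters fix the item type and block size,
    // which in turn fix the size of the temporary storage.
    struct TempStorage {
        T partials[BLOCK_THREADS];
    };

    // Construct: each collective object is bound to caller-provided storage.
    explicit BlockSumModel(TempStorage& storage) : storage_(storage) {}

    // Invoke: every "thread" contributes one item; the block total is returned.
    T Sum(const T (&thread_items)[BLOCK_THREADS]) {
        for (int tid = 0; tid < BLOCK_THREADS; ++tid)
            storage_.partials[tid] = thread_items[tid];  // each thread writes its item
        T total = storage_.partials[0];
        for (int tid = 1; tid < BLOCK_THREADS; ++tid)
            total = total + storage_.partials[tid];      // serial stand-in for a tree reduction
        return total;
    }

private:
    TempStorage& storage_;
};

// Usage mirroring the three steps.
int BlockSumExample() {
    typedef BlockSumModel<int, 4> BlockSum;  // specialize
    BlockSum::TempStorage temp;              // would be __shared__ memory on a GPU
    int items[4] = {1, 2, 3, 4};
    return BlockSum(temp).Sum(items);        // construct, then invoke
}
```

Separating the storage from the object lets several collectives share or alias one shared-memory allocation. The object itself stays a cheap temporary.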
New Features
- cub::DeviceRadixSort. Implements short-circuiting for homogeneous digit passes.
- cub::DeviceScan. Implements a single-pass "adaptive-lookback" strategy.
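The short-circuiting mentioned for cub::DeviceRadixSort can be illustrated with a serial least-significant-digit radix sort. This is a host-side C++ model, not CUB's implementation. The function name RadixSortSkipUniform, the 4-bit digit width, and the 32-bit unsigned key type are assumptions for illustration.

Each pass first histograms the current digit. If one bin holds every key, the digit is homogeneous, and a stable scatter would leave the order unchanged, so the pass is skipped.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side LSD radix sort over 32-bit unsigned keys, 4 bits per
// pass. Returns how many passes actually scattered keys, so skipped
// (homogeneous-digit) passes can be observed. Not CUB API.
int RadixSortSkipUniform(std::vector<std::uint32_t>& keys) {
    constexpr int RADIX_BITS = 4;
    constexpr int RADIX = 1 << RADIX_BITS;
    std::vector<std::uint32_t> buffer(keys.size());
    int passes_executed = 0;

    for (int shift = 0; shift < 32; shift += RADIX_BITS) {
        // Upsweep: histogram of the current digit.
        std::array<std::size_t, RADIX> counts{};
        for (std::uint32_t k : keys) ++counts[(k >> shift) & (RADIX - 1)];

        // Short-circuit: if one bin holds every key, the digit is homogeneous
        // and a stable scatter would not change the order, so skip the pass.
        bool homogeneous = false;
        for (std::size_t c : counts)
            if (c == keys.size()) homogeneous = true;
        if (homogeneous) continue;

        // Exclusive scan of the counts gives each bin's starting offset.
        std::array<std::size_t, RADIX> offsets{};
        std::size_t running = 0;
        for (int d = 0; d < RADIX; ++d) {
            offsets[d] = running;
            running += counts[d];
        }

        // Downsweep: stable scatter into the alternate buffer, then swap.
        for (std::uint32_t k : keys) buffer[offsets[(k >> shift) & (RADIX - 1)]++] = k;
        keys.swap(buffer);
        ++passes_executed;
    }
    return passes_executed;
}
```

Keys 0x30, 0x10, 0x20, 0x00 differ only in their second nibble, so only one of the eight passes scatters data. A skipped pass still pays for the histogram read, but it avoids the scatter, which reads and writes every key.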
Other Enhancements
- Significantly improved documentation (with example code snippets).
- More extensive regression test suite for aggressively testing collective variants.
- Allow non-trivially-constructed types (previously, unions had prevented aliasing temporary storage of those types).
- Improved support for SM3x SHFL (collective ops now use SHFL for types larger than 32 bits).
- Better code generation for 64-bit addressing within cub::BlockLoad/cub::BlockStore.
- cub::DeviceHistogram now supports histograms of arbitrary bins.
- Updates to accommodate CUDA 5.5 dynamic parallelism.
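The note about non-trivially-constructed types refers to aliasing temporary storage through unions. A union cannot simply hold a member whose type has a non-trivial constructor or destructor. Before C++11 this was ill-formed. Since C++11 it is allowed, but the union's implicit default constructor or destructor is then deleted.

A common workaround, sketched below in host C++, wraps each type in raw, suitably aligned bytes and constructs objects there explicitly with placement new. RawStorage, Label, PhaseStorage, and RunPhases are hypothetical names, not CUB API.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// A type with a non-trivial constructor and destructor.
struct Label {
    std::string text;
    explicit Label(std::string t) : text(std::move(t)) {}
};

// Hypothetical raw-storage wrapper: correctly sized and aligned bytes with no
// constructor, so it can be a union member even when T itself cannot.
template <typename T>
struct RawStorage {
    alignas(T) unsigned char bytes[sizeof(T)];
};

// Two phases of an algorithm alias the same temporary storage. Declaring
// `Label phase1;` directly here would delete the union's default constructor.
union PhaseStorage {
    RawStorage<Label> phase1;
    RawStorage<long long> phase2;
};

// Runs both phases through one PhaseStorage and returns the phase-2 result.
long long RunPhases() {
    PhaseStorage storage;  // trivially constructible: no member is initialized

    // Phase 1: construct a Label in the aliased bytes, use it, then destroy it.
    Label* label = new (storage.phase1.bytes) Label("tile");
    long long length = static_cast<long long>(label->text.size());
    label->~Label();

    // Phase 2: reuse the same bytes for a different type.
    long long* total = new (storage.phase2.bytes) long long(length * 10);
    return *total;
}
```

The union keeps the footprint at the size of its largest member. Explicit construction and destruction hand object lifetime to the caller, which is the price of aliasing.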
Bug Fixes
- Workarounds for SM10 codegen issues in uncommonly-used cub::WarpScan/cub::WarpReduce specializations.
CUB 0.9.4
Summary
CUB 0.9.4 is a minor release.
Enhancements
- Various documentation updates and corrections.
Bug Fixes
- Fixed compilation errors for SM1x.
- Fixed compilation errors for some WarpScan entrypoints on SM3x and up.
CUB 0.9.3
Summary
CUB 0.9.3 adds histogram algorithms and work management utility descriptors.
New Features
- cub::DeviceHistogram256.
- cub::BlockHistogram256.
- cub::BlockScan algorithm variant BLOCK_SCAN_RAKING_MEMOIZE, which trades more register consumption for less shared memory I/O.
- cub::GridQueue and cub::GridEvenShare, work management utility descriptors.
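cub::GridEvenShare is described only as a work management utility descriptor. The host C++ sketch below illustrates the general even-share idea:

- Split the input into fixed-size tiles.
- Give every block the same whole number of tiles.
- Hand any leftover tiles to the first blocks, one each.

Per-block work therefore differs by at most one tile. EvenShare, BlockRange, and the tile-size parameter are assumptions, and CUB's actual descriptor may compute its ranges differently.

```cpp
#include <cassert>
#include <vector>

// Half-open range [begin, end) of item indices assigned to one thread block.
struct BlockRange {
    long long begin;
    long long end;
};

// Hypothetical even-share partition (not CUB API): split num_items into tiles
// of tile_items, give every block the same number of whole tiles, and give the
// leftover tiles to the first blocks, one each. The last range is clamped.
std::vector<BlockRange> EvenShare(long long num_items, int grid_size, int tile_items) {
    long long total_tiles = (num_items + tile_items - 1) / tile_items;
    long long base_tiles = total_tiles / grid_size;
    long long extra_tiles = total_tiles % grid_size;

    std::vector<BlockRange> ranges;
    long long offset = 0;
    for (int block = 0; block < grid_size; ++block) {
        long long tiles = base_tiles + (block < extra_tiles ? 1 : 0);
        long long end = offset + tiles * tile_items;
        if (end > num_items) end = num_items;  // the final tile may be partial
        ranges.push_back({offset, end});
        offset = end;
    }
    return ranges;
}
```

For 1000 items, 3 blocks, and 128-item tiles there are 8 tiles. The blocks receive 3, 3, and 2 tiles, and the final range is clamped to end at item 1000.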
Other Enhancements
- Updates to cub::BlockRadixRank to use cub::BlockScan, which improves performance on SM3x by using SHFL.
- Allow types other than builtin types to be used in cub::WarpScan::*Sum methods if they only have operator+ overloaded. Previously they were also required to support assignment from int(0).
- Update cub::BlockReduce's BLOCK_REDUCE_WARP_REDUCTIONS algorithm to work even when the block size is not an even multiple of the warp size.
- Refactoring of the cub::DeviceAllocator interface and the cub::CachingDeviceAllocator implementation.
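The relaxed requirement for cub::WarpScan::*Sum implies the scan no longer needs a zero of type T. One way to avoid needing one, shown in this serial C++ sketch, is to seed the running total with the first item instead of T(0). InclusiveSumNoZero and Vec2 are hypothetical names, and Vec2 deliberately has no constructor from int.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A user-defined type with operator+ but no way to be built from int(0).
struct Vec2 {
    float x, y;
    Vec2(float x_, float y_) : x(x_), y(y_) {}
};
inline Vec2 operator+(const Vec2& a, const Vec2& b) { return Vec2(a.x + b.x, a.y + b.y); }

// Hypothetical inclusive prefix sum over one "warp" of items. Seeding the
// running total with the first item, rather than T(0), means T only needs
// operator+ and copy construction.
template <typename T>
std::vector<T> InclusiveSumNoZero(const std::vector<T>& items) {
    std::vector<T> out;
    out.reserve(items.size());
    for (std::size_t i = 0; i < items.size(); ++i)
        out.push_back(i == 0 ? items[0] : out[i - 1] + items[i]);
    return out;
}
```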
CUB 0.9.2
Summary
CUB 0.9.2 adds cub::WarpReduce.
New Features
- cub::WarpReduce, which uses the SHFL instruction when applicable. cub::BlockReduce now uses cub::WarpReduce instead of implementing its own.
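A common shape for a SHFL-based warp reduction is a shuffle-down tree. At each step, every lane adds the value held by the lane offset positions above it, and the offset halves from 16 down to 1. The host C++ model below simulates a full 32-lane warp with std::array. ShuffleDown and WarpSumModel are hypothetical names, the code is not CUB's implementation, and only lane 0's final value is meaningful.

```cpp
#include <array>
#include <cassert>

constexpr int kWarpThreads = 32;

// Host model of one shuffle-down step: lane i reads the value held by lane
// i + offset. Lanes whose source lane is out of range keep their own value,
// matching CUDA's shuffle-down behavior; those lanes never feed lane 0.
template <typename T>
std::array<T, kWarpThreads> ShuffleDown(const std::array<T, kWarpThreads>& lanes, int offset) {
    std::array<T, kWarpThreads> out = lanes;
    for (int i = 0; i + offset < kWarpThreads; ++i) out[i] = lanes[i + offset];
    return out;
}

// Tree reduction in log2(32) = 5 steps; lane 0 ends up holding the warp total.
template <typename T>
T WarpSumModel(std::array<T, kWarpThreads> lanes) {
    for (int offset = kWarpThreads / 2; offset > 0; offset /= 2) {
        std::array<T, kWarpThreads> peer = ShuffleDown(lanes, offset);
        for (int i = 0; i < kWarpThreads; ++i) lanes[i] = lanes[i] + peer[i];
    }
    return lanes[0];
}
```

On a GPU each step is one register-to-register shuffle rather than a shared-memory round trip. This is the saving that makes SHFL attractive for warp-level collectives.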
Enhancements
- Documentation updates and corrections.
Bug Fixes
- Fixes for 64-bit Linux compilation warnings and errors.
CUB 0.9.1
Summary
CUB 0.9.1 is a minor release.
Bug Fixes
- Fix for ambiguity in cub::BlockReduce::Reduce between generic reduction and summation. Summation entrypoints are now called ::Sum(), similar to the convention in cub::BlockScan.
- Small edits to documentation and download tracking.
CUB 0.9.0
Summary
Initial preview release. CUB is the first durable, high-performance library of cooperative block-level, warp-level, and thread-level primitives for CUDA kernel programming.