Release CUTLASS 2.3 · NVIDIA/cutlass

CUTLASS 2.3

NVIDIA Ampere Architecture features:
- Sparse Tensor Core GEMM kernels:
  - Direct access to Sparse Tensor Cores and maximum performance via mma.sp.sync
- Fast SGEMM targeting GeForce RTX 30-series CUDA Cores
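
Sparse Tensor Cores on the NVIDIA Ampere architecture operate on operands with 2:4 structured sparsity: in every aligned group of four elements, at most two are nonzero, so only two values plus their positions need to be stored. The host-side sketch below illustrates that idea only. `CompressedRow` and `compress_2_4` are hypothetical names, not CUTLASS types or functions, and the indices are kept one per byte for readability rather than in the packed metadata layout that the hardware and CUTLASS actually use.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical container for one compressed row (not a CUTLASS type).
struct CompressedRow {
    std::vector<float> values;          // exactly two kept values per group of four
    std::vector<std::uint8_t> indices;  // position (0..3) of each kept value in its group
};

// Compress a row that already satisfies the 2:4 pattern.
// Throws if the length is not a multiple of 4 or a group has more than two nonzeros.
CompressedRow compress_2_4(const std::vector<float>& row) {
    if (row.size() % 4 != 0) {
        throw std::invalid_argument("row length must be a multiple of 4");
    }
    CompressedRow out;
    for (std::size_t g = 0; g < row.size(); g += 4) {
        std::uint8_t pos[2] = {0, 0};
        int kept = 0;
        // Record the positions of the nonzero elements first.
        for (std::uint8_t i = 0; i < 4; ++i) {
            if (row[g + i] != 0.0f) {
                if (kept == 2) {
                    throw std::invalid_argument("group violates 2:4 sparsity");
                }
                pos[kept++] = i;
            }
        }
        // Pad with zero-valued positions so every group stores two slots.
        for (std::uint8_t i = 0; i < 4 && kept < 2; ++i) {
            if (row[g + i] == 0.0f) {
                pos[kept++] = i;
            }
        }
        // Keep the two positions in ascending order.
        if (pos[0] > pos[1]) {
            std::swap(pos[0], pos[1]);
        }
        for (std::uint8_t p : pos) {
            out.values.push_back(row[g + p]);
            out.indices.push_back(p);
        }
    }
    return out;
}
```

Under these assumptions, a row of length K compresses to K/2 stored values, which is where the halved operand footprint comes from.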
Minor Features:
- Activation functions such as GeLU and Sigmoid
- Small matrix and quaternion template classes in device code
- Floating-point constants
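
For readers unfamiliar with the two activation functions named above, the following is a minimal scalar sketch of their standard mathematical definitions in plain C++. It is not the CUTLASS implementation: the function names `sigmoid`, `gelu`, and `epilogue_gelu` are hypothetical, and the last one only assumes the common epilogue form `D = act(alpha * accumulator + beta * C)`.

```cpp
#include <cmath>

// Sigmoid: 1 / (1 + exp(-x)).
inline float sigmoid(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// GELU in its exact erf form: 0.5 * x * (1 + erf(x / sqrt(2))).
inline float gelu(float x) {
    const float inv_sqrt2 = 0.70710678f;  // 1 / sqrt(2)
    return 0.5f * x * (1.0f + std::erf(x * inv_sqrt2));
}

// Hypothetical fused epilogue step: apply GELU to a scaled accumulator
// plus a scaled source element.
inline float epilogue_gelu(float alpha, float accumulator, float beta, float source) {
    return gelu(alpha * accumulator + beta * source);
}
```

Fusing the activation into the epilogue in this way avoids a separate pass over the output matrix after the GEMM completes.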
NVIDIA Ampere GPU Architecture examples and documentation:
- Tensor Float 32
- Sparse Tensor Cores
- Documentation for the CUTLASS efficient row-major epilogue
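
As background for the Tensor Float 32 example listed above: TF32 keeps the 8-bit exponent of IEEE-754 binary32 but only 10 explicit mantissa bits. The sketch below emulates that reduced precision on the host by clearing the 13 low-order mantissa bits of a `float`. The helper name `truncate_to_tf32` is hypothetical, and plain truncation is an assumption made for simplicity; actual conversions may round rather than truncate.

```cpp
#include <cstdint>
#include <cstring>

// Emulate TF32 precision by zeroing the 13 low-order mantissa bits
// (23 binary32 mantissa bits minus 10 TF32 mantissa bits).
// This truncates toward zero; it does not round.
inline float truncate_to_tf32(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined type punning
    bits &= 0xFFFFE000u;                  // keep sign, exponent, top 10 mantissa bits
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

Comparing a single-precision GEMM result against one computed on inputs passed through such a helper gives a rough feel for the accuracy trade-off TF32 makes.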
