CUTLASS 2.3
- NVIDIA Ampere Architecture features
- Sparse Tensor Core GEMM kernels:
- Direct access to Sparse Tensor Cores and maximum performance via mma.sp.sync
- Fast SGEMM targeting GeForce RTX 30-series CUDA Cores
- Minor Features:
- Activation functions such as GeLU and Sigmoid
- Small matrix and quaternion template classes in device code
- Floating-point constants
- NVIDIA Ampere GPU Architecture examples and documentation:
- Tensor Float 32
- Sparse Tensor Cores
- Documentation added on CUTLASS efficient row-major epilogue
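
For readers unfamiliar with the activation functions named under Minor Features, the following is a minimal host-side C++ sketch of the standard mathematical definitions of Sigmoid and erf-based GeLU. It is not CUTLASS code: the names sigmoid_ref and gelu_ref are hypothetical, and the CUTLASS templates may differ in interface, precision handling, and the exact GeLU formulation they use.

```cpp
#include <cmath>

// Hypothetical reference helpers for illustration only.
// These names are assumptions and are not CUTLASS identifiers.

// Logistic sigmoid: 1 / (1 + exp(-x)).
inline float sigmoid_ref(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// Erf-based GeLU: 0.5 * x * (1 + erf(x / sqrt(2))).
inline float gelu_ref(float x) {
    const float kInvSqrt2 = 0.70710678118654752f;  // 1 / sqrt(2)
    return 0.5f * x * (1.0f + std::erf(x * kInvSqrt2));
}
```

Scalar references like these can be handy for checking the output of a fused GEMM epilogue element by element on the host, within a tolerance suited to the data type.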