CUDA_kernels

PMPP (Programming Massively Parallel Processors)

GEMM, Convolution, Stencil, Parallel Histogram, Reduction, Prefix Sum (Scan), Merge, Sorting, Sparse Matrix Computation, Graph Traversal