
ECP Milestone 1.7: Kokkos contribution

Attila Cangi edited this page Sep 11, 2017 · 47 revisions

Introduction

In this report we develop and port two QMCPACK kernels of the miniqmc miniapp (spline and Jastrow evaluations) to the Kokkos C++ programming model, targeting at least one CPU-based and one GPU-based architecture. We make an initial assessment of:

  1. Capability (expressiveness, ability to achieve required levels of parallelization)
  2. Performance (achieved time to solution on multiple architectures)
  3. Portability (extent of modification required for performance on multiple architectures)
  4. Suitability (deviation from desired application abstractions required to port or to obtain performance, complexity, difficulty of learning curve)
  5. Support (compiler, tools, documentation)

Desired improvements and extensions that would raise the objective (1-3) and subjective (4, 5) measures will be identified and communicated to the Kokkos team.

Implementation

We implement the spline and Jastrow kernels in the Kokkos C++ programming model in order to achieve portability across multiple architectures.

Spline kernel

Each Monte Carlo step requires the evaluation of single-particle orbitals (SPOs). The miniqmc miniapp represents the SPOs in a 3D tricubic B-spline basis. This basis is highly efficient, requiring only 64 elements (4 per dimension) at any given point in space. The Kokkos implementation of the spline kernel requires modifications to the data structures and to the expression of parallelism in the miniapp.

Data structures

The spline kernel evaluates the wavefunction, its gradient (a 3D vector containing the first derivatives with respect to particle positions), and its Hessian (a 3×3 tensor containing the second derivatives with respect to particle positions). We convert several of the related data structures into Views, the Kokkos abstraction for multidimensional arrays. Template parameters of a View control the memory space (host or device) in which the data resides and the data layout that is imposed.
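The idea of selecting a data layout through a template parameter can be illustrated without Kokkos itself. The sketch below is a plain C++ stand-in, not the Kokkos API: `RowMajor`, `ColMajor`, and `Array2D` are hypothetical names that only mimic the role played by layout tags such as `Kokkos::LayoutRight` and `Kokkos::LayoutLeft`. It omits memory spaces entirely and always stores data on the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout tags (stand-ins, not Kokkos types).
// RowMajor: the rightmost index is contiguous in memory.
struct RowMajor {
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t /*n0*/, std::size_t n1) {
    return i * n1 + j;
  }
};

// ColMajor: the leftmost index is contiguous in memory.
struct ColMajor {
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t n0, std::size_t /*n1*/) {
    return j * n0 + i;
  }
};

// A 2D array whose index-to-memory mapping is fixed at compile time by the
// Layout template parameter; calling code uses a(i, j) in either case.
template <class Layout>
class Array2D {
 public:
  Array2D(std::size_t n0, std::size_t n1)
      : n0_(n0), n1_(n1), data_(n0 * n1, 0.0) {}

  double& operator()(std::size_t i, std::size_t j) {
    return data_[Layout::offset(i, j, n0_, n1_)];
  }

  const double* data() const { return data_.data(); }

 private:
  std::size_t n0_;
  std::size_t n1_;
  std::vector<double> data_;
};
```

Because the kernel body only ever writes `a(i, j)`, switching between the two layouts requires changing a single template argument, which is the property that makes a layout-parameterized array attractive when the preferred memory access pattern differs between CPU and GPU.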

Parallel execution

Jastrow kernel

Assessment

We assess our Kokkos implementation of the spline and Jastrow kernels on the following architectures.

  • Many-core architecture: Intel Xeon Phi (KNL)
  • GPU: Nvidia K40

Spline kernel

Jastrow kernel

Conclusions and Future Work

Acknowledgements

This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations—the Office of Science and the National Nuclear Security Administration—responsible for the planning and preparation of a capable exascale ecosystem—including software, applications, hardware, advanced system engineering, and early testbed platforms—to support the nation’s exascale computing imperative.
