Performance analysis on small problem sizes part 2
Ye Luo edited this page Jul 14, 2020 · 5 revisions
Timing of the DMC unified driver for the dmc-a32-e384-cpu-Clang-batch-w1344 run on Summit, with 7 batches and 8 walkers per batch. Each Summit node runs 6 MPI ranks with 7 threads per rank.
Timer                                       Inclusive_time  Exclusive_time    Calls  Time_per_call
DMCBatched::RunSteps                              139.6967          1.3720       25    5.587869120
  DMCBatched::Hamiltonian                          76.8406          0.0103       25    3.073624086
    Hamiltonian::ElecElec                           4.8134          4.8134       25    0.192537746
    Hamiltonian::IonIon                             0.0009          0.0009       25    0.000035486
    Hamiltonian::Kinetic                            0.0048          0.0048       25    0.000191803
    Hamiltonian::LocalECP                           1.2211          1.2211       25    0.048843880
    Hamiltonian::NonLocalECP                       70.7901          1.8457       25    2.831604805
      ParticleSet::update                          31.6759         31.6759     5308    0.005967586
      WaveFunction::J1OrbitalSoA_NLratio            3.3165          3.3165     5308    0.000624810
      WaveFunction::J2OrbitalSoA_NLratio           26.5605         26.5605     5308    0.005003858
      WaveFunction::SlaterDet_NLratio               7.3915          0.1761     5308    0.001392525
        DiracDeterminantBase::ratio                 0.3734          0.3734     5308    0.000070344
        DiracDeterminantBase::spoval                6.8421          0.4537     5308    0.001289007
          SplineC2ROMP::offload                     6.3883          6.3883     5308    0.001203532
  DMCBatched::MovePbyP                             59.5932          1.7929       25    2.383727245
    ParticleSet::acceptMove                         2.3101          2.3101  1840858    0.000001255
    ParticleSet::computeNewPosDT                   13.6748         13.6748     9600    0.001424464
    ParticleSet::donePbyP                           3.9055          3.9055     4800    0.000813638
    WaveFunction::J1OrbitalSoA_VGL                  3.4353          3.4353    19200    0.000178924
    WaveFunction::J1OrbitalSoA_accept               0.3011          0.3011     9625    0.000031284
    WaveFunction::J2OrbitalSoA_VGL                  9.7676          9.7676    19200    0.000508727
    WaveFunction::J2OrbitalSoA_accept              12.5381         12.5381     9625    0.001302659
    WaveFunction::SlaterDet_VGL                     5.0797          0.2779    19200    0.000264568
      DiracDeterminantBase::ratio                   1.2942          1.2942     9600    0.000134814
      DiracDeterminantBase::spovgl                  3.5076          0.2003     9600    0.000365374
        SplineC2ROMP::offload                       3.3073          3.3073     9600    0.000344506
    WaveFunction::SlaterDet_accept                  6.7881          0.2510     9625    0.000705258
      DiracDeterminantBase::update                  1.1774          1.1774     9650    0.000122007
      DiracDeterminantBatched::D2H                  5.3597          5.3597       50    0.107194562
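The four numeric columns can be read as inclusive time, exclusive time, number of calls, and time per call. That reading is an interpretation supported by the numbers, not something this page states. The following minimal Python sketch checks it on the Hamiltonian part of the listing: a timer's exclusive time should equal its inclusive time minus the inclusive time of the timers nested under it, and the last column should equal inclusive time divided by calls. The nesting in `CHILDREN` is an assumption inferred from those sums, and the names `ROWS`, `CHILDREN`, `exclusive_residual` and `per_call_residual` are made up for this example.

```python
# Rows copied from the timing listing:
# name -> (inclusive seconds, exclusive seconds, calls, seconds per call)
ROWS = {
    "DMCBatched::Hamiltonian": (76.8406, 0.0103, 25, 3.073624086),
    "Hamiltonian::ElecElec": (4.8134, 4.8134, 25, 0.192537746),
    "Hamiltonian::IonIon": (0.0009, 0.0009, 25, 0.000035486),
    "Hamiltonian::Kinetic": (0.0048, 0.0048, 25, 0.000191803),
    "Hamiltonian::LocalECP": (1.2211, 1.2211, 25, 0.048843880),
    "Hamiltonian::NonLocalECP": (70.7901, 1.8457, 25, 2.831604805),
    "ParticleSet::update": (31.6759, 31.6759, 5308, 0.005967586),
    "WaveFunction::J1OrbitalSoA_NLratio": (3.3165, 3.3165, 5308, 0.000624810),
    "WaveFunction::J2OrbitalSoA_NLratio": (26.5605, 26.5605, 5308, 0.005003858),
    "WaveFunction::SlaterDet_NLratio": (7.3915, 0.1761, 5308, 0.001392525),
}

# Assumed nesting (inferred from the numbers, not stated in the listing).
CHILDREN = {
    "DMCBatched::Hamiltonian": [
        "Hamiltonian::ElecElec",
        "Hamiltonian::IonIon",
        "Hamiltonian::Kinetic",
        "Hamiltonian::LocalECP",
        "Hamiltonian::NonLocalECP",
    ],
    "Hamiltonian::NonLocalECP": [
        "ParticleSet::update",
        "WaveFunction::J1OrbitalSoA_NLratio",
        "WaveFunction::J2OrbitalSoA_NLratio",
        "WaveFunction::SlaterDet_NLratio",
    ],
}


def exclusive_residual(parent):
    """Inclusive time minus nested inclusive times minus reported exclusive time."""
    inclusive, exclusive, _, _ = ROWS[parent]
    nested = sum(ROWS[child][0] for child in CHILDREN[parent])
    return inclusive - nested - exclusive


def per_call_residual(name):
    """Inclusive time divided by calls, minus the reported time per call."""
    inclusive, _, calls, per_call = ROWS[name]
    return inclusive / calls - per_call


for parent in CHILDREN:
    print(f"{parent}: exclusive residual {exclusive_residual(parent):+.4f} s")
for name in ROWS:
    print(f"{name}: per-call residual {per_call_residual(name):+.2e} s")
```

Residuals at the level of the printed rounding (about 1e-4 s for the sums) indicate that the assumed nesting and column meanings are consistent with the data.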
ParticleSet::update and the WaveFunction::J2OrbitalSoA routines take about 60% of the time: 84.4 seconds out of a total of 139.7 seconds.
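The page does not say which rows make up the 84.4-second figure, so the arithmetic is worth spelling out. The sketch below sums the listed ParticleSet::update row and the three J2OrbitalSoA rows, then repeats the sum with ParticleSet::donePbyP added. Including donePbyP is only a guess at how the quoted number was obtained, and the variable names are illustrative.

```python
# Inclusive times (seconds) copied from the timing listing.
total = 139.6967  # DMCBatched::RunSteps

named_rows = {
    "ParticleSet::update": 31.6759,
    "WaveFunction::J2OrbitalSoA_NLratio": 26.5605,
    "WaveFunction::J2OrbitalSoA_VGL": 9.7676,
    "WaveFunction::J2OrbitalSoA_accept": 12.5381,
}
done_pbyp = 3.9055  # ParticleSet::donePbyP; including it is an assumption

named_sum = sum(named_rows.values())
with_done = named_sum + done_pbyp

# The four named rows alone: about 80.5 s, roughly 57.7% of the total.
print(f"update + J2 rows:        {named_sum:.4f} s ({named_sum / total:.1%})")
# With donePbyP added: about 84.4 s, roughly 60.5% of the total.
print(f"update + J2 + donePbyP:  {with_done:.4f} s ({with_done / total:.1%})")
```

Either way, the ParticleSet update and two-body Jastrow timers account for well over half of the run time in this listing.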