This repository was archived by the owner on Apr 2, 2025. It is now read-only.
hpcrun seems to add high overhead on Blue Gene. Master adds more
than 2x for the OpenMP solve phase in AMG 2006. The ompt-tr4 branch
with the LLVM libomp runtime adds even more.
This is with AMG 2006 on mira/cetus at ANL, 8 nodes, 8 MPI ranks,
16 OpenMP threads, problem size (-r) 16,16,16. AMG was compiled with GNU,
flags '-g -O2', and run with WALLCLOCK at 8500 (118 samples/sec).
AMG 2006 native, no toolkit.
wall clock time = 13.350482 seconds
wall clock time = 205.818907 seconds
wall clock time = 16.934752 seconds
Toolkit master, regular libgomp.
wall clock time = 31.799200 seconds
wall clock time = 241.473654 seconds
wall clock time = 43.120992 seconds
Branch ompt-tr4 with llvm libomp runtime and OMP_IDLE.
wall clock time = 35.795240 seconds
wall clock time = 247.430433 seconds
wall clock time = 72.394108 seconds
That's about 2.4x to 2.5x for phases 1 and 3 with master. With ompt,
phase 1 is about 2.7x and phase 3 is over 4x. Phase 2 slows down by
only about 1.2x in both cases.
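
For anyone who wants to recheck the ratios, here is a minimal sketch that divides each measured time by the native time for the same phase. The timings are copied from the runs above. The labels and the `slowdown` helper are only illustrative names and are not part of the toolkit.

```python
# Wall clock times in seconds, copied from the three runs above,
# in the order the phases were reported.
native = [13.350482, 205.818907, 16.934752]

# Labels are illustrative; they just name the two profiled configurations.
runs = {
    "master, libgomp": [31.799200, 241.473654, 43.120992],
    "ompt-tr4, libomp": [35.795240, 247.430433, 72.394108],
}


def slowdown(measured, baseline):
    """Return the per-phase ratio of measured time to baseline time."""
    return [m / b for m, b in zip(measured, baseline)]


for label, times in runs.items():
    ratios = slowdown(times, native)
    print(label + ":", "  ".join(f"{r:.2f}x" for r in ratios))
```

The ratio is time with profiling divided by time without, so 1.00x would mean no measurable overhead.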