Integrate with Tracy profiler #267
Check-perf-impact results: (2908f97f836fd2def14c3429cd4d61ac)
❓ No new benchmark data submitted. ❓
Check-perf-impact results: (5219aaa3cd4bf66c40181c2c7f113381)
✔️ No significant performance change in the microbenchmark set. You are good to go!
Relative execution time per category: (mean of relative medians)
clang-tidy made some suggestions
I've bumped Tracy to a post-release version, since wolfpld/tracy#854 has now been merged. This allows us to explicitly specify the order of threads within the profiler view. We might want to add documentation on how to run a Celerity application against Tracy.
Awesome stuff.
Some inline notes. Also, this PR should update `CHANGELOG.md`!
I've also added |
816c6ea to 8493e15
Pull Request Test Coverage Report for Build 10325120468
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
clang-tidy made some suggestions
788418c to 111d2b6
Check-perf-impact results: (5b2139f73fb4c21b4bcab6e559cd8c5a)
Relative execution time per category: (mean of relative medians)
Very cool, ship it!
This integrates the runtime (particularly scheduler and executor) with Tracy version 0.11 for profiling support. This will allow us to optimize the runtime, and allow users to find which kernels and other operations their application spends the most time in.

To compile Celerity with Tracy support, pass `-DCELERITY_TRACY_SUPPORT=ON` (default `OFF`) to CMake. To enable tracing at runtime, set the environment variable `CELERITY_TRACY=off|fast|full` (default `off`). `fast` emits the same zones as `full`, but avoids attaching debug information that takes measurable time to stringify inside Celerity. Leaving the runtime setting at `off` should have no visible performance impact.

Inside Celerity, tracing is made available through the `CELERITY_DETAIL_TRACY_*` macros from tracy.h, which expand to Tracy code when support is enabled, or to an empty token list when disabled. Inside `live_executor.cc`, additional context is maintained to track the asynchronous completion of instructions as Fibers. Also, the arguments formerly passed to the `CELERITY_TRACE` macro can be captured and copied into zone text to avoid duplicating the debug stringification.

The PR performs four additional, minor changes to the runtime:

- Move `host_config` (an MPI dependency) out of `config` in order to parse the environment and enable / disable Tracy before the call to `MPI_Init`.
- `CELERITY_TRACY=full`
- Add an `estimated_global_memory_traffic_bytes` attribute to `device_kernel_instruction`. It estimates the SM <-> DRAM traffic inside a kernel from accessor data types and range mappers. This allows us to print a throughput estimate on device kernel zones, which is relevant for weak-scaling problems.
- Add `backend::init`, which triggers any potential backend device initialization on executor thread startup. Without it, that initialization would delay (and distort the execution time of) the first instructions executed per device.

In CI, Tracy support is enabled in (and only in) the SimSYCL release build, so we do get notified about compile errors but not much else. Let me know if you can come up with a better automation setup.
Please experiment with this integration a bit and feel free to push fixups for zone texts and the somewhat arbitrary zone "color scheme" I have come up with.
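To illustrate the kind of throughput figure that `estimated_global_memory_traffic_bytes` makes possible, here is a rough sketch. It is not the code from this PR: the `accessor_estimate` struct and both functions are hypothetical, and it assumes that every element covered by an accessor's range mapper crosses the memory bus exactly once, which ignores caching and partial access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one accessor of a device kernel.
struct accessor_estimate {
    std::size_t element_size_bytes; // sizeof the accessor's data type
    std::size_t mapped_elements;    // number of elements the range mapper covers
};

// Sums the bytes touched by all accessors, assuming each mapped element
// is transferred exactly once (an assumption of this sketch).
inline std::size_t estimate_traffic_bytes(const std::vector<accessor_estimate>& accessors) {
    std::size_t total = 0;
    for(const auto& acc : accessors) {
        total += acc.element_size_bytes * acc.mapped_elements;
    }
    return total;
}

// Converts an estimated byte count and a measured kernel duration into GB/s.
inline double estimate_throughput_gb_per_s(std::size_t traffic_bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(traffic_bytes) / seconds / 1e9;
}
```

For example, a kernel that reads one buffer and writes another, each with 1048576 four-byte elements, would be estimated at 8388608 bytes of traffic; if the kernel zone lasts one millisecond, that corresponds to roughly 8.4 GB/s.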