Add performance profiling tool to enable programmatic/interactive/automatic PT-XLA profile capturing #3498
@@ -338,6 +338,7 @@ With PyTorch/XLA we provide a set of performance profiling tooling and auto-metr
* [Official tutorial](https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm)
* [Colab notebook](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/pytorch-xla-profiling-colab.ipynb)
* [Sample MNIST training script with profiling](https://github.com/pytorch/xla/blob/master/test/test_profile_mp_mnist.py)
* [Utility script for capturing performance profiles](https://github.com/pytorch/xla/blob/master/scripts/capture_profile.py)
Please add a reference on how to load the profiled files in TensorBoard.
I am not sure where to add the TensorBoard instructions. It seems like the Official tutorial and Colab notebook links in the README already teach how to use TensorBoard anyway?
For now, I have added instructions on how to run TensorBoard in the module docstring for `capture_profile.py`. Let me know if you think there is a better location for these instructions.
"""A utility script for capturing PyTorch/XLA performance profiles interactively and/or automatically
Example run commands:
$ python3 capture_profile.py --service_addr "localhost:9001" --logdir "gs://path/to/logdir" --duration_ms 20000 --interactive loop
$ python3 capture_profile.py --service_addr "10.0.0.2:9001" --logdir "gs://path/to/logdir" --duration_ms 30000 --automatic 100 60
Once you have captured & saved the performance profiles, you can view them using Tensorboard.
Example commands to launch the Tensorboard server:
$ (vm) tensorboard --logdir "gs://path/to/logdir" --port 8001
$ tensorboard --logdir "/local/path/to/logdir" --port 8001
After that, visit http://localhost:8001/#profile on your machine to view the performance profile in Tensorboard.
"""
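When launching TensorBoard, `--logdir` and `--port` have to reach it as separate arguments, so the closing quote belongs directly after the log directory. The sketch below uses Python's standard `shlex` module to show how a POSIX shell would tokenize the command. The helper name `tensorboard_argv` is hypothetical, and the path and port are the placeholder values from the docstring.

```python
import shlex


def tensorboard_argv(logdir, port):
    """Build an argv list for launching TensorBoard (hypothetical helper)."""
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]


# Placeholder values taken from the docstring examples.
argv = tensorboard_argv("gs://path/to/logdir", 8001)

# shlex.join produces a shell-safe command string from the argv list.
command = shlex.join(argv)
print(command)

# If the closing quote is placed after the port instead, the shell passes
# the port flag as part of the log directory string.
misquoted = 'tensorboard --logdir "gs://path/to/logdir --port 8001"'
tokens = shlex.split(misquoted)
print(tokens)  # three tokens; "--port 8001" is swallowed into the logdir
```

Passing the list form directly to `subprocess.run(argv)` would avoid shell quoting altogether.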
Thanks @thisisalbertliang
LGTM after tests pass. Need linter update.
I just reformatted. Let me know if we should merge this PR now.
Great. Thanks @thisisalbertliang.
@miladm found this PR while writing the 1.12 release notes. We should add this to the Troubleshooting doc.
Resolves #3471
Pull Request Summary
This PR adds a performance profiling utility script (`capture_profile.py`) that enables interactive and/or automatic PT-XLA profile capturing from the command line. The motivation behind `capture_profile.py` is to provide PT-XLA users with an easy and simple way to re-use `torch_xla.debug.profiler.trace` for tracing their training jobs.
How to Run
Instructions on how to run `capture_profile.py` are already clearly documented in the script itself.
Here are some example run commands:
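The run commands in the script's docstring ultimately drive calls to `torch_xla.debug.profiler.trace`. As a rough illustration only, here is a minimal sketch of a repeated capture loop built around that function. It is not the code in `capture_profile.py`: the helper `capture_profiles` and its `num_captures` and `interval_s` parameters are assumptions made for this example, and they are not meant to describe what the `--automatic` arguments mean. The trace function is injected so the loop can be exercised without a TPU or a running profiler server.

```python
import time


def capture_profiles(trace_fn, service_addr, logdir, duration_ms,
                     num_captures, interval_s, sleep_fn=time.sleep):
    """Call trace_fn num_captures times, pausing interval_s between captures.

    Hypothetical helper: trace_fn is expected to accept
    (service_addr, logdir, duration_ms=...), which matches how
    torch_xla.debug.profiler.trace is called in the sample scripts.
    """
    for i in range(num_captures):
        trace_fn(service_addr, logdir, duration_ms=duration_ms)
        if i < num_captures - 1:  # no pause needed after the last capture
            sleep_fn(interval_s)
    return num_captures


# Stand-in for the real trace function so the sketch runs anywhere.
calls = []


def fake_trace(service_addr, logdir, duration_ms=1000):
    calls.append((service_addr, logdir, duration_ms))


done = capture_profiles(fake_trace, "localhost:9001", "/tmp/profiles",
                        duration_ms=20000, num_captures=3, interval_s=60,
                        sleep_fn=lambda seconds: None)  # skip real sleeping
print(done, len(calls))

# With torch_xla installed and a profiler server listening, the real call
# would presumably look like this:
#   import torch_xla.debug.profiler as xp
#   capture_profiles(xp.trace, "localhost:9001", "gs://path/to/logdir",
#                    duration_ms=20000, num_captures=3, interval_s=60)
```

Injecting `trace_fn` keeps the scheduling logic separate from the profiler dependency, which also makes it straightforward to unit test.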
For Googlers
See b/226973507 for more details.