Add performance profiling tool to enable programmatic/interactive/automatic PT-XLA profile capturing #3498

thisisalbertliang · 2022-04-13T19:21:06Z

Resolves #3471

Pull Request Summary

This PR adds a performance profiling utility script (capture_profile.py) that enables interactive and/or automatic PT-XLA profile capturing from the command line. The motivation behind capture_profile.py is to give PT-XLA users a simple way to re-use torch_xla.debug.profiler.trace for tracing their training jobs.
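
For context, a minimal sketch of a single capture built on torch_xla.debug.profiler.trace is shown here. The wrapper name `capture_once` and its `trace_fn` parameter are hypothetical and are not taken from capture_profile.py; the sketch only assumes that a profiler server is already running in the training process (for example, one started with `torch_xla.debug.profiler.start_server(9001)`).

```python
from typing import Callable, Optional


def capture_once(service_addr: str,
                 logdir: str,
                 duration_ms: int,
                 trace_fn: Optional[Callable[..., None]] = None) -> None:
    """Capture a single profile from a running profiler server.

    `capture_once` and `trace_fn` are illustrative names, not part of
    capture_profile.py. `trace_fn` exists only so the call can be stubbed
    out on machines without torch_xla.
    """
    if trace_fn is None:
        # Imported lazily so this module loads even without torch_xla.
        import torch_xla.debug.profiler as xp
        trace_fn = xp.trace
    # service_addr is the "host:port" of the profiler server; the captured
    # trace is written under logdir.
    trace_fn(service_addr, logdir, duration_ms=duration_ms)
```

With torch_xla installed, `capture_once("localhost:9001", "gs://path/to/logdir", 20000)` would request one 20-second trace, roughly what a single capture from the script is expected to do.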

How to Run

Instructions on how to run capture_profile.py are already clearly documented in the script itself.

Here are some example run commands:

python3 capture_profile.py --service_addr "localhost:9001" --logdir "gs://path/to/logdir" --duration_ms 20000 --interactive loop

python3 capture_profile.py --service_addr "10.0.0.2:9001" --logdir "gs://path/to/logdir" --duration_ms 30000 --automatic 100 60
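
As an illustration of what automatic capturing could look like, here is a generic sketch of a repeated-capture loop. This thread does not spell out what the two values passed to `--automatic` mean, so `num_captures` and `interval_s` are hypothetical names chosen for the sketch, and `capture_repeatedly` is not a function from capture_profile.py.

```python
import time
from typing import Callable


def capture_repeatedly(trace_fn: Callable[[], None],
                       num_captures: int,
                       interval_s: float,
                       sleep_fn: Callable[[float], None] = time.sleep) -> int:
    """Call trace_fn num_captures times, pausing interval_s between calls.

    All names here are hypothetical. Returns the number of captures that
    completed without raising.
    """
    succeeded = 0
    for i in range(num_captures):
        try:
            trace_fn()
            succeeded += 1
        except Exception as exc:
            # One failed capture should not abort the remaining ones.
            print(f"capture {i + 1}/{num_captures} failed: {exc}")
        # Sleep between captures, but not after the last one.
        if i < num_captures - 1:
            sleep_fn(interval_s)
    return succeeded
```

In practice `trace_fn` would be a zero-argument callable that wraps a call to torch_xla.debug.profiler.trace with a fixed address, log directory, and duration.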

For Googlers

See b/226973507 for more details.

miladm · 2022-04-13T19:50:31Z

README.md

@@ -338,6 +338,7 @@ With PyTorch/XLA we provide a set of performance profiling tooling and auto-metr
 * [Official tutorial](https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm)
 * [Colab notebook](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/pytorch-xla-profiling-colab.ipynb)
 * [Sample MNIST training script with profiling](https://github.com/pytorch/xla/blob/master/test/test_profile_mp_mnist.py)
+* [Utility script for capturing performance profiles](https://github.com/pytorch/xla/blob/master/scripts/capture_profile.py)


Please add a reference on how to load the profiled files on tensorboard.

I suggest we port some of the comments in this README to PyTorch/XLA. I wonder if the README is the best place. Alternatively, we can consider another documentation guide on "Profiling" in PyTorch/XLA.

@JackCaoG wdyt?

@miladm

I am not sure where to add the tensorboard instructions. It seems like the Official tutorial and Colab notebook links in the README already teach how to use tensorboard anyway?

For now, I have added instructions on how to run tensorboard in the module docstring for capture_profile.py. Let me know if you think there is a better location for these instructions.

"""A utility script for capturing PyTorch/XLA performance profiles interactively and/or automatically.

Example run commands:

  $ python3 capture_profile.py --service_addr "localhost:9001" --logdir "gs://path/to/logdir" --duration_ms 20000 --interactive loop

  $ python3 capture_profile.py --service_addr "10.0.0.2:9001" --logdir "gs://path/to/logdir" --duration_ms 30000 --automatic 100 60

Once you have captured & saved the performance profiles, you can view them using Tensorboard.

Example commands to launch the Tensorboard server:

  $ (vm) tensorboard --logdir "gs://path/to/logdir" --port 8001

  $ tensorboard --logdir "/local/path/to/logdir" --port 8001

After that, visit http://localhost:8001/#profile on your machine to view the performance profile in Tensorboard.
"""
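
When launching Tensorboard from a Python wrapper rather than a shell, the log directory and the port have to be passed as separate arguments. A minimal sketch follows; the helper name `tensorboard_command` is hypothetical, and only the `--logdir` and `--port` flags from the docstring are used.

```python
import shlex
from typing import List


def tensorboard_command(logdir: str, port: int) -> List[str]:
    """Build the argv list for launching Tensorboard on a log directory.

    Hypothetical helper: each flag and each value is its own list element,
    so the port can never end up inside the logdir string.
    """
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]


cmd = tensorboard_command("gs://path/to/logdir", 8001)
# Shell-equivalent form of the command, useful for logging.
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where Tensorboard is installed.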

miladm

Thanks @thisisalbertliang

LGTM after tests pass. Need linter update.

thisisalbertliang · 2022-04-13T23:02:55Z

@miladm

I just reformatted capture_profile.py using yapf. The linter test is passing now.

Let me know if we should merge this PR now.

miladm · 2022-04-14T00:43:18Z

Great. Thanks @thisisalbertliang.
All tests need to pass before merging, even though your changes are in Python.
I think rerunning the build or pushing an empty commit triggers the process.

JackCaoG · 2022-06-28T04:58:40Z

@miladm I found this PR while writing the 1.12 release notes. We should add this to the Troubleshooting doc.

thisisalbertliang · 2022-06-28T13:47:10Z

> @miladm I found this PR while writing the 1.12 release notes. We should add this to the Troubleshooting doc.

Thanks @JackCaoG. Just made a quick PR to add capture_profile.py to TROUBLESHOOTING.md. lmk if this is what you wanted.

PR link: #3670

thisisalbertliang added 2 commits April 13, 2022 14:44

Add performance profiling tool to enable programmatic/interactive/automatic PT-XLA profile capturing

8ff4843

Update README to document capture_profile.py

5d84c00

thisisalbertliang added the enhancement label Apr 13, 2022

thisisalbertliang requested review from miladm, yeounoh, will-cromar and JackCaoG April 13, 2022 19:21

thisisalbertliang self-assigned this Apr 13, 2022

Add module docstring to capture_profile.py

a6afa55

miladm reviewed Apr 13, 2022

Add instructions on how to run tensorboard in module docstring for capture_profile.py

f2ea216

miladm approved these changes Apr 13, 2022

Reformat capture_profile.py using yapf

531266d

thisisalbertliang mentioned this pull request Apr 14, 2022

Add PyTorch/XLA performance profiling functionality to the trainer package thisisalbertliang/training#24

Merged

miladm merged commit db0c7e8 into master Apr 14, 2022

miladm deleted the capture_profile branch April 14, 2022 05:59

thisisalbertliang mentioned this pull request Jun 28, 2022

Add capture_profile.py to TROUBLESHOOTING.md #3670

Merged
