-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new feature for profiling training runs #782
new feature for profiling training runs #782
Conversation
Added tests for the simple profiler. Not sure the best way to test the advanced profiler. The current output is string which is somewhat tricky/annoying to parse. There was a recent PR which allowed for getting profile stats as a dict, but that was only merged into Python a couple weeks ago. |
I'm planning to add more documentation regarding usage, but would like to get some input to make sure this is generally an acceptable solution. |
looks promising, just need to have deeper look... :] |
great! I’ll go ahead and work on more documentation this evening |
maybe wait for @williamFalcon first screening opinion before you start with fine pruning... ;] |
|
||
def start(self, action_name): | ||
if action_name in self.current_actions: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider some kind of soft information (just a logger warning) since it is a complementary service...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave it some thought and would prefer to keep the exception. For the default profiled actions, this will be tested and we'll ensure that everything works properly (verifying that no tests raise the exception). If the user is profiling additional actions, I would think they would want to know when their code is incorrect. The only time this error is raised is if someone calls profiler.stop()
without calling profiler.start()
or vice versa.
In reality, I don't even think we should document the start()
and stop()
interfaces to the user (since it can quickly get very messy), following the principle of having one way to do things.
Force User Decisions To Best Practices
There are 1,000 ways to do something. However, something eventually becomes standard practice that everyone does. Thus we pick one way of doing it and force everyone to do it this way.
|
||
def describe(self): | ||
def print_row(action, mean, std_dev): | ||
print(f"{action:<20s}\t| {mean:<15}\t| {std_dev:<15}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use logging
""" | ||
|
||
def __init__(self, output_filename=None): | ||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
''' >> """
I used the LightningTemplateModel to verify that the profiler behaves and outputs as expected. The printed summary could definitely be presented better. Should we keep the reporting in seconds or use something like milliseconds? Any other tips to clean up this summary? Also, we should still print the summary when the user exits early using "Ctrl C" to trigger a KeyboardEscape. Where would I need to make a change to make sure that's the case?
|
Ok, I added a method to the base profiler which we can use when we need to profile calling I also stopped reporting standard deviations for durations and instead report the total time spent per action. I think this will be more informative.
|
@jeremyjordan this is so sick haha. great job! |
@williamFalcon thanks! i'm also excited about the AdvancedProfiler as this may help people who really want to get into the details. i'm envisioning they'll start with Profiler and switch to AdvancedProfiler if they decide they want to try optimizing a step. the output for AdvancedProfiler is something like: (for each action)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, Thx!
import numpy as np | ||
|
||
|
||
def test_simple_profiler(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a test for the advanced profiler
docs/source/profiler.rst
Outdated
.. role:: hidden | ||
:class: hidden-section | ||
|
||
Profiler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "Performance Profiler" ?
@jeremyjordan amazing job! Let's make sure the docs and tests are thorough here. In addition, I'd love to get a medium post about this. I can co-publish with you if you want, so we can get it on a lot of publications. email me to discuss |
You can make a #publishing channel on slack ;) |
it's good for discovery because we can also publish in towards data science, kdnuggets, etc... |
yeah i'd love to do a blog post on this! i agree that great documentation (including blog posts) is important for user adoption. happy to cross-post on my blog as well. |
@williamFalcon i've added more documentation, a quick test for the AdvancedProfiler, and made the update to the Trainer to allow for the usage of |
@jeremyjordan can we make the import:
|
the profiler is in pytorch_lightning/utilities/profiler.py |
@Borda we need organization in the package. i think we need a collection called profilers because it allows other profilers to enter lightning as well. much like we have loggers and callbacks |
should we move |
yes. i kind of hate utility folders, it usually means we didn’t think hard about organization :) |
awesome! i'll make the change. haha i feel the same way, |
sure, do you want to do it in 0.6.1 or in next release 0.7? |
we can add the organization over time. right now since profilers is new we need to do it now |
Sure, just think about versions since the restructure on utils API would be better in a new release, like 0.7 https://github.com/PyTorchLightning/pytorch-lightning/milestone/5 |
i don’t think we have any utils api right now... the point is to avoid utils. |
Before submitting
What does this PR do?
Fixes #753
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.