-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements to pdep logging #1765
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1765 +/- ##
=========================================
- Coverage 32.6% 32.6% -0.01%
=========================================
Files 87 87
Lines 26115 26116 +1
Branches 6875 6876 +1
=========================================
Hits 8516 8516
+ Misses 16641 16630 -11
- Partials 958 970 +12
Continue to review full report at Codecov.
|
Nice optimization! |
Instead of pre-formatting strings provided to the logging module, use sprintf style formatting and provide arguments directly. This way, the logging module can decide whether or not to format the strings, which can take a significant amount of time. Also, this converts a few statements from warning to debug in order to clean up the log file for pdep jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Motivation or Problem
This PR brings substantial performance improvements to RMG pressure dependence jobs by cutting down unnecessary costs associated with logging.
Description of Changes
The key change is related to a subtlety in using the logging module. We commonly call logging with statements such as
When executing this, Python will first perform the string formatting and pass the result to
logging.debug
. Then, regardless of whether or not the logging level requires this to be printed to the log file, the string formatting will have been performed.However, the intended usage of logging is by directly providing the arguments to be formatted:
Note that sprintf-style formatting is required. In this case, the logging module will perform the string formatting only if the logging level is such that this statement needs to be logged.
In the
rmgpy.pdep
module, there were a number oflogging.debug
calls which were formatting Network objects and numpy arrays which took a substantial amount of time to convert to strings.This PR changes the calls which were taking the most time to use the more efficient syntax. Additionally, the logging levels for
were adjusted to clean up RMG.log a bit. The "Significant corrections" line was kept at warning while the others were changed to debug.
Testing
I tested this on a dodecane pyrolysis model which was set to terminate at 100 species. The model took 58 minutes to complete on master and 22 minutes on this branch, or ~2.6x faster. The runtime for
update_unimolecular_reaction_networks
was reduced by ~5x.Reviewer Tips
Try a pdep job and compare performance to master.