Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements to pdep logging #1765

Merged
merged 1 commit into from
Oct 21, 2019
Merged

Performance improvements to pdep logging #1765

merged 1 commit into from
Oct 21, 2019

Conversation

mliu49
Copy link
Contributor

@mliu49 mliu49 commented Oct 12, 2019

Motivation or Problem

This PR brings substantial performance improvements to RMG pressure dependence jobs by cutting down unnecessary costs associated with logging.

Description of Changes

The key change is related to a subtlety in using the logging module. We commonly call logging with statements such as

logging.debug('Something happened to {0}'.format(object))

When executing this, Python will first perform the string formatting and pass the result to logging.debug. Then, regardless of whether or not the logging level requires this to be printed to the log file, the string formatting will have been performed.

However, the intended usage of logging is by directly providing the arguments to be formatted:

logging.debug('Something happened to %', object)

Note that sprintf-style formatting is required. In this case, the logging module will perform the string formatting only if the logging level is such that this statement needs to be logged.

In the rmgpy.pdep module, there were a number of logging.debug calls which were formatting Network objects and numpy arrays which took a substantial amount of time to convert to strings.

This PR changes the calls which were taking the most time to use the more efficient syntax. Additionally, the logging levels for

Warning: For path reaction C[CH]C1CC1(1088) <=> CC1[CH]CC1(390):
Warning:     Expected kf(370.585 K) = 1.65613e-10
Warning:       Actual kf(370.585 K) = 1.27673e-10
Warning:     Expected Keq(370.585 K) = 3.0015
Warning:       Actual Keq(370.585 K) = 0
Warning: Significant corrections to k(E) to be consistent with high-pressure limit for path reaction "C[CH]C1CC1(1088) <=> CC1[CH]CC1(390)".

were adjusted to clean up RMG.log a bit. The "Significant corrections" line was kept at warning while the others were changed to debug.

Testing

I tested this on a dodecane pyrolysis model which was set to terminate at 100 species. The model took 58 minutes to complete on master and 22 minutes on this branch, or ~2.6x faster. The runtime for update_unimolecular_reaction_networks was reduced by ~5x.

Reviewer Tips

Try a pdep job and compare performance to master.

@codecov
Copy link

codecov bot commented Oct 12, 2019

Codecov Report

Merging #1765 into master will decrease coverage by <.01%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #1765      +/-   ##
=========================================
- Coverage    32.6%   32.6%   -0.01%     
=========================================
  Files          87      87              
  Lines       26115   26116       +1     
  Branches     6875    6876       +1     
=========================================
  Hits         8516    8516              
+ Misses      16641   16630      -11     
- Partials      958     970      +12
Impacted Files Coverage Δ
rmgpy/pdep/network.py 12.45% <0%> (-0.03%) ⬇️
rmgpy/data/statmech.py 42.2% <0%> (ø) ⬆️
rmgpy/rmg/pdep.py 12.21% <0%> (ø) ⬆️
rmgpy/reaction.py 0% <0%> (ø) ⬆️
rmgpy/data/kinetics/family.py 48.28% <0%> (ø) ⬆️
rmgpy/yml.py 15.71% <0%> (ø) ⬆️
rmgpy/data/kinetics/database.py 50.61% <0%> (ø) ⬆️
rmgpy/molecule/molecule.py 0% <0%> (ø) ⬆️
rmgpy/statmech/ndTorsions.py 59.78% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 447ff5e...851ba94. Read the comment docs.

@rwest
Copy link
Member

rwest commented Oct 12, 2019

Nice optimization!
Obvious in hindsight, clearly not adding complexity, and significant time saving. Well done figuring it out.

@mjohnson541 mjohnson541 requested review from mjohnson541 and removed request for alongd October 15, 2019 17:47
Instead of pre-formatting strings provided to the logging module,
use sprintf style formatting and provide arguments directly.

This way, the logging module can decide whether or not to
format the strings, which can take a significant amount of time.

Also, this converts a few statements from warning to debug
in order to clean up the log file for pdep jobs.
Copy link
Contributor

@kspieks kspieks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@mbprend mbprend self-assigned this Oct 21, 2019
@mbprend
Copy link
Contributor

mbprend commented Oct 21, 2019

Hello @mliu49 and @kspieks,

I can work on this and push some commits. Would you like me to fork from your branch?

@mliu49 mliu49 merged commit ceadaf8 into master Oct 21, 2019
@mliu49 mliu49 deleted the logging branch October 21, 2019 20:28
@mliu49 mliu49 mentioned this pull request Dec 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants