Reduced print statement frequency for exceeded max number of objects.… #1755

kspieks · 2019-10-08T21:45:59Z

… It used to print 'Exceeded max number of objects...removing excess objects' during each iteration. Now, using the boolean variable invalid_objects_print_boolean, it only prints this on the first iteration that the if statement is triggered.

Motivation or Problem

Richard West opened an issue on Jun 18 saying that, "The log file sometimes contains hundreds or thousands of consecutive lines like this:
Exceeded max number of objects...removing excess objects
Exceeded max number of objects...removing excess objects
Exceeded max number of objects...removing excess objects
"
Both Max Liu and Matt Johnson agreed that this was too verbose. Instead, this message should only be printed once, rather than each time during the loop. Although this original feature would help in telling how the object cap is impacting the run, it was concluded that most users are not interested in analyzing that and it's better to only print it once.

Description of Changes

To make this message only be printed during the first time the number of return objects exceeds the object cap, a new boolean variable was introduced before the while loop. The variable is called invalid_objects_print_boolean. It is set to True before the while loop and changed to False after the message is printed once. Feel free to change the variable name if it is too long haha

Other alternatives that could have been implemented are:

Delete the line entirely or comment it out so it never prints. This was deemed too sloppy
Continue printing the line during each iteration, but move the cursor to the front of the line so it re-prints over the message. This would prevent the message from taking up so many lines since it would always be printed on the same line

Testing

I tried running eg1, propane_branching, and liquid_phase_constSPC. Each of these examples seems to print the message only once before the change. I have not found an example that would trigger this print message multiple times to verify that this change is successful. Let me know if there is an input file I can use to test. Otherwise, hopefully this change is straightforward enough to be understood without testing

Reviewer Tips

Suggestions for verifying that this PR works or other notes for the reviewer.

… It used to print this line during each iteration. Now, using the boolean invalid_objects_print_boolean, it only prints this on the first iteration that the if statement is triggered

codecov · 2019-10-08T22:18:39Z

Codecov Report

Merging #1755 into master will decrease coverage by 0.86%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1755      +/-   ##
==========================================
- Coverage   32.61%   31.75%   -0.87%     
==========================================
  Files          87       87              
  Lines       26103    26103              
  Branches     6869     6869              
==========================================
- Hits         8514     8289     -225     
- Misses      16631    16878     +247     
+ Partials      958      936      -22

Impacted Files	Coverage Δ
rmgpy/qm/molecule.py	`24.77% <0%> (-55.41%)`	⬇️
rmgpy/qm/mopac.py	`27.05% <0%> (-42.95%)`	⬇️
rmgpy/qm/qmdata.py	`55.88% <0%> (-29.42%)`	⬇️
rmgpy/qm/main.py	`50% <0%> (-15.33%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29e111d...4db605b. Read the comment docs.

mliu49 · 2019-10-09T22:28:27Z

Thank you for your contribution! I think your implementation of this fix looks great.

As a minor note, we generally recommend pushing feature branches to the ReactionMechanismGenerator repository and making pull requests from here, rather than from personal forks. The main reason is to facilitate collaboration on branches. Also, RMG-tests builds are triggered by Travis push builds for this repo. There isn't any need to recreate this PR, but it would be best for future PRs.

These changes also prompted me to look into exactly how we handle invalid objects, and I have some questions/concerns which I think we should look deeper into. Perhaps @mjohnson541 can comment here.

The current behavior seems to be that on each time step, we will build a list of new_objects which can be species, reactions, or pdep networks in that order. We will then check the list of invalid_objects, which is carried over from time step to time step, and if the number of invalid objects is less than max_num_objs_per_iter, we will add objects from new_objects to invalid_objects up to the limit of max_num_objs_per_iter.

Major concern: Why do we keep building the new_objects list after invalid_objects is full? Evaluating all of the selection criteria is not free, so it seems like there's no point in compiling new_objects if we're just going to throw them out. As demonstrated by the issue this PR fixes, this is not a trivial number of objects.
Smaller question: The new_objects list is compiled by getting all species which exceed the move to core tolerance, followed by reactions selected by branching, followed by pdep networks. If species are enough to reach max_num_objs_per_iter, then reactions or pdep networks may not get considered on that iteration. This is not a simple question, but should there be some other criteria by which these are chosen besides the order that the code is written in, or is this already an ideal order?

kspieks · 2019-10-09T22:53:28Z

Yea of course! Thanks for reviewing and definitely tag me if you find other basic issues that I could use to get familiar with RMG :) Ah yes. I was having some permission error while trying to push to the RMG master branch yesterday. Matt resolved that by adding me to the contributor list, but while trying to work around it at the time, I ended up pushing it to the fork on my personal account and making the pull request from there. But yes, in the future I plan to - create a new branch from the master branch of RMG - edit and commit things to this branch and push it to RMG (which the group seems to denote as official). This allows others to pull from this so we can collaborate - eventually create a pull request from this branch and later delete the branch if it is incorporated into master. Basically, never touch the forked copy on my personal account during the overall workflow Let me know if this is the accepted workflow in the group. Best, Kevin

…

On Wed, Oct 9, 2019 at 6:28 PM Max Liu ***@***.***> wrote: Thank you for your contribution! I think your implementation of this fix looks great. As a minor note, we generally recommend pushing feature branches to the ReactionMechanismGenerator repository and making pull requests from here, rather than from personal forks. The main reason is to facilitate collaboration on branches. Also, RMG-tests builds are triggered by Travis push builds for this repo. There isn't any need to recreate this PR, but it would be best for future PRs. ------------------------------ These changes also prompted me to look into exactly how we handle invalid objects, and I have some questions/concerns which I think we should look deeper into. Perhaps @mjohnson541 <https://github.com/mjohnson541> can comment here. The current behavior seems to be that on each time step, we will build a list of new_objects which can be species, reactions, or pdep networks *in that order*. We will then check the list of invalid_objects, which is carried over from time step to time step, and if the number of invalid objects is less than max_num_objs_per_iter, we will add objects from new_objects to invalid_objects up to the limit of max_num_objs_per_iter. 1. Major concern: Why do we keep building the new_objects list after invalid_objects is full? Evaluating all of the selection criteria is not free, so it seems like there's no point in compiling new_objects if we're just going to throw them out. As demonstrated by the issue this PR fixes, this is not a trivial number of objects. 2. Smaller question: The new_objects list is compiled by getting all species which exceed the move to core tolerance, followed by reactions selected by branching, followed by pdep networks. If species are enough to reach max_num_objs_per_iter, then reactions or pdep networks may not get considered on that iteration. This is not a simple question, but should there be some other criteria by which these are chosen besides the order that the code is written in, or is this already an ideal order? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1755?email_source=notifications&email_token=AHNMWNEOOAPYHENN5J6B6KDQNZLI5A5CNFSM4I6XP6K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAZUUCQ#issuecomment-540232202>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AHNMWNF45CSNZFI532L2XATQNZLI5ANCNFSM4I6XP6KQ> .

mjohnson541 · 2019-10-09T23:42:52Z

@kspieks Yep!

@mliu49 so the trick is that the code need reconfigured to do that. Creating the object's list has negligible computational cost, but evaluating the edge fluxes does not. However as currently configured the solver function computes the edge fluxes as a part of the function evaluations.

I have a branch (checkEdge) where I reformatted the function calls to make solver evaluations and edge checks independent operations, on that branch this is possible. On that branch you could simply tell it to stop evaluating edge fluxes when you haven't hit interrupt tolerance. For pruning runs my guess is it might give you as much as a 30% overall solver time reduction.

hwpang · 2019-10-12T07:15:34Z

@kspieks Yep!

@mliu49 so the trick is that the code need reconfigured to do that. Creating the object's list has negligible computational cost, but evaluating the edge fluxes does not. However as currently configured the solver function computes the edge fluxes as a part of the function evaluations.

I have a branch (checkEdge) where I reformatted the function calls to make solver evaluations and edge checks independent operations, on that branch this is possible. On that branch you could simply tell it to stop evaluating edge fluxes when you haven't hit interrupt tolerance. For pruning runs my guess is it might give you as much as a 30% overall solver time reduction.

This might not be related to this PR, but would you mind providing more context on why does RMG use invalid_objects and move new_objects to this object list? :-D

Thanks!

Reduced print statement frequency for exceeded max number of objects.…

4db605b

… It used to print this line during each iteration. Now, using the boolean invalid_objects_print_boolean, it only prints this on the first iteration that the if statement is triggered

kspieks added Module: Cython Complexity: Low Module: Logging Good First Issue labels Oct 8, 2019

mliu49 approved these changes Oct 10, 2019

View reviewed changes

mliu49 merged commit e2c8931 into ReactionMechanismGenerator:master Oct 10, 2019

amarkpayne mentioned this pull request Oct 18, 2019

Reduce the frequency of "Exceeded max number of objects...removing excess objects" lines in the log. #1632

Closed

mliu49 mentioned this pull request Nov 26, 2019

RMG v3.0.0 Release Planning #1830

Closed

mliu49 mentioned this pull request Dec 16, 2019

RMG-Py v3.0.0 Release #1852

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduced print statement frequency for exceeded max number of objects.… #1755

Reduced print statement frequency for exceeded max number of objects.… #1755

kspieks commented Oct 8, 2019

codecov bot commented Oct 8, 2019 •

edited

Loading

mliu49 commented Oct 9, 2019

kspieks commented Oct 9, 2019 via email

mjohnson541 commented Oct 9, 2019

hwpang commented Oct 12, 2019

Reduced print statement frequency for exceeded max number of objects.… #1755

Reduced print statement frequency for exceeded max number of objects.… #1755

Conversation

kspieks commented Oct 8, 2019

Motivation or Problem

Description of Changes

Testing

Reviewer Tips

codecov bot commented Oct 8, 2019 • edited Loading

Codecov Report

mliu49 commented Oct 9, 2019

kspieks commented Oct 9, 2019 via email

mjohnson541 commented Oct 9, 2019

hwpang commented Oct 12, 2019

codecov bot commented Oct 8, 2019 •

edited

Loading