Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduced print statement frequency for exceeded max number of objects.… #1755

Merged
merged 1 commit into from
Oct 10, 2019

Conversation

kspieks
Copy link
Contributor

@kspieks kspieks commented Oct 8, 2019

… It used to print 'Exceeded max number of objects...removing excess objects' during each iteration. Now, using the boolean variable invalid_objects_print_boolean, it only prints this on the first iteration that the if statement is triggered.

Motivation or Problem

Richard West opened an issue on Jun 18 saying that, "The log file sometimes contains hundreds or thousands of consecutive lines like this:
Exceeded max number of objects...removing excess objects
Exceeded max number of objects...removing excess objects
Exceeded max number of objects...removing excess objects
"
Both Max Liu and Matt Johnson agreed that this was too verbose. Instead, this message should only be printed once, rather than each time during the loop. Although this original feature would help in telling how the object cap is impacting the run, it was concluded that most users are not interested in analyzing that and it's better to only print it once.

Description of Changes

To make this message only be printed during the first time the number of return objects exceeds the object cap, a new boolean variable was introduced before the while loop. The variable is called invalid_objects_print_boolean. It is set to True before the while loop and changed to False after the message is printed once. Feel free to change the variable name if it is too long haha

Other alternatives that could have been implemented are:

  1. Delete the line entirely or comment it out so it never prints. This was deemed too sloppy
  2. Continue printing the line during each iteration, but move the cursor to the front of the line so it re-prints over the message. This would prevent the message from taking up so many lines since it would always be printed on the same line

Testing

I tried running eg1, propane_branching, and liquid_phase_constSPC. Each of these examples seems to print the message only once before the change. I have not found an example that would trigger this print message multiple times to verify that this change is successful. Let me know if there is an input file I can use to test. Otherwise, hopefully this change is straightforward enough to be understood without testing

Reviewer Tips

Suggestions for verifying that this PR works or other notes for the reviewer.

… It used to print this line during each iteration. Now, using the boolean invalid_objects_print_boolean, it only prints this on the first iteration that the if statement is triggered
@codecov
Copy link

codecov bot commented Oct 8, 2019

Codecov Report

Merging #1755 into master will decrease coverage by 0.86%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1755      +/-   ##
==========================================
- Coverage   32.61%   31.75%   -0.87%     
==========================================
  Files          87       87              
  Lines       26103    26103              
  Branches     6869     6869              
==========================================
- Hits         8514     8289     -225     
- Misses      16631    16878     +247     
+ Partials      958      936      -22
Impacted Files Coverage Δ
rmgpy/qm/molecule.py 24.77% <0%> (-55.41%) ⬇️
rmgpy/qm/mopac.py 27.05% <0%> (-42.95%) ⬇️
rmgpy/qm/qmdata.py 55.88% <0%> (-29.42%) ⬇️
rmgpy/qm/main.py 50% <0%> (-15.33%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29e111d...4db605b. Read the comment docs.

@mliu49
Copy link
Contributor

mliu49 commented Oct 9, 2019

Thank you for your contribution! I think your implementation of this fix looks great.

As a minor note, we generally recommend pushing feature branches to the ReactionMechanismGenerator repository and making pull requests from here, rather than from personal forks. The main reason is to facilitate collaboration on branches. Also, RMG-tests builds are triggered by Travis push builds for this repo. There isn't any need to recreate this PR, but it would be best for future PRs.


These changes also prompted me to look into exactly how we handle invalid objects, and I have some questions/concerns which I think we should look deeper into. Perhaps @mjohnson541 can comment here.

The current behavior seems to be that on each time step, we will build a list of new_objects which can be species, reactions, or pdep networks in that order. We will then check the list of invalid_objects, which is carried over from time step to time step, and if the number of invalid objects is less than max_num_objs_per_iter, we will add objects from new_objects to invalid_objects up to the limit of max_num_objs_per_iter.

  1. Major concern: Why do we keep building the new_objects list after invalid_objects is full? Evaluating all of the selection criteria is not free, so it seems like there's no point in compiling new_objects if we're just going to throw them out. As demonstrated by the issue this PR fixes, this is not a trivial number of objects.
  2. Smaller question: The new_objects list is compiled by getting all species which exceed the move to core tolerance, followed by reactions selected by branching, followed by pdep networks. If species are enough to reach max_num_objs_per_iter, then reactions or pdep networks may not get considered on that iteration. This is not a simple question, but should there be some other criteria by which these are chosen besides the order that the code is written in, or is this already an ideal order?

@kspieks
Copy link
Contributor Author

kspieks commented Oct 9, 2019 via email

@mjohnson541
Copy link
Contributor

@kspieks Yep!

@mliu49 so the trick is that the code need reconfigured to do that. Creating the object's list has negligible computational cost, but evaluating the edge fluxes does not. However as currently configured the solver function computes the edge fluxes as a part of the function evaluations.

I have a branch (checkEdge) where I reformatted the function calls to make solver evaluations and edge checks independent operations, on that branch this is possible. On that branch you could simply tell it to stop evaluating edge fluxes when you haven't hit interrupt tolerance. For pruning runs my guess is it might give you as much as a 30% overall solver time reduction.

@mliu49 mliu49 merged commit e2c8931 into ReactionMechanismGenerator:master Oct 10, 2019
@hwpang
Copy link
Contributor

hwpang commented Oct 12, 2019

@kspieks Yep!

@mliu49 so the trick is that the code need reconfigured to do that. Creating the object's list has negligible computational cost, but evaluating the edge fluxes does not. However as currently configured the solver function computes the edge fluxes as a part of the function evaluations.

I have a branch (checkEdge) where I reformatted the function calls to make solver evaluations and edge checks independent operations, on that branch this is possible. On that branch you could simply tell it to stop evaluating edge fluxes when you haven't hit interrupt tolerance. For pruning runs my guess is it might give you as much as a 30% overall solver time reduction.

This might not be related to this PR, but would you mind providing more context on why does RMG use invalid_objects and move new_objects to this object list? :-D

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants