Remove unnecessary duplicate check #1824

kspieks · 2019-11-20T23:18:56Z

These lines checked if reactions were marked as 'Duplicate' when loading in a previously saved seed that was automatically generated by RMG during a previous run.

Motivation or Problem

Attempts to load a previously saved seed from a large mechanism never finished, so the job never reached its first iteration.

By "large", think PDD pyrolysis with 100,000+ reactions, each involving large molecules. The seed would load, but then every reaction was checked against every other reaction to make sure there were no duplicates. Performing subgraph isomorphism checks on every molecule did not complete within 24 hours, so in practice I could never restart a job from a seed because it took so long to get started. The check is also redundant, since the duplicate flags are preserved when the previous seed is loaded.
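To put the scale in perspective, here is a minimal sketch (not RMG's code) that counts the comparisons a naive all-pairs duplicate check makes. Reactions are modeled hypothetically as `(reactants, products)` pairs of frozensets of species labels, and the hypothetical `is_same_reaction` is a cheap stand-in for the graph isomorphism comparisons RMG performs on real molecules.

```python
from itertools import combinations


def is_same_reaction(rxn_a, rxn_b):
    """Hypothetical stand-in: compare reactant/product label sets in either direction."""
    (r_a, p_a), (r_b, p_b) = rxn_a, rxn_b
    return (r_a == r_b and p_a == p_b) or (r_a == p_b and p_a == r_b)


def find_duplicates_pairwise(reactions):
    """Compare every reaction against every other one, counting the comparisons made."""
    duplicates = []
    comparisons = 0
    for i, j in combinations(range(len(reactions)), 2):
        comparisons += 1
        if is_same_reaction(reactions[i], reactions[j]):
            duplicates.append((i, j))
    return duplicates, comparisons


def pairwise_comparison_count(n):
    """Number of unordered pairs among n reactions: n(n - 1) / 2."""
    return n * (n - 1) // 2


rxns = [
    (frozenset({"A", "B"}), frozenset({"C"})),
    (frozenset({"C"}), frozenset({"A", "B"})),  # reverse of the first reaction
    (frozenset({"D"}), frozenset({"E"})),
]
dups, n_comparisons = find_duplicates_pairwise(rxns)
print(dups, n_comparisons)                 # [(0, 1)] 3
print(pairwise_comparison_count(100_000))  # 4999950000
```

For 100,000 reactions the all-pairs approach needs about five billion comparisons; when each of those can involve isomorphism checks on large molecules, a load time beyond 24 hours is not surprising.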

Description of Changes

Deleted the redundant 'Duplicate' check from library.py. If the seed was automatically generated by a previous RMG run, all of its reactions were already checked for duplicates during that run.

Testing

All tests were done with the minimal example, using saveSeedModulus=10. The job was restarted from the seed saved at iteration 20, and the relevant files were compared with diff. The results are the same whether the duplicate-check lines are included or deleted, which gives me confidence that deleting these lines will not affect mechanism generation.

Check reactions.py immediately after loading
• Core: diff previous_restart/restart/reactions.py ../minimal/previous_seeds/iteration_number_20/seed/reactions.py yields nothing
• Edge: diff previous_restart/restart_edge/reactions.py ../minimal/previous_seeds/iteration_number_20/seed_edge/reactions.py yields nothing

Check reactions.py after completing the iteration
• Core: diff seed/seed/reactions.py ../minimal/seed/seed/reactions.py yields whitespace differences
• Edge: diff seed/seed_edge/reactions.py ../minimal/seed/seed_edge/reactions.py yields whitespace differences

Check Chemkin files
• Core: diff chemkin/chem_annotated.inp ../minimal/chemkin/chem_annotated.inp yields some differences, but they appear to be only in the numbers assigned to species. All numerical values look identical.
• Edge: diff chemkin/chem_edge_annotated.inp ../minimal/chemkin/chem_edge_annotated.inp again yields differences that appear to be only in species numbering; all numerical values look identical.
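The Chemkin comparison can be partly automated as well. This is a minimal sketch, not part of the PR: it assumes species labels end in a parenthesized integer index (e.g. `CH4(1)`), replaces that index with a placeholder, and then compares the files line by line, so pure renumbering is ignored while a change to a rate parameter is still caught. The regex and function names are illustrative, and the reaction lines are toy examples.

```python
import re

# Assumption: species labels end in a parenthesized integer index, e.g. "CH4(1)".
# "(+M)" falloff markers are not matched because they are not integers.
_SPECIES_INDEX = re.compile(r"(?<=\S)\(\d+\)")


def strip_species_indices(line):
    """Replace index suffixes like "(12)" with a placeholder so renumbering is ignored."""
    return _SPECIES_INDEX.sub("(#)", line)


def chemkin_lines_match(text_a, text_b):
    """Compare two Chemkin texts line by line after removing species index numbers."""
    norm_a = [strip_species_indices(line) for line in text_a.splitlines()]
    norm_b = [strip_species_indices(line) for line in text_b.splitlines()]
    return norm_a == norm_b


old = "CH4(1)+OH(3)<=>CH3(5)+H2O(4)    1.000e+06  2.000  1.500\n"
new = "CH4(1)+OH(2)<=>CH3(7)+H2O(6)    1.000e+06  2.000  1.500\n"
changed = "CH4(1)+OH(2)<=>CH3(7)+H2O(6)    2.000e+06  2.000  1.500\n"
print(chemkin_lines_match(old, new))      # True: only the indices differ
print(chemkin_lines_match(old, changed))  # False: the A factor differs
```

This is a coarse check. Isomers can share the same formula-based label and differ only in their index, so a match after stripping indices does not prove that each reaction involves the same species; it complements, rather than replaces, the manual inspection described above. If renumbering also reorders lines, the comparison would need to be made order-insensitive.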

codecov · 2019-11-21T00:50:31Z

Codecov Report

Merging #1824 into master will decrease coverage by 0.92%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1824      +/-   ##
==========================================
- Coverage   43.02%   42.09%   -0.93%     
==========================================
  Files          80       88       +8     
  Lines       21099    22431    +1332     
  Branches     5516     5884     +368     
==========================================
+ Hits         9077     9442     +365     
- Misses      11004    11897     +893     
- Partials     1018     1092      +74

Impacted Files	Coverage Δ
rmgpy/data/kinetics/library.py	`41.03% <100%> (-1.13%)`	⬇️
rmgpy/data/kinetics/common.py	`69.03% <0%> (-1.02%)`	⬇️
rmgpy/data/statmech.py	`42.2% <0%> (ø)`	⬆️
rmgpy/rmg/pdep.py	`12.21% <0%> (ø)`	⬆️
rmgpy/data/kinetics/family.py	`48.35% <0%> (ø)`	⬆️
arkane/kinetics.py	`12.14% <0%> (ø)`	⬆️
rmgpy/yml.py	`15.71% <0%> (ø)`	⬆️
rmgpy/data/kinetics/database.py	`50.61% <0%> (ø)`	⬆️
arkane/sensitivity.py	`10% <0%> (ø)`	⬆️
... and 10 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0a78cf...cf28afc.

amarkpayne

I verified that the duplicate flags are there, and the code looks good. I think this PR is good to go.

amarkpayne · 2019-12-05T19:54:15Z

Thanks @kspieks for the PR!

Remove unnecessary duplicate check

cf28afc

kspieks requested review from mliu49 and amarkpayne November 20, 2019 23:18

kspieks self-assigned this Nov 20, 2019

mliu49 requested a review from mjohnson541 November 26, 2019 17:15

amarkpayne approved these changes Dec 5, 2019

amarkpayne merged commit 9a46923 into master Dec 5, 2019

amarkpayne deleted the remove_unnecessary_duplicate_check branch December 5, 2019 19:53

This was referenced Dec 13, 2019
• RMG v3.0.0 Release Planning #1830 (closed)
• RMG-Py v3.0.0 Release #1852 (merged)
