Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary duplicate check #1824

Merged
merged 1 commit into from
Dec 5, 2019

Conversation

kspieks
Copy link
Contributor

@kspieks kspieks commented Nov 20, 2019

These lines checked if reactions were marked as 'Duplicate' when loading in a previously saved seed that was automatically generated by RMG during a previous run.

Motivation or Problem

Attempts to load in a previous seed from a large mechanism never finished loading and could never start the first iteration.

By large, think PDD pyrolysis with 100,000 + reactions, each with large molecules. The seed would load in, but then each reaction was checked against each other to make sure there were no duplicates. Performing subgraph isomorphism checks on EACH molecule would not complete within 24 hours, so in practice, I could never actually restart a job from a seed because it took so long to get started. The check is also redundant since the duplicate flags are preserved when loading in the previous seed

Description of Changes

Deleted the redundant `Duplicate' check from library.py. If the seed was automatically generated by a previous RMG run, all reactions were already checked for duplicates.

Testing

All tests were done with minimal example. saveSeedModulus=10 was used. The seed from iteration number 20 was restarted. The relevant files were diff'ed. Diff results are the same whether the check for duplicates lines are included or deleted, which gives me confidence that deleting these lines will not affect mechanism generation.

Check reactions.py upon first loading in
• Core: diff previous_restart/restart/reactions.py ../minimal/previous_seeds/iteration_number_20/seed/reactions.py yields nothing
• Edge: diff previous_restart/restart_edge/reactions.py ../minimal/previous_seeds/iteration_number_20/seed_edge/reactions.py yields nothing

Check reaction.py after completing the iteration
• Core: diff seed/seed/reactions.py ../minimal/seed/seed/reactions.py yields white spacing differences
• Edge: diff seed/seed_edge/reactions.py ../minimal/seed/seed_edge/reactions.py yields white spacing differences

Check Chemkin file
• Core: diff chemkin/chem_annotated.inp ../minimal/chemkin/chem_annotated.inp This yields some differences, but it looks like it's just which number is assigned to a molecule. All numerical values look identical
• Edge: diff chemkin/chem_edge_annotated.inp ../minimal/chemkin/chem_edge_annotated.inp Again seems to yield differences in numbering of molecules, but all numerical values look identical

Reviewer Tips

Suggestions for verifying that this PR works or other notes for the reviewer.

These lines checked if reactions were marked as 'Duplicate' when loading in a previously saved seed that was automatically generated by RMG during a previous run.
@kspieks kspieks self-assigned this Nov 20, 2019
@codecov
Copy link

codecov bot commented Nov 21, 2019

Codecov Report

Merging #1824 into master will decrease coverage by 0.92%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1824      +/-   ##
==========================================
- Coverage   43.02%   42.09%   -0.93%     
==========================================
  Files          80       88       +8     
  Lines       21099    22431    +1332     
  Branches     5516     5884     +368     
==========================================
+ Hits         9077     9442     +365     
- Misses      11004    11897     +893     
- Partials     1018     1092      +74
Impacted Files Coverage Δ
rmgpy/data/kinetics/library.py 41.03% <100%> (-1.13%) ⬇️
rmgpy/data/kinetics/common.py 69.03% <0%> (-1.02%) ⬇️
rmgpy/data/statmech.py 42.2% <0%> (ø) ⬆️
rmgpy/rmg/pdep.py 12.21% <0%> (ø) ⬆️
rmgpy/data/kinetics/family.py 48.35% <0%> (ø) ⬆️
arkane/kinetics.py 12.14% <0%> (ø) ⬆️
rmgpy/yml.py 15.71% <0%> (ø) ⬆️
rmgpy/data/kinetics/database.py 50.61% <0%> (ø) ⬆️
arkane/sensitivity.py 10% <0%> (ø) ⬆️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0a78cf...cf28afc. Read the comment docs.

Copy link
Member

@amarkpayne amarkpayne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified that the duplicate flags are there, and the code looks good. I think this PR is good to go

@amarkpayne amarkpayne merged commit 9a46923 into master Dec 5, 2019
@amarkpayne amarkpayne deleted the remove_unnecessary_duplicate_check branch December 5, 2019 19:53
@amarkpayne
Copy link
Member

Thanks @kspieks for the PR!

This was referenced Dec 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants