Remove unnecessary duplicate check #1824
Merged
The removed lines checked whether reactions were marked as 'Duplicate' when loading a previously saved seed that RMG had generated automatically during an earlier run.
Motivation or Problem
Attempts to restart from a previous seed for a large mechanism never finished loading, so the first iteration could never start.
By large, think PDD pyrolysis with 100,000+ reactions, each involving large molecules. The seed would load, but then every reaction was checked against every other reaction to make sure there were no duplicates. Performing subgraph isomorphism checks on EACH molecule would not complete within 24 hours, so in practice I could never restart a job from a seed because it took so long to get started. The check is also redundant, since the duplicate flags are preserved when the previous seed is loaded.
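To give a feel for the scale, here is a back-of-the-envelope sketch of how many comparisons an all-against-all duplicate check implies. It assumes the check visits every unordered pair of reactions once; the function name `pairwise_comparisons` is made up for illustration and is not part of RMG.

```python
from math import comb


def pairwise_comparisons(n_reactions: int) -> int:
    """Count the unordered reaction pairs an all-against-all check must visit."""
    return comb(n_reactions, 2)


# Assumption: 100,000 reactions, the order of magnitude quoted above.
n = 100_000
pairs = pairwise_comparisons(n)
print(f"{n:,} reactions -> {pairs:,} pairwise comparisons")
```

That is roughly 5 billion pairs, and each comparison may involve isomorphism checks on large molecules, which is consistent with the load never finishing.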
Description of Changes
Deleted the redundant 'Duplicate' check from library.py. If the seed was generated automatically by a previous RMG run, all of its reactions have already been checked for duplicates.
Testing
All tests were done with the minimal example, using saveSeedModulus=10. The job was restarted from the seed saved at iteration number 20, and the relevant files were diff'ed. The diff results are the same whether the duplicate-check lines are included or deleted, which gives me confidence that deleting these lines will not affect mechanism generation.
Check reactions.py upon first loading in
• Core:
diff previous_restart/restart/reactions.py ../minimal/previous_seeds/iteration_number_20/seed/reactions.py
yields nothing
• Edge:
diff previous_restart/restart_edge/reactions.py ../minimal/previous_seeds/iteration_number_20/seed_edge/reactions.py
yields nothing
Check reactions.py after completing the iteration
• Core:
diff seed/seed/reactions.py ../minimal/seed/seed/reactions.py
yields whitespace differences
• Edge:
diff seed/seed_edge/reactions.py ../minimal/seed/seed_edge/reactions.py
yields whitespace differences
Check Chemkin file
• Core:
diff chemkin/chem_annotated.inp ../minimal/chemkin/chem_annotated.inp
This yields some differences, but it looks like it's just which number is assigned to a molecule. All numerical values look identical.
• Edge:
diff chemkin/chem_edge_annotated.inp ../minimal/chemkin/chem_edge_annotated.inp
Again, this seems to yield differences in the numbering of molecules, but all numerical values look identical.
Reviewer Tips
Suggestions for verifying that this PR works or other notes for the reviewer.
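One possible way to repeat the comparisons from the Testing section without the whitespace noise is a small script like the one below. This is only a sketch: the helper names `normalized_lines` and `same_content` are made up for illustration, and the optional `mask_indices` flag assumes that molecule numbering appears as an integer in parentheses, such as `(12)`, which may not cover every label format.

```python
import re
from pathlib import Path

# Assumption: molecule numbering shows up as an integer in parentheses, e.g. "(12)".
_INDEX = re.compile(r"\(\d+\)")


def normalized_lines(path, mask_indices=False):
    """Return the non-empty lines of a file with runs of whitespace collapsed."""
    lines = []
    for raw in Path(path).read_text().splitlines():
        line = " ".join(raw.split())  # collapse whitespace, strip both ends
        if not line:
            continue  # ignore blank lines
        if mask_indices:
            line = _INDEX.sub("(#)", line)  # hide the assigned number
        lines.append(line)
    return lines


def same_content(path_a, path_b, mask_indices=False):
    """True if the two files match once whitespace (and optionally indices) are ignored."""
    return normalized_lines(path_a, mask_indices) == normalized_lines(path_b, mask_indices)
```

For example, `same_content("seed/seed/reactions.py", "../minimal/seed/seed/reactions.py")` would be expected to return `True` if the only differences are whitespace. For the Chemkin files, `mask_indices=True` hides the numbering but still compares line by line, so it would only report a match if the entries appear in the same order in both files.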