Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge model improvements #1649

Merged
merged 4 commits into from
Jul 24, 2019
Merged

Merge model improvements #1649

merged 4 commits into from
Jul 24, 2019

Conversation

goldmanm
Copy link
Contributor

Motivation or Problem

merge_models has some unexpected behavior, like throwing divide by zero errors when empty models are given and renaming the species in a chemkin model

Description of Changes

This PR cleans up some of the workings of the merge_models tool

  1. refactored the execute method to call two separate methods that perform different tasks, which allows for better unittesting
  2. added a unittest to ensure proper behavior
  3. prevents print statements from causing divide by zero errors for models without any reactions :)
  4. minimize reindexing to only those species in conflicts.

Testing

Haha. I ran the new unittest. :) I also ran merge model on minimal and superminimal examples.

Reviewer Tips

Run merge models on two of your favorite RMG generated models :)

@alongd alongd requested a review from mliu49 July 11, 2019 18:40
Copy link
Contributor

@mliu49 mliu49 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a couple comments. Could you also rebase this onto master without all of the commits from arkane output fixes?

# ensure no species with same name and index
label_index_dict = {}
for s in final_model.species:
if s.label not in label_index_dict.keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be if s.label not in label_index_dict. No need to call keys().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to know! Just made the code simpler!

else:
if s.index in label_index_dict[s.label]:
# obtained a duplicate
s.index = max(label_index_dict[s.label]) + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to have unique indices? In other words, do we want to reindex starting from the largest index in the entire model + 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code isn't guaranteeing unique indexes at all. To do that, we would pretty much have to renumber everything except the first model, which would make this PR much less useful.

From my point of view, if the user starts with an un-indexed mechanism, like aramcomech, the output should also not have indexes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a benefit to unique indices that I've overlooked? My main goal for this PR was to not have any CHEMKIN or cantera strings conflict, while keeping those names as similar as possible to the input names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To give an example of the functioning in this PR: if model 1 has CH3(15) and model 2 has H2O(15), then neither of their names would be changed, since both the name and index don't match.

I would think for many users, avoiding chemkin or cantera names changing would be preferable to ensuring every species has a unique number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know your thoughts on this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot that we would also need to check for species which already have the same indices.

I think it would only matter if you were merging RMG models and wanted to load the merged model back into RMG, which would parse the indices from the labels. I'm not sure exactly what effect duplicate indices would have in that scenario though. It might not actually matter.

I'm ok with the behavior you implemented. I think the key desired outcome is that the resulting merged mechanism file is directly simulate-able in Chemkin or Cantera.

#for spec in finalModel.species:
# if spec.label not in ['Ar','N2','Ne','He']:
# spec.index = speciesIndex + 1
# speciesIndex += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you implemented a different indexing system, I would just remove this section instead of commenting it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

@codecov
Copy link

codecov bot commented Jul 16, 2019

Codecov Report

Merging #1649 into master will increase coverage by 0.2%.
The diff coverage is 82.85%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #1649     +/-   ##
=========================================
+ Coverage   41.68%   41.89%   +0.2%     
=========================================
  Files         176      176             
  Lines       29341    29354     +13     
  Branches     6049     6052      +3     
=========================================
+ Hits        12232    12299     +67     
+ Misses      16239    16179     -60     
- Partials      870      876      +6
Impacted Files Coverage Δ
rmgpy/rmg/model.py 41.7% <ø> (+3.42%) ⬆️
rmgpy/tools/merge_models.py 46.91% <82.85%> (+32.62%) ⬆️
rmgpy/data/kinetics/family.py 52.9% <0%> (+0.23%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6274d70...a4f0320. Read the comment docs.

@goldmanm goldmanm force-pushed the merge_model_improvements branch 2 times, most recently from 87f2160 to eb995cf Compare July 24, 2019 18:02
Previously merge_models would error if no species or reactions found.
This commit prevents the divide by zero error from stopping merge_models.
This commit refactors merge_model to separate out the merging
and saving of the model so that it can be effectively unittested.
This commit adds unittests to test merge_model functionality.
This commit replaces a complete reindex for merged models
with a smart (as in 'smart speaker') algorithm that only reindexes a
species when it is has both an identical label and index with another
species. This prevents conflicts and reduces unnecessary reindexing.
@mliu49 mliu49 merged commit 2664c50 into master Jul 24, 2019
@mliu49 mliu49 deleted the merge_model_improvements branch July 24, 2019 20:01
@mliu49 mliu49 mentioned this pull request Dec 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants