Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sampling Reactors #1331

Merged
merged 14 commits into from
May 16, 2018
Merged

Sampling Reactors #1331

merged 14 commits into from
May 16, 2018

Conversation

mjohnson541
Copy link
Contributor

The pull request enables reactors to sample over a range of conditions rather than a single condition. Conditions are chosen by maximizing distance from prior run conditions (weighted by how recent they were, the conversion/time reached and how many objects they returned). These variable condition reactors terminate after a defined number of iterations (nSimsTerm). This should allow us to avoid missing chemistry that occurs between two fixed reactors and improve our efficiency with simulations.

@mjohnson541 mjohnson541 changed the title Sampling reactors Sampling Reactors Mar 26, 2018
@codecov
Copy link

codecov bot commented Mar 26, 2018

Codecov Report

Merging #1331 into master will decrease coverage by 0.18%.
The diff coverage is 16.07%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1331      +/-   ##
==========================================
- Coverage   42.24%   42.06%   -0.19%     
==========================================
  Files         168      168              
  Lines       27508    27695     +187     
  Branches     5367     5432      +65     
==========================================
+ Hits        11621    11650      +29     
- Misses      15130    15275     +145     
- Partials      757      770      +13
Impacted Files Coverage Δ
rmgpy/tools/fluxdiagram.py 9.5% <ø> (ø) ⬆️
rmgpy/reduction/reduction.py 19.11% <0%> (ø) ⬆️
rmgpy/rmg/input.py 40.87% <33.8%> (-1.5%) ⬇️
rmgpy/rmg/main.py 22.89% <7.89%> (-2.24%) ⬇️
rmgpy/data/kinetics/family.py 59.22% <0%> (+0.4%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a07cc9...ea83368. Read the comment docs.

@alongd
Copy link
Member

alongd commented Mar 28, 2018

Could you add some documentation on this feature?

@mjohnson541
Copy link
Contributor Author

Documentation should be added now.

Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR! please see comments/questions below

Advanced Setting: Range Based Reactors
-------------------------------------------------

Under this setting rather than use reactors at fixed points reaction conditions are sampled from a range of conditions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comma after "at fixed points,"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

-------------------------------------------------

Under this setting rather than use reactors at fixed points reaction conditions are sampled from a range of conditions.
Conditions are chosen by maximizing distance from prior run conditions (weighted by how recent they were, the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add(in terms of T, P, concentrations) after maximizing distance
Also, what are the constraints? Simply maximizing the distance would mean choosing the most extreme T/P in the given range. Or did I miss understood something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're weighted I've added "maximizing weighted distance (in terms of ...."

simpleReactor(
temperature=[(1000,'K'),(1500,'K')],
pressure=[(1.0,'bar'),(10.0,'bar')],
nSimsTerm=12,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason that 12 is used here? Does it work best from your experience?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12 was my estimate of real reactors to do this system properly, I think 24 might be a more robust choice, but that makes the example run long. I'll add a comment noting that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you direct me to the comment that was added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, in my tests it turns out 12 is actually a really good number for this one so I don't think a comment is necessary. I've also added a rule of thumb for choosing nSimsTerm to the documentation.

):
logging.debug('Found SimpleReactor reaction system')

for value in initialMoleFractions.values():
if value < 0:
if value != list and value < 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be if not isinstance(value, list) and ...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really embarrassed about these ones...

rmgpy/rmg/input.py Outdated Show resolved Hide resolved
@@ -764,15 +798,15 @@ def execute(self, **kwargs):
# Run sensitivity analysis post-model generation if sensitivity analysis is on
for index, reactionSystem in enumerate(self.reactionSystems):

if reactionSystem.sensitiveSpecies:
if reactionSystem.sensitiveSpecies and reactionSystem.sensConditions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that sensConditions is None if ranges aren't specified. Does this condition allow "normal" SA to be run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should do so now since the non-ranged values will default as sensConditions

@@ -782,8 +816,9 @@ def execute(self, **kwargs):
pdepNetworks = self.reactionModel.networkList,
sensitivity = True,
sensWorksheet = sensWorksheet,
modelSettings = self.modelSettingsList[-1],
modelSettings = ModelSettings(toleranceMoveToCore=1e8,toleranceInterruptSimulation=1e8),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before RMG was guaranteed to not add objects on the sensitivity runs because otherwise the run would not have terminated. Now if we don't do this the sensitivity analysis can terminate without completing the simulation.

Rmem.getCond()
Rmem.addtConvN(1.0,.2,2)
Rmem.generateCond()
Rmem.getCond()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add relevant assertions and helpful error messages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't really think of anything I felt the need to specifically check in the object other than that the basic operations could be done. It is deterministic so I could hard code the test (check whether its generating the same condition), but I feel like unless I'm checking something I have reason to worry about that just forces someone to change the hard coded numbers anytime they change something in the weighing or optimization algorithm.

if conditions:
isConc = hasattr(self,'initialConcentrations')
keys = conditions.keys()
if 'T' in keys and hasattr(self,'T'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could T be in keys w/o having a self.T attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if statement fundamentally just guarantees that the next line won't error. It was made to be general because base.pyx stuff has to work with any kind of reactor we propose.

for k in keys:
if isConc:
if k in self.initialConcentrations.keys():
self.initialConcentrations[k] = Quantity(conditions[k],'mol/m^3')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with LiqReactor, but couldn't the conditions be specified in different units? If so, we're missing a conversion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RMG_Memories object only outputs stuff in SI units

@mjohnson541 mjohnson541 force-pushed the SamplingReactors branch 4 times, most recently from 5c1f051 to 021490e Compare April 28, 2018 19:16
@mjohnson541
Copy link
Contributor Author

Ok, I've run tests I want to and updated the documentation accordingly to give better guidelines. I believe I've also fixed the issues you pointed out.

@mjohnson541
Copy link
Contributor Author

I made some significant changes to the documentation so if you could look that over I'd appreciate it.

@mliu49
Copy link
Contributor

mliu49 commented Apr 28, 2018

I would like to see some more test results before this gets merged, including but not limited to:

  1. RMG-tests
  2. Comparison of output/runtime between multiple reactors vs. sampling reactor over a T range (and potentially P and composition ranges as well)

To be honest, the mistakes which Alon pointed out were a bit concerning. It would be best if you could improve the test coverage of new code which you added.

initialMoleFractions[key] = float(value)
if value < 0:
raise InputError('Initial mole fractions cannot be negative.')
elif isinstance(value,list):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be enough to just write else:?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

if value < 0:
raise InputError('Initial mole fractions cannot be negative.')
elif isinstance(value,list):
initialMoleFractions[key] = [float(value[0]),float(value[1])]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we check for cases where the user inputs the concentration as a list but with just one value? (i.e., value[1] does not exist?) I think this should be a separate Error, it it might happen frequently if modifying an input file that has some concentration ranges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added InputErrors with useful messages for these cases.

@mjohnson541
Copy link
Contributor Author

@mliu49 I've greatly improved coverage from the original PR, however it doesn't show up because except for testing the new RMG_Memory objects it makes the most sense to test the rest of this system using functional tests. I believe under the functional tests nearly all lines of code I've added are covered now.

I'm planning to present some results related to the output in RMG meeting this week. What question are you trying to answer with runtime information?

@mliu49
Copy link
Contributor

mliu49 commented Apr 30, 2018

I understand that it may be difficult to write unit tests for these changes, and I wasn't referring explicitly to the coverage percentage. I mainly meant that the fact that you didn't catch those errors during testing suggested that you weren't running the appropriate tests, whether unit or functional.

I mentioned runtime comparison since you said in your initial comment that this should provide improved efficiency. That makes sense, but I was hoping you could quantify it, even if just in one example case.

@mjohnson541
Copy link
Contributor Author

mjohnson541 commented Apr 30, 2018

I apologize for the state of the original PR, but I'm pretty confident the tests I've added since are sufficient to thoroughly test this feature.

Ok, I expect I can demonstrate that.

@mjohnson541
Copy link
Contributor Author

So after discovering a space sampling issue with my original method for choosing conditions I discussed the sampling problem with Kevin Silmore from the Swan lab and he proposed a different sampling algorithm I've now adopted in this PR. In this algorithm we evaluate the objective function at a grid of condition points, normalize these values and then choose a point where each point has probability of its normalized objective function value of being chosen. After a grid point is chosen a random step (of maximum length half the distance to the next grid point) is taken from the grid point to choose a final sample point.

@mjohnson541 mjohnson541 force-pushed the SamplingReactors branch 2 times, most recently from 71cca85 to 0045ce9 Compare May 4, 2018 20:57
@mjohnson541
Copy link
Contributor Author

The deterministic-ness issue was actually unrelated to random number generation and actually related to non-deterministic solver output. By removing termination conversion and time from the objective function the algorithm is now as deterministic as regular RMG.

@mjohnson541
Copy link
Contributor Author

Ok, I believe I've fixed everything that needed fixed in this.

@mjohnson541
Copy link
Contributor Author

Where does that occur?

@mjohnson541
Copy link
Contributor Author

Ok, I believe I've fixed all of the above issues.

@mjohnson541 mjohnson541 force-pushed the SamplingReactors branch 2 times, most recently from 62c168a to 6686976 Compare May 14, 2018 19:45
@mjohnson541
Copy link
Contributor Author

Are there any other issues or can we merge this?

@alongd
Copy link
Member

alongd commented May 16, 2018

Could you take a look at the Codacy report? I think there a some recommendations there we could accept. Everything else looks good to me.

this system allows RMG to choose new conditions for each simulation using the
Weighted Stochastic Grid Sampling Algorithm (idea credit to Kevin Silmore) discussed below.

First all conditions are normalized (P is normalized in log P units).  An objective function is defined that weighs conditions based on their distance from prior conditions, on the number of objects returned and how recent the prior condition was. This objective function is evaluated in a multidimensional grid and these evaluation points are normalized so that they sum to one.  Then a condition is chosen randomly with each point being chosen with the probability of its normalized objective function value.  After this a random step (maximum magnitude is sqrt(2)/2 times the distance to another grid point) is taken from the grid point to give a final condition.  This causes it to sample from the entire condition range.

The random number generator used is seeded so the process is deterministic.

The RMG_Memory object keeps track of all of this. The number of simulations completed successfully (without adding objects) is also tracked to determine when the reactor can complete.
…actors

Also adaption of conversion calculations and return type for simulate so that the time and conversion can be put into the RMG_Memory objects. Addition of sensConditions for specifying the conditions under, which to do sensitivity analysis.
@mjohnson541
Copy link
Contributor Author

Ok, I've fixed some of codacy's recommendations.

If a balance species is set mole fractions are generated thus:  the ranged mole fractions are varied within their ranges according to the algorithm while the others remain constant, then the balance species is adjusted so that all of the mole fractions sum to one.  This causes all mole fractions except the balanceSpecies to remain in their ranges
Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Ready for Review PR is complete and ready to be reviewed Type: Feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants