Enable modification of user input data fields in Python #1031

speth · 2021-05-09T02:37:19Z

Changes proposed in this pull request

Adds update_user_data and clear_user_data methods to Python objects with an input_data property

This allows custom data to be stored when writing to YAML files. Implementation of this was, as expected, 🍰 with the dict_to_anymap method already in place thanks to #1014.

Checklist

There is a clear use-case for this code change
The commit message has a short title & references relevant issues
Build passes (scons build & scons test) and unit tests address code coverage
Style & formatting of contributed code follows contributing guidelines
The pull request is ready for review

codecov · 2021-05-09T03:14:12Z

Codecov Report

Merging #1031 (d49d98a) into main (01bf417) will increase coverage by 0.00%.
The diff coverage is n/a.

❗ Current head d49d98a differs from pull request most recent head 941b5dc. Consider uploading reports for the commit 941b5dc to get more accurate results

@@           Coverage Diff           @@
##             main    #1031   +/-   ##
=======================================
  Coverage   72.59%   72.60%           
=======================================
  Files         356      356           
  Lines       46508    46508           
=======================================
+ Hits        33763    33765    +2     
+ Misses      12745    12743    -2

Impacted Files	Coverage Δ
src/base/AnyMap.cpp	`89.61% <0.00%> (+0.20%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01bf417...941b5dc. Read the comment docs.

interfaces/cython/cantera/base.pyx

ischoegl

I may have some thoughts later this week.

ischoegl · 2021-05-13T20:48:10Z

OK ... I mostly have a philosophical question about the implementation, which boils down to what input/input_data actually means.

When I first saw the term input in #984, I thought of it as being 'original input data', but going through I quickly realized that as it (still) stands it is 'input data for a future incarnation'. Prior to this PR, there was relatively little impact, as once instantiated, there wasn't much of a way of adding/changing the parameterization. I.e. thus far, it was indeed input.

Going forward, I can see two options for how users interact with objects:

Users change objects as they go, and would have to reload the original Solution if they want to reset to the original input
Users have the ability to walk back changes at any time (e.g. by calling a Solution.reset() or Reaction.reset(), etc. to restore original data).

The PR as it stands is (1), and is a legitimate 'solution'; I mainly want to make sure that there may not be a potential use case for (2) as this would be the time to start. The implementation would be quite simple, where all that is needed is a second AnyMap member variable that holds information added after the objects are instantiated, plus some tweaks for logic in some of the getters.

speth · 2021-05-14T23:30:34Z

It occurred to me recently that with the addition of YAML serialization, even the phrase "input file" is a little bit weird, since now you can output an input file. I'm certainly open to suggestions for alternative terminology here, as I do see why the name input_data might have led you down that path.

I don't quite know what a potential use case for (2) would be, but as you said, this information can be retrieved by loading another Solution (or whatever) from the input file. So I don't think that we're precluding any possible uses by not differentiating between the original input data and ways that the object has been modified since the input file was loaded. Realistically, I think that Solution.reset() would be as expensive as instantiating a new Solution in many instances.

decaluwe · 2021-05-15T02:40:14Z

Re "input file" vs. some other name, I've been using "mechanism file" in my own code, which I think works.

The rebuttal question that immediately comes to mind, though: even if we are "outputting" a YAML file, is is only useful/a thing we care about to the extent that it will one day be imported as an "input" for some other, future simulation/numerical experiment?

I actually think the answer is "no - it has other uses." i.e if you are fitting experimental data and adjusting the mechanism details on the fly, those rates, thermo values, etc, would be the result of the fitting, and something you would report out.

bryanwweber

Thanks @speth. I don't have any strong opinions about the naming distinction, since as @decaluwe points out, these files can be used in many different contexts. "Mechanism file" implies to me that it involves some reaction mechanism, which also may not always be the case. "Input file" seems to be the best choice to me, but no strong opinion.

I do have several questions here about the implementation, probably because this is my first time really digging into the details. So if my questions/suggestions don't make any sense, please take them in the "please teach me something" spirit 😄

interfaces/cython/cantera/_cantera.pxd

interfaces/cython/cantera/base.pyx

bryanwweber · 2021-05-20T14:11:47Z

interfaces/cython/cantera/base.pyx

+        Clear any user-provided auxillary input data, so that the data given by
+        `input_data` or `write_yaml` will only include values generated by Cantera.
+        """
+        self.thermo.input().clear()


I don't see how this distinguishes between data that the user has added using update_user_data and the data coming from the YAML file that was used to create the Solution. Can you clarify that for me? Thanks 😄

It doesn't distinguish - both are erased here. But once the object's been constructed, Cantera doesn't need a pristine copy of the data from the input file, so clearing this doesn't have any noticeable effect besides removing the auxiliary data, since the necessary fields will be calculated through the calls to getParameters.

So in principle, this does not support true round-tripping of an input file, right? If all the data are cleared, and any data computed by Cantera is not quite identical to the original file, the resulting generated file wouldn't be identical?

This may be a pedantic thing, but since the docstring says one thing, and the code actually does something else, I foresee potential problems as a result.

Again, I think we're getting pretty far off the purpose of this PR, as this basically all about features that were introduced in #984 if not earlier, but I can try to provide a bit of an explanation.

We don't keep track of how an object has changed since it was instantiated from the input file, so any time you ask for the input_data or call write_yaml, that's being calculated on the fly. I don't think there's anything that promises to precisely round-trip an input file, and there are currently a variety of ways in which that isn't currently possible. Some of these will probably be addressed relatively soon, like the serialization of the file-level metadata (as requested in #1013), while others may not (like retaining comments in YAML files). The only promise that I think we should be trying to make regarding serialization is that you can generate an input file that will recreate an object in its current state, and I think that is currently the case.

So in principle, this does not support true round-tripping of an input file, right? If all the data are cleared, and any data computed by Cantera is not quite identical to the original file, the resulting generated file wouldn't be identical?

This may be a pedantic thing, but since the docstring says one thing, and the code actually does something else, I foresee potential problems as a result.

This is quite similar to what I had mentioned in one of my comments above (what does input data mean?). Either path forward is fine; I believe round-trip conversion would be simple to implement as stated earlier:

The implementation would be quite simple, where all that is needed is a second AnyMap member variable that holds information added after the objects are instantiated, plus some tweaks for logic in some of the getters.

I think it's fine if we consider changing/discussing the round-trip behavior out of scope for this PR. That said, I'd prefer if this docstring were a little more accurate to reflect what's actually happening. Since this very explicitly says that user-provided data will be cleared, it implies that's the only thing that's cleared. Maybe something like:

Clear auxiliary input data, including user-provided values. When write_yaml or input_data are called, Cantera will regenerate the input data set to reproduce the current object state.

I'm not sure if this is accurate though; I'm thinking of the case where a user may have manually, e.g., changed a species thermo entry (for example, to debug a discontinuous polynomial) and that data will be written out instead of the data that was originally used to create the Solution(). Is that correct?

I had been using "auxiliary" to mean data that is not used by Cantera, so there is no auxiliary input data that isn't user-provided. I guess this would be more precise:

Clear all saved input data, so that the data given by input_data or write_yaml will only include values generated by Cantera based on the current object state.

That sounds great to me! Thanks!

interfaces/cython/cantera/test/test_composite.py

Data specified this way is stored when writing to YAML files.

ischoegl

Looks good to me.

bryanwweber

Thanks @speth!

bryanwweber reviewed May 9, 2021

View reviewed changes

interfaces/cython/cantera/base.pyx Outdated Show resolved Hide resolved

speth force-pushed the modify-user-data branch from 170f152 to 42b7633 Compare May 9, 2021 22:21

speth requested a review from bryanwweber May 9, 2021 22:24

ischoegl reviewed May 10, 2021

View reviewed changes

bryanwweber requested changes May 20, 2021

View reviewed changes

speth force-pushed the modify-user-data branch from 42b7633 to d49d98a Compare June 6, 2021 03:26

speth requested a review from bryanwweber June 6, 2021 03:51

[Python] Enable modification of user input data fields

941b5dc

Data specified this way is stored when writing to YAML files.

speth force-pushed the modify-user-data branch from d49d98a to 941b5dc Compare June 6, 2021 22:42

ischoegl reviewed Jun 6, 2021

View reviewed changes

bryanwweber approved these changes Jun 7, 2021

View reviewed changes

bryanwweber merged commit 38bbc66 into Cantera:main Jun 7, 2021

speth deleted the modify-user-data branch June 7, 2021 01:16

This was referenced Jun 9, 2021

Refactor reaction rate evaluators #995

Merged

Paper cuts in YAML serialization API #1053

Closed

ischoegl mentioned this pull request Dec 14, 2021

Export Solution to Chemkin Format Cantera/enhancements#52

Closed

speth mentioned this pull request May 3, 2022

Optional argument for Parameter Uncertainties in Cantera Input Files and Objects Cantera/enhancements#62

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable modification of user input data fields in Python #1031

Enable modification of user input data fields in Python #1031

speth commented May 9, 2021 •

edited

Loading

codecov bot commented May 9, 2021 •

edited

Loading

ischoegl left a comment

ischoegl commented May 13, 2021 •

edited

Loading

speth commented May 14, 2021

decaluwe commented May 15, 2021

bryanwweber left a comment

bryanwweber May 20, 2021

speth Jun 6, 2021

bryanwweber Jun 6, 2021

speth Jun 6, 2021

ischoegl Jun 6, 2021

bryanwweber Jun 6, 2021

speth Jun 6, 2021

bryanwweber Jun 6, 2021

ischoegl left a comment

bryanwweber left a comment

Enable modification of user input data fields in Python #1031

Enable modification of user input data fields in Python #1031

Conversation

speth commented May 9, 2021 • edited Loading

codecov bot commented May 9, 2021 • edited Loading

Codecov Report

ischoegl left a comment

Choose a reason for hiding this comment

ischoegl commented May 13, 2021 • edited Loading

speth commented May 14, 2021

decaluwe commented May 15, 2021

bryanwweber left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ischoegl left a comment

Choose a reason for hiding this comment

bryanwweber left a comment

Choose a reason for hiding this comment

speth commented May 9, 2021 •

edited

Loading

codecov bot commented May 9, 2021 •

edited

Loading

ischoegl commented May 13, 2021 •

edited

Loading