Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable modification of user input data fields in Python #1031

Merged
merged 1 commit into from
Jun 7, 2021

Conversation

speth
Copy link
Member

@speth speth commented May 9, 2021

Changes proposed in this pull request

  • Adds update_user_data and clear_user_data methods to Python objects with an input_data property

This allows custom data to be stored when writing to YAML files. Implementation of this was, as expected, 🍰 with the dict_to_anymap method already in place thanks to #1014.

Checklist

  • There is a clear use-case for this code change
  • The commit message has a short title & references relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • Style & formatting of contributed code follows contributing guidelines
  • The pull request is ready for review

@codecov
Copy link

codecov bot commented May 9, 2021

Codecov Report

Merging #1031 (d49d98a) into main (01bf417) will increase coverage by 0.00%.
The diff coverage is n/a.

❗ Current head d49d98a differs from pull request most recent head 941b5dc. Consider uploading reports for the commit 941b5dc to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1031   +/-   ##
=======================================
  Coverage   72.59%   72.60%           
=======================================
  Files         356      356           
  Lines       46508    46508           
=======================================
+ Hits        33763    33765    +2     
+ Misses      12745    12743    -2     
Impacted Files Coverage Δ
src/base/AnyMap.cpp 89.61% <0.00%> (+0.20%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01bf417...941b5dc. Read the comment docs.

Copy link
Member

@ischoegl ischoegl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have some thoughts later this week.

@ischoegl
Copy link
Member

ischoegl commented May 13, 2021

OK ... I mostly have a philosophical question about the implementation, which boils down to what input/input_data actually means.

When I first saw the term input in #984, I thought of it as being 'original input data', but going through I quickly realized that as it (still) stands it is 'input data for a future incarnation'. Prior to this PR, there was relatively little impact, as once instantiated, there wasn't much of a way of adding/changing the parameterization. I.e. thus far, it was indeed input.

Going forward, I can see two options for how users interact with objects:

  1. Users change objects as they go, and would have to reload the original Solution if they want to reset to the original input
  2. Users have the ability to walk back changes at any time (e.g. by calling a Solution.reset() or Reaction.reset(), etc. to restore original data).

The PR as it stands is (1), and is a legitimate 'solution'; I mainly want to make sure that there may not be a potential use case for (2) as this would be the time to start. The implementation would be quite simple, where all that is needed is a second AnyMap member variable that holds information added after the objects are instantiated, plus some tweaks for logic in some of the getters.

@speth
Copy link
Member Author

speth commented May 14, 2021

It occurred to me recently that with the addition of YAML serialization, even the phrase "input file" is a little bit weird, since now you can output an input file. I'm certainly open to suggestions for alternative terminology here, as I do see why the name input_data might have led you down that path.

I don't quite know what a potential use case for (2) would be, but as you said, this information can be retrieved by loading another Solution (or whatever) from the input file. So I don't think that we're precluding any possible uses by not differentiating between the original input data and ways that the object has been modified since the input file was loaded. Realistically, I think that Solution.reset() would be as expensive as instantiating a new Solution in many instances.

@decaluwe
Copy link
Member

Re "input file" vs. some other name, I've been using "mechanism file" in my own code, which I think works.

The rebuttal question that immediately comes to mind, though: even if we are "outputting" a YAML file, is is only useful/a thing we care about to the extent that it will one day be imported as an "input" for some other, future simulation/numerical experiment?

I actually think the answer is "no - it has other uses." i.e if you are fitting experimental data and adjusting the mechanism details on the fly, those rates, thermo values, etc, would be the result of the fitting, and something you would report out.

Copy link
Member

@bryanwweber bryanwweber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @speth. I don't have any strong opinions about the naming distinction, since as @decaluwe points out, these files can be used in many different contexts. "Mechanism file" implies to me that it involves some reaction mechanism, which also may not always be the case. "Input file" seems to be the best choice to me, but no strong opinion.

I do have several questions here about the implementation, probably because this is my first time really digging into the details. So if my questions/suggestions don't make any sense, please take them in the "please teach me something" spirit 😄

interfaces/cython/cantera/_cantera.pxd Show resolved Hide resolved
interfaces/cython/cantera/base.pyx Show resolved Hide resolved
Clear any user-provided auxillary input data, so that the data given by
`input_data` or `write_yaml` will only include values generated by Cantera.
"""
self.thermo.input().clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this distinguishes between data that the user has added using update_user_data and the data coming from the YAML file that was used to create the Solution. Can you clarify that for me? Thanks 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't distinguish - both are erased here. But once the object's been constructed, Cantera doesn't need a pristine copy of the data from the input file, so clearing this doesn't have any noticeable effect besides removing the auxiliary data, since the necessary fields will be calculated through the calls to getParameters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in principle, this does not support true round-tripping of an input file, right? If all the data are cleared, and any data computed by Cantera is not quite identical to the original file, the resulting generated file wouldn't be identical?

This may be a pedantic thing, but since the docstring says one thing, and the code actually does something else, I foresee potential problems as a result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think we're getting pretty far off the purpose of this PR, as this basically all about features that were introduced in #984 if not earlier, but I can try to provide a bit of an explanation.

We don't keep track of how an object has changed since it was instantiated from the input file, so any time you ask for the input_data or call write_yaml, that's being calculated on the fly. I don't think there's anything that promises to precisely round-trip an input file, and there are currently a variety of ways in which that isn't currently possible. Some of these will probably be addressed relatively soon, like the serialization of the file-level metadata (as requested in #1013), while others may not (like retaining comments in YAML files). The only promise that I think we should be trying to make regarding serialization is that you can generate an input file that will recreate an object in its current state, and I think that is currently the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in principle, this does not support true round-tripping of an input file, right? If all the data are cleared, and any data computed by Cantera is not quite identical to the original file, the resulting generated file wouldn't be identical?

This may be a pedantic thing, but since the docstring says one thing, and the code actually does something else, I foresee potential problems as a result.

This is quite similar to what I had mentioned in one of my comments above (what does input data mean?). Either path forward is fine; I believe round-trip conversion would be simple to implement as stated earlier:

The implementation would be quite simple, where all that is needed is a second AnyMap member variable that holds information added after the objects are instantiated, plus some tweaks for logic in some of the getters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine if we consider changing/discussing the round-trip behavior out of scope for this PR. That said, I'd prefer if this docstring were a little more accurate to reflect what's actually happening. Since this very explicitly says that user-provided data will be cleared, it implies that's the only thing that's cleared. Maybe something like:

Clear auxiliary input data, including user-provided values. When write_yaml or input_data are called, Cantera will regenerate the input data set to reproduce the current object state.

I'm not sure if this is accurate though; I'm thinking of the case where a user may have manually, e.g., changed a species thermo entry (for example, to debug a discontinuous polynomial) and that data will be written out instead of the data that was originally used to create the Solution(). Is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had been using "auxiliary" to mean data that is not used by Cantera, so there is no auxiliary input data that isn't user-provided. I guess this would be more precise:

Clear all saved input data, so that the data given by input_data or write_yaml will only include values generated by Cantera based on the current object state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great to me! Thanks!

interfaces/cython/cantera/test/test_composite.py Outdated Show resolved Hide resolved
Data specified this way is stored when writing to YAML files.
Copy link
Member

@ischoegl ischoegl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Member

@bryanwweber bryanwweber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @speth!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants