Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAML: consistently clarify reaction type #747

Closed
ischoegl opened this issue Nov 12, 2019 · 8 comments
Closed

YAML: consistently clarify reaction type #747

ischoegl opened this issue Nov 12, 2019 · 8 comments
Labels
discussion Topics that are being discussed Kinetics

Comments

@ischoegl
Copy link
Member

ischoegl commented Nov 12, 2019

System information

  • Cantera version: 2.5.0a3
  • OS: all
  • Python/MATLAB version: N/A

Expected behavior

YAML entries clearly identify type of reaction, with the default being elementary. This is being followed consistently for GasKinetics (with three-body, etc.), and the same should be done for InterfaceKinetics etc., e.g.

- equation: CO2(S) => CO2 + PT(S) 
  type: interface # <--- suggested
  rate-constant: {A: 10000000000000.0, b: 0, Ea: 20500}

Edit: the type interface as default for InterfaceKinetics does makes sense (similar to elementary as default for GasKinetics). The following is less intuitive, as the beta and exchange-current-density-formulation fields are not required.

- equation: Li+[elyt] + V[cathode] + electron <=> Li[cathode] 
  type: electrochemical # <--- suggested
  rate-constant: {A: 562900000000.0, b: 0.0, Ea: 58 kJ/mol}
  exchange-current-density-formulation: true
  beta: 0.5

Actual behavior

The type field is not used for interface and electrochemical reactions, which can only be detected by parsing the species string (the type of reaction is purely implicit). Further, several electrochemical reaction types appear to be discontinued in the YAML format without a deprecation warning.

Additional context

The implicit specification of types is not future-proof, as it is not easily extendable. As YAML is not yet officially released, these clarifications can be easily added to 2.5.

@speth
Copy link
Member

speth commented Nov 12, 2019

For an interface reaction, you do not need to parse the reaction equation to know that it's of that type. It's required by the fact that you're adding it to an InterfaceKinetics object. I think requiring every reaction in a surface mechanism file to be tagged with type: interface would be a decrease in the readability of the YAML format.

For electrochemical reactions, the problem is the case where the reaction involves charge transfer but the reaction isn't explicitly identified as such (see #452). If the user has to specify type: electrochemical for all electrochemical reactions, we still have to check that it really is an electrochemical reaction, and that other reactions aren't, so it doesn't save much work on the implementation side.

I think there are still options for extensibility (including user-derived types) even if we use the reaction equation or other information besides the type field to determine reaction type in some cases. The most straightforward one would be to use the type field first if it's provided, and then check for reactions where it can be deduced by other means. If you wanted to offer even more flexibility, you could provide an interface that used functions like isElectrochemicalReaction for dispatching, instead of just using the type field as we do in the existing factory classes.

I might even consider it an improvement to go further in this direction and not require the type field in other cases where it can be deduced automatically, e.g. three-body reactions which can be identified by the M on each side of the equation.

@ischoegl
Copy link
Member Author

ischoegl commented Nov 12, 2019

@speth ... yes, it is true that things can be deduced, but is this an efficient implementation? If we have a reaction mechanism with 10k+ reactions, it is quicker to simply access a type field than to string-search for a third-body collision partner.

Imho, there are several arguments against trying to eliminate the type field:

  1. Specific tests for nested definitions need to be hard-coded, whereas a simple differentiation by a type field can be easily automated. I.e. for a factory class, it is much easier to create an object based on an assigned type field than to check through various cases to figure out what is actually implemented.
  2. Speed of parsing for very large mechanisms (10k+ reactions, 1k+ species is not uncommon) probably not as bad if it can be deduced from the reaction string itself.
  3. Omitting the 'type' field takes away from human-readability (if the field is there, it is immediately clear: think about new users of cantera).
  4. If a user makes an error in hand-coded reaction specifications, the setup step should fail with an appropriate error message ... so there is nothing to be gained by automatic type detection.
  5. There is no apparent reason why parsing of reaction specifications should require passing of a kinetics manager as an argument. Ok. I see the point.

PS: I wasn't aware of #452, but would argue that this is a shortcoming in the XML input file format, which imho has better solutions for YAMl. I.e. this leaves type: electrochemical as a much more intuitive solution on the table.

@ischoegl
Copy link
Member Author

I guess that in many cases (including three-body), autodetection may not introduce much overhead (I even may take a stab). Still, for electrochemical a type field would be imho a cleaner approach as the correct type is not easily inferred from the reaction definition alone.

@speth
Copy link
Member

speth commented Nov 13, 2019

Imho, there are several arguments against trying to eliminate the type field:

  1. Specific tests for nested definitions need to be hard-coded, whereas a simple differentiation by a type field can be easily automated. I.e. for a factory class, it is much easier to create an object based on an assigned type field than to check through various cases to figure out what is actually implemented.

Agreed, it does simplify the factory class implementation.

  1. Speed of parsing for very large mechanisms (10k+ reactions, 1k+ species is not uncommon)

Two things:

First, I did some benchmarking while developing the YAML parsing code, and I've been pretty happy with the performance so far. It's significantly faster than using the XML format, let alone converting CTI to XML. To provide a bit of data, I just ran a comparison for a fairly large mechanism (LLNL butanol mechanism with 431 species and 2551 reactions) and importing in Python with transport_model=None, I get the following results:

  • Full processing, with caches temporarily disabled (includes reading file, [conversion from cti,] parsing to XML_Node / AnyMap, and creating a Solution object each time):

    • CTI: 1.86 s ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    • XML: 1.25 s ± 9.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    • YAML: 619 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • Solution instantiations from cached XML_Node / AnyMap:

    • XML: 1.08 s ± 8.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    • CTI: 1.08 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    • YAML: 82.2 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The takeaway is that creating a Solution from YAML is 2x-10x faster, so I think we can spare a few CPU cycles here if there's a good use case. Of course, this all gets blown away by the time it takes to do the transport property fits (another 5 seconds) if you need to instantiate a transport object.

Second, we have to parse the reaction equation at some point. And if the check that we would
have to do for a reaction is too time consuming, we could rely on an explicit type field -- I don't mean to suggest that we need to identify all reaction types this way.

  1. Omitting the 'type' field takes away from human-readability (if the field is there, it is immediately clear: think about new users of cantera).

I think there are different ways to think about what's more useful to the user. One is that having the type field gives the user a way to track down what it is that Cantera does with a reaction entry, e.g. if I search the code for pressure-dependent-Arrhenius, I will find out how those reactions are handled. This is less helpful for something like electrochemical, though. Another perspective is that it's visual clutter when there is other information in the reaction entry that tells me what kind of reaction it is. I don't know which consideration is more important, and the answer may be reaction-type dependent. For instance, there may be charge transfer reactions that aren't completely obvious (if they don't have an explicit electron participating?) and where writing type: electrochemical would clarify things, while it's pretty obvious that H + O2 + M <=> HO2 + M is a three body reaction.

  1. If a user makes an error in hand-coded reaction specifications, the setup step should fail with an appropriate error message ... so there is nothing to be gained by automatic type detection.

While I agree that we should try to detect any inconsistencies, requiring redundant information to be encoded in the reaction entry just compounds the potential for errors. And I think the setup step for YAML currently does fail for many (I don't want underestimate the creativity of users and say all) of the possible inconsistencies in reaction information.

  1. There is no apparent reason why parsing of reaction specifications should require passing of a kinetics manager as an argument. Ok. I see the point.

This is perhaps a bit subtle, but it's impossible to determine the units of the pre-exponential factor, which is needed to do unit conversions, without having the Kinetics object and all of the attached ThermoPhase objects available. This was handled in CTI/XML by doing the unit conversions in ctml_writer, which therefore had to know all of this information, and the XML file could only have the rate constant in Cantera's default units.

@ischoegl
Copy link
Member Author

@speth ... thanks for the detailed response! Responding where appropriate ...

  1. First, I did some benchmarking while developing the YAML parsing code

those number are impressive!

  1. Second, we have to parse the reaction equation at some point.

Agreed, but it should be done exactly once. In the current implementation, every surface reaction gets parsed twice due to the omission of an appropriate type field for electrochemistry, i.e.

// See if this is an electrochemical reaction
Reaction testReaction(0);
parseReactionEquation(testReaction, node["equation"], kin);

Imho, not setting an electrochemical type in the YAML specification is an omission that complicates the code. Adopting a type will require a transition with some warnings, but long-term there won't be an issue with historical work-arounds.

  1. I think there are different ways to think about what's more useful to the user. One is that having the type field gives the user a way to track down what it is that Cantera does with a reaction entry, e.g. if I search the code for pressure-dependent-Arrhenius, I will find out how those reactions are handled. This is less helpful for something like electrochemical, though.

In the current implementation, electrochemical is the only type needed, but it should obviously be differentiated further, e.g. butler-volmer, global etc. (see old XML types).

  1. [...] requiring redundant information to be encoded in the reaction entry just compounds the potential for errors.

I am pretty unapologetic about this, as it's better to have repeated failures due to erroneous user input than have Cantera incorrectly guessing what a user had in mind. As the bulk of the cases are machine generated (ck2yaml etc.) potential errors are only occurring for those users who hand-assemble input files. In those cases, the science section is recommended reading.

Based on the discussion, I certainly see the appeal of detection of three-body reactions, etc.. On the other hand, a self-documenting approach (enforce type unless it's a phase default) is imho a much clearer approach. I think it makes more sense to strictly enforce syntax (and throw meaningful warnings/errors) than to implement 'elegant' solutions that are hard to maintain.

@speth
Copy link
Member

speth commented Nov 13, 2019

Imho, not setting an electrochemical type in the YAML specification is an omission that complicates the code. Adopting a type will require a transition with some warnings, but long-term there won't be an issue with historical work-arounds.

I don't see why a warning would be required -- the YAML format is not in any stable release, so I think we can make breaking changes as needed.

I am pretty unapologetic about this, as it's better to have repeated failures due to erroneous user input than have Cantera incorrectly guessing what a user had in mind. As the bulk of the cases are machine generated (ck2yaml etc.) potential errors are only occurring for those users who hand-assemble input files. In those cases, the science section is recommended reading.

For the vast majority of user input, we rely on the user to write what they mean, and have no way of checking that that input is what they meant. Obviously, in those cases where we can detect inconsistencies, we should, but I don't think we should try to design the input format to require redundant data just so that we can check for consistency.

enforce type unless it's a phase default

I think this is a simple and logically-consistent option that won't result in too much clutter in input files, while providing the opportunity to simplify reaction parsing. For now, I think this means that the only necessary change to the YAML specification is that electrochemical reactions need to be labeled as type: electrochemical.

@ischoegl
Copy link
Member Author

I think this is a simple and logically-consistent option that won't result in too much clutter in input files, while providing the opportunity to simplify reaction parsing. For now, I think this means that the only necessary change to the YAML specification is that electrochemical reactions need to be labeled as type: electrochemical.

Perfect!

@ischoegl
Copy link
Member Author

Closing per discussion in #750: automatic detection is preferred over simplicity of parsing. Long-term, it would make sense to avoid duplicate parsing of surface reactions by other means.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Topics that are being discussed Kinetics
Projects
None yet
Development

No branches or pull requests

3 participants