YAML: consistently clarify reaction type #747

ischoegl · 2019-11-12T05:16:20Z

System information

Cantera version: 2.5.0a3
OS: all
Python/MATLAB version: N/A

Expected behavior

YAML entries clearly identify type of reaction, with the default being elementary. This is being followed consistently for GasKinetics (with three-body, etc.), and the same should be done for InterfaceKinetics etc., e.g.

- equation: CO2(S) => CO2 + PT(S) 
  type: interface # <--- suggested
  rate-constant: {A: 10000000000000.0, b: 0, Ea: 20500}

Edit: the type interface as default for InterfaceKinetics does makes sense (similar to elementary as default for GasKinetics). The following is less intuitive, as the beta and exchange-current-density-formulation fields are not required.

- equation: Li+[elyt] + V[cathode] + electron <=> Li[cathode] 
  type: electrochemical # <--- suggested
  rate-constant: {A: 562900000000.0, b: 0.0, Ea: 58 kJ/mol}
  exchange-current-density-formulation: true
  beta: 0.5

Actual behavior

The type field is not used for interface and electrochemical reactions, which can only be detected by parsing the species string (the type of reaction is purely implicit). Further, several electrochemical reaction types appear to be discontinued in the YAML format without a deprecation warning.

Additional context

The implicit specification of types is not future-proof, as it is not easily extendable. As YAML is not yet officially released, these clarifications can be easily added to 2.5.

The text was updated successfully, but these errors were encountered:

speth · 2019-11-12T21:34:17Z

For an interface reaction, you do not need to parse the reaction equation to know that it's of that type. It's required by the fact that you're adding it to an InterfaceKinetics object. I think requiring every reaction in a surface mechanism file to be tagged with type: interface would be a decrease in the readability of the YAML format.

For electrochemical reactions, the problem is the case where the reaction involves charge transfer but the reaction isn't explicitly identified as such (see #452). If the user has to specify type: electrochemical for all electrochemical reactions, we still have to check that it really is an electrochemical reaction, and that other reactions aren't, so it doesn't save much work on the implementation side.

I think there are still options for extensibility (including user-derived types) even if we use the reaction equation or other information besides the type field to determine reaction type in some cases. The most straightforward one would be to use the type field first if it's provided, and then check for reactions where it can be deduced by other means. If you wanted to offer even more flexibility, you could provide an interface that used functions like isElectrochemicalReaction for dispatching, instead of just using the type field as we do in the existing factory classes.

I might even consider it an improvement to go further in this direction and not require the type field in other cases where it can be deduced automatically, e.g. three-body reactions which can be identified by the M on each side of the equation.

ischoegl · 2019-11-12T22:30:50Z

@speth ... yes, it is true that things can be deduced, but is this an efficient implementation? If we have a reaction mechanism with 10k+ reactions, it is quicker to simply access a type field than to string-search for a third-body collision partner.

Imho, there are several arguments against trying to eliminate the type field:

Specific tests for nested definitions need to be hard-coded, whereas a simple differentiation by a type field can be easily automated. I.e. for a factory class, it is much easier to create an object based on an assigned type field than to check through various cases to figure out what is actually implemented.
~~Speed of parsing for very large mechanisms (10k+ reactions, 1k+ species is not uncommon)~~ probably not as bad if it can be deduced from the reaction string itself.
Omitting the 'type' field takes away from human-readability (if the field is there, it is immediately clear: think about new users of cantera).
If a user makes an error in hand-coded reaction specifications, the setup step should fail with an appropriate error message ... so there is nothing to be gained by automatic type detection.
~~There is no apparent reason why parsing of reaction specifications should require passing of a kinetics manager as an argument.~~ Ok. I see the point.

PS: I wasn't aware of #452, but would argue that this is a shortcoming in the XML input file format, which imho has better solutions for YAMl. I.e. this leaves type: electrochemical as a much more intuitive solution on the table.

ischoegl · 2019-11-13T00:03:32Z

I guess that in many cases (including three-body), autodetection may not introduce much overhead (I even may take a stab). Still, for electrochemical a type field would be imho a cleaner approach as the correct type is not easily inferred from the reaction definition alone.

speth · 2019-11-13T02:15:52Z

Imho, there are several arguments against trying to eliminate the type field:

Specific tests for nested definitions need to be hard-coded, whereas a simple differentiation by a type field can be easily automated. I.e. for a factory class, it is much easier to create an object based on an assigned type field than to check through various cases to figure out what is actually implemented.

Agreed, it does simplify the factory class implementation.

Speed of parsing for very large mechanisms (10k+ reactions, 1k+ species is not uncommon)

Two things:

First, I did some benchmarking while developing the YAML parsing code, and I've been pretty happy with the performance so far. It's significantly faster than using the XML format, let alone converting CTI to XML. To provide a bit of data, I just ran a comparison for a fairly large mechanism (LLNL butanol mechanism with 431 species and 2551 reactions) and importing in Python with transport_model=None, I get the following results:

Full processing, with caches temporarily disabled (includes reading file, [conversion from cti,] parsing to XML_Node / AnyMap, and creating a Solution object each time):
- CTI: 1.86 s ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- XML: 1.25 s ± 9.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- YAML: 619 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Solution instantiations from cached XML_Node / AnyMap:
- XML: 1.08 s ± 8.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- CTI: 1.08 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- YAML: 82.2 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The takeaway is that creating a Solution from YAML is 2x-10x faster, so I think we can spare a few CPU cycles here if there's a good use case. Of course, this all gets blown away by the time it takes to do the transport property fits (another 5 seconds) if you need to instantiate a transport object.

Second, we have to parse the reaction equation at some point. And if the check that we would
have to do for a reaction is too time consuming, we could rely on an explicit type field -- I don't mean to suggest that we need to identify all reaction types this way.

Omitting the 'type' field takes away from human-readability (if the field is there, it is immediately clear: think about new users of cantera).

I think there are different ways to think about what's more useful to the user. One is that having the type field gives the user a way to track down what it is that Cantera does with a reaction entry, e.g. if I search the code for pressure-dependent-Arrhenius, I will find out how those reactions are handled. This is less helpful for something like electrochemical, though. Another perspective is that it's visual clutter when there is other information in the reaction entry that tells me what kind of reaction it is. I don't know which consideration is more important, and the answer may be reaction-type dependent. For instance, there may be charge transfer reactions that aren't completely obvious (if they don't have an explicit electron participating?) and where writing type: electrochemical would clarify things, while it's pretty obvious that H + O2 + M <=> HO2 + M is a three body reaction.

If a user makes an error in hand-coded reaction specifications, the setup step should fail with an appropriate error message ... so there is nothing to be gained by automatic type detection.

While I agree that we should try to detect any inconsistencies, requiring redundant information to be encoded in the reaction entry just compounds the potential for errors. And I think the setup step for YAML currently does fail for many (I don't want underestimate the creativity of users and say all) of the possible inconsistencies in reaction information.

~~There is no apparent reason why parsing of reaction specifications should require passing of a kinetics manager as an argument.~~ Ok. I see the point.

This is perhaps a bit subtle, but it's impossible to determine the units of the pre-exponential factor, which is needed to do unit conversions, without having the Kinetics object and all of the attached ThermoPhase objects available. This was handled in CTI/XML by doing the unit conversions in ctml_writer, which therefore had to know all of this information, and the XML file could only have the rate constant in Cantera's default units.

ischoegl · 2019-11-13T03:13:08Z

@speth ... thanks for the detailed response! Responding where appropriate ...

First, I did some benchmarking while developing the YAML parsing code

those number are impressive!

Second, we have to parse the reaction equation at some point.

Agreed, but it should be done exactly once. In the current implementation, every surface reaction gets parsed twice due to the omission of an appropriate type field for electrochemistry, i.e.

cantera/src/kinetics/Reaction.cpp

Lines 1047 to 1049 in 719cb3d

    
           // See if this is an electrochemical reaction 
        
           Reaction testReaction(0); 
        
           parseReactionEquation(testReaction, node["equation"], kin);

Imho, not setting an electrochemical type in the YAML specification is an omission that complicates the code. Adopting a type will require a transition with some warnings, but long-term there won't be an issue with historical work-arounds.

I think there are different ways to think about what's more useful to the user. One is that having the type field gives the user a way to track down what it is that Cantera does with a reaction entry, e.g. if I search the code for pressure-dependent-Arrhenius, I will find out how those reactions are handled. This is less helpful for something like electrochemical, though.

In the current implementation, electrochemical is the only type needed, but it should obviously be differentiated further, e.g. butler-volmer, global etc. (see old XML types).

[...] requiring redundant information to be encoded in the reaction entry just compounds the potential for errors.

I am pretty unapologetic about this, as it's better to have repeated failures due to erroneous user input than have Cantera incorrectly guessing what a user had in mind. As the bulk of the cases are machine generated (ck2yaml etc.) potential errors are only occurring for those users who hand-assemble input files. In those cases, the science section is recommended reading.

Based on the discussion, I certainly see the appeal of detection of three-body reactions, etc.. On the other hand, a self-documenting approach (enforce type unless it's a phase default) is imho a much clearer approach. I think it makes more sense to strictly enforce syntax (and throw meaningful warnings/errors) than to implement 'elegant' solutions that are hard to maintain.

speth · 2019-11-13T23:03:28Z

Imho, not setting an electrochemical type in the YAML specification is an omission that complicates the code. Adopting a type will require a transition with some warnings, but long-term there won't be an issue with historical work-arounds.

I don't see why a warning would be required -- the YAML format is not in any stable release, so I think we can make breaking changes as needed.

I am pretty unapologetic about this, as it's better to have repeated failures due to erroneous user input than have Cantera incorrectly guessing what a user had in mind. As the bulk of the cases are machine generated (ck2yaml etc.) potential errors are only occurring for those users who hand-assemble input files. In those cases, the science section is recommended reading.

For the vast majority of user input, we rely on the user to write what they mean, and have no way of checking that that input is what they meant. Obviously, in those cases where we can detect inconsistencies, we should, but I don't think we should try to design the input format to require redundant data just so that we can check for consistency.

enforce type unless it's a phase default

I think this is a simple and logically-consistent option that won't result in too much clutter in input files, while providing the opportunity to simplify reaction parsing. For now, I think this means that the only necessary change to the YAML specification is that electrochemical reactions need to be labeled as type: electrochemical.

ischoegl · 2019-11-13T23:43:36Z

I think this is a simple and logically-consistent option that won't result in too much clutter in input files, while providing the opportunity to simplify reaction parsing. For now, I think this means that the only necessary change to the YAML specification is that electrochemical reactions need to be labeled as type: electrochemical.

Perfect!

ischoegl · 2019-11-28T03:07:07Z

Closing per discussion in #750: automatic detection is preferred over simplicity of parsing. Long-term, it would make sense to avoid duplicate parsing of surface reactions by other means.

ischoegl mentioned this issue Nov 12, 2019

Add Reaction factory #745

Closed

bryanwweber mentioned this issue Nov 12, 2019

Discuss deprecating Butler-Volmer reaction types #749

Closed

bryanwweber added discussion Topics that are being discussed Kinetics labels Nov 12, 2019

ischoegl closed this as completed Nov 28, 2019

ischoegl mentioned this issue Dec 20, 2019

YAML format documentation Cantera/cantera-website#91

Merged

ischoegl mentioned this issue Jul 24, 2020

Broken extract_submechanism.py example #821

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

YAML: consistently clarify reaction type #747

YAML: consistently clarify reaction type #747

ischoegl commented Nov 12, 2019 •

edited

Loading

speth commented Nov 12, 2019

ischoegl commented Nov 12, 2019 •

edited

Loading

ischoegl commented Nov 13, 2019

speth commented Nov 13, 2019

ischoegl commented Nov 13, 2019

speth commented Nov 13, 2019

ischoegl commented Nov 13, 2019

ischoegl commented Nov 28, 2019

YAML: consistently clarify reaction type #747

YAML: consistently clarify reaction type #747

Comments

ischoegl commented Nov 12, 2019 • edited Loading

speth commented Nov 12, 2019

ischoegl commented Nov 12, 2019 • edited Loading

ischoegl commented Nov 13, 2019

speth commented Nov 13, 2019

ischoegl commented Nov 13, 2019

speth commented Nov 13, 2019

ischoegl commented Nov 13, 2019

ischoegl commented Nov 28, 2019

ischoegl commented Nov 12, 2019 •

edited

Loading

ischoegl commented Nov 12, 2019 •

edited

Loading