-
-
Notifications
You must be signed in to change notification settings - Fork 346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script to convert CTML to YAML input format #693
Conversation
Questions about this script:
|
To-do list
|
@ischoegl Yes, but to the extent that we can avoid such questions, it would be preferable. I guess the question is really, how likely is it that a) a user has a hand-written/hand-edited CTML file (since the files converted by |
Given that for files generated from CK->CTI->XML, this is the note from the thermo data entry, I think this belongs with the thermo entry and not at the species level. (In
I think it's ok if an invalid input results in invalid output. You could potentially import the input file and use that to catch errors, but that would require the Python module to be present, and gets a little tricky if you have input files with surface definitions.
No, an exponent can't have units. If units for b were specified in the XML file, they would be silently ignored by the current parser. I don't think you need to worry about it. |
My concern is more in the conversion script crashing out because, e.g., the text of a node that's supposed to be a float isn't and raises a |
Hmmm... adding the
|
bcf03a3
to
29904a1
Compare
Codecov Report
@@ Coverage Diff @@
## master #693 +/- ##
==========================================
+ Coverage 70.82% 71.35% +0.52%
==========================================
Files 372 372
Lines 43745 43758 +13
==========================================
+ Hits 30983 31223 +240
+ Misses 12762 12535 -227
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #693 +/- ##
=======================================
Coverage 70.63% 70.63%
=======================================
Files 372 372
Lines 43567 43567
=======================================
Hits 30773 30773
Misses 12794 12794
Continue to review full report at Codecov.
|
29904a1
to
bde88fe
Compare
I'm confused about the implementation of the
and the species
The corresponding XML entries are
and the species
In the YAML, the density is included with the species definition, while in the XML it is included with the phase definition. This is inconsistent with the
I'm wondering why this difference exists and whether it makes sense to resolve it. |
Another question. Does the YAML format support
|
@speth At this point, I've implemented all the equivalents to the I'm pretty good with how the architecture has worked out, but another opinion would be great. Especially for the activity coefficients or the constant density phase/species stuff where information from the phase node is required to build the species node.
I believe @ischoegl deleted his comment that noted that many people probably rely on those extra phase definitions, so removing them would not be very nice, or at least, make the advice "replace |
I can't find any tests of |
In general, with the YAML format I've tried to move properties of individual species out of the phase entry and into the species entries. With the
They aren't currently read. I guess they could be, but they aren't used anywhere -- there are very few places where temperature limits are enforced in Cantera (for better or worse).
I don't know how I missed this 😲 No, there are currently no explicit tests of
I don't doubt that people are using these definitions, but I don't really want to keep treating the GRI 3.0 mechanism in a special way that is distinct from what is done with every other mechanism. |
Related to #707, I'm not sure how to specify multiple options to the cantera/src/kinetics/KineticsFactory.cpp Lines 99 to 101 in 265a186
skip-undeclared-third-bodies should be a phase-level option, but this feels more like something that should be under the reactions key, provided that it can be handled on a per-reaction-source level.
Here is a sample XML entry that contains this option (this is generated by
When converting this, should it be an error if both sources do not include skipping the undeclared third bodies? I'm leaning towards no, because deleting the option from the second array does not result in an error when loading this XML file with |
OK, so I should handle the
I have two conflicting thoughts about this. One is that it is useful to keep this data, in case someone wants to use this file for something else. The other is that it is confusing to include this data and then not use it in Cantera anywhere. 🤷♂
The
I get that, and I still think we should keep these phase definitions in the "official" distributed version. Breaking people's code is not something we should do lightly, especially when we're trying to convince them to use a new format with unclear (immediate) benefits to them. If we do want to remove these, I think we should figure out a way to go through a deprecation cycle and warn people before their code just breaks. Are there other reasons you're concerned aside from not wanting to make an exception for GRI-3.0? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is still in-progress, but I figured it was worth sharing a few comments on the implementation now.
Just used this to convert a NASA9 xml file, it worked quite nicely! A default output name may be useful, i.e. |
@speth How does YAML handle the |
In the YAML format, I tried to put all species-specific information with the species, so the simple answer is that it doesn't. I don't think there's a compelling reason to have two ways of specifying this information in the input file. In all of the Debye Huckel examples, the phase-level node is only occasionally used, and when it is, it's just a duplicate of the species-node value. |
Ok that makes sense. The code comments in that section of the init method say that the phase level node is only used to override the species specific information. Maybe what I'll do is move it from the phase to the species if the equivalent node is not already present in the species. |
6faf96f
to
64a6e77
Compare
The changes now match all of the options in newReaction, including synonyms. Butler-Volmer parameters are deprecated so the user is warned. Simplify processing interface type reactions into one method, since the electrochem node is optional.
Add a docstring for get_float_or_units. Use the clean_node_text function.
As demonstrated in Cantera#683, sometimes the zero values are intentional and meaningful. Rework the SpeciesTransport class to not use this function.
The Python Expat parser requires that the <?xml version...> tag occur as the first characters in the file, even before any blank space, so lstrip is used to remove any whitespace. In addition, raw & characters are replaced with their escaped version.
If a reactionData or speciesData node has a duplicated id attribute, combine the duplicate sections together. If duplicate reaction id attributes or species names are found, either warn or raise an error.
Add intersphinx extension to auto-link to Python docs. Bump KaTeX version.
A quantity is a value that can be either just a number (i.e., a float) or a number plus a unit (i.e., a str).
Add parsing of arguments passed when the script is run from the command line. A new function is introduced to parse the arguments and is the default function when the file is executed as a script. Update the documentation to add the new function. Add a new argument to convert that takes a string containing the content of a CTML file. This allows convert to be used from within the Python interpreter more easily.
Node tag names are now quoted for clarity.
The Redlich-Kwong phase-thermo model doesn't require the activityCoefficients node. If not specified, values will be found from critProperties.xml
ctml_writer.py maintains global state between calling convert(). This change commits the resulting XML file into the test/data directory rather than converting during the test.
The index from the enumerate is not actually used anywhere.
There isn't a reason to convert gri30 every time the class is instantiated. Move the conversion into the test_gri30 function.
Better conform to PEP-8.
The YAML converters are now installed as scripts from the Python package. The zip_safe flag is needed to ensure that data files are accessible by the package. This was added to resolve a build problem with Python 3.8. I'm not sure why it hasn't been a problem until now.
Bump the minimum Python version to 3.5 to support features in ctml2yaml.
Use argparse to have the CLI match ctml2yaml.py
b93075d
to
6508179
Compare
The script in the PR will convert CTML to YAML input format. The first version here works for GRI-3.0 and I'm posting it to get feedback on the approach I'm taking. Obviously, there is still a lot to be done (I'll try to generate a checklist at some point).
I've attached the current
gri30.yaml
file to this message as a.txt
file.gri30.txt
Some notes about the YAML format I'm noticing as I do this:
note
field of a species entry should be at thespecies
level, not thethermo
level? This would match the CTI and CTML locations of this data, but I do believe it is extracted from the NASA thermo entry, so maybe it does belong in the thermo field. (ctml1yaml.py
currently puts it at the species level, while I believeck2yaml.py
puts it at thethermo
level)gri30.yaml
that is generated only has one phase entry for mixture averaged transport. This is different fromgri30.cti
andgri30.xml
that have 3 phase entries.