Implement CSV file import for C++ SolutionArray #163

Open
1 of 2 tasks
ischoegl opened this issue Mar 21, 2023 · 6 comments
Labels
feature-request New feature request help wanted Extra attention is needed

Comments

@ischoegl
Member

ischoegl commented Mar 21, 2023

Abstract

Recent work added HDF support to Sim1D::save/restore (Cantera/cantera#1385) and implemented SolutionArray::save/restore for HDF and YAML (Cantera/cantera#1426). On the back end, SolutionArray handles file IO in both cases. As those methods are implemented in the C++ layer, they are portable across all APIs.

Adding CSV support to SolutionArray::save/restore in C++ to replace Python's SolutionArray.write_csv/read_csv is a logical extension. It can build on the existing infrastructure and would handle CSV support consistently, replacing the historically grown patchwork of dissimilar approaches used at the moment. An additional benefit would be to resolve Cantera/cantera#1372.

Motivation

Describe the need for the proposed change:

  • What problem is it trying to solve? ... streamline CSV handling
  • Who is affected by the change? ... anyone preferring CSV to nominally superior YAML/HDF formats
  • Why is this a good solution? ... replace patchwork of dissimilar approaches

Possible Solutions

Create versions of SolutionArray::readEntry/writeEntry that handle CSV. While writing is straightforward, reading CSV will require a suitable parser. Per Cantera/cantera#1372 (comment) by @speth:

[...] the C++ standard library now includes <regex> (which we use elsewhere) and has I believe essentially the same API as the Boost version. See https://en.cppreference.com/w/cpp/regex. Anything available in C++14 or older is fair game as far as Cantera is concerned.
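
To illustrate why the writing side is considered straightforward, here is a minimal sketch that uses only the standard library. The helper names csvEscape and csvRow are hypothetical and not part of Cantera's API. The sketch assumes the common CSV convention that a field is quoted only when it contains a comma, a double quote, or a line break, and that embedded quotes are doubled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Cantera): quote a field only if it contains
// a comma, a double quote, or a line break; embedded quotes are doubled.
std::string csvEscape(const std::string& field)
{
    if (field.find_first_of(",\"\r\n") == std::string::npos) {
        return field;
    }
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') {
            out += '"'; // double each embedded quote
        }
        out += c;
    }
    out += '"';
    return out;
}

// Hypothetical helper: join already formatted fields into one CSV record
// (without a trailing line break).
std::string csvRow(const std::vector<std::string>& fields)
{
    std::string row;
    for (std::size_t i = 0; i < fields.size(); i++) {
        if (i > 0) {
            row += ',';
        }
        row += csvEscape(fields[i]);
    }
    return row;
}
```

A header row could then be produced by passing the column labels to csvRow and writing the result, followed by a line break, to the output stream.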

References

@ischoegl
Member Author

ischoegl commented Jun 21, 2023

Regarding parsing of CSV files in C++, here are some preliminary findings:

  • Suitable regex expression on SO: regex-to-split-a-csv
  • Regex tool to verify syntax: regex101
  • Unfortunately, C++ <regex> mostly uses ECMAScript, which does not support conditional matching
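
One way to sidestep the regex grammar entirely would be a small hand-written splitter. The sketch below is only an illustration under stated assumptions: the function name splitCsvLine is hypothetical (not Cantera code), each record is assumed to sit on a single line (no line breaks inside quoted fields), and a doubled quote inside a quoted field is treated as a literal quote.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Cantera): split one CSV record into fields.
// Assumes the record sits on a single line, i.e. no line breaks inside quotes.
std::vector<std::string> splitCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool quoted = false;
    for (std::size_t i = 0; i < line.size(); i++) {
        char c = line[i];
        if (quoted) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                field += '"'; // doubled quote inside a quoted field
                i++;
            } else if (c == '"') {
                quoted = false; // closing quote
            } else {
                field += c;
            }
        } else if (c == '"') {
            quoted = true; // opening quote
        } else if (c == ',') {
            fields.push_back(field); // delimiter ends the current field
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field); // the last field has no trailing comma
    return fields;
}
```

The sketch does not strip a trailing carriage return, so files with Windows line endings would need that handled before or after splitting.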

@bryanwweber
Member

I feel like there has to be something in Boost to do this... Implementing a csv parser from scratch seems overkill 🤔

@ischoegl
Member Author

ischoegl commented Jun 21, 2023

I feel like there has to be something in Boost to do this... Implementing a csv parser from scratch seems overkill 🤔

Wouldn't be too hard if this regex were supported by C++'s <regex>. It may, however, be supported by <boost/regex.hpp> ... lost most of my appetite after spending more time than seemed necessary trying to figure out how to translate the conditional to ECMAScript.

@speth
Member

speth commented Jun 21, 2023

Not sure it would even resolve the problem, but I wanted to add a word of caution. boost/regex.hpp is a compiled part of Boost, which is something we've been avoiding a dependency on, due to some of the complications involved in linking to those.

@ischoegl
Member Author

ischoegl commented Jun 21, 2023

Not sure it would even resolve the problem, but I wanted to add a word of caution. boost/regex.hpp is a compiled part of Boost, which is something we've been avoiding a dependency on, due to some of the complications involved in linking to those.

Too bad. I just confirmed that <boost/regex.hpp> would indeed resolve the problem 😢

PS: this is how to get the header line after opening the file ...

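// Assumes `file` is an already opened std::ifstream
std::string line;
std::getline(file, line);

// Group 1 records an opening quote; the conditional (?(1)...|...) then picks
// the quoted or unquoted field pattern, and group 2 holds the field content.
boost::regex rgx(
    "(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^,\"]*))\"?(?=,|$)");
std::vector<std::string> labels;
auto line_begin = boost::sregex_iterator(line.begin(), line.end(), rgx);
auto line_end = boost::sregex_iterator();
for (boost::sregex_iterator item = line_begin; item != line_end; ++item) {
    boost::smatch match = *item;
    labels.push_back(match.str(2));
}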

The syntax would be the same for <regex>, but the pattern string itself doesn't work there because of the conditional group (?(1)...).
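
For simple header lines, a plain alternation might stand in for the conditional, which would keep the pattern within the ECMAScript grammar used by std::regex. The following is a sketch only, not an equivalent of the Boost pattern: the function name splitHeader is hypothetical, doubled quotes inside quoted fields are not handled, and edge cases such as empty leading fields have not been checked across standard library implementations.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Cantera): split a CSV header line using
// std::regex with the default ECMAScript grammar.
std::vector<std::string> splitHeader(const std::string& line)
{
    // Group 1: content of a quoted field; group 2: content of an unquoted
    // field. The alternation replaces the conditional (?(1)...|...).
    static const std::regex rgx("(?:^|,)(?:\"([^\"]*)\"|([^,\"]*))");
    std::vector<std::string> labels;
    auto it = std::sregex_iterator(line.begin(), line.end(), rgx);
    auto end = std::sregex_iterator();
    for (; it != end; ++it) {
        const std::smatch& match = *it;
        // Exactly one of the two groups participates in each match
        labels.push_back(match[1].matched ? match.str(1) : match.str(2));
    }
    return labels;
}
```

Compared to the Boost version, each match carries the field in one of two groups rather than always in group 2, so the caller has to check which group participated.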

@speth
Member

speth commented Jun 21, 2023

We could vendor this single file, header-only CSV reader: https://github.com/ben-strasser/fast-cpp-csv-parser, or something similar.

@ischoegl changed the title from "Implement CSV file IO for C++ SolutionArray" to "Implement CSV file import for C++ SolutionArray" on Jun 24, 2023