-
-
Notifications
You must be signed in to change notification settings - Fork 346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What to do about str
columns in SolutionArray
#896
Comments
Thanks again for investigating @ischoegl. I think it may be appropriate to handle CSV and HDF differently here. In particular, my understanding is that CSV has a much easier time representing heterogeneous data types, so IMO importing strings from CSV should be supported. It's actually a pretty small change to def read_csv(self, filename):
"""
Read a CSV file named *filename* and restore data to the `SolutionArray`
using `restore_data`. This method allows for recreation of data
previously exported by `write_csv`.
"""
data = np.genfromtxt(filename, delimiter=',', dtype=None, names=True, encoding=None)
data_dict = OrderedDict({l: data[l] for l in data.dtype.names})
self.restore_data(data_dict) Specifying Since HDF may have more trouble with the heterogeneous data, it might make sense to throw an exception if strings are written to/read from HDF, but I didn't look into how this could be supported. The lack of a
|
@bryanwweber ... thanks for the feedback. And 👍 for the CSV import suggestion! Regarding, There may be a way to even fix HDF: a quick search resulted in storing-a-list-of-strings-to-a-hdf5-dataset-from-python. So allowing for |
As an aside: I’ve seen that the minimum requirements for |
@ischoegl It depends on whether we want to continue supporting NumPy from the Ubuntu 16.04 repositories, which is up for discussion (which should probably be at #838 (comment)) |
Alright. Fix is proposed in #900 for both CSV and HDF. I also added |
Totally unrelated, but congrats on filing issue #900 😃 |
Problem description
The current implementation of
SolutionArray
allows forextra
columns containingstr
(see discussion in #838), which may or may not be an intended use case: built-in attributes are almost exclusively numeric, with the notable exception ofstate_of_matter
. Other implementations did not anticipate non-numeric data, e.g. CSV import and HDF export/import, sostr
support either needs to be deprecated/disabled or consistently implemented.One interpretation is that
str
input was never intended, but also not explicitly checked for. I.e. it just happens to be supported bynumpy
(similar to sequences not being checked, see #895). In this case, the most appropriate resolution may be to catch and deprecatestr
columns, while warning that it is not fully supported for CSV/HDF.Steps to reproduce
Behavior
The pre-existing
write_csv
supportsstr
, the newly introducedread_csv
fails to import it, andh5py
has issues.I am filing this as a bug report as this presumably needs to be fixed prior to the release of 2.5.
System information
Additional context
#838, #895
The text was updated successfully, but these errors were encountered: