Add writing complex data to docs #3297

Closed
DerWeh opened this issue Sep 9, 2019 · 10 comments · Fixed by #9509

Comments

@DerWeh

DerWeh commented Sep 9, 2019

Is there a recommended way to save complex data? I found some options on Stack Overflow, but they don't seem satisfactory.

The main point of writing self-describing binary data is that people can just read the data and don't have to worry about how to interpret it. Thus, the only viable option for me would be using engine='h5netcdf'.

On the other hand, if something like adding an axis were done internally by xarray, that would also be OK, as everyone could read my data using the library.

@crusaderky
Contributor

My 2 cents:

For a proper resolution, I'd rather have the topic discussed with the NetCDF spec maintainers, so that NetCDF can be expanded to support the same structure as HDF5. Once the format is standard, it would be a trivial PR to h5netcdf to suppress the warning. We've already gone through the exact same process for compression algorithms other than gzip. Adding the functionality to the NetCDF C library and the Python wrapper would be a completely different order of magnitude of work.

Another good alternative is to use h5netcdf forcing the malformation through, and just call the file .h5 instead of .nc 😉

If you really need to interact with (non-Python) people that are stuck on the NetCDF C library, and who for reasons I can't imagine can't switch to the HDF5 C library, I think writing two bespoke pre/postprocess functions in your code to add a dimension is the best approach.
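A minimal sketch of what such a pre/postprocess pair could look like, assuming xarray and NumPy are available. The function names `split_complex` and `join_complex`, the dimension name "ri", and its labels "real"/"imag" are illustrative choices, not part of xarray's API.

```python
import numpy as np
import xarray as xr


def split_complex(da: xr.DataArray, dim: str = "ri") -> xr.DataArray:
    """Stack the real and imaginary parts along a new dimension (hypothetical helper)."""
    parts = xr.concat([da.real, da.imag], dim=dim)
    # Label the two slices so the layout is self-describing on disk.
    return parts.assign_coords({dim: ["real", "imag"]})


def join_complex(da: xr.DataArray, dim: str = "ri") -> xr.DataArray:
    """Rebuild a complex array from the two slices along `dim` (hypothetical helper)."""
    real = da.sel({dim: "real"}, drop=True)
    imag = da.sel({dim: "imag"}, drop=True)
    # Fill real and imaginary parts separately so NaN or inf in one part
    # does not leak into the other.
    data = np.empty(real.shape, dtype=complex)
    data.real = real.values
    data.imag = imag.values
    return real.copy(data=data)


z = xr.DataArray(
    np.array([[1 + 2j, 3 - 4j], [0.5j, -1.0]]),
    dims=("x", "y"),
    coords={"x": [10, 20]},
    name="z",
)
stacked = split_complex(z)        # real-valued, with an extra "ri" dimension
restored = join_complex(stacked)  # complex again, dims ("x", "y")
```

The real-valued `stacked` array contains only floating-point data, so it should be writable with any netCDF engine; readers then call `join_complex` after loading.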

@DerWeh
Author

DerWeh commented Sep 9, 2019

I agree that including it in NetCDF is the 'most sane' approach. I don't really know how much work it is, expanding the standard.

To be honest, I don't really care about NetCDF; for me, xarray is just an incredibly good way to make code more stable and readable (though it still has several usability issues). In my community everyone uses HDF5 anyway, so dropping compatibility is no big issue. I just want a way to persist data as it is and conveniently load it for plotting and post-processing.

I would still encourage you to push saving of complex data. In most fields people use complex data and it is hard to convince them that they benefit from this great library, if saving simple data takes complicated keyword arguments and annoys you with warnings compared to a simple np.savez on regular ndarrays.

@dcherian
Contributor

dcherian commented Sep 9, 2019

I think the answer here is to use h5netcdf until a proper hdf5 backend is created.

It would be nice to add this to the documentation and mention h5netcdf more generally under https://xarray.pydata.org/en/stable/io.html . @DerWeh Are you up for sending in a PR?

@dcherian changed the title from "saving complex data" to "Add writing complex data to docs" on Sep 9, 2019
@shoyer
Member

shoyer commented Sep 10, 2019

It might make sense to implement engine="hdf5" as an alias for engine="h5netcdf" with invalid_netcdf=True. It would certainly be a more ergonomic API.

@ulijh
Contributor

ulijh commented Sep 10, 2019

I am in the exact same situation. @DerWeh with the current master you can do

da.to_netcdf("complex.nc", engine="h5netcdf", invalid_netcdf=True)

which works for me until there is engine="hdf5", or maybe a method da.to_hdf()?

@shoyer
Member

shoyer commented Sep 10, 2019

I opened an issue to discuss this in the CF convention issue tracker -- let's see what they think: cf-convention/discuss#369

@DerWeh
Author

DerWeh commented Sep 12, 2019

Sorry for the slow response, I have little time at the moment. The option invalid_netcdf=True is not yet in the latest release, is it? I get a TypeError.
I would have to use a manually installed version of xarray to use it, right?

@shoyer
Member

shoyer commented Sep 12, 2019

Yes, this will be in the next release. (Which will hopefully be very soon!)

@ZedThree
Contributor

ZedThree commented Aug 2, 2023

I'd like to revive the request to add support for complex numbers to xarray's netcdf4 engine. I recently added the ability to netcdf-C to read the compound and enum types that h5netcdf creates (via h5py). I believe it will be in the next release. Once it's available, it will allow netcdf-C to read files created by h5netcdf, for example with current netcdf-C main:

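import h5netcdf
import netCDF4

# Write a complex scalar with h5netcdf; invalid_netcdf=True permits the compound type.
with h5netcdf.File("h5netcdf_test.h5", "w", invalid_netcdf=True) as h5f:
    v = h5f.create_variable("complex_var", dtype=complex)
    v[()] = 1j

# Read the same file back through netCDF4 (needs a netcdf-C build with the new support).
with netCDF4.Dataset("h5netcdf_test.h5", "r") as f:
    print(f["complex_var"][()])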

# (0., 1.)

However note that it doesn't actually yet convert to a complex. There is a reluctance from netcdf to add a basic complex type like NC_COMPLEX before HDF5 adds something -- and unfortunately nothing has happened there for more than a decade. Even if HDF5 does add complex types, we'd still have to wait for netcdf to adopt them too.

The other thing to mention is that even when (if) netcdf gets support for complex types, there will still be files using other representations. For example some projects use an extra dimension for the real/imaginary components.

H5py handles complex numbers by providing some configuration for the names of the real/imaginary fields, defaulting to ('r', 'i').

So what I propose is that xarray automatically convert compound datatypes of the form that h5netcdf/h5py use to complex, with some configuration option for the field names. On top of this, we could also convert the other common representation using an extra dimension.
If netcdf does end up adding their own built-in complex type, this may be of a different form and so wouldn't remove the need to do some conversion in xarray.
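The conversion proposed above might look roughly like the following NumPy-only sketch. The helper names `compound_to_complex` and `complex_to_compound` are hypothetical and not part of xarray, h5netcdf, or h5py. The only detail taken from this thread is that the field names default to ('r', 'i'), following h5py.

```python
import numpy as np


def compound_to_complex(arr, field_names=("r", "i")):
    """Convert a two-field structured array to a complex array (hypothetical helper)."""
    arr = np.asarray(arr)
    if arr.dtype.names != tuple(field_names):
        raise ValueError(f"expected fields {tuple(field_names)}, got {arr.dtype.names}")
    real_name, imag_name = field_names
    # float32 fields give complex64, float64 fields give complex128.
    out = np.empty(arr.shape, dtype=np.result_type(arr[real_name].dtype, np.complex64))
    out.real = arr[real_name]
    out.imag = arr[imag_name]
    return out


def complex_to_compound(arr, field_names=("r", "i")):
    """Convert a complex array to a two-field structured array (hypothetical helper)."""
    arr = np.asarray(arr)
    real_name, imag_name = field_names
    float_dtype = arr.real.dtype
    out = np.empty(arr.shape, dtype=np.dtype([(real_name, float_dtype), (imag_name, float_dtype)]))
    out[real_name] = arr.real
    out[imag_name] = arr.imag
    return out


z = np.array([1 + 2j, 3 - 4j, 0.5j])
as_compound = complex_to_compound(z)           # structured dtype with fields "r" and "i"
round_trip = compound_to_complex(as_compound)  # complex128 again
```

Passing `field_names` explicitly is one way to provide the configuration option for files written with non-default field names.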

This would remove a very annoying and painful roadbump for those of us using complex numbers and xarray.

@Dave-Allured

Please see this current proposal to add HDF5 complex data types.
HDFGroup/hdf5#3339

7 participants