Porting NetCDF I/O to Julia #3
Comments
The most pressing issues with NetCDF_jll will hopefully go away with JuliaPackaging/Yggdrasil#5251. But yeah, due to these issues I've also thought about porting parts of netCDF to Julia, though I never attempted anything. Do you need netCDF 3 or 4? 3 is simpler; for instance, SciPy does this in 1000 lines of code: https://github.com/scipy/scipy/blob/v1.8.1/scipy/io/_netcdf.py 4 is based on HDF5 and is much more complex. Though there is https://github.com/JuliaIO/JLD2.jl, which supports quite a good subset of HDF5. It could be interesting to see how much is needed to use JLD2 to do (a subset of) netCDF 4 I/O. A pure Julia alternative that already exists is https://github.com/JuliaIO/Zarr.jl/.
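To make the netCDF 3 versus netCDF 4 distinction concrete, here is a minimal sketch that tells the two apart from the first bytes of a file. The classic formats start with the ASCII bytes "CDF" followed by a version byte, while netCDF 4 files are HDF5 containers and start with the HDF5 signature. The function name `netcdf_kind` and the returned symbols are hypothetical, not part of any existing package, and the sketch only checks for the HDF5 signature at offset 0 (HDF5 also allows it at later offsets when a user block is present).

```julia
# Hypothetical helper for illustration; not an API of NCDatasets.jl, NetCDF.jl or JLD2.jl.
const CDF_MAGIC = UInt8[0x43, 0x44, 0x46]  # ASCII "CDF"
const HDF5_SIGNATURE = UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a]  # "\x89HDF\r\n\x1a\n"

function netcdf_kind(io::IO)
    header = read(io, 8)  # returns fewer than 8 bytes if the stream is short
    if length(header) >= 4 && header[1:3] == CDF_MAGIC
        version = header[4]
        version == 0x01 && return :netcdf3_classic
        version == 0x02 && return :netcdf3_64bit_offset
        version == 0x05 && return :netcdf3_cdf5
        return :unknown
    elseif header == HDF5_SIGNATURE
        # Assumption: signature at offset 0, which is the usual layout for netCDF 4 files.
        return :hdf5
    end
    return :unknown
end
```

For a file on disk this could be called as `open(netcdf_kind, "some_file.nc")`, where the file name is a placeholder.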
Awesome! 🤞
As I understand it, we need netCDF 4, since our
That would be interesting. This line in the JLD2 docs concerns me:
We use netCDF for standardized serialization and archival of outputs, which should be readable and usable across languages, so it's essential that we be able to read netCDF 4 files created by the Python ArviZ package (written to netcdf 4 using xarray) and that we write files that Python ArviZ can read.
Cool, might be a nice alternative to provide in addition to netCDF. Can it only write single arrays, or also hierarchical data structures containing such arrays? E.g., at our lowest level we have multidimensional arrays with named dimensions, but a higher level ties them together into groups with shared named dimensions, and an even higher level ties the groups together.
Note that being able to read netCDF4 files written by Python ArviZ/xarray is already a much smaller subset than "any valid HDF5 file found out in the wild". And I would expect netCDF files written through JLD2 to be fine for xarray to read. But still, it would need testing. I spoke to one of the JLD2 devs on Slack a while back and I think they thought it was worth a shot for sure. Zarr indeed handles groups and shared named dimensions as well. There is even something called NCZarr https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in.
It seems that with current functionality of JLD2, netCDF files written by ArviZ are not readable.
julia> using ArviZ, JLD2
julia> idata = load_arviz_data("centered_eight");
julia> to_netcdf(idata, "tmp.jld2")
"tmp.jld2"
julia> jldopen("tmp.jld2")
ERROR: ArgumentError: "/home/sethaxen/projects/ArviZ.jl/tmp.jld2" is not a JLD2 file
I'd have to look more into how JLD2 writes its data to test the other direction.
Ha yeah I tried commenting out the header checks, and |
Here is a proof of concept of a pure Julia NetCDF 3 reader based on pupynere (also in SciPy): https://github.com/Alexander-Barth/NetCDF3/blob/main/test/runtests.jl#L42 Surprisingly, for reading a whole variable, Julia is faster than the C library (about 4 times faster).
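For readers unfamiliar with the classic format, the following sketch shows how little is needed to start parsing a netCDF 3 header in Julia. It is not taken from the linked NetCDF3 repository; the function names are hypothetical. It reads the magic bytes, the record count, and the dimension list, all of which are stored as big-endian 32-bit integers in the CDF-1 and CDF-2 variants, with names padded to a 4-byte boundary. Attributes and variables are not handled.

```julia
# Illustrative sketch only; names are hypothetical and not from an existing package.
read_be_int32(io::IO) = ntoh(read(io, Int32))  # header integers are big-endian

function read_name(io::IO)
    n = read_be_int32(io)
    name = String(read(io, n))
    skip(io, mod(-n, 4))  # names are zero-padded to a multiple of 4 bytes
    return name
end

function read_dimensions(io::IO)
    magic = read(io, 4)
    (length(magic) == 4 && magic[1:3] == UInt8[0x43, 0x44, 0x46]) ||
        error("not a netCDF classic file")
    magic[4] in (0x01, 0x02) ||
        error("this sketch only handles CDF-1 and CDF-2 headers")
    numrecs = Int(read_be_int32(io))  # -1 means the record count is "streaming"
    tag = read_be_int32(io)           # 0x0A marks a dimension list, 0 means absent
    ndims = read_be_int32(io)
    dims = Pair{String,Int}[]
    tag == 0 && return numrecs, dims
    tag == 0x0A || error("unexpected tag where the dimension list should be")
    for _ in 1:ndims
        name = read_name(io)
        len = Int(read_be_int32(io))  # length 0 marks the record (unlimited) dimension
        push!(dims, name => len)
    end
    return numrecs, dims
end
```

A full reader would continue with the global attribute list and the variable list, which follow the same tag, count, and entries pattern.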
Haha that is awesome. I can delete my mostly empty NetCDF3 that I made yesterday and clone this! |
haha, I don't know, mine is mostly empty too :-) In any case, NetCDF3 can be put on JuliaGeo or JuliaIO too (now or later). pupynere/SciPy does not allow reading/writing a subset of a NetCDF variable, which might not be so trivial to do (but still doable). Another "format" in NetCDF that is quite common is OPeNDAP (DAP2 and DAP4 are supported in libnetcdf, but I never came across a DAP4 server).
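As a rough idea of what subset reading could look like for the simplest case, here is a sketch that reads a contiguous range along the slowest-varying dimension of a fixed-size variable by seeking to the right byte offset. It assumes the classic layout (row-major, big-endian, contiguous data) and that the variable's start offset and shape are already known from the header. The function name and arguments are hypothetical; strided or multi-dimensional hyperslabs, record variables, and padding are not handled.

```julia
# Illustrative sketch only; `read_slab` is a hypothetical name.
# `shape` is given in file order (slowest-varying dimension first).
# `T` must be a type for which `ntoh` is defined, e.g. Int32.
function read_slab(io::IO, begin_offset::Integer, ::Type{T},
                   shape::NTuple{N,Int}, rows::UnitRange{Int}) where {T,N}
    inner = prod(shape[2:end])  # elements per index of the slowest dimension
    seek(io, begin_offset + (first(rows) - 1) * inner * sizeof(T))  # 0-based byte position
    data = Vector{T}(undef, length(rows) * inner)
    read!(io, data)
    data .= ntoh.(data)  # convert from big-endian to host byte order
    # Julia arrays are column-major, so the dimension order is reversed
    # relative to the file: the fastest-varying file dimension comes first.
    return reshape(data, reverse(shape[2:end])..., length(rows))
end
```

Because only the requested rows are read, the cost scales with the size of the slab and not with the size of the variable.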
It seems that most or all of the linked issues will be resolved in the next few days. So we can begin work on a native Julia
from_netcdf and to_netcdf being ported to Julia is an essential step to InferenceData being truly stand-alone (see arviz-devs/ArviZ.jl#207).
Two Julia packages provide NetCDF I/O: NCDatasets.jl and NetCDF.jl. Both are wrappers around NetCDF_jll.jl, which is a BinaryBuilder-generated binary for the NetCDF C library.
Currently there are some issues with the NetCDF_jll binary on Windows that would need to be resolved for us to use either of these packages. Descriptions of the issues can be found here: