Zarr support in NetCDF-Fortran for "cloud-native" model simulations? #209

Open
JiaweiZhuang opened this issue Jan 11, 2020 · 17 comments
@JiaweiZhuang

JiaweiZhuang commented Jan 11, 2020

First, thanks for all the great work on NetCDF!

I have a research project that will significantly benefit from NetCDF-Zarr. I recently saw a tweet from @jhamman that "pre-alpha will be available early in 2020". I also noticed some Zarr-related updates like Unidata/netcdf-c#1259. I am excited to test the new Zarr capability with real models and give feedback. Is it possible to get a preliminary version to play with around Feb-March? Or is it still too early to say?

More details about the use case: My workflow involves running Fortran-based models in a cloud-native container environment, for example AWS Batch or a Kubernetes cluster. The main benefit is to scale out ensemble runs quickly via AWS Batch Array Jobs or Kubernetes Parallel Jobs. This is similar to what Pangeo does, but for Fortran models instead of Dask workers. However, I/O is a major pain in a container environment (one needs to deal with Persistent Volumes, for example). It is actually possible to mount a Lustre file system in Kubernetes, but the workflow will be much, much simpler if the model can read from and write to S3 directly.

@WardF WardF self-assigned this Jan 15, 2020
@WardF
Member

WardF commented Jan 15, 2020

We are hoping to have a version out in the next month or two, so the Feb-March timeframe is perfectly reasonable!

@WardF WardF added this to the Future milestone Jan 15, 2020
@JiaweiZhuang
Author

Just to check -- is it possible to get a testing version this month?

@DennisHeimbigner
Collaborator

In Fortran, no. In C, maybe. But we still need an S3 driver. We are currently
using local storage formats for testing.

@DennisHeimbigner
Collaborator

I take that back. Once the C version is working, it should also work with any
language that uses the C library, provided that the language does not interfere
with the use of URLs as path names for nc_open and nc_create.

@rsignell-usgs

rsignell-usgs commented Apr 8, 2021

@DennisHeimbigner and @WardF, do you think it would be possible to write Zarr from FORTRAN using the new 4.8.0 NetCDF C library with this approach @ocefpaf pointed me toward:
https://riptutorial.com/fortran/example/7149/calling-c-from-fortran

@DennisHeimbigner
Collaborator

It should be possible assuming that the nf_open path can take
a URL string. I think one of our interns tested this over the summer and
I believe it worked.

@rsignell-usgs

Cool! Which intern was it? It would be nice to find out what they discovered.

@rsignell-usgs

@DennisHeimbigner pingity ping ping

@edwardhartnett
Contributor

I just built netcdf-c-4.8.0 with netcdf-fortran-4.5.3, also using MPI for parallel I/O.

All tests passed.

I had to use:
FCFLAGS='-fallow-argument-mismatch -g -Wall' FFLAGS='-fallow-argument-mismatch -g -Wall'

The Fortran library just hands the path over to the C library, so Zarr stuff should work transparently from Fortran, just as DAP does.

@rsignell-usgs

@edhartnett, you had to use "-g"? So it is not ready for prime time (e.g. "-O3") yet?

What I'd like to do is write Zarr from our ocean modeling simulations that would look exactly like what xarray produces...

@edhartnett
Contributor

edhartnett commented May 5, 2021 via email

@rsignell-usgs

rsignell-usgs commented May 5, 2021

@edhartnett, do you have a sample Fortran program that creates a Zarr dataset you could share?

@edwardhartnett
Contributor

No, sorry. I haven't tried Zarr.

@rsignell-usgs

rsignell-usgs commented May 5, 2021

@edhartnett, Ah bummer. But it should now be possible for me to do that, right?
Ooh, maybe I could use "ncgen -f" to get a sample code.

@DennisHeimbigner
Collaborator

Take any simple Fortran program that creates a simple netCDF-4 dataset.
Suppose it creates a file called "simple.nc".
Replace the call of nf_create("simple.nc",NF_NETCDF4,ncid)
with nf_create("file://simple.zarr#mode=zarr,file",NF_NETCDF4,ncid)
That should create a directory called simple.zarr that is in pure Zarr format.
You can replace mode=zarr,file with mode=nczarr,file if you want to create
the dataset in NCZarr format instead.
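
For anyone following along, here is a minimal sketch of that recipe as a complete program. It is untested here and rests on several assumptions: netcdf-c 4.8.0 or later built with NCZarr enabled, and the Fortran 90 interface (`nf90_create`, the counterpart of the `nf_create` call above) rather than the F77 one. The program name, the dimension `x`, the variable `data`, and the `check` helper are made up for illustration.

```fortran
program simple_zarr
  ! Sketch only: the calls are the same as for an ordinary netCDF-4
  ! writer; the one change is that the path is a URL with a mode fragment.
  use netcdf
  implicit none

  integer, parameter :: nx = 4            ! hypothetical dimension length
  integer :: ncid, dimid, varid, i
  integer :: values(nx)

  values = [(i, i = 1, nx)]

  ! URL-style path taken from the comment above; assumes NCZarr support
  ! was compiled into the underlying C library.
  call check(nf90_create("file://simple.zarr#mode=zarr,file", NF90_NETCDF4, ncid))
  call check(nf90_def_dim(ncid, "x", nx, dimid))
  call check(nf90_def_var(ncid, "data", NF90_INT, dimid, varid))
  call check(nf90_enddef(ncid))
  call check(nf90_put_var(ncid, varid, values))
  call check(nf90_close(ncid))

contains

  ! Hypothetical helper: print the library's error message and stop
  ! if any netCDF call fails.
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check

end program simple_zarr
```

Something like `gfortran simple_zarr.f90 $(nf-config --fflags) $(nf-config --flibs)` should build it, assuming `nf-config` is on the path. If the path is passed through untouched, as described earlier in the thread, a `simple.zarr` directory should appear in the working directory.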

@rsignell-usgs

@DennisHeimbigner, okay, I'll try that! And mode=nczarr,xarray,file if we want to create xarray-compatible zarr, right?

@DennisHeimbigner
Collaborator

Depends. If you use the GitHub master, then yes, mode=xarray,file should produce
pure Zarr with the xarray convention. If you use 4.8.0, then it does not have xarray support.
Please let me know if you have problems.
