Safely open / close netCDF files without resource locking #2887

Closed
fujiisoup opened this issue Apr 11, 2019 · 9 comments · Fixed by #2917

@fujiisoup
Member

fujiisoup commented Apr 11, 2019

Code Sample, a copy-pastable example if possible

(essentially the same as #1629)

Opening a netCDF file via xr.open_dataset locks the file, preventing another program from writing to a file with the same name (as pointed out, and answered as expected behavior, in #1629).

import xarray as xr
ds = xr.Dataset({'var': ('x', [0, 1, 2])})
ds.to_netcdf('test.nc')

ds_read = xr.open_dataset('test.nc')
ds.to_netcdf('test.nc')   # -> PermissionError
ds_read = xr.open_dataset('test.nc').load()
ds.to_netcdf('test.nc')   # -> PermissionError
ds_read = xr.open_dataset('test.nc').load()
ds_read.close()
ds.to_netcdf('test.nc')  # no error

Problem description

Another program cannot write to the same netCDF file that xarray has opened unless the close method is called.


-- EDIT --

The close() method does not return the object, so it cannot be used in a method chain such as

some_function(xr.open_dataset('test.nc').close())

This is understandable when we do not want to load the entire file into memory.
However, sometimes I want to read a file that will soon be updated by another program.
Also, I think many users who are not accustomed to netCDF may expect this behavior (as np.loadtxt provides) and will be surprised to get a PermissionError.

I think it would be nice to have an option such as load_all=True, or even make it the default?

Expected Output

No error

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1 (default, Oct 23 2018, 19:19:42) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-1035-oem
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.1

xarray: 0.12.0+11.g7d0e895f.dirty
pandas: 0.23.4
numpy: 1.15.4
scipy: 1.2.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.2.1
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.0.0
distributed: 1.25.0
matplotlib: 2.2.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.5.0
pip: 18.1
conda: None
pytest: 4.0.1
IPython: 7.1.1
sphinx: 1.8.2

@dcherian
Contributor

What is the recommended code pattern for reading a file, adding a few variables, and then writing back to the same file?

@shoyer
Member

shoyer commented Apr 11, 2019

This pattern should work:

with xr.open_dataset('test.nc') as ds:
    ds.load()
ds.to_netcdf('test.nc')

@dcherian
Contributor

OK. But what about the usual scientist workflow, where you work across multiple cells?

ds.load()

# do computation

next cell:

do more computation

...

ds.close()  # is this right?
ds.to_netcdf('test.nc')

I wonder if we should add autoclose as a kwarg for to_netcdf. If autoclose=True, remove that file from the LRU cache if present, and then write to file.

@shoyer
Member

shoyer commented Apr 12, 2019

I think this is more a limitation of netCDF-C / HDF5 than of xarray. For example, this works if you use SciPy's netCDF reader/writer:

import xarray as xr
ds = xr.Dataset({'var': ('x', [0, 1, 2])})
ds.to_netcdf('test.nc', engine='scipy')

ds_read = xr.open_dataset('test.nc', engine='scipy')
ds.to_netcdf('test.nc', engine='scipy')

@fujiisoup
Member Author

I didn't notice that a with statement can be used with open_dataset.
Thanks for the comments.

I personally want a simpler function for daily (not-so-big-data) analysis, without having to care about the open / close details
(and probably also for new users).
Would it be too much to add a load_dataset function that works similarly to np.loadtxt in terms of file handling?

@shoyer
Member

shoyer commented Apr 13, 2019

Just to clarify, load_dataset would be equivalent to:

def load_dataset(*args, **kwargs):
    with xarray.open_dataset(*args, **kwargs) as ds:
        return ds.load()

This also seems pretty reasonable to me. I’ve written a version of this utility function a handful of times, so I at least would find it useful.

Would we also want load_dataarray?

@dnowacki-usgs
Contributor

I used to use the autoclose=True argument to open_dataset() all the time to avoid this issue. Then it was deprecated with the LRU cache introduction, and I now do a two-line ds = open_dataset().load() followed by ds.close(). Just adding my use case and what I found to be a usability regression. Something like load_dataset() would be helpful, but I wonder whether adding an argument to the existing open_dataset() would be better.

@fujiisoup
Member Author

Just to clarify, load_dataset would be equivalent to:

def load_dataset(*args, **kwargs):
    with xarray.open_dataset(*args, **kwargs) as ds:
        return ds.load()

Yes, that is exactly what I have in mind.

I added the good first issue tag for this.
I'd love to implement this myself, but I don't think I'll have enough free time in the near future...

dnowacki-usgs added a commit to dnowacki-usgs/xarray that referenced this issue Apr 25, 2019
BUG: Fixes pydata#2887 by adding @shoyer solution for load_dataset and load_dataarray, wrappers around open_dataset and open_dataarray which open, load, and close the file and return the Dataset/DataArray
TST: Add tests for sequentially opening and writing to files using new functions
DOC: Add to whats-new.rst. Also a tiny change to the open_dataset docstring
@dnowacki-usgs
Contributor

Took a stab at implementing these functions.

dnowacki-usgs added a commit to dnowacki-usgs/xarray that referenced this issue May 16, 2019
dnowacki-usgs added a commit to dnowacki-usgs/xarray that referenced this issue May 16, 2019

Update docstrings and check for cache in kwargs

Undeprecate load_dataset

Add to api.rst, fix whats-new.rst typo, raise error instead of warning
shoyer pushed a commit that referenced this issue May 16, 2019
* Partial fix for #2841 to improve formatting.

Updates formatting to use .format() instead of % operator. Changed all instances of % to .format() and added test for using tuple as key, which errored using % operator.

* Revert "Partial fix for #2841 to improve formatting."

This reverts commit f17f3ad.

* Implement load_dataset() and load_dataarray()
