Safely open / close netCDF files without resource locking #2887

Closed
fujiisoup opened this issue Apr 11, 2019 · 9 comments · Fixed by #2917

@fujiisoup
Member

fujiisoup commented Apr 11, 2019

Code Sample, a copy-pastable example if possible

(essentially the same as #1629)

Opening a netCDF file via xr.open_dataset locks the file, preventing another program from writing to a file with the same name (as pointed out, and answered as expected behavior, in #1629).

import xarray as xr
ds = xr.Dataset({'var': ('x', [0, 1, 2])})
ds.to_netcdf('test.nc')

ds_read = xr.open_dataset('test.nc')
ds.to_netcdf('test.nc')   # -> PermissionError
ds_read = xr.open_dataset('test.nc').load()
ds.to_netcdf('test.nc')   # -> PermissionError
ds_read = xr.open_dataset('test.nc').load()
ds_read.close()
ds.to_netcdf('test.nc')  # no error

Problem description

Another program cannot write to the same netCDF file that xarray has opened unless the close method is called.


-- EDIT --

The close() method does not return the object, so it cannot be used in a method chain such as

some_function(xr.open_dataset('test.nc').close())

This is understandable when we do not want to load the entire file into memory.
However, sometimes I want to read a file that will soon be updated by another program.
Also, I think many users who are not accustomed to netCDF may expect this behavior (as np.loadtxt provides) and will be surprised to get a PermissionError.

I think it would be nice to have an option such as load_all=True, or even make it the default?

Expected Output

No error

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1 (default, Oct 23 2018, 19:19:42) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-1035-oem
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.1

xarray: 0.12.0+11.g7d0e895f.dirty
pandas: 0.23.4
numpy: 1.15.4
scipy: 1.2.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.2.1
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.0.0
distributed: 1.25.0
matplotlib: 2.2.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.5.0
pip: 18.1
conda: None
pytest: 4.0.1
IPython: 7.1.1
sphinx: 1.8.2

@dcherian
Contributor

What is the recommended code pattern for reading a file, adding a few variables, and then writing back to the same file?

@shoyer
Member

shoyer commented Apr 11, 2019

This pattern should work:

with xr.open_dataset('test.nc') as ds:
    ds.load()
ds.to_netcdf('test.nc')

@dcherian
Contributor

OK. But what about the usual scientist workflow, where you work across multiple cells?

ds.load()

# do computation

next cell:

do more computation

...

ds.close()  # is this right?
ds.to_netcdf('test.nc')

I wonder if we should add autoclose as a kwarg for to_netcdf. If autoclose=True, remove that file from the LRU cache if present, and then write to file.

@shoyer
Member

shoyer commented Apr 12, 2019

I think this is more a limitation of netCDF-C / HDF5 than of xarray. For example, this works if you use SciPy's netCDF reader/writer:

import xarray as xr
ds = xr.Dataset({'var': ('x', [0, 1, 2])})
ds.to_netcdf('test.nc', engine='scipy')

ds_read = xr.open_dataset('test.nc', engine='scipy')
ds.to_netcdf('test.nc', engine='scipy')

@fujiisoup
Member Author

I didn't notice that a with statement can be used with open_dataset.
Thanks for the comments.

I personally want a simpler function for daily (not-so-big-data) analysis, without having to care about the open / close details
(and probably also for new users).
Would it be too much to add a load_dataset function that works similarly to np.loadtxt in terms of file handling?

@shoyer
Member

shoyer commented Apr 13, 2019

Just to clarify, load_dataset would be equivalent to:

def load_dataset(*args, **kwargs):
    with xarray.open_dataset(*args, **kwargs) as ds:
        return ds.load()

This also seems pretty reasonable to me. I’ve written a version of this utility function a handful of times, so I at least would find it useful.

Would we also want load_dataarray?

@dnowacki-usgs
Contributor

I used to use the autoclose=True argument to open_dataset() all the time to avoid this issue. Then it was deprecated with the LRU cache introduction, and I now do a two-line ds = open_dataset().load() followed by ds.close(). Just adding my use case and what I found to be a usability regression. Something like load_dataset() would be helpful, but I wonder whether adding an argument to the existing open_dataset() would be better.

@fujiisoup
Member Author

Just to clarify, load_dataset would be equivalent to:

def load_dataset(*args, **kwargs):
    with xarray.open_dataset(*args, **kwargs) as ds:
        return ds.load()

Yes, that is exactly what I have in mind.

I added the good first issue tag for this.
I'd love to implement this myself, but I don't think I'll have enough free time in the near future...

dnowacki-usgs added a commit to dnowacki-usgs/xarray that referenced this issue Apr 25, 2019
BUG: Fixes pydata#2887 by adding @shoyer solution for load_dataset and load_dataarray, wrappers around open_dataset and open_dataarray which open, load, and close the file and return the Dataset/DataArray
TST: Add tests for sequentially opening and writing to files using new functions
DOC: Add to whats-new.rst. Also a tiny change to the open_dataset docstring
@dnowacki-usgs
Contributor

Took a stab at implementing these functions.

dnowacki-usgs added a commit to dnowacki-usgs/xarray that referenced this issue May 16, 2019
dnowacki-usgs added a commit to dnowacki-usgs/xarray that referenced this issue May 16, 2019

Update docstrings and check for cache in kwargs

Undeprecate load_dataset

Add to api.rst, fix whats-new.rst typo, raise error instead of warning
shoyer pushed a commit that referenced this issue May 16, 2019
* Partial fix for #2841 to improve formatting.

Updates formatting to use .format() instead of % operator. Changed all instances of % to .format() and added test for using tuple as key, which errored using % operator.

* Revert "Partial fix for #2841 to improve formatting."

This reverts commit f17f3ad.

* Implement load_dataset() and load_dataarray()
