passing unlimited_dims to to_netcdf triggers RuntimeError: NetCDF: Invalid argument #1849

Closed
gerritholl opened this issue Jan 22, 2018 · 12 comments · Fixed by #2941

@gerritholl
Contributor

For some data files with properties I cannot quite pin down, .to_netcdf raises RuntimeError: NetCDF: Invalid argument if and only if I pass unlimited_dims containing y. The problem is hard to reproduce: it happens with this particular dataset, but not with seemingly identical ones created from scratch. I attach sample.nc (gzipped so GitHub would let me upload it).

$ cat mwe.py 
#!/usr/bin/env python3.6
import xarray

ds = xarray.open_dataset("sample.nc")
ds.to_netcdf("sample2.nc", unlimited_dims=["y"])
$ ncdump sample.nc 
netcdf sample {
dimensions:
        y = 6 ;
variables:
        float x(y) ;
                x:_FillValue = NaNf ;
        int64 y(y) ;
data:

 x = 0, 0, 0, 0, 0, 0 ;

 y = 0, 1, 2, 3, 4, 5 ;
}
$ ./mwe.py 
Traceback (most recent call last):
  File "./mwe.py", line 5, in <module>
    ds.to_netcdf("sample2.nc", unlimited_dims=["y"])
  File "/dev/shm/gerrit/venv/stable-3.6/lib/python3.6/site-packages/xarray/core/dataset.py", line 1133, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "/dev/shm/gerrit/venv/stable-3.6/lib/python3.6/site-packages/xarray/backends/api.py", line 627, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "/dev/shm/gerrit/venv/stable-3.6/lib/python3.6/site-packages/xarray/core/dataset.py", line 1070, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/dev/shm/gerrit/venv/stable-3.6/lib/python3.6/site-packages/xarray/backends/common.py", line 254, in store
    *args, **kwargs)
  File "/dev/shm/gerrit/venv/stable-3.6/lib/python3.6/site-packages/xarray/backends/common.py", line 221, in store
    unlimited_dims=unlimited_dims)
  File "/dev/shm/gerrit/venv/stable-3.6/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 339, in set_variables
    super(NetCDF4DataStore, self).set_variables(*args, **kwargs)
  File "/dev/shm/gerrit/venv/stable-3.6/lib/python3.6/site-packages/xarray/backends/common.py", line 233, in set_variables
    name, v, check, unlimited_dims=unlimited_dims)
  File "/dev/shm/gerrit/venv/stable-3.6/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 385, in prepare_variable
    fill_value=fill_value)
  File "netCDF4/_netCDF4.pyx", line 2437, in netCDF4._netCDF4.Dataset.createVariable
  File "netCDF4/_netCDF4.pyx", line 3439, in netCDF4._netCDF4.Variable.__init__
  File "netCDF4/_netCDF4.pyx", line 1638, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Invalid argument


Output of xr.show_versions()


INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.6.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

xarray: 0.10.0+dev39.ge31cf43
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: None
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.1
distributed: None
matplotlib: 2.1.2
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 38.4.0
pip: 9.0.1
conda: 4.3.16
pytest: 3.1.2
IPython: 6.1.0
sphinx: 1.6.2

sample.nc.gz

@gerritholl
Contributor Author

Not sure if the attachment came through. Trying again:

sample.nc.gz

@jhamman
Member

jhamman commented Jan 30, 2018

Thanks for the report. This seems like a bug to me and I'm frankly not sure why it isn't working. I'll look into it more.

@markelg
Contributor

markelg commented Feb 16, 2018

This happened to me today after introducing some modifications in code that was working fine. I have tried to trace it without success. Finally, I found a workaround, which consists of removing the "contiguous" entry from the .encoding attributes. This works with gerritholl's file:

import xarray as xr
ds = xr.open_dataset("sample.nc")
# drop the on-disk storage layout carried over from the input file
del ds.x.encoding["contiguous"]
del ds.y.encoding["contiguous"]
ds.to_netcdf("sample2.nc", unlimited_dims=["y"])

So it seems that this entry in the encoding dictionaries is triggering the error.

OK, so I guess that this explains it, from the netCDF4 documentation:

"contiguous: if True (default False), the variable data is stored contiguously on disk. Default False. Setting to True for a variable with an unlimited dimension will trigger an error."

This is quite an obscure error right now, so maybe we could force contiguous to be False when unlimited_dims is being used, or raise a more informative error.
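
As an illustration only (not xarray's actual code path), a user-side pre-flight check along these lines would at least surface the conflict with a readable message; check_unlimited_encoding is a hypothetical helper name:

import xarray as xr

def check_unlimited_encoding(ds, unlimited_dims):
    # Hypothetical helper: flag variables whose on-disk encoding conflicts
    # with the requested unlimited dimensions.
    unlimited_dims = set(unlimited_dims)
    for name, var in ds.variables.items():
        if var.encoding.get("contiguous", False) and unlimited_dims & set(var.dims):
            raise ValueError(
                "variable %r has encoding['contiguous']=True but shares a dimension "
                "with unlimited_dims=%s; netCDF4 cannot store a contiguous variable "
                "along an unlimited dimension. Delete the 'contiguous' entry from "
                "its encoding before writing." % (name, sorted(unlimited_dims))
            )

ds = xr.open_dataset("sample.nc")
check_unlimited_encoding(ds, ["y"])  # raises a readable ValueError up front instead of
ds.to_netcdf("sample2.nc", unlimited_dims=["y"])  # the opaque "NetCDF: Invalid argument"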

@jhamman
Member

jhamman commented Feb 16, 2018

@markelg - thanks for digging into this a bit. Based on what you're saying, I think we need to raise an informative error here.

@floriankrb
Contributor

I also ran into this issue: to_netcdf fails for my dataset.

Here is how to reproduce the error (the test file is attached here: 1.zip):

import netCDF4
import xarray as xr
print(netCDF4.__version__)
print(xr.__version__)
ds = xr.open_dataset('testfile')
ds.to_netcdf('outfile.ok')
ds.to_netcdf('outfile.not.ok', unlimited_dims=['datetime'])

And the output I get:

1.3.1
0.10.1
Traceback (most recent call last):
  File "bug.py", line 6, in <module>
    ds.to_netcdf('outfile', unlimited_dims=['datetime'])
  File "/home/pinaultf/miniconda3/envs/defaultenv/lib/python3.6/site-packages/xarray/core/dataset.py", line 1133, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "/home/pinaultf/miniconda3/envs/defaultenv/lib/python3.6/site-packages/xarray/backends/api.py", line 632, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "/home/pinaultf/miniconda3/envs/defaultenv/lib/python3.6/site-packages/xarray/core/dataset.py", line 1070, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/home/pinaultf/miniconda3/envs/defaultenv/lib/python3.6/site-packages/xarray/backends/common.py", line 280, in store
    unlimited_dims=unlimited_dims)
  File "/home/pinaultf/miniconda3/envs/defaultenv/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 348, in set_variables
    super(NetCDF4DataStore, self).set_variables(*args, **kwargs)
  File "/home/pinaultf/miniconda3/envs/defaultenv/lib/python3.6/site-packages/xarray/backends/common.py", line 317, in set_variables
    name, v, check, unlimited_dims=unlimited_dims)
  File "/home/pinaultf/miniconda3/envs/defaultenv/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 393, in prepare_variable
    fill_value=fill_value)
  File "netCDF4/_netCDF4.pyx", line 2437, in netCDF4._netCDF4.Dataset.createVariable
  File "netCDF4/_netCDF4.pyx", line 3439, in netCDF4._netCDF4.Variable.__init__
  File "netCDF4/_netCDF4.pyx", line 1638, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Invalid argument

gerritholl added a commit to FIDUCEO/FCDR_HIRS that referenced this issue Jul 20, 2018
@jmccreight
Contributor

I apparently have this problem too.
Thanks @gerritholl for the workaround.

@dcherian
Contributor

dcherian commented May 2, 2019

@jmccreight Are you up for sending in a PR to raise an informative error message?

@jmccreight
Contributor

I could be persuaded.

I just don't understand how 'contiguous' gets set on the encoding of these variables, and whether that is appropriate. Does that seem obvious/clear to anyone?

I still don't understand why this is happening for me. I made some fairly small modifications to code that never threw this error in the past. The small mods could have done it, but the identical code on my laptop did not throw this error on a small sample dataset. Then I went to cheyenne, where all bets are off!

@dcherian
Contributor

dcherian commented May 2, 2019

does ncdump -sh show whether contiguous is true?

@jmccreight
Contributor

jmccreight commented May 3, 2019

Here's what I understand so far.
For my file, I write it with ("ensured") and without ("unensured") the workaround (credit actually to @markelg for discovering this).

(base) jamesmcc@cheyenne3[1021]:/glade/scratch/jamesmcc/florence_cutout_routelink_ensemble_run/ensemble> grep '_Storage' ensured_ncdsh.txt
		feature_id:_Storage = "contiguous" ;
		latitude:_Storage = "contiguous" ;
		longitude:_Storage = "contiguous" ;
		time:_Storage = "chunked" ;
		member:_Storage = "contiguous" ;
		crs:_Storage = "chunked" ;
		order:_Storage = "chunked" ;
		elevation:_Storage = "chunked" ;
		streamflow:_Storage = "chunked" ;
		q_lateral:_Storage = "chunked" ;
		velocity:_Storage = "chunked" ;
		Head:_Storage = "chunked" ;
(base) jamesmcc@cheyenne3[1022]:/glade/scratch/jamesmcc/florence_cutout_routelink_ensemble_run/ensemble> grep '_Storage' unensured_ncdsh.txt
		feature_id:_Storage = "contiguous" ;
		latitude:_Storage = "contiguous" ;
		longitude:_Storage = "contiguous" ;
		time:_Storage = "chunked" ;
		member:_Storage = "contiguous" ;
		crs:_Storage = "chunked" ;

The error that is thrown (just the tail end of it):

/glade/p/cisl/nwc/jamesmcc/anaconda3/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    466                 least_significant_digit=encoding.get(
    467                     'least_significant_digit'),
--> 468                 fill_value=fill_value)
    469             _disable_auto_decode_variable(nc4_var)
    470 

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.createVariable()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__init__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Invalid argument

If I go to line 464 in xarray/backends/netCDF4_.py, I see that the variable it is failing on is crs. If I inspect it there:

>>> print(name)
crs
>>> encoding.get('contiguous', False)
True

but the ncdump -sh shows it's actually chunked. I'm not sure this is exactly what's raising the error down the line, but these two things seem to be at odds.

My current question is "why does encoding.get('contiguous', False) return True?"

If you have any insights, let me know. I probably won't have time to mess with this until next week.
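
One way to dig into that question is to inspect the encodings right after open_dataset; this is only a diagnostic sketch (file name per the examples above):

import xarray as xr

ds = xr.open_dataset("sample.nc")
# xarray records how each variable was stored on disk in .encoding, so a
# variable that was written contiguously comes back with contiguous=True
# and carries that setting into the next write.
for name, var in ds.variables.items():
    print(name, var.dims, var.encoding.get("contiguous"), var.encoding.get("chunksizes"))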

@dcherian
Contributor

dcherian commented May 3, 2019

Because it's set in your input file. Both example files in this thread have _Storage = "contiguous" set on all variables, but the dimensions are not unlimited, so these files are compliant. Here's the output of ncdump -sh sample.nc (from the second comment in this thread):

netcdf sample {
dimensions:
	y = 6 ;
variables:
	float x(y) ;
		x:_FillValue = NaNf ;
		x:_Storage = "contiguous" ;
		x:_Endianness = "little" ;
	int64 y(y) ;
		y:_Storage = "contiguous" ;
		y:_Endianness = "little" ;

// global attributes:
		:_NCProperties = "version=1,netcdflibversion=4.4.1.1,hdf5libversion=1.8.18" ;
		:_SuperblockVersion = 0 ;
		:_IsNetcdf4 = 1 ;
		:_Format = "netCDF-4" ;

When you ask xarray to write out an unlimited dimension, it doesn't delete encoding['contiguous'] and then netCDF4 raises an error (I think).

The underlying software you're using to write has probably changed versions and is setting it by default. You can check this by comparing the output of ncdump -sh file.nc on cheyenne and on your local machine.

If this is right, the solution would be either
a) delete encoding['contiguous'] if it is True when asked to write out an unlimited dimension.
b) raise a warning and ask the user to do the deletion before writing.

My preference is for (a).
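
A minimal sketch of what (a) could look like, written here as a standalone function rather than as xarray's actual internals (the function name is made up); it drops the conflicting entry from a copy of the encoding at write time:

def drop_conflicting_contiguous(encoding, var_dims, unlimited_dims):
    # Sketch of option (a): if a variable that shares a dimension with
    # unlimited_dims carries contiguous=True, drop that entry so netCDF4
    # falls back to chunked storage. Work on a copy so the in-memory
    # encoding of the dataset is left untouched.
    encoding = dict(encoding)
    if encoding.get("contiguous", False) and set(var_dims) & set(unlimited_dims):
        del encoding["contiguous"]
    return encoding

# e.g. encoding = drop_conflicting_contiguous(var.encoding, var.dims, ["y"])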

@jmccreight
Contributor

@dcherian Thanks,

First, I think you're right that the encoding['contiguous']=True is coming from the input file. That was not clear to me (and I did not read the xarray code to verify). But it makes sense.

Second, my example shows something slightly more complicated than the original example, which was also not clear to me. In my case the unlimited dimension (time) is chunked and is being successfully written in both cases (before and after the workaround). The error/failure is happening on a variable that contains the unlimited dimension but which has encoding['contiguous']=True.

This makes sense upon a slightly more nuanced reading of the netCDF4 manual (as quoted by @markelg):

"contiguous: if True (default False), the variable data is stored contiguously on disk. Default False. Setting to True for a variable with an unlimited dimension will trigger an error."

The last sentence apparently means that for any variable with an unlimited dimension, the use of contiguous=True triggers an error. That was not clear to me until I looked a bit harder at this. I think that slightly refines the strategy for how to deal with the problem.

I propose that the solution should be both
a) delete encoding['contiguous'] if it is True when asked to write out a variable containing an unlimited dimension, and
b) raise an informative warning that the variable was chunked because it contains an unlimited dimension. (If a user hates warnings, they can handle this deletion themselves. On the other hand, there's really nothing else to do, so I'm not sure the warning is necessary... I don't have a strong opinion on this, but the code is fiddling with the encodings under the hood, so a warning seems polite.)

A final question: should encoding['contiguous'] be removed from the xarray variable, or should it just be removed for the purposes of writing it to netCDF4 on disk? I suppose a user could be writing the xarray dataset to another format that might allow what netCDF does not. This should be an easy detail.

I'll make a PR with the above and we can evaluate the concrete changes.
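
Until such a PR lands, a user-side version of (a) + (b) might look like the sketch below (prepare_for_unlimited_write is a hypothetical name). Note that this simple variant takes the blunt answer to the final question above: it deletes 'contiguous' from the in-memory encoding, whereas a fix inside xarray would presumably only adjust the copy used for the netCDF4 write:

import warnings
import xarray as xr

def prepare_for_unlimited_write(ds, unlimited_dims):
    # Drop a contiguous=True encoding from every variable that shares a
    # dimension with unlimited_dims, warning about each change (option b).
    for name, var in ds.variables.items():
        if var.encoding.get("contiguous", False) and set(var.dims) & set(unlimited_dims):
            warnings.warn(
                "dropping encoding['contiguous'] for %r: it shares a dimension with "
                "unlimited_dims=%s and will be written chunked" % (name, list(unlimited_dims))
            )
            del var.encoding["contiguous"]

ds = xr.open_dataset("sample.nc")
prepare_for_unlimited_write(ds, ["y"])
ds.to_netcdf("sample2.nc", unlimited_dims=["y"])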
