What happened?
Not quite sure what to actually title this, so feel free to edit it.
I have some netCDF files modeled after the Argo _prof file format (CF discrete sampling geometry, incomplete multidimensional array representation). While splitting these into individual profiles, I would occasionally get exceptions complaining about broadcasting. I eventually narrowed this down to some string variables we maintain for historic purposes. Depending on which row is split off, the string data in each cell can be shorter, which results in a stringN dimension whose actual length differs from N (e.g. string4 = 3 in the CDL). If, during the same serialization, a different string variable is encoded that really does have length 4, it reuses the now-incorrect string4 dim name.
The above situation seems to occur only when a netCDF file is read back into xarray and the char_dim_name encoding key is set.
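The clash described above can be illustrated with a small pure-Python model. This is only a sketch of the mechanism, not xarray's actual encoder: `char_dim_for`, `collect_dims`, and the two sample dictionaries are hypothetical names, and the rule that a remembered `char_dim_name` takes precedence over the default `stringN` name is an assumption based on the behaviour reported here.

```python
def char_dim_for(value, encoding):
    """Return the (name, size) of the trailing character dimension.

    Assumption: a name remembered in ``encoding`` wins over the default
    "string<N>" name, even if the data has since become shorter.
    """
    size = len(value)
    name = encoding.get("char_dim_name", "string{}".format(size))
    return name, size


def collect_dims(variables):
    """Gather character dimensions for one file; a name may have only one size."""
    dims = {}
    for var_name, (value, encoding) in variables.items():
        name, size = char_dim_for(value, encoding)
        existing = dims.setdefault(name, size)
        if existing != size:
            raise ValueError(
                "dimension {!r} already has size {}, got {} for {!r}".format(
                    name, existing, size, var_name
                )
            )
    return dims


# After selecting row 0, var0 holds b"a" but still remembers "string2".
stale = {
    "var0": (b"a", {"char_dim_name": "string2"}),
    "var1": (b"aa", {"char_dim_name": "string2"}),
}

# Same data with the stale key removed from var0's encoding.
fresh = {
    "var0": (b"a", {}),
    "var1": (b"aa", {"char_dim_name": "string2"}),
}
```

In this model, `collect_dims(stale)` raises because `string2` is asked to be both 1 and 2 characters long, while `collect_dims(fresh)` gives each variable a dimension matching its real length.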
What did you expect to happen?
Successful serialization to netCDF.
Minimal Complete Verifiable Example
# setup
import numpy as np
import xarray as xr

one_two = xr.DataArray(np.array(["a", "aa"], dtype="object"), dims=["dim0"])
two_two = xr.DataArray(np.array(["aa", "aa"], dtype="object"), dims=["dim0"])
ds = xr.Dataset({"var0": one_two, "var1": two_two})
ds.var0.encoding["dtype"] = "S1"
ds.var1.encoding["dtype"] = "S1"

# need to write out and read back in
ds.to_netcdf("test.nc")

# only selecting the shorter string will fail
ds1 = xr.load_dataset("test.nc")
ds1[{"dim0": 1}].to_netcdf("ok.nc")
ds1[{"dim0": 0}].to_netcdf("error.nc")

# will work if the char dim name is removed from encoding of the now shorter arr
ds1 = xr.load_dataset("test.nc")
del ds1.var0.encoding["char_dim_name"]
ds1[{"dim0": 0}].to_netcdf("will_work.nc")
The failure is a result of an xarray bug that can occur after subsetting
data that was itself loaded from netcdf.
See pydata/xarray#6352 for the issue and
pydata/xarray#7689 for the fix used to create
the workaround.
Relevant log output
Anything else we need to know?
I've been unable to recreate the specific error I'm getting in a minimal example. However, removing the char_dim_name encoding key does solve this.
When digging in the xarray issues, these looked maybe relevant: #2219 #2895
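When many string variables are involved, the workaround of removing the key can be applied to all of them at once. The helper below is a minimal sketch, not part of xarray: `drop_char_dim_name` is a hypothetical name, and it only assumes the object has a `.variables` mapping whose values carry an `.encoding` dict, as an `xarray.Dataset` does.

```python
def drop_char_dim_name(ds):
    """Remove a remembered "char_dim_name" from every variable's encoding.

    Mutates the object passed in and returns it so calls can be chained.
    Assumes ``ds.variables`` is a mapping of objects with an ``encoding`` dict.
    """
    for var in ds.variables.values():
        # pop() with a default is a no-op for variables without the key
        var.encoding.pop("char_dim_name", None)
    return ds
```

With the example above, this would be used as `drop_char_dim_name(ds1[{"dim0": 0}]).to_netcdf("will_work.nc")`. The character dimension names then presumably fall back to the default stringN naming based on the actual string lengths, so the names stored in the original file are not preserved.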
Actual traceback I get with my data
Environment
INSTALLED VERSIONS
commit: None
python: 3.9.9 (main, Jan 5 2022, 11:21:18)
[Clang 13.0.0 (clang-1300.0.29.30)]
python-bits: 64
OS: Darwin
OS-release: 21.3.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.13.0
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.3.5
numpy: 1.22.0
scipy: None
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: 0.18
sparse: None
setuptools: 58.1.0
pip: 21.2.4
conda: None
pytest: 6.2.5
IPython: 7.31.0
sphinx: 4.4.0