Set/preserve the character array dimension name #2895

Closed
jmccreight opened this issue Apr 15, 2019 · 2 comments


jmccreight commented Apr 15, 2019

This is a new feature proposal, not a bug. I'll open a PR against this issue momentarily; it consists of 4 lines of new code.

I've found it highly annoying that one cannot set the name of the character array dimension. Looking at the code, I found basically what I expected; only the pieces I added were missing. Summary: on decoding, the dimension name is stored in the variable's encoding as variable.encoding['char_dim_name']; when creating data from scratch, one can simply set that key directly. On encoding, "char_dim_name" is then applied as the name of the character dimension. It's simple. The new code sits alongside the code that already handles character arrays, so there are unlikely to be any nasty edge cases.

This shows how it works and the behavior it changes:

# Using the proposed changes...
# user@machine-session-1[1]:~/Downloads> ipython

import xarray as xa
char_arr = ['abc', 'def', 'ghi']
ds = xa.Dataset(data_vars={'char_arr': char_arr})
ds.char_arr.encoding.update({"dtype": "S1"})

# Default/current behavior
ds.to_netcdf('char_arr_string.nc')

# New functionality - name the character dimension.
ds.char_arr.encoding.update({"char_dim_name": "char_dim"})
ds.to_netcdf('char_arr_named.nc')

# user@machine-session-2[1]:~/Downloads> ncdump -h char_arr_string.nc
# netcdf char_arr_string {
# dimensions:
# 	char_arr = 3 ;
# 	string3 = 3 ;
# variables:
# 	char char_arr(char_arr, string3) ;
# 		char_arr:_Encoding = "utf-8" ;
# }
#
# user@machine-session-2[2]:~/Downloads> ncdump -h char_arr_named.nc 
# netcdf char_arr_named {
# dimensions:
# 	char_arr = 3 ;
# 	char_dim = 3 ;
# variables:
# 	char char_arr(char_arr, char_dim) ;
# 		char_arr:_Encoding = "utf-8" ;
# }


# New functionality - when decoding, preserve the character dimension name in the variable's encoding so it is reused when encoding again.
ds_read = xa.open_dataset('char_arr_named.nc')
ds_read.char_arr.encoding
# Out[4]: 
# {'_Encoding': 'utf-8',
#  'char_dim_name': 'char_dim',
#  'chunksizes': None,
#  'complevel': 0,
#  'contiguous': True,
#  'dtype': dtype('S1'),
#  'fletcher32': False,
#  'original_shape': (3, 3),
#  'shuffle': False,
#  'source': '/Users/james/Downloads/char_arr_named.nc',
#  'zlib': False}

ds_read.to_netcdf('char_arr_named_2.nc')
exit()


# user@machine-session-1[2]:~/Downloads> ncdump -h char_arr_named_2.nc 
# netcdf char_arr_named_2 {
# dimensions:
# 	char_arr = 3 ;
# 	char_dim = 3 ;
# variables:
# 	char char_arr(char_arr, char_dim) ;
# 		char_arr:_Encoding = "utf-8" ;
# }

# user@machine-session-1[3]:~/Downloads> pip uninstall -y xarray
# user@machine-session-1[4]:~/Downloads> pip install xarray
# user@machine-session-1[5]:~/Downloads> ipython

# The old behavior... does not preserve the char dim name.
import xarray as xa
ds_read = xa.open_dataset('char_arr_named.nc')
ds_read.char_arr.encoding
# Out[4]: 
# {'_Encoding': 'utf-8',
#  'chunksizes': None,
#  'complevel': 0,
#  'contiguous': True,
#  'dtype': dtype('S1'),
#  'fletcher32': False,
#  'original_shape': (3, 3),
#  'shuffle': False,
#  'source': '/Users/james/Downloads/char_arr_named.nc',
#  'zlib': False}

ds_read.to_netcdf('char_arr_string_2.nc')

# user@machine-session-2[6]:~/Downloads> ncdump -h char_arr_string_2.nc
# netcdf char_arr_string_2 {
# dimensions:
# 	char_arr = 3 ;
# 	string3 = 3 ;
# variables:
# 	char char_arr(char_arr, string3) ;
# 		char_arr:_Encoding = "utf-8" ;
# }
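
The naming rule demonstrated above can be sketched in plain Python, without xarray or a netCDF file. This is only an illustration of the behavior described in this issue, not xarray's actual implementation: the helper names `encode_char_dims` and `decode_char_dims` are hypothetical, and the `string<N>` default is taken from the `string3` dimension in the ncdump output above.

```python
# Hypothetical sketch of the char-dimension naming rule; not xarray's real code.


def encode_char_dims(dims, strlen, encoding):
    """Append a character dimension to dims and return (new_dims, encoding)."""
    encoding = dict(encoding)  # copy so the caller's dict is not mutated
    # Use the requested name if one was set, else fall back to "string<N>".
    char_dim_name = encoding.pop("char_dim_name", f"string{strlen}")
    return tuple(dims) + (char_dim_name,), encoding


def decode_char_dims(dims, encoding):
    """Drop the trailing character dimension and remember its name."""
    encoding = dict(encoding)
    encoding["char_dim_name"] = dims[-1]
    return tuple(dims[:-1]), encoding


# Default behavior: no name given, so the dimension is named after its length.
default_dims, _ = encode_char_dims(("char_arr",), 3, {"dtype": "S1"})

# Proposed behavior: the name set in the encoding is used instead.
named_dims, _ = encode_char_dims(
    ("char_arr",), 3, {"dtype": "S1", "char_dim_name": "char_dim"}
)

# Round trip: decoding records the name, so encoding again reuses it.
decoded_dims, decoded_enc = decode_char_dims(named_dims, {"dtype": "S1"})
rewritten_dims, _ = encode_char_dims(decoded_dims, 3, decoded_enc)

print(default_dims)
print(named_dims)
print(rewritten_dims)
```

Without the decode step recording the name, the second encode would fall back to the `string<N>` default, which is the old behavior shown in the last ncdump listing.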

jmccreight changed the title from "Set/preserved the character array dimension name" to "Set/preserve the character array dimension name" on Apr 15, 2019

shoyer commented Apr 16, 2019

+1 from me, this sounds like an easy enough addition.


shoyer commented Apr 19, 2019

fixed by #2899
