Handle the character array dim name #2896

jmccreight · 2019-04-15T21:41:36Z

…decode and reapply in encode

shoyer

The implementation looks good to me, but this needs documentation & tests.

Could you kindly add a note to the documentation about this? Maybe somewhere in this section?
http://xarray.pydata.org/en/stable/io.html#string-encoding

Tests could probably go somewhere in this file:
https://github.com/pydata/xarray/blob/master/xarray/tests/test_coding_strings.py

xarray/coding/strings.py

jmccreight · 2019-04-16T15:04:46Z

Thanks, @shoyer. Now that the first hurdle is cleared, I will add the documentation and tests and update the PR.

pep8speaks · 2019-04-17T21:26:41Z

Hello @jmccreight! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-04-19 15:13:29 UTC

xarray/tests/test_coding_strings.py

jmccreight · 2019-04-17T21:42:08Z

@shoyer Added test and documentation. I did not build the documentation; I wasn't sure if that was necessary.
The history should be squashed when the time comes...

jmccreight · 2019-04-18T03:41:33Z

xarray/tests/test_coding_strings.py

+        Variable(('x',), [b'ab', b'cdef'], encoding={'char_dim_name': 'foo'})
+    ]
+)
+def test_CharacterArrayCoder_char_dim_name(original):


I think this is better and getting warmer. The test improved the underlying code here.

But I still don't see how to eliminate this logic. Putting an `or` in the assert does not seem like a better alternative, nor does testing only one of the two possible input scenarios. Let me know if you have a better idea that I'm too blind to see.

Thanks,

The clean way to do this would be to add a second parametrized argument for the expected value, e.g.,

@pytest.mark.parametrize(
    ['original', 'expected_char_dim_name'],
    [
        (Variable(('x',), [b'ab', b'cdef']), 'string4'),
        (Variable(('x',), [b'ab', b'cdef'], encoding={'char_dim_name': 'foo'}),
         'char_dim_name')
    ]
)
def test_CharacterArrayCoder_char_dim_name(original, expected_char_dim_name):
    ...
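For illustration, here is a minimal, self-contained sketch of the two-argument parametrization pattern. It does not use xarray: `char_dim_name_for` is a hypothetical stand-in for the coder, and the assumption here is that the expected name in the second case is the value supplied in the encoding.

```python
import pytest


def char_dim_name_for(char_count, encoding):
    """Hypothetical stand-in: choose the name of the trailing character dim."""
    # Use the name from the encoding if present, else the 'stringN' default.
    return encoding.get('char_dim_name', 'string%s' % char_count)


@pytest.mark.parametrize(
    ['encoding', 'expected_char_dim_name'],
    [
        ({}, 'string4'),
        ({'char_dim_name': 'foo'}, 'foo'),
    ]
)
def test_char_dim_name(encoding, expected_char_dim_name):
    # Each parameter tuple supplies both the input and its expected result,
    # so the test body needs no branching or `or` in the assert.
    assert char_dim_name_for(4, encoding) == expected_char_dim_name
```

Pairing each input with its expected value keeps the test body free of conditional logic while still covering both input scenarios.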

That is the first case of 2-variable usage in that test file... thanks for pointing it out.

jmccreight · 2019-04-18T15:36:06Z

xarray/coding/strings.py

+            else:
+                default_char_dim_name = 'string%s' % data.shape[-1]
+                dims = dims + (default_char_dim_name,)
+                encoding['char_dim_name'] = default_char_dim_name


I think this line is causing a set of failures. I think this line is desirable. I could be wrong.

line 111, to be clear.

The convention we have (which is apparently enforced by tests) is that when you use an encoding you remove it from the encoding dict. That way we catch cases where you have a typo, e.g., char_dimension_name instead of char_dim_name.

So this section should instead probably be something like:

char_dim_name = encoding.pop(
    'char_dim_name', 'string%s' % data.shape[-1])
dims = dims + (char_dim_name,)
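A rough sketch of why popping matters, in plain Python. The helper names `encode_char_dims` and `check_unused` are hypothetical and are not xarray functions; they only illustrate the idea that a consumed key disappears from the dict, so anything left over can be flagged.

```python
def encode_char_dims(dims, char_count, encoding):
    """Hypothetical stand-in for the encode step; not xarray's actual code."""
    encoding = dict(encoding)  # copy so the caller's dict is not mutated
    # pop() both reads the setting and marks it as consumed.
    char_dim_name = encoding.pop('char_dim_name', 'string%s' % char_count)
    return dims + (char_dim_name,), encoding


def check_unused(encoding):
    """Raise if any encoding key was never consumed (e.g. a misspelled one)."""
    if encoding:
        raise ValueError('unexpected encoding keys: %r' % sorted(encoding))


# Correct key: it is consumed, so nothing is left over.
dims, leftover = encode_char_dims(('x',), 4, {'char_dim_name': 'foo'})
check_unused(leftover)

# Misspelled key: the default name is used and the typo stays in the dict,
# so check_unused(leftover_typo) would raise ValueError.
dims_typo, leftover_typo = encode_char_dims(
    ('x',), 4, {'char_dimension_name': 'foo'})
```

With `encoding.get` instead of `pop`, the correctly spelled key would also remain in the dict and could not be told apart from a typo.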

Great reason, this solves all the errors I was seeing locally. Thanks.

jmccreight · 2019-04-18T20:22:55Z

I'm uncertain why Travis is failing. Two of the failures look HTTP-related and the other may be docs-related (but don't trust me). Running pytest locally in the xarray/tests/ dir gives:

============== 7007 passed, 1170 skipped, 25 xfailed, 1 xpassed, 30 warnings in 62.12 seconds ==============

dcherian · 2019-04-18T20:27:46Z

One is a lint failure: #2896 (comment)

The docs failure looks like a cartopy install problem

jmccreight · 2019-04-18T21:05:33Z

🎊

shoyer · 2019-04-19T04:49:07Z

Could you please add a brief note to whats-new.rst? Otherwise this looks good to go.

jmccreight · 2019-04-19T17:36:22Z

🤦‍♂️ with that formatting in the whats-new.rst. (a reminder to squash)
I think this is complete.
thanks for the mini tour of xarray internals, I learned some useful things!

dcherian · 2019-04-19T17:53:01Z

Thanks @jmccreight

* master: (29 commits)
  Handle the character array dim name (pydata#2896)
  Partial fix for pydata#2841 to improve formatting. (pydata#2906)
  docs: Move quick overview one level up (pydata#2890)
  Manually specify chunks in open_zarr (pydata#2530)
  Minor improvement of docstring for Dataset (pydata#2904)
  Fix minor typos in docstrings (pydata#2903)
  Added docs example for `xarray.Dataset.get()` (pydata#2894)
  Bugfix for docs build instructions (pydata#2897)
  Return correct count for scalar datetime64 arrays (pydata#2892)
  Indexing with an empty array (pydata#2883)
  BUG: Fix pydata#2864 by adding the missing vrt parameters (pydata#2865)
  Reduce length of cftime resample tests (pydata#2879)
  WIP: type annotations (pydata#2877)
  decreased pytest verbosity (pydata#2881)
  Fix mypy typing error in cftime_offsets.py (pydata#2878)
  update links to https (pydata#2872)
  revert to 0.12.2 dev
  0.12.1 release
  Various fixes for explicit Dataset.indexes (pydata#2858)
  Fix minor typo in docstring (pydata#2860)
  ...

castelao · 2019-04-19T22:48:54Z

Perfect timing, I just needed that! Thanks @jmccreight et al

Handle the charachter array dim name in a variables encoding, set in …

422ce01

…decode and reapply in encode

shoyer reviewed Apr 16, 2019

xarray/coding/strings.py

jmccreight added 3 commits April 16, 2019 23:40

Document char_dim_name

70799e2

Minor change to set of char_dim_name

6cf937f

Test the roundtrip of the char_dim_name in encoding.

7352e50

jmccreight added 2 commits April 17, 2019 15:33

Merge remote-tracking branch 'origin/master'

9bdc94f

pep8 or die

5c57885

jmccreight commented Apr 17, 2019

xarray/tests/test_coding_strings.py

Better test for char_dim_name

f5ffd48

jmccreight commented Apr 18, 2019

jmccreight added 2 commits April 17, 2019 21:44

pep8 79char madness

7a1753f

nix test logic, use multiple parameterized vars

609e475

jmccreight commented Apr 18, 2019

jmccreight added 2 commits April 18, 2019 11:54

When encoding and encoding, remove it from encoding

cda3933

Simpler is better

6070f7c

pep8 visual indent complaint

768dc31

jmccreight added 5 commits April 19, 2019 09:07

what is new!

b190417

what is newer than new!

fc13ea2

what is newer than newer!

1de5b3f

what is newer than newer-er!

1c1a543

what is newer than newer-est!

eb4b055

dcherian merged commit 6d93a95 into pydata:master Apr 19, 2019
