Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle the character array dim name #2896

Merged
merged 17 commits into from
Apr 19, 2019
Merged

Handle the character array dim name #2896

merged 17 commits into from
Apr 19, 2019

Conversation

jmccreight
Copy link
Contributor

See #2895

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good to me, but this needs documentation & tests.

Could you kindly add a note to the documentation about this? Maybe somewhere in this section?
http://xarray.pydata.org/en/stable/io.html#string-encoding

Tests could probably go somewhere in this file:
https://github.com/pydata/xarray/blob/master/xarray/tests/test_coding_strings.py

xarray/coding/strings.py Outdated Show resolved Hide resolved
@jmccreight
Copy link
Contributor Author

thanks, @shoyer. I will add the documentation and tests now that the first hurdle is cleared and update the PR.

@pep8speaks
Copy link

pep8speaks commented Apr 17, 2019

Hello @jmccreight! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-04-19 15:13:29 UTC

@jmccreight
Copy link
Contributor Author

@shoyer Added test and documentation. I did not build documentation, wasnt sure if that was necessary.
The history should be squashed when the time comes...

Variable(('x',), [b'ab', b'cdef'], encoding={'char_dim_name': 'foo'})
]
)
def test_CharacterArrayCoder_char_dim_name(original):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is better and getting warmer. The test improved the underlying code here.

But I still dont see how to eliminate this logic. Putting an or in the assert does not seem like a better alternative, nor does only testing just one of the two possible input scenarios.... let me know if you have a better idea that I'm too blind to see.

Thanks,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clean way to do this would be to add a second parametrized argument for the expected value, e.g.,

@pytest.mark.parametrize(
    ['original', 'expected_char_dim_name'],
    [
        (Variable(('x',), [b'ab', b'cdef']), 'string4'),
        (Variable(('x',), [b'ab', b'cdef'], encoding={'char_dim_name': 'foo'}), 'char_dim_name')
    ]
)
def test_CharacterArrayCoder_char_dim_name(original, expected_char_dim_name):
     ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the first case of 2-variable usage in that test file... thanks for pointing it out.

else:
default_char_dim_name = 'string%s' % data.shape[-1]
dims = dims + (default_char_dim_name,)
encoding['char_dim_name'] = default_char_dim_name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is causing a set of failures. I think this line is desirable. I could be wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 111, to be clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention we have (which is apparently enforced by tests) is that when you use an encoding you remove it from the encoding dict. That way we catch cases where you have a typo, e.g., char_dimension_name instead of char_dim_name.

So this section should instead probably be something like:

char_dim_name = encoding.pop(
    'char_dim_name', 'string%s' % data.shape[-1])
dims = dims + (char_dim_name,)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great reason, this solves all the errors I was seeing locally. THanks

@jmccreight
Copy link
Contributor Author

I'm uncertain why travis is failing. Two of them look http-related and the other maybe be docs-related (but dont trust me). Running pytest locallin in the xarray/tests/ dir

============== 7007 passed, 1170 skipped, 25 xfailed, 1 xpassed, 30 warnings in 62.12 seconds ==============

@dcherian
Copy link
Contributor

One is a lint failure: #2896 (comment)

The docs failure looks like a cartopy install problem

@jmccreight
Copy link
Contributor Author

🎊

@shoyer
Copy link
Member

shoyer commented Apr 19, 2019

Could you please add a brief note to whats-new.rst? Otherwise this looks good to go.

@jmccreight
Copy link
Contributor Author

jmccreight commented Apr 19, 2019

🤦‍♂️ with that formatting in the whats-new.rst. (a reminder to squash)
I think this is complete.
thanks for the mini tour of xarray internals, I learned some useful things!

@dcherian dcherian merged commit 6d93a95 into pydata:master Apr 19, 2019
@dcherian
Copy link
Contributor

Thanks @jmccreight

dcherian added a commit to yohai/xarray that referenced this pull request Apr 19, 2019
* master: (29 commits)
  Handle the character array dim name  (pydata#2896)
  Partial fix for pydata#2841 to improve formatting. (pydata#2906)
  docs: Move quick overview one level up (pydata#2890)
  Manually specify chunks in open_zarr (pydata#2530)
  Minor improvement of docstring for Dataset (pydata#2904)
  Fix minor typos in docstrings (pydata#2903)
  Added docs example for `xarray.Dataset.get()` (pydata#2894)
  Bugfix for docs build instructions (pydata#2897)
  Return correct count for scalar datetime64 arrays (pydata#2892)
  Indexing with an empty array (pydata#2883)
  BUG: Fix pydata#2864 by adding the missing vrt parameters (pydata#2865)
  Reduce length of cftime resample tests (pydata#2879)
  WIP: type annotations (pydata#2877)
  decreased pytest verbosity (pydata#2881)
  Fix mypy typing error in cftime_offsets.py (pydata#2878)
  update links to https (pydata#2872)
  revert to 0.12.2 dev
  0.12.1 release
  Various fixes for explicit Dataset.indexes (pydata#2858)
  Fix minor typo in docstring (pydata#2860)
  ...
@castelao
Copy link

Perfect timing, I just needed that! Thanks @jmccreight et al

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants