Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unit support to cf-xarray #197

Merged
merged 15 commits into from
May 18, 2021
Merged

Conversation

jthielen
Copy link
Contributor

@jthielen jthielen commented Apr 6, 2021

Closes xarray-contrib/pint-xarray#26

As discussed in xarray-contrib/pint-xarray#26, this PR adds a copy of MetPy's Pint registry (which has some customization to better support CF/UDUNITS-style units most commonly found in CF-compliant NetCDF files) to cf-xarray. However, right now that is all that this does at this point, and so, I'd like to put this up now as a work-in-progress and hopefully start a more detailed conversation about how to move forward with this:

  • Main Question: how should cf-xarray relate to pint-xarray?
    • inherit .pint accessor in the .cf accessor and use with this registry?
    • set this ported registry to the application registry and provide documentation on how to use it with pint-xarray? <-- my preference
    • just provide this ported registry as a standalone entity, but document how to incorporate it and pint-xarray into common CF-related workflows?
  • Should Pint and this registry be a required or optional dependency?
  • How should we handle matplotlib support (which requires enabling on the registry with units.setup_matplotlib()) and testing?
  • How should this unit support best be documented and tested (in isolation from a calculation suite like MetPy has)?

However, feel free to bring up any other related discussion points as well!

cc @dcherian, @keewis

@dcherian
Copy link
Contributor

dcherian commented Apr 6, 2021

Thanks @jthielen! This is very exciting to see.

  1. set this ported registry to the application registry and provide documentation on how to use it with pint-xarray?
  2. just provide this ported registry as a standalone entity, but document how to incorporate it and pint-xarray into common CF-related workflows?

For 2) would the registry would be set at import? I think we may want the user to opt in with cfxr.setup_udunits_registry() (or something like that). I'm really torn here :) I guess it's not a bad assumption that anyone importing cf_xarray would be reasonably happy with setting this registry when possible. Curious to hear other opinions here.

Should Pint and this registry be a required or optional dependency?

Optional. I'd like for cf_xarray to be an easy import that really only depends on xarray for most things.

How should we handle matplotlib support?

This is very cool! But since the user needs to set units manually on the axes, perhaps we can also expect them to call ureg.setup_matplotlib(True) if they want this feature? Also curious to here opinions from people that use this feature.

How should this unit support best be documented and tested

I think we can test quantifying DataArrays with units like "degrees_north" etc. that this registry provides.

@dcherian
Copy link
Contributor

dcherian commented Apr 6, 2021

perhaps we can also expect them to call ureg.setup_matplotlib(True) if they want this feature?

Actually is there any downside to calling setup_matplotlib if the user doesn't set units on the axes? What happens then?

Copy link
Contributor

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me.

The only thing I still have to figure out is how the registry would be set. Using pint.set_application_registry doesn't really work because pint_xarray sets its default registry on import, which means that we'd have to make sure pint_xarray is imported after cf_xarray (should be fine if isort is used).

Edit: requiring the user to set the unit registry themselves would work, too:

pint_xarray.unit_registry = cf_xarray.units.units

cf_xarray/units.py Outdated Show resolved Hide resolved
cf_xarray/units.py Show resolved Hide resolved
cf_xarray/units.py Outdated Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Apr 7, 2021

Codecov Report

Merging #197 (0daca00) into main (5d3e5fe) will decrease coverage by 0.01%.
The diff coverage is 95.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #197      +/-   ##
==========================================
- Coverage   96.47%   96.45%   -0.02%     
==========================================
  Files          12       14       +2     
  Lines        1561     1609      +48     
==========================================
+ Hits         1506     1552      +46     
- Misses         55       57       +2     
Flag Coverage Δ
unittests 96.45% <95.83%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cf_xarray/units.py 88.23% <88.23%> (ø)
cf_xarray/tests/test_units.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d3e5fe...0daca00. Read the comment docs.

@jthielen
Copy link
Contributor Author

jthielen commented Apr 8, 2021

Actually is there any downside to calling setup_matplotlib if the user doesn't set units on the axes? What happens then?

Now that you mention it, no, I don't believe so, as it just sets matplotlib.units.registry[registry.Quantity] so that matplotlib can recognize Quantity instances as having a special converter instead of a duck array type that gets downcast to ndarray. The only downside I can see to calling setup_matplotlib is if a user doesn't have matplotlib installed. For that case, I just wrapped it in an try-except block.

For 2) would the registry would be set at import?

Yes, that is what I was envisioning, however, I wasn't thinking this module would be available on the top level by default (deliberate lack of inclusion in __init__.py), so the import itself (from cf_xarray.units import units) would be the opt-in (so no need for cfxr.setup_udunits_registry()). Given this, it seems like setting it as the application registry is the best solution for most users, however, let me know if that isn't the case!

@jthielen
Copy link
Contributor Author

jthielen commented Apr 8, 2021

Current todo list remaining:

  • add documentation
    • Examples of using cf-xarray and pint-xarray together
    • Instructing users to either import cf-xarray before pint-xarray or set pint_xarray.unit_registry = units
    • List current known shortfalls of this registry compared to full UDUNITS support (namely case sensitivity)
  • add tests of this registry's quantities being used within xarray, and not just correctly parsing units with the customized features.
  • Add whats-new.rst note

@dcherian
Copy link
Contributor

dcherian commented Apr 8, 2021

I can see to calling setup_matplotlib is if a user doesn't have matplotlib installed. For that case, I just wrapped it in an try-except block.

Perfect.

Instructing users to either import cf-xarray before pint-xarray or set pint_xarray.unit_registry = units

Is there a reason someone using cf-xarray (requiring xarray) would want the units registry (requiring pint) without using pint_xarray (requires both xarray and pint)? If not, shall we set pint_xarray.unit_registry = units in units.py?

it seems like setting it as the application registry is the best solution for most users

I don't know enough to have an opinion here. Up to you, @keewis and @jthielen

@dcherian
Copy link
Contributor

dcherian commented Apr 8, 2021

Test failure is real.

=================================== FAILURES ===================================
__________________________ test_udunits_power_syntax ___________________________

    def test_udunits_power_syntax():
        """Test that UDUNITS style powers are properly parsed and interpreted."""
>       assert units("m2 s-2").units == units.m ** 2 / units.s ** 2

cf_xarray/tests/test_units.py:54: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/share/miniconda/envs/cf_xarray_test/lib/python3.7/site-packages/pint/registry.py:1264: in parse_expression
    lambda x: self._eval_token(x, case_sensitive=case_sensitive, **values)
/usr/share/miniconda/envs/cf_xarray_test/lib/python3.7/site-packages/pint/pint_eval.py:93: in evaluate
    return bin_op[op_text](left, self.right.evaluate(define_op, bin_op, un_op))
/usr/share/miniconda/envs/cf_xarray_test/lib/python3.7/site-packages/pint/pint_eval.py:93: in evaluate
    return bin_op[op_text](left, self.right.evaluate(define_op, bin_op, un_op))
/usr/share/miniconda/envs/cf_xarray_test/lib/python3.7/site-packages/pint/quantity.py:115: in wrapped
    return f(self, *args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Quantity(1, 'second')>, other = -2

    @check_implemented
    def __pow__(self, other):
        ...
            new_self = self
            if other == 1:
                return self
            elif other == 0:
                exponent = 0
                units = self.UnitsContainer()
            else:
                if not self._is_multiplicative:
                    if self._REGISTRY.autoconvert_offset_to_baseunit:
                        new_self = self.to_root_units()
                    else:
                        raise OffsetUnitCalculusError(self._units)
    
                if getattr(other, "dimensionless", False):
                    exponent = other.to_root_units().magnitude
                    units = new_self._units ** exponent
                elif not getattr(other, "dimensionless", True):
                    raise DimensionalityError(other._units, "dimensionless")
                else:
                    exponent = _to_magnitude(
                        other, self.force_ndarray, self.force_ndarray_like
                    )
                    units = new_self._units ** exponent
    
>           magnitude = new_self._magnitude ** exponent
E           ValueError: Integers to negative integer powers are not allowed.

@keewis
Copy link
Contributor

keewis commented Apr 8, 2021

that's a numpy issue:

In [3]: np.array(2) ** -1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-8b4168950dcb> in <module>
----> 1 np.array(2) ** -1

ValueError: Integers to negative integer powers are not allowed.

which can be fixed by casting the value or the exponent to float

Edit: not sure how to do that in the re.sub call
Edit2: now that I look at it, you might be able to speed up the call to re.sub a little by precompiling the expression (re.compile).

cf_xarray/units.py Outdated Show resolved Hide resolved
@jthielen
Copy link
Contributor Author

Sorry for the delayed response here!

Is there a reason someone using cf-xarray (requiring xarray) would want the units registry (requiring pint) without using pint_xarray (requires both xarray and pint)? If not, shall we set pint_xarray.unit_registry = units in units.py?

Not that I can see! While we could set pint_xarray.default_registry = units in units.py, we'd have to make sure the import ordering is addressed properly, since pint_xarray.default_registry gets set to the application registry in pint-xarray. Some careful tests with both import orders are likely needed to verify we get the intended behavior.

not sure if we need a complete fix, but we could work around the failure:

...

which would pass and is also what ds.pint.quantify uses. Maybe documenting that units.__call__ and units.parse_expression won't work in general with this type of definition is enough?

I really do intend for that test to pass as-is, since if we're trying to support UDUNITS syntax, I'd think that we don't want to just support it in the units attribute/being parsed with the verbose method, but through the full standard usage of pint; i.e., something like the following should "just work":

new_thing = (2 * units("s-1")) * quantified_data_array

If we don't want to hold up this PR for such a fix (such as resolving the scalar issue that necessitates using force_ndarray_like in pint-xarray or tweaking pint's parsing internals to better handle this case), however, we can definitely xfail it for now.

Co-authored-by: keewis <keewis@users.noreply.github.com>
@keewis
Copy link
Contributor

keewis commented Apr 17, 2021

tweaking pint's parsing internals

I think that's the way to go: having parse_expression fail because force_ndarray or force_ndarray_like is active seems like a bug in pint. However, we still should investigate whether xarray can treat x.ndim == 0 as a scalar and whether having pint.Quantity.ndim default to 0 if the magnitude raises an AttributeError is possible (which would get rid of the need for force_ndarray_like).

What do you think about adding an additional test for parse_units and xfailing the parse_expression test?

Copy link
Contributor

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcherian, with a fix being discussed in hgrecco/pint#1295 I think we should xfail the __call__ / parse_expression test and add a new one for use with parse_units (we might be able to remove that new test should pint change __call__ to to call parse_units). That would leave only the documentation and examples (I don't think testing pint + xarray with the new unit registry is really necessary, but I would check that the unit registry is used by pint-xarray)

cf_xarray/tests/test_units.py Show resolved Hide resolved
Co-authored-by: keewis <keewis@users.noreply.github.com>
@keewis keewis closed this Apr 28, 2021
@keewis keewis reopened this Apr 28, 2021
@keewis
Copy link
Contributor

keewis commented May 11, 2021

FYI the bug in pint is fixed (the upstream-dev CI should xpass the test)

@dcherian
Copy link
Contributor

so... is this ready to go in?

@keewis
Copy link
Contributor

keewis commented May 18, 2021

the code should be ready (if you don't count the integration test mentioned by @jthielen), but the docs are still missing. If you don't mind a separate PR for that I think this can be merged.

@dcherian dcherian changed the title WIP: Add unit support to cf-xarray (by way of copying MetPy's CF/UDUNITS-approximating Pint registry) Add unit support to cf-xarray May 18, 2021
@keewis keewis marked this pull request as ready for review May 18, 2021 15:48
@dcherian dcherian mentioned this pull request May 18, 2021
2 tasks
* upstream/main:
  Update README.rst
  Add earthcube 2021 notebook link to readme
  v0.5.2
  Some CMIP6 support
  Add pooch to binder environment (xarray-contrib#223)
  `add_bounds` uses `keys` rather than `dims` (xarray-contrib#221)
@dcherian dcherian merged commit 24bf4d3 into xarray-contrib:main May 18, 2021
@dcherian
Copy link
Contributor

Thanks @jthielen & @keewis . Excited to see this go in!

dcherian added a commit to jukent/cf-xarray that referenced this pull request Jun 8, 2021
* main: (57 commits)
  Add CITATION.cff, tributors, zenodo.json (xarray-contrib#231)
  Add zenodo badge
  Improve `rename_like` (xarray-contrib#222)
  Don't apply mappers to DataArrays (xarray-contrib#227)
  Add unit support to cf-xarray (xarray-contrib#197)
  Update README.rst
  Add earthcube 2021 notebook link to readme
  v0.5.2
  Some CMIP6 support
  Add pooch to binder environment (xarray-contrib#223)
  `add_bounds` uses `keys` rather than `dims` (xarray-contrib#221)
  Add .cf.formula_terms (xarray-contrib#213)
  Update whats-new.rst (xarray-contrib#217)
  add bounds property (xarray-contrib#214)
  Update some tests.
  Compile regexes
  Fix black
  Add __version__ (xarray-contrib#208)
  add skip to rename_like (xarray-contrib#206)
  Refactor out coordinate criteria to criteria.py (xarray-contrib#205)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CF units / udunits
4 participants