Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: testing.assert_* check dtype #4760

Closed
wants to merge 27 commits into from

Conversation

mathause
Copy link
Collaborator

@mathause mathause commented Jan 4, 2021


This adds a dtype check for equal, identical, broadcast_equal, and the xr.testing.assert_* functions. It is far from complete: tests and documentation are still missing, but I wanted to get it online for feedback.

When I set check_dtype=True there are around 600 failures. Fixing that is for another PR. #4759 should help a bit.

  • I added the checks to lazy_array_equiv, however, sometimes dask can get the dtype wrong before the compute (see below). Do you think I need to put it in the non-lazy part?
import numpy as np
import xarray as xr

da = xr.DataArray(np.array([0, np.nan], dtype=object)).chunk()

da.prod().dtype # -> dtype('O')
da.prod().compute().dtype # -> dtype('int64')
  • check_dtype is still missing from assert_duckarray_allclose & assert_duckarray_equal - do you think there are required?

  • The dtypes of array elements are not tested (see below). I don't think I'll implement that here.

da0 = xr.DataArray(np.array([0], dtype=object))
da1 = xr.DataArray(np.array([0.], dtype=object))

xr.testting.assert_equal(da0, da1, check_dtype=True)

@dcherian
Copy link
Contributor

dcherian commented Jan 4, 2021

sometimes dask can get the dtype wrong before the compute (see below)

I suspect this is only for object arrays since you need not get back an object array? Maybe check in lazy_array_equiv only if all dtypes are not object

@mathause
Copy link
Collaborator Author

mathause commented Feb 5, 2021

@keewis The following test is giving me trouble: pytest xarray/tests/test_units.py::test_align_dataarray[int64-10-data-compatible_unit]

def test_align_dataarray(value, variant, unit, error, dtype):

I now try to reproduce what it does but I cannot even manage that... I think the following is what the test does but it fails with DimensionalityError: Cannot convert from 'dimensionless' to 'meter' - can you see what I do wrong/ different from the test?

import numpy as np
import pint
import xarray as xr

unit_registry = pint.UnitRegistry()

dtype = np.int64

data_unit1 = unit_registry.m
data_unit2 = unit_registry.mm

array1 = np.linspace(0, 10, 2 * 5).reshape(2, 5).astype(dtype)
array2 = np.linspace(0, 8, 2 * 5).reshape(2, 5).astype(dtype)

x = np.arange(2) # * dim_unit1
y1 = np.arange(5) # * dim_unit1
y2 = np.arange(2, 7) # * dim_unit2

u1 = np.array([3, 5, 7, 8, 9]) # * coord_unit1
u2 = np.array([7, 8, 9, 11, 13])  # * coord_unit2

coords1 = {"x": x, "y": y1}
coords2 = {"x": x, "y": y2}

data_array1 = xr.DataArray(data=array1 * data_unit1, coords=coords1, dims=("x", "y"))
data_array2 = xr.DataArray(data=array2 * data_unit2, coords=coords2, dims=("x", "y"))

fill_value = 10

actual_a, actual_b = xr.align(data_array1, data_array2, join="outer", fill_value=fill_value * data_unit1)

The underlying problem is that when I run this test with check_dtype=True it raises an error (which is fine) but instead of a nice diff_repr it elevates the UnitStrippedWarning to an error and I have not the slightest clue why...

>       assert_allclose(expected_b, actual_b)

/home/mathause/code/xarray/xarray/tests/test_units.py:542: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/mathause/code/xarray/xarray/tests/__init__.py:208: in assert_allclose
    xarray.testing.assert_allclose(a, b, check_dtype=check_dtype, **kwargs)
/home/mathause/code/xarray/xarray/testing.py:178: in assert_allclose
    assert allclose, formatting.diff_array_repr(
/home/mathause/code/xarray/xarray/core/formatting.py:667: in diff_array_repr
    if not equiv(a, b, check_dtype=check_dtype):
/home/mathause/code/xarray/xarray/testing.py:40: in _data_allclose_or_equiv
    return duck_array_ops.allclose_or_equiv(
/home/mathause/code/xarray/xarray/core/duck_array_ops.py:249: in allclose_or_equiv
    arr1 = asarray(arr1)
/home/mathause/code/xarray/xarray/core/duck_array_ops.py:184: in asarray
    return data if is_duck_array(data) else xp.asarray(data)
/home/mathause/conda/envs/xarray_dev/lib/python3.8/site-packages/numpy/core/_asarray.py:102: in asarray
    return array(a, dtype, copy=False, order=order)
/home/mathause/code/xarray/xarray/core/common.py:139: in __array__
    return np.asarray(self.values, dtype=dtype)
/home/mathause/code/xarray/xarray/core/dataarray.py:634: in values
    return self.variable.values
/home/mathause/code/xarray/xarray/core/variable.py:544: in values
    return _as_array_or_item(self._data)
/home/mathause/code/xarray/xarray/core/variable.py:276: in _as_array_or_item
    data = np.asarray(data)
/home/mathause/conda/envs/xarray_dev/lib/python3.8/site-packages/numpy/core/_asarray.py:102: in asarray
    return array(a, dtype, copy=False, order=order)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Quantity([[10. 10.  0.  0.  1.  2.  3.]
 [10. 10.  4.  5.  6.  7.  8.]], 'millimeter')>, t = None

    def __array__(self, t=None):
>       warnings.warn(
            "The unit of the quantity is stripped when downcasting to ndarray.",
            UnitStrippedWarning,
            stacklevel=2,
        )
E       pint.errors.UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.

/home/mathause/conda/envs/xarray_dev/lib/python3.8/site-packages/pint/quantity.py:1683: UnitStrippedWarning

@keewis
Copy link
Collaborator

keewis commented Feb 5, 2021

the difference is that the unit registry has force_ndarray_like enabled. This is important because a pint.Quantity wrapping a python scalar is not a duck array, so the unit gets stripped.

@mathause
Copy link
Collaborator Author

mathause commented Feb 5, 2021

Thanks! That drove me nuts but I finally figured out the problem:

pytest.mark.filterwarnings("error::pint.UnitStrippedWarning"),

promotes the warning to an error so diff_array_repr fails (because the units are stripped...) and that is what we see. That also fails on master - run the following in with pytest:

import pint
import pytest
import xarray as xr

pytestmark = [
    pytest.mark.filterwarnings("error::pint.UnitStrippedWarning"),
]

unit_registry = pint.UnitRegistry(force_ndarray=True)


def test_y():

    data_array1 = xr.DataArray(data=1 * unit_registry.m)
    data_array2 = xr.DataArray(data=2 * unit_registry.m)

    xr.testing.assert_allclose(data_array1, data_array2)

@keewis do you have an idea how to fix this? One idea is to supress warnings in diff_array_repr and diff_dataset_repr. Do we need to strip the units before sending them to these functions?

@keewis
Copy link
Collaborator

keewis commented Feb 5, 2021

to really fix this we need some way to say "I really want to convert to a numpy.ndarray" (unfortunately, np.asarray seems to be commonly used as "ensure that this is a numpy.ndarray"). That's the same issue as with cupy.Array, which will raise in __array__.

@mathause
Copy link
Collaborator Author

mathause commented Feb 5, 2021

True, but for the problem that the diff repr is not shown I think it makes sense to go for this, e.g. the following works fine:

def test_x():

    data_array1 = xr.DataArray(data=1 * unit_registry.m)
    data_array2 = xr.DataArray(data=2 * unit_registry.m)

    with warnings.catch_warnings():
        warnings.simplefilter("always")

        xr.testing.assert_allclose(data_array1, data_array2)

@keewis
Copy link
Collaborator

keewis commented Feb 5, 2021

yes, I agree that it makes sense to set warnings to "always" in the assert_* functions.

@mathause mathause closed this Oct 18, 2021
@mathause mathause deleted the assert_check_dtype branch October 18, 2021 14:06
@mathause mathause restored the assert_check_dtype branch October 18, 2021 14:14
@mathause
Copy link
Collaborator Author

I closed the PR by accident but I think I'll keep it closed, unless there is interest to get this in. Testing dtypes could be an interesting option, however,

  • there are O(1000) tests that do not preserve dtype - so this would be an enormous cleanup required if we ever seriously wanted to do this
  • some elements cannot preserve dtypes (str types in pandas index (not sure what the index refactor does here) and empty str arrays)

@mathause mathause deleted the assert_check_dtype branch January 26, 2022 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xr.testing.assert_equal does not test for dtype
3 participants