Allow string formatting of scalar DataArrays #5981

Merged
merged 8 commits into from
May 9, 2022

fmaussion (Member) commented Nov 12, 2021

This is a first try at formatting scalar DataArrays. Here is the current behavior:

In [1]: import xarray as xr
   ...: import numpy as np

In [2]: a = np.array(1)
   ...: da = xr.DataArray(a)

In [3]: print(a)
1

In [4]: print(da)
<xarray.DataArray ()>
array(1)

In [5]: print('{}'.format(a))
1

In [6]: print('{}'.format(da))
<xarray.DataArray ()>
array(1)

In [7]: print('{:.3f}'.format(a))
1.000

In [8]: print('{:.3f}'.format(da))
1.000

In [9]: a = np.array([1, 2])
   ...: da = xr.DataArray(a)

In [10]: print('{}'.format(a))
[1 2]

In [11]: print('{}'.format(da))
<xarray.DataArray (dim_0: 2)>
array([1, 2])
Dimensions without coordinates: dim_0

In [12]: print('{:.3f}'.format(a))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-c5afc7863e89> in <module>
----> 1 print('{:.3f}'.format(a))

TypeError: unsupported format string passed to numpy.ndarray.__format__

In [13]: print('{:.3f}'.format(da))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-bddebd8462bd> in <module>
----> 1 print('{:.3f}'.format(da))

~/disk/Dropbox/HomeDocs/git/xarray/xarray/core/common.py in __format__(self, format_spec)
    162             return formatting.array_repr(self)
    163         # Else why fall back to numpy
--> 164         return self.values.__format__(format_spec)
    165 
    166     def _iter(self: Any) -> Iterator[Any]:

TypeError: unsupported format string passed to numpy.ndarray.__format__

I don't think there is any backwards compatibility issue, but let's see if the tests pass.
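The dispatch visible in the traceback above can be sketched without xarray. This is a minimal illustration, not xarray's implementation: the class `Wrapped` is hypothetical, and the `format_spec == ""` check is an assumption inferred from the two return statements shown in the traceback. A plain Python list stands in for a non-scalar array because it also rejects non-empty format specs.

```python
class Wrapped:
    """Hypothetical stand-in for a DataArray; not xarray's actual class."""

    def __init__(self, values):
        self.values = values

    def __repr__(self):
        return f"<Wrapped>\n{self.values!r}"

    def __format__(self, format_spec):
        # Assumed condition: an empty spec keeps the lengthy repr.
        if format_spec == "":
            return repr(self)
        # Otherwise defer to whatever the wrapped data supports.
        return format(self.values, format_spec)


scalar = Wrapped(1.0)
print("{}".format(scalar))      # lengthy repr
print("{:.3f}".format(scalar))  # formatted like the wrapped float

try:
    "{:.3f}".format(Wrapped([1, 2]))
except TypeError as err:
    # A list, like a non-scalar ndarray, rejects a non-empty spec.
    print(err)
```

With this layout, scalar data gains format specifiers while the empty-spec output stays identical to the repr, which is why no backwards compatibility issue is expected.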

fmaussion (Member, Author) commented Nov 12, 2021

Note that there would be a way to change the behavior to:

print(da) -> lengthy repr
print('{}'.format(da)) -> fall back to numpy.__format__

This would break backwards compatibility, but I think it would be my preference - there is some chance that it breaks some code (in documentation pages maybe?), but I don't think that many people rely on '{}'.format(da) to return __repr__ ...
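The alternative is possible because print() and str.format() go through different hooks: print() uses __str__ (which falls back to __repr__), while '{}'.format() calls __format__ with an empty spec. A rough sketch of that separation follows; the class `AlwaysDelegates` is hypothetical and only illustrates the proposal, with a plain Python list standing in for the array data.

```python
class AlwaysDelegates:
    """Hypothetical wrapper; illustrates the proposal, not xarray's code."""

    def __init__(self, values):
        self.values = values

    def __repr__(self):
        return f"<AlwaysDelegates>\n{self.values!r}"

    def __format__(self, format_spec):
        # No special case for "": the wrapped data decides the output.
        return format(self.values, format_spec)


obj = AlwaysDelegates([1, 2])
print(obj)                # print() uses __str__/__repr__: lengthy form
print("{}".format(obj))   # str.format() uses __format__: compact form
```

The trade-off is the one described above: any code that relied on '{}'.format(obj) producing the repr would see different output.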

Comment on lines 418 to 426
assert var.__format__("") == "<xarray.DataArray ()>\narray(0)"
assert var.__format__("d") == "0"
assert var.__format__(".2f") == "0.00"

var = xr.DataArray([0])
assert var.__format__("") == (
"<xarray.DataArray (dim_0: 1)>\narray([0])"
"\nDimensions without coordinates: dim_0"
)
Contributor

Suggested change
- assert var.__format__("") == "<xarray.DataArray ()>\narray(0)"
- assert var.__format__("d") == "0"
- assert var.__format__(".2f") == "0.00"
- var = xr.DataArray([0])
- assert var.__format__("") == (
- "<xarray.DataArray (dim_0: 1)>\narray([0])"
- "\nDimensions without coordinates: dim_0"
- )
+ assert var.__format__("") == var.__repr__()
+ assert var.__format__("d") == "0"
+ assert var.__format__(".2f") == "0.00"
+ var = xr.DataArray([0])
+ assert var.__format__("") == var.__repr__()

The repr style can change over time, so comparing against var.__repr__() is more robust than a hard-coded string. I think this should work.

Member, Author

Thanks, that makes sense!

However, I think I would like to simplify this even more and always call ndarray.__format__ when DataArray.__format__ is called. This won't be backwards compatible, but I think it would be better and more predictable.

fmaussion (Member, Author) commented

Side note: with this design (no call to repr() in format()) we would mimic numpy's behavior:

a = np.array([0.1, 0.2])
a.__format__("")
Out[16]: '[0.1 0.2]'
a.__repr__()
Out[17]: 'array([0.1, 0.2])'

However, I'm quite surprised that .__format__("") works on non-scalar arrays while other specifiers do not - this is discussed to some extent in numpy/numpy#5543. I think it's okay for xarray to defer to whatever numpy is doing here, but of course we can also keep the status quo; it's quite a small change after all.
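For reference, the numpy behavior in question can be reproduced with a few lines. This sketch uses only `numpy.array` and the built-in `format()`; the variable names are illustrative, and the exact error message may vary between numpy versions.

```python
import numpy as np

scalar = np.array(1)
vector = np.array([0.1, 0.2])

print(format(scalar, ".3f"))  # 0-d array: formatted like its single item
print(format(vector, ""))     # empty spec: same text as str(vector)
print(repr(vector))           # repr keeps the array(...) wrapper

try:
    format(vector, ".3f")
except TypeError as err:
    # Non-scalar arrays reject any non-empty format spec.
    print(err)
```

So a DataArray that always defers to ndarray.__format__ would inherit exactly this split: specifiers work for 0-d data and raise TypeError otherwise.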

fmaussion (Member, Author) commented

@Illviljan and other devs - I still think this small change would be a nice addition for accessibility and teaching. Let me know if we can pursue this or if I should close it.

Illviljan added the "plan to merge" (Final call for comments) label May 3, 2022
Illviljan merged commit bbb14a5 into pydata:main May 9, 2022
Illviljan (Contributor) commented

Thanks @fmaussion!

dcherian added a commit to dcherian/xarray that referenced this pull request May 20, 2022
* main: (24 commits)
  Fix overflow issue in decode_cf_datetime for dtypes <= np.uint32 (pydata#6598)
  Enable flox in GroupBy and resample (pydata#5734)
  Add setuptools as dependency in ASV benchmark CI (pydata#6609)
  change polyval dim ordering (pydata#6601)
  re-add timedelta support for polyval (pydata#6599)
  Minor Dataset.map docstr clarification (pydata#6595)
  New inline_array kwarg for open_dataset (pydata#6566)
  Fix polyval overloads (pydata#6593)
  Restore old MultiIndex dropping behaviour (pydata#6592)
  [docs] add Dataset.assign_coords example (pydata#6336) (pydata#6558)
  Fix zarr append dtype checks (pydata#6476)
  Add missing space in exception message (pydata#6590)
  Doc Link to accessors list in extending-xarray.rst (pydata#6587)
  Fix Dataset/DataArray.isel with drop=True and scalar DataArray indexes (pydata#6579)
  Add some warnings about rechunking to the docs (pydata#6569)
  [pre-commit.ci] pre-commit autoupdate (pydata#6584)
  terminology.rst: fix link to Unidata's "netcdf_dataset_components" (pydata#6583)
  Allow string formatting of scalar DataArrays (pydata#5981)
  Fix mypy issues & reenable in tests (pydata#6581)
  polyval: Use Horner's algorithm + support chunked inputs (pydata#6548)
  ...
dcherian added a commit to headtr1ck/xarray that referenced this pull request May 20, 2022