Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pandas=2.0 support #7724

Merged
merged 29 commits into from
Apr 12, 2023
Merged

pandas=2.0 support #7724

merged 29 commits into from
Apr 12, 2023

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Apr 5, 2023

As mentioned in #7716 (comment), this tries to unpin pandas.

@keewis keewis changed the title try unpinning pandas try to unpin pandas Apr 5, 2023
@github-actions github-actions bot added CI Continuous Integration tools dependencies Pull requests that update a dependency file labels Apr 5, 2023
@keewis
Copy link
Collaborator Author

keewis commented Apr 6, 2023

It seems this fixes the failing tests unrelated to the datetime64 range.

Regarding the failing doctests, I decided to cast the values of arr.dt.<field> to int64 to keep the API unchanged. However, if we decide to follow pandas and return int32 I'm happy to make that change.

Copy link
Collaborator Author

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's some comments for the individual changes to make reviewing easier

Comment on lines 1040 to 1049
pytest.param(
np.array([0.0, 0.111, 0.222, 0.333], dtype="float16"),
slice(1, 3),
id="float16",
marks=[
pytest.mark.skipif(
has_pandas_version_two, reason="not supported for pandas >= 2.0"
)
],
),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the main change here: we skip the float16 check with pandas>=2.0. Another option would be to change the code to explicitly cast to float32 / float64, then remove the skipif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit cast sounds good to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to which precision should we cast? float64?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, IIUC these arrays are converted to pandas Indexes, which used to upcast to float64 always.

Perhaps we should upcast for backwards compatibility and raise a DeprecationWarning asking the user to cast explicitly

Copy link
Collaborator Author

@keewis keewis Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this turns out not to be as simple as I thought it would be: the cast to PandasIndex (and thus pandas.Index) happens in multiple places. First is the DataArray / Dataset constructor, which through several layers calls safe_cast_to_index, where the actual cast to PandasIndex happens. The second cast happens when selecting using a array of values. In that case, get_indexer_nd calls index.get_indexer (pandas.Index.get_indexer), which appears to create a index from the indexer values.

Should both casts emit the DeprecationWarning? I think the second might be a implementation detail and that we should just cast the indexer values from float16 to float64 (not sure, though).

cc @benbovy

Comment on lines +28 to +29
[np.array(["a"]), np.array(["b"]), np.array(["a", "b"])],
[np.array([1], dtype="int64"), np.array([2], dtype="int64"), pd.Index([1, 2])],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, the idea is to avoid the different default precision on windows by explicitly setting the int precision on construction

xarray/tests/test_accessor_dt.py Show resolved Hide resolved
xarray/core/accessor_dt.py Show resolved Hide resolved
xarray/tests/test_accessor_dt.py Show resolved Hide resolved
@keewis keewis changed the title try to unpin pandas support pandas=2.0 Apr 7, 2023
@keewis keewis changed the title support pandas=2.0 pandas=2.0 support Apr 7, 2023
@keewis
Copy link
Collaborator Author

keewis commented Apr 7, 2023

@Illviljan, @headtr1ck, I just noticed that the CI version of mypy is pinned to <0.990, but the pre-commit hook is at 1.1.1. Is that intentional / known? Running the hook also exposes quite a few typing errors.

Otherwise this should be ready for a final review, and the failing datetime-related tests will be fixed by #7731.

@headtr1ck
Copy link
Collaborator

@Illviljan, @headtr1ck, I just noticed that the CI version of mypy is pinned to <0.990, but the pre-commit hook is at 1.1.1. Is that intentional / known? Running the hook also exposes quite a few typing errors.

I think this is because of #7270

The pre-commit hook of mypy should be disabled anyway because it takes too long to run (should be actually much faster since mypy >=1).

I think we could check if the newest mypy still segfaults or not...

@keewis
Copy link
Collaborator Author

keewis commented Apr 9, 2023

I think we could check if the newest mypy still segfaults or not...

it didn't for me when I ran the hook, but that might just have been luck. To confirm, we'd need a PR that unpins mypy and potentially fixes all the errors that popped up since.

The mypy hook didn't take that long, but I guess it's the accumulated time that counts. Also, I learned today that the black-jupyter hook does all that black does plus format notebooks, so removing id: black should also shave off a bit.

@headtr1ck
Copy link
Collaborator

Locally it always works, it segfaults in CI, which makes it impossible to debug.

@jsignell
Copy link
Contributor

Is there anything I can do to help out on this? It sounds like the blocker is mypy?

@keewis keewis added the plan to merge Final call for comments label Apr 11, 2023
@keewis
Copy link
Collaborator Author

keewis commented Apr 11, 2023

no, mypy should be fine, as the CI version does not complain and I assume whatever the hook is reporting is also present on main (I didn't check, though).

As far as I can tell, the only thing left is another round of reviews.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dcherian
Copy link
Contributor

I think we can double check that the only failures are cftimeindex, restore the pin, then merge, and then remove the pin in #7731

@dcherian dcherian merged commit db2d414 into pydata:main Apr 12, 2023
@keewis keewis deleted the unpin-pandas branch April 12, 2023 13:24
dcherian added a commit to dcherian/xarray that referenced this pull request Apr 18, 2023
* main: (34 commits)
  Update whats-new.rst
  Fix binning by unsorted array (pydata#7762)
  Bump codecov/codecov-action from 3.1.1 to 3.1.2 (pydata#7760)
  Fix typing errors using mypy 1.2 (pydata#7752)
  [skip-ci] dev whats-new
  Add whats-new for v2023.04.0 (pydata#7757)
  remove the `black` hook (pydata#7756)
  reword the what's new entry for the `pandas` 2.0 dtype changes (pydata#7755)
  restructure the contributing guide (pydata#7681)
  Continue to use nanosecond-precision Timestamps in precision-sensitive areas (pydata#7731)
  minor doc updates to clarify extensions using accessors (pydata#7751)
  align: Avoid reindexing when join="exact" (pydata#7736)
  `pandas=2.0` support (pydata#7724)
  Clarify vectorized indexing documentation (pydata#7747)
  Avoid recasting a CFTimeIndex (pydata#7735)
  fix typo (pydata#7746)
  [pre-commit.ci] pre-commit autoupdate (pydata#7745)
  Bump pypa/gh-action-pypi-publish from 1.8.4 to 1.8.5 (pydata#7743)
  preserve boolean dtype in encoding (pydata#7720)
  [skip-ci] Add alignment benchmarks (pydata#7738)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration tools dependencies Pull requests that update a dependency file plan to merge Final call for comments topic-indexing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants