Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve concat performance #7824

Merged
merged 61 commits into from
Jun 2, 2023
Merged

Improve concat performance #7824

merged 61 commits into from
Jun 2, 2023

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented May 7, 2023

  • Don't use python for loops for possibly large coords. Rather create a np array once then filter out bad data.
  • DuckArrayModule is slightly slow, so cache the first import in a dict instead to speed up later calls.
  • Add more typing to be more confident that inputs are valid and then remove redundant checks and conversions.

@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label May 7, 2023
@github-actions github-actions bot added topic-arrays related to flexible array support topic-performance labels May 7, 2023
xarray/core/variable.py Outdated Show resolved Hide resolved
xarray/core/merge.py Outdated Show resolved Hide resolved
asv_bench/benchmarks/combine.py Outdated Show resolved Hide resolved
asv_bench/benchmarks/combine.py Outdated Show resolved Hide resolved
asv_bench/benchmarks/combine.py Outdated Show resolved Hide resolved
xarray/core/concat.py Outdated Show resolved Hide resolved
doc/whats-new.rst Outdated Show resolved Hide resolved
xarray/core/concat.py Show resolved Hide resolved
xarray/core/dataset.py Show resolved Hide resolved
xarray/core/indexes.py Outdated Show resolved Hide resolved
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @Illviljan Great PR!

@Illviljan Illviljan added the plan to merge Final call for comments label Jun 2, 2023
@dcherian dcherian merged commit c9d89e2 into pydata:main Jun 2, 2023
@dcherian dcherian mentioned this pull request Jun 15, 2023
19 tasks
dstansby pushed a commit to dstansby/xarray that referenced this pull request Jun 28, 2023
* 1. var_idx very slow

* 2. slow any

* Add test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 3. Slow array_type called multiple times

* 4. Can use fastpath for variable.concat?

* 5. slow init of pd.unique

* typos

* Update concat.py

* Update merge.py

* 6. Avoid recalculating in loops

* 7. No need to transpose 1d arrays.

* 8. speed up dask_dataframe

* Update dataset.py

* Update dataset.py

* Update dataset.py

* Add dask combine test with many variables

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update combine.py

* Update combine.py

* Update combine.py

* list not needed

* dim is usually string, might be faster to check for that

* first_var.dims doesn't change and can be defined 1 time

* mask bad points rather than append good points

* reduce duplicated code

* don't think id() is required here.

* get dtype directly instead of through result_dtype

* seems better to delete rather than append,

* use internal fastpath if it's a dataset, values should be fine then

* Change isinstance order.

* use fastpath if already xarray objtect

* Update variable.py

* Update dtypes.py

* typing fixes

* more typing fixes

* test undoing as_compatible_data

* undo concat_dim_length deletion

* Update xarray/core/concat.py

* Remove .copy and sum

* Update concat.py

* Use OrderedSet

* Apply suggestions from code review

* Update whats-new.rst

* Update xarray/core/concat.py

* no need to check arrays if cupy isnt even installed

* Update whats-new.rst

* Add concat comment

* minimize diff

* revert sketchy

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments run-benchmark Run the ASV benchmark workflow topic-arrays related to flexible array support topic-combine combine/concat/merge topic-indexing topic-performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow performance of concat()
4 participants