Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster interp #4740

Merged
merged 13 commits into from
May 5, 2021
Merged

Faster interp #4740

merged 13 commits into from
May 5, 2021

Conversation

dcherian
Copy link
Contributor

@dcherian dcherian commented Dec 29, 2020

@Illviljan
Copy link
Contributor

Looking good!

One thing I notice now with these changes is that ndim is now creeping up as the 2nd worst time consumer:
bild

An easy improvement to this is adding ndim = var.ndim and replacing all the var.ndim around interp_func. That reduces the number of calls from 68000 to 58000.
Another more difficult idea is to improve the dask arrays shape-calculation. Perhaps saving the value and only do the cached_cumsum call when the chunks actually has changed?

@dcherian
Copy link
Contributor Author

Another more difficult idea is to improve the dask arrays shape-calculation. Perhaps saving the value and only do the cached_cumsum call when the chunks actually has changed?

Thanks for opening an issue on the dask side.

On the xarray side, we want to cache some of these properties, see #3514

@Illviljan
Copy link
Contributor

Well, I think this is good enough. It's quite an improvement already and I think the error will disappear if the CI is rerun.

@Illviljan Illviljan mentioned this pull request Feb 14, 2021
6 tasks
@max-sixty
Copy link
Collaborator

Thanks for pinging @Illviljan . Looks good from my side too.

@Illviljan what's that profiling tool you're using? The hierarchy of functions looks useful!

@Illviljan
Copy link
Contributor

@max-sixty It's the profiler Spyder has. It's been quite useful for hunting down various bottlenecks but it has its quirks as well.

Illviljan added a commit to Illviljan/xarray that referenced this pull request Mar 7, 2021
@Illviljan Illviljan mentioned this pull request Apr 30, 2021
13 tasks
Comment on lines +2965 to +2974
validated_indexers = {
k: _validate_interp_indexer(maybe_variable(obj, k), v)
for k, v in indexers.items()
}

# optimization: subset to coordinate range of the target index
if method in ["linear", "nearest"]:
for k, v in validated_indexers.items():
obj, newidx = missing._localize(obj, {k: v})
validated_indexers[k] = newidx[k]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
validated_indexers = {
k: _validate_interp_indexer(maybe_variable(obj, k), v)
for k, v in indexers.items()
}
# optimization: subset to coordinate range of the target index
if method in ["linear", "nearest"]:
for k, v in validated_indexers.items():
obj, newidx = missing._localize(obj, {k: v})
validated_indexers[k] = newidx[k]
validated_indexers = dict()
method_is_localizeable = method in ["linear", "nearest"]
for k, v in indexers.items():
validated_indexer = _validate_interp_indexer(maybe_variable(obj, k), v)
# optimization: subset to coordinate range of the target index:
if method_is_localizeable:
obj, newidx = missing._localize(obj, {k: validated_indexer})
validated_indexer = newidx[k]
validated_indexers[k] = validated_indexer

Could do something like this to avoid the mutating loops.

@dcherian
Copy link
Contributor Author

dcherian commented May 4, 2021

Test failure is real and looks like some bug with dtype vs meta in our min dask version. I've bumped dask to see if it fixes it.

This reverts commit a6cd732.
@dcherian
Copy link
Contributor Author

dcherian commented May 4, 2021

OK should be good to merge. We can set the meta in a future version (It did not work on 2.15.0 which is the dask version we can bump to right now)

xarray/core/missing.py Outdated Show resolved Hide resolved
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
@pep8speaks
Copy link

pep8speaks commented May 4, 2021

Hello @dcherian! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-04 16:42:00 UTC

@alexamici alexamici mentioned this pull request May 5, 2021
2 tasks
@dcherian dcherian merged commit 7432032 into pydata:master May 5, 2021
@dcherian dcherian deleted the faster-interp branch May 5, 2021 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow initilization of dataset.interp
5 participants