Attribute style access is slow #4741

Closed
rhkleijn opened this issue Dec 30, 2020 · 2 comments · Fixed by #4742

Comments

@rhkleijn
Contributor

I appreciate xarray's support for attribute-style access (ds.foo) as an alternative to ds["foo"], since it takes fewer keystrokes and adds less visual clutter.

A drawback is that it can be much slower: lookup time appears to scale as O(n) rather than O(1), where n is the number of variables in the dataset. For example, with n = 100 it is roughly 100 times slower than dictionary-style access:

```python
import xarray as xr

# Dataset with many (100) variables
ds = xr.Dataset({f'var{v}': [] for v in range(100)})

# %timeit is an IPython magic; run in IPython or Jupyter
%timeit ds['var0']
# 462 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit ds.var0
# 47.1 ms ± 205 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
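
To make the O(n) behaviour concrete, here is a minimal toy model, not xarray's actual code. The hypothetical ToyDataset's __getattr__ eagerly wraps every variable in a dict before picking out the requested one, mirroring the kind of eager dict comprehension this issue later identifies as the cause. A construction counter on the hypothetical Wrapper class shows that item access builds one object while a single attribute lookup builds n.

```python
class Wrapper:
    """Hypothetical stand-in for an expensive object such as a DataArray."""

    created = 0  # class-level count of constructions

    def __init__(self, name, data):
        Wrapper.created += 1
        self.name = name
        self.data = data


class ToyDataset:
    """Hypothetical container illustrating eager attribute lookup; not xarray's code."""

    def __init__(self, variables):
        self._variables = dict(variables)

    def __getitem__(self, name):
        # O(1): one dict lookup, one wrapper construction
        return Wrapper(name, self._variables[name])

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # Eager: wraps *every* variable on each call, then picks one -> O(n)
        source = {k: self[k] for k in self._variables}
        try:
            return source[name]
        except KeyError:
            raise AttributeError(name) from None


ds = ToyDataset({f"var{v}": [] for v in range(100)})

Wrapper.created = 0
ds["var0"]
print(Wrapper.created)  # 1

Wrapper.created = 0
ds.var0
print(Wrapper.created)  # 100
```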

dir() and _ipython_key_completions_(), which are used for, e.g., tab completion in IPython, are just as slow as attribute-style access:

```python
%timeit dir(ds)
# 47 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit ds._ipython_key_completions_()
# 46.8 ms ± 210 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
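
Completion only needs names, so neither hook has to build any objects. A minimal sketch of that idea, using a hypothetical ToyDataset and Wrapper (not xarray's real classes): __dir__ and _ipython_key_completions_ (the hook IPython calls to complete keys inside obj["...]) read only the keys of the underlying dicts, and a construction counter confirms nothing is built.

```python
class Wrapper:
    """Hypothetical stand-in for an expensive object such as a DataArray."""

    created = 0  # class-level count of constructions

    def __init__(self, name, data):
        Wrapper.created += 1
        self.name = name
        self.data = data


class ToyDataset:
    """Hypothetical container illustrating name-only completion; not xarray's code."""

    def __init__(self, variables, attrs=None):
        self._variables = dict(variables)
        self.attrs = dict(attrs or {})

    def __getitem__(self, name):
        return Wrapper(name, self._variables[name])

    def _completion_names(self):
        # Gather names from the dict keys only; no values are wrapped.
        names = set(self._variables)
        names.update(k for k in self.attrs if isinstance(k, str))
        return names

    def __dir__(self):
        # Regular attributes plus identifier-like variable/attr names
        extra = {n for n in self._completion_names() if n.isidentifier()}
        return sorted(set(super().__dir__()) | extra)

    def _ipython_key_completions_(self):
        # Keys offered when completing ds["<TAB>
        return sorted(self._variables)


ds = ToyDataset({f"var{v}": [] for v in range(100)}, attrs={"title": "demo"})

Wrapper.created = 0
names = dir(ds)
keys = ds._ipython_key_completions_()
print(Wrapper.created)  # 0
print("var0" in names, "title" in names, len(keys))  # True True 100
```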

I would like attribute-style access in xarray to be much faster.

@rhkleijn
Contributor Author

With some modest refactoring in https://github.com/rhkleijn/xarray/tree/faster-attr-access I sped up attribute-style access, dir() and _ipython_key_completions_ (roughly 100-fold in this case) by taking a lazier approach, in particular by avoiding the eager {d: self[d] for d in self.dims}, which constructs many mostly unneeded DataArray objects.

```python
%timeit ds.var0
# 468 µs ± 1.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit dir(ds)
# 499 µs ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit ds._ipython_key_completions_()
# 242 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
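
The lazier approach can be sketched as a read-only mapping whose membership test and iteration use only keys, building a value only when it is actually looked up. This is a minimal sketch under assumptions: ToyDataset, Wrapper, LazyItems and _attr_sources are hypothetical names here, and the change that was actually merged in #4742 may be structured differently.

```python
from collections.abc import Mapping


class Wrapper:
    """Hypothetical stand-in for an expensive object such as a DataArray."""

    created = 0  # class-level count of constructions

    def __init__(self, name, data):
        Wrapper.created += 1
        self.name = name
        self.data = data


class LazyItems(Mapping):
    """Hypothetical read-only mapping: keys from `keys`, values built on demand."""

    def __init__(self, keys, getter):
        self._keys = keys
        self._getter = getter

    def __getitem__(self, key):
        if key not in self._keys:
            raise KeyError(key)
        return self._getter(key)  # construct only the requested item

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)

    def __contains__(self, key):
        # Key-only check; the inherited Mapping.__contains__ would call __getitem__
        return key in self._keys


class ToyDataset:
    """Hypothetical container with lazy attribute sources; not xarray's code."""

    def __init__(self, variables):
        self._variables = dict(variables)

    def __getitem__(self, name):
        return Wrapper(name, self._variables[name])

    @property
    def _attr_sources(self):
        # No Wrapper is created until a key is actually requested
        return [LazyItems(self._variables.keys(), self.__getitem__)]

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        for source in self._attr_sources:
            if name in source:  # O(1) key check, nothing built
                return source[name]
        raise AttributeError(name)


ds = ToyDataset({f"var{v}": [] for v in range(100)})

Wrapper.created = 0
v = ds.var0
print(v.name, Wrapper.created)  # var0 1
```

Overriding __contains__ matters in this design: the default Mapping.__contains__ calls __getitem__, so a plain membership test would build the very object it is only checking for.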

Shall I open a pull request for this?

@dcherian
Contributor

Yes please! This looks great!
