-
-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of GCXS dot ndarray #643
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the changes! Would you be willing to write benchmarks (examples in the benchmarks
directory, see https://asv.readthedocs.io/en/stable/writing_benchmarks.html)
Sure! |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #643 +/- ##
==========================================
- Coverage 90.22% 90.21% -0.02%
==========================================
Files 20 20
Lines 3674 3670 -4
==========================================
- Hits 3315 3311 -4
Misses 359 359 |
Improves the memory access pattern for
GCXS
arrays dotndarray
s.Was running a few benchmarks of the GCXS class against scipy's csr/csc arrays and noticed that they were significantly slower. So was curious what the biggest difference was as their structures shouldn't be too fundamentally different. When looking at the csr_dot_ndarray code and the csc_dot_ndarray code I realized that the access pattern of the arrays was inefficient.
Here are some of the relevant timings of things:
Scipy Timing:
Sparse timing:
Timing on main:
Timing on PR:
The left ndarray dot GCXS arrays are still a little slower than I'd like, but couldn't quite see where to dive into that at, as it looks like it's doing what it should be for the matvec operation (basically just doing
(GCXS.T @ x.T).T
)One other thing, for 1D ndarrays numba doesn't seem to be able to optimize these as well as it could (considering benchmarks with scipy csr/csc times a 1D array). There should likely be a branching for 1D ndarrays to use a version that doesn't include the loop over the second dimension.