Improve dot performance #389
Conversation
Update core.py
Raises an error for negative dimensions.
Co-Authored-By: Hameer Abbasi <einstein.edison@gmail.com>
LGTM caught some valid concerns -- could you address those?
And the code coverage is a bit lacking. 😕
Thanks @rgommers, I'll make sure to do that going forward.
With regards to the sorting, the issue is that the dot algorithm produces results that are not sorted: the indices for each row of the resulting matrix are out-of-order. So I added something like this:
@hameerabbasi, yes I'll see if I can fix those. |
This pull request introduces 2 alerts and fixes 1 when merging 5b4f08a into e682a8b - view on LGTM.com

This pull request introduces 1 alert and fixes 1 when merging eb8c0e5 into e682a8b - view on LGTM.com

This pull request introduces 1 alert and fixes 1 when merging ae63040 into e682a8b - view on LGTM.com
Here's an initial benchmark:
sparse/_common.py
Outdated
```python
# ensure sorted indices
order = np.argsort(indices[indptr[i] : indptr[i + 1]])
data[indptr[i] : indptr[i + 1]] = data[indptr[i] : indptr[i + 1]][order]
indices[indptr[i] : indptr[i + 1]] = indices[indptr[i] : indptr[i + 1]][order]
```
Can we possibly sort before putting in the data? I suspect it'll be a lot faster.
I'm not quite sure what you mean. Could you give an example?
Well, the original CSR-CSR multiplication algorithm doesn't require sorting, correct? I suspect this one only requires it because you're passing in unsorted data in the first place (or iterating over it in a way that's unsorted).
Could you possibly pre-sort the data so the output here will be sorted?
The original CSR-CSR multiplication algorithm iterates over the data in a way that's unsorted. In scipy, the resulting csr_matrix has indices that are out-of-order:

```python
In [13]: xs.has_sorted_indices
Out[13]: 1

In [14]: res = xs @ ys

In [15]: res.has_sorted_indices
Out[15]: 0
```
Since many of the operations for COO and GCXS rely on maintaining sorted coordinates, I was thinking it would be best to ensure that the result of dot has sorted coords.
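To make the sortedness property concrete, here is a pure-NumPy stand-in for scipy's `has_sorted_indices` flag (the helper name and data are illustrative):

```python
import numpy as np

def rows_are_sorted(indptr, indices):
    # True when every CSR row's column indices are ascending,
    # a pure-NumPy analogue of scipy's has_sorted_indices flag.
    return all(
        np.all(np.diff(indices[indptr[i] : indptr[i + 1]]) >= 0)
        for i in range(len(indptr) - 1)
    )

# Row slices [0, 2] and [1] are each ascending.
assert rows_are_sorted(np.array([0, 2, 3]), np.array([0, 2, 1]))
# Row slice [2, 0] is out of order.
assert not rows_are_sorted(np.array([0, 2, 3]), np.array([2, 0, 1]))
```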
In that case could you post a benchmark which includes sorting for the SciPy version?
```python
%%timeit
x @ y  # coo coo
```
10.2 ms ± 51.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```python
%%timeit
res = xs @ ys
res.sort_indices()
```
4.05 ms ± 8.83 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I'm going to test doing just one sort at the end to see how that does.
```python
%%timeit
x @ y  # coo coo - one sort at the end
```
7.55 ms ± 96.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```python
%%timeit
res = xs @ ys
res.sort_indices()
```
4.04 ms ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It looks like doing one sort at the end is better.
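The "one sort at the end" idea can be sketched for COO coordinates as a single lexsort over the whole result, rather than an argsort per row (variable names here are illustrative, not sparse's internals):

```python
import numpy as np

# Unsorted COO output: coordinate pairs (row, col) with values.
rows = np.array([1, 0, 1, 0])
cols = np.array([0, 2, 1, 0])
vals = np.array([4.0, 3.0, 5.0, 1.0])

# One lexsort over all coordinates at once; the last key
# passed (rows) is the primary sort key.
order = np.lexsort((cols, rows))
rows, cols, vals = rows[order], cols[order], vals[order]
# rows -> [0, 0, 1, 1], cols -> [0, 2, 0, 1], vals -> [1.0, 3.0, 4.0, 5.0]
```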
Could you also post some GCXS-GCXS benchmarks for direct comparison with SciPy?
```python
%%timeit
x @ y  # csr csr
```
6.32 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```python
x = x.change_compressed_axes([1])
y = y.change_compressed_axes([1])
x @ y
```

```python
%%timeit
x @ y  # csc csc
```
6.11 ms ± 46.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```python
x = x.tocoo()
y = y.tocoo()
x @ y
```

```python
%%timeit
x @ y  # coo coo
```
7.23 ms ± 93.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```python
%%timeit
res = xs @ ys
res.sort_indices()
```
4.08 ms ± 49.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
sparse/tests/test_dot.py
Outdated
```python
assert_eq(sparse.dot(sa, b), sparse.dot(a, sb))
assert_eq(np.dot(a, b), sparse.dot(sa, sb))

if hasattr(operator, "matmul"):
```
This was a compat check for Python 2. It can be removed.
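For context, `operator.matmul` has existed since Python 3.5 (PEP 465), so on any supported Python the guard is always true; the tiny stand-in class below is illustrative only:

```python
import operator

# On Python 3, operator.matmul always exists, which is why the
# hasattr guard in the test is dead code and can be removed.
assert hasattr(operator, "matmul")

class Mat:
    # Minimal stand-in type that supports the @ operator.
    def __init__(self, v):
        self.v = v
    def __matmul__(self, other):
        return Mat(self.v * other.v)

# operator.matmul(a, b) is equivalent to a @ b.
assert operator.matmul(Mat(2), Mat(3)).v == 6
```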
sparse/tests/test_io.py
Outdated
```python
def test_save_load_npz_file(compression):
    x = sparse.random((2, 3, 4, 5), density=0.25)
```
changed to:
```python
@pytest.mark.parametrize("format", ["coo", "gcxs"])
def test_save_load_npz_file(compression, format):
```
Can you rewrite this using tmp_path? https://docs.pytest.org/en/latest/tmpdir.html#the-tmp-path-fixture
Same for the next test.
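A rough sketch of what the tmp_path rewrite could look like; `np.savez`/`np.load` stand in for sparse's save/load helpers, which aren't shown in this excerpt:

```python
import numpy as np

def test_save_load_npz_file(tmp_path):
    # tmp_path is a pytest fixture providing a fresh pathlib.Path
    # directory per test; no manual cleanup is needed.
    filename = tmp_path / "mat.npz"
    x = np.arange(6).reshape(2, 3)
    np.savez(filename, x=x)       # stand-in for the save helper
    y = np.load(filename)["x"]    # stand-in for the load helper
    assert np.array_equal(x, y)
```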
A few changes. Looks pretty awesome overall!
Thanks so much for the review, @hameerabbasi. I think those were really helpful changes. Is there anything else you'd like to see?
sparse/tests/test_dot.py
Outdated
```python
if hasattr(operator, "matmul"):
    # Basic equivalences
    assert_eq(operator.matmul(a, b), operator.matmul(sa, sb))
    # Test that SOO's and np.array's combine correctly
    # Not possible due to https://github.com/numpy/numpy/issues/9028
    # assert_eq(eval("a @ sb"), eval("sa @ b"))
```
This should be added back, only the "if" was unnecessary.
I have one final question: does … ?

And one possibility for future work: writing a sparse-dense matrix multiplication kernel separately.
No, the default behavior is the same. I did make it so that GCXS @ COO results in a GCXS, though.
+1. I'm away from my computer for the next couple of days, so I apologize if I don't get back right away about any more suggestions.
Thanks, @daletovar!
- improves dot for two COO arrays; addresses "Performance benchmarks comparison with scipy.sparse" #331
- adds dot for GCXS arrays: `csr @ csr` and `csc @ csc` are handled directly; for `csr @ csc`, the dot product-based algorithm was just too slow, so we convert one of the arrays
- adds io for GCXS arrays