Improve dot performance #389

Merged: 80 commits merged into pydata:master on Aug 9, 2020

Conversation

@daletovar (Contributor)

  • Improves dot for two COO arrays; addresses "Performance benchmarks comparison with scipy.sparse" #331.

    • I used the same algorithm that scipy uses (see the sketch after this list).
    • It's a bit slower than scipy because we have to do some sorting to ensure sorted coordinates.
  • Adds dot for GCXS arrays.

    • Uses the same algorithm.
    • Works for csr @ csr and csc @ csc.
    • I found that the csr @ csc dot-product-based algorithm was just too slow, so we convert one of the arrays.
  • Adds io for GCXS arrays.
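
For reference, here is a simplified pure-Python sketch of the general CSR @ CSR approach described above. The names are illustrative and the real kernel in this PR is jit-compiled and structured differently; the sketch also shows why the output column indices come out unsorted:

import numpy as np

def csr_matmul_sketch(a_indptr, a_indices, a_data,
                      b_indptr, b_indices, b_data, n_cols):
    # Simplified illustration only -- not the PR's actual kernel.
    out_indptr = [0]
    out_indices = []
    out_data = []
    sums = np.zeros(n_cols)      # dense accumulator for one output row
    mask = np.full(n_cols, -1)   # last output row that touched each column
    n_rows = len(a_indptr) - 1

    for i in range(n_rows):
        row_cols = []
        for jj in range(a_indptr[i], a_indptr[i + 1]):
            j, v = a_indices[jj], a_data[jj]
            for kk in range(b_indptr[j], b_indptr[j + 1]):
                k = b_indices[kk]
                sums[k] += v * b_data[kk]
                if mask[k] != i:          # first hit on column k in row i
                    mask[k] = i
                    row_cols.append(k)
        # row_cols is in insertion order, i.e. generally unsorted,
        # which is why the result needs a sorting pass afterwards
        for k in row_cols:
            out_indices.append(k)
            out_data.append(sums[k])
            sums[k] = 0.0
        out_indptr.append(len(out_indices))

    return np.array(out_indptr), np.array(out_indices), np.array(out_data)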

@hameerabbasi (Collaborator)

LGTM caught some valid concerns; could you address those?

@hameerabbasi (Collaborator)

And the code coverage is a bit lacking. 😕

@daletovar (Contributor, Author)

Thanks @rgommers, I'll make sure to do that going forward.

@daletovar (Contributor, Author)

With regards to the sorting, the issue is that the dot algorithm produces results that are not sorted:

In [1]: import scipy.sparse as ss
In [2]: x = ss.random(1000,1000,density=.2,format='csr')
In [3]: y = ss.random(1000,1000,density=.2,format='csr')
In [4]: res = x @ y
In [5]: res.indptr[1]
Out[5]: 1000
In [6]: res.indices
Out[6]: array([554, 328, 723, ...,  14,   7,   6], dtype=int32)

The indices for each row of the resulting matrix are out-of-order. So I added something like this:

# restore sorted column indices within each row of the result
for start, stop in zip(res.indptr[:-1], res.indptr[1:]):
    order = np.argsort(res.indices[start:stop])
    res.data[start:stop] = res.data[start:stop][order]
    res.indices[start:stop] = res.indices[start:stop][order]

@daletovar (Contributor, Author)

@hameerabbasi, yes I'll see if I can fix those.

@lgtm-com (bot) commented Aug 5, 2020

This pull request introduces 2 alerts and fixes 1 when merging 5b4f08a into e682a8b - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import

fixed alerts:

  • 1 for Unused import

@lgtm-com (bot) commented Aug 5, 2020

This pull request introduces 1 alert and fixes 1 when merging eb8c0e5 into e682a8b - view on LGTM.com

new alerts:

  • 1 for Unused import

fixed alerts:

  • 1 for Unused import

@lgtm-com (bot) commented Aug 5, 2020

This pull request introduces 1 alert and fixes 1 when merging ae63040 into e682a8b - view on LGTM.com

new alerts:

  • 1 for Unused import

fixed alerts:

  • 1 for Unused import

@daletovar (Contributor, Author)

Here's an initial benchmark:

In [2]: x = sparse.random((5000, 5000), density=0.001)
In [3]: y = sparse.random((5000, 5000), density=0.001)
In [4]: xs = x.tocsr()
In [5]: ys = y.tocsr()
In [6]: x @ y # jit
Out[6]: <COO: shape=(5000, 5000), dtype=float64, nnz=124139, fill_value=0.0>
In [7]: %timeit x @ y
10.5 ms ± 104 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: %timeit xs @ ys
1.14 ms ± 2.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Comment on lines 642 to 647
# ensure sorted indices
order = np.argsort(indices[indptr[i] : indptr[i + 1]])
data[indptr[i] : indptr[i + 1]] = data[indptr[i] : indptr[i + 1]][order]
indices[indptr[i] : indptr[i + 1]] = indices[indptr[i] : indptr[i + 1]][order]
Collaborator

Can we possibly sort before putting in the data? I suspect it'll be a lot faster.

Contributor Author

I'm not quite sure what you mean. Could you give an example?

Collaborator

Well, the original CSR-CSR multiplication algorithm doesn't require sorting, correct? I suspect this one only requires it because you're passing in unsorted data in the first place (or iterating over it in a way that's unsorted).

Could you possibly pre-sort the data so the output here will be sorted?

Contributor Author

The original CSR-CSR multiplication algorithm iterates over the data in a way that's unsorted. In scipy, the resulting csr_matrix has indices that are out-of-order:

In [13]: xs.has_sorted_indices
Out[13]: 1
In [14]: res = xs @ ys
In [15]: res.has_sorted_indices
Out[15]: 0

Since many of the operations for COO and GCXS rely on maintaining sorted coordinates, I was thinking it would be best to ensure that the result of dot has sorted coords.

Collaborator

In that case could you post a benchmark which includes sorting for the SciPy version?

Contributor Author

%%timeit
x @ y # coo coo
10.2 ms ± 51.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
res = xs @ ys
res.sort_indices()
4.05 ms ± 8.83 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I'm going to test doing just one sort at the end to see how that does.
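
Roughly, "one sort at the end" could look like the following (a hypothetical helper for illustration, not this PR's actual code): expand indptr to a per-entry row id and restore sorted column order for all rows with a single lexsort instead of one argsort per row.

import numpy as np

def sort_result_once(indptr, indices, data):
    # illustration only -- not the PR's actual implementation
    rows = np.repeat(np.arange(len(indptr) - 1), np.diff(indptr))
    order = np.lexsort((indices, rows))  # primary key: row, secondary: column
    return indices[order], data[order]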

Contributor Author

%%timeit
x @ y # coo coo - one sort at the end
7.55 ms ± 96.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
res = xs @ ys
res.sort_indices()
4.04 ms ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It looks like doing one sort at the end is better.

Collaborator

Could you also post some GCXS-GCXS benchmarks for direct comparison with SciPy?

Contributor Author

%%timeit
x @ y # csr csr
6.32 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

x = x.change_compressed_axes([1])
y = y.change_compressed_axes([1])
x @ y

%%timeit
x @ y # csc csc
6.11 ms ± 46.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

x = x.tocoo()
y = y.tocoo()
x @ y

%%timeit
x @ y # coo coo
7.23 ms ± 93.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
res = xs @ ys
res.sort_indices()
4.08 ms ± 49.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

assert_eq(sparse.dot(sa, b), sparse.dot(a, sb))
assert_eq(np.dot(a, b), sparse.dot(sa, sb))

if hasattr(operator, "matmul"):
Collaborator

This was a compat check for Python 2. It can be removed.

def test_save_load_npz_file(compression):
    x = sparse.random((2, 3, 4, 5), density=0.25)
@pytest.mark.parametrize("format", ["coo", "gcxs"])
def test_save_load_npz_file(compression, format):
Collaborator

Same for the next test.

@hameerabbasi (Collaborator) left a comment

A few changes. Looks pretty awesome overall!

@daletovar (Contributor, Author)

Thanks so much for the review, @hameerabbasi. I think those were really helpful changes. Is there anything else you'd like to see?

Comment on lines 200 to 207
if hasattr(operator, "matmul"):
    # Basic equivalences
    assert_eq(operator.matmul(a, b), operator.matmul(sa, sb))
    # Test that SOO's and np.array's combine correctly
    # Not possible due to https://github.com/numpy/numpy/issues/9028
    # assert_eq(eval("a @ sb"), eval("sa @ b"))


Collaborator

This should be added back, only the "if" was unnecessary.

@hameerabbasi (Collaborator) commented Aug 8, 2020

I have one final question: Does dot now return GCXS by default for COO/ndarray inputs? If so, we should change this.

And one possibility for future work: Writing a sparse-dense matrix multiplication kernel separately.
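
A rough sketch of what a dedicated CSR-times-dense kernel could look like (illustrative only; the name, arguments, and pure-Python loop are assumptions, and a real kernel would presumably be jit-compiled):

import numpy as np

def csr_times_dense_sketch(indptr, indices, data, dense):
    # each nonzero A[i, j] scales row j of the dense operand into output row i
    n_rows = len(indptr) - 1
    out = np.zeros((n_rows, dense.shape[1]), dtype=dense.dtype)
    for i in range(n_rows):
        for jj in range(indptr[i], indptr[i + 1]):
            out[i] += data[jj] * dense[indices[jj]]
    return out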

@daletovar (Contributor, Author)

No, the default behavior is the same. I did make it so that GCXS @ COO results in a GCXS though.

> And one possibility for future work: Writing a sparse-dense matrix multiplication kernel separately.

+1

I'm away from my computer for the next couple of days, so I apologize if I don't get back right away about any more suggestions.

@hameerabbasi merged commit 3a6d955 into pydata:master on Aug 9, 2020
@hameerabbasi (Collaborator)

Thanks, @daletovar!
