Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow for use of different dtypes in coords, indices, and indptr #441

Merged
merged 3 commits into from
Mar 16, 2021

Conversation

daletovar
Copy link
Contributor

In this version, I've added a storage_dtype argument in the COO and GCXS constructors. For most operations, the result array will use the same storage dtype as the input array. There are a few places where the output is upcast to a larger dtype if the input dtype is too small. I decided to do this because in these situations no error is raised and the output is simply wrong. Where it doesn't make sense to upcast, I have it raise an error.

@codecov
Copy link

codecov bot commented Mar 11, 2021

Codecov Report

Merging #441 (5a7bef1) into master (440809f) will increase coverage by 0.23%.
The diff coverage is 97.72%.

@@            Coverage Diff             @@
##           master     #441      +/-   ##
==========================================
+ Coverage   95.37%   95.61%   +0.23%     
==========================================
  Files          19       20       +1     
  Lines        2809     2939     +130     
==========================================
+ Hits         2679     2810     +131     
+ Misses        130      129       -1     

@daletovar daletovar requested a review from hameerabbasi March 11, 2021 22:32
Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general comment: We should strive for 100% test coverage. If we cannot hit a line with a test, it is probably dead code anyway. In this particular instance, I see that most of the code here can be hit; but I'd like to see it tested since this is a major feature/refactor.

sparse/_compressed/common.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more remaining! Great work!

@@ -93,6 +96,9 @@ def stack(arrays, axis=0, compressed_axes=None):
data = np.concatenate([arr.data for arr in arrays])
ptr_len = arrays[0].indptr.shape[0]
nnz = arrays[0].nnz
total_nnz = sum(int(arr.nnz) for arr in arrays)
if not can_store(indptr.dtype, total_nnz):
indptr = indptr.astype(np.min_scalar_type(total_nnz))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch needs hitting or removing.

@@ -49,16 +50,28 @@ def _from_coo(x, compressed_axes=None):
compressed_shape = (row_size, col_size)
shape = x.shape

if storage_dtype and not can_store(storage_dtype, max(compressed_shape)):
raise ValueError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this one.

if can_store(storage_dtype, max(shape)):
ar.coords = ar.coords.astype(storage_dtype)
else:
raise ValueError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As does this one.

@ivirshup
Copy link
Contributor

Thanks for looking into this more @daletovar! Minor naming suggestion: something like index_dtype/ idx_dtype? I feel like storage_dtype maybe be a bit too generic of a name.

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @daletovar!

@hameerabbasi hameerabbasi merged commit d0a4074 into pydata:master Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants