-
-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow for use of different dtypes in coords, indices, and indptr #441
Conversation
Codecov Report
@@ Coverage Diff @@
## master #441 +/- ##
==========================================
+ Coverage 95.37% 95.61% +0.23%
==========================================
Files 19 20 +1
Lines 2809 2939 +130
==========================================
+ Hits 2679 2810 +131
+ Misses 130 129 -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general comment: We should strive for 100% test coverage. If we cannot hit a line with a test, it is probably dead code anyway. In this particular instance, I see that most of the code here can be hit; but I'd like to see it tested since this is a major feature/refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few more remaining! Great work!
@@ -93,6 +96,9 @@ def stack(arrays, axis=0, compressed_axes=None): | |||
data = np.concatenate([arr.data for arr in arrays]) | |||
ptr_len = arrays[0].indptr.shape[0] | |||
nnz = arrays[0].nnz | |||
total_nnz = sum(int(arr.nnz) for arr in arrays) | |||
if not can_store(indptr.dtype, total_nnz): | |||
indptr = indptr.astype(np.min_scalar_type(total_nnz)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch needs hitting or removing.
@@ -49,16 +50,28 @@ def _from_coo(x, compressed_axes=None): | |||
compressed_shape = (row_size, col_size) | |||
shape = x.shape | |||
|
|||
if storage_dtype and not can_store(storage_dtype, max(compressed_shape)): | |||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does this one.
if can_store(storage_dtype, max(shape)): | ||
ar.coords = ar.coords.astype(storage_dtype) | ||
else: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As does this one.
Thanks for looking into this more @daletovar! Minor naming suggestion: something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @daletovar!
In this version, I've added a
storage_dtype
argument in theCOO
andGCXS
constructors. For most operations, the result array will use the same storage dtype as the input array. There are a few places where the output is upcast to a larger dtype if the input dtype is too small. I decided to do this because in these situations no error is raised and the output is simply wrong. Where it doesn't make sense to upcast, I have it raise an error.