-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Argument checks for SparseMatrixCSC constructors #31724
Conversation
Thanks for getting this done @KlausC! This solution looks good to me. |
I wanted to add the following text into
Also: # add tests for length(I) > typemax(Ti) when n*m < typemax(Ti) |
Go for it! You can just tag the PR as WIP in the title to prevent someone from prematurely merging it before you're done. |
bumpi |
Both failures seem unrelated (Win32: FileWatching; macOS: can't reach github.com). |
@mbauman or @KristofferC, please merge if this looks good to you. |
Edit: whoops, wrong comment on the wrong PR. Sorry about that. |
bump often! |
bump :-) |
How much slower does this make the matrix construction? |
In one particular use case for me, this makes the matrix construction 7x slower. If we want to add this, please make it an outer constructor. Unless I missed something about the original issue, I don't think this is desirable. People who use the direct constructor should know what they are doing. |
If I understand properly, it seems that the real issue originally is due to the |
@jebej, I try to understand your argument. Actually I do not agree with
I think, everybody in all situations should know what he is doing, but the 'should' is only a weak warranty. There is also no hint in the documentation like 'if you use this constructor, you should guarantee the consistency of all arguments, otherwise a segfault might occur'. I am in favor of input argument checks, if they do not add inappropriate effort. Of course it would be possible to move the check from the constructor to |
bump :-) |
@jebej Any thoughts here? |
I think what I had said earlier still applies, the original issue is with the I would consider the Regarding the test I made, it was the creation of an annihilation operator:
which gives, for the checked and unchecked cases:
So more like 5x. |
I would of course support adding a blurb to the documentation saying that the constructor does not check inputs, and to use |
I improved the PR in to give attention to the above objections:
|
Benchmarks:
|
@jebej, do you still see room for improvements? |
Personally, I don't think these things should be checked at construction time since there are cases where you intermediately might want to have an invalid sparse matrix. I would personally prefer something like #22529 (which is similar to https://docs.scipy.org/doc/scipy-1.2.0/reference/generated/scipy.sparse.csr_matrix.check_format.html#scipy.sparse.csr_matrix.check_format) where you can do a check to see that the matrix is valid before you start to use it. This also allows you to check that the matrix is valid after performing mutating functions on the fields, and not only at construction time. |
That is contrary to my opinion: I think, every exported constructor, if documented or not, should return a valid object of that type.
in such cases it is easy to postpone construction until integrity of components has been established. At least in the julia base and stdlib that could be easily achieved.
That was just a proposal to provide an unchecked version of the constructor besides the checked version, as was suggested by @jebej.
If it is exported it should be documented.
The logic added to the constructor is:
Adding a (exported?) function to perform a validity check is a good idea and should share code with the checks of this PR. |
The type |
That looks like missing documentation, not like "don't use the constructor". What I mean is, if type |
To be constructive, I want to outline a way to save the essence of this PR without touching the constructors.
|
The challenge here comes from the fact that I am not convinced about extending the I feel that this whole thing needs to move out of |
I adapted my outline according to your proposals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of the PR looks good to me. So hopefully we can merge this soon.
I am willing to give this PR a shot, with the Just a note - the checking of the colptr especially in SparseMatrixCSC is an issue after someone has written all over it - so you want to check it at that point before it is handed over to other code. |
|
||
Tj = Ti | ||
while isbitstype(Tj) && coolen >= typemax(Tj) | ||
Tj = widen(Tj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This made Tj
type unstable and caused the regression in #32985. Only matters for smallish arrays but would be good to fix nonetheless. Can't we just always use Int
for this or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would just be an error. This segfaulted in 1.2. Let's just make it an error instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error messages here leave some to be desired. For example
gives very little information about what went wrong and how it can be fixed. |
This PR combines the reverted breaking #31118 and the unfinished #31661.
In order to allow
n * m >= typemax(Ti)
it is necessary to put the following restrictions in place is order to avoid segmentation violations.m, n <= typemax(Ti)
length(colptr) >= n+1
1 <= colptr[i] <= colptr[i+1] for i in 1:n
in addition to the preexisting checks.
All uses of
SparseMatrixCSC(m, n, colptr, ....)
with uninitialized colptr have been modified.Fixes #31024.