Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make cumsum result type consistent with sum #9665

Closed
wants to merge 2 commits into from

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Jan 7, 2015

This means that cumsum(Int8[1,2,3]) is Int[1,3,6] rather than an Int8 array, for example. If you really want to do the sum as Int8, you can still do cumsum!(Array(Int8,3), Int8[1,2,3]).

In principle, this is a breaking change, but I'm not sure if any code in the wild actually relied on the lack of promotion in the cumsum results? (The most common usage seems to be for Float64 and Float32 arrays, which are unaffected.)

Also, this makes cumsum! and cumsum use the same pairwise code for 1d arrays (whereas previously cumsum! used a completely different codepath based on the one for multidimensional arrays, which just does naive summation). Ideally, we'd use the pairwise algorithm for multidimensional arrays too, but that requires a much bigger code rewrite.

(See discussion in #9650 with @timholy.)

@stevengj stevengj added the kind:breaking This change will break code label Jan 7, 2015
@stevengj
Copy link
Member Author

stevengj commented Jan 7, 2015

Hmm, getting:

    From worker 8:       * linalg/givens
exception on 6: ERROR: `convert` has no method matching convert(::Type{Base.SparseMatrix.SparseMatrixCSC{Tv,Ti<:Integer}}, ::Int64, ::Int64, ::Array{Int64,1}, ::Array{Int32,1}, ::Array{Float64,1})
This may have arisen from a call to the constructor Base.SparseMatrix.SparseMatrixCSC{Tv,Ti<:Integer}(...), since type constructors fall back to convert methods in julia v0.4.
 in getindex_general at sparse/sparsematrix.jl:1114
 in getindex at sparse/sparsematrix.jl:1122
 in anonymous at no file:18
 in runtests at /Users/stevenj/Code/julia/test/testdefs.jl:5
 in anonymous at multi.jl:852
 in run_work_thunk at multi.jl:603
 in anonymous at task.jl:852
while loading linalg/umfpack.jl, in expression starting on line 13

Not sure how this patch is causing that failure, since cumsum is not used in base/linalg/*.

@andreasnoack
Copy link
Member

Could it be https://github.com/JuliaLang/julia/blob/master/base/sparse/sparsematrix.jl#L1078 that now widens the integer type?

@ivarne
Copy link
Sponsor Member

ivarne commented Jan 7, 2015

+1 on the pairwise behavior.

Neutral on the promotion. The promotion argument is strong for sum and other reductions that transform an array to a scalar, because you get better accuracy at a very low cost. For cumsum you actually pay the cost of using a bigger result array, but if that is a concern you probably can't afford the implicit copy and should use cumsum! anyway.

@stevengj
Copy link
Member Author

stevengj commented Jan 7, 2015

Yikes, yes, cumsum is used all over the sparse code, and the promotion will be a problem for Int32 index types.

@ViralBShah
Copy link
Member

It shouldn't be difficult to adapt the sparse code to this change, if we want it.

@timholy
Copy link
Sponsor Member

timholy commented Jan 8, 2015

+1 for this

@tkelman
Copy link
Contributor

tkelman commented Jan 8, 2015

Seems odd to me to have cumsum and cumsum! giving you different types. I thought we've generally been moving away from integer promotion, not towards more of it?

@timholy
Copy link
Sponsor Member

timholy commented Jan 8, 2015

It also seems odd to have cumsum(x)[end] != sum(x), which is what you'll get from cumsum(rand(UInt8, 1000)).

While we removed promotion from ::UInt8 + ::UInt8 (because there was no way for a user to turn it off, whereas a user can turn it on with UInt(x)+y), we added integer promotion in sum.

@stevengj
Copy link
Member Author

stevengj commented Jan 8, 2015

@tkelman, cumsum! gives you whatever type you want, because you specify the output type via its first argument. If you want it to give the same type as cumsum, you can.

@tkelman
Copy link
Contributor

tkelman commented Jan 8, 2015

I have no problem with cumsum and sum behaving differently with respect to type promotion - the whole point of cumsum is that you need to use the running sum array for some following calculation. Changing the element type in this operation feels a lot like changing the element type in adding two arrays. Overflow is a valid concern, but isn't the widening workaround of cumsum! just as usable there?

@nalimilan
Copy link
Member

OTOH if sum(x::Float32) overflows, then cumsum(x::Float32) overflows too for the last elements. So widening the type in the former case and not the latter does not make much sense, even if the cost associated to the widening is of course larger for cumsum.

@tkelman
Copy link
Contributor

tkelman commented Jan 8, 2015

I'm also concerned about this breaking all of the sparse code. We can grep for cumsum and replace everything, but that's causing churn in code that's already known to be poorly tested, fragile, and in need of a major rewrite.

@ViralBShah
Copy link
Member

There are only 6 mentions of cumsum in the sparse code. I don't think that is hard to fix.

Off-topic, but it would be nice to know what you'd want to do in a rewrite of the sparse code. Perhaps file an issue with your thoughts? I plan to hack on it over the next few days.

@tkelman
Copy link
Contributor

tkelman commented Jan 8, 2015

And elsewhere in base? In packages that deal with sparse matrices?

it would be nice to know what you'd want to do in a rewrite of the sparse code. Perhaps file an issue with your thoughts?

I already more or less did, #8001. I'd like sparse matrices to be first class in terms of the array operations and reductions that work efficiently on them. A few days of hacking can improve things gradually, but I'm envisioning a more comprehensive rewrite that will need changes to the way the rest of the standard library interacts with array objects to make sparse or dense both work well. A more formal concept of interfaces, and a few generic iterator protocols will likely be required here.

@timholy
Copy link
Sponsor Member

timholy commented Jan 8, 2015

Just to make sure the bikeshedding stays in perspective:

  • This patch contains an absolutely essential change, implementing cumsum as a call to cumsum! and allowing users to control the output type themselves. This should darn well be uncontroversial and should go in ASAP.
  • The only concern is over a line or two of code, whether the default should be a promoted type or the same type. Since this is pretty trivial, I'm going to refrain from arguing this into the ground. My final pitch to @tkelman is that when you add two arrays elementwise, each element of the sum is just a+b, i.e., you're only summing two numbers. When you call cumsum, you're forming a sum of length(A). That's basically guaranteed to overflow for small integer types. But like I said, I don't care much one way or another. What I care about is that this patch allows me to assume control of the answer.

@tkelman
Copy link
Contributor

tkelman commented Jan 8, 2015

This patch contains an absolutely essential change, implementing cumsum as a call to cumsum! and allowing users to control the output type themselves. This should darn well be uncontroversial and should go in ASAP.

Yes, that part absolutely makes sense.

When you call cumsum, you're forming a sum of length(A)

You're also returning an array. I really strongly dislike element type promotion in functions that return arrays.

@stevengj
Copy link
Member Author

stevengj commented Jan 8, 2015

@timholy, you don't need this patch to control the answer, because cumsum! already exists in Julia. The only difference here is that cumsum! for 1d arrays calls the pairwise code rather than doing naive summation. (But the pairwise stuff is irrelevant for integer types.)

But that being said, I'm fine with dropping the promotion, and just pushing the patch for the change in cumsum! to the pairwise implementation, since it seems like the case for promotion is ambiguous.

@timholy
Copy link
Sponsor Member

timholy commented Jan 8, 2015

Sounds fine, let's do it that way. We may want to add something to the docs for cumsum, something like:

Cumulative sum along a dimension. The dimension defaults to 1. See also 'cumsum!', especially if the computation might overflow.

@stevengj stevengj closed this Jan 8, 2015
@ivarne
Copy link
Sponsor Member

ivarne commented Jan 8, 2015

There is nothing in cumsum! to prevent overflow, so that suggestion seems slightly misleading. You can overflow any fixed with integer type, so no automatic promotion will fix that for us. How about

See also `cumsum!` which lets you specify a different array-
type to store the result in. This might be useful if the sum
might otherwise overflow, as there is no overflow detection in
this function.

@timholy
Copy link
Sponsor Member

timholy commented Jan 8, 2015

That's fine, and closer to what I originally wrote. I was just trying to be brief.

stevengj added a commit to stevengj/julia that referenced this pull request Jan 8, 2015
@ivarne
Copy link
Sponsor Member

ivarne commented Jan 8, 2015

Why brief, when we get so much complains that people don't understand our documentation and people beg for examples?

@ivarne
Copy link
Sponsor Member

ivarne commented Jan 8, 2015

Seems like @stevengj just nailed it and wrote something both brief and correct.

@timholy
Copy link
Sponsor Member

timholy commented Jan 8, 2015

Entirely reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:breaking This change will break code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants