Functions that produce results in pre-defined outputs #1115

timholy · 2012-08-04T22:06:55Z

Operations like matrix multiplication, C = A*B, have historically been implemented as C = mult(A,B), and this seems very natural. Below, I argue that there are numerous benefits to habitually writing our core algorithms as mult(C, A, B), where C is a pre-allocated output. Note that this issue goes far beyond matrix multiplication: taken to the extreme, one might say that the majority of function that produce any kind of "nontrivial" output (especially ones that allocate storage for that output) might be candidates for refactoring under this proposal. Of course, many areas of Julia already follow this paradigm, e.g., the I/O functions take as first input "io", which is a stream representing the output.

Here are reasons that I think we need to consider doing things this way a matter of habit:

Flexibility efficient sets using hash tables #1: first, note that C = mult(A, B) can be trivially written in terms of mult(C, A, B):

function mult(A, B)
    C = Array(T, size(A, 1), size(B, 2))
    mult(C, A, B)
end

The converse is not true: you can't write mult(C, A, B) in terms of mult(A, B), because mult(A, B) allocates the storage for its output. Putting the actual core code into the three-argument version is therefore an increase in API flexibility.

The only disadvantage is lots of little "stub" functions when we also want the two-argument version to be available. But in general we have a lot of those anyway, and they are a strength of Julia.

Flexibility reimplement Set efficiently using hash tables #2: Better control over output type. For example, for handling big data, I might want the output to be a memory-mapped array. But how is a matrix-multiplication routine supposed to know that? It happens automagically for me if the output is one of the inputs.* Likewise, consider a case where an algorithm might produce results in one of two formats, say a string or a number representation of a date: if I don't want to have to re-format, I might as well have the output be an input to the function that figures out what day it is.
Performance: often one has to do the same computation a ridiculous number of times. Consider a case where you have to do det(A_i * B_i) for millions of small matrices A_i and B_i (applications where this arises: simulations, calculations on a grid, etc.). With the two-argument form of matrix multiplication, malloc has to be called for each intermediate value. One might imagine speed improvements using a single preallocated temporary for each intermediate result. In a test of 2x2 matrix multiplication, this trivially gets you a 3-fold speed improvement. (The fact that it's not larger is a triumph of Julia's memory management.)
Extensibility: there is more than a little interest in using Julia for GPU computing and in SMP multithreading. In such cases, you want to avoid calls from a thread to Julia's allocator functions. Presumably it will be easier to build kernels for such applications if more of our core computations already follow this paradigm.

Aside from encouraging this as a matter of programming style, let me bring up some of the issues worthy of discussion. We use the ! convention to indicate a function that mutates its inputs, such as sort!. However, this is not exactly what's happening here: in a matrix multiplication routine, A and B are not being mutated. There are also subtleties: for plus(C, A, B) it's perfectly safe to call this as plus(A, A, B), meaning the matrix version of A += B. However, you can't safely do matrix multiplication in-place: the output storage has to be independent of the inputs.

In other words, we might need more conventions or notation. I'm not good at naming, but to get the discussion started, let me make a few proposals.

A. We could reserve ! for its "purest" current meaning, things like in-place sort!.

B. We might not need special notation for a version that allows you to supply outputs, e.g., we could have both C = plus(A, B) and plus(C, A, B), where the distinction is handled by multiple dispatch. Alternatively, would it be desirable/feasible to use function names like mult=(C, A, B), where the = indicates that the first (or first several) arguments are outputs?

C. When functions need their output storage to be independent of their inputs, they could be given names like mult_noalias (following notation used in Eigen and Boost) to indicate to the user that it's a bug to call such functions like this: mult_noalias(A, B, A).

D. Finally, there has been a proposal for operator versions like C .= A*B. This kind of thing is clearly a good idea. But by now it should be clear that operators are only the tip of the iceberg---this issue permeates the entire function API.

References to previous related discussions:
#249
https://groups.google.com/d/topic/julia-dev/bWGWeIdhsnI/discussion
https://groups.google.com/d/topic/julia-dev/NaPxnrHtut8/discussion
Commit fc14052, which began the process for some of the linear algebra routines, but was horribly inconsistent in its naming scheme and needs fixing.

*Currently the calling function couldn't determine whether an array is in-memory or mapped to a file. This could potentially lead to problems from too much automagical behavior, when, for example, the best algorithm might depend upon the storage representation. But it should be clear that in general Julia's type system usually allows you to resolve such things very cleanly via multiple dispatch; memory-mapped arrays are a bit of an exception in this regard.

The text was updated successfully, but these errors were encountered:

JeffBezanson · 2012-08-04T22:20:19Z

One idea is to use a keyword argument output as the convention for specifying an output argument. That way the feature can be added to any function without disturbing normal arguments.

A simple rewrite would be A := f(...) becomes A = f(..., output=A). I like := since it is traditionally used for mutating or updating assignment.

Then a more far-out enhancement would be to allow the compiler to speculatively insert output= itself to reuse temporaries:

a = b+2*c+d

could become

tmp = 2*c
tmp2 = b+tmp
+(tmp2, d, output=tmp)
a = tmp

timholy · 2012-08-04T22:41:32Z

To me, putting output in the function declaration seems very clear. How would this proposal interact with this situation?

val := optimize_me(x)
val, grad := optimize_me(x)
val, grad, hess := optimize_me(x)

function optimize_me(output=grad, output=hess, x)

Since val is usually a scalar (in this type of example), it would probably be returned directly. Could this be flexible enough to determine whether it's necessary to bother computing the gradient?

timholy · 2012-08-04T22:46:52Z

Also, this by itself doesn't handle the noalias issue, although of course that's something that documentation could provide.

JeffBezanson · 2012-08-04T22:53:27Z

I think a,b,c := f(x) would have to be a,b,c = f(x,output=(a,b,c)).

We'd also have to decide whether output is mandatory or advisory --- whether the specified output must be used, or just can be used. This affects the case where output might not be the right type. It could be an error, or just result in allocation.

timholy · 2012-08-05T15:09:22Z

If it's not of the right size or type, in general I'd lean towards an error---for use in kernels, it might be nice to have a guarantee that allocation won't be performed. It should be easy enough to write a wrapper around it that does the allocation, if that's the behavior someone actually wants.

timholy · 2012-08-06T08:51:33Z

Another question: presumably output doesn't restrict something from also being used as an input, right? I'm thinking of routines like the BLAS daxpy.

dmbates · 2012-08-06T19:32:31Z

I wrote a comment on commit fff7d42 which probably should have been here instead. It seems to me that trying to find a syntax for linear algebra operations with preallocated results is overkill. The Lapack and BLAS code already does uses preallocated resutls and I think exposing these capabilities to programmers would be best accomplished by having the immediate Julia wrappers of Lapack or BLAS subroutines use preallocated storage for the results. The user-facing Julia functions would allocate the result and the sophisticated programmer who wanted to save the malloc/free overhead could call the low-level wrappers directly.

timholy · 2012-08-06T20:19:54Z

I think we mostly agree on this point, although I think we should provide one level up that "Julia-izes" Lapack but without allocating the outputs. Right now the linalg code tree seems not to provide this reliably.

While filing this issue was prompted by what I found in the linalg files (I hadn't looked at them until very recently), as a matter of "Julia programming style" it goes beyond that. I'm really thinking ahead to the day when Krys will deliver multithreadable kernels from native Julia code :-), and I want to be able to use those with minimum of reading documentation about the ccall interface to external packages. So part of the plan in filing this issue was to collect comments from existing Julia developers and to be able to direct newcomers to Julia to some document that describes why this might be important.

o-jasper · 2012-08-07T11:08:31Z

Hmm what about return type declarations? Basically also pattern-match for the T there; function foo{T}(x)::T, and then select T with y = foo(x)::TypeYouWant. Not sure how hard that is to implement..

:= could be useful. For instance with the distinction, you could allow for Common Lisps setf -functions, because you can tell the difference between get(dict,key) := value (that's assign(dict,value,key)) and get(dict,key) = value; defining a function. (I like setf-functions because it takes the convention out of getter and setters, and makes the language aware of the idea.)

Another instance is special variables, where without the distinction you don't know if you're changing it's value or locally defining it. Well not quite, because = is still assignment for variables, unless we're willing to change all the code. I guess the local keyword covers it.

timholy · 2012-08-09T02:46:17Z

Return type declarations are definitely interesting and have overlap with this proposal, but also have some important differences. The most obvious is that they don't solve the problem of temporary allocation. If I want a 2x2 matrix back, but don't want the algorithm to allocate fresh memory to store it, then I have to tell it which blob of memory I want it to use. That's easy if the output is one of the inputs.

o-jasper · 2012-08-09T13:22:06Z

I agree that it would be good if the user is sure that it will do mult(C,A,B) but it is also good if it can infer it when you type C =A*B in assignment, and C has some compatible matrix value already.

timholy · 2012-08-31T17:05:16Z

For those who don't follow every last detail of Julia development, I realized I should give a status update: the process proposed here has been completed (some time ago) for matrix multiplication. On some not-too-small problems (20x20) we've seen 1.5-fold speed increases, in addition to the benefit of extra flexibility that it permits (e.g., writing the results directly to memory-mapped storage).

The pull request referenced above for tridiagonal and Woodbury is the first case where it might be implemented for inversion, too. That pull request also contains a specific proposal (the last bullet point) that could be the basis for spreading this more widely. So, anyone who cares about this issue may want to check that discussion.

andreasnoack · 2013-09-28T06:43:55Z

I am working on some functionality for Toeplitz matrices where I define e.g. SymmetricToeplitz, Circulant and TriangularToeplitz and only store the first column. The reason is that I want fast matrix-vector multiplication via fft and therefore define methods for each type that do the right circulant embedding and so forth. Next, I wanted to write some iterative solvers for Toeplitz problems and after writing them I realised that they were agnostic to the Toeplitz structure. They only need matrix-vector multiplication and a solver when preconditioning and could be defined for AbstractMatrix. However, the multiplication and solver API would need to be standardised over the different subtypes of AbstractMatrix. In my Toeplitz code I found it convenient to define e.g.

A_mul_B{T}(α::T,A::Circulant{T},x::StridedVector{T},β::T,y::StridedVector{T})

for the operation y=αAx+βy similary to gemv in BLAS. I also defined solve!(A,b) that wrote the solution to b. Now I am ready to follow up on the question of this thread. Would it make sense to have this structure generally and then derive all the usual allocating * and \ from non allocating workhorse functions on the form just presented?

For some more code context have a look at

https://github.com/andreasnoackjensen/ToeplitzMatrices.jl

cdsousa · 2013-09-28T09:35:34Z

Let me suggest something like:

mult!(C, A, B) for the C<-A*B multiplication;
add!(C, A, B) for the C<-A+B addition;
add!(A, B) for the inplace A<-A+B addiction.

Then possible the inplace operators .= *, .= + and .+= of #249 could be defined to map to these function.

simonster · 2015-09-25T16:53:28Z

This is a very old issue. We have settled on the !/first argument convention. Perhaps this can be closed?

timholy mentioned this issue Aug 31, 2012

RFC: Improved tridiagonal and new Woodbury matrix algebra #1243

Closed

JeffBezanson mentioned this issue Jun 2, 2013

Feature request: overloadable updating operators #3217

Closed

JeffBezanson mentioned this issue Jun 18, 2013

compiler optimization tracker #3440

Closed

65 tasks

JeffBezanson mentioned this issue Aug 29, 2013

BigInt and BigFloat performance is poor #4176

Open

timholy mentioned this issue Apr 28, 2014

some reductions over dimensions depend too heavily on zero() #6672

Closed

JeffBezanson mentioned this issue May 30, 2014

make .+= et al. mutating #7052

Closed

JeffBezanson mentioned this issue Jul 25, 2014

Poor performance when using in-place operator for arrays #7726

Closed

andreasnoack mentioned this issue Dec 11, 2014

Generational behavior for the garbage collector #8699

Merged

timholy closed this as completed Sep 25, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Functions that produce results in pre-defined outputs #1115

Functions that produce results in pre-defined outputs #1115

timholy commented Aug 4, 2012

JeffBezanson commented Aug 4, 2012

timholy commented Aug 4, 2012

timholy commented Aug 4, 2012

JeffBezanson commented Aug 4, 2012

timholy commented Aug 5, 2012

timholy commented Aug 6, 2012

dmbates commented Aug 6, 2012

timholy commented Aug 6, 2012

o-jasper commented Aug 7, 2012

timholy commented Aug 9, 2012

o-jasper commented Aug 9, 2012

timholy commented Aug 31, 2012

andreasnoack commented Sep 28, 2013

cdsousa commented Sep 28, 2013

simonster commented Sep 25, 2015

Functions that produce results in pre-defined outputs #1115

Functions that produce results in pre-defined outputs #1115

Comments

timholy commented Aug 4, 2012

JeffBezanson commented Aug 4, 2012

timholy commented Aug 4, 2012

timholy commented Aug 4, 2012

JeffBezanson commented Aug 4, 2012

timholy commented Aug 5, 2012

timholy commented Aug 6, 2012

dmbates commented Aug 6, 2012

timholy commented Aug 6, 2012

o-jasper commented Aug 7, 2012

timholy commented Aug 9, 2012

o-jasper commented Aug 9, 2012

timholy commented Aug 31, 2012

andreasnoack commented Sep 28, 2013

cdsousa commented Sep 28, 2013

simonster commented Sep 25, 2015