
Add multiple parallel AddMatMat #119

Closed

danpovey opened this issue Sep 3, 2015 · 1 comment

danpovey commented Sep 3, 2015

@freewym, this is for you, although @naxingyu may have an interest in it too.

Please look over #47 to understand the background for this (the Convolutional component). That pull request is for nnet2, but there is a similar set of code in nnet1, with a separate pull request (you can search for that). The original reason we wanted to upgrade to the cuBLAS v2 API was that batched (parallel) matrix multiplication is not available in the v1 API. Now that you've (nearly) finished that task, you can help us add this batched matrix multiplication.

The current AddMatMat has the signature

void Matrix::AddMatMat(const Real alpha,
                       const MatrixBase &A, MatrixTransposeType transA,
                       const MatrixBase &B, MatrixTransposeType transB,
                       const Real beta);
I'd like you to add a batched AddMatMat function that is a wrapper for cuBLAS's gemmBatched function. This will later be used in the convolutional component. Of course this will require test code.
The function signature and documentation should be the following.
/**
   @brief This function executes multiple matrix multiplications, doing them
          in parallel using cuBLAS's gemmBatched if we are using a GPU.
          Vectors A, B and C must have the same length; for each i, this
          function executes the matrix operation
          C[i] = alpha A[i] B[i] + beta C[i].

   @param [in]     alpha    The constant alpha in the equation
                            "C[i] = alpha A[i] B[i] + beta C[i]".
   @param [in,out] C        A vector of pointers to matrices; all elements must
                            have the same num-rows, num-cols and stride.  The
                            matrices must point to distinct regions of GPU
                            memory, or results are undefined.  Ownership of the
                            pointers is retained by the caller.
   @param [in]     A        A vector of pointers to matrices; all elements must
                            have the same num-rows, num-cols and stride.
                            Ownership of the pointers is retained by the caller.
   @param [in]     trans_a  Indicates whether we should use the transpose of
                            A[i] in the equation: if trans_a == kTrans,
                            transpose(A[i]) appears in place of A[i].
   @param [in]     B        A vector of pointers to matrices; all elements must
                            have the same num-rows, num-cols and stride.
                            Ownership of the pointers is retained by the caller.
   @param [in]     trans_b  Indicates whether we should use the transpose of
                            B[i] in the equation: if trans_b == kTrans,
                            transpose(B[i]) appears in place of B[i].
   @param [in]     beta     The constant beta in the equation
                            "C[i] = alpha A[i] B[i] + beta C[i]".
*/
template <class Real>
void AddMatMatBatched(const Real alpha,
                      const std::vector<CuSubMatrix<Real>* > &C,
                      const std::vector<const CuSubMatrix<Real>* > &A,
                      MatrixTransposeType trans_a,
                      const std::vector<const CuSubMatrix<Real>* > &B,
                      MatrixTransposeType trans_b,
                      const Real beta);
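
For concreteness, here is a rough sketch (float case only, and just a sketch, not the final implementation) of how the CUDA branch might wrap cublasSgemmBatched. The AddMatMatBatchedSketch name, the CuArray-based pointer copy, and the GetCublasHandle() call are placeholders; what matters is that cuBLAS is column-major while our matrices are row-major, so we swap the A and B arguments, computing C[i]^T = op(B[i])^T op(A[i])^T:

// Rough sketch, float case only.  Kaldi matrices are row-major but cuBLAS is
// column-major, so we compute C[i]^T = op(B[i])^T op(A[i])^T by swapping the
// A and B arguments and their transpose flags.
void AddMatMatBatchedSketch(const float alpha,
                            const std::vector<CuSubMatrix<float>* > &C,
                            const std::vector<const CuSubMatrix<float>* > &A,
                            MatrixTransposeType trans_a,
                            const std::vector<const CuSubMatrix<float>* > &B,
                            MatrixTransposeType trans_b,
                            const float beta) {
  KALDI_ASSERT(A.size() == B.size() && B.size() == C.size());
  int batch_count = static_cast<int>(C.size());
  if (batch_count == 0) return;

  // gemmBatched wants device arrays of device pointers, so we gather the
  // pointers on the host and copy them to the GPU (CuArray does the copy).
  std::vector<const float*> a_ptrs(batch_count), b_ptrs(batch_count);
  std::vector<float*> c_ptrs(batch_count);
  for (int i = 0; i < batch_count; i++) {
    a_ptrs[i] = A[i]->Data();
    b_ptrs[i] = B[i]->Data();
    c_ptrs[i] = C[i]->Data();
  }
  CuArray<const float*> a_dev(a_ptrs), b_dev(b_ptrs);
  CuArray<float*> c_dev(c_ptrs);

  // All elements of each vector share num-rows, num-cols and stride, so
  // element 0 is representative.  Dimensions are those of the transposed
  // (column-major) problem.
  int m = C[0]->NumCols(), n = C[0]->NumRows(),
      k = (trans_b == kTrans ? B[0]->NumCols() : B[0]->NumRows());

  // CU_SAFE_CALL (or whatever error-checking macro we settle on) wraps the
  // cuBLAS v2 call; GetCublasHandle() stands in for however we obtain the
  // handle.
  CU_SAFE_CALL(cublasSgemmBatched(
      GetCublasHandle(),
      (trans_b == kTrans ? CUBLAS_OP_T : CUBLAS_OP_N),
      (trans_a == kTrans ? CUBLAS_OP_T : CUBLAS_OP_N),
      m, n, k, &alpha,
      b_dev.Data(), B[0]->Stride(),
      a_dev.Data(), A[0]->Stride(),
      &beta,
      c_dev.Data(), C[0]->Stride(),
      batch_count));
}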

Note: we have to pass vectors of pointers, even though it is inconvenient from a memory-management perspective, because CuSubMatrix doesn't have an operator=, so we can't easily create a vector of CuSubMatrix directly. Also, we would normally prefer to pass CuMatrixBase in situations like this, but that would create difficulties when deleting the memory (since an object can't safely be deleted through a base-class pointer unless the base class has a virtual destructor). It's OK; we can always create a CuSubMatrix that's identical to any given matrix.
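
To make the memory-management pattern concrete, a caller might do something like the following (a hypothetical example; the dimensions are made up, and it assumes ColRange(), which returns a CuSubMatrix viewing a range of columns):

// Hypothetical usage: slice three big matrices into num_blocks column blocks
// and multiply the corresponding blocks in one batched call, the way the
// convolutional component might.
int32 num_blocks = 4, rows = 256, inner = 128, block_cols = 64;
CuMatrix<BaseFloat> a_mat(rows, inner * num_blocks),
    b_mat(inner, block_cols * num_blocks),
    c_mat(rows, block_cols * num_blocks);
std::vector<const CuSubMatrix<BaseFloat>* > a, b;
std::vector<CuSubMatrix<BaseFloat>* > c;
for (int32 i = 0; i < num_blocks; i++) {
  a.push_back(new CuSubMatrix<BaseFloat>(a_mat.ColRange(i * inner, inner)));
  b.push_back(new CuSubMatrix<BaseFloat>(
      b_mat.ColRange(i * block_cols, block_cols)));
  c.push_back(new CuSubMatrix<BaseFloat>(
      c_mat.ColRange(i * block_cols, block_cols)));
}
AddMatMatBatched(BaseFloat(1.0), c, a, kNoTrans, b, kNoTrans, BaseFloat(0.0));
// Ownership of the pointers stays with the caller, so it must free the
// CuSubMatrix wrappers (the underlying matrices a_mat, b_mat and c_mat own
// the actual GPU memory).
for (int32 i = 0; i < num_blocks; i++) {
  delete a[i]; delete b[i]; delete c[i];
}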

Please make sure your test code does not have memory leaks; you can run valgrind or cuda-memcheck on it.
Also, it would be very helpful if you could add some speed tests, so we can see whether the batched matrix multiplication is worthwhile for various matrix sizes.
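
Something like the following would do for a speed test (again just a sketch, using our Timer class; the cudaDeviceSynchronize() calls are needed so we measure the actual GPU work rather than just the kernel-launch time):

// Sketch of a speed test: time a loop of ordinary AddMatMat calls against a
// single batched call, for a given matrix size and batch count.
template <typename Real>
static void TimeAddMatMatBatched(int32 dim, int32 batch_count) {
  std::vector<CuMatrix<Real>* > ma(batch_count), mb(batch_count),
      mc(batch_count);
  std::vector<const CuSubMatrix<Real>* > a, b;
  std::vector<CuSubMatrix<Real>* > c;
  for (int32 i = 0; i < batch_count; i++) {
    ma[i] = new CuMatrix<Real>(dim, dim);
    mb[i] = new CuMatrix<Real>(dim, dim);
    mc[i] = new CuMatrix<Real>(dim, dim);
    ma[i]->SetRandn();
    mb[i]->SetRandn();
    a.push_back(new CuSubMatrix<Real>(ma[i]->Range(0, dim, 0, dim)));
    b.push_back(new CuSubMatrix<Real>(mb[i]->Range(0, dim, 0, dim)));
    c.push_back(new CuSubMatrix<Real>(mc[i]->Range(0, dim, 0, dim)));
  }
  Timer timer;
  for (int32 i = 0; i < batch_count; i++)
    mc[i]->AddMatMat(Real(1.0), *ma[i], kNoTrans, *mb[i], kNoTrans, Real(0.0));
  cudaDeviceSynchronize();  // wait for the queued GPU work to finish
  double looped_sec = timer.Elapsed();
  timer.Reset();
  AddMatMatBatched(Real(1.0), c, a, kNoTrans, b, kNoTrans, Real(0.0));
  cudaDeviceSynchronize();
  double batched_sec = timer.Elapsed();
  KALDI_LOG << "dim = " << dim << ", batch-count = " << batch_count
            << ": looped " << looped_sec << "s, batched " << batched_sec << "s";
  for (int32 i = 0; i < batch_count; i++) {
    delete a[i]; delete b[i]; delete c[i];
    delete ma[i]; delete mb[i]; delete mc[i];
  }
}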

danpovey commented

Closing since the work is done. Thanks, @freewym.
