@freewym, this is for you, although @naxingyu may have an interest in it too.
Please look over #47 to understand the background for this (the Convolutional component). That pull request is for nnet2, but there is a similar set of code in nnet1, with a separate pull request (you can search for that). The original reason we wanted to upgrade to the cuBLAS v2 API was that parallel (batched) matrix multiplication is not available in the v1 API. Now that you've (nearly) finished that task, you can help us add this batched matrix multiplication.
The current AddMatMat has the signature

```cpp
void Matrix::AddMatMat(const Real alpha,
                       const MatrixBase<Real> &A, MatrixTransposeType transA,
                       const MatrixBase<Real> &B, MatrixTransposeType transB,
                       const Real beta);
```
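(For readers coming to this fresh, a typical call, with made-up sizes, looks like the following; it computes C = A * B, since alpha = 1.0 and beta = 0.0.)

```cpp
Matrix<BaseFloat> A(100, 200), B(200, 50), C(100, 50);
A.SetRandn();
B.SetRandn();
// C = 1.0 * A * B + 0.0 * C, i.e. C is overwritten with the product A B.
C.AddMatMat(1.0, A, kNoTrans, B, kNoTrans, 0.0);
```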
I'd like you to add a batched AddMatMat function that is a wrapper for cuBLAS's gemmBatched function. This will later be used in the convolutional component. Of course this will require test code.
The function signature and documentation should be the following (although I won't have the
whitespace correct as I am composing this with non-fixed-width font).
```cpp
/**
   @brief  This function executes multiple matrix multiplications, doing them
           in parallel using cuBLAS's gemmBatched if we are using a GPU.
           Vectors A, B and C must have the same length; for each i, this
           function executes the matrix operation
           c[i] = alpha a[i] b[i] + beta c[i].

   @param [in]     alpha    The constant alpha in the equation
                            "c[i] = alpha a[i] b[i] + beta c[i]".
   @param [in,out] C        A vector of pointers to matrices; all elements
                            must have the same num-rows, num-cols and stride.
                            The matrices must point to distinct regions of GPU
                            memory, or results are undefined.  Ownership of
                            the pointers is retained by the caller.
   @param [in]     A        A vector of pointers to matrices; all elements
                            must have the same num-rows, num-cols and stride.
                            Ownership of the pointers is retained by the
                            caller.
   @param [in]     trans_a  Indicates whether we should use the transpose of
                            a[i] in the equation: if trans_a == kTrans,
                            transpose(a[i]) appears in place of a[i].
   @param [in]     B        A vector of pointers to matrices; all elements
                            must have the same num-rows, num-cols and stride.
                            Ownership of the pointers is retained by the
                            caller.
   @param [in]     trans_b  Indicates whether we should use the transpose of
                            b[i] in the equation: if trans_b == kTrans,
                            transpose(b[i]) appears in place of b[i].
   @param [in]     beta     The constant beta in the equation
                            "c[i] = alpha a[i] b[i] + beta c[i]".
*/
template <class Real>
void AddMatMatBatched(const Real alpha,
                      const std::vector<CuSubMatrix<Real>* > &C,
                      const std::vector<const CuSubMatrix<Real>* > &A,
                      MatrixTransposeType trans_a,
                      const std::vector<const CuSubMatrix<Real>* > &B,
                      MatrixTransposeType trans_b,
                      const Real beta);
```
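To make the wrapper's shape concrete, here is a rough sketch (not the required implementation) of what the GPU branch for Real == float might look like. Everything in it is schematic: the cublasHandle_t is taken as an argument rather than fetched from CuDevice, error and dimension checks are omitted, and a real implementation would avoid the per-call cudaMalloc by reusing a buffer. Two points worth noting: gemmBatched requires the arrays of matrix pointers themselves to live in device memory, and since cuBLAS is column-major while our matrices are row-major, we swap the A and B operand lists (computing c[i]^T = op(b[i])^T op(a[i])^T + beta c[i]^T), the same trick the existing AddMatMat uses.

```cpp
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Schematic GPU path for Real == float; CuSubMatrix, MatrixTransposeType and
// kTrans/kNoTrans are the existing Kaldi types.  The double version would use
// cublasDgemmBatched, and the non-GPU path would just loop over
// C[i]->AddMatMat(...).
void AddMatMatBatchedSketch(cublasHandle_t handle, float alpha,
                            const std::vector<CuSubMatrix<float>* > &C,
                            const std::vector<const CuSubMatrix<float>* > &A,
                            MatrixTransposeType trans_a,
                            const std::vector<const CuSubMatrix<float>* > &B,
                            MatrixTransposeType trans_b,
                            float beta) {
  int batch_count = C.size();
  // cuBLAS is column-major, so we compute C^T = op(B)^T op(A)^T + beta C^T;
  // hence 'm' is the num-cols of C and 'n' its num-rows.
  int m = C[0]->NumCols(), n = C[0]->NumRows(),
      k = (trans_a == kNoTrans ? A[0]->NumCols() : A[0]->NumRows());
  cublasOperation_t op_a = (trans_a == kTrans ? CUBLAS_OP_T : CUBLAS_OP_N),
                    op_b = (trans_b == kTrans ? CUBLAS_OP_T : CUBLAS_OP_N);
  // Gather the device pointers on the host...
  std::vector<const float*> a_host(batch_count), b_host(batch_count);
  std::vector<float*> c_host(batch_count);
  for (int i = 0; i < batch_count; i++) {
    a_host[i] = A[i]->Data();
    b_host[i] = B[i]->Data();
    c_host[i] = C[i]->Data();
  }
  // ... and copy them to the device: gemmBatched requires the pointer arrays
  // themselves to reside in device memory.
  const float **a_dev, **b_dev;
  float **c_dev;
  cudaMalloc(&a_dev, batch_count * sizeof(float*));
  cudaMalloc(&b_dev, batch_count * sizeof(float*));
  cudaMalloc(&c_dev, batch_count * sizeof(float*));
  cudaMemcpy(a_dev, a_host.data(), batch_count * sizeof(float*),
             cudaMemcpyHostToDevice);
  cudaMemcpy(b_dev, b_host.data(), batch_count * sizeof(float*),
             cudaMemcpyHostToDevice);
  cudaMemcpy(c_dev, c_host.data(), batch_count * sizeof(float*),
             cudaMemcpyHostToDevice);
  // Note the swapped operand order (B before A) for the row-major trick.
  cublasSgemmBatched(handle, op_b, op_a, m, n, k, &alpha,
                     b_dev, B[0]->Stride(),
                     a_dev, A[0]->Stride(),
                     &beta, c_dev, C[0]->Stride(), batch_count);
  cudaFree(a_dev);
  cudaFree(b_dev);
  cudaFree(c_dev);
}
```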
Note: we have to pass vectors of pointers, even though it's inconvenient from a memory-management perspective, because CuSubMatrix doesn't have an operator=, so we can't easily create a vector of CuSubMatrix directly. Also, we would normally prefer to pass CuMatrixBase in situations like this, but that would create difficulties when deleting the memory (deleting through a pointer to an abstract base class is not safe unless the base class has a virtual destructor). It's OK: we can always create a CuSubMatrix that's identical to any given matrix.
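To illustrate the calling pattern (and the caller's memory-management obligation), a hypothetical usage might look like the following; RowRange is the existing accessor that returns a CuSubMatrix view of a block of rows, so row-blocks of the same matrix automatically share num-cols and stride.

```cpp
// Hypothetical usage: multiply corresponding row-blocks of three big matrices
// in one batched call.  The submatrix views are new'ed because CuSubMatrix
// has no operator=; ownership stays with the caller.
int num_batch = 10, rows = 20, inner = 30, cols = 40;
CuMatrix<BaseFloat> a_big(num_batch * rows, inner),
                    b_big(num_batch * inner, cols),
                    c_big(num_batch * rows, cols);
a_big.SetRandn();
b_big.SetRandn();
std::vector<const CuSubMatrix<BaseFloat>* > a(num_batch), b(num_batch);
std::vector<CuSubMatrix<BaseFloat>* > c(num_batch);
for (int i = 0; i < num_batch; i++) {
  a[i] = new CuSubMatrix<BaseFloat>(a_big.RowRange(i * rows, rows));
  b[i] = new CuSubMatrix<BaseFloat>(b_big.RowRange(i * inner, inner));
  c[i] = new CuSubMatrix<BaseFloat>(c_big.RowRange(i * rows, rows));
}
// c[i] = a[i] * b[i] for each i.
AddMatMatBatched(BaseFloat(1.0), c, a, kNoTrans, b, kNoTrans, BaseFloat(0.0));
for (int i = 0; i < num_batch; i++) {  // the caller must delete the views.
  delete a[i];
  delete b[i];
  delete c[i];
}
```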
Please make sure your test code does not have memory leaks; you can run valgrind or cuda-memcheck on it.
It would also be very helpful if you could add some speed tests, so we can see whether the batched matrix multiplication is beneficial for various matrix sizes; a sketch follows.
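Something along these lines might serve as a starting point (a sketch only: the matrix sizes and iteration count are placeholders, and for accurate GPU timing you would want to synchronize the device before reading the timer, since the cuBLAS calls are asynchronous).

```cpp
// Hypothetical speed test: a loop of individual AddMatMat calls versus a
// single AddMatMatBatched call, for a given batch count and square dim.
static void TimeAddMatMatBatched(int num_batch, int dim) {
  CuMatrix<BaseFloat> a_big(num_batch * dim, dim), b_big(num_batch * dim, dim),
      c_big(num_batch * dim, dim);
  a_big.SetRandn();
  b_big.SetRandn();
  std::vector<const CuSubMatrix<BaseFloat>* > a(num_batch), b(num_batch);
  std::vector<CuSubMatrix<BaseFloat>* > c(num_batch);
  for (int i = 0; i < num_batch; i++) {
    a[i] = new CuSubMatrix<BaseFloat>(a_big.RowRange(i * dim, dim));
    b[i] = new CuSubMatrix<BaseFloat>(b_big.RowRange(i * dim, dim));
    c[i] = new CuSubMatrix<BaseFloat>(c_big.RowRange(i * dim, dim));
  }
  int iter = 100;
  Timer tim1;  // num_batch * iter separate gemm calls.
  for (int j = 0; j < iter; j++)
    for (int i = 0; i < num_batch; i++)
      c[i]->AddMatMat(1.0, *(a[i]), kNoTrans, *(b[i]), kNoTrans, 0.0);
  double unbatched_sec = tim1.Elapsed();
  Timer tim2;  // one gemmBatched call per iteration.
  for (int j = 0; j < iter; j++)
    AddMatMatBatched(BaseFloat(1.0), c, a, kNoTrans, b, kNoTrans,
                     BaseFloat(0.0));
  double batched_sec = tim2.Elapsed();
  KALDI_LOG << "dim = " << dim << ", batch = " << num_batch << ": unbatched "
            << unbatched_sec << "s, batched " << batched_sec << "s.";
  for (int i = 0; i < num_batch; i++) {
    delete a[i];
    delete b[i];
    delete c[i];
  }
}
```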