When tensor-gpu engine can't grid big number, how to deal with it? #12751
Replies: 8 comments
-
This isn't an MXNet bug. The tensorflow gpu engine can't launch a grid this large: your shape must be [x <= 65535, 1], so I think you need to increase the number of batches.
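One way to act on this advice is to feed the rows through in pieces that each stay within the grid limit. The sketch below is only illustrative: `chunk_ranges` is a hypothetical helper, not an MXNet function, and the limit of 65535 is taken from the kMaxGridDim constant quoted in the original post. It uses plain Python so it runs without a GPU.

```python
MAX_GRID_DIM = 65535  # assumed limit, from kMaxGridDim in tensor_gpu-inl.cuh


def chunk_ranges(num_rows, limit=MAX_GRID_DIM):
    """Yield (start, stop) pairs covering num_rows in pieces of at most `limit` rows."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for start in range(0, num_rows, limit):
        yield start, min(start + limit, num_rows)


# The 98304 rows from the error message split into two pieces.
ranges = list(chunk_ranges(98304))
print(ranges)  # [(0, 65535), (65535, 98304)]
```

With an NDArray, each (start, stop) pair could be used to slice the rows before calling the loss, so that no single call sees more than 65535 rows. Whether that avoids the error depends on how the operator maps rows to the grid, which this thread does not confirm.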
-
My data.shape is (2, 3, 128, 128), where batch_size is data.shape[0].
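A batch size of 2 can still produce a large launch if the operator sees one row per element rather than one row per sample. As a quick arithmetic check, and assuming the loss input was flattened that way (the thread does not confirm this), the element count of this shape equals the 98304 rows in the reported error:

```python
from math import prod  # Python 3.8+

data_shape = (2, 3, 128, 128)    # shape quoted in this comment
num_elements = prod(data_shape)  # 2 * 3 * 128 * 128

print(num_elements)              # 98304
print(num_elements > 65535)      # True: exceeds kMaxGridDim
```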
-
Can we edit and rephrase a better title for this issue?
-
You have my word.
-
@mxnet-label-bot [Question]
-
Why is this called tensorflow gpu?!? Did you mean to say tensor-gpu?
-
As far as I know, the tensorflow gpu engine is the same as tensor-gpu.
-
I've encountered this same issue. Is there a workaround?
-
While calculating train_loss += loss.mean().asscalar() on the GPU, where loss.shape is (98304, 1), I got the following error:
mxnet.base.MXNetError: [16:26:40] c:\jenkins\workspace\mxnet-tag\mxnet\3rdparty\mshadow\mshadow./cuda/tensor_gpu-inl.cuh:58: too large launch parameter: Softmax[98304,1], [256,1,1]
Source code in tensor_gpu-inl.cuh:
const int kMaxGridDim = 65535;
/*! \brief suggested grid number for mapping kernel */
const int kBaseGridNum = 1024;
/*! \brief get align stride for given size in x dimension */
inline index_t GetAlignStride(index_t xsize) {
  if (xsize >= MSHADOW_MIN_PAD_RATIO * 32) {
    return ((xsize + kMemUnit - 1) >> kMemUnitBits) << kMemUnitBits;
  } else {
    // if originally space is not aligned, no necessary to to alligned thread allocation
    return xsize;
  }
}
inline void CheckLaunchParam(dim3 dimGrid, dim3 dimBlock, const char *estr = "") {
  if (dimBlock.x * dimBlock.y * dimBlock.z > static_cast<unsigned>(kMaxThreadsPerBlock) ||
      dimGrid.x > kMaxGridDim || dimGrid.y > kMaxGridDim) {
    LOG(FATAL) << "too large launch parameter: "
               << estr << "["
               << dimGrid.x << ","
               << dimGrid.y << "], ["
               << dimBlock.x << ","
               << dimBlock.y << ","
               << dimBlock.z << "]";
  }
}
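To make the check above easier to follow, here is a rough Python mirror of its logic. It is only a sketch: `launch_param_error` is a hypothetical name, and the value 1024 for kMaxThreadsPerBlock is an assumption, since that constant is not shown in the excerpt. Only the 65535 limit comes from the quoted source.

```python
K_MAX_GRID_DIM = 65535          # from the quoted source
K_MAX_THREADS_PER_BLOCK = 1024  # assumption: not shown in the excerpt


def launch_param_error(grid, block, name=""):
    """Return the message the check would log, or None if the launch passes.

    grid is (x, y); block is (x, y, z).
    """
    gx, gy = grid
    bx, by, bz = block
    too_many_threads = bx * by * bz > K_MAX_THREADS_PER_BLOCK
    grid_too_large = gx > K_MAX_GRID_DIM or gy > K_MAX_GRID_DIM
    if too_many_threads or grid_too_large:
        return "too large launch parameter: {}[{},{}], [{},{},{}]".format(
            name, gx, gy, bx, by, bz)
    return None


# The launch from the error above: 98304 blocks in x exceeds 65535.
print(launch_param_error((98304, 1), (256, 1, 1), "Softmax"))
# A grid of exactly 65535 blocks passes, because the comparison is strict.
print(launch_param_error((65535, 1), (256, 1, 1), "Softmax"))
```

The first call reproduces the message text from the traceback, and the second returns None, which shows that the largest accepted grid dimension is 65535 rather than 65536.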