
It is easy to crash MXNet when the tensor gets larger #16560

Closed
classicsong opened this issue Oct 21, 2019 · 7 comments

@classicsong

Description

When I use a large tensor, it is easy to crash the MXNet kernel.
Use the following Python code to reproduce:

>>> import mxnet.ndarray as nd

>>> a = nd.random.randn(4, 256, 1, 100, 100)
>>> b = nd.broadcast_axis(a, axis=2, size=256)
>>> b.size
2621440000
>>> b.asnumpy()
CRASH HERE

The error looks like an int32 overflow on shape.Size().
Is there an easy way to fix this? The only workaround I have found is to compile MXNet with USE_INT64_TENSOR_SIZE = ON, which is slower than the default build.
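
For reference, the two mismatched counts in the traceback below are exactly what a 32-bit wraparound produces: 2,621,440,000 elements does not fit in a signed 32-bit integer, and the wrapped negative value reinterpreted as size_t is the huge second number. A minimal sketch of that arithmetic in plain Python (no MXNet needed; the constants come from the shapes above):

# Simulate squeezing the element count through a signed 32-bit integer.
total = 4 * 256 * 256 * 100 * 100          # 2,621,440,000 elements after broadcast_axis
INT32_MAX = 2**31 - 1
wrapped = (total + 2**31) % 2**32 - 2**31  # value after signed 32-bit wraparound
as_size_t = wrapped % 2**64                # that negative value cast back to size_t
print(total > INT32_MAX)                   # True: the count does not fit in int32
print(wrapped)                             # -1673527296
print(as_size_t)                           # 18446744072036024320, the number in the error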

Environment info (Required)

mxnet 1.5.1 (pip3 install)

Package used (Python/R/Scala/Julia):
Python

Error Message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/mxnet/ndarray/ndarray.py", line 1996, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/usr/local/lib/python3.5/dist-packages/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [07:26:09] include/mxnet/././tensor_blob.h:290: Check failed: this->shape_.Size() == static_cast<size_t>(shape.Size()) (2621440000 vs. 18446744072036024320) : TBlob.get_with_shape: new and old shape do not match total elements
@ddavydenko
Contributor

@mxnet-label-bot Add [Bug, Large Tensor Support]

lanking520 added the Bug label Oct 21, 2019
@roywei
Member

roywei commented Oct 21, 2019

cc @access2rohit

@sxjscience
Member

We should raise an error message on the C++ side when creating a large NDArray.
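
A minimal Python-level sketch of the kind of check meant here (hypothetical helper; the actual fix belongs on the C++ side):

def check_shape_fits_int32(shape):
    # Hypothetical guard: refuse to create an NDArray whose element count
    # overflows a signed 32-bit integer on builds without USE_INT64_TENSOR_SIZE.
    total = 1
    for dim in shape:
        total *= dim
    if total > 2**31 - 1:
        raise ValueError("requested shape %s has %d elements, which exceeds the "
                         "int32 limit of the default build" % (shape, total))

check_shape_fits_int32((4, 256, 256, 100, 100))  # raises: 2,621,440,000 elements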

@ChaiBapchya
Contributor

Yes, it is being tracked here: #16570

@zachgk
Contributor

zachgk commented Nov 7, 2019

Is this resolved now that #16570 is merged?

@samskalicky
Contributor

@lanking520 assign @ChaiBapchya

@ChaiBapchya
Contributor

We can close this ticket, since the solution is to build with large tensor support, as the issue author pointed out. An error message is now raised as part of #16570 if a large array is created when large tensor support isn't enabled.
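
To check whether a given MXNet binary was built with that support, the runtime feature list can be queried. A small sketch, assuming the flag is exposed under the name INT64_TENSOR_SIZE by mxnet.runtime (as in the 1.x releases):

from mxnet.runtime import Features

# List the features compiled into this MXNet binary and check the large-tensor flag.
features = Features()
if features.is_enabled('INT64_TENSOR_SIZE'):
    print('Large tensor support is on; arrays may exceed 2**31 - 1 elements.')
else:
    print('Built without USE_INT64_TENSOR_SIZE; keep arrays under 2**31 - 1 elements.')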

szha closed this as completed Aug 9, 2020