
It is easy to crash MXNet when the tensor gets larger #16560

Closed
classicsong opened this issue Oct 21, 2019 · 7 comments

@classicsong

Description

When I use a large tensor, it is easy to crash the MXNet kernel.
Use the following Python code to reproduce:

>>> import mxnet.ndarray as nd

>>> a = nd.random.randn(4, 256, 1, 100, 100)
>>> b = nd.broadcast_axis(a, axis=2, size=256)
>>> b.size
2621440000
>>> b.asnumpy()
CRASH HERE

The error looks like an int32 overflow on shape.Size().
Is there an easy way to fix this? The only workaround I have found is to compile MXNet with USE_INT64_TENSOR_SIZE = ON, which is slower than the default build.
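
For reference, the two mismatched counts in the traceback below are exactly what a 32-bit wraparound produces: 2,621,440,000 elements does not fit in a signed 32-bit integer, and the wrapped negative value reinterpreted as size_t is the huge second number. A minimal sketch of that arithmetic in plain Python (no MXNet needed; the constants come from the shapes above):

# Simulate squeezing the element count through a signed 32-bit integer.
total = 4 * 256 * 256 * 100 * 100          # 2,621,440,000 elements after broadcast_axis
INT32_MAX = 2**31 - 1
wrapped = (total + 2**31) % 2**32 - 2**31  # value after signed 32-bit wraparound
as_size_t = wrapped % 2**64                # that negative value cast back to size_t
print(total > INT32_MAX)                   # True: the count does not fit in int32
print(wrapped)                             # -1673527296
print(as_size_t)                           # 18446744072036024320, the number in the error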

Environment info (Required)

mxnet 1.5.1 (pip3 install)

Package used (Python/R/Scala/Julia):
Python

Error Message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/mxnet/ndarray/ndarray.py", line 1996, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/usr/local/lib/python3.5/dist-packages/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [07:26:09] include/mxnet/././tensor_blob.h:290: Check failed: this->shape_.Size() == static_cast<size_t>(shape.Size()) (2621440000 vs. 18446744072036024320) : TBlob.get_with_shape: new and old shape do not match total elements
@ddavydenko
Contributor

@mxnet-label-bot Add [Bug, Large Tensor Support]

lanking520 added the Bug label Oct 21, 2019
@roywei
Member

roywei commented Oct 21, 2019

cc @access2rohit

@sxjscience
Member

We should raise an error message on the C++ side when creating a large NDArray.
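
A minimal Python-level sketch of the kind of check meant here (hypothetical helper; the actual fix belongs on the C++ side):

def check_shape_fits_int32(shape):
    # Hypothetical guard: refuse to create an NDArray whose element count
    # overflows a signed 32-bit integer on builds without USE_INT64_TENSOR_SIZE.
    total = 1
    for dim in shape:
        total *= dim
    if total > 2**31 - 1:
        raise ValueError("requested shape %s has %d elements, which exceeds the "
                         "int32 limit of the default build" % (shape, total))

check_shape_fits_int32((4, 256, 256, 100, 100))  # raises: 2,621,440,000 elements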

@ChaiBapchya
Contributor

Yes, it is being tracked here: #16570

@zachgk
Contributor

zachgk commented Nov 7, 2019

Is this resolved now that #16570 is merged?

@samskalicky
Contributor

@lanking520 assign @ChaiBapchya

@ChaiBapchya
Contributor

We can close this ticket, since the solution is to build with large tensor support, as the issue author pointed out. An error message is now raised as part of #16570 if a large array is created when large tensor support isn't enabled.
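
To check whether a given MXNet binary was built with that support, the runtime feature list can be queried. A small sketch, assuming the flag is exposed under the name INT64_TENSOR_SIZE by mxnet.runtime (as in the 1.x releases):

from mxnet.runtime import Features

# List the features compiled into this MXNet binary and check the large-tensor flag.
features = Features()
if features.is_enabled('INT64_TENSOR_SIZE'):
    print('Large tensor support is on; arrays may exceed 2**31 - 1 elements.')
else:
    print('Built without USE_INT64_TENSOR_SIZE; keep arrays under 2**31 - 1 elements.')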

szha closed this as completed Aug 9, 2020