This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[CI] Python2: CPU - hangs after test_create_np_param #16831

Open
leezu opened this issue Nov 15, 2019 · 6 comments

Comments

@leezu
Contributor

leezu commented Nov 15, 2019

test_numpy_gluon.test_create_np_param ... NumPy-shape semantics has been activated in your code. This is required for creating and manipulating scalar and zero-size tensors, which were not supported in MXNet before, as in the official NumPy library. Please DO NOT manually deactivate this semantics while using `mxnet.numpy` and `mxnet.numpy_extension` modules.
NumPy-shape semantics has been activated in your code. This is required for creating and manipulating scalar and zero-size tensors, which were not supported in MXNet before, as in the official NumPy library. Please DO NOT manually deactivate this semantics while using `mxnet.numpy` and `mxnet.numpy_extension` modules.
NumPy-shape semantics has been activated in your code. This is required for creating and manipulating scalar and zero-size tensors, which were not supported in MXNet before, as in the official NumPy library. Please DO NOT manually deactivate this semantics while using `mxnet.numpy` and `mxnet.numpy_extension` modules.
NumPy-shape semantics has been activated in your code. This is required for creating and manipulating scalar and zero-size tensors, which were not supported in MXNet before, as in the official NumPy library. Please DO NOT manually deactivate this semantics while using `mxnet.numpy` and `mxnet.numpy_extension` modules.
NumPy-shape semantics has been activated in your code. This is required for creating and manipulating scalar and zero-size tensors, which were not supported in MXNet before, as in the official NumPy library. Please DO NOT manually deactivate this semantics while using `mxnet.numpy` and `mxnet.numpy_extension` modules.

and so on until the job is stopped after 4 hours.

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-16810/runs/7/nodes/294/steps/806/log/?start=0

@DickJC123
Contributor

I have also seen a CI hang on a numpy test with the Python2 CI runner:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-16838/6/pipeline

The last test to add to the above log was test_numpy_op.py:test_numpy_reshape. The next test would have been test_numpy_resize. Have any of the developers of the new numpy facility seen hangs on numpy tests?

@reminisce
Contributor

It's probably due to too many logging messages dumped in the unit test. This problem has been fixed in the master branch. Could you rebase your PRs to reduce test time?
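For illustration only: a minimal sketch of how a repeated message like the one in the log above could be collapsed, assuming the message is routed through Python's standard `logging` module. The thread does not say how MXNet emits this message or what the fix in master actually changed, so this is not that fix. `DedupFilter`, `ListHandler`, `demo`, and the logger name `dedup_demo` are hypothetical names, and the sketch is written for Python 3 even though the failing job runs Python 2.

```python
import logging


class DedupFilter(logging.Filter):
    """Let each distinct message through once and drop later repeats."""

    def __init__(self):
        super().__init__()
        self._seen = set()

    def filter(self, record):
        message = record.getMessage()
        if message in self._seen:
            return False  # already emitted once, suppress the repeat
        self._seen.add(message)
        return True


class ListHandler(logging.Handler):
    """Collect emitted messages in memory so the effect can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def demo():
    # Hypothetical logger; a real project would attach the filter to its own.
    logger = logging.getLogger("dedup_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = ListHandler()
    dedup = DedupFilter()
    logger.addHandler(handler)
    logger.addFilter(dedup)
    try:
        for _ in range(5):
            logger.info("NumPy-shape semantics has been activated in your code.")
        logger.info("another message")
    finally:
        logger.removeFilter(dedup)
        logger.removeHandler(handler)
    return handler.messages
```

Calling `demo()` returns two messages instead of six: the repeated notice once, followed by the unrelated message.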

@leezu
Contributor Author

leezu commented Nov 21, 2019

@reminisce should the fix be backported to 1.6 branch?

@reminisce
Contributor

@leezu Yes, there is a 1.6 tag attached to it. #16849

@DickJC123
Contributor

I was under the impression that when a PR goes through CI, the code tested is a merge of the PR with the then-current master. My hang was seen just 24 hours ago. @reminisce, which commit to master solved this problem in your view?

@DickJC123
Contributor

The hang I reported in this old issue has occurred again in the exact same place, which I inferred to be in test_np_resize() because that is the next test to run after test_np_reshape():

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-17312/1/pipeline/356

A retry run I launched also appears to be hung on the test that follows test_np_empty(), namely test_np_empty_like():

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-17312/2/pipeline/372

I'm not sure how the problem is related to log file length. The two failing logs are similar in length, and both are shorter than the log of a passing run:

<log len in chars>
307812        hang1_log.txt
311103        hang2_log.txt
416848        passes_log.txt
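A comparison like the one above can be scripted. The following is a rough sketch, assuming the logs use the `module.test_name ...` line format seen in the excerpt at the top of this issue. `summarize_log` and the `TEST_LINE` pattern are hypothetical helpers, not part of the MXNet CI tooling.

```python
import re

# Assumed format: "test_module.test_name ... <output or status>" at line start.
TEST_LINE = re.compile(r"^(\S+\.test_\w+) \.\.\.", re.MULTILINE)


def summarize_log(text):
    """Return the log length and the last test that was started."""
    started = TEST_LINE.findall(text)
    return {
        "chars": len(text),
        "tests_started": len(started),
        "last_test": started[-1] if started else None,
    }
```

Running this on the text of each saved log, hanging and passing, reports the length in characters and the last test that began, which gives a starting point for finding where a run stalled.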

@reminisce @haojin2 @ptrendx
