
Ci test randomness2 #8526

Closed
wants to merge 38 commits

Conversation

DickJC123
Contributor

Description

This PR introduces a new, simple level of control over unittest random seeds while providing inter-test random number generator (RNG) isolation. This PR is an improved replacement for the pending PR #8313. The improvements over that PR are:

  1. A unittest that fails via an exception will have its seed reported in the test log. Reproducing the failure with the same-seeded data is simple and immediate.
  2. The mx.random seeds are also set (identically to np.random), giving deterministic behavior and test isolation for the mxnet cpu and gpu RNGs.
  3. A unittest failure via a core-dump can also be reproduced after the module test is re-run with debugging enabled.

To provide this functionality, a custom decorator "@with_seed()" was created. This was considered more powerful than the nosetests "@with_setup()" facility, and less disruptive than changing all tests to become methods of a nosetests test class. The proposed new approach is demonstrated on a simple "test_module.py" test file of three tests. Assuming that the second test needs a fixed seed for robustness, the file might currently appear as:

import numpy as np

def test_op1():
    ...  # <op1 test>

def test_op2():
    np.random.seed(1234)
    ...  # <op2 test>

def test_op3():
    ...  # <op3 test>

Even though test_op3() is OK with nondeterministic data, it will see only a single dataset because it runs after test_op2(), which sets the seed. Also, if test_op1() were to fail, there would be no way to reproduce the failure, short of running the test individually and hoping for a new, similar failure.

With the proposed approach, the test file becomes:

from common import *

@with_seed()
def test_op1():
    ...  # <op1 test>

@with_seed(1234)
def test_op2():
    ...  # <op2 test>

@with_seed()
def test_op3():
    ...  # <op3 test>

By importing unittests/common.py, the seeds of the numpy and mxnet RNGs are set initially to a random "module seed" that is output to the log file. The initial RNGs can be thought of as module-level RNGs that are isolated from the individual tests and that provide the sequence of "test seeds" that determine the behavior of each test's RNGs. The "@with_seed()" test function decorator requests a test seed from the module RNG, sets the numpy and mxnet seeds appropriately, and runs the test. Should the test fail, the seed for that test is output to the log file. Pass or fail, the decorator reinstates the module RNG state before the next test's decorator is executed, effectively isolating the tests.
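
A minimal sketch of how such a decorator could be implemented appears below. This is illustrative only, not the exact contents of unittests/common.py; the logging levels, the seed range, and the internal names are assumptions:

import os
import logging
import functools
import numpy as np
import mxnet as mx

logger = logging.getLogger('common')

# Module seed: honor MXNET_MODULE_SEED if set, otherwise draw one at random.
_module_seed = int(os.getenv('MXNET_MODULE_SEED', np.random.randint(0, 2**31)))
logger.info('Setting module np/mx random seeds, use MXNET_MODULE_SEED=%d to reproduce.', _module_seed)
np.random.seed(_module_seed)
mx.random.seed(_module_seed)

def with_seed(seed=None):
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            # Save the module RNG state so tests remain isolated from each other.
            saved_state = np.random.get_state()
            try:
                # Precedence: MXNET_TEST_SEED env var > decorator argument > module RNG.
                if 'MXNET_TEST_SEED' in os.environ:
                    test_seed = int(os.environ['MXNET_TEST_SEED'])
                elif seed is not None:
                    test_seed = seed
                else:
                    test_seed = np.random.randint(0, 2**31)
                msg = 'Setting test np/mx random seeds, use MXNET_TEST_SEED=%d to reproduce.' % test_seed
                # A fixed seed is logged at INFO as a standing reminder that the test
                # is not yet robust; a random seed is logged at DEBUG so it surfaces
                # under --logging-level=DEBUG.
                logger.log(logging.INFO if seed is not None else logging.DEBUG, msg)
                np.random.seed(test_seed)
                mx.random.seed(test_seed)
                try:
                    test_fn(*args, **kwargs)
                except Exception:
                    logger.info(msg)  # make the failing test's seed visible in the log
                    raise
            finally:
                # Reinstate the module RNG state before the next test's decorator runs.
                np.random.set_state(saved_state)
        return wrapper
    return decorator

Debugging a failing test_op3 in the example would proceed as follows: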

$ nosetests --verbose -s test_module.py
[INFO] Setting module np/mx random seeds, use MXNET_MODULE_SEED=3444154063 to reproduce.
test_module.test_op1 ... ok
test_module.test_op2 ... [INFO] Setting test np/mx random seeds, use MXNET_TEST_SEED=1234 to reproduce.
ok
test_module.test_op3 ... [INFO] Setting test np/mx random seeds, use MXNET_TEST_SEED=2096230603 to reproduce.
FAIL
======================================================================
FAIL: test_module.test_op3
----------------------------------------------------------------------
Traceback (most recent call last):
<stack trace appears here>
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx random seeds, use MXNET_TEST_SEED=2096230603 to reproduce.
--------------------- >> end captured logging << ---------------------
----------------------------------------------------------------------
Ran 3 tests in 1.354s
FAILED (failures=1)

Because test_op3 failed, its seed appeared in the log file. Also, the test_op2 seed was displayed as a reminder that the test needs more work before it is robust enough for random data. The command to reproduce the problem is constructed simply by copying and pasting from the log:

$ MXNET_TEST_SEED=2096230603 nosetests --verbose -s test_module.py:test_op3

If test_op3 instead dumped core, the test seed would not be initially apparent. Assuming the core-dump is repeatable based on the data, the module would first be re-run with the command:

$ MXNET_MODULE_SEED=3444154063 nosetests --logging-level=DEBUG --verbose -s test_module.py

The log would now include the test seeds for all tests before they are run, so the test could then be run in isolation as before with MXNET_TEST_SEED=2096230603.

Let's assume that test_op3 was altered by increasing a tolerance. How robust is the test now? This can be explored by repeating the test many times, as in:

$ MXNET_TEST_COUNT=10000 nosetests --logging-level=DEBUG --verbose -s test_module.py:test_op3
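
One way the decorator could support this repetition is sketched below; again this is illustrative rather than the actual implementation, and run_trials is a hypothetical helper. Note that each trial draws a fresh seed unless one was pinned, since repeating an identical seed would exercise only one dataset:

import os
import numpy as np
import mxnet as mx

def run_trials(test_fn, fixed_seed=None):
    # Repeat the test body MXNET_TEST_COUNT times (default: a single trial).
    for _ in range(int(os.getenv('MXNET_TEST_COUNT', '1'))):
        # A pinned seed repeats the same data; otherwise draw a new seed per
        # trial so the repeated runs actually explore different datasets.
        test_seed = fixed_seed if fixed_seed is not None else np.random.randint(0, 2**31)
        np.random.seed(test_seed)
        mx.random.seed(test_seed)
        test_fn()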

Finally, this PR adds the @with_seed() decorator to all tests in modules that use random numbers. Also, it includes many specific test robustness fixes that were exposed once this new methodology was adopted internally by the author.

Checklist

Essentials

  • [x] Passed code style checking (make lint)
  • [x] Changes are complete (i.e. I finished coding on this PR)
  • [x] All changes have test coverage
  • [x] For user-facing API changes, API doc string has been updated.
  • [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • [x] Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

DickJC123 and others added 30 commits November 2, 2017 14:05
@DickJC123
Contributor Author

I prefer that the original author of the cpu implementation correct the non-robustness of the float16 test. In the meantime, we could ignore the entire test, put it on a fixed seed, or comment out just the float16 cpu part and file an issue:
#8509

Which approach would you recommend?

@marcoabreu
Contributor

Hello @DickJC123, interesting idea! Do you also support setting a global seed that is applied to each test before it executes? Right now, the only way I see is adding @with_seed(1234) to every single test. My goal is to have completely deterministic and reproducible behaviour during test execution by using the exact same seed for every single test, without having to alter the code.

@DickJC123
Contributor Author

DickJC123 commented Nov 6, 2017 via email

@larroy
Contributor

larroy commented Nov 6, 2017

I like this approach, but I would like to have fixed seeds by default instead of setting an environment variable, mostly to reduce the cognitive load on test runners. We already have issues running tests on different platforms, and if, on top of that, not defining this variable causes flaky tests or random failures, I think it's better to have it the other way around.

@piiswrong
Contributor

@DickJC123 Ping. Is this a duplicate? I'm going to close the older one.

@DickJC123
Contributor Author

DickJC123 commented Dec 12, 2017 via email

@marcoabreu
Contributor

marcoabreu commented Dec 12, 2017 via email

@piiswrong
Contributor

@DickJC123 ping

DickJC123 mentioned this pull request Feb 14, 2018
@marcoabreu
Contributor

@DickJC123 any update on this?

@piiswrong
Contributor

I think this can be closed. @DickJC123 opened another PR for this and it was merged.

piiswrong closed this Feb 27, 2018