
Ci test randomness2 #8526

Closed
wants to merge 38 commits

Conversation

DickJC123
Contributor

Description

This PR introduces a new, simple level of control over unittest random seeds while providing inter-test random number generator (RNG) isolation. This PR is an improved replacement for the pending PR #8313. The improvements over that PR are:

  1. A unittest that fails via an exception will have its seed reported in the test log. Reproducing the failure with the same-seeded data is simple and immediate.
  2. The mx.random seeds are also set (identically to np.random), giving deterministic behavior and test isolation for the mxnet cpu and gpu RNGs.
  3. A unittest failure via a core-dump can also be reproduced after the module test is re-run with debugging enabled.

To provide this functionality, a custom decorator "@with_seed()" was created. This was considered more powerful than the nosetests "@with_setup()" facility, and less disruptive than changing all tests to become methods of a nosetests test class. The proposed new approach is demonstrated on a simple "test_module.py" test file of three tests. Assuming that the second test needs a fixed seed for robustness, the file might currently appear as:

import numpy as np

def test_op1():
    ...  # <op1 test>

def test_op2():
    np.random.seed(1234)
    ...  # <op2 test>

def test_op3():
    ...  # <op3 test>

Even though test_op3() is OK with nondeterministic data, it will see only a single dataset because it runs after test_op2(), which sets the seed. Also, if test_op1() were to fail, there would be no way to reproduce the failure, short of running the test individually and hoping for a new, similar failure.

With the proposed approach, the test file becomes:

from common import *

@with_seed()
def test_op1():
    ...  # <op1 test>

@with_seed(1234)
def test_op2():
    ...  # <op2 test>

@with_seed()
def test_op3():
    ...  # <op3 test>

By importing unittests/common.py, the seeds of the numpy and mxnet RNGs are set initially to a random "module seed" that is output to the log file. The initial RNGs can be thought of as module-level RNGs that are isolated from the individual tests and that provide the sequence of "test seeds" that determine the behavior of each test's RNGs. The "@with_seed()" test function decorator requests a test seed from the module RNG, sets the numpy and mxnet seeds appropriately, and runs the test. Should the test fail, the seed for that test is output to the log file. Pass or fail, the decorator reinstates the module RNG state before the next test's decorator is executed, effectively isolating the tests.
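
A minimal sketch of how such a decorator could be implemented appears below. This is illustrative only, not the exact contents of unittests/common.py; the logging levels, the seed range, and the internal names are assumptions:

import os
import logging
import functools
import numpy as np
import mxnet as mx

logger = logging.getLogger('common')

# Module seed: honor MXNET_MODULE_SEED if set, otherwise draw one at random.
_module_seed = int(os.getenv('MXNET_MODULE_SEED', np.random.randint(0, 2**31)))
logger.info('Setting module np/mx random seeds, use MXNET_MODULE_SEED=%d to reproduce.', _module_seed)
np.random.seed(_module_seed)
mx.random.seed(_module_seed)

def with_seed(seed=None):
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            # Save the module RNG state so tests remain isolated from each other.
            saved_state = np.random.get_state()
            try:
                # Precedence: MXNET_TEST_SEED env var > decorator argument > module RNG.
                if 'MXNET_TEST_SEED' in os.environ:
                    test_seed = int(os.environ['MXNET_TEST_SEED'])
                elif seed is not None:
                    test_seed = seed
                else:
                    test_seed = np.random.randint(0, 2**31)
                msg = 'Setting test np/mx random seeds, use MXNET_TEST_SEED=%d to reproduce.' % test_seed
                # A fixed seed is logged at INFO as a standing reminder that the test
                # is not yet robust; a random seed is logged at DEBUG so it surfaces
                # under --logging-level=DEBUG.
                logger.log(logging.INFO if seed is not None else logging.DEBUG, msg)
                np.random.seed(test_seed)
                mx.random.seed(test_seed)
                try:
                    test_fn(*args, **kwargs)
                except Exception:
                    logger.info(msg)  # make the failing test's seed visible in the log
                    raise
            finally:
                # Reinstate the module RNG state before the next test's decorator runs.
                np.random.set_state(saved_state)
        return wrapper
    return decorator

Debugging a failing test_op3 in the example would proceed as follows: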

$ nosetests --verbose -s test_module.py
[INFO] Setting module np/mx random seeds, use MXNET_MODULE_SEED=3444154063 to reproduce.
test_module.test_op1 ... ok
test_module.test_op2 ... [INFO] Setting test np/mx random seeds, use MXNET_TEST_SEED=1234 to reproduce.
ok
test_module.test_op3 ... [INFO] Setting test np/mx random seeds, use MXNET_TEST_SEED=2096230603 to reproduce.
FAIL
======================================================================
FAIL: test_module.test_op3
----------------------------------------------------------------------
Traceback (most recent call last):
<stack trace appears here>
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx random seeds, use MXNET_TEST_SEED=2096230603 to reproduce.
--------------------- >> end captured logging << ---------------------
----------------------------------------------------------------------
Ran 3 tests in 1.354s
FAILED (failures=1)

Because test_op3 failed, its seed appeared in the log file. Also, the test_op2 seed was displayed as a reminder that the test needs more work before it is robust enough for random data. The command to reproduce the problem is constructed simply by copying and pasting from the log:

$ MXNET_TEST_SEED=2096230603 nosetests --verbose -s test_module.py:test_op3

If test_op3 instead dumped core, the test seed would not be initially apparent. Assuming the core-dump is repeatable based on the data, the module would first be re-run with the command:

$ MXNET_MODULE_SEED=3444154063 nosetests --logging-level=DEBUG --verbose -s test_module.py

The log would now include the test seeds for all tests before they are run, so the test could then be run in isolation as before with MXNET_TEST_SEED=2096230603.

Let's assume that test_op3 was altered by increasing a tolerance. How robust is the test now? This can be explored by repeating the test many times, as in:

$ MXNET_TEST_COUNT=10000 nosetests --logging-level=DEBUG --verbose -s test_module.py:test_op3
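
One way the decorator could support this repetition is sketched below; again this is illustrative rather than the actual implementation, and run_trials is a hypothetical helper. Note that each trial draws a fresh seed unless one was pinned, since repeating an identical seed would exercise only one dataset:

import os
import numpy as np
import mxnet as mx

def run_trials(test_fn, fixed_seed=None):
    # Repeat the test body MXNET_TEST_COUNT times (default: a single trial).
    for _ in range(int(os.getenv('MXNET_TEST_COUNT', '1'))):
        # A pinned seed repeats the same data; otherwise draw a new seed per
        # trial so the repeated runs actually explore different datasets.
        test_seed = fixed_seed if fixed_seed is not None else np.random.randint(0, 2**31)
        np.random.seed(test_seed)
        mx.random.seed(test_seed)
        test_fn()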

Finally, this PR adds the @with_seed() decorator to all tests in modules that use random numbers. Also, it includes many specific test robustness fixes that were exposed once this new methodology was adopted internally by the author.

Checklist

Essentials

  • [x] Passed code style checking (make lint)
  • [x] Changes are complete (i.e. I finished coding on this PR)
  • [x] All changes have test coverage
  • [x] For user-facing API changes, API doc string has been updated.
  • [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • [x] Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

DickJC123 and others added 30 commits November 2, 2017 14:05
@DickJC123
Contributor Author

I prefer that the original author of the cpu implementation correct the non-robustness of the float16 test. In the meantime, we could ignore the entire test, put it on a fixed seed, or comment out just the float16 cpu part and file an issue:
#8509

Which approach would you recommend?

@marcoabreu
Contributor

Hello @DickJC123, interesting idea! Do you also support setting a global seed that is applied to each test before it executes? Right now, the only way I see is adding @with_seed(1234) to every single test. My goal is to have completely deterministic and reproducible behaviour during test execution by using the exact same seed for every single test, without having to alter the code.

@DickJC123
Contributor Author

DickJC123 commented Nov 6, 2017 via email

@larroy
Contributor

larroy commented Nov 6, 2017

I like this approach, but I would like to have fixed seeds by default instead of setting an environment variable, mostly to reduce the cognitive load on test runners. We already have issues running tests on different platforms, and if, on top of that, not defining this variable causes flaky tests or random failures, I think it's better to have it the other way around.

@piiswrong
Contributor

@DickJC123 Ping. Is this a duplicate? I'm going to close the older one.

@DickJC123
Contributor Author

DickJC123 commented Dec 12, 2017 via email

@marcoabreu
Contributor

marcoabreu commented Dec 12, 2017 via email

@piiswrong
Contributor

@DickJC123 ping

DickJC123 mentioned this pull request Feb 14, 2018
@marcoabreu
Contributor

@DickJC123 any update on this?

@piiswrong
Contributor

I think this can be closed. @DickJC123 opened another PR for this and it was merged.

piiswrong closed this Feb 27, 2018