
Make test suite more random and add add_noise_to_params, move to Github Actions, etc.. #320

Merged
merged 31 commits into master from random-tests on Feb 13, 2020

Conversation

tobiasraabe
Member

@tobiasraabe tobiasraabe commented Jan 28, 2020

Current behavior

  • The tests take too much time and thereby waste resources.
  • When we generate random models, we reuse the models produced by seeds 0-20, so some edge cases might stay undetected. We could use more randomness.

Solution / Implementation

Random tests

  • First, I removed the tests generated by fixed seeds.
  • I added pytest-randomly, which generates a timestamp-based seed for every test session and resets the random number generator to this seed before every test. Using this seed, we still have only one random model, but a different one for every test session. Small win.
  • Lastly, I intervene in pytest's generation of tests. Every test using the argument `model_or_seed` is rerun five times with the seeds `[seed + i for i in range(5)]`, where `seed` is the seed generated by pytest-randomly (a sketch of such a hook follows this list).
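A minimal sketch of such a hook in `conftest.py`, assuming the fixture is parametrized via `pytest_generate_tests` and that pytest-randomly exposes the session seed through the `randomly_seed` option; the PR's actual implementation may differ:

```python
# conftest.py -- a sketch, not the PR's actual code. pytest_generate_tests is
# standard pytest API; reading the session seed via the "randomly_seed" option
# is an assumption about pytest-randomly's option name.
def pytest_generate_tests(metafunc):
    if "model_or_seed" in metafunc.fixturenames:
        seed = metafunc.config.getoption("randomly_seed")
        # Rerun each such test with five consecutive seeds derived from the
        # session seed, so one session covers several random models.
        metafunc.parametrize("model_or_seed", [seed + i for i in range(5)])
```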

Thus, we test multiple random models per session and different models across sessions, which gives us better coverage than ever before. Now it only depends on how diverse the random models are and how good the tests are.

The tests are still reproducible because every test session prints `--randomly-seed=<value>` at the start, which can be passed back as a pytest argument (e.g., `pytest --randomly-seed=<value>`) to re-run the same session.

Others

  • I also added a function `add_noise_to_params()` to vary the parameters of an existing model (a sketch follows this list).
  • We now have coverage reports.
  • We moved from Azure to GitHub Actions, which is less error-prone and has better integration.
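A hedged sketch of what `add_noise_to_params()` might look like, assuming `params` is a pandas DataFrame with a "value" column as elsewhere in respy; the signature and noise model are illustrative, not the PR's actual implementation:

```python
import numpy as np


def add_noise_to_params(params, seed=None, scale=0.05):
    """Return a copy of params whose "value" column is perturbed by noise."""
    rng = np.random.default_rng(seed)
    noisy = params.copy()
    # Multiplicative Gaussian noise keeps the sign and rough magnitude of
    # each parameter while still producing a different model.
    noisy["value"] *= 1 + rng.normal(0, scale, size=len(noisy))
    return noisy
```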

@codecov

codecov bot commented Feb 11, 2020

Codecov Report

Merging #320 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #320   +/-   ##
=======================================
  Coverage   81.67%   81.67%           
=======================================
  Files          38       38           
  Lines        2467     2467           
=======================================
  Hits         2015     2015           
  Misses        452      452           


@tobiasraabe tobiasraabe changed the title Make test suite more random and add add_noise_to_params. Make test suite more random and add add_noise_to_params, move to Github Actions, etc.. Feb 12, 2020
@@ -35,6 +31,46 @@ def validate_options(o):
)


def validate_params(params, optim_paras):
Member

I don't get the gain of this function. Am I missing something, or is this style-guide stuff?

Member Author

Previously, we sorted the shocks within respy. Unfortunately, if we optimize with estimagic, estimagic assumes that the shocks are already in the correct format for the constraint before `params` even reaches respy. Now, we throw an error if the shock matrix is not correctly sorted.
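A sketch of the kind of check described here; the `"shocks_sdcorr"` index label and the error message are assumptions for illustration, not respy's actual code:

```python
def _validate_shock_ordering(params, expected_labels):
    """Raise if the shock parameters are not sorted as the index requires."""
    actual = list(params.loc["shocks_sdcorr"].index)
    if actual != list(expected_labels):
        raise ValueError(
            f"Shock parameters must be ordered as {list(expected_labels)}, "
            f"got {actual} instead."
        )
```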


corrs_flat = []
for i, c_1 in enumerate(choices):
for c_2 in choices[: i + 1]:
Member

Isn't there an easier way to get all combinations of two elements with itertools or a similar package?

Member Author

Due to the different labels for diagonal elements and lower-triangular elements, I would say no :). Using itertools would probably involve generating more than necessary and deleting some elements. I found this cleaner because the labels are generated in the same order as required by the index.
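For illustration, here is a completed version of the loop under assumed label conventions (`sd_<choice>` for diagonal elements, `corr_<c1>_<c2>` for lower-triangular ones); the actual labels in respy may differ:

```python
choices = ["a", "b", "edu"]  # hypothetical choice set

corrs_flat = []
for i, c_1 in enumerate(choices):
    for c_2 in choices[: i + 1]:
        if c_1 == c_2:
            # Diagonal element: standard deviation of the shock.
            corrs_flat.append(f"sd_{c_1}")
        else:
            # Lower-triangular element: correlation between two shocks.
            corrs_flat.append(f"corr_{c_1}_{c_2}")

# corrs_flat == ["sd_a", "corr_b_a", "sd_b", "corr_edu_a", "corr_edu_b", "sd_edu"]
```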

respy/pre_processing/model_processing.py
>>> np.round(np.bincount(choices), decimals=-3) / n_samples
array([0.4, 0. , 0.6])

"""
if isinstance(choices, int):
Member

Why should we have so many cases/choices at such a low level? Wouldn't it be better to make the container of choices uniform at some point?

Member Author

@tobiasraabe tobiasraabe Feb 13, 2020

Although the function is low-level, it is actually a vectorized replacement for `numpy.random.choice`. It is only used in two different places, but it needs to support ints and dict keys as choices; tuples and lists are accepted because supporting them changes nothing. One could remove the `np.ndarray` check and the conversion to scalars at the end and defer some steps to where the function is applied, but keeping it this way reduces code and mental effort in other places. A sketch of the normalization follows.
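A hedged sketch of the input normalization described above; the helper name is hypothetical and respy's actual function is more involved:

```python
import numpy as np


def _normalize_choices(choices):
    """Normalize int, dict, tuple, list, or ndarray inputs to an array."""
    if isinstance(choices, int):
        # An int n stands for the choices 0, ..., n - 1.
        choices = np.arange(choices)
    elif isinstance(choices, dict):
        # For dicts, the keys are the choices.
        choices = np.asarray(list(choices))
    else:
        # Tuples, lists, and arrays pass through unchanged.
        choices = np.asarray(choices)
    return choices
```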

@tobiasraabe tobiasraabe merged commit 187bad3 into master Feb 13, 2020
@tobiasraabe tobiasraabe deleted the random-tests branch February 13, 2020 16:10
@tobiasraabe tobiasraabe added this to the 2.0.0 milestone Feb 16, 2020