
Regression testing #316

Merged (19 commits) on Nov 27, 2017
Conversation

@hdoupe (Contributor) commented Sep 11, 2017

Addresses PRs #309 and #232, and webapp-public issue #539.

The initial commits just get the ball rolling on the regression testing plan. I ran the 10 reforms that @jdebacker listed in this comment. I used the Pool class from the multiprocessing module to run the reforms in parallel; on my MacBook with 4 cores, it took about 12 hours to run all of them. I used a modified version of ogusa/run_examples/run_ogusa_serial.py to do this. The outputs are 20 directories: a reform and a baseline directory for each reform.
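For context, here is a minimal sketch of the parallel setup; `run_reform` is a hypothetical stand-in for the per-reform logic in the modified script:

```python
# Minimal sketch of the parallel driver; run_reform is a hypothetical
# stand-in for the per-reform logic in the modified run_ogusa_serial.py,
# which builds the reform, sets up output dirs, and calls the OG-USA runner.
from multiprocessing import Pool

def run_reform(reform_id):
    print('running reform', reform_id)  # placeholder for the real work
    return reform_id

if __name__ == '__main__':
    reform_ids = list(range(10))     # the 10 reforms from the linked comment
    with Pool(processes=4) as pool:  # 4 cores on my MacBook
        pool.map(run_reform, reform_ids)
```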

TODO:

  • Verify that the reforms are created correctly (correct parameters, using puf.csv or cps.csv)
  • Perform sensitivity analysis to make sure that results are not changing dramatically in response to small changes in the reforms
  • Choose tolerance for differences between future results and saved results

I have some concerns with the initial commits. It took only 12 hours to run all 10 reforms, and I expected it to take much longer, so I probably did not set the parameters correctly. Further, I cannot tell whether the puf.csv file or test data was used, since I cannot find a place in OG-USA where puf.csv is explicitly passed to taxcalc.Records.

@jdebacker @rickecon @MattHJensen @PeterDSteinberg

The review thread below is attached to this kwargs snippet, shown here as a diff of the two versions visible in the review (truncated in the review view):

```diff
-kwargs = {'output_base': output_base, 'baseline_dir': BASELINE_DIR,
-          'test': True, 'time_path': True, 'baseline': False,
+kwargs = {'output_base': output_base, 'baseline_dir': baseline_dir,
+          'test': False, 'time_path': True, 'baseline': False,
           'analytical_mtrs': False, 'age_specific': True,
           'user_params': user_params, 'guid': '', 'reform': reform,
           'run_micro': False, 'small_open': False, 'budget_balance': False,
```
@jdebacker (Member) commented on the diff:

@hdoupe One setting affecting your run times is the kwarg `run_micro: False`. `run_micro` is a boolean that is true if the tax functions are to be estimated. Estimation takes about 2 hours on my MacBook Pro (though it's embarrassingly parallelizable). You will need this to be true unless you have pickle files with the saved parameters. You can look at the parameters.py script to see how those pickle files are referenced in the case that the tax function parameters for the reforms are pre-estimated. They do use the guid, so you'll probably want to set that to be specific to each reform.
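For instance, a sketch (not the exact script) of encoding the reform index in the guid; the reform dicts and surrounding kwargs are placeholders:

```python
# Sketch: give each run a distinct guid so the pickled tax-function
# parameters (looked up via parameters.py) are keyed per reform. The reform
# dicts and the kwargs shown are placeholders, not the exact script.
reforms = [{'_II_rt_7': [0.35]}, {'_CG_rt': [0.25]}]  # illustrative reforms

for i, reform in enumerate(reforms):
    kwargs = {
        'guid': 'reform_{}'.format(i),  # unique per reform
        'reform': reform,
        'run_micro': True,              # estimate tax functions (~2 hours each)
        'small_open': False,
        'budget_balance': False,
    }
    print(kwargs['guid'])
```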

@hdoupe (Contributor, Author) replied:

Thanks @jdebacker. I'll look into that.

Should small_open be set to True?

@jdebacker (Member) replied:

Let's set it to False for these runs. I think that's a more interesting case of the model.

@jdebacker (Member) replied:

And keep budget_balance=False.

@hdoupe (Contributor, Author) replied:

OK, sounds good. Thanks.

@hdoupe (Contributor, Author) commented Sep 13, 2017

@jdebacker One more question before I run the reforms again: where should puf.csv be in the OG-USA directory? I'm having trouble figuring out where that path is set.

@jdebacker (Member) replied:

@hdoupe Good question. I think it needs to be in the same directory as the run_ogusa_serial.py script.

I don't know Tax-Calculator well enough: is there any way we can run this such that it gets puf.csv from the Tax-Calculator directory, i.e., so we don't have to keep this file in more than one place?

BTW, now that you are doing things in parallel, you might rename that script to something like run_ogusa.py or ogusa_regression_parallel.py.

@hdoupe (Contributor, Author) commented Sep 13, 2017

@jdebacker OK, I'll give that a shot.

> I don't know Tax-Calculator well enough: is there any way we can run this such that it gets puf.csv from the Tax-Calculator directory, i.e., so we don't have to keep this file in more than one place?

That would work if I used my local Tax-Calculator repo, but I'm not sure we can do that with a conda install. It looks like get_micro_data.get_calculator takes a keyword argument, data. I'll try to figure out what's getting passed to that.
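Something like this sketch is what I have in mind (only the data keyword is confirmed; the import path and other argument names are guesses):

```python
# Hedged sketch: load the PUF once and pass the DataFrame through the
# `data` keyword, instead of relying on puf.csv sitting in the current
# working directory. Only `data` is confirmed here; the import path and
# the other argument names are guesses at get_calculator's signature.
import pandas as pd
from ogusa.get_micro_data import get_calculator

puf = pd.read_csv('puf.csv')  # pandas also reads 'puf.csv.gz' transparently
calc = get_calculator(baseline=True, calculator_start_year=2017, data=puf)
```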

> BTW, now that you are doing things in parallel, you might rename that script to something like run_ogusa.py or ogusa_regression_parallel.py.

OK, that makes sense. Now I see that you have run_many_ogusa_parallel.py, which is pretty similar to the script in this PR. I'll merge those.

Thanks

@hdoupe (Contributor, Author) commented Sep 19, 2017

Commit "Estimate tax functions, explicitly set puf data" incorporates the suggestions from @jdebacker. I'm running the reforms now. Hopefully, I'll have results in a day or two.

@hdoupe (Contributor, Author) commented Sep 29, 2017

@jdebacker I have finally run all of the reforms. It took some time to figure out how to set the path for the tax function parameter files, and I had to devote a lot of my time to the webapp over the last few weeks. Going forward, I'll be able to spend more time on this project, so the turnaround time on these issues should decrease.

Some of the reforms had errors, but I edited them just to get the model to run. Can you take a look at the edited reforms to make sure they are what you want?

Some of the reforms are slight variations of other reforms. I plotted the percent change for the similar reforms together and the unrelated reforms by themselves. The plots are in run_examples/regression_plots and the file that generated them is run_examples/regression_plots.py. Most of the results look sensible, in that the reforms that were similar had similar-looking plots and the differences between them made sense. Can you take a look at these results and let me know what you think?
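Roughly, the comparison behind those plots amounts to this (a simplified sketch; the dict-of-arrays layout and the variable shown are illustrative, not the exact regression_plots.py):

```python
# Simplified sketch of the percent-change comparison behind the plots;
# the dict-of-arrays layout and the 'K' key are placeholders, not the
# exact on-disk format of the saved regression output.
import numpy as np
import matplotlib.pyplot as plt

base = {'K': np.linspace(10.0, 12.0, 40)}           # placeholder baseline path
reform = {'K': np.linspace(10.0, 12.0, 40) * 1.02}  # placeholder reform path

pct = 100 * (reform['K'] - base['K']) / base['K']   # percent change by period
plt.plot(pct)
plt.xlabel('period along transition path')
plt.ylabel('% change in aggregate capital K')
plt.savefig('pct_change_K.png')
```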

@rickecon @MattHJensen

@jdebacker (Member) replied:

@hdoupe This is great, thanks. I'll take a look and get back to you about the results.

@rickecon (Member) commented Oct 3, 2017

@hdoupe @jdebacker: These 10 reforms look good. @hdoupe and I went over them last Friday, and reforms 0, 1, 2, 6, 7, 8, and 9 checked out. I just finished checking 3, 4, and 5 today, and those results check out as well.

In verifying the validity of these OG-USA runs, as well as others in the past, a common pattern of steps emerges.

  1. Start with the steady-state to make sure those results make sense. They usually do, but they also include the effects of the budget closure rules.
  2. Look at the differences in time paths of the aggregate variables to see if their results are intuitive. Often, the steady-state is intuitive while the transition path is more difficult to explain.
  3. Test whether the reform has the expected effect on the tax functions. Create a difference in ETRs, MTRx's (labor income), and MTRy's (capital income) from the baseline to the reform and make sure that the change fits the expectation (see the sketch after this list).
  4. Check the baseline and reform time paths of aggregate labor and the underlying difference in the distribution of labor supply over the time path.
  5. Check the baseline and reform time paths of the aggregate capital stock and the underlying difference in the distribution of savings over the time path. This is where difficulty often arises: there are demographic effects, and investment is the change over time in the aggregate capital stock, so, for example, the difference in the aggregate capital stock can be negative while the difference in aggregate investment is positive.
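A minimal sketch of the kind of check in step 3, with placeholder arrays standing in for the estimated tax-function values:

```python
# Sketch of step 3: difference the reform and baseline tax rates and check
# that the sign matches the reform's intent (e.g., a rate cut should lower
# ETRs). The arrays are placeholders for the estimated tax-function values.
import numpy as np

etr_base = np.array([0.10, 0.15, 0.22])    # placeholder baseline ETRs
etr_reform = np.array([0.09, 0.13, 0.20])  # placeholder reform ETRs

diff = etr_reform - etr_base
assert (diff <= 0).all(), 'expected the reform to lower ETRs everywhere'
```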

hdoupe mentioned this pull request on Oct 27, 2017
@hdoupe (Contributor, Author) commented Nov 1, 2017

I think this PR is close to being ready to go, but there are a few things that need to be done first.

First, I made a few changes that really should be split out into separate PRs.

  1. I added a data keyword argument to execute.runner, txfunc.get_tax_func_estimate, txfunc.tax_func_estimate, and get_micro_data.get_data. This allowed me to load a DataFrame and pass it to get_micro_data.get_calculator. This isn't strictly necessary, since Tax-Calculator looks for puf.csv in the current working directory, but I think it's a useful capability because it lets you specify puf.csv.gz or potentially use the CPS file. It should not be an invasive change, since we just need to add a data keyword to the functions listed above (see the sketch after this list).

  2. I ran into some problems when specifying the directory where the results were sent. For example, you can only specify the guid in parameters.get_parameters, which is where the tax function variables are read. I wanted to put all of the output files in one directory with a name of my choosing, such as REG_REFORM_2. I made some changes in txfunc.get_tax_func_estimate and parameters.get_parameters that allow this. I think these changes make it more explicit where the output is stored, which makes the model easier to use.
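A hedged sketch of how (1) and (2) fit together: only the data keyword is new in this PR, the other kwargs echo the snippet earlier in the thread, and the import path, directory names, and reform dict are illustrative:

```python
# Hedged sketch combining (1) and (2): pass a DataFrame through the new
# `data` keyword and point all output at one named directory. The import
# path, paths, and reform dict are assumptions for illustration.
import pandas as pd
from ogusa.scripts.execute import runner  # exact import path is an assumption

puf = pd.read_csv('puf.csv.gz')  # or the CPS file, now that data is explicit
runner(output_base='REG_REFORM_2',           # all output in one chosen place
       baseline_dir='REG_REFORM_2/baseline',
       test=False, time_path=True, baseline=False,
       analytical_mtrs=False, age_specific=True,
       user_params={}, guid='reg_reform_2',  # guid keys the tax-function pickles
       reform={'_II_rt_7': [0.35]},          # placeholder reform dict
       run_micro=True, small_open=False, budget_balance=False,
       data=puf)                             # the new keyword from this PR
```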

Second, I'm not sure where the tolerance should be set for each parameter. By default, I set the relative tolerance to 0.001. I think it will take a few modifications to the code and subsequent tests to get the tolerances dialed in, but if anyone has recommendations for where they should be set now, I'm willing to change them.

@jdebacker @rickecon @PeterDSteinberg what are your thoughts on this?

@jdebacker (Member) replied:

@hdoupe I like both your additions (1) and (2). Nice work!

Re the tolerance: by relative, do you mean that 0.001 looks for a 0.1% difference and, if larger, flags it as an issue? If so, I think that's a reasonable starting point. As you said, we may need to adjust this.

@hdoupe (Contributor, Author) commented Nov 2, 2017

@jdebacker Thanks, I'll try to get those up today.

By relative tolerance I mean 0.1%; I just used the rtol argument of numpy.allclose.
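Concretely, the check amounts to something like this (the arrays are placeholders for regression output):

```python
# What the check amounts to: flag any saved/new pair whose relative
# difference exceeds 0.1%. The arrays are placeholders for regression output.
import numpy as np

saved = np.array([1.000, 2.000, 3.000])
new = np.array([1.0005, 2.001, 3.004])      # last entry differs by ~0.13%
print(np.allclose(new, saved, rtol=0.001))  # False -> flagged for review
```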

@hdoupe (Contributor, Author) commented Nov 14, 2017

@jdebacker @rickecon do you think the regression data should be compressed? Compressed, all of the regression data is 93 MB.
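For the compression itself, something as simple as gzipping each saved file would do; in this sketch the directory name and file pattern are placeholders:

```python
# Sketch: gzip every saved regression pickle; 'regression_data' and the
# .pkl pattern are placeholders for the actual layout of the saved results.
import gzip
import shutil
from pathlib import Path

for path in Path('regression_data').rglob('*.pkl'):
    with open(path, 'rb') as src, gzip.open(str(path) + '.gz', 'wb') as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()  # remove the uncompressed original
```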

Also, what are the next steps for this PR? I think the tolerances will need some adjusting, but it's hard to do that without having results to compare to the regression results.

@jdebacker (Member) replied:

@hdoupe Sure, let's compress. I don't see the downside.

Shall we go ahead and merge this? We can make any adjustments to the tolerance in another PR.

@hdoupe (Contributor, Author) commented Nov 27, 2017

@jdebacker OK, sounds good. The most recent commit compresses the files. This should be good to go.

@jdebacker (Member) replied:

Thanks @hdoupe! Merging.

jdebacker merged commit c3d8bfe into PSLmodels:master on Nov 27, 2017
keiirizawa pushed a commit to keiirizawa/OG-USA that referenced this pull request Mar 20, 2019