Add stage 3 adjustment ratios #1193

andersonfrailey · 2017-02-13T19:07:55Z

This PR adds a "stage 3" to the extrapolation/blowup process. This has been discussed extensively in TaxData PR #58 and TaxCalc issue #1110. Results of adding this step can be seen in this notebook.

Key changes are:

Adding puf_ratios.csv, the file that contains the adjustment ratios
Modifying records.py to read and apply the adjustment ratios
Updating results in the pufcsv_xxx_expect.txt files and reform_results.txt
Adding an --adjust argument to inctax.py and modifying incometaxio.py to accept that input

The implementation of stage 3 required adding a new variable to the PUF that indicates what adjustment ratio needs to be applied to each record, so a new PUF file must be issued that contains this. More information on this variable can be found in TaxData issue #58.
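
As a rough illustration of the idea described above (not the code in records.py), here is a minimal sketch. It assumes each record carries an integer index, here the hypothetical column `ratio_bin`, that selects one ratio from a table of ratios for a given year. The column names, the bin layout, the ratio values, and the choice of interest income as the adjusted amount are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical per-record data: an income amount and the index of the
# adjustment ratio that applies to the record (names are assumptions).
records = pd.DataFrame({
    'interest': [100.0, 250.0, 40.0, 0.0],
    'ratio_bin': [0, 2, 1, 2],
})

# Hypothetical ratios for a single year, indexed by bin number.
ratios_for_year = pd.Series([1.10, 0.95, 1.00], index=[0, 1, 2])

# Look up each record's ratio by its bin index, then scale the income.
record_ratio = records['ratio_bin'].map(ratios_for_year)
records['interest_adjusted'] = records['interest'] * record_ratio

print(records)
```

The lookup-then-multiply pattern is why the data file needs the extra per-record variable: without it there is no way to know which ratio belongs to which record.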

@martinholmer @MattHJensen @codykallen

codecov-io · 2017-02-13T19:21:41Z

Codecov Report

Merging #1193 into master will decrease coverage by 0.03%.
The diff coverage is 97.14%.

@@            Coverage Diff             @@
##           master    #1193      +/-   ##
==========================================
- Coverage   98.87%   98.85%   -0.03%     
==========================================
  Files          38       38              
  Lines        3023     3054      +31     
==========================================
+ Hits         2989     3019      +30     
- Misses         34       35       +1

Impacted Files	Coverage Δ
taxcalc/incometaxio.py	`97.48% <100%> (+0.13%)`	✅
taxcalc/records.py	`95.41% <95.65%> (+0.02%)`	✅

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc40a1e...2832d85.

martinholmer · 2017-02-14T00:22:48Z

@andersonfrailey, All the unit tests (other than the requires_pufcsv tests) passed on GitHub when you created PR#1193, but the code coverage test failed. PR#1193 adds 8 statements that are not tested in the unit tests, which raises the total number of untested statements from 34 on master to 42 on #1193. Four of the eight are in the Records class and four more are in the IncomeTaxIO class.

Use this page to see exactly which statements need to have new/expanded unit tests.

Covering these eight statements will require enhancements to the test_records.py and test_incometaxio.py files.

@MattHJensen

andersonfrailey · 2017-02-14T16:02:14Z

@martinholmer, thanks for the tips. I worked on some enhancements to the two tests mentioned, but I'm not totally sure I got it right. Would you mind reviewing my latest commit?

martinholmer · 2017-02-14T16:34:36Z

@andersonfrailey said:

I worked on some enhancements to the two tests [you] mentioned, but I'm not totally sure I got it right. Would you mind reviewing my latest commit?

Your test enhancements covered seven of the eight newly uncovered statements.
So that's great. Thanks for the quick work on this.

When I looked at the one remaining new uncovered statement, which was in the records.py file, I found this new private method you are adding to the Records class:

    def _read_adjust(self, adjust_ratios):
        """
        Read Records adjustment ratios from file or uses specified DataFrame
        as data or creates empty DataFrame if None
        """
        if adjust_ratios is None:
            ADJ = pd.DataFrame({'nothing': []})
            setattr(self, 'ADJ', ADJ)
            return
        if isinstance(adjust_ratios, pd.DataFrame):
            ADJ = adjust_ratios
        elif isinstance(adjust_ratios, six.string_types):
            if os.path.isfile(adjust_ratios):
                ADJ = pd.read_csv(adjust_ratios, index_col=0)
                ADJ = ADJ.transpose()
            else:
                ADJ = Records._read_egg_csv('adjust_ratios',
                                            Records.ADJUST_RATIOS_FILENAME)
        else:
            msg = ('adjust_ratios is not None or a string '
                   'or a Pandas DataFrame')
            raise ValueError(msg)
        setattr(self, 'ADJ', ADJ)

The one remaining uncovered statement in this function is the read_egg_csv call. As you've probably noticed when looking at the unit test results, we have never figured out a way to add unit tests for these read_egg_csv calls, which are really needed by TaxBrain (not Tax-Calculator). So, given our limited understanding of this, the addition of another input data file in #1193 involves an unavoidable extra uncovered statement.

However, in looking at this code there is one thing that must be changed. My understanding of the "egg magic" is that the egg contains the exact same puf_ratios.csv file as we have stored on disk in the taxcalc directory (although it might be compressed for efficiency's sake). If my understanding is correct, then the file read from the egg must be transposed (just as you have transposed the file read from disk). This means you need to move the ADJ = ADJ.transpose() statement down so it is the last statement in the elif isinstance(adjust_ratios, six.string_types): code block.

Does that make sense? If so, then fix it and then I'll be happy to merge #1193.
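
To make the orientation issue concrete, here is a minimal sketch of what `index_col=0` followed by `transpose()` does to a small table. The CSV layout below (one row per ratio series, one column per year) and its labels are assumptions for illustration only; the actual layout of puf_ratios.csv is not shown in this thread.

```python
import io
import pandas as pd

# Hypothetical file contents; the row and column labels are assumptions.
csv_text = "agi_bin,2014,2015,2016\nbin0,1.10,1.05,1.02\nbin1,0.95,0.98,0.99\n"

# Reading with index_col=0 makes the first column the row index.
as_stored = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Transposing swaps the axes: years become rows, bins become columns.
transposed = as_stored.transpose()

print(as_stored.shape, list(as_stored.index))
print(transposed.shape, list(transposed.columns))
```

If only one of the two read paths transposes, downstream code that indexes the table by row and column would see different axes depending on where the file came from.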

Question about test coverage:
Can anyone suggest a way to add unit tests for the read_egg_csv calls in Tax-Calculator?

@MattHJensen @Amy-Xu @PeterDSteinberg @zrisher

PeterDSteinberg · 2017-02-14T17:21:44Z

@martinholmer In the quoted code above:

            if os.path.isfile(adjust_ratios):
                ADJ = pd.read_csv(adjust_ratios, index_col=0)
                ADJ = ADJ.transpose()
            else:
                ADJ = Records._read_egg_csv('adjust_ratios',
                                            Records.ADJUST_RATIOS_FILENAME)

ADJ should be the same whether loaded locally with pd.read_csv or Records._read_egg_csv.

A test of _read_egg_csv might be just a simple test that runs both pd.read_csv and Records._read_egg_csv, as they are called here, then asserts the ADJ output dataframe is identical.

andersonfrailey · 2017-02-14T18:45:41Z

@martinholmer, thanks for the clarification. I've added transpose() to the egg file.

@PeterDSteinberg, are you suggesting something like this in test_records.py:

adj = pd.read_csv(Records.ADJUST_RATIOS_PATH)
adj = adj.transpose()
adj_egg = Records._read_egg_csv('adjust_ratios', Records.ADJUST_RATIOS_FILENAME)
adj_egg = adj_egg.transpose()
assert adj == adj_egg

?
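
One detail worth noting about a comparison like the one above: `adj == adj_egg` on two DataFrames returns an element-wise DataFrame, and using that in a bare `assert` raises a ValueError about an ambiguous truth value. A minimal sketch of an alternative uses `pandas.testing.assert_frame_equal` (its location in current pandas; older releases exposed it under `pandas.util.testing`). The in-memory CSV stands in for both the on-disk file and the egg resource, which is an assumption made so the example runs without an installed taxcalc egg.

```python
import io
import pandas as pd
from pandas.testing import assert_frame_equal

# Stand-in contents for the ratios file (hypothetical values and labels).
csv_text = "agi_bin,2014,2015\nbin0,1.10,1.05\nbin1,0.95,0.98\n"

# Two independent reads, standing in for the disk read and the egg read.
adj = pd.read_csv(io.StringIO(csv_text), index_col=0).transpose()
adj_egg = pd.read_csv(io.BytesIO(csv_text.encode()), index_col=0).transpose()

# Raises AssertionError with a readable diff if the frames differ.
assert_frame_equal(adj, adj_egg)

# DataFrame.equals returns a single boolean, which is safe inside assert.
frames_match = adj.equals(adj_egg)
print(frames_match)
```

Both reads have to use the same keyword arguments (for example `index_col=0`) for the comparison to be meaningful.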

martinholmer · 2017-02-14T19:10:00Z

@andersonfrailey changed the code to be this way:

            if os.path.isfile(adjust_ratios):
                ADJ = pd.read_csv(adjust_ratios, index_col=0)
                ADJ = ADJ.transpose()
            else:
                ADJ = Records._read_egg_csv('adjust_ratios',
                                            Records.ADJUST_RATIOS_FILENAME)
                ADJ = ADJ.transpose()

I'm sorry I wasn't more clear. There is no need for code duplication. I meant to suggest this:

            if os.path.isfile(adjust_ratios):
                ADJ = pd.read_csv(adjust_ratios, index_col=0)
            else:
                ADJ = Records._read_egg_csv('adjust_ratios',
                                            Records.ADJUST_RATIOS_FILENAME)
            ADJ = ADJ.transpose()

martinholmer · 2017-02-14T19:19:50Z

@PeterDSteinberg said:

A test of _read_egg_csv might be just a simple test that runs both pd.read_csv and Records._read_egg_csv, as they are called here, then asserts the ADJ output dataframe is identical.

Thanks for the suggestion. Are you saying that the code below is going to execute just fine in our unit tests? If I have just the Tax-Calculator source code and no EGG, what's going to happen when I try to "grab vname data from EGG distribution"?

    def _read_egg_csv(vname, fpath, **kwargs):
        """
        Read csv file with fpath containing vname data from EGG;
        return dict of vname data.
        """
        try:
            # grab vname data from EGG distribution
            path_in_egg = os.path.join('taxcalc', fpath)
            vname_fname = resource_stream(
                Requirement.parse('taxcalc'), path_in_egg)
            vname_dict = pd.read_csv(vname_fname, **kwargs)
        except (DistributionNotFound, IOError):
            msg = 'could not read {} file from EGG'
            raise ValueError(msg.format(vname))
        return vname_dict
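
One possible way to exercise a function like this without an installed egg is to make the stream opener injectable, so that a test can pass in an in-memory stream. This is a sketch under that assumption, not the Tax-Calculator implementation: `read_packaged_csv`, its `open_stream` parameter, and the two fake openers are hypothetical names, and the function quoted above calls `resource_stream` directly.

```python
import io
import pandas as pd


def read_packaged_csv(vname, open_stream, **kwargs):
    """
    Hypothetical variant of _read_egg_csv: open_stream is a zero-argument
    callable returning a file-like object, so tests can supply a fake.
    """
    try:
        return pd.read_csv(open_stream(), **kwargs)
    except IOError:
        msg = 'could not read {} file from EGG'
        raise ValueError(msg.format(vname))


def fake_egg_stream():
    """Stand-in for a resource stream from an installed distribution."""
    return io.BytesIO(b"agi_bin,2014,2015\nbin0,1.10,1.05\n")


def missing_egg_stream():
    """Simulates the failure seen when the resource cannot be opened."""
    raise IOError('no such resource')


# Success path: the fake stream is parsed like a file read from disk.
adj = read_packaged_csv('adjust_ratios', fake_egg_stream, index_col=0)
adj = adj.transpose()
print(adj)

# Failure path: the IOError is converted into the ValueError message.
try:
    read_packaged_csv('adjust_ratios', missing_egg_stream)
except ValueError as err:
    print(err)
```

With this shape, both the success path and the error path can be covered in a source-only checkout; whether that trade-off suits the Records class is a design question for the maintainers.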

martinholmer · 2017-02-14T20:13:31Z

@andersonfrailey, taxcalc pull request #1193 -- and associated taxdata pull request 58 -- are substantial contributions that open up the possibility of more accurate distributions of income types other than taxable interest. Thank you for all this work.

I'm merging #1193 now, so from now on the new puf.csv file that was distributed by @andersonfrailey on 13-Feb-2017 is needed to get correct answers from Tax-Calculator.

@MattHJensen @feenberg @Amy-Xu @GoFroggyRun @codykallen @PeterDSteinberg

MattHJensen · 2017-02-14T20:54:03Z

Thanks @andersonfrailey!

andersonfrailey added 24 commits January 23, 2017 11:16

Add adjustment factors

aa11508

First round of test changes

adab05d

Second round of test changes

cb317be

Third round of test changes

2eac4fd

Commit before major changes

f86bb8d

Fix test_incometaxio errors

5675868

Fix all but pufcsv_agg erros

dfb2b4d

Merge remote-tracking branch 'open-source-economics/master' into Adju…

03c92e2

…stment

Final commit

e13d2fa

Fix syntax

33a45e5

commit updates

e22b9cb

commit updates

84548a4

Merge remote-tracking branch 'open-source-economics/master' into Adju…

a46076b

…stment # Conflicts: # taxcalc/comparison/reform_results.txt # taxcalc/tests/pufcsv_agg_expect.txt # taxcalc/tests/pufcsv_mtr_expect.txt

Merge remote-tracking branch 'open-source-economics/master' into Adju…

2d85b4a

…stment # Conflicts: # taxcalc/comparison/reform_results.txt # taxcalc/tests/pufcsv_agg_expect.txt # taxcalc/tests/pufcsv_mtr_expect.txt

Merge remote-tracking branch 'open-source-economics/master' into Adju…

1213046

…stment # Conflicts: # taxcalc/comparison/reform_results.txt # taxcalc/records.py # taxcalc/tests/pufcsv_agg_expect.txt # taxcalc/tests/pufcsv_mtr_expect.txt

working commmit

0f50672

Merge remote-tracking branch 'open-source-economics/master' into Adju…

78fa688

…stment

Merge remote-tracking branch 'open-source-economics/master' into Adju…

62cd3b6

…stment

transpose adjustment ratios

f4f0894

update tests

53933e6

Merge remote-tracking branch 'open-source-economics/master' into Adju…

30d5ac9

…stment # Conflicts: # taxcalc/comparison/reform_results.txt # taxcalc/tests/pufcsv_agg_expect.txt

Final tests

aeef63e

fix tests

b76b581

fix tests

a5c2d5a

talumbau added the in progress label Feb 13, 2017

enhance tests

de873ba

transpose adj in read egg

24f9625

transpose adj in read egg

2832d85

martinholmer merged commit bbf7b6e into PSLmodels:master Feb 14, 2017

talumbau removed the in progress label Feb 14, 2017

MattHJensen mentioned this pull request Feb 15, 2017

Release 0.7.6 #1182

Closed

martinholmer mentioned this pull request Apr 12, 2017

Adding an extra step to extraploation/records blowup #1110

Closed
