Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stage 3 adjustment ratios #1193

Merged
merged 27 commits into from
Feb 14, 2017
Merged

Add stage 3 adjustment ratios #1193

merged 27 commits into from
Feb 14, 2017

Conversation

andersonfrailey
Copy link
Collaborator

This PR adds a "stage 3" to the extrapolation/blowup process. This has been discussed extensively in TaxData PR #58 and TaxCalc issue #1110. Results of adding this step can be seen in this notebook.

Key changes are:

  • Adding puf_ratios.csv. This file contains the adjustment ratios
  • Modifying records.py to read and apply the adjustment ratios
  • Update results in pufcsv_xxx_expect.txt files and reform_results.txt
  • Add --adjust argument to inctax.py and modify incometaxio.py to accept that input

The implementation of stage 3 required adding a new variable to the PUF that indicates what adjustment ratio needs to be applied to each record, so a new PUF file must be issued that contains this. More information on this variable can be found in TaxData issue #58.

@martinholmer @MattHJensen @codykallen

…stment

# Conflicts:
#	taxcalc/comparison/reform_results.txt
#	taxcalc/tests/pufcsv_agg_expect.txt
#	taxcalc/tests/pufcsv_mtr_expect.txt
…stment

# Conflicts:
#	taxcalc/comparison/reform_results.txt
#	taxcalc/tests/pufcsv_agg_expect.txt
#	taxcalc/tests/pufcsv_mtr_expect.txt
…stment

# Conflicts:
#	taxcalc/comparison/reform_results.txt
#	taxcalc/records.py
#	taxcalc/tests/pufcsv_agg_expect.txt
#	taxcalc/tests/pufcsv_mtr_expect.txt
…stment

# Conflicts:
#	taxcalc/comparison/reform_results.txt
#	taxcalc/tests/pufcsv_agg_expect.txt
@codecov-io
Copy link

codecov-io commented Feb 13, 2017

Codecov Report

Merging #1193 into master will decrease coverage by -0.03%.
The diff coverage is 97.14%.

@@            Coverage Diff             @@
##           master    #1193      +/-   ##
==========================================
- Coverage   98.87%   98.85%   -0.03%     
==========================================
  Files          38       38              
  Lines        3023     3054      +31     
==========================================
+ Hits         2989     3019      +30     
- Misses         34       35       +1
Impacted Files Coverage Δ
taxcalc/incometaxio.py 97.48% <100%> (+0.13%)
taxcalc/records.py 95.41% <95.65%> (+0.02%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc40a1e...2832d85. Read the comment docs.

@martinholmer
Copy link
Collaborator

martinholmer commented Feb 14, 2017

@andersonfrailey, All the unit tests (other than the requires_pufcsv tests) passed on GitHub when you created PR#1193, but the code coverage test failed. PR#1193 adds 8 statements that are not tested in the unit tests, which raises the total number of untested statements from 34 on master to 42 on #1193. Four of the eight are in the Records class and four more are in the IncomeTaxIO class.

Use this page to see exactly which statements need to have new/expanded unit tests.

Covering these eight statements will require enhancements to the test_records.py and test_incometaxio.py files.

@MattHJensen

@andersonfrailey
Copy link
Collaborator Author

@martinholmer, thanks for the tips. I worked on some enhancements to the two tests mentioned, but I'm not totally sure I got it right. Would you mind reviewing my latest commit?

@martinholmer
Copy link
Collaborator

martinholmer commented Feb 14, 2017

@andersonfrailey said:

I worked on some enhancements to the two tests [you] mentioned, but I'm not totally sure I got it right. Would you mind reviewing my latest commit?

Your test enhancements covered seven of the eight previously new uncovered statements.
So that's great. Thanks for the quick work on this.

When I looked at the one remaining new uncovered statement, which was in the records.py file, I found this new private method you are adding to the Records class:

    def _read_adjust(self, adjust_ratios):
        """
        Read Records adjustment ratios from file or uses specified DataFrame
        as data or creates empty DataFrame if None
        """
        if adjust_ratios is None:
            ADJ = pd.DataFrame({'nothing': []})
            setattr(self, 'ADJ', ADJ)
            return
        if isinstance(adjust_ratios, pd.DataFrame):
            ADJ = adjust_ratios
        elif isinstance(adjust_ratios, six.string_types):
            if os.path.isfile(adjust_ratios):
                ADJ = pd.read_csv(adjust_ratios, index_col=0)
                ADJ = ADJ.transpose()
            else:
                ADJ = Records._read_egg_csv('adjust_ratios',
                                            Records.ADJUST_RATIOS_FILENAME)
        else:
            msg = ('adjust_ratios is not None or a string'
                   'or a Pandas DataFrame')
            raise ValueError(msg)
        setattr(self, 'ADJ', ADJ)

The one remaining uncovered statement in this function is the read_egg_csv call. As you've probably noticed when looking at the unit test results, we have never figured out a way to add unit tests for these read_egg_csv calls, which are really needed by TaxBrain (not Tax-Calculator). So, given our limited understanding of this, the addition of another input data file in #1193 involves an unavoidable extra uncovered statement.

However, in looking at this code there is one thing that must be changed. My understanding of the "egg magic" is that the egg contains the exact same puf_ratios.csv file as we have stored on disk in the taxcalc directory (although it might be compressed for efficiency sake). If my understanding is correct, then the file read from the egg must be transposed (just as you have transposed the file read from disk). This means you need to move the ADJ = ADJ.transpose() statement down so it is the last statement in the elif isinstance(adjust_ratios, six.string_types): code block.

Does that make sense? If so, then fix it and then I'll be happy to merge #1193.

Question about test coverage:
Can anyone suggest a way to add unit tests for the read_egg_csv calls in Tax-Calculator?

@MattHJensen @Amy-Xu @PeterDSteinberg @zrisher

@PeterDSteinberg
Copy link
Contributor

@martinholmer In the quoted code above:

            if os.path.isfile(adjust_ratios):
                ADJ = pd.read_csv(adjust_ratios, index_col=0)
                ADJ = ADJ.transpose()
            else:
                ADJ = Records._read_egg_csv('adjust_ratios',
                                            Records.ADJUST_RATIOS_FILENAME)

ADJ should be the same whether loaded locally with pd.read_csv or Records._read_egg_csv.

A test of _read_egg_csv might be just a simple test that runs both pd.read_csv and Records._read_egg_csv, as they are called here, then asserts the ADJ output dataframe is identical.

@andersonfrailey
Copy link
Collaborator Author

@martinholmer, thanks for the clarification. I've added transpose() to the egg file.

@PeterDSteinberg, are you suggesting something like this in test_records.py:

adj = pd.read_csv(Records.ADJUST_RATIOS_PATH)
adj = adj.transpose()
adj_egg = Records._read_egg_csv('adjust_ratios', Records.ADJUST_RATIOS_FILENAME)
adj_egg = adj_egg.transpose()
assert adj == adj_egg

?

@martinholmer
Copy link
Collaborator

@andersonfrailey changed the code to be this way:

            if os.path.isfile(adjust_ratios):
                ADJ = pd.read_csv(adjust_ratios, index_col=0)
                ADJ = ADJ.transpose()
            else:
                ADJ = Records._read_egg_csv('adjust_ratios',
                                            Records.ADJUST_RATIOS_FILENAME)
                ADJ = ADJ.transpose()

I'm sorry I wasn't more clear. There is no need for code duplication. I meant to suggest this:

            if os.path.isfile(adjust_ratios):
                ADJ = pd.read_csv(adjust_ratios, index_col=0)
            else:
                ADJ = Records._read_egg_csv('adjust_ratios',
                                            Records.ADJUST_RATIOS_FILENAME)
            ADJ = ADJ.transpose()

@martinholmer
Copy link
Collaborator

@PeterDSteinberg said:

A test of _read_egg_csv might be just a simple test that runs both pd.read_csv and Records._read_egg_csv, as they are called here, then asserts the ADJ output dataframe is identical.

Thanks for the suggestion. Are you saying the the code below is going to execute just fine in our unit tests? If I have just the Tax-Calculator source code and no EGG, what's going to happen when I try to "grab vname data from EGG distribution"?

    def _read_egg_csv(vname, fpath, **kwargs):
        """
        Read csv file with fpath containing vname data from EGG;
        return dict of vname data.
        """
        try:
            # grab vname data from EGG distribution
            path_in_egg = os.path.join('taxcalc', fpath)
            vname_fname = resource_stream(
                Requirement.parse('taxcalc'), path_in_egg)
            vname_dict = pd.read_csv(vname_fname, **kwargs)
        except (DistributionNotFound, IOError):
            msg = 'could not read {} file from EGG'
            raise ValueError(msg.format(vname))
        return vname_dict

@martinholmer
Copy link
Collaborator

@andersonfrailey, taxcalc pull request #1193 -- and associated taxdata pull request 58 -- are substantial contributions that open up the possibility of more accurate distributions of income types other than taxable interest. Thank you for all this work.

I'm merging #1193 now, so from now on the new puf.csv file that was distributed by @andersonfrailey on 13-Feb-2017 is needed to get correct answers from Tax-Calculator.

@MattHJensen @feenberg @Amy-Xu @GoFroggyRun @codykallen @PeterDSteinberg

@martinholmer martinholmer merged commit bbf7b6e into PSLmodels:master Feb 14, 2017
@MattHJensen
Copy link
Contributor

Thanks @andersonfrailey!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants