Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUF Bug Fixes #150

Merged
merged 2 commits into from
Feb 13, 2018
Merged

PUF Bug Fixes #150

merged 2 commits into from
Feb 13, 2018

Conversation

andersonfrailey
Copy link
Collaborator

This PR fixes some of the bugs created in PR #114. Namely, it includes n1820 in the final puf.csv file, converts all of the PUF columns to int64 types, and replaces p87521 with e87521.

In add_nonfilers.py, I modified the code to use a deepcopy of the non filer data frame to address a Pandas warning message. It doesn't affect the results in any way.

cc @martinholmer

@martinholmer
Copy link
Contributor

@andersonfrailey said:

This PR fixes some of the bugs created in PR #114. Namely, it includes n1820 in the final puf.csv file, converts all of the PUF columns to int64 types, and replaces p87521 with e87521.

Haven't looked at the code changes, but "converts all of the PUF columns to int64 types" is not the way the 2009 puf.csv file was formatted. Look at this Tax-Calculator code from the Records class constructor:

 def __init__(self,
                 data='puf.csv',
                 exact_calculations=False,
                 gfactors=Growfactors(),
                 weights=PUF_WEIGHTS_FILENAME,
                 adjust_ratios=PUF_RATIOS_FILENAME,
                 benefits=None,
                 start_year=PUFCSV_YEAR):
        # pylint: disable=too-many-arguments
        self.__data_year = start_year
        # read specified data
        self._read_data(data, exact_calculations)
        # check that three sets of split-earnings variables have valid values
        msg = 'expression "{0} == {0}p + {0}s" is not true for every record'
        tol = 0.020001  # handles "%.2f" rounding errors
        if not np.allclose(self.e00200, (self.e00200p + self.e00200s),
                           rtol=0.0, atol=tol):
            raise ValueError(msg.format('e00200'))
        if not np.allclose(self.e00900, (self.e00900p + self.e00900s),
                           rtol=0.0, atol=tol):
            raise ValueError(msg.format('e00900'))
        if not np.allclose(self.e02100, (self.e02100p + self.e02100s),
                           rtol=0.0, atol=tol):
            raise ValueError(msg.format('e02100'))
        # check that ordinary dividends are no less than qualified dividends
        other_dividends = np.maximum(0., self.e00600 - self.e00650)
        if not np.allclose(self.e00600, self.e00650 + other_dividends,
                           rtol=0.0, atol=tol):
            msg = 'expression "e00600 >= e00650" is not true for every record'
            raise ValueError(msg)
        # handle grow factors

So, as you can see, there were a few variables in the 2009 puf.csv file that had :.2f format; the rest were integers. So, with integers I wound expect these allclose calls to return False.

Look back at the content of the 2009 puf.csv and you'll see what I'm talking about.

Also, what's happened with the p25470 variable?
Has it been dropped out of this newest version of the 2011 puf.csv file?

@martinholmer
Copy link
Contributor

@martinholmer asked:

Also, what's happened with the p25470 variable?
Has it been dropped out of this newest version of the 2011 puf.csv file?

So, it's still in the new 2011 PUF file with all zero values.
If it has really been removed from the 2011 IRS-SOI PUF, then don't we need to remove it from our 2011 puf.csv file? Leaving it at all zeros will cause errors in the correlation-coefficient calculations.

@martinholmer
Copy link
Contributor

@andersonfrailey, Seems like I was wrong about the need for the :.2f format for the variables split into p and s subtotals. That part of the new puf.csv works fine. So, the only remaining problem is the inclusion of the e25470 variables that is all zeros.

@andersonfrailey
Copy link
Collaborator Author

So, it's still in the new 2011 PUF file with all zero values.
If it has really been removed from the 2011 IRS-SOI PUF, then don't we need to remove it from our 2011 puf.csv file? Leaving it at all zeros will cause errors in the correlation-coefficient calculations.

It looks like it's been dropped by the IRS. Thanks for pointing this out. Unfortunately there's no direct mention of this change in the documentation it just disappears.

@martinholmer
Copy link
Contributor

@andersonfrailey, Thanks so much for the quick responses on these lingering problems with the 2011 puf.csv file. So, over the course of this work you found two problems in the 2011 IRS-SOI PUF: the undocumented change in the e variable that we use to compute the box 14 amounts and now the unannounced removal of the p25470 variable. Good work!

@martinholmer
Copy link
Contributor

@andersonfrailey, Using the latest version of puf.csv in Tax-Calculator works smoothly.
So, as far as I'm concerned, you've finished a good job and can merge pull request #150.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants