
Make dropq test fail like in issue 1367; then fix bug #1387

Merged 9 commits into master on Jun 6, 2017

Conversation

@martinholmer (Collaborator) commented Jun 3, 2017

Commit 6610a62 in this pull request revises an existing dropq test to have a policy reform like the one supplied by @PeterDSteinberg on June 2, 2017, in issue #1367. The revised test generates a fatal error indicating a bug somewhere in the Tax-Calculator code base. A subsequent commit in this pull request will fix this bug.

The policy reform in the new failing test consists of a single reform provision: an increase in 2017 of the FICA tax rate from 0.124 to 0.200. But other values of the FICA tax rate cause the exact same test to pass as shown below:

0.22        PASS
0.21        FAIL
0.20        FAIL
0.19999     PASS
0.19        PASS

For the record, here is the error traceback report:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

dropq/dropq.py:70: in run_nth_year_tax_calc_model
    itax_sum2, ptax_sum2, comb_sum2) = dropq_summary(rawres1, rawres2, mask)

dropq/dropq_utils.py:293: in dropq_summary
    df1, df2 = drop_records(df1, df2, mask)

dropq/dropq_utils.py:257: in drop_records
    df2['flag_dec'] = gp2_dec['mask'].transform(chooser)

/Users/mrh/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py:2990: in transform
    result.index = self._selected_obj.index

/Users/mrh/anaconda/lib/python2.7/site-packages/pandas/core/generic.py:2983: in __setattr__
    return object.__setattr__(self, name, value)
pandas/_libs/src/properties.pyx:65: in pandas._libs.lib.AxisProperty.__set__ (pandas/_libs/lib.c:45103)
    ???

/Users/mrh/anaconda/lib/python2.7/site-packages/pandas/core/series.py:308: in _set_axis
    self._data.set_axis(axis, labels)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SingleBlockManager
Items: Int64Index([     0,      1,      2,      3,      4, ...219812],
           dtype='int64', length=219813)
BoolBlock: 219813 dtype: bool
axis = 0
new_labels = Int64Index([   108,     46,     54,     26,     53,     68,     28,     24,
  ..., 172375,
            143487, 214120],
           dtype='int64', length=219814)

    def set_axis(self, axis, new_labels):
        new_labels = _ensure_index(new_labels)
        old_len = len(self.axes[axis])
        new_len = len(new_labels)
    
        if new_len != old_len:
            raise ValueError('Length mismatch: Expected axis has %d elements, '
                             'new values have %d elements' %
>                            (old_len, new_len))
E           ValueError: Length mismatch: Expected axis has 219813 elements, new values have 219814 elements

/Users/mrh/anaconda/lib/python2.7/site-packages/pandas/core/internals.py:2836: ValueError

@martinholmer (Collaborator, Author) commented Jun 3, 2017

Commit 7016899 makes the dropq_utils.py dropq_calculate function better able to handle policy reforms that are focused on payroll taxes rather than on income taxes. But these changes do not fix the bug detected by the revised dropq test implemented in commit 6610a62.

The Travis-CI error associated with 6610a62 seems to be a GitHub error: the failure was only on Python 3.5 because the tests never started. All the tests passed under Python 2.7 and 3.6.

@codecov-io commented Jun 3, 2017

Codecov Report

Merging #1387 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #1387      +/-   ##
==========================================
+ Coverage   99.64%   99.64%   +<.01%     
==========================================
  Files          38       38              
  Lines        2819     2826       +7     
==========================================
+ Hits         2809     2816       +7     
  Misses         10       10
Impacted Files Coverage Δ
taxcalc/dropq/dropq_utils.py 100% <100%> (ø) ⬆️
taxcalc/utils.py 99.32% <100%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2fadfed...0523649.

@martinholmer (Collaborator, Author)

Commit 5a79fa4 fixes the bug. This bug has existed for years, back to the beginning of work on utils.py functions. For example, look at the add_weighted_income_bins function as it existed on February 4, 2016, by following this link.

So, this bug has nothing to do with the recent refactoring of the dropq code or with the changes in pandas when moving from version 0.18 to 0.20. It has appeared now simply because @brittainhard happened to specify a very particular policy reform that revealed the bug.

@PeterDSteinberg @brittainhard @MattHJensen

@martinholmer (Collaborator, Author)

More about the bug in the add_weighted_income_bins Tax-Calculator utility function.

The revisions to the taxcalc/utils.py add_weighted_income_bins function in #1387 corrected two logic problems. The first was the fatal error described in the pull request and in issue #1367. The second was a logic problem that has never caused a fatal error but would have whenever the add_weighted_income_bins function was called with weight_by_income_measure=True as a parameter.

Both problems would cause the add_weighted_income_bins function to fail because not all the filing units are put into histogram bins. Why does this happen? The pandas.cut function converts a continuous variable into a categorical (or discrete-valued) variable using the values of the continuous variable and a set of bin edges. The problem with the old version of the add_weighted_income_bins function was that it incorrectly specified the bin edges.

The first problem was caused by the less-than-perfect precision of floating-point arithmetic performed by computers. The old code counted on perfect precision to get the top edge of the last bin to be precisely equal to the maximum value of the continuous variable. But the less-than-perfect precision of the computer caused a few filing units not to be placed into the last bin. You can see that is the case because commenting out just one of the new lines of code (see below) causes the error to recur. The following code works fine:

    bin_width = cumsum_range / float(num_bins)
    bin_edges = [-9e99] + list(np.arange(1, (num_bins + 1)) * bin_width)
    bin_edges[-1] += 9e9  # add to top of last bin to include all observations

while the following code generates a fatal error in the new test:

    bin_width = cumsum_range / float(num_bins)
    bin_edges = [-9e99] + list(np.arange(1, (num_bins + 1)) * bin_width)
    # bin_edges[-1] += 9e9  # add to top of last bin to include all observations
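
To see the mechanism in isolation, here is a minimal sketch (hypothetical data, not Tax-Calculator code) of how floating-point rounding can leave the maximum observation out of the top bin unless the last edge is raised:

    import numpy as np
    import pandas as pd

    cumsum = np.cumsum([0.1] * 30)  # top value is close to, but maybe not exactly, 3.0
    num_bins = 10
    bin_width = cumsum[-1] / float(num_bins)
    edges = [-9e99] + list(np.arange(1, num_bins + 1) * bin_width)
    binned = pd.Series(pd.cut(cumsum, edges))
    print(binned.isnull().sum())  # can be 1: the max value may fall just above the top edge
    edges[-1] = 9e99              # raise top of last bin to include all observations
    binned = pd.Series(pd.cut(cumsum, edges))
    print(binned.isnull().sum())  # 0: every observation is binned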

The second problem was that the old add_weighted_income_bins function code assumed incorrectly that every value of the continuous variable was non-negative. So, in the old code the bottom of the first bin was specified to be zero. This is correct whenever the function's weight_by_income_measure parameter is False, which it is by default. But if this parameter is set to True, all the filing units with negative expanded_income will not be placed in the first bin and this situation will generate a fatal error. If you're thinking there can't be any filing units with negative expanded_income, then think again. Here is a simple tabulation under current-law policy:

$ tc puf.csv 2017 --sqldb
You loaded data for 2009.
Tax-Calculator startup automatically extrapolated your data to 2017.
$ sqlite3 puf-17-#-#.db
SQLite version 3.13.0 2016-05-18 10:57:30
Enter ".help" for usage hints.
sqlite> select count(*),round(sum(s006)*1e-6,3) from dump;
219814|170.42
sqlite> select count(*),round(sum(s006)*1e-6,3) from dump where expanded_income<0;
5211|1.475
sqlite> .quit
$ 

So, the puf.csv sample has almost 2.4 percent of unweighted filing units with negative expanded_income. The weighted fraction is slightly less than 0.9 percent. These are small fractions, but with thousands not being put into the ill-defined first bin, the fatal error would have occurred. (I'm assuming that the negative expanded_incomes are usually caused by large capital losses or big business losses, but I really don't know the details of these people's situations.)
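
(As an arithmetic check, the quoted percentages follow from the tabulation above:)

    print(round(100 * 5211.0 / 219814.0, 2))  # 2.37 percent, unweighted
    print(round(100 * 1.475 / 170.42, 2))     # 0.87 percent, weighted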

@MattHJensen @feenberg @Amy-Xu @andersonfrailey @GoFroggyRun @codykallen
@PeterDSteinberg @brittainhard

@martinholmer (Collaborator, Author)

Note that the Saturday afternoon test failures are not really Tax-Calculator test failures, but the inability of Continuum Analytics web servers to transfer conda packages to Travis-CI. I don't know which end of that transaction is not working.

@martinholmer (Collaborator, Author)

Pull request #1387 changes the basic dropq algorithm in two ways. I want to emphasize this here so that those involved in the original development of that algorithm can give these changes a careful review.

Here is the relevant part of the dropq_calculate function with the two new/revised lines marked:

    # optionally compute mask
    if mask_computed:
        # create pre-reform Calculator instance with extra income
        recs1p = Records(data=taxrec_df.copy(deep=True),
                         gfactors=growfactors_pre)
        # add one dollar to total wages and salaries of each filing unit
        recs1p.e00200 += 1.0  # pylint: disable=no-member
        recs1p.e00200p += 1.0  # pylint: disable=no-member       <---- NEW LINE ----
        policy1p = Policy(gfactors=growfactors_pre)
        # create Calculator with recs1p and calculate for start_year
        calc1p = Calculator(policy=policy1p, records=recs1p,
                            consumption=consump)
        while calc1p.current_year < start_year:
            calc1p.increment_year()
        calc1p.calc_all()
        assert calc1p.current_year == start_year
        # compute mask that shows which of the calc1 and calc1p results differ
        res1 = results(calc1)
        res1p = results(calc1p)
        mask = (res1.combined != res1p.combined)                  <---- REVISED LINE ----
    else:
        mask = None

I added the NEW LINE so that payroll taxes would change as well as income taxes.
Apparently this dropq code was written before there was an appreciation that FICA is an individual tax, not an income-tax filing-unit tax.

I REVISED the other line so that the mask is defined by changes in the combined tax, not just by changes in the income tax amount.

Does this make sense? Or has my lack of knowledge about the dropq algorithm led me to make a mistake?

@MattHJensen @feenberg @talumbau

@talumbau (Member) commented Jun 3, 2017

I don't know the meaning of the e00200p variable, but I can tell you the original intention behind this portion of the code. The idea, as explained to me by @feenberg and @MattHJensen, is to create "noise" in the final results by dropping tax units in a careful way. The only tax units eligible to be dropped are those impacted by the reform. In other words, if the unit pays the same amount of tax under current policy and under the reform, then it is not eligible to be dropped. This is the purpose behind creating the mask variable. I hope this provides enough detail to help inform whether the changes you make in this PR are the right way to go. I think it will likely be up to your best judgment, along with that of @feenberg and @MattHJensen, to determine whether this change is best.
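
A schematic sketch of that masking idea (hypothetical data and dropping rule; not the actual dropq implementation):

    import numpy as np

    rng = np.random.RandomState(0)  # hypothetical filing units
    tax_base = rng.uniform(0.0, 10000.0, size=1000)
    tax_reform = tax_base + rng.choice([0.0, 100.0], size=1000)

    # Only units whose tax liability changes under the reform are
    # eligible to be dropped when adding noise to published results.
    mask = tax_base != tax_reform
    eligible = np.flatnonzero(mask)
    dropped = rng.choice(eligible, size=3, replace=False)  # drop a few eligible units
    print(mask.sum(), dropped)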

@feenberg (Contributor) commented Jun 3, 2017 via email

@martinholmer (Collaborator, Author)

@feenberg said:

I would prefer it if we left the dropq code [that computes the mask variable] dependent on the income tax only. ...[gives reasons]...

OK, I've changed back to the original definition of mask:

        mask = (res1.iitax != res1p.iitax)

@martinholmer changed the title from "Make dropq test fail like in issue #1367" to "Make dropq test fail like in issue 1367" on Jun 4, 2017
@martinholmer changed the title from "Make dropq test fail like in issue 1367" to "Make dropq test fail like in issue 1367; then fix bug" on Jun 5, 2017
@MattHJensen (Contributor)

@martinholmer, thanks for identifying and resolving this problem. I agree w Dan about using iitax only for the mask. The other changes look good to me too. Would be helpful to have @Amy-Xu's review as well.

@martinholmer (Collaborator, Author)

@MattHJensen said:

thanks for identifying and resolving this problem. I agree w Dan about using iitax only for the mask. The other changes look good to me too. Would be helpful to have @Amy-Xu's review as well.

The definition of the mask array has already been switched back to its original definition.
I'll await review by @Amy-Xu before merging #1387.

Review thread on the revised dropq test's reform specification:

    year_n = analysis_year - start_year
    reform = {
        '_FICA_ss_trt': [0.2]
    }
@Amy-Xu (Member):

Just to confirm: the reason for switching to this reform is that it will change not only iitax but also payroll, right?

@MattHJensen (Contributor):

Martin got started on this PR because that reform failed: #1387 (comment)

@martinholmer (Collaborator, Author):

@Amy-Xu, @MattHJensen is exactly correct. The new reform (with just a payroll tax rate increase) was the reform reported by @PeterDSteinberg as causing a fatal error. So, I revised the test to be like in his bug report and worked until I found what was causing that test failure. Does this make sense?

@Amy-Xu (Member):

Yep that makes sense.

@Amy-Xu (Member) commented Jun 5, 2017

@martinholmer said

The revisions to the taxcalc/utils.py add_weighted_income_bins function in #1387 corrected two logic problems. The first was the fatal error described in the pull request and in issue #1367. The second was a logic problem that has never caused a fatal error but would have whenever the add_weighted_income_bins function was called with weight_by_income_measure=True as a parameter.

All the changes in this PR look good to me, but I guess I'm confused about two things. First, regarding the first problem: if this line of code is a problem, why is it only a problem for payroll reforms but not iitax reforms? Shouldn't it always result in missing the last record? Second is about what records need to be included in bins. I believe we have run into the second problem before, when negative-expanded-income people were not included in tables; that was discussed in this issue when Joe Sullivan brought it up. After we checked the TPC table example linked in that issue, we concluded it was fine to leave out negative-income people. Do we need to refer to TPC tables, or do we simply want to make everything look consistent? @MattHJensen

@martinholmer (Collaborator, Author)

@Amy-Xu asked first:

... regarding the first problem -- if this line of code is a problem, why is it only a problem for payroll reforms but not iitax reforms? Shouldn't it always result in missing the last record?

I think this is a floating-point arithmetic precision issue. I don't think it has anything to do with payroll tax vs income tax reforms. My guess is that the bug report happened to involve a FICA-only reform, but this problem could have arisen with any kind of tax reform.

@Amy-Xu also asked:

Second is about what records need to be included in bins. I believe we have run into the second problem before, when negative-expanded-income people were not included in tables; that was discussed in this issue when Joe Sullivan brought it up. After we checked the TPC table example linked in that issue, we concluded it was fine to leave out negative-income people. Do we need to refer to TPC tables, or do we simply want to make everything look consistent?

The second logic problem in #1387 has nothing to do with the issue of whether or not TaxBrain tables show people with negative expanded_income. It has to do with avoiding a fatal error in the calculations. I don't think the add_weighted_income_bins utility function is ever called by TaxBrain with weight_by_income_measure=True, which is the only way the second logic problem would have arisen.

Does all this make sense?

@Amy-Xu (Member) commented Jun 5, 2017

@martinholmer You're right that this function hasn't been called by TaxBrain with weighted_by_income_measure=True. Sorry I wasn't being clear. It appears to me that your revision would include negative-expanded-income people regardless of whether weighted_by_income_measure is true or false. Is that correct? If so, is that an intended result of this PR?

@martinholmer (Collaborator, Author)

@Amy-Xu said:

You're right that this function hasn't been called by TaxBrain with weighted_by_income_measure=True. Sorry I wasn't being clear. It appears to me that your revision would include negative-expanded-income people regardless of whether weighted_by_income_measure is true or false. Is that correct? If so, is that an intended result of this PR?

Yes and Yes. There is no change in the basic logic of the add_weighted_income_bins function when weighted_by_income_measure=False, which is the way it is usually called.

taxcalc/utils.py (outdated diff):

    cumsum_range = max_cumsum - min_cumsum
    bin_width = cumsum_range / float(num_bins)
    bin_edges = [-9e99] + list(np.arange(1, (num_bins + 1)) * bin_width)
    bin_edges[-1] = 9e99  # raise top of last bin to include all observations
@Amy-Xu (Member):

Hmmmm, there's still a minor issue with the weighted_by_income_measure option set to True. Since bin_width is stacked on top of zero, due to the logic of the list(np.arange(1, (num_bins + 1)) * bin_width) snippet, the first bin would be larger than expected when the cumulative sum is weighted by income_measure, and all the other bin edges would be pushed to the right by an amount equal to the absolute value of min_cumsum, wouldn't they?

@martinholmer (Collaborator, Author):

@Amy-Xu said:

Ignore the previous comment. This section looks good to me.

@Amy-Xu (Member):

That comment was an earlier one. Sorry about that. Do you think this current comment is valid?

@martinholmer (Collaborator, Author) commented Jun 5, 2017

@Amy-Xu, I appreciate your close review of pull request #1387. Is there anything else you are reviewing, or do you think it is OK to merge it into the master branch?

@MattHJensen @feenberg @andersonfrailey

@Amy-Xu (Member) commented Jun 5, 2017

@martinholmer Could you take a look at my latest comment above? Sorry about the back and forth on this; I just want to make sure the bins with weighted_by_income_measure set to True are what you expected.

@martinholmer (Collaborator, Author)

@Amy-Xu, you are commenting at two different levels in GitHub, so I have lost track of what you mean by "my latest comment above".

@Amy-Xu (Member) commented Jun 5, 2017

@martinholmer my bad, this one:
[screenshot of the review comment on the bin_edges code, quoted in full below]

@martinholmer (Collaborator, Author) commented Jun 5, 2017

@Amy-Xu said about this code:

   cumsum_range = max_cumsum - min_cumsum
   bin_width = cumsum_range / float(num_bins)
   bin_edges = [-9e99] + list(np.arange(1, (num_bins + 1)) * bin_width)
   bin_edges[-1] = 9e99  # raise top of last bin to include all observations

Hmmmm, there's still a minor issue with the weighted_by_income_measure option set to True. Since bin_width is stacked on top of zero, due to the logic of the list(np.arange(1, (num_bins + 1)) * bin_width) snippet, the first bin would be larger than expected when the cumulative sum is weighted by income_measure, and all the other bin edges would be pushed to the right by an amount equal to the absolute value of min_cumsum, wouldn't they?

Let's review the pandas.cut function. Part of the documentation says this:
bins [1,2,3,4] indicate (1,2], (2,3], (3,4]
The first item in the bin_edges list is the bottom of the first bin and the last item in the bin_edges list is the top of the last bin. So, we can set the bottom of the first bin to minus infinity and the top of the last bin to plus infinity, and everything is OK as long as the intermediate values in the bin_edge list are proper. The expression list(np.arange(1, (num_bins + 1)) * bin_width) simply sets the top of the second through the last bin at the appropriate percentile value of the continuous variable being "cut" into bins. And then we make the top of the last bin be positive infinity in order to get all the observations with high values placed in the top bin.
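
For instance, a minimal illustration of those semantics (hypothetical values):

    import numpy as np
    import pandas as pd

    values = np.array([0.5, 1.5, 2.5, 3.5])
    edges = [-9e99, 2.0, 3.0, 9e99]  # bottom edge, interior edges, top edge
    print(pd.cut(values, edges))
    # 0.5 and 1.5 land in (-9e+99, 2.0]; 2.5 in (2.0, 3.0]; 3.5 in (3.0, 9e+99]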

If this explanation is not an effective response to your concerns, then I'm not understanding exactly what your concern is. The code is exactly the same as before except that the bottom of the first bin has been made very small and the top of the last bin has been made very large.

If I'm not being responsive, my apologies. Ask the question in a different way and maybe I'll understand your concerns better.

@Amy-Xu (Member) commented Jun 5, 2017

The code is exactly the same as before except that the bottom of the first bin has been made very small and the top of the last bin has been made very large.

I agree with this part when weighted_by_income_measure is False, but it doesn't seem to be so when that option is True, the reason being that min_cumsum is very likely negative, and therefore each bin is larger than in the previous version. If max_cumsum is 160 and min_cumsum is -40, then in the previous version the bin size was 16, and in this version it would be 20, given 10 bins in total.

I would assume that when bin_width is calculated as (max_cumsum - min_cumsum)/num_bins, it suggests that people who get cumsum_temp between -40 and -20 are thrown into the lowest bin. If that's the expectation, it doesn't seem consistent with reality. As the bin_edges look like [-9e99, 20, 40, ..., 140, 9e99], the lowest bin would include people from -40 to 20 instead of -40 to -20. Similarly, the bins afterward would be pushed right by about 40. Is this a clearer way to describe my concern? If this is intended, I have finished my review. This PR looks good to me.

@martinholmer (Collaborator, Author)

@Amy-Xu said:

The code is exactly the same as before except that the bottom of the first bin has been made very small and the top of the last bin has been made very large.

I agree with this part when weighted_by_income_measure is False, but it doesn't seem to be so when that option is True, the reason being that min_cumsum is very likely negative, and therefore each bin is larger than in the previous version. If max_cumsum is 160 and min_cumsum is -40, then in the previous version the bin size was 16, and in this version it would be 20, given 10 bins in total.

I would assume that when bin_width is calculated as (max_cumsum - min_cumsum)/num_bins, it suggests that people who get cumsum_temp between -40 and -20 are thrown into the lowest bin. If that's the expectation, it doesn't seem consistent with reality. As the bin_edges look like [-9e99, 20, 40, ..., 140, 9e99], the lowest bin would include people from -40 to 20 instead of -40 to -20. Similarly, the bins afterward would be pushed right by about 40. Is this a clearer way to describe my concern? If this is intended, I have finished my review. This PR looks good to me.

Thanks for being patient with me in expressing your concerns. I think you are right about this, so I'll change it. I'd appreciate your review of the change I make later this afternoon.

@martinholmer (Collaborator, Author)

@Amy-Xu, I've taken what I understand to be your point and fixed the add_weighted_income_bins utility function logic when weighted_by_income_measure=True.
I would appreciate it if you could look over commit 0523649 to see if I understood your point correctly.
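
For reference, here is a sketch of the kind of fix under discussion, using the example numbers from the comment above (the actual change is in commit 0523649 and may differ in detail): anchor the interior bin edges at min_cumsum rather than at zero.

    import numpy as np

    min_cumsum, max_cumsum, num_bins = -40.0, 160.0, 10
    bin_width = (max_cumsum - min_cumsum) / float(num_bins)  # 20.0
    # anchor edges at min_cumsum instead of at zero
    bin_edges = [-9e99] + list(min_cumsum + np.arange(1, num_bins + 1) * bin_width)
    bin_edges[-1] = 9e99  # raise top of last bin to include all observations
    print(bin_edges[1:-1])  # interior edges: -20.0, 0.0, 20.0, ..., 140.0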

@Amy-Xu (Member) commented Jun 6, 2017

@martinholmer Thank you very much for the modification. I really appreciate your time on this. My point was a minor issue. Since no one has raised any problem with setting up bins in the current form, I'm happy not to worry about it. This PR looks good to me.

@martinholmer (Collaborator, Author)

@Amy-Xu, Thanks again for your careful review of pull request #1387. I'll merge it now.

@martinholmer martinholmer merged commit fda035e into PSLmodels:master Jun 6, 2017
@martinholmer martinholmer deleted the add-dropq-test branch June 6, 2017 13:43