Add row for those with zero income to distribution and difference tables #1917

Merged
merged 6 commits into from
Mar 16, 2018


martinholmer
Collaborator

This pull request adds a new row for filing units with zero income (either expanded_income or c00100 AGI depending on the table specified) to the distribution and difference tables. After this change, the bottom decile is split into three subgroups: those with negative, zero, and positive income. This idea was first suggested in the discussion of pull request #1902 by @MaxGhenis, who said this:

> what do you think about splitting into two separate excluded groups, zero and negative?

The negative and zero subgroups are not "excluded"; they are shown along with those in the bottom decile who have positive income. This approach lets users of Tax-Calculator decide for themselves how to handle the different subgroups of the bottom income decile. It also presents all of the sample information, so the parts of the table add up to the totals in the table.
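To make the split concrete, here is a minimal sketch in plain Python of how filing units could be assigned to these row labels. It is not Tax-Calculator's own code: the function name `decile_row_labels` is hypothetical, and it assumes each unit is placed in a weighted decile by the midpoint of its cumulative sample weight after sorting on income, with only the bottom decile then split by the sign of income.

```python
def decile_row_labels(incomes, weights):
    """Hypothetical sketch: return a table row label for each filing unit.

    Units are sorted by income and assigned to a weighted decile using the
    midpoint of their cumulative weight; the bottom decile is then split
    into negative ('n'), zero ('z'), and positive ('p') income subgroups.
    Labels are returned in the same order as the input lists.
    """
    total_weight = float(sum(weights))
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    labels = [None] * len(incomes)
    cumulative = 0.0
    for i in order:
        midpoint = (cumulative + weights[i] / 2.0) / total_weight
        cumulative += weights[i]
        decile = min(int(midpoint * 10), 9)  # 0 (bottom) through 9 (top)
        if decile == 0:
            if incomes[i] < 0:
                suffix = 'n'
            elif incomes[i] == 0:
                suffix = 'z'
            else:
                suffix = 'p'
            labels[i] = '0-10' + suffix
        else:
            labels[i] = '{}-{}'.format(10 * decile, 10 * decile + 10)
    return labels


# Thirty equally weighted units, so each decile holds three of them.
incomes = [-5, 0, 1] + list(range(4, 31))
labels = decile_row_labels(incomes, [1.0] * 30)
print(labels[:4], labels[-1])  # ['0-10n', '0-10z', '0-10p', '10-20'] 90-100
```

Because the subgroups are carved out of the bottom decile rather than dropped, their counts together still make up a full decile.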

Here is a script that illustrates the new table results using a reform that introduces a tax-exempt UBI of $10K per person:

```python
from __future__ import print_function  # necessary only if using Python 2.7
import sys
import pandas as pd
from taxcalc import *

# select either CPS data or PUF data
recs = Records.cps_constructor()
# recs = Records()

# specify Calculator object for static analysis of current-law policy
pol = Policy()
calc1 = Calculator(policy=pol, records=recs)

cyr = 2020

# specify Calculator object for static analysis of adding a tax-exempt UBI
reform = {cyr: {'_UBI_u18': [1e4], '_UBI_1820': [1e4], '_UBI_21': [1e4],
                '_UBI_ecrt': [1.0]}}
pol.implement_reform(reform)
if pol.reform_errors:  # check for reform error messages
    print(pol.reform_errors)
    sys.exit(1)
calc2 = Calculator(policy=pol, records=recs)

# calculate baseline and reform Calculator objects
calc1.advance_to_year(cyr)
calc1.calc_all()
calc2.advance_to_year(cyr)
calc2.calc_all()

# generate distribution tables for cyr
dist_table1, dist_table2 = calc1.distribution_tables(calc2)
assert isinstance(dist_table1, pd.DataFrame)
assert isinstance(dist_table2, pd.DataFrame)

# generate difference table for cyr by expanded-income decile
diff_table = calc1.difference_table(calc2)
assert isinstance(diff_table, pd.DataFrame)

# print subset of table columns
dropcols = ['c00100', 'num_returns_StandardDed', 'standard',
            'num_returns_ItemDed', 'c04470', 'c04600', 'c04800',
            'num_returns_AMT', 'taxbc', 'c62100', 'c09600',
            'c05800', 'c07100', 'othertaxes', 'refund',
            'iitax', 'payrolltax', 'combined']
print('\n*** BASELINE DISTRIBUTION TABLE ***')
dist_table1.drop(labels=dropcols, axis=1, inplace=True)
print(dist_table1)
print('\n*** REFORM DISTRIBUTION TABLE ***')
dist_table2.drop(labels=dropcols, axis=1, inplace=True)
print(dist_table2)
print('\n*** DIFFERENCE TABLE ***')
dropcols = ['tax_cut', 'perc_cut', 'tax_inc', 'perc_inc',
            'mean', 'tot_change', 'share_of_change']
diff_table.drop(labels=dropcols, axis=1, inplace=True)
print(diff_table)
```

Here are the results of that script when using CPS input data:

You loaded data for 2014.
Tax-Calculator startup automatically extrapolated your data to 2014.
You loaded data for 2014.
Tax-Calculator startup automatically extrapolated your data to 2014.

*** BASELINE DISTRIBUTION TABLE ***
                 s006       expanded_income       aftertax_income
0-10n       92,422.73    -25,386,840,359.39    -25,621,040,789.38
0-10z    1,254,276.88                  0.00                  0.00
0-10p   16,104,495.79    106,850,586,379.35    105,393,953,706.93
10-20   17,450,543.25    340,761,873,595.51    329,183,757,619.64
20-30   17,452,009.50    510,910,317,796.80    477,865,530,937.27
30-40   17,450,159.56    679,973,214,937.36    619,940,531,724.07
40-50   17,451,453.28    880,129,928,589.23    784,259,753,027.65
50-60   17,452,337.90  1,131,396,824,809.55    986,438,438,803.72
60-70   17,450,563.88  1,460,707,171,937.62  1,251,481,239,188.16
70-80   17,451,919.32  1,906,629,880,037.62  1,608,376,315,263.33
80-90   17,451,170.45  2,616,889,727,443.82  2,155,369,142,464.47
90-100  17,451,603.11  5,577,989,745,187.23  4,320,455,413,375.51
ALL    174,512,955.65 15,186,852,430,354.71 12,613,143,035,321.36
90-95    8,725,281.60  1,819,102,828,823.43  1,461,263,353,999.40
95-99    6,981,161.16  2,156,791,905,258.89  1,699,415,449,338.07
Top 1%   1,745,160.35  1,602,095,011,104.90  1,159,776,610,038.04

*** REFORM DISTRIBUTION TABLE ***
                 s006       expanded_income       aftertax_income
0-10n       92,422.73    -23,606,098,759.39    -23,840,299,189.38
0-10z    1,254,276.88     22,563,467,900.00     22,563,467,900.00
0-10p   16,104,495.79    309,810,469,479.35    308,353,836,806.93
10-20   17,450,543.25    589,772,709,795.51    578,194,593,819.64
20-30   17,452,009.50    788,891,100,696.80    755,846,313,837.27
30-40   17,450,159.56    981,958,443,637.36    921,925,760,424.07
40-50   17,451,453.28  1,212,394,305,589.23  1,116,524,130,027.65
50-60   17,452,337.90  1,489,412,866,909.55  1,344,454,480,903.72
60-70   17,450,563.88  1,843,890,518,337.62  1,634,664,585,588.16
70-80   17,451,919.32  2,317,236,314,737.62  2,018,982,749,963.33
80-90   17,451,170.45  3,054,153,105,543.82  2,592,632,520,564.47
90-100  17,451,603.11  6,037,459,744,887.23  4,779,925,413,075.51
ALL    174,512,955.65 18,623,936,948,754.71 16,050,227,553,721.36
90-95    8,725,281.60  2,046,124,253,023.43  1,688,284,778,199.40
95-99    6,981,161.16  2,343,605,822,258.89  1,886,229,366,338.07
Top 1%   1,745,160.35  1,647,729,669,604.90  1,205,411,268,538.04

*** DIFFERENCE TABLE ***
                count  pc_aftertaxinc
0-10n       92,422.73           -6.95
0-10z    1,254,276.88             inf
0-10p   16,104,495.79          192.57
10-20   17,450,543.25           75.64
20-30   17,452,009.50           58.17
30-40   17,450,159.56           48.71
40-50   17,451,453.28           42.37
50-60   17,452,337.90           36.29
60-70   17,450,563.88           30.62
70-80   17,451,919.32           25.53
80-90   17,451,170.45           20.29
90-100  17,451,603.11           10.63
ALL    174,512,955.65           27.25
90-95    8,725,281.60           15.54
95-99    6,981,161.16           10.99
Top 1%   1,745,160.35            3.93

And here are the results of that script when using PUF input data:

You loaded data for 2011.
Tax-Calculator startup automatically extrapolated your data to 2013.
You loaded data for 2011.
Tax-Calculator startup automatically extrapolated your data to 2013.

*** BASELINE DISTRIBUTION TABLE ***
                 s006       expanded_income       aftertax_income
0-10n    1,305,562.29   -168,985,861,730.86   -185,570,355,626.48
0-10z    3,487,320.70                  0.00        466,898,919.36
0-10p   12,973,591.82     40,500,522,562.26     38,333,329,520.32
10-20   17,767,337.28    189,509,789,895.09    189,716,486,325.71
20-30   17,766,602.25    323,171,614,660.57    323,991,906,103.52
30-40   17,766,724.44    473,881,613,557.18    452,589,578,444.74
40-50   17,766,699.13    654,981,856,057.53    597,526,153,475.13
50-60   17,767,916.19    881,399,084,437.57    776,547,897,931.80
60-70   17,766,131.41  1,172,008,510,465.20    997,784,680,564.94
70-80   17,767,791.47  1,599,376,350,955.92  1,321,331,242,911.61
80-90   17,766,306.77  2,354,567,388,608.72  1,879,088,540,496.92
90-100  17,767,713.02  7,151,247,935,297.36  5,375,911,015,024.08
ALL    177,669,696.77 14,671,658,804,766.53 11,767,717,374,091.65
90-95    8,883,809.28  1,743,445,233,456.81  1,351,614,149,768.29
95-99    7,106,971.20  2,449,960,405,909.09  1,876,537,221,848.36
Top 1%   1,776,932.54  2,957,842,295,931.46  2,147,759,643,407.43

*** REFORM DISTRIBUTION TABLE ***
                 s006       expanded_income       aftertax_income
0-10n    1,305,562.29   -147,529,937,630.86   -164,114,431,526.48
0-10z    3,487,320.70     52,717,792,500.00     53,184,691,419.36
0-10p   12,973,591.82    205,487,522,962.26    203,320,329,920.32
10-20   17,767,337.28    445,141,386,495.09    445,348,082,925.71
20-30   17,766,602.25    619,993,675,660.57    620,813,967,103.52
30-40   17,766,724.44    782,919,246,257.18    761,627,211,144.74
40-50   17,766,699.13    969,778,970,057.53    912,323,267,475.13
50-60   17,767,916.19  1,206,835,788,437.57  1,101,984,601,931.80
60-70   17,766,131.41  1,510,238,342,865.20  1,336,014,512,964.94
70-80   17,767,791.47  1,969,638,880,855.92  1,691,593,772,811.61
80-90   17,766,306.77  2,788,132,067,508.72  2,312,653,219,396.92
90-100  17,767,713.02  7,640,147,313,697.36  5,864,810,393,424.08
ALL    177,669,696.77 18,043,501,049,666.53 15,139,559,618,991.64
90-95    8,883,809.28  1,986,866,898,056.81  1,595,035,814,368.29
95-99    7,106,971.20  2,645,879,272,309.09  2,072,456,088,248.36
Top 1%   1,776,932.54  3,007,401,143,331.46  2,197,318,490,807.43

*** DIFFERENCE TABLE ***
                count  pc_aftertaxinc
0-10n    1,305,562.29          -11.56
0-10z    3,487,320.70       11,291.05
0-10p   12,973,591.82          430.40
10-20   17,767,337.28          134.74
20-30   17,766,602.25           91.61
30-40   17,766,724.44           68.28
40-50   17,766,699.13           52.68
50-60   17,767,916.19           41.91
60-70   17,766,131.41           33.90
70-80   17,767,791.47           28.02
80-90   17,766,306.77           23.07
90-100  17,767,713.02            9.09
ALL    177,669,696.77           28.65
90-95    8,883,809.28           18.01
95-99    7,106,971.20           10.44
Top 1%   1,776,932.54            2.31
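The `inf` on the CPS `0-10z` row, and the negative values on both `0-10n` rows, follow from the arithmetic of a percentage change. Assuming `pc_aftertaxinc` is 100 × (reform total − baseline total) / baseline total for each row (an assumption on my part, though it reproduces the printed values checked below), a zero baseline total gives an infinite change, and a negative baseline total turns an increase into a negative percentage. The helper name `pct_change` is hypothetical.

```python
import math


def pct_change(baseline, reform):
    """Hypothetical helper: percentage change from a baseline to a reform total.

    Mirrors IEEE floating-point division as done by NumPy/pandas, which gives
    +/-inf (or NaN for 0/0) instead of raising ZeroDivisionError.
    """
    change = reform - baseline
    if baseline == 0:
        if change == 0:
            return math.nan
        return math.copysign(math.inf, change)
    return 100.0 * change / baseline


# After-tax income totals copied from the CPS and PUF distribution tables above.
rows = {
    'CPS 0-10n': (-25621040789.38, -23840299189.38),
    'CPS 0-10z': (0.00, 22563467900.00),
    'PUF 0-10n': (-185570355626.48, -164114431526.48),
    'PUF 0-10z': (466898919.36, 53184691419.36),
}
for name, (base, reform) in rows.items():
    print(name, round(pct_change(base, reform), 2))
```

With this formula the `0-10n` rows show a negative percentage even though their after-tax income rises under the UBI, simply because the baseline total is negative.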

@codecov-io

codecov-io commented Mar 11, 2018

Codecov Report

Merging #1917 into master will not change coverage.
The diff coverage is 100%.


@@          Coverage Diff           @@
##           master   #1917   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          38      38           
  Lines        3605    3605           
======================================
  Hits         3605    3605
Impacted Files Coverage Δ
taxcalc/calculate.py 100% <ø> (ø) ⬆️
taxcalc/utils.py 100% <100%> (ø) ⬆️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a92f8aa...5e4d646.

@hdoupe
Collaborator

hdoupe commented Mar 12, 2018

In #1914, @martinholmer said

> @hdoupe, Instead of testing #1914, you should test #1917, which contains all the changes in #1914 plus adds a new row to both the distribution table and the difference table.

Thanks for the heads up. I'll jump on this first thing tomorrow morning.

@hdoupe
Collaborator

hdoupe commented Mar 14, 2018

@martinholmer PR #1917 looks good to me. I think updating PolicyBrain to use this new table format should be straightforward. What do you think about changing the order of the data to:

[0-10, ..., 80-90, 90-100, 90-95, 95-100, Top 1%, ALL]?

@martinholmer
Collaborator Author

@hdoupe said:

> PR #1917 looks good to me. I think updating PolicyBrain to use this new table format should be straightforward. What do you think about changing the order of the data to:
>
> [0-10, ..., 80-90, 90-100, 90-95, 95-100, Top 1%, ALL]?

Why?

@hdoupe
Collaborator

hdoupe commented Mar 14, 2018

@martinholmer that order just seemed more intuitive to me. However, I don't have any strong opinions on the order of the list.

@martinholmer
Collaborator Author

@hdoupe said:

> [0-10, ..., 80-90, 90-100, 90-95, 95-99, Top 1%, ALL]
>
> that order just seemed more intuitive to me. However, I don't have any strong opinions on the order of the list.

I don't like that ordering because when I move down the table rows adding up the numbers in a column and I reach the ALL row, I expect my accumulating sum to equal the value on the ALL row. And I expect many other users would have the same expectation. I view the 90-95, 95-99, and Top 1% rows as footnotes, which are shown at the bottom of the table.

Does this make sense? Maybe I'm missing something. What makes putting the "footnote" rows before the ALL row intuitive to you?
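The additivity argument can be checked against the printed numbers. This sketch copies the `s006` (weighted filing-unit count) column of the CPS baseline distribution table in the current row order, accumulates down to the ALL row, and then checks that the three "footnote" rows add back up to the 90-100 row. `Decimal` keeps the two-decimal values exact; because the printed values are rounded, a one-cent tolerance is used instead of strict equality.

```python
from decimal import Decimal

# s006 column of the CPS baseline distribution table, in the table's row order.
S006 = [
    ('0-10n', '92422.73'),
    ('0-10z', '1254276.88'),
    ('0-10p', '16104495.79'),
    ('10-20', '17450543.25'),
    ('20-30', '17452009.50'),
    ('30-40', '17450159.56'),
    ('40-50', '17451453.28'),
    ('50-60', '17452337.90'),
    ('60-70', '17450563.88'),
    ('70-80', '17451919.32'),
    ('80-90', '17451170.45'),
    ('90-100', '17451603.11'),
    ('ALL', '174512955.65'),
    ('90-95', '8725281.60'),
    ('95-99', '6981161.16'),
    ('Top 1%', '1745160.35'),
]
TOLERANCE = Decimal('0.01')  # printed values are rounded to cents

rows = dict((label, Decimal(value)) for label, value in S006)

# Walk down the table, summing each row until the ALL row is reached.
running_sum = Decimal('0')
for label, value in S006:
    if label == 'ALL':
        break
    running_sum += Decimal(value)
print('sum above ALL:', running_sum, '  ALL row:', rows['ALL'])

# The "footnote" rows below ALL partition the top decile.
footnote_sum = rows['90-95'] + rows['95-99'] + rows['Top 1%']
print('footnote rows:', footnote_sum, '  90-100 row:', rows['90-100'])
```

With the footnote rows placed before ALL, the running sum would instead double-count the top decile by the time it reached the ALL row.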

@hdoupe
Collaborator

hdoupe commented Mar 14, 2018

@martinholmer Ok, now that makes sense. Thanks for explaining the rationale. I was thinking that it was strange to have 'ALL' in a place other than the end. However, I agree with your reasoning.

@MaxGhenis
Contributor

An alternative could be indenting the "footnote" rows and placing them between 90-100 and TOTAL, but that would require making labels left-aligned instead of center-aligned.

@MattHJensen
Contributor

Over at ospc-org/ospc.org#846 I asked about how PolicyBrain users will understand the meaning of "0-10n, 0-10z, 0-10p." I realize now that this same question applies to Tax-Calculator users and probably belongs here. It would be nice to have a solution that doesn't require a footnote or a link.

One solution would be to replace "0-10n, 0-10z, 0-10p" with "0-10: <$0", "0-10: $0", and "0-10: >$0", as suggested by @hdoupe.

I am somewhat more fond of "0-10neg, 0-10zero, 0-10pos." Note that 0-10zero is two more characters than 90-100, the second longest row label, so there is some cost to the additional clarity.

@martinholmer, what do you think about this? Any other options? Do you think users will automatically understand what the n, z, and p mean?

cc @hdoupe @MaxGhenis
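For comparison, the three labeling schemes under discussion can be written as a simple lookup. The scheme names and the `relabel` helper are hypothetical; only the label strings come from this thread. Printing the widest label in each scheme shows the character cost mentioned above.

```python
# Bottom-decile row labels under each proposal (scheme names are made up).
SCHEMES = {
    'compact': {'0-10n': '0-10n', '0-10z': '0-10z', '0-10p': '0-10p'},
    'words': {'0-10n': '0-10neg', '0-10z': '0-10zero', '0-10p': '0-10pos'},
    'dollars': {'0-10n': '0-10: <$0', '0-10z': '0-10: $0', '0-10p': '0-10: >$0'},
}


def relabel(labels, scheme):
    """Map compact row labels to a display scheme.

    Labels the scheme does not mention (e.g. '10-20', 'ALL') are kept as-is.
    """
    mapping = SCHEMES[scheme]
    return [mapping.get(label, label) for label in labels]


rows = ['0-10n', '0-10z', '0-10p', '10-20', '90-100', 'ALL', 'Top 1%']
for name in SCHEMES:
    shown = relabel(rows, name)
    widest = max(len(label) for label in shown)
    print('{:8s} widest label: {} chars  {}'.format(name, widest, shown[:3]))
```

A front end could apply such a mapping only at display time, leaving the compact labels in the underlying table.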

@martinholmer
Collaborator Author

@MattHJensen asked in the discussion of #1917:

> what do you think about this? Any other options? Do you think users will automatically get what the n, z and p mean?

Yes, I think users will understand this immediately. In the distribution table, it is obvious that the sum of expanded_income is negative on the row labeled 0-10n, zero on the row labeled 0-10z, and positive on the row labeled 0-10p. I can't imagine a user who actually reads the table would be confused.

@martinholmer
Collaborator Author

@hdoupe said in PolicyBrain pull request 846:

> This PR updates to the tables specified in #1917. The update is fairly straightforward. The table labels still need to be updated. For now, they are just the variable names.

@MattHJensen
Contributor

> Yes I think users will understand this immediately. In the distribution table, it is obvious that the sum of expanded_income is negative on the row labeled 0-10n, is zero on the row labeled 0-10z, and positive on the row labeled 0-10p. I can't imagine a user who actually reads the table would be confused.

Ok. I'm happy to leave it as is and wait to see if we get any questions.

@hdoupe
Collaborator

hdoupe commented Mar 14, 2018

@martinholmer said

> Yes I think users will understand this immediately.

I disagree. I have not followed the conversation that led to these changes, which probably gives me a perspective similar to that of regular PolicyBrain users. I didn't realize what the labels meant until @MaxGhenis suggested labeling them "0-10: <$0", "0-10: $0", "0-10: >$0". A few extra characters seem like a cheap way to eliminate the extra step of reasoning required to add up the income in each bin and deduce the meaning of each letter.

@martinholmer
Collaborator Author

@hdoupe said:

> > Yes I think users will understand this immediately.
>
> I disagree. I have not followed the conversation that led to these changes, which probably gives me a perspective similar to that of regular PolicyBrain users. I didn't realize what the labels meant until @MaxGhenis suggested labeling them "0-10: <$0", "0-10: $0", "0-10: >$0". A few extra characters seem like a cheap way to eliminate the extra step of reasoning required to add up the income in each bin and deduce the meaning of each letter.

TaxBrain is free to change the row labels however it wants.

@MattHJensen
Contributor

MattHJensen commented Mar 14, 2018

> TaxBrain is free to change the row labels however it wants.

This is contrary to the way we would like to organize the projects. We would like TaxBrain to display what is in Tax-Calculator, so that PolicyBrain maintainers aren't responsible for understanding all of the content of the models.

Where the rubber really meets the road, I suppose, is that TaxBrain user questions about these tables are going to be directed to the Tax-Calculator project.

@feenberg
Contributor

feenberg commented Mar 15, 2018 via email

@MaxGhenis
Contributor

> What are the upper and lower bounds of each category? Why does each category seem to start with 0?
> Do the categories overlap?

@feenberg This breaks out the bottom decile into three MECE (mutually exclusive and collectively exhaustive) groups based on baseline income: negative, zero, and positive. Copying my suggestion from ospc-org/ospc.org#846 (comment) for posterity (even if it is not used, it may help others understand):

0-10
  <$0
  $0
  >$0
10-20
20-30
30-40
40-50
50-60
60-70
70-80
80-90
90-100
  90-95
  95-99
  Top 1%
TOTAL
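A sketch of how the current row order could be rendered in the nested, left-aligned layout above. The function name, constants, and two-space indentation are hypothetical choices for illustration; the function maps the bottom-decile subgroup labels to indented dollar labels under a "0-10" parent, moves the three "footnote" rows up under 90-100, and prints ALL last as TOTAL. It handles labels only, not the table values.

```python
SUBGROUPS = {'0-10n': '<$0', '0-10z': '$0', '0-10p': '>$0'}
TOP_DECILE_DETAIL = ['90-95', '95-99', 'Top 1%']


def nested_layout(row_order, indent='  '):
    """Hypothetical sketch: turn the table's row order into nested labels."""
    lines = []
    has_total = False
    for label in row_order:
        if label in SUBGROUPS:
            if '0-10' not in lines:
                lines.append('0-10')  # parent row for the three subgroups
            lines.append(indent + SUBGROUPS[label])
        elif label == '90-100':
            lines.append(label)
            # place the footnote rows directly under the top decile
            lines.extend(indent + d for d in TOP_DECILE_DETAIL if d in row_order)
        elif label in TOP_DECILE_DETAIL:
            continue  # already placed under 90-100
        elif label == 'ALL':
            has_total = True  # emitted last, renamed TOTAL
        else:
            lines.append(label)
    if has_total:
        lines.append('TOTAL')
    return lines


ROW_ORDER = (['0-10n', '0-10z', '0-10p'] +
             ['{}-{}'.format(d, d + 10) for d in range(10, 100, 10)] +
             ['ALL', '90-95', '95-99', 'Top 1%'])
print('\n'.join(nested_layout(ROW_ORDER)))
```

As noted earlier in the thread, this layout depends on left-aligned labels so that the indentation stays visible.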

@feenberg
Copy link
Contributor

feenberg commented Mar 15, 2018 via email

@MaxGhenis
Copy link
Contributor

@feenberg these are percentiles which cannot be negative. The only reference to dollar amounts is the three MECE groups {<$0, $0, >$0}.

@MattHJensen
Copy link
Contributor

Earlier I asked about updating 0-10n, 0-10z, 0-10p. In retrospect, I think that would have been a better discussion for a follow-on PR. This PR is obviously an improvement over master, so why let tertiary details slow it down? Everything else looks great, so I am merging this. Thanks very much @martinholmer and everyone else for their review and comments.

@MattHJensen MattHJensen merged commit e977a0a into PSLmodels:master Mar 16, 2018
@martinholmer martinholmer deleted the add-table-row branch March 16, 2018 20:12

6 participants