Add row for those with zero income to distribution and difference tables #1917

martinholmer · 2018-03-11T23:07:49Z

This pull request adds a new row for filing units with zero income (either expanded_income or c00100 AGI depending on the table specified) to the distribution and difference tables. After this change, the bottom decile is split into three subgroups: those with negative, zero, and positive income. This idea was first suggested in the discussion of pull request #1902 by @MaxGhenis, who said this:

what do you think about splitting into two separate excluded groups, zero and negative?

The negative and zero subgroups are not "excluded"; they are shown alongside those in the bottom decile with positive income. This approach lets users of Tax-Calculator decide for themselves how they want to handle the different subgroups of the bottom income decile. It also presents all the sample information, so that the rows of the table add up to the totals in the table.
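As a rough illustration of the split described above, here is a minimal sketch that assigns bottom-decile filing units to the three rows by the sign of their income and confirms that the three rows add up to the decile total. The `bottom_decile` data and the `subgroup_label` helper are hypothetical and are not taken from the Tax-Calculator source; only the row labels come from the tables in this pull request.

```python
# Hypothetical bottom-decile filing units as (sample weight, income) pairs.
bottom_decile = [
    (120.0, -5000.0),
    (300.0, 0.0),
    (250.0, 0.0),
    (400.0, 1200.0),
    (380.0, 2500.0),
]


def subgroup_label(income):
    """Return the bottom-decile row label implied by the sign of income."""
    if income < 0:
        return '0-10n'
    if income == 0:
        return '0-10z'
    return '0-10p'


# Accumulate the weighted count and weighted income for each subgroup row.
rows = {label: {'count': 0.0, 'income': 0.0}
        for label in ('0-10n', '0-10z', '0-10p')}
for weight, income in bottom_decile:
    row = rows[subgroup_label(income)]
    row['count'] += weight
    row['income'] += weight * income

# Totals for the whole bottom decile, computed independently of the rows.
total_count = sum(weight for weight, _ in bottom_decile)
total_income = sum(weight * income for weight, income in bottom_decile)

for label, row in rows.items():
    print(label, row['count'], row['income'])
print('0-10 total', total_count, total_income)
```

Because every unit lands in exactly one of the three rows, the subgroup counts and incomes sum to the bottom-decile totals.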

Here is a script that illustrates the new table results using a reform that introduces a tax-exempt UBI of $10K per person:

from __future__ import print_function  # necessary only if using Python 2.7
import pandas as pd  # used by the isinstance checks below
from taxcalc import *

# select either CPS data or PUF data
recs = Records.cps_constructor()
# recs = Records()

# specify Calculator object for static analysis of current-law policy
pol = Policy()
calc1 = Calculator(policy=pol, records=recs)

cyr = 2020

# specify Calculator object for static analysis of adding a tax-exempt UBI
reform = {cyr: {'_UBI_u18': [1e4], '_UBI_1820': [1e4], '_UBI_21': [1e4],
                '_UBI_ecrt': [1.0]}}
pol.implement_reform(reform)
if pol.reform_errors:  # check for reform error messages
    print(pol.reform_errors)
    exit(1)
calc2 = Calculator(policy=pol, records=recs)

# calculate baseline and reform Calculator objects
calc1.advance_to_year(cyr)
calc1.calc_all()
calc2.advance_to_year(cyr)
calc2.calc_all()

# generate distribution tables for cyr
dist_table1, dist_table2 = calc1.distribution_tables(calc2)
assert isinstance(dist_table1, pd.DataFrame)
assert isinstance(dist_table2, pd.DataFrame)

# generate difference table for cyr by expanded-income decile
diff_table = calc1.difference_table(calc2)
assert isinstance(diff_table, pd.DataFrame)

# print subset of table columns
dropcols = ['c00100', 'num_returns_StandardDed', 'standard',
            'num_returns_ItemDed', 'c04470', 'c04600', 'c04800',
            'num_returns_AMT', 'taxbc', 'c62100', 'c09600',
            'c05800', 'c07100', 'othertaxes', 'refund',
            'iitax', 'payrolltax', 'combined']
print('\n*** BASELINE DISTRIBUTION TABLE ***')
dist_table1.drop(labels=dropcols, axis=1, inplace=True)
print(dist_table1)
print('\n*** REFORM DISTRIBUTION TABLE ***')
dist_table2.drop(labels=dropcols, axis=1, inplace=True)
print(dist_table2)
print('\n*** DIFFERENCE TABLE ***')
dropcols = ['tax_cut', 'perc_cut', 'tax_inc', 'perc_inc',
            'mean', 'tot_change', 'share_of_change']
diff_table.drop(labels=dropcols, axis=1, inplace=True)
print(diff_table)

Here are the results of that script when using CPS input data:

You loaded data for 2014.
Tax-Calculator startup automatically extrapolated your data to 2014.
You loaded data for 2014.
Tax-Calculator startup automatically extrapolated your data to 2014.

*** BASELINE DISTRIBUTION TABLE ***
                 s006       expanded_income       aftertax_income
0-10n       92,422.73    -25,386,840,359.39    -25,621,040,789.38
0-10z    1,254,276.88                  0.00                  0.00
0-10p   16,104,495.79    106,850,586,379.35    105,393,953,706.93
10-20   17,450,543.25    340,761,873,595.51    329,183,757,619.64
20-30   17,452,009.50    510,910,317,796.80    477,865,530,937.27
30-40   17,450,159.56    679,973,214,937.36    619,940,531,724.07
40-50   17,451,453.28    880,129,928,589.23    784,259,753,027.65
50-60   17,452,337.90  1,131,396,824,809.55    986,438,438,803.72
60-70   17,450,563.88  1,460,707,171,937.62  1,251,481,239,188.16
70-80   17,451,919.32  1,906,629,880,037.62  1,608,376,315,263.33
80-90   17,451,170.45  2,616,889,727,443.82  2,155,369,142,464.47
90-100  17,451,603.11  5,577,989,745,187.23  4,320,455,413,375.51
ALL    174,512,955.65 15,186,852,430,354.71 12,613,143,035,321.36
90-95    8,725,281.60  1,819,102,828,823.43  1,461,263,353,999.40
95-99    6,981,161.16  2,156,791,905,258.89  1,699,415,449,338.07
Top 1%   1,745,160.35  1,602,095,011,104.90  1,159,776,610,038.04

*** REFORM DISTRIBUTION TABLE ***
                 s006       expanded_income       aftertax_income
0-10n       92,422.73    -23,606,098,759.39    -23,840,299,189.38
0-10z    1,254,276.88     22,563,467,900.00     22,563,467,900.00
0-10p   16,104,495.79    309,810,469,479.35    308,353,836,806.93
10-20   17,450,543.25    589,772,709,795.51    578,194,593,819.64
20-30   17,452,009.50    788,891,100,696.80    755,846,313,837.27
30-40   17,450,159.56    981,958,443,637.36    921,925,760,424.07
40-50   17,451,453.28  1,212,394,305,589.23  1,116,524,130,027.65
50-60   17,452,337.90  1,489,412,866,909.55  1,344,454,480,903.72
60-70   17,450,563.88  1,843,890,518,337.62  1,634,664,585,588.16
70-80   17,451,919.32  2,317,236,314,737.62  2,018,982,749,963.33
80-90   17,451,170.45  3,054,153,105,543.82  2,592,632,520,564.47
90-100  17,451,603.11  6,037,459,744,887.23  4,779,925,413,075.51
ALL    174,512,955.65 18,623,936,948,754.71 16,050,227,553,721.36
90-95    8,725,281.60  2,046,124,253,023.43  1,688,284,778,199.40
95-99    6,981,161.16  2,343,605,822,258.89  1,886,229,366,338.07
Top 1%   1,745,160.35  1,647,729,669,604.90  1,205,411,268,538.04

*** DIFFERENCE TABLE ***
                count  pc_aftertaxinc
0-10n       92,422.73           -6.95
0-10z    1,254,276.88             inf
0-10p   16,104,495.79          192.57
10-20   17,450,543.25           75.64
20-30   17,452,009.50           58.17
30-40   17,450,159.56           48.71
40-50   17,451,453.28           42.37
50-60   17,452,337.90           36.29
60-70   17,450,563.88           30.62
70-80   17,451,919.32           25.53
80-90   17,451,170.45           20.29
90-100  17,451,603.11           10.63
ALL    174,512,955.65           27.25
90-95    8,725,281.60           15.54
95-99    6,981,161.16           10.99
Top 1%   1,745,160.35            3.93

And here are the results of that script when using PUF input data:

You loaded data for 2011.
Tax-Calculator startup automatically extrapolated your data to 2013.
You loaded data for 2011.
Tax-Calculator startup automatically extrapolated your data to 2013.

*** BASELINE DISTRIBUTION TABLE ***
                 s006       expanded_income       aftertax_income
0-10n    1,305,562.29   -168,985,861,730.86   -185,570,355,626.48
0-10z    3,487,320.70                  0.00        466,898,919.36
0-10p   12,973,591.82     40,500,522,562.26     38,333,329,520.32
10-20   17,767,337.28    189,509,789,895.09    189,716,486,325.71
20-30   17,766,602.25    323,171,614,660.57    323,991,906,103.52
30-40   17,766,724.44    473,881,613,557.18    452,589,578,444.74
40-50   17,766,699.13    654,981,856,057.53    597,526,153,475.13
50-60   17,767,916.19    881,399,084,437.57    776,547,897,931.80
60-70   17,766,131.41  1,172,008,510,465.20    997,784,680,564.94
70-80   17,767,791.47  1,599,376,350,955.92  1,321,331,242,911.61
80-90   17,766,306.77  2,354,567,388,608.72  1,879,088,540,496.92
90-100  17,767,713.02  7,151,247,935,297.36  5,375,911,015,024.08
ALL    177,669,696.77 14,671,658,804,766.53 11,767,717,374,091.65
90-95    8,883,809.28  1,743,445,233,456.81  1,351,614,149,768.29
95-99    7,106,971.20  2,449,960,405,909.09  1,876,537,221,848.36
Top 1%   1,776,932.54  2,957,842,295,931.46  2,147,759,643,407.43

*** REFORM DISTRIBUTION TABLE ***
                 s006       expanded_income       aftertax_income
0-10n    1,305,562.29   -147,529,937,630.86   -164,114,431,526.48
0-10z    3,487,320.70     52,717,792,500.00     53,184,691,419.36
0-10p   12,973,591.82    205,487,522,962.26    203,320,329,920.32
10-20   17,767,337.28    445,141,386,495.09    445,348,082,925.71
20-30   17,766,602.25    619,993,675,660.57    620,813,967,103.52
30-40   17,766,724.44    782,919,246,257.18    761,627,211,144.74
40-50   17,766,699.13    969,778,970,057.53    912,323,267,475.13
50-60   17,767,916.19  1,206,835,788,437.57  1,101,984,601,931.80
60-70   17,766,131.41  1,510,238,342,865.20  1,336,014,512,964.94
70-80   17,767,791.47  1,969,638,880,855.92  1,691,593,772,811.61
80-90   17,766,306.77  2,788,132,067,508.72  2,312,653,219,396.92
90-100  17,767,713.02  7,640,147,313,697.36  5,864,810,393,424.08
ALL    177,669,696.77 18,043,501,049,666.53 15,139,559,618,991.64
90-95    8,883,809.28  1,986,866,898,056.81  1,595,035,814,368.29
95-99    7,106,971.20  2,645,879,272,309.09  2,072,456,088,248.36
Top 1%   1,776,932.54  3,007,401,143,331.46  2,197,318,490,807.43

*** DIFFERENCE TABLE ***
                count  pc_aftertaxinc
0-10n    1,305,562.29          -11.56
0-10z    3,487,320.70       11,291.05
0-10p   12,973,591.82          430.40
10-20   17,767,337.28          134.74
20-30   17,766,602.25           91.61
30-40   17,766,724.44           68.28
40-50   17,766,699.13           52.68
50-60   17,767,916.19           41.91
60-70   17,766,131.41           33.90
70-80   17,767,791.47           28.02
80-90   17,766,306.77           23.07
90-100  17,767,713.02            9.09
ALL    177,669,696.77           28.65
90-95    8,883,809.28           18.01
95-99    7,106,971.20           10.44
Top 1%   1,776,932.54            2.31
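The pc_aftertaxinc values above can be reproduced from the aftertax_income columns of the two distribution tables. The sketch below does that arithmetic for a few rows. It assumes the column is the percentage change in total after-tax income relative to the baseline total, which is consistent with the printed numbers; the `pct_change` helper and its handling of a zero baseline are illustrative assumptions, not the Tax-Calculator implementation.

```python
import math


def pct_change(base, reform):
    """Return the percentage change from base to reform.

    Assumption: a zero base with a nonzero change is reported as a signed
    infinity, mirroring the 'inf' printed in the CPS difference table.
    """
    change = reform - base
    if base == 0:
        return math.nan if change == 0 else math.copysign(math.inf, change)
    return 100.0 * change / base


# (baseline, reform) aftertax_income totals copied from the tables above.
aftertax = {
    'CPS 0-10n': (-25621040789.38, -23840299189.38),
    'CPS 0-10z': (0.00, 22563467900.00),
    'CPS 0-10p': (105393953706.93, 308353836806.93),
    'PUF 0-10z': (466898919.36, 53184691419.36),
}

results = {row: pct_change(base, reform)
           for row, (base, reform) in aftertax.items()}
for row, pct in results.items():
    print('{:10s} {:12.2f}'.format(row, pct))
```

The 0-10n row shows a negative percentage even though after-tax income rises, because the change is divided by a negative baseline. In the PUF run the 0-10z row has a small positive baseline after-tax income, so its percentage change is very large but finite.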

codecov-io · 2018-03-11T23:17:33Z

Codecov Report

Merging #1917 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1917   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          38      38           
  Lines        3605    3605           
======================================
  Hits         3605    3605

Impacted Files	Coverage Δ
taxcalc/calculate.py	`100% <ø> (ø)`	⬆️
taxcalc/utils.py	`100% <100%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a92f8aa...5e4d646.

hdoupe · 2018-03-12T23:38:27Z

In #1914, @martinholmer said

@hdoupe, Instead of testing #1914, you should test #1917, which contains all the changes in #1914 plus adds a new row to both the distribution table and the difference table.

Thanks for the heads up. I'll jump on this first thing tomorrow morning.

hdoupe · 2018-03-14T14:50:26Z

@martinholmer PR #1917 looks good to me. I think updating PolicyBrain to use this new table format should be straightforward. What do you think about changing the order of the data to:

[0-10, ..., 80-90, 90-100, 90-95, 95-100, Top 1%, ALL]?

martinholmer · 2018-03-14T15:13:56Z

@hdoupe said:

PR #1917 looks good to me. I think updating PolicyBrain to use this new table format should be straightforward. What do you think about changing the order of the data to:

[0-10, ..., 80-90, 90-100, 90-95, 95-100, Top 1%, ALL]?

Why?

hdoupe · 2018-03-14T15:17:14Z

@martinholmer that order just seemed more intuitive to me. However, I don't have any strong opinions on the order of the list.

martinholmer · 2018-03-14T15:31:51Z

@hdoupe said:

[0-10, ..., 80-90, 90-100, 90-95, 95-99, Top 1%, ALL] that order just seemed more intuitive to me. However, I don't have any strong opinions on the order of the list.

I don't like that ordering because when I move down the table rows adding up the numbers in a column and I reach the ALL row, I expect my accumulating sum to equal the value on the ALL row. And I expect many other users would have the same expectation. I view the 90-95, 95-99, and Top 1% rows as footnotes, which are shown at the bottom of the table.

Does this make sense? Maybe I'm missing something. What makes putting the "footnote" rows before the ALL row intuitive to you?
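The adding-up expectation described above can be checked against the s006 column of the CPS baseline distribution table. This is only a sketch using numbers copied from that table; the variable names are illustrative.

```python
import math

# s006 values for the rows from 0-10n through 90-100, in table order.
rows_above_all = [
    92422.73, 1254276.88, 16104495.79,  # 0-10n, 0-10z, 0-10p
    17450543.25, 17452009.50, 17450159.56, 17451453.28, 17452337.90,
    17450563.88, 17451919.32, 17451170.45, 17451603.11,
]
all_row = 174512955.65

# "Footnote" rows that break down the 90-100 row.
top_decile = 17451603.11
footnote_rows = [8725281.60, 6981161.16, 1745160.35]  # 90-95, 95-99, Top 1%

# Accumulate down the column as a reader of the table would.
running_sum = 0.0
for value in rows_above_all:
    running_sum += value

rows_match_all = math.isclose(running_sum, all_row, abs_tol=0.01)
footnotes_match_top = math.isclose(sum(footnote_rows), top_decile,
                                   abs_tol=0.01)
print('rows above ALL sum to ALL:', rows_match_all)
print('footnote rows sum to 90-100:', footnotes_match_top)
```

With the current ordering, the running sum reaches the ALL value exactly when the ALL row is reached, and the three rows below ALL account for the 90-100 row.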

hdoupe · 2018-03-14T15:51:38Z

@martinholmer Ok, now that makes sense. Thanks for explaining the rationale. I was thinking that it was strange to have 'ALL' in a place other than the end. However, I agree with your reasoning.

MaxGhenis · 2018-03-14T15:53:15Z

An alternative could be indenting the "footnote" rows and placing them between 90-100 and TOTAL, but that would require making labels left-aligned instead of center-aligned.

MattHJensen · 2018-03-14T18:49:33Z

Over at ospc-org/ospc.org#846 I asked about how PolicyBrain users will understand the meaning of "0-10n, 0-10z, 0-10p." I realize now that this same question applies to Tax-Calculator users and probably belongs here. It would be nice to have a solution that doesn't require a footnote or a link.

One solution would be to replace "0-10n, 0-10z, 0-10p" with "0-10: <$0", "0-10: $0", "0-10: >$0", as suggested by @hdoupe.

I am somewhat more fond of "0-10neg, 0-10zero, 0-10pos." Note that 0-10zero is two more characters than 90-100, the second longest row label, so there is some cost to the additional clarity.

@martinholmer, what do you think about this? Any other options? Do you think users will automatically get what the n, z, and p mean?

cc @hdoupe @MaxGhenis

martinholmer · 2018-03-14T18:59:17Z

@MattHJensen asked in the discussion of #1917:

what do you think about this? Any other options? Do you think users will automatically get what the n, z and p mean?

Yes I think users will understand this immediately. In the distribution table, it is obvious that the sum of expanded_income is negative on the row labeled 0-10n, is zero on the row labeled 0-10z, and positive on the row labeled 0-10p. I can't imagine a user who actually reads the table would be confused.

martinholmer · 2018-03-14T19:52:09Z

@hdoupe said in PolicyBrain pull request 846:

This PR updates to the tables specified in #1917. The update is fairly straightforward. The table labels still need to be updated. For now, they are just the variable names.

MattHJensen · 2018-03-14T20:33:13Z

Yes I think users will understand this immediately. In the distribution table, it is obvious that the sum of expanded_income is negative on the row labeled 0-10n, is zero on the row labeled 0-10z, and positive on the row labeled 0-10p. I can't imagine a user who actually reads the table would be confused.

Ok. I'm happy to leave it as is and wait to see if we get any questions.

hdoupe · 2018-03-14T20:41:13Z

@martinholmer said

Yes I think users will understand this immediately.

I disagree. I have not followed the conversation that led to these changes, which probably gives me a perspective similar to that of regular PolicyBrain users. I didn't realize what the labels meant until @MaxGhenis suggested labeling them "0-10: <$0", "0-10: $0", "0-10: >$0". A few extra characters seem like a cheap way to eliminate the extra step of reasoning required to add up the income in each bin and deduce the meaning of each letter.

martinholmer · 2018-03-14T20:50:54Z

@hdoupe said:

Yes I think users will understand this immediately.

I disagree. I have not followed the conversation that led to these changes, which probably gives me a perspective similar to that of regular PolicyBrain users. I didn't realize what the labels meant until @MaxGhenis suggested labeling them "0-10: <$0", "0-10: $0", "0-10: >$0". A few extra characters seem like a cheap way to eliminate the extra step of reasoning required to add up the income in each bin and deduce the meaning of each letter.

TaxBrain is free to change the row labels however it wants.

MattHJensen · 2018-03-14T21:09:36Z

TaxBrain is free to change the row labels however it wants.

This is contrary to the way we would like to organize the projects. We would like TaxBrain to display what is in Tax-Calculator, so that PolicyBrain maintainers aren't responsible for understanding all of the content of the models.

Where the rubber really meets the road, I suppose, is that TaxBrain user questions about these tables are going to be directed to the Tax-Calculator project.

feenberg · 2018-03-15T01:24:40Z

On Wed, 14 Mar 2018, Martin Holmer wrote:

@MattHJensen asked in the discussion of #1917:

what do you think about this? Any other options? Do you think users will automatically get what the n, z and p mean?

Yes I think users will understand this immediately. In the distribution table, it is obvious that the sum of expanded_income is negative on the row labeled 0-10n, is zero on the row labeled 0-10z, and positive on the row labeled 0-10p. I can't imagine a user who actually reads the table would be confused.

I am confused. What are the upper and lower bounds of each category? Why does each category seem to start with 0? Do the categories overlap?

dan


MaxGhenis · 2018-03-15T04:13:10Z

What are the upper and lower bounds of each category? Why does each category seem to start with 0?
Do the categories overlap?

@feenberg This breaks out the bottom decile into three MECE (mutually exclusive, collectively exhaustive) groups based on baseline income: negative, zero, and positive. Copying my suggestion from ospc-org/ospc.org#846 (comment) for posterity (even if not used, may be helpful for others to understand):

0-10
  <$0
  $0
  >$0
10-20
20-30
30-40
40-50
50-60
60-70
70-80
80-90
90-100
  90-95
  95-99
  Top 1%
TOTAL
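For anyone who wants more descriptive labels without changing the table contents, one possible approach is to relabel the rows at display time. The sketch below uses the label strings proposed earlier in this thread; the `DISPLAY_LABELS` mapping and the `relabel` helper are hypothetical and are not part of Tax-Calculator or PolicyBrain.

```python
# Hypothetical mapping from the current row labels to the more descriptive
# labels proposed in this thread; unlisted labels are left unchanged.
DISPLAY_LABELS = {
    '0-10n': '0-10: <$0',
    '0-10z': '0-10: $0',
    '0-10p': '0-10: >$0',
}


def relabel(row_labels):
    """Return display labels for row_labels, preserving their order."""
    return [DISPLAY_LABELS.get(label, label) for label in row_labels]


# Row labels in the order used by the tables in this pull request.
table_rows = ['0-10n', '0-10z', '0-10p', '10-20', '20-30', '30-40',
              '40-50', '50-60', '60-70', '70-80', '80-90', '90-100',
              'ALL', '90-95', '95-99', 'Top 1%']

display_rows = relabel(table_rows)
for label in display_rows:
    print(label)
```

Keeping the mapping separate from the table means the row order and the underlying values stay exactly as Tax-Calculator produces them.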

feenberg · 2018-03-15T13:56:24Z

On Thu, 15 Mar 2018, Max Ghenis wrote:

What are the upper and lower bounds of each category? Why does each category seem to start with 0? Do the categories overlap?

@feenberg This breaks out the bottom decile into three MECE groups based on baseline income: negative, zero, and positive. Copying my suggestion from ospc-org/ospc.org#846 (comment) for posterity (even if not used, may be helpful for others to understand): 0-10 <$0 $0 >$0

Why not use:

-10-0, 0, 0-10

or

-10-(-1), 0, 1-10

or

-10->0, 0, 0-10

Otherwise zero income appears to be included in each of the 3 brackets.

dan

MaxGhenis · 2018-03-15T15:10:33Z

@feenberg these are percentiles which cannot be negative. The only reference to dollar amounts is the three MECE groups {<$0, $0, >$0}.

MattHJensen · 2018-03-16T19:58:49Z

Earlier I asked about updating 0-10n, 0-10z, 0-10p. In retrospect, I think that it would have been a better discussion for a follow-on PR. This PR is obviously an improvement over master, so why let tertiary details slow it down? Everything else looks great, so I am merging this. Thanks very much @martinholmer and everyone else for their review and comments.

Add row to distribution/difference tables

0c93833

martinholmer mentioned this pull request Mar 11, 2018

The tbi run_nth_year_tax_calc_model function now returns full tables #1914

Merged

martinholmer added the ready label Mar 11, 2018

martinholmer requested a review from hdoupe March 11, 2018 23:18

This was referenced Mar 12, 2018

Revise Calculator decile_graph method to use new distribution table #1918

Merged

How should distribution/difference tables handle those with negative income? #1888

Closed

hdoupe mentioned this pull request Mar 13, 2018

Updates to TC tables in Tax-Calculator PR 1917 ospc-org/ospc.org#846

Closed

Merge branch 'master' into add-table-row

52f6147

Improve docs for distribution_tables and difference_table methods

6413a97

martinholmer added 2 commits March 15, 2018 15:28

Merge branch 'master' into add-table-row

2b0e153

Improve docstrings in utils.py file

53bdbb4

Improve docstrings in calculate.py file

5e4d646

MattHJensen merged commit e977a0a into PSLmodels:master Mar 16, 2018

martinholmer deleted the add-table-row branch March 16, 2018 20:12
