Fix CPS taxable pensions, qualified dividends, and taxable/non-taxable interest #165

andersonfrailey · 2018-03-03T19:36:38Z

This PR implements the ideas for imputing variables described in issues #159, #166, #167.

martinholmer · 2018-03-05T00:09:49Z

Code looks OK, but does it actually produce the kind of distribution of the e01700-to-e01500 ratio that we are looking for?

andersonfrailey · 2018-03-05T16:00:36Z

@martinholmer asked:

does it actually produce the kind of distribution of the e01700-to-e01500 ratio that we are looking for?

Yes. Here are the shares of units in each e01700/e01500 ratio category, for units with a non-zero value for e01500:

e01700/e01500    CPS      Raw PUF
1                0.4766   0.4806
0                0.1804   0.1772
!= 1 and != 0    0.3429   0.3422
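
For reference, here is a minimal sketch of how shares like these could be tabulated. The `ratio_shares` helper, the toy data, and the use of `s006` as the weight are assumptions for illustration; this is not the code that produced the table above.

```python
import pandas as pd


def ratio_shares(df, weight='s006'):
    """Weighted shares of units whose e01700/e01500 ratio is 1, 0, or other.

    Only units with a non-zero e01500 are included in the tabulation.
    """
    sub = df[df['e01500'] != 0]
    ratio = sub['e01700'] / sub['e01500']
    wght = sub[weight]
    total = wght.sum()
    return {
        '1': wght[ratio == 1].sum() / total,
        '0': wght[ratio == 0].sum() / total,
        'other': wght[(ratio != 1) & (ratio != 0)].sum() / total,
    }


# Toy data (hypothetical values, not CPS or PUF records).
toy = pd.DataFrame({
    'e01500': [1000.0, 2000.0, 500.0, 0.0],
    'e01700': [1000.0, 0.0, 250.0, 0.0],
    's006': [1.0, 1.0, 2.0, 5.0],
})
shares = ratio_shares(toy)
print(shares)
```

The last toy row has zero e01500, so it is excluded from both the numerators and the denominator.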

martinholmer · 2018-03-06T13:18:29Z

@andersonfrailey, I guess I didn't make it clear what the taxable-pension imputation procedure should be.
We should talk on the phone this afternoon.

andersonfrailey · 2018-03-07T20:09:16Z

@martinholmer can you review the methodology in my last commit? I've gotten pensions and qualified dividends to line up nicely, but interest income isn't quite right yet.

You can see the results I have so far in this notebook.

andersonfrailey · 2018-03-07T21:56:06Z

@martinholmer is my last commit more what you had in mind? These are the splits I get:

Goal(PUF)
Taxable/Total Ratio for all units where Total > 0: 0.599
Units with all taxable income: 48.801
Prob: 0.882
Units with no taxable income: 0.505
Prob: 0.009
Units with some taxable interest income: 6.056
Prob: 0.109
Frac: 0.453

CPS
Taxable/Total Ratio for all units where Total > 0: 1.0
Units with all taxable income: 59.064
Prob: 0.882
Units with no taxable income: 0.0
Prob: 0.0
Units with some taxable interest income: 7.899
Prob: 0.118
Frac: 0.567

martinholmer · 2018-03-07T22:06:12Z

@andersonfrailey reported these taxable vs non-taxable interest income results:

CPS
Taxable/Total Ratio for all units where Total > 0: 1.0
Units with all taxable income: 59.064
Prob: 0.882
Units with no taxable income: 0.0
Prob: 0.0
Units with some taxable interest income: 7.899
Prob: 0.118
Frac: 0.567

I don't understand the first result

Taxable/Total Ratio for all units where Total > 0: 1.0

when some filing units have non-taxable interest (namely 11.8 percent).

Surely the aggregate taxable-to-total ratio is less than one.

We want that first ratio result to be around 0.6 (not 1.0).

andersonfrailey · 2018-03-07T22:19:47Z

That's rounding. When I divide the sum of e00300 by total interest income, the result is .9999. My interpretation is that the 88% of tax units who have only taxable interest income also hold almost all interest income overall. This seems backwards to me given your previous finding that tax units with more interest income tend to have less taxable income, but I'm not sure what the best method for correcting this would be.

martinholmer · 2018-03-07T22:37:30Z

@andersonfrailey, can you post here the Python script that tabulates the interest income statistics from the imputed CPS data?

Is the cps.csv.gz file that is part of this pull request, the file that you are tabulating?

martinholmer · 2018-03-08T13:00:50Z

@andersonfrailey proposed this code in PR #165:

# Split interest income into taxable and tax exempt
slope = 93
ratio = 0.60
prob = 1. - slope * (data.INTST / 1000)
probs = np.random.random(len(prob))
data['e00300'] = np.where(prob < probs, data.INTST, data.INTST * ratio)
data['e00400'] = data['INTST'] - data['e00300']

What's the difference between data.INTST and data['INTST']?

Did you pick slope = 93 to get an all-taxable probability in the CPS of about 0.88?

How did you arrive at ratio = 0.60?

andersonfrailey · 2018-03-08T14:49:05Z

What's the difference between data.INTST and data['INTST']?

Nothing. They're just different ways of referencing the INTST column in the data frame.
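
A quick illustration of that equivalence on a toy frame (the values are made up). One caveat worth noting: attribute access only reads a column whose name is a valid Python identifier and does not clash with a DataFrame attribute, and creating a new column requires the bracket form.

```python
import pandas as pd

# Toy frame with hypothetical interest amounts.
data = pd.DataFrame({'INTST': [0.0, 12.5, 300.0]})

# Both expressions return the same column.
same = data.INTST.equals(data['INTST'])
print(same)

# Creating a new column needs the bracket form.
data['e00300'] = data['INTST'] * 0.5
print(data)
```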

Did you pick slope = 93 to get an all-taxable probability in the CPS of about 0.88?

Yes. I landed at 93 after trial and error.

How did you arrive at ratio = 0.60?

I just picked it as a starting point. I'm still playing with that variable.
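
The trial-and-error search for slope could also be automated. The sketch below is one possible approach, not what was done in this PR: it assumes prob = 1 - slope * INTST / 1000 (clipped to the range 0 to 1) is the intended all-taxable probability, and uses bisection (a technique swapped in for illustration) to find the slope that hits a target weighted share. The helper names and the synthetic data are hypothetical.

```python
import numpy as np


def all_taxable_share(slope, intst, wght):
    """Expected weighted share of units assigned all-taxable interest,
    assuming prob = 1 - slope * intst / 1000, clipped to [0, 1]."""
    prob = np.clip(1.0 - slope * intst * 1e-3, 0.0, 1.0)
    return (prob * wght).sum() / wght.sum()


def calibrate_slope(target, intst, wght, lo=0.0, hi=1000.0, tol=1e-6):
    """Bisection on slope; the share never rises as slope increases."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if all_taxable_share(mid, intst, wght) > target:
            lo = mid   # share still too high: slope must be larger
        else:
            hi = mid   # share at or below target: slope must be smaller
    return 0.5 * (lo + hi)


# Synthetic interest amounts in dollars and unit weights (hypothetical).
rng = np.random.default_rng(0)
intst = rng.lognormal(mean=6.0, sigma=1.5, size=10_000)
wght = np.ones_like(intst)

slope = calibrate_slope(0.88, intst, wght)
print(slope, all_taxable_share(slope, intst, wght))
```

Because this works with the expected share rather than a single random draw, the result does not depend on the random seed used in the imputation itself.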

andersonfrailey · 2018-03-08T14:49:50Z

Is the cps.csv.gz file that is part of this pull request, the file that you are tabulating?

Yes, it is.

Here is the code I used:

def grouped_interest_income(df):
    df['int_inc'] = df['e00300'] + df['e00400']
    subdf = df[(df['filer'] == 1) & (df['int_inc'] > 0)].copy()
    taxable_total = (subdf['e00300'] * subdf['s006']).sum() / (subdf['int_inc'] * subdf['s006']).sum()
    print(f'Taxable/Total Ratio for all units where Total > 0: {round(taxable_total, 3)}')
    all_taxable = subdf['s006'][subdf['int_inc'] == subdf['e00300']].sum()
    all_taxable_prob = all_taxable / subdf['s006'].sum()
    print(f'Units with all taxable income: {round(all_taxable * 1e-6, 3)}')
    print(f'\tProb: {round(all_taxable_prob, 3)}')
    zero_taxable = subdf['s006'][subdf['e00300'] == 0.0].sum()
    zero_taxable_prob = zero_taxable / subdf['s006'].sum()
    print(f'Units with no taxable income: {round(zero_taxable * 1e-6, 3)}')
    print(f'\tProb: {round(zero_taxable_prob, 3)}')
    some_taxable = subdf['s006'][(subdf['e00300'] != subdf['int_inc']) & (subdf['e00300'] != 0)].sum()
    sometaxabledf = subdf[(subdf['e00300'] != subdf['int_inc']) & (subdf['e00300'] != 0)]
    some_taxable_prob = some_taxable / subdf['s006'].sum()
    frac = ((sometaxabledf['s006'] * sometaxabledf['e00300']).sum() /
            (sometaxabledf['s006'] * sometaxabledf['int_inc']).sum())
    print(f'Units with some taxable interest income: {round(some_taxable * 1e-6, 3)}')
    print(f'\tProb: {round(some_taxable_prob, 3)}')
    print(f'\tFrac: {round(frac, 3)}')

martinholmer · 2018-03-08T15:33:03Z

@andersonfrailey, when you call this function

def grouped_interest_income(df):

where does the df argument come from?

andersonfrailey · 2018-03-08T15:38:15Z

where does the df argument come from?

That is a data frame containing the variables used in the function. The data frame comes from the dataframe() method of the calculator class.

martinholmer · 2018-03-08T16:07:18Z

@andersonfrailey said:

The data frame comes from the dataframe() method of the calculator class.

Can you show us the script in which that Calculator object is created and used?

andersonfrailey · 2018-03-08T16:50:52Z

Sure, here is a link to the notebook I was working in with all of the code I used.

The calculator object is created in the sixth cell and all of the code that uses it occurs under the "Interest Income" heading.

martinholmer · 2018-03-08T17:32:42Z

@andersonfrailey said:

Did you pick slope = 93 to get an all-taxable probability in the CPS of about 0.88?

Yes. I landed at 93 after trial and error.

How did you arrive at ratio = 0.60?

I just picked it as a starting point. I'm still playing with that variable.

If you set ratio to zero (leaving slope at 93), what do you get in the imputed CPS?

andersonfrailey · 2018-03-08T18:34:34Z

Setting the ratio to zero results in this:

CPS
Taxable/Total Ratio for all units where Total > 0: 1.0
Units with all taxable income: 59.064
	Prob: 0.882
Units with no taxable income: 7.899
	Prob: 0.118
Units with some taxable interest income: 0.0
	Prob: 0.0
	Frac: nan

Frac is nan because dividing zero by zero in pandas yields nan.
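
A small sketch of that behavior, using an empty toy frame to stand in for the "some taxable" group when no unit falls into it. The `safe_ratio` helper is a hypothetical guard, not part of the tabulation code above.

```python
import numpy as np
import pandas as pd

# An empty selection, as happens when no unit has partly taxable interest.
empty = pd.DataFrame({'s006': [], 'e00300': [], 'int_inc': []}, dtype=float)

# Sums over empty columns are zero; wrap them as NumPy floats explicitly.
num = np.float64((empty['s006'] * empty['e00300']).sum())
den = np.float64((empty['s006'] * empty['int_inc']).sum())

# NumPy floats follow IEEE 754, so 0.0 / 0.0 gives nan (with a
# RuntimeWarning, silenced here); plain Python floats would instead
# raise ZeroDivisionError.
with np.errstate(invalid='ignore'):
    frac = num / den
print(frac)


def safe_ratio(numerator, denominator):
    """Hypothetical guard: return nan explicitly instead of dividing by zero."""
    return numerator / denominator if denominator != 0 else float('nan')


print(safe_ratio(num, den))
```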

martinholmer · 2018-03-08T19:04:17Z

@andersonfrailey said when he set slope=93 and ratio=0.0 he got these results:

CPS
Taxable/Total Ratio for all units where Total > 0: 1.0
Units with all taxable income: 59.064
	Prob: 0.882

So, if 11.8 percent of the filing units do not have fully-taxable interest, and if that group has zero taxable interest (because ratio=0), how can it be that the overall taxable-to-total ratio is 1.0? What am I missing?

andersonfrailey · 2018-03-08T19:14:17Z

So, if 11.8 percent of the filing units do not have fully-taxable interest, and if that group has zero taxable interest (because ratio=0), how can it be that the overall taxable-to-total ratio is 1.0? What am I missing?

Overall taxable-to-total ratio is actually 0.9998311310914553. The results just show 1.0 because of rounding.

My interpretation is that the 11.8 percent of units without fully taxable interest also have very, very little interest. I'll try and tabulate exactly how much each group has and post the results here.

martinholmer · 2018-03-08T19:26:25Z

@andersonfrailey said:

My interpretation is that the 11.8 percent of units without fully taxable interest also have very, very little interest. I'll try and tabulate exactly how much each group has and post the results here.

Great! Tabulations would help because those with small total interest income are the ones with a fully-taxable probability close to one. The 11.8 percent should be those with considerable total interest income (because they have the low probability).

andersonfrailey · 2018-03-08T20:39:46Z

The 7.899 million who either have some or no taxable income, depending on how we define the ratio, have about $22 million in total interest income.
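
A tabulation along these lines can be sketched as follows. The `interest_by_group` helper and the toy values are assumptions for illustration; the column names follow the function posted earlier in this thread, and the filer restriction is left out to keep the example short.

```python
import pandas as pd


def interest_by_group(df):
    """Weighted units and weighted total interest, split into units with
    all-taxable interest versus units with some or no taxable interest."""
    sub = df[df['e00300'] + df['e00400'] > 0].copy()
    sub['int_inc'] = sub['e00300'] + sub['e00400']
    sub['group'] = 'some_or_none'
    sub.loc[sub['e00300'] == sub['int_inc'], 'group'] = 'all_taxable'
    sub['wtd_units'] = sub['s006']
    sub['wtd_interest'] = sub['s006'] * sub['int_inc']
    return sub.groupby('group')[['wtd_units', 'wtd_interest']].sum()


# Toy data (hypothetical values, not CPS records).
toy = pd.DataFrame({
    's006': [10.0, 20.0, 5.0],
    'e00300': [100.0, 3.0, 0.0],
    'e00400': [0.0, 2.0, 4.0],
})
table = interest_by_group(toy)
print(table)
```

Comparing the two rows shows at a glance whether the units without fully taxable interest hold a meaningful share of the dollars or only a sliver of them.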

martinholmer · 2018-03-08T20:52:22Z

@andersonfrailey, can you merge the tip of the master branch (with the recent #158 PR) into the open PR #165 branch so that that branch is up-to-date? Thanks. Let me know when you do this and then I'll download the updated PR #165 and work with it until I understand better what's going on. Then we'll talk again. Does this make sense?

andersonfrailey · 2018-03-08T20:57:48Z

@martinholmer, just merged in the master branch

martinholmer · 2018-03-08T21:10:06Z

@andersonfrailey said:

just merged in the master branch

Thanks, I'll get to work right away.

andersonfrailey · 2018-03-08T21:14:01Z

Thanks, @martinholmer !

martinholmer · 2018-03-09T00:10:14Z

@andersonfrailey, I found a bug in finalprep.py that was causing the weird results when imputing CPS taxable interest income. Let me explain in detail. I want to discuss these topics:

(1) new target statistics to avoid distortion from the puf_ratios.csv adjustments of taxable interest
(2) correction of the interest imputation code in finalprep.py
(3) new imputation parameters

(1) I've posted new PUF target statistics based on tabulations of raw puf.csv data to avoid any possible distortions caused by Tax-Calculator using puf ratios to adjust taxable (but not nontaxable) interest income.

(2) The bug was an incorrect use of the random numbers when imputing taxable interest. The patch for the finalprep.py code is as follows.

     # Split interest income into taxable and tax exempt
-    slope = 93
-    ratio = 0.60
-    prob = 1. - slope * (data.INTST / 1000)
-    probs = np.random.random(len(prob))
-    data['e00300'] = np.where(prob < probs, data.INTST, data.INTST * ratio)  # <== BUG
+    slope = 0.068
+    ratio = 0.46
+    prob = 1. - slope * (data['INTST'] * 1e-3)
+    uniform_rn = np.random.random(len(prob))
+    data['e00300'] = np.where(uniform_rn < prob,
+                              data['INTST'], data['INTST'] * ratio)
     data['e00400'] = data['INTST'] - data['e00300']

(3) With this code-fix, slope = 0.068 and ratio = 0.46 generate in CPS data for 2015 the target statistics: prob = 0.887 and frac = 0.817.
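
To see why the original comparison misbehaved, here is a small simulation on synthetic data. The lognormal interest amounts and the `impute` helper are assumptions for illustration; this is neither CPS data nor the finalprep.py code. With slope = 93 and the comparison reversed, only units with less than about $10.75 of interest (where 1 - 93 * INTST / 1000 is still positive) can ever receive the partial split, so nearly all interest dollars stay taxable. With the corrected comparison, the chance of being all-taxable falls as interest rises.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(12345)
# Synthetic interest amounts in dollars (hypothetical, not CPS records).
data = pd.DataFrame({'INTST': rng.lognormal(mean=5.0, sigma=2.0, size=100_000)})
uniform_rn = rng.random(len(data))


def impute(intst, slope, ratio, buggy):
    """Return taxable interest under the buggy or the corrected comparison."""
    prob = 1.0 - slope * (intst * 1e-3)   # intended all-taxable probability
    all_taxable = (prob < uniform_rn) if buggy else (uniform_rn < prob)
    return np.where(all_taxable, intst, intst * ratio)


for label, slope, ratio, buggy in [('buggy', 93, 0.60, True),
                                   ('fixed', 0.068, 0.46, False)]:
    e00300 = impute(data['INTST'], slope, ratio, buggy)
    partial = e00300 < data['INTST']      # units given the partial split
    print(label,
          'share all-taxable: {:.3f}'.format(1 - partial.mean()),
          'taxable/total dollars: {:.3f}'.format(
              e00300.sum() / data['INTST'].sum()),
          'max INTST with partial split: {:.2f}'.format(
              data['INTST'][partial].max()))
```

The printed numbers depend on the synthetic distribution and the seed, so they will not match the CPS statistics quoted in this thread; the point is the qualitative contrast between the two variants.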

@andersonfrailey, see if you can replicate these results on your development branch.

Here is the script (tally-interest.py) I wrote to tabulate the target statistics from the CPS file generated by finalprep.py:

import numpy as np
from taxcalc import *

calc = Calculator(policy=Policy(),
                  records=Records.cps_constructor(),
                  verbose=False)
cyr = 2015
calc.advance_to_year(cyr)
calc.calc_all()
raw = calc.dataframe(['s006', 'e00300', 'e00400', 'filer'])
raw['total'] = raw['e00300'] + raw['e00400']
df = raw[(raw['filer'] == 1) & (raw['total'] > 0)].copy()
wght = df['s006']
total = df['e00300'] + df['e00400']
taxable = df['e00300']
nontaxable = df['e00400']
assert np.all(total>=taxable)
assert np.all(total>=nontaxable)
assert np.allclose(total, taxable+nontaxable)
sum_total = total.sum()
sum_taxable = taxable.sum()
sum_nontaxable = nontaxable.sum()
prob = wght[taxable==total].sum() / wght.sum()
print('prob= {:.3f}'.format(prob))
print('frac= {:.3f}'.format(sum_taxable/sum_total))

And here is the workflow for each imputation iteration:

$ python finalprep.py ; cp cps.csv.gz ../../tax-calculator/taxcalc ; pushd ../../tax-calculator ; python tally-interest.py ; popd

andersonfrailey · 2018-03-09T14:26:47Z

Thanks for finding the bug, @martinholmer! I was able to recreate your probability, but for frac I got 0.617, rather than 0.817. Was 0.817 a typo? In your comment with the new targets you posted that the target frac was 0.617.

martinholmer · 2018-03-09T15:45:41Z

Yes, @andersonfrailey , that was a typo. Sorry about the confusion.

andersonfrailey · 2018-03-09T15:49:47Z

Great. In that case I'll get this updated with your new logic and merged.

randomly determine taxable pensions

336eb71

new methodology

b819bdf

slope method

53c2280

martinholmer changed the title ~~Fix Taxable Pensions - CPS File~~ Fix CPS taxable pensions, qualified dividends, and taxable/non-taxable interest Mar 7, 2018

merge master

91d408b

martinholmer mentioned this pull request Mar 8, 2018

Fix AGI concept used in EITC phase-out logic PSLmodels/Tax-Calculator#1907

Merged

fix interest income logic

f16482b

andersonfrailey merged commit 3a485ee into PSLmodels:master Mar 9, 2018

martinholmer mentioned this pull request Mar 9, 2018

Use new CPS data files with three new benefits and improved imputations PSLmodels/Tax-Calculator#1911

Merged

This was referenced Mar 19, 2018

Targets for CPS imputation of taxable pension income, e01700 #159

Closed

Targets for CPS imputation of qualified dividends, e00650 #166

Closed

Targets for CPS imputation of taxable interest, e00300 #167

Closed

andersonfrailey deleted the assignpensions branch June 13, 2020 16:20
