Fix CPS taxable pensions, qualified dividends, and taxable/non-taxable interest #165

Merged on Mar 9, 2018 (5 commits)

andersonfrailey
Collaborator

@andersonfrailey commented Mar 3, 2018

This PR implements the ideas for imputing variables described in issues #159, #166, #167.

@martinholmer
Contributor

Code looks OK, but does it actually produce the kind of distribution of the e01700-to-e01500 ratio that we are looking for?

@andersonfrailey
Collaborator Author

@martinholmer asked:

does it actually produce the kind of distribution of the e01700-to-e01500 ratio that we are looking for?

Yes. Here are the shares of units, by e01700/e01500 ratio, among units with a non-zero value for e01500:

e01700/e01500     CPS      Raw PUF
1                 0.4766   0.4806
0                 0.1804   0.1772
!= 1 and != 0     0.3429   0.3422
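
A tabulation of this kind could be sketched as follows. This is only an illustration, not the code behind the table above: the function name `pension_ratio_shares` is hypothetical, the sample data are made up, and weighting by `s006` is an assumption (the thread does not say whether the shares are weighted).

```python
import pandas as pd


def pension_ratio_shares(df, weight_col='s006'):
    """Weighted share of units whose e01700/e01500 ratio is 1, 0, or neither."""
    # Keep only units with non-zero total pensions (e01500).
    sub = df[df['e01500'] != 0]
    ratio = sub['e01700'] / sub['e01500']
    wght = sub[weight_col]
    total_wght = wght.sum()
    return {
        '1': wght[ratio == 1].sum() / total_wght,
        '0': wght[ratio == 0].sum() / total_wght,
        '!= 1 and != 0': wght[(ratio != 1) & (ratio != 0)].sum() / total_wght,
    }


# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    'e01500': [100.0, 200.0, 300.0, 0.0],
    'e01700': [100.0, 0.0, 150.0, 0.0],
    's006': [2.0, 1.0, 1.0, 5.0],
})
shares = pension_ratio_shares(sample)
```

The three shares sum to one by construction, which is a quick sanity check when comparing the CPS and PUF columns.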

@martinholmer
Contributor

@andersonfrailey, I guess I didn't make it clear what the taxable-pension imputation procedure should be.
We should talk on the phone this afternoon.

@andersonfrailey
Collaborator Author

@martinholmer can you review the methodology in my last commit? I've gotten pensions and qualified dividends to line up nicely, but interest income isn't quite right yet.

You can see the results I have so far in this notebook.

@andersonfrailey
Collaborator Author

@martinholmer is my last commit more what you had in mind? These are the splits I get:

Goal (PUF)
Taxable/Total Ratio for all units where Total > 0: 0.599
Units with all taxable income: 48.801
	Prob: 0.882
Units with no taxable income: 0.505
	Prob: 0.009
Units with some taxable interest income: 6.056
	Prob: 0.109
	Frac: 0.453

CPS
Taxable/Total Ratio for all units where Total > 0: 1.0
Units with all taxable income: 59.064
	Prob: 0.882
Units with no taxable income: 0.0
	Prob: 0.0
Units with some taxable interest income: 7.899
	Prob: 0.118
	Frac: 0.567

@martinholmer changed the title from "Fix Taxable Pensions - CPS File" to "Fix CPS taxable pensions, qualified dividends, and taxable/non-taxable interest" on Mar 7, 2018
@martinholmer
Contributor

martinholmer commented Mar 7, 2018

@andersonfrailey reported these taxable vs non-taxable interest income results:

CPS
Taxable/Total Ratio for all units where Total > 0: 1.0
Units with all taxable income: 59.064
	Prob: 0.882
Units with no taxable income: 0.0
	Prob: 0.0
Units with some taxable interest income: 7.899
	Prob: 0.118
	Frac: 0.567

I don't understand the first result

Taxable/Total Ratio for all units where Total > 0: 1.0

when some filing units have non-taxable interest (namely 11.8 percent).

Surely the aggregate taxable-to-total ratio is less than one.

We want that first ratio result to be around 0.6 (not 1.0).

@andersonfrailey
Collaborator Author

That's rounding. When I divide all of e00300 by total interest income the result is .9999. My interpretation is that the 88% of all tax units who have only taxable interest income also have almost all of interest income in general. This seems backwards to me given your previous finding that tax units with more interest income tend to have less taxable income, but I'm not sure what the best method for correcting this would be.

@martinholmer
Contributor

@andersonfrailey, can you post here the Python script that tabulates the interest income statistics from the imputed CPS data?

Is the cps.csv.gz file that is part of this pull request, the file that you are tabulating?

@martinholmer
Contributor

martinholmer commented Mar 8, 2018

@andersonfrailey proposed this code in PR #165:

# Split interest income into taxable and tax exempt
slope = 93
ratio = 0.60
prob = 1. - slope * (data.INTST / 1000)
probs = np.random.random(len(prob))
data['e00300'] = np.where(prob < probs, data.INTST, data.INTST * ratio)
data['e00400'] = data['INTST'] - data['e00300']

What's the difference between data.INTST and data['INTST']?

Did you pick slope = 93 to get an all-taxable probability in the CPS of about 0.88?

How did you arrive at ratio = 0.60?

@andersonfrailey
Collaborator Author

What's the difference between data.INTST and data['INTST']?

Nothing. They're just different ways of referencing the INTST column in the data frame.
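
As a small illustration of that equivalence (a sketch on made-up data, not code from this PR), both forms return the same column when reading, while the bracket form is the one to use when creating a new column:

```python
import pandas as pd

# Made-up values, for illustration only.
data = pd.DataFrame({'INTST': [0.0, 250.0, 1200.0]})

# Attribute access and bracket access read the same column.
same = data.INTST.equals(data['INTST'])

# New columns must be created with bracket notation.
data['e00300'] = data['INTST'] * 0.5
```

Bracket notation is also the safer habit for reading, since attribute access does not work for column names that clash with DataFrame method names or are not valid Python identifiers.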

Did you pick slope = 93 to get an all-taxable probability in the CPS of about 0.88?

Yes. I landed at 93 after trial and error.

How did you arrive at ratio = 0.60?

I just picked it as a starting point. I'm still playing with that variable.

@andersonfrailey
Collaborator Author

Is the cps.csv.gz file that is part of this pull request, the file that you are tabulating?

Yes, it is.

Here is the code I used:

def grouped_interest_income(df):
    df['int_inc'] = df['e00300'] + df['e00400']
    subdf = df[(df['filer'] == 1) & (df['int_inc'] > 0)].copy()
    taxable_total = (subdf['e00300'] * subdf['s006']).sum() / (subdf['int_inc'] * subdf['s006']).sum()
    print(f'Taxable/Total Ratio for all units where Total > 0: {round(taxable_total, 3)}')
    all_taxable = subdf['s006'][subdf['int_inc'] == subdf['e00300']].sum()
    all_taxable_prob = all_taxable / subdf['s006'].sum()
    print(f'Units with all taxable income: {round(all_taxable * 1e-6, 3)}')
    print(f'\tProb: {round(all_taxable_prob, 3)}')
    zero_taxable = subdf['s006'][subdf['e00300'] == 0.0].sum()
    zero_taxable_prob = zero_taxable / subdf['s006'].sum()
    print(f'Units with no taxable income: {round(zero_taxable * 1e-6, 3)}')
    print(f'\tProb: {round(zero_taxable_prob, 3)}')
    some_taxable = subdf['s006'][(subdf['e00300'] != subdf['int_inc']) & (subdf['e00300'] != 0)].sum()
    sometaxabledf = subdf[(subdf['e00300'] != subdf['int_inc']) & (subdf['e00300'] != 0)]
    some_taxable_prob = some_taxable / subdf['s006'].sum()
    frac = ((sometaxabledf['s006'] * sometaxabledf['e00300']).sum() /
            (sometaxabledf['s006'] * sometaxabledf['int_inc']).sum())
    print(f'Units with some taxable interest income: {round(some_taxable * 1e-6, 3)}')
    print(f'\tProb: {round(some_taxable_prob, 3)}')
    print(f'\tFrac: {round(frac, 3)}')

@martinholmer
Contributor

@andersonfrailey, when you call this function

def grouped_interest_income(df):

where does the df argument come from?

@andersonfrailey
Collaborator Author

where does the df argument come from?

That is a data frame containing the variables used in the function. The data frame comes from the dataframe() method of the calculator class.

@martinholmer
Contributor

@andersonfrailey said:

The data frame comes from the dataframe() method of the calculator class.

Can you show us the script in which that Calculator object is created and used?

@andersonfrailey
Collaborator Author

Sure, here is a link to the notebook I was working in with all of the code I used.

The calculator object is created in the sixth cell, and all of the code that uses it occurs under the "Interest Income" heading.

@martinholmer
Contributor

@andersonfrailey said:

Did you pick slope = 93 to get an all-taxable probability in the CPS of about 0.88?

Yes. I landed at 93 after trial and error.

How did you arrive at ratio = 0.60?

I just picked it as a starting point. I'm still playing with that variable.

If you set ratio to zero (leaving slope at 93), what do you get in the imputed CPS?

@andersonfrailey
Collaborator Author

Setting the ratio to zero results in this:

CPS
Taxable/Total Ratio for all units where Total > 0: 1.0
Units with all taxable income: 59.064
	Prob: 0.882
Units with no taxable income: 7.899
	Prob: 0.118
Units with some taxable interest income: 0.0
	Prob: 0.0
	Frac: nan

Frac is nan because dividing zero by zero in pandas yields nan.
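
A minimal sketch of why that happens, using an empty stand-in for the "some taxable" subset (the frame below is made up; only the column names come from the function above):

```python
import numpy as np
import pandas as pd

# Empty stand-in for the "some taxable" subset when ratio is set to zero.
empty = pd.DataFrame(columns=['s006', 'e00300', 'int_inc'], dtype=float)

# Sums over an empty column are zero, so the ratio is 0.0 / 0.0.
numer = np.float64((empty['s006'] * empty['e00300']).sum())
denom = np.float64((empty['s006'] * empty['int_inc']).sum())
with np.errstate(invalid='ignore'):  # silence the 0/0 RuntimeWarning
    frac = numer / denom

# Element-wise Series division behaves the same way.
ratio_series = pd.Series([0.0]) / pd.Series([0.0])
```

Both `frac` and the single element of `ratio_series` are NaN rather than raising an error, which is why the tabulation prints `nan` instead of failing.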

@martinholmer
Contributor

@andersonfrailey said that when he set slope=93 and ratio=0.0 he got these results:

CPS
Taxable/Total Ratio for all units where Total > 0: 1.0
Units with all taxable income: 59.064
	Prob: 0.882

So, if 11.8 percent of the filing units do not have fully-taxable interest, and if that group has zero taxable interest (because ratio=0), how can it be that the overall taxable-to-total ratio is 1.0? What am I missing?

@andersonfrailey
Collaborator Author

So, if 11.8 percent of the filing units do not have fully-taxable interest, and if that group has zero taxable interest (because ratio=0), how can it be that the overall taxable-to-total ratio is 1.0? What am I missing?

Overall taxable-to-total ratio is actually 0.9998311310914553. The results just show 1.0 because of rounding.
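
The rounding effect is easy to reproduce with the value quoted above; printing more digits would have exposed the difference (a small illustration, not part of the tabulation script):

```python
ratio = 0.9998311310914553  # value reported in the comment above

rounded = round(ratio, 3)             # what the tabulation prints
shown_3 = '{:.3f}'.format(ratio)      # three decimals hides the gap
shown_6 = '{:.6f}'.format(ratio)      # six decimals reveals it
```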

My interpretation is that the 11.8 percent of units without fully taxable interest also have very, very little interest. I'll try and tabulate exactly how much each group has and post the results here.

@martinholmer
Contributor

@andersonfrailey said:

My interpretation is that the 11.8 percent of units without fully taxable interest also have very, very little interest. I'll try and tabulate exactly how much each group has and post the results here.

Great! Tabulations would help because those with small total interest income are the ones with a fully-taxable probability close to one. The 11.8 percent should be those with considerable total interest income (because they have the low probability).

@andersonfrailey
Collaborator Author

The 7.899 million who either have some or no taxable income, depending on how we define the ratio, have about $22 million in total interest income.
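
A per-group tabulation along these lines could look like the sketch below. It is an assumption about how such a table might be built, not the code that produced the figure above: `interest_by_group` is a hypothetical helper, and the sample rows are made up. Only the column names (`s006`, `e00300`, `e00400`, `filer`) come from the thread.

```python
import pandas as pd


def interest_by_group(df):
    """Weighted units and weighted interest income, by taxable-interest group."""
    sub = df[df['filer'] == 1].copy()
    sub['total'] = sub['e00300'] + sub['e00400']
    sub = sub[sub['total'] > 0].copy()
    # Classify each unit by how much of its interest is taxable.
    sub['group'] = 'some taxable'
    sub.loc[sub['e00400'] == 0, 'group'] = 'all taxable'
    sub.loc[sub['e00300'] == 0, 'group'] = 'none taxable'
    sub['wtd_total'] = sub['total'] * sub['s006']
    out = sub.groupby('group').agg(units=('s006', 'sum'),
                                   interest=('wtd_total', 'sum'))
    out['avg_interest'] = out['interest'] / out['units']
    return out


# Made-up rows, for illustration only.
sample = pd.DataFrame({
    's006': [100.0, 200.0, 50.0, 10.0],
    'e00300': [10.0, 0.0, 500.0, 0.0],
    'e00400': [0.0, 0.0, 500.0, 20.0],
    'filer': [1, 1, 1, 1],
})
table = interest_by_group(sample)
```

Comparing `avg_interest` across groups shows directly whether the units without fully taxable interest are the low-interest or the high-interest ones.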

@martinholmer
Contributor

@andersonfrailey, can you merge the tip of the master branch (with the recent #158 PR) into the open PR #165 branch so that that branch is up-to-date? Thanks. Let me know when you do this and then I'll download the updated PR #165 and work with it until I understand better what's going on. Then we'll talk again. Does this make sense?

@andersonfrailey
Collaborator Author

@martinholmer, just merged in the master branch

@martinholmer
Contributor

@andersonfrailey said:

just merged in the master branch

Thanks, I'll get to work right away.

@andersonfrailey
Collaborator Author

Thanks, @martinholmer!

@martinholmer
Contributor

martinholmer commented Mar 9, 2018

@andersonfrailey, I found a bug in finalprep.py that was causing the weird results when imputing CPS taxable interest income. Let me explain in detail. I want to discuss these topics:

  1. new target statistics to avoid distortion from the puf_ratios.csv adjustments of taxable interest
  2. correction of the interest imputation code in finalprep.py
  3. new imputation parameters

(1) I've posted new PUF target statistics based on tabulations of raw puf.csv data to avoid any possible distortions caused by Tax-Calculator using puf ratios to adjust taxable (but not nontaxable) interest income.

(2) The bug was an incorrect use of the random numbers when imputing taxable interest. The patch for the finalprep.py code is as follows.

     # Split interest income into taxable and tax exempt
-    slope = 93
-    ratio = 0.60
-    prob = 1. - slope * (data.INTST / 1000)
-    probs = np.random.random(len(prob))
-    data['e00300'] = np.where(prob < probs, data.INTST, data.INTST * ratio)  # <== BUG
+    slope = 0.068
+    ratio = 0.46
+    prob = 1. - slope * (data['INTST'] * 1e-3)
+    uniform_rn = np.random.random(len(prob))
+    data['e00300'] = np.where(uniform_rn < prob,
+                              data['INTST'], data['INTST'] * ratio)
     data['e00400'] = data['INTST'] - data['e00300']

(3) With this code-fix, slope = 0.068 and ratio = 0.46 generate in CPS data for 2015 the target statistics: prob = 0.887 and frac = 0.817.

@andersonfrailey, see if you can replicate these results on your development branch.
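
The effect of the inverted comparison can be illustrated on synthetic data. This is a sketch under stated assumptions, not the finalprep.py code: `split_interest` is a hypothetical helper, the lognormal interest amounts are made up, and a seeded NumPy `Generator` replaces the unseeded `np.random.random` call so the run is repeatable.

```python
import numpy as np


def split_interest(intst, slope, ratio, rng, buggy=False):
    """Split total interest into (taxable, nontaxable) amounts."""
    prob = 1.0 - slope * (intst * 1e-3)  # intended all-taxable probability
    uniform_rn = rng.random(len(intst))
    if buggy:
        all_taxable = prob < uniform_rn  # original, inverted comparison
    else:
        all_taxable = uniform_rn < prob  # corrected comparison
    taxable = np.where(all_taxable, intst, intst * ratio)
    return taxable, intst - taxable


rng = np.random.default_rng(12345)
# Synthetic interest amounts in dollars (assumption, not CPS data).
intst = rng.lognormal(mean=6.0, sigma=1.5, size=100_000)

old_taxable, _ = split_interest(intst, 93, 0.60, rng, buggy=True)
new_taxable, _ = split_interest(intst, 0.068, 0.46, rng)

old_partial = old_taxable < intst  # units with some nontaxable interest
new_partial = new_taxable < intst

print('old: largest interest among partial units:', intst[old_partial].max())
print('new: mean interest, partial vs all-taxable:',
      intst[new_partial].mean(), intst[~new_partial].mean())
```

With the inverted comparison and slope = 93, `prob` is negative for any unit with more than about $10.75 of interest (1000/93), so only units with tiny amounts can end up partially taxable. That is consistent with the earlier observation that the 11.8 percent group held almost no interest. With the corrected comparison, the partially taxable units are the ones with larger interest amounts.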

Here is the script (tally-interest.py) I wrote to tabulate the target statistics from the CPS file generated by finalprep.py:

import numpy as np
from taxcalc import Calculator, Policy, Records

calc = Calculator(policy=Policy(),
                  records=Records.cps_constructor(),
                  verbose=False)
cyr = 2015
calc.advance_to_year(cyr)
calc.calc_all()
raw = calc.dataframe(['s006', 'e00300', 'e00400', 'filer'])
raw['total'] = raw['e00300'] + raw['e00400']
df = raw[(raw['filer'] == 1) & (raw['total'] > 0)].copy()
wght = df['s006']
total = df['e00300'] + df['e00400']
taxable = df['e00300']
nontaxable = df['e00400']
assert np.all(total>=taxable)
assert np.all(total>=nontaxable)
assert np.allclose(total, taxable+nontaxable)
sum_total = total.sum()
sum_taxable = taxable.sum()
sum_nontaxable = nontaxable.sum()
prob = wght[taxable==total].sum() / wght.sum()
print('prob= {:.3f}'.format(prob))
print('frac= {:.3f}'.format(sum_taxable/sum_total))

And here is the workflow for each imputation iteration:

$ python finalprep.py ; cp cps.csv.gz ../../tax-calculator/taxcalc ; pushd ../../tax-calculator ; python tally-interest.py ; popd

@andersonfrailey
Collaborator Author

Thanks for finding the bug, @martinholmer! I was able to recreate your probability, but for frac I got 0.617, rather than 0.817. Was 0.817 a typo? In your comment with the new targets you posted that the target frac was 0.617.

@martinholmer
Contributor

Yes, @andersonfrailey, that was a typo. Sorry about the confusion.

@andersonfrailey
Collaborator Author

Great. In that case I'll get this updated with your new logic and merged.
