Fix CPS taxable pensions, qualified dividends, and taxable/non-taxable interest #165
Code looks OK, but does it actually produce the kind of distribution of the e01700-to-e01500 ratio that we are looking for? |
@martinholmer asked:
Yes. Here are the ratios for units with a non-zero value for
|
@andersonfrailey, I guess I didn't make it clear what the taxable-pension imputation procedure should be. |
@martinholmer can you review the methodology in my last commit? I've gotten pensions and qualified dividends to line up nicely, but interest income isn't quite right yet. You can see the results I have so far in this notebook. |
@martinholmer is my last commit more what you had in mind? These are the splits I get:
|
@andersonfrailey reported these taxable vs non-taxable interest income results:
I don't understand the first result
when some filing units have non-taxable interest (namely 11.8 percent). Surely the aggregate taxable-to-total ratio is less than one. We want that first ratio result to be around 0.6 (not 1.0). |
That's rounding. When I divide all of |
@andersonfrailey, can you post here the Python script that tabulates the interest income statistics from the imputed CPS data? Is the |
@andersonfrailey proposed this code in PR #165:
What's the difference between Did you pick How did you arrive at |
Nothing. They're just different ways of referencing the
Yes. I landed at 93 after trial and error.
I just picked it as a starting point. I'm still playing with that variable. |
Yes, it is. Here is the code I used:

def grouped_interest_income(df):
    # Total interest income is taxable (e00300) plus non-taxable (e00400)
    df['int_inc'] = df['e00300'] + df['e00400']
    # Keep only filers with positive total interest income
    subdf = df[(df['filer'] == 1) & (df['int_inc'] > 0)].copy()
    # Weighted aggregate taxable-to-total ratio
    taxable_total = ((subdf['e00300'] * subdf['s006']).sum() /
                     (subdf['int_inc'] * subdf['s006']).sum())
    print(f'Taxable/Total Ratio for all units where Total > 0: {round(taxable_total, 3)}')
    # Units whose interest income is entirely taxable
    all_taxable = subdf['s006'][subdf['int_inc'] == subdf['e00300']].sum()
    all_taxable_prob = all_taxable / subdf['s006'].sum()
    print(f'Units with all taxable income: {round(all_taxable * 1e-6, 3)}')
    print(f'\tProb: {round(all_taxable_prob, 3)}')
    # Units with no taxable interest income
    zero_taxable = subdf['s006'][subdf['e00300'] == 0.0].sum()
    zero_taxable_prob = zero_taxable / subdf['s006'].sum()
    print(f'Units with no taxable income: {round(zero_taxable * 1e-6, 3)}')
    print(f'\tProb: {round(zero_taxable_prob, 3)}')
    # Units with both taxable and non-taxable interest income
    some_mask = (subdf['e00300'] != subdf['int_inc']) & (subdf['e00300'] != 0)
    some_taxable = subdf['s006'][some_mask].sum()
    sometaxabledf = subdf[some_mask]
    some_taxable_prob = some_taxable / subdf['s006'].sum()
    frac = ((sometaxabledf['s006'] * sometaxabledf['e00300']).sum() /
            (sometaxabledf['s006'] * sometaxabledf['int_inc']).sum())
    print(f'Units with some taxable interest income: {round(some_taxable * 1e-6, 3)}')
    print(f'\tProb: {round(some_taxable_prob, 3)}')
    print(f'\tFrac: {round(frac, 3)}')
|
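For anyone who wants to try the tabulation without the full CPS file, here is a minimal sketch that runs the same weighted statistics on a small hand-built frame. It returns the numbers instead of printing them. The function name `interest_income_stats` and the toy data are made up for illustration; only the column names (`e00300`, `e00400`, `s006`, `filer`) come from the code above.

```python
import pandas as pd


def interest_income_stats(df):
    """Weighted taxable/non-taxable interest statistics (illustrative variant)."""
    total = df['e00300'] + df['e00400']
    keep = (df['filer'] == 1) & (total > 0)
    sub = df[keep].assign(int_inc=total[keep])
    wght = sub['s006']
    # Partition the kept units into three mutually exclusive groups
    all_taxable = sub['e00300'] == sub['int_inc']
    zero_taxable = sub['e00300'] == 0
    some_taxable = ~all_taxable & ~zero_taxable
    some = sub[some_taxable]
    some_total = (some['s006'] * some['int_inc']).sum()
    return {
        'taxable_total_ratio': ((sub['e00300'] * wght).sum() /
                                (sub['int_inc'] * wght).sum()),
        'all_taxable_prob': wght[all_taxable].sum() / wght.sum(),
        'zero_taxable_prob': wght[zero_taxable].sum() / wght.sum(),
        'some_taxable_prob': wght[some_taxable].sum() / wght.sum(),
        # Report nan explicitly when the "some taxable" group is empty
        'some_taxable_frac': ((some['s006'] * some['e00300']).sum() / some_total
                              if some_total > 0 else float('nan')),
    }


# Toy data (hypothetical): five units, one non-filer, one with no interest
toy = pd.DataFrame({
    'e00300': [100.0, 0.0, 60.0, 50.0, 0.0],  # taxable interest
    'e00400': [0.0, 40.0, 40.0, 0.0, 0.0],    # non-taxable interest
    's006': [2.0, 1.0, 1.0, 1.0, 5.0],        # sampling weight
    'filer': [1, 1, 1, 0, 1],
})
stats = interest_income_stats(toy)
```

Only the first three toy units survive the filter, so the three group probabilities sum to one and the "some taxable" fraction comes from the single mixed unit.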
@andersonfrailey, when you call this function
where does the |
That is a data frame containing the variables used in the function. The data frame comes from the |
@andersonfrailey said:
Can you show us the script in which that Calculator object is created and used? |
Sure, here is a link to the notebook I was working in with all of the code I used. The calculator object is created in the sixth cell and all of the code that uses it occurs under the "Interest Income" heading. |
@andersonfrailey said:
If you set |
Setting the ratio to zero results in this:
Frac is nan because when you divide zero by zero in pandas you get nan |
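To illustrate the nan: when no unit falls in the "some taxable" group, both weighted sums are zero, and a zero-over-zero division of NumPy floats gives nan (with a RuntimeWarning) rather than raising an error. The sketch below reproduces that on an empty, made-up frame. The explicit `np.float64` casts and the `safe_ratio` helper are my own additions for illustration, not part of the tabulation function.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with no units in the "some taxable" group
empty = pd.DataFrame({'s006': [], 'e00300': [], 'int_inc': []}, dtype=float)

# Both weighted sums over an empty group are zero
num = np.float64((empty['s006'] * empty['e00300']).sum())
den = np.float64((empty['s006'] * empty['int_inc']).sum())

with np.errstate(invalid='ignore'):  # silence the 0/0 RuntimeWarning
    frac = num / den                 # nan, not an exception


def safe_ratio(numerator, denominator):
    """Return nan explicitly when the denominator is zero (illustrative helper)."""
    return numerator / denominator if denominator != 0 else float('nan')


guarded = safe_ratio(num, den)
```

An explicit guard like this makes the empty-group case visible in the code instead of leaving it to floating-point rules.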
@andersonfrailey said when he set
So, if 11.8 percent of the filing units do not have fully-taxable interest, and if that group has zero taxable interest (because |
Overall taxable-to-total ratio is actually 0.9998311310914553. The results just show 1.0 because of rounding. My interpretation is that the 11.8 percent of units without fully taxable interest also have very, very little interest. I'll try and tabulate exactly how much each group has and post the results here. |
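A quick sketch of both points, using made-up numbers apart from the ratio quoted above. Rounding that ratio to three decimals gives 1.0, and a weighted aggregate ratio can sit very close to one even when 11.8 percent of units have no taxable interest, as long as those units hold very little interest in total. The toy frame is synthetic and chosen only to reproduce the 11.8 percent share.

```python
import pandas as pd

# The ratio reported above, rounded the way the tabulation rounds it
ratio = 0.9998311310914553
rounded = round(ratio, 3)

# Synthetic data: 882 units with large, fully taxable interest and
# 118 units with tiny, fully non-taxable interest (all weights equal)
toy = pd.DataFrame({
    'e00300': [10_000.0] * 882 + [0.0] * 118,  # taxable interest
    'e00400': [0.0] * 882 + [1.0] * 118,       # non-taxable interest
    's006': [1.0] * 1000,                      # sampling weight
})
total = toy['e00300'] + toy['e00400']

# Dollar-weighted aggregate ratio is dominated by the large amounts
aggregate_ratio = ((toy['e00300'] * toy['s006']).sum() /
                   (total * toy['s006']).sum())

# Unit-weighted share of units that are not fully taxable
not_fully_taxable_share = (toy['s006'][toy['e00300'] != total].sum() /
                           toy['s006'].sum())
```

The aggregate ratio weights by dollars while the share weights by units, which is why the two can look inconsistent at first glance.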
@andersonfrailey said:
Great! Tabulations would help because those with small total interest income are the ones with a fully-taxable probability close to one. The 11.8 percent should be those with considerable total interest income (because they have the low probability). |
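As a rough sketch of the relationship described here (not the code in this PR), a two-step imputation could draw one uniform random number per unit, treat the unit as fully taxable when the draw falls below that unit's fully-taxable probability, and otherwise assign a fixed taxable fraction of total interest. The function name, the 1,000-dollar threshold, the 0.2 probability, and the 0.6 fraction are all hypothetical placeholders.

```python
import numpy as np


def impute_taxable_interest(total, prob_fully_taxable, frac, seed=0):
    """Split total interest into taxable and non-taxable parts (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    draw = rng.uniform(size=len(total))   # one draw per unit, in [0, 1)
    fully = draw < prob_fully_taxable     # per-unit fully-taxable decision
    taxable = np.where(fully, total, frac * total)
    return taxable, total - taxable


total = np.array([5.0, 50.0, 5_000.0, 50_000.0])
# Assumed schedule: small totals are always fully taxable, large ones rarely
prob = np.where(total < 1_000.0, 1.0, 0.2)
taxable, nontaxable = impute_taxable_interest(total, prob, frac=0.6)
```

With a schedule like this, the units that end up not fully taxable are by construction the ones with considerable total interest income.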
The 7.899 million who either have some or no taxable income, depending on how we define the ratio, have about $22 million in total interest income. |
@andersonfrailey, can you merge the tip of the master branch (with the recent #158 PR) into the open PR #165 branch so that that branch is up-to-date? Thanks. Let me know when you do this and then I'll download the updated PR #165 and work with it until I understand better what's going on. Then we'll talk again. Does this make sense? |
@martinholmer, just merged in the master branch |
@andersonfrailey said:
Thanks, I'll get to work right away. |
Thanks, @martinholmer ! |
@andersonfrailey, I found a bug in
(1) I've posted new PUF target statistics based on tabulations of raw
(2) The bug was an incorrect use of the random numbers when imputing taxable interest. The patch for the
(3) With this code-fix, @andersonfrailey, see if you can replicate these results on your development branch. Here is the script (
And here is the workflow for each imputation iteration:
|
Thanks for finding the bug, @martinholmer! I was able to recreate your probability, but for frac I got 0.617, rather than 0.817. Was 0.817 a typo? In your comment with the new targets you posted that the target frac was 0.617. |
Yes, @andersonfrailey , that was a typo. Sorry about the confusion. |
Great. In that case I'll get this updated with your new logic and merged. |
This PR implements the ideas for imputing variables described in issues #159, #166, #167.