Update CPS Housing Benefits #1941

Merged: 2 commits into PSLmodels:master from housingcps, Mar 30, 2018

andersonfrailey
Collaborator

This PR updates the housing benefits in the CPS file after @Amy-Xu found a bug in the imputations. The bug has been fixed in TaxData PR #175, which will be merged in the morning, assuming no one objects.

cc @Amy-Xu @martinholmer @MattHJensen

@martinholmer
Collaborator

@andersonfrailey, when I unzip your proposed cps.csv.gz file and list the non-zero values for the first filing unit, I see this:

1 age_head 48
2 age_spouse 45
3 e00200p 36428.0
6 e00200s 5464.2
9 a_lineno 1
14 s006 230.01666667
16 h_seq 2
17 ffpos 1
18 fips 23
23 n21 2
33 XTOT 2
34 filer 1
35 FLPDYR 2014
36 MARS 2
43 e19200 6217.626973
44 e18500 1939.2896277
46 e17500 2805.8533049
47 RECID 1
48 e18400 2302.8235294
49 e00200 41892.2
53 e00300 14.2194133758
56 e19800 4765.24274236
57 e20100 1045.10774154
66 agi_bin 8
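
A listing like the one above can be produced in several ways. The following is a minimal sketch using only the Python standard library; it is not necessarily the command that was actually used. The helper name `nonzero_fields` and the tiny in-memory sample are hypothetical, and the sketch assumes every field in the file is numeric.

```python
import csv
import gzip
import io


def nonzero_fields(gz_bytes, row_number=1):
    """Return (column_index, name, raw_text) for the non-zero fields of one data row.

    Column indexes are 1-based, as in the listing above.
    Assumes every field parses as a number.
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        for current, row in enumerate(reader, start=1):
            if current == row_number:
                return [
                    (index, name, text)
                    for index, (name, text) in enumerate(zip(header, row), start=1)
                    if float(text) != 0.0
                ]
    raise ValueError("row_number is past the end of the file")


# Made-up stand-in for cps.csv.gz with one filing unit and four columns.
sample = gzip.compress(b"age_head,age_spouse,e00200p,e00400\n48,45,36428.0,0\n")
for index, name, text in nonzero_fields(sample):
    print(index, name, text)
```

For a real file, the bytes could be read with `open("cps.csv.gz", "rb").read()` and passed to the same function.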

I don't understand why there is such excessive precision in many of the variable values.
If you want to show amounts down to the penny, fine. But values like 56 e19800 4765.24274236 are unnecessarily precise and just bloat the size of the cps.csv.gz file and the size of the taxcalc packages.
Remember that the IRS allows rounding to the nearest dollar. We want amounts to the nearest penny for variables that have taxpayer/spouse splits (like e00200), but the other variables could be integers.

Use your own judgement when fixing this.

@feenberg
Contributor

feenberg commented Mar 30, 2018 via email

@martinholmer
Collaborator

@feenberg said:

> I don't know what happens in the calculator, but if data values are
> rounded to integers, the calculation needs to be at full precision so that
> marginal tax rate calculations by finite differences are accurate.
>
> When data are aged, it should be on the unrounded values.

Of course. All the variables you're talking about are 64-bit floating-point numbers inside Tax-Calculator.
My comment is about the format of the data input files.
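
To illustrate @feenberg's point about finite differences, here is a small sketch. It uses a hypothetical flat 25 percent `toy_tax` function, which is an assumption for illustration and not Tax-Calculator logic. The input is a whole-dollar amount in both cases; what differs is whether the calculated tax is kept at full floating-point precision or rounded to whole dollars before differencing.

```python
def toy_tax(income):
    """Hypothetical flat 25 percent tax; stands in for a real tax calculation."""
    return 0.25 * income


def rounded_toy_tax(income):
    """Same toy tax, but with the result rounded to whole dollars."""
    return float(round(toy_tax(income)))


def finite_difference_mtr(tax_function, income, delta=0.01):
    """Marginal tax rate estimated from a one-cent increase in income."""
    return (tax_function(income + delta) - tax_function(income)) / delta


income = 40000.0  # a whole-dollar input value
full_precision_mtr = finite_difference_mtr(toy_tax, income)
rounded_mtr = finite_difference_mtr(rounded_toy_tax, income)
print(full_precision_mtr, rounded_mtr)
```

With full-precision arithmetic the estimate is very close to 0.25. With rounded results the one-cent change disappears and the estimate collapses to zero, which is why rounding belongs in the input file format and not in the calculation.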

@andersonfrailey
Collaborator Author

@martinholmer, what you've said makes sense. I'll round everything to the nearest dollar, like the PUF, and add the new file to this PR and the related TaxData PR.
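
The rounding discussed in this thread could be sketched roughly as follows. This is not the TaxData code; `round_amounts` and its parameters are hypothetical names. Columns listed in `dollar_columns` are written as whole dollars, columns listed in `cents_columns` keep two decimals, and all other columns (for example weights or identifiers) are left untouched. The two lists are assumed to be disjoint. The sample values are taken from the listing earlier in the conversation.

```python
import csv
import io


def round_amounts(csv_text, dollar_columns, cents_columns=()):
    """Rewrite CSV text with selected columns rounded to dollars or to cents.

    Python's round() and format() resolve exact ties to the nearest even value.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for name in cents_columns:
            row[name] = f"{float(row[name]):.2f}"
        for name in dollar_columns:
            row[name] = str(round(float(row[name])))
        writer.writerow(row)
    return output.getvalue()


sample = (
    "e00200p,e00200s,e19800,e18400\n"
    "36428.0,5464.2,4765.24274236,2302.8235294\n"
)

# Everything to the nearest dollar.
print(round_amounts(sample, ["e00200p", "e00200s", "e19800", "e18400"]))

# Taxpayer/spouse split variables kept to the penny, the rest as integers.
print(round_amounts(sample, ["e19800", "e18400"], ["e00200p", "e00200s"]))
```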

@codecov-io

codecov-io commented Mar 30, 2018

Codecov Report

Merging #1941 into master will not change coverage.
The diff coverage is n/a.


@@          Coverage Diff           @@
##           master   #1941   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          38      38           
  Lines        3629    3726   +97     
======================================
+ Hits         3629    3726   +97

Impacted Files            Coverage Δ
taxcalc/utils.py          100% <0%> (ø) ⬆️
taxcalc/taxcalcio.py      100% <0%> (ø) ⬆️
taxcalc/calculate.py      100% <0%> (ø) ⬆️
taxcalc/growfactors.py    100% <0%> (ø) ⬆️
taxcalc/records.py        100% <0%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1046c2c...36db3c2.

@martinholmer
Copy link
Collaborator

@andersonfrailey, Thanks for the more compact cps.csv.gz file. The file is now 9.3MB smaller than before, which is a 38 percent reduction in byte size. This size reduction is large relative to the total size of the last taxcalc package (for release 0.17.0), which was 47.4MB.
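
The effect of rounding on compressed size can be illustrated with a small experiment. The sketch below uses made-up random amounts, not CPS data, so the size ratio it prints should not be expected to match the 38 percent reduction reported above; a real file also contains many zeros and integer-valued columns that compress well either way.

```python
import gzip
import random


def gzip_size(lines):
    """Number of bytes after gzip-compressing the lines joined by newlines."""
    return len(gzip.compress("\n".join(lines).encode("ascii")))


rng = random.Random(0)  # fixed seed so the experiment is repeatable
amounts = [rng.uniform(0.0, 50000.0) for _ in range(5000)]

full_precision = [repr(amount) for amount in amounts]  # e.g. many decimal digits
whole_dollars = [str(round(amount)) for amount in amounts]

full_size = gzip_size(full_precision)
rounded_size = gzip_size(whole_dollars)
print(full_size, rounded_size, rounded_size / full_size)
```

The trailing digits of a full-precision value are close to random, so gzip cannot compress them well, and dropping them shrinks the file.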

@martinholmer martinholmer merged commit 711abbd into PSLmodels:master Mar 30, 2018
@andersonfrailey andersonfrailey deleted the housingcps branch April 16, 2019 13:28