TAXSIM-32 validation update #2453

Closed
wants to merge 48 commits into from
ea8ce3e
initial commit
chusloj Aug 6, 2020
02b3388
update taxsim input file for taxsim32
chusloj Aug 6, 2020
2315751
updated input file for taxsim32 and generated output files
chusloj Aug 11, 2020
7df42fb
Merge branch 'master' into taxsim_update
chusloj Aug 11, 2020
302ffd8
Merge branch 'master' into taxsim_update
chusloj Aug 12, 2020
1ae8ff9
add input_setup.py
chusloj Aug 12, 2020
93b3f31
updated taxsim32
chusloj Aug 12, 2020
e8ef612
updated README
chusloj Aug 12, 2020
2b7e737
updated input files
chusloj Aug 13, 2020
9b7bc86
updated .py files to accommodate TAXSIM-32 parameters
chusloj Aug 13, 2020
981db18
simplified input_setup,py
chusloj Aug 13, 2020
dcfb40c
finalized input variables and input_setup.py
chusloj Aug 17, 2020
11aa280
updated taxsim_input.py
chusloj Aug 17, 2020
98e4bba
fixed PEP8 errors and updated tests_32.sh
chusloj Aug 17, 2020
e69e752
fixed bugs caused by PEP8 compliance
chusloj Aug 17, 2020
9cfc6fd
upated scripts & input files to use 2018/2019
chusloj Aug 17, 2020
c22e8d8
Merge branch 'master' into taxsim_update
chusloj Aug 25, 2020
4903013
updated variable mapping
chusloj Sep 10, 2020
a5d9dbb
added curl to environment.yml
chusloj Sep 10, 2020
8cca211
Merge remote-tracking branch 'upstream/master' into taxsim_update
chusloj Jan 5, 2021
8aeafcd
Merge remote-tracking branch 'upstream/master' into taxsim_update
chusloj Jan 11, 2021
b7d031c
Merge branch 'master' into taxsim_update
chusloj Jan 11, 2021
3a66f03
address merge conflict
chusloj Jan 12, 2021
f508ea6
update max vals for QBI vars and calc for SSTB
chusloj Jan 13, 2021
fb09769
update calc for e00900 in taxcalc prep file
chusloj Jan 13, 2021
8c680b8
include 2019 expect file and archive 2017 expect files
chusloj Jan 13, 2021
5c3cba7
python diff file with correct # of diff records
chusloj Jan 15, 2021
05ce15a
max function now uses generator object
chusloj Jan 15, 2021
a7c9ec9
uncommented lines
chusloj Jan 15, 2021
3cf5d11
update validation processing scripts to python
chusloj Jan 21, 2021
88fc1fc
specify diff df assumption set and year at runtime
chusloj Jan 21, 2021
551d58e
add conda package to enable colored font
chusloj Jan 25, 2021
218617e
address merge conflicts
chusloj Feb 25, 2021
d1645ae
remove old .expect files
chusloj Feb 25, 2021
2d8414c
update reform file to include QBI switch
chusloj Feb 25, 2021
7952457
remove colored text
chusloj Feb 25, 2021
9740041
small changes to runner file
chusloj Feb 25, 2021
6125e08
address merge conflicts
chusloj Feb 26, 2021
129744c
updated packages in environment
chusloj Feb 26, 2021
7493681
remove steps to delete taxsim header and zip files
chusloj Feb 26, 2021
4c07145
add taxsim_emulation file
chusloj Feb 26, 2021
38bcfc2
csv expect files + put old expect files in new dir
chusloj Feb 26, 2021
657b8c8
update readme
chusloj Feb 26, 2021
f0e7bfe
remove old bash scripts
chusloj Feb 26, 2021
fc2d011
updated formatting for main comparison files
chusloj Feb 26, 2021
a23d8d9
edit taxsim emulation
chusloj Feb 26, 2021
fd038bd
stop rerun of test if TAXSIM32 files exist
chusloj Feb 26, 2021
3ef21a4
address merge conflicts
chusloj Apr 27, 2021
2 changes: 2 additions & 0 deletions conda.recipe/meta.yaml
@@ -16,6 +16,7 @@ requirements:
- numba
- "paramtools>=0.18.0"
- aiohttp
- curl

run:
- python
@@ -26,6 +27,7 @@ requirements:
- numba
- "paramtools>=0.18.0"
- aiohttp
- curl

test:
commands:
3 changes: 2 additions & 1 deletion environment.yml
@@ -3,6 +3,7 @@ channels:
- conda-forge
dependencies:
- python
- curl
- "numpy>=1.14"
- "pandas>=1.2.0"
- "bokeh>=1.4.0"
@@ -20,4 +21,4 @@ dependencies:
- pip
- pip:
- jupyter-book
- pytest_harvest
- pytest_harvest
2 changes: 1 addition & 1 deletion taxcalc/validation/taxsim27/README.md
@@ -136,4 +136,4 @@ different.)

Validation results using the then current-version of TAXSIM-27 on these dates:
1. 2019-03-30 : same results except for 327 itax diffs with largest being $13.00
2. 2019-06-05 : same results (other dependent credit now included in ovar 22)
2. 2019-06-05 : same results (other dependent credit now included in ovar 22)
Binary file removed taxcalc/validation/taxsim27/output-taxsim.zip
Binary file not shown.
133 changes: 133 additions & 0 deletions taxcalc/validation/taxsim32/README.md
@@ -0,0 +1,133 @@
Validation of Tax-Calculator against Internet TAXSIM-32
=======================================================

The general cross-model validation process described
[here](https://github.com/PSLmodels/Tax-Calculator/blob/master/taxcalc/validation/README.md#validation-of-tax-calculator-logic)
is being executed in this directory using
[TAXSIM-32](https://users.nber.org/~taxsim/taxsim27/).

We are in the process of comparing Tax-Calculator and TAXSIM-32
results for several assumption sets, whose INPUT files are generated by
the `taxsim_input.py` script, for years beginning with 2018. Each INPUT
file is used to generate a TAXSIM-32 OUTPUT file by uploading it to the
TAXSIM-32 FTP server and requesting detailed intermediate calculations.
Each INPUT file is also translated into a CSV-formatted input file that
the Tax-Calculator `tc` CLI tool reads to generate output, which is then
transformed into an OUTPUT file in the TAXSIM-32 format. Finally, the
two OUTPUT files are compared using the `main_comparison.py` script. See
the `tests_32.py` script in this directory for more details.
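The `main_comparison.py` script is not shown here, so the following is only a hypothetical sketch of a per-variable comparison that would produce the columns of the `*-taxdiffs-expect.csv` files. It assumes a difference is the Tax-Calculator value minus the TAXSIM-32 value (consistent with the expect files, where `ficar` values of 2.9 and 3.8 give a `max_diff` of about 0.9) and that `max_diff_index` is the record's position; the name `summarize_diffs` is illustrative.

```python
from typing import Dict, List, Sequence, Union

Summary = Dict[str, Union[int, float, str]]


def summarize_diffs(taxsim_vals: Sequence[float],
                    taxcalc_vals: Sequence[float]) -> Summary:
    """Summarize Tax-Calculator minus TAXSIM-32 differences for one variable.

    Hypothetical helper: returns the same fields as one row of a
    *-taxdiffs-expect.csv file.
    """
    if len(taxsim_vals) != len(taxcalc_vals):
        raise ValueError("both OUTPUT files must contain the same records")
    diffs: List[float] = [c - s for s, c in zip(taxsim_vals, taxcalc_vals)]
    n_differing = sum(1 for d in diffs if d != 0)
    if n_differing == 0:
        return {"# of differing records": 0, "max_diff": 0.0,
                "max_diff_index": "no diff",
                "max_diff_taxsim_val": "no diff",
                "max_diff_taxcalc_val": "no diff"}
    # position of the record with the largest absolute difference
    idx = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return {"# of differing records": n_differing,
            "max_diff": diffs[idx],
            "max_diff_index": idx,
            "max_diff_taxsim_val": taxsim_vals[idx],
            "max_diff_taxcalc_val": taxcalc_vals[idx]}


# Toy input: only the second record differs (2.9 vs. 3.8)
row = summarize_diffs([0.0, 2.9, 5.0], [0.0, 3.8, 5.0])
print(row["# of differing records"], row["max_diff_index"])
```

Note that the raw floating-point difference is kept, which is why values such as `0.8999999999999999` appear in the expect files.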

The following results are for INPUT files containing 100,000
randomly generated filing units for a given year. A different random
sample is drawn for each year. In each INPUT file, three state-related
variables are set to zero for every filing unit, one variable specifies
the year, and another specifies a filing-unit id, which leaves
twenty-two input variables that are set to randomly generated values.

In order to handle known differences in assumptions between the two
models, we use the `taxsim_emulation.json` "reform" file to make
Tax-Calculator operate like TAXSIM-32. See the
[`taxsim_emulation.json`
file](https://github.com/PSLmodels/Tax-Calculator/blob/master/taxcalc/validation/taxsim32/taxsim_emulation.json)
for details.

In the following results, "same results" means that the federal
individual income tax liabilities and payroll tax liabilities being
compared differ by no more than one cent.
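Because dollar amounts are stored as floats, a naive `abs(a - b) <= 0.01` check can misclassify a one-cent gap: the a19 expect file records the `fiitax` difference between -119.72 and -119.73 as -0.010000000000005116. A minimal sketch, assuming amounts are in dollars, compares whole cents instead; `same_result` is a hypothetical helper, not part of the validation scripts.

```python
def same_result(taxsim_val: float, taxcalc_val: float,
                tol_cents: int = 1) -> bool:
    """Return True if two dollar amounts differ by at most tol_cents cents.

    Rounding each amount to integer cents first avoids floating-point noise
    such as 0.010000000000005116 being treated as more than one cent.
    """
    taxsim_cents = round(taxsim_val * 100)
    taxcalc_cents = round(taxcalc_val * 100)
    return abs(taxsim_cents - taxcalc_cents) <= tol_cents


print(same_result(-119.72, -119.73))  # True: one-cent difference
print(same_result(-20.98, -21.00))    # False: two-cent difference
```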

For information on the variable names used in `taxsim_input.py`, the
script that generates the TAXSIM-32 input data, see the TAXSIM-32
website listed above.


Instructions
------------------
1. Navigate to `taxcalc/validation/taxsim32` and run the Python script `tests_32.py`.
2. To generate new input files and get new output files from TAXSIM-32,
delete all of the `.in.out-taxsim` files first. On Mac/Linux, this can be done with
`rm -f *.in.out-taxsim`.
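On Windows, where `rm` is usually unavailable, step 2 could be done from Python instead. The following is a hypothetical helper (`remove_taxsim_outputs` is not part of the repository) that mirrors `rm -f *.in.out-taxsim`, followed by a demo in a throwaway directory:

```python
import glob
import os
import tempfile
from typing import List


def remove_taxsim_outputs(directory: str = ".") -> List[str]:
    """Delete cached TAXSIM-32 output files so that they are regenerated.

    Cross-platform equivalent of `rm -f *.in.out-taxsim`; returns the
    paths that were removed.
    """
    removed = []
    for path in glob.glob(os.path.join(directory, "*.in.out-taxsim")):
        os.remove(path)
        removed.append(path)
    return removed


# Demo: only the TAXSIM-32 output file is removed, the .in file is kept
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a18.in", "a18.in.out-taxsim"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("")
    removed = remove_taxsim_outputs(tmp)
    remaining = sorted(os.listdir(tmp))

print([os.path.basename(p) for p in removed], remaining)
```

Calling `remove_taxsim_outputs()` from inside `taxcalc/validation/taxsim32` would have the same effect as the `rm` command above.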


Troubleshooting
------------------
If the TAXSIM-32 validation code throws errors such as `.in files not found`
or `.out files not found`, or reports that a parameter in `policy_current_law.json`
does not exist, try these two steps:

1. Make sure that the `taxcalc` conda package is installed.
2. If you have Tax-Calculator downloaded locally, navigate to its root directory
and run `pip install -e .`. This installs your local source code in editable mode,
so the `taxcalc` package and its `tc` CLI use that code.
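To see which copy of `taxcalc` Python will import, which helps confirm whether step 2 took effect, a small standard-library check like the following may help. `package_location` is an illustrative name, and the expectation that an editable install points into your local checkout rather than `site-packages` is an assumption about a typical setup.

```python
from importlib.util import find_spec
from typing import Optional


def package_location(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from.

    Returns None when the package cannot be found at all.
    """
    spec = find_spec(name)
    return spec.origin if spec is not None else None


print(package_location("taxcalc") or "taxcalc is not installed")
```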


Validation Results
------------------

**a18 ASSUMPTION SET**:

2018 INPUT file that specifies the first twelve of the TAXSIM-32
input variables, which include demographic variables and labor income,
but sets to zero all the TAXSIM-32 input variables numbered from 13
through 27.

Validation results using the then-current version of TAXSIM-32 on these dates:

**b18 ASSUMPTION SET**:

2018 INPUT file that specifies the first twenty-one of the TAXSIM-32
input variables, which include demographic variables, labor income,
capital income, and federally-taxable benefits, but sets to zero the
other six TAXSIM-32 input variables (variables 28-32, which represent
the new QBI-related variables, are not among these six). Two of those
six are always set to zero because they specify either transfer income
that is not taxed under the federal income tax or rent paid, which does
not affect federal income tax liability. Three of the remaining four
input variables are itemized expense amounts, and the fourth is
child-care expenses.

Validation results using the then-current version of TAXSIM-32 on these dates:

**c18 ASSUMPTION SET**:

2018 INPUT file that sets all the non-state TAXSIM-32 input
variables to randomly generated values.
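The a, b, and c descriptions differ mainly in how many leading input variables keep their random values. The following hypothetical sketch expresses that rule; it is not how `taxsim_input.py` is implemented. It ignores the state-related, year, and id variables, and it assumes that variables numbered above 27 (the QBI-related ones) are left unchanged in every set, which the descriptions above state only for the b sets.

```python
from typing import Dict, List

# Hypothetical mapping: number of leading input variables (1-based) that
# keep their random values in each assumption set.
KEEP_FIRST: Dict[str, int] = {"a": 12, "b": 21, "c": 27}


def apply_assumption_set(values: List[float], letter: str) -> List[float]:
    """Zero input variables keep_first+1 through 27 (1-based numbering).

    Variables numbered above 27 are assumed to be left unchanged.
    """
    keep_first = KEEP_FIRST[letter]
    return [0.0 if keep_first < number <= 27 else value
            for number, value in enumerate(values, start=1)]


# Variables 20-30 of a b-set record: 22-27 are zeroed, 28-30 are kept
print(apply_assumption_set([5.0] * 30, "b")[19:])
# [5.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
```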

Validation results using the then-current version of TAXSIM-32 on these dates:

**a19 ASSUMPTION SET**:

2019 INPUT file that specifies the first twelve of the TAXSIM-32
input variables, which include demographic variables and labor income,
but sets to zero all the TAXSIM-32 input variables numbered from 13
through 27. (This is the same logic as used to generate the **a18**
sample, except that a different stream of random numbers is used, so
the 100,000 filing units are completely different.)

Validation results using the then-current version of TAXSIM-32 on these dates:

**b19 ASSUMPTION SET**:

2019 INPUT file that specifies the first twenty-one of the TAXSIM-32
input variables, which include demographic variables, labor income,
capital income, and federally-taxable benefits, but sets to zero the
other six TAXSIM-32 input variables (variables 28-32, which represent
the new QBI-related variables, are not among these six). Two of those
six are always set to zero because they specify either transfer income
that is not taxed under the federal income tax or rent paid, which does
not affect federal income tax liability. Three of the remaining four
input variables are itemized expense amounts, and the fourth is
child-care expenses. (This is the same logic as used to generate the
**b18** sample, except that a different stream of random numbers is
used, so the 100,000 filing units are completely different.)

Validation results using the then-current version of TAXSIM-32 on these dates:

**c19 ASSUMPTION SET**:

2019 INPUT file that sets all the non-state TAXSIM-32 input
variables to randomly generated values. (This is the same logic as
used to generate the **c18** sample, except that a different stream of
random numbers is used, so the 100,000 filing units are completely
different.)

Validation results using the then-current version of TAXSIM-32 on these dates:
26 changes: 26 additions & 0 deletions taxcalc/validation/taxsim32/a18-taxdiffs-expect.csv
@@ -0,0 +1,26 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
fiitax,49,-0.00999999999999801,12135,-20.98,-20.99
siitax,0,0.0,no diff,no diff,no diff
fica,0,0.0,no diff,no diff,no diff
frate,0,0.0,no diff,no diff,no diff
srate,0,0.0,no diff,no diff,no diff
ficar,124,0.8999999999999999,172,2.9,3.8
v10,0,0.0,no diff,no diff,no diff
v11,0,0.0,no diff,no diff,no diff
v12,0,0.0,no diff,no diff,no diff
v13,100000,-26600.0,12,26600.0,0.0
v14,0,0.0,no diff,no diff,no diff
v15,0,0.0,no diff,no diff,no diff
v16,0,0.0,no diff,no diff,no diff
v17,0,0.0,no diff,no diff,no diff
v18,0,0.0,no diff,no diff,no diff
v19,0,0.0,no diff,no diff,no diff
v20,0,0.0,no diff,no diff,no diff
v21,0,0.0,no diff,no diff,no diff
v22,0,0.0,no diff,no diff,no diff
v23,0,0.0,no diff,no diff,no diff
v24,0,0.0,no diff,no diff,no diff
v25,46,0.00999999999999801,12135,20.98,20.99
v26,0,0.0,no diff,no diff,no diff
v27,0,0.0,no diff,no diff,no diff
v28,0,0.0,no diff,no diff,no diff
26 changes: 26 additions & 0 deletions taxcalc/validation/taxsim32/a19-taxdiffs-expect.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
fiitax,71,-0.010000000000005116,12332,-119.72,-119.73
siitax,0,0.0,no diff,no diff,no diff
fica,0,0.0,no diff,no diff,no diff
frate,0,0.0,no diff,no diff,no diff
srate,0,0.0,no diff,no diff,no diff
ficar,119,0.8999999999999999,2226,2.9,3.8
v10,0,0.0,no diff,no diff,no diff
v11,0,0.0,no diff,no diff,no diff
v12,0,0.0,no diff,no diff,no diff
v13,100000,-27000.0,9,27000.0,0.0
v14,0,0.0,no diff,no diff,no diff
v15,0,0.0,no diff,no diff,no diff
v16,0,0.0,no diff,no diff,no diff
v17,0,0.0,no diff,no diff,no diff
v18,0,0.0,no diff,no diff,no diff
v19,0,0.0,no diff,no diff,no diff
v20,0,0.0,no diff,no diff,no diff
v21,0,0.0,no diff,no diff,no diff
v22,0,0.0,no diff,no diff,no diff
v23,0,0.0,no diff,no diff,no diff
v24,0,0.0,no diff,no diff,no diff
v25,71,0.010000000000005116,4164,119.72,119.73
v26,0,0.0,no diff,no diff,no diff
v27,0,0.0,no diff,no diff,no diff
v28,0,0.0,no diff,no diff,no diff
26 changes: 26 additions & 0 deletions taxcalc/validation/taxsim32/b18-taxdiffs-expect.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
fiitax,100000,-130262.98999999999,75368,306837.54,176574.55
siitax,0,0.0,no diff,no diff,no diff
fica,67638,-0.3000000000029104,14298,38376.98,38376.68
frate,46049,-55.0,91965,40.0,-15.0
srate,0,0.0,no diff,no diff,no diff
ficar,0,0.0,no diff,no diff,no diff
v10,99931,-350001.9800000002,99037,1402920.62,1052918.64
v11,0,0.0,no diff,no diff,no diff
v12,5,-10980.150000000001,86601,39100.0,28119.85
v13,100000,-26600.0,12,26600.0,0.0
v14,0,0.0,no diff,no diff,no diff
v15,0,0.0,no diff,no diff,no diff
v16,0,0.0,no diff,no diff,no diff
v17,0,0.0,no diff,no diff,no diff
v18,100000,-402620.99,27455,653482.24,250861.25
v19,100000,-134001.72999999998,27455,181167.43,47165.7
v20,0,0.0,no diff,no diff,no diff
v21,0,0.0,no diff,no diff,no diff
v22,9067,8500.0,888,0.0,8500.0
v23,8,2157.88,42055,0.0,2157.88
v24,0,0.0,no diff,no diff,no diff
v25,0,0.0,no diff,no diff,no diff
v26,100000,-350000.0,44030,685450.0,335450.0
v27,5513,13628.83,99719,0.0,13628.83
v28,100000,-130941.73,27455,178107.43,47165.7
26 changes: 26 additions & 0 deletions taxcalc/validation/taxsim32/b19-taxdiffs-expect.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
fiitax,100000,-138933.06,58416,284963.75,146030.69
siitax,0,0.0,no diff,no diff,no diff
fica,67719,-0.3000000000029104,21816,39016.15,39015.85
frate,48029,-64.44,82573,49.44,-15.0
srate,0,0.0,no diff,no diff,no diff
ficar,0,0.0,no diff,no diff,no diff
v10,99999,-380001.98,99411,1619677.51,1239675.53
v11,0,0.0,no diff,no diff,no diff
v12,13,-24155.15,52779,46750.0,22594.85
v13,100000,-27000.0,9,27000.0,0.0
v14,0,0.0,no diff,no diff,no diff
v15,0,0.0,no diff,no diff,no diff
v16,0,0.0,no diff,no diff,no diff
v17,0,0.0,no diff,no diff,no diff
v18,100000,-411751.91000000003,34865,645007.02,233255.11
v19,100000,-140755.06,58416,286785.75,146030.69
v20,0,0.0,no diff,no diff,no diff
v21,0,0.0,no diff,no diff,no diff
v22,10340,8500.0,2256,0.0,8500.0
v23,28,3397.83,82573,0.0,3397.83
v24,0,0.0,no diff,no diff,no diff
v25,0,0.0,no diff,no diff,no diff
v26,100000,-378605.36,56063,753678.58,375073.22
v27,6590,16327.24,37042,0.0,16327.24
v28,100000,-137185.06,58416,283215.75,146030.69
26 changes: 26 additions & 0 deletions taxcalc/validation/taxsim32/c18-taxdiffs-expect.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
fiitax,100000,-132150.84999999998,90016,150067.43,17916.58
siitax,0,0.0,no diff,no diff,no diff
fica,67746,-0.2900000000008731,573,18578.54,18578.25
frate,46631,56.85,12070,36.74,93.59
srate,0,0.0,no diff,no diff,no diff
ficar,0,0.0,no diff,no diff,no diff
v10,99919,-350001.9800000002,69402,1546224.37,1196222.39
v11,0,0.0,no diff,no diff,no diff
v12,1,-11650.0,9032,27200.0,15550.0
v13,32064,-26600.0,12,26600.0,0.0
v14,0,0.0,no diff,no diff,no diff
v15,0,0.0,no diff,no diff,no diff
v16,0,0.0,no diff,no diff,no diff
v17,1414,26000.0,661,0.0,26000.0
v18,100000,-393130.3,81574,671568.48,278438.18
v19,100000,-133445.18,81574,187859.34,54414.16
v20,0,0.0,no diff,no diff,no diff
v21,0,0.0,no diff,no diff,no diff
v22,9041,8500.0,4303,0.0,8500.0
v23,34,3799.6,73248,0.0,3799.6
v24,16,-600.19,9032,600.19,0.0
v25,0,0.0,no diff,no diff,no diff
v26,100000,-355450.88999999996,17060,570001.09,214550.2
v27,3942,10623.65,14603,0.0,10623.65
v28,100000,-131575.16999999998,81574,185989.34,54414.17
26 changes: 26 additions & 0 deletions taxcalc/validation/taxsim32/c19-taxdiffs-expect.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
fiitax,100000,-141609.81999999998,71749,164446.08,22836.26
siitax,0,0.0,no diff,no diff,no diff
fica,67822,-0.3000000000029104,48837,38263.66,38263.36
frate,48751,-75.15,69110,60.15,-15.0
srate,0,0.0,no diff,no diff,no diff
ficar,0,0.0,no diff,no diff,no diff
v10,99997,-380001.98,43179,1390514.98,1010513.0
v11,0,0.0,no diff,no diff,no diff
v12,11,-15775.25,19813,31450.0,15674.75
v13,32257,-27000.0,9,27000.0,0.0
v14,0,0.0,no diff,no diff,no diff
v15,0,0.0,no diff,no diff,no diff
v16,0,0.0,no diff,no diff,no diff
v17,1834,27000.0,595,0.0,27000.0
v18,100000,-411471.92000000004,24420,683044.92,271573.0
v19,100000,-142118.92,57949,308485.83,166366.91
v20,0,0.0,no diff,no diff,no diff
v21,0,0.0,no diff,no diff,no diff
v22,10166,8500.0,3265,0.0,8500.0
v23,78,4640.22,36893,0.0,4640.22
v24,12,-820.96,30022,1200.0,379.04
v25,0,0.0,no diff,no diff,no diff
v26,100000,-376000.93,71828,506700.93,130700.0
v27,4416,10468.79,27286,0.0,10468.79
v28,100000,-138202.08,75423,290379.91,152177.83
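All six expect files share the same header, with the output-variable name in an unnamed first column. A minimal standard-library sketch for listing which variables differ in a given file follows; `differing_variables` is a hypothetical helper, and the sample rows are copied from the a18 expect file.

```python
import csv
import io
from typing import Dict


def differing_variables(csv_text: str) -> Dict[str, int]:
    """Map each output variable with differences to its differing-record count.

    Assumes the *-taxdiffs-expect.csv layout, whose first (unnamed) column
    holds the variable name, so csv.DictReader keys it as "".
    """
    counts: Dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        count = int(row["# of differing records"])
        if count > 0:
            counts[row[""]] = count
    return counts


sample = (
    ",# of differing records,max_diff,max_diff_index,"
    "max_diff_taxsim_val,max_diff_taxcalc_val\n"
    "fiitax,49,-0.00999999999999801,12135,-20.98,-20.99\n"
    "siitax,0,0.0,no diff,no diff,no diff\n"
    "v13,100000,-26600.0,12,26600.0,0.0\n"
)
print(differing_variables(sample))  # {'fiitax': 49, 'v13': 100000}
```

For a real file, pass the text returned by `open(path, newline="").read()`.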
62 changes: 62 additions & 0 deletions taxcalc/validation/taxsim32/input_setup.py
@@ -0,0 +1,62 @@
"""
Generates TAXSIM-32 `.in` input files, downloads `.in.out-taxsim` output
files from the TAXSIM-32 FTP server, and prepares them for comparison with
Tax-Calculator by converting their comma delimiters to spaces
"""
import glob
import os


def get_inputs():
    """
    Runs taxsim_input.py for all combinations of year and assumption sets
    """
    letters = ["a", "b", "c"]
    years = ["2018", "2019"]

    name_list = [y + " " + x for x in letters for y in years]

    for name in name_list:
        command = "python taxsim_input.py " + name
        os.system(command)


# requires curl
def get_ftp_output():
    """
    Uses `curl` to upload assumption set input files
    and save TAXSIM-32 output files
    """
    letters = ["a", "b", "c"]
    years = ["18", "19"]
    file_list = [x + y + ".in" for x in letters for y in years]

    for f in file_list:
        file_out = f + ".out-taxsim"
        # upload the input file to the TAXSIM-32 FTP server
        os.system(f"curl -u taxsim:02138 -T {f} "
                  "ftp://taxsimftp.nber.org/tmp/userid")
        # download the TAXSIM-32 output for that upload
        c_out = (
            "curl -u taxsim:02138 "
            + "ftp://taxsimftp.nber.org/tmp/userid.txm32 -o "
            + file_out
        )
        os.system(c_out)


def change_delim():
    """
    Replaces commas with spaces in every `.in.out-taxsim` file
    """
    for file in glob.glob("*.in.out-taxsim"):
        # Read in the file
        with open(file, "r") as fin:
            filedata = fin.read()

        # Replace the target string
        filedata = filedata.replace(",", " ")

        # Write the file out again
        with open(file, "w") as fout:
            fout.write(filedata)


def main():
    get_inputs()
    get_ftp_output()
    change_delim()


if __name__ == "__main__":
    main()
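`get_ftp_output()` builds shell command strings and ignores `curl`'s exit status. A hypothetical alternative, sketched below, passes argument lists to `subprocess.run` with `check=True`, so a failed upload or download raises `CalledProcessError` instead of passing silently. The credentials and FTP paths are copied from `get_ftp_output()`; the helper names are illustrative.

```python
import shutil
import subprocess
from typing import List, Tuple

# Values copied from get_ftp_output() in input_setup.py
FTP_USER = "taxsim:02138"
UPLOAD_URL = "ftp://taxsimftp.nber.org/tmp/userid"
DOWNLOAD_URL = "ftp://taxsimftp.nber.org/tmp/userid.txm32"


def curl_commands(in_file: str) -> Tuple[List[str], List[str]]:
    """Build the upload and download argument lists for one `.in` file."""
    out_file = in_file + ".out-taxsim"
    upload = ["curl", "-u", FTP_USER, "-T", in_file, UPLOAD_URL]
    download = ["curl", "-u", FTP_USER, DOWNLOAD_URL, "-o", out_file]
    return upload, download


def run_taxsim(in_file: str) -> None:
    """Upload one input file and fetch its output, stopping on any failure."""
    if shutil.which("curl") is None:
        raise RuntimeError("curl not found; it is listed in environment.yml")
    for cmd in curl_commands(in_file):
        subprocess.run(cmd, check=True)
```

For example, `run_taxsim("a18.in")` would perform one iteration of the loop in `get_ftp_output()`. Argument lists also avoid shell quoting problems if a file name ever contains spaces.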