TAXSIM-32 validation update #2453

chusloj · 2020-08-11T19:15:43Z

This PR updates the validation code for TAXSIM-27 to use the new TAXSIM-32 model.

chusloj · 2020-08-17T16:17:26Z

The prep files have been modified to accommodate TAXSIM-32.

Notes:

taxsim_input.py was updated to include the 5 new variables (vars 28-32) from TAXSIM-32
prepare_taxcalc_input.py only had variables 28 (S-corp QBI income) and 30 (SSTB flag) mapped to Tax-Calculator input, but I'd like some input on modifying variables to adjust for QBI deductions. calcfunctions.py shows that QBI deductions are calculated using output variables.

A new shell file for TAXSIM-32 was also created for testing, and input_setup.py was created to automate the FTP retrieval of the TAXSIM-32 output, data cleaning and zipping the files.

For both the 2017 and 2018 assumption sets, set b now includes the SSTB flag for the PT_SSTB_income variable, set c includes that SSTB flag and S-corp QBI income under the e26270 variable. Still thinking about how to incorporate the S-Corp QBI deductions using only the input variables.

Finally, the taxdiff-actual and README files haven't been updated to reflect the status of the diffs or the introduction of new variables into the assumption sets. I'd like to update those after review.

@Peter-Metz

codecov · 2020-08-17T16:42:45Z

Codecov Report

Merging #2453 (3ef21a4) into master (cb0297d) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2453   +/-   ##
=======================================
  Coverage   98.46%   98.46%           
=======================================
  Files          14       14           
  Lines        2611     2611           
=======================================
  Hits         2571     2571           
  Misses         40       40

Flag	Coverage Δ
unittests	`98.46% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Peter-Metz · 2020-08-17T19:24:27Z

@chusloj thanks a lot of the PR. I think the first step is to lock down the translation from taxsim32 variables (in particular variables 28-32) to Tax-Calculator variables. Here is my first stab (building off what you have already done):

Taxsim32	Tax-Calculator
scorp	e26270 (and PT_SSTB_income=1)
pbusinc	e00900p
sbusinc	e00900s
if (pprofinc + sprofinc)>0	PT_SSTB_income=1, else 0

Maybe @MattHJensen or @kpomerleau have some input?

Taxsim32 definitions:

scorp: Active S-Corp income (is QBI). (Guaranteed S-corp partner profits and limited partner compensation are taxed as wages, not here).
pbusinc: Primary Taxpayer's Qualified Business Income (QBI) subject to a preferential rate without phaseout. Subject to SECA and Medicare additional Earnings Tax.
pprofinc: Primary Taxpayer's Specialized Service Trade or Business service (SSTB) with a preferential rate subject to claw-back. Subject to SECA and Medicare Additional Earnings Tax.
sbusinc: Spouse's QBI. Must be zero for non-joint returns.
sprofinc: Spouse's SSTB. Must be zero for non-joint returns.

feenberg · 2020-08-17T19:36:10Z

If this is about my online tax calculator, I have some updated instructions at: http://www.nber.org/taxsim/taxsim32/taxsim-ftp.html that describes the ftp, ssh and http interfaces to taxsim32. There is also http://www.nber.org/taxsim/taxsim32/install.html that describes how to install a local version. This may be of interest. Call or write if you have questions or suggestions. Daniel Feenberg 617-863-0343

…

On Mon, 17 Aug 2020, Jacob C wrote: The prep files have been modified to accommodate TAXSIM-32. Notes: * taxsim_input.py was updated to include the 5 new variables (vars 28-32) from TAXSIM-32 * prepare_taxcalc_input.py only had variables 28 (S-corp QBI income) and 30 (SSTB flag) mapped to Tax-Calculator input, but I'd like some input on modifying variables to adjust for QBI deductions. calcfunctions.py shows that QBI deductions are calculated using output variables. A new shell file for TAXSIM-32 was also created for testing, and input_setup.py was created to automate the FTP retrieval of the TAXSIM-32 output, data cleaning and zipping the files. For both the 2017 and 2018 assumption sets, set b now includes the SSTB flag for the PT_SSTB_income variable, and S-corp QBI income under the e26270 variable. Still thinking about how to incorporate the S-Corp QBI deductions using only the input variables. Finally, the taxdiff-actual and README files haven't been updated to reflect the status of the diffs or the introduction of new variables into the assumption sets. I'd like to update those after review. @Peter-Metz ? You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, orunsubscribe.[AB55AVOB7DNQMSEKSUDKZUDSBFJ2NA5CNFSM4P3NKZP2YY3PNVWWK3TUL52HS4 DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOFA5U3QI.gif]

MattHJensen · 2020-09-08T11:10:56Z

@Peter-Metz, did your recent Tax-Cruncher work give you any further insights here?

Peter-Metz · 2020-09-08T13:01:14Z

I think the mapping in this comment still makes sense.

My remaining question is how TAXSIM-32 calculates the phase-out of the QBI deduction, which can be dependent on W-2 wages paid by the filer's business and/or the filer's amount of qualified property. Maybe @feenberg has insight?

feenberg · 2020-09-08T14:10:54Z

Taxsim32 assumes sufficient wage and capital income to avoid the phaseout. That is made clear in the documentation only in the last week - we hadn't even thought much about the issue but Martin raised it and I wrote to CBO for their preference. I go this back from Ed Harris:

Hi Dan. I asked a couple of colleagues about the binary choice between assuming 1) no taxpayer is subject to the wage/capital restriction or whether 2) all taxpayers are fully subject to the limitations (i.e, treat them as having zero wage expense and no capital). The consensus here is that underestimate from option #2 would be larger than the overestimate from option #1.

I picked #1 largely because I don't want to add any more variables that almost no user will have information about. Dan

…

On Tue, 8 Sep 2020, Peter Metz wrote: I think the mapping in this comment still makes sense. My remaining question is how TAXSIM-32 calculates the phase-out of the QBI deduction, which can be dependent on W-2 wages paid by the filer's business and/or the filer's amount of qualified property. Maybe @feenberg has insight? ? You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, orunsubscribe.[AB55AVJKZOODGUJSC7U6TILSEYTKXA5CNFSM4P3NKZP2YY3PNVWWK3TUL52HS4DFVRE XG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOFEHQJTA.gif]

Peter-Metz · 2020-09-08T18:01:25Z

Thanks very much for the explanation @feenberg.

@MattHJensen -- Relatedly, I think offering both wage/capital options could be a good Tax-Calculator feature. Currently, Tax-Calculator assumes option 2) (i.e. all taxpayers are fully subject to wage and capital limitations). Instead of imputing values for these parameters, we could have a new boolean parameter that allows users to choose between the two options in Dan's comment.

chusloj · 2021-02-26T19:20:05Z

@Peter-Metz,

are we comparing calculated qbid amounts as part of the validation

No, the TAXSIM-32 suite doesn't currently compare calculated qbid amounts (using qbided). I think this is a good addition to the validation suite, but seeing as this PR has been open for quite a while I think it should be merged and then I'll work on qbid amounts in a successive PR.

Also, now that the comparison code is in human-readable Python, it'll be easier for all contributors to chime in.

chusloj · 2021-02-26T19:56:19Z

@MattHJensen All tests passing.

chusloj · 2021-02-26T22:26:32Z

The main time consumer in the validation suite is the FTP file retrieval, so now the file retrieval process will not run during a validation session if any .in.out-taxsim files already exist.

chusloj · 2021-04-27T21:21:58Z

@MattHJensen @jdebacker This PR is up to date with all tests passing and is ready for review & merge.

jdebacker · 2021-07-20T14:56:37Z

@hdoupe I like this addition to provide validation against TaxSim-32. The PR generally looks good to me. Would you mind giving this a thorough review?

jdebacker · 2021-07-20T14:57:00Z

@chusloj You noted this is ready for review. Can I remove the [WIP] in the title of this PR?

chusloj · 2021-07-20T15:41:57Z

@jdebacker Yes, the branch is no longer WIP.

martinholmer · 2021-07-22T13:52:35Z

A few days ago, when @jdebacker closed Tax-Calculator issue #2513,
@donboyd5 suggested the need for an increased commitment to quality
control in the TaxData process that generates data used by
Tax-Calculator. He suggested developing methods for "(a) comparing
TaxData to published and forecasted values, and (b) bringing TaxData
(puf.csv) in line with those [published] values".

This comment echos Don's suggestion with respect to testing the
accuracy of the tax calculation logic in Tax-Calculator. There are
different ways of doing this testing, but one of the most effective is
comparing, record by record, calculated taxes from Tax-Calculator with
calculated taxes for the same sample from NBER's TAXSIM. This sort of
testing has been done for years using an older version called TAXSIM27.

The objective of this pull request is to update that testing
capability to use TAXSIM32. This pull request has been open for
almost a year, but the results I present below suggest that completing
this pull request should have a high priority. Testing Tax-Calculator
against TAXSIM32 is likely to reveal many cases in which the income
tax liability computed by Tax-Calculator differs significantly from
the tax computed by TAXSIM32. I will present the details of one such
comparison below.

Let me make it clear that I have not reviewed the changes proposed in
this pull request. But the information I present below could be used
to assess the effectiveness of the new testing tools being proposed in
this pull request.

DETAILS: What I have done is adapt validation tools that I use in my
own work so that those tools can compare, for the same input sample,
the taxes calculated by the Tax-Calculator 3.2.0 CLI and by the
2021-07-01 version of TAXSIM32. First, I show the summary testing
results, and then, I describe how to download the files needed to
generate the Tax-Calculator and TAXSIM32 results summarized below.

I have used my validation tools to generate a non-representative
sample of 100,000 2019 taxpayers suitable for TAXSIM32 input
(b19.i32.csv). Then I convert that file so that it can be used as
Tax-Calculator input (b19.itc.csv). And I also convert b19.i32.csv
into a file that can be used as input by my own model.

TAXSIM32 output is generated by the following command:

(py37) validation% taxsim32 < b19.i32.csv > b19.o32.csv

And Tax-Calculator output is generate by this command:

(py37) validation% tc b19.itc.csv 2019 --reform taxsim_emulation_tc.json

Here are the summary results from generating output from the two
models and comparing the output from each against the TAXSIM32 output.
(The expected comparison results, when using my model, are in the
b19.diffs-expect file.)

(py37) validation% ./test.sh b19 save
Using local TAXSIM32 executable: Compiled on 07/01/21
BEGIN taxsim execution at Thu Jul 22 08:37:00 EDT 2021
  END taxsim execution at Thu Jul 22 08:37:05 EDT 2021
Validating usita for USI-Tax-Analyzer 2.25.0
BEGIN usita execution at Thu Jul 22 08:37:09 EDT 2021
  END usita execution at Thu Jul 22 08:37:12 EDT 2021
PASSED b19

(py37) validation% cat b19.diffs-expect 
DIFFERENCES FOR OUTVAR ptax=fica  FICA liability
   97490 with 0 cent difference
    2510 with 1 cent difference
DIFFERENCES FOR OUTVAR itax=fitax Federal income tax liability
   98262 with 0 cent difference
    1738 with 1 cent difference

(py37) validation% VALIDATE_TC=1 ./test.sh b19 save
Using existing b19.i32.csv file
Using local TAXSIM32 executable: Compiled on 07/01/21
BEGIN taxsim execution at Thu Jul 22 08:38:05 EDT 2021
  END taxsim execution at Thu Jul 22 08:38:09 EDT 2021
Validating Tax-Calculator 3.2.0 using tc CLI
BEGIN tc execution at Thu Jul 22 08:38:14 EDT 2021
Tax-Calculator startup did not extrapolate your data.
  END tc execution at Thu Jul 22 08:38:46 EDT 2021
Files b19.diffs-actual and b19.diffs-expect differ
FAILED b19

(py37) validation% cat b19.diffs-actual            
DIFFERENCES FOR OUTVAR ptax=fica  FICA liability
   97490 with 0 cent difference
    2510 with 1 cent difference
DIFFERENCES FOR OUTVAR itax=fitax Federal income tax liability
   79645 with 0 cent difference
     753 with 1 cent difference
   19602 with 2+ cent difference
         Record id 87711 has abs(diff)>0.01 with diff= 17143.39
         Record id 37578 has abs(diff)>0.01 with diff= 15881.78
         Record id 42132 has abs(diff)>0.01 with diff= 15537.23
         Record id 51825 has abs(diff)>0.01 with diff= 15480.72
         Record id 91305 has abs(diff)>0.01 with diff= 15443.67
         Record id 70542 has abs(diff)>0.01 with diff= 15372.39
         Record id 4911 has abs(diff)>0.01 with diff= 15135.66
         Record id 98536 has abs(diff)>0.01 with diff= 15098.69
         Record id 33391 has abs(diff)>0.01 with diff= 15024.10

So, Tax-Calculator generates an income tax liability that differs from
that generated by TAXSIM32 in almost 20% of the cases. And many of
the differences are quite large, with some being in the $15,000 to
$17,000 range. This is why I'm suggesting that testing the accuracy
of the tax calculation logic in Tax-Calculator be given a high priority.

Let me say that, even though I have checked my own work several times,
there could be flaws in the method I developed to translate TAXSIM32
input into Tax-Calculator input. Please let me know if you find any
problems in the input data translation after you download and inspect
the b19.i32.csv and b19.itc.csv files. Also, be sure to check that
the taxsim_emulation_tc.json file I used has been correctly specified
for Tax-Calculator.

All three of the files mentioned in the previous paragraph are in a
validation.zip file that can be downloaded by clicking on this
link. Here are the contents of that zip file:

(py37) validation% unzip -v validation.zip
Archive:  validation.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
12416945  Defl:N  2792324  78% 07-22-2021 08:37 c3058081  b19.i32.csv
13497136  Defl:N  3234787  76% 07-22-2021 08:38 a872fa12  b19.itc.csv
    2285  Defl:N      966  58% 07-19-2021 07:52 25cdecd3  taxsim_emulation_tc.json
--------          -------  ---                            -------
25916366          6028077  77%                            3 files

Let me know if you have any problems downloading this zip file.

MattHJensen · 2021-07-29T20:52:51Z

@martinholmer, thanks very much for your note and the effort that you put into the cross-model comparison. We are investigating. If you find anything further, please let us know.

jdebacker · 2022-03-14T17:23:45Z

Closing in favor of #2619

Add TAXSIM-35 Validation: To replace PR #2453

chusloj added 8 commits August 6, 2020 10:10

initial commit

ea8ce3e

update taxsim input file for taxsim32

02b3388

updated input file for taxsim32 and generated output files

2315751

Merge branch 'master' into taxsim_update

7df42fb

Merge branch 'master' into taxsim_update

302ffd8

add input_setup.py

1ae8ff9

updated taxsim32

93b3f31

updated README

e8ef612

chusloj marked this pull request as draft August 13, 2020 14:42

chusloj added 5 commits August 13, 2020 11:31

updated input files

2b7e737

updated .py files to accommodate TAXSIM-32 parameters

9b7bc86

simplified input_setup,py

981db18

finalized input variables and input_setup.py

dcfb40c

updated taxsim_input.py

11aa280

chusloj marked this pull request as ready for review August 17, 2020 16:05

fixed PEP8 errors and updated tests_32.sh

98e4bba

fixed bugs caused by PEP8 compliance

e69e752

upated scripts & input files to use 2018/2019

9cfc6fd

MattHJensen changed the title ~~TAXSIM-32 validation update [WIP]~~ [WIP] TAXSIM-32 validation update Aug 17, 2020

Merge branch 'master' into taxsim_update

c22e8d8

chusloj added 2 commits September 10, 2020 15:08

updated variable mapping

4903013

added curl to environment.yml

a5d9dbb

chusloj added 9 commits February 26, 2021 09:37

address merge conflicts

6125e08

updated packages in environment

129744c

remove steps to delete taxsim header and zip files

7493681

add taxsim_emulation file

4c07145

csv expect files + put old expect files in new dir

38bcfc2

update readme

657b8c8

remove old bash scripts

f0e7bfe

updated formatting for main comparison files

fc2d011

edit taxsim emulation

a23d8d9

stop rerun of test if TAXSIM32 files exist

fd038bd

address merge conflicts

3ef21a4

MattHJensen changed the title ~~TAXSIM-32 validation update~~ [WIP] TAXSIM-32 validation update May 4, 2021

MattHJensen added the help wanted label May 4, 2021

jdebacker requested a review from hdoupe July 20, 2021 14:55

jdebacker changed the title ~~[WIP] TAXSIM-32 validation update~~ TAXSIM-32 validation update Jul 20, 2021

jdebacker added the ready label Jul 20, 2021

jdebacker mentioned this pull request Sep 6, 2021

Add TAXSIM-35 Validation: To replace PR #2453 #2619

Merged

9 tasks

jdebacker closed this Mar 14, 2022

martinholmer mentioned this pull request May 26, 2022

Add test of TaxInc function #2618

Merged

jdebacker added a commit that referenced this pull request May 5, 2023

Merge pull request #2619 from jdebacker/pr-2453

c5a6271

Add TAXSIM-35 Validation: To replace PR #2453

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TAXSIM-32 validation update #2453

TAXSIM-32 validation update #2453

chusloj commented Aug 11, 2020 •

edited

Loading

chusloj commented Aug 17, 2020 •

edited

Loading

codecov bot commented Aug 17, 2020 •

edited

Loading

Peter-Metz commented Aug 17, 2020 •

edited

Loading

feenberg commented Aug 17, 2020 via email

MattHJensen commented Sep 8, 2020

Peter-Metz commented Sep 8, 2020

feenberg commented Sep 8, 2020 via email

Peter-Metz commented Sep 8, 2020

chusloj commented Feb 26, 2021

chusloj commented Feb 26, 2021

chusloj commented Feb 26, 2021 •

edited

Loading

chusloj commented Apr 27, 2021

jdebacker commented Jul 20, 2021

jdebacker commented Jul 20, 2021

chusloj commented Jul 20, 2021

martinholmer commented Jul 22, 2021

MattHJensen commented Jul 29, 2021 •

edited

Loading

jdebacker commented Mar 14, 2022

TAXSIM-32 validation update #2453

TAXSIM-32 validation update #2453

Conversation

chusloj commented Aug 11, 2020 • edited Loading

chusloj commented Aug 17, 2020 • edited Loading

codecov bot commented Aug 17, 2020 • edited Loading

Codecov Report

Peter-Metz commented Aug 17, 2020 • edited Loading

feenberg commented Aug 17, 2020 via email

MattHJensen commented Sep 8, 2020

Peter-Metz commented Sep 8, 2020

feenberg commented Sep 8, 2020 via email

Peter-Metz commented Sep 8, 2020

chusloj commented Feb 26, 2021

chusloj commented Feb 26, 2021

chusloj commented Feb 26, 2021 • edited Loading

chusloj commented Apr 27, 2021

jdebacker commented Jul 20, 2021

jdebacker commented Jul 20, 2021

chusloj commented Jul 20, 2021

martinholmer commented Jul 22, 2021

MattHJensen commented Jul 29, 2021 • edited Loading

jdebacker commented Mar 14, 2022

chusloj commented Aug 11, 2020 •

edited

Loading

chusloj commented Aug 17, 2020 •

edited

Loading

codecov bot commented Aug 17, 2020 •

edited

Loading

Peter-Metz commented Aug 17, 2020 •

edited

Loading

chusloj commented Feb 26, 2021 •

edited

Loading

MattHJensen commented Jul 29, 2021 •

edited

Loading