Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TAXSIM-32 validation update #2453

Closed
wants to merge 48 commits into from
Closed

Conversation

chusloj
Copy link
Contributor

@chusloj chusloj commented Aug 11, 2020

This PR updates the validation code for TAXSIM-27 to use the new TAXSIM-32 model.

@chusloj chusloj marked this pull request as draft August 13, 2020 14:42
@chusloj chusloj marked this pull request as ready for review August 17, 2020 16:05
@chusloj
Copy link
Contributor Author

chusloj commented Aug 17, 2020

The prep files have been modified to accommodate TAXSIM-32.

Notes:

  • taxsim_input.py was updated to include the 5 new variables (vars 28-32) from TAXSIM-32
  • prepare_taxcalc_input.py only had variables 28 (S-corp QBI income) and 30 (SSTB flag) mapped to Tax-Calculator input, but I'd like some input on modifying variables to adjust for QBI deductions. calcfunctions.py shows that QBI deductions are calculated using output variables.

A new shell file for TAXSIM-32 was also created for testing, and input_setup.py was created to automate the FTP retrieval of the TAXSIM-32 output, data cleaning and zipping the files.

For both the 2017 and 2018 assumption sets, set b now includes the SSTB flag for the PT_SSTB_income variable, set c includes that SSTB flag and S-corp QBI income under the e26270 variable. Still thinking about how to incorporate the S-Corp QBI deductions using only the input variables.

Finally, the taxdiff-actual and README files haven't been updated to reflect the status of the diffs or the introduction of new variables into the assumption sets. I'd like to update those after review.

@Peter-Metz

@codecov
Copy link

codecov bot commented Aug 17, 2020

Codecov Report

Merging #2453 (3ef21a4) into master (cb0297d) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2453   +/-   ##
=======================================
  Coverage   98.46%   98.46%           
=======================================
  Files          14       14           
  Lines        2611     2611           
=======================================
  Hits         2571     2571           
  Misses         40       40           
Flag Coverage Δ
unittests 98.46% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@Peter-Metz
Copy link
Contributor

Peter-Metz commented Aug 17, 2020

@chusloj thanks a lot of the PR. I think the first step is to lock down the translation from taxsim32 variables (in particular variables 28-32) to Tax-Calculator variables. Here is my first stab (building off what you have already done):

Taxsim32 Tax-Calculator
scorp e26270 (and PT_SSTB_income=1)
pbusinc e00900p
sbusinc e00900s
if (pprofinc + sprofinc)>0 PT_SSTB_income=1, else 0

Maybe @MattHJensen or @kpomerleau have some input?

Taxsim32 definitions:

scorp: Active S-Corp income (is QBI). (Guaranteed S-corp partner profits and limited partner compensation are taxed as wages, not here).
pbusinc: Primary Taxpayer's Qualified Business Income (QBI) subject to a preferential rate without phaseout. Subject to SECA and Medicare additional Earnings Tax.
pprofinc: Primary Taxpayer's Specialized Service Trade or Business service (SSTB) with a preferential rate subject to claw-back. Subject to SECA and Medicare Additional Earnings Tax.
sbusinc: Spouse's QBI. Must be zero for non-joint returns.
sprofinc: Spouse's SSTB. Must be zero for non-joint returns.

@feenberg
Copy link
Contributor

feenberg commented Aug 17, 2020 via email

@MattHJensen MattHJensen changed the title TAXSIM-32 validation update [WIP] [WIP] TAXSIM-32 validation update Aug 17, 2020
@MattHJensen
Copy link
Contributor

@Peter-Metz, did your recent Tax-Cruncher work give you any further insights here?

@Peter-Metz
Copy link
Contributor

I think the mapping in this comment still makes sense.

My remaining question is how TAXSIM-32 calculates the phase-out of the QBI deduction, which can be dependent on W-2 wages paid by the filer's business and/or the filer's amount of qualified property. Maybe @feenberg has insight?

@feenberg
Copy link
Contributor

feenberg commented Sep 8, 2020 via email

@Peter-Metz
Copy link
Contributor

Thanks very much for the explanation @feenberg.

@MattHJensen -- Relatedly, I think offering both wage/capital options could be a good Tax-Calculator feature. Currently, Tax-Calculator assumes option 2) (i.e. all taxpayers are fully subject to wage and capital limitations). Instead of imputing values for these parameters, we could have a new boolean parameter that allows users to choose between the two options in Dan's comment.

@chusloj
Copy link
Contributor Author

chusloj commented Feb 26, 2021

@Peter-Metz,

are we comparing calculated qbid amounts as part of the validation

No, the TAXSIM-32 suite doesn't currently compare calculated qbid amounts (using qbided). I think this is a good addition to the validation suite, but seeing as this PR has been open for quite a while I think it should be merged and then I'll work on qbid amounts in a successive PR.

Also, now that the comparison code is in human-readable Python, it'll be easier for all contributors to chime in.

@chusloj
Copy link
Contributor Author

chusloj commented Feb 26, 2021

@MattHJensen All tests passing.

@chusloj
Copy link
Contributor Author

chusloj commented Feb 26, 2021

The main time consumer in the validation suite is the FTP file retrieval, so now the file retrieval process will not run during a validation session if any .in.out-taxsim files already exist.

@chusloj
Copy link
Contributor Author

chusloj commented Apr 27, 2021

@MattHJensen @jdebacker This PR is up to date with all tests passing and is ready for review & merge.

@MattHJensen MattHJensen changed the title TAXSIM-32 validation update [WIP] TAXSIM-32 validation update May 4, 2021
@jdebacker jdebacker requested a review from hdoupe July 20, 2021 14:55
@jdebacker
Copy link
Member

@hdoupe I like this addition to provide validation against TaxSim-32. The PR generally looks good to me. Would you mind giving this a thorough review?

@jdebacker
Copy link
Member

@chusloj You noted this is ready for review. Can I remove the [WIP] in the title of this PR?

@chusloj
Copy link
Contributor Author

chusloj commented Jul 20, 2021

@jdebacker Yes, the branch is no longer WIP.

@jdebacker jdebacker changed the title [WIP] TAXSIM-32 validation update TAXSIM-32 validation update Jul 20, 2021
@martinholmer
Copy link
Collaborator

A few days ago, when @jdebacker closed Tax-Calculator issue #2513,
@donboyd5 suggested the need for an increased commitment to quality
control in the TaxData process that generates data used by
Tax-Calculator. He suggested developing methods for "(a) comparing
TaxData to published and forecasted values, and (b) bringing TaxData
(puf.csv) in line with those [published] values".

This comment echos Don's suggestion with respect to testing the
accuracy of the tax calculation logic in Tax-Calculator. There are
different ways of doing this testing, but one of the most effective is
comparing, record by record, calculated taxes from Tax-Calculator with
calculated taxes for the same sample from NBER's TAXSIM. This sort of
testing has been done for years using an older version called TAXSIM27.

The objective of this pull request is to update that testing
capability to use TAXSIM32. This pull request has been open for
almost a year, but the results I present below suggest that completing
this pull request should have a high priority. Testing Tax-Calculator
against TAXSIM32 is likely to reveal many cases in which the income
tax liability computed by Tax-Calculator differs significantly from
the tax computed by TAXSIM32. I will present the details of one such
comparison below.

Let me make it clear that I have not reviewed the changes proposed in
this pull request. But the information I present below could be used
to assess the effectiveness of the new testing tools being proposed in
this pull request.

DETAILS: What I have done is adapt validation tools that I use in my
own work so that those tools can compare, for the same input sample,
the taxes calculated by the Tax-Calculator 3.2.0 CLI and by the
2021-07-01 version of TAXSIM32. First, I show the summary testing
results, and then, I describe how to download the files needed to
generate the Tax-Calculator and TAXSIM32 results summarized below.

I have used my validation tools to generate a non-representative
sample of 100,000 2019 taxpayers suitable for TAXSIM32 input
(b19.i32.csv). Then I convert that file so that it can be used as
Tax-Calculator input (b19.itc.csv). And I also convert b19.i32.csv
into a file that can be used as input by my own model.

TAXSIM32 output is generated by the following command:

(py37) validation% taxsim32 < b19.i32.csv > b19.o32.csv

And Tax-Calculator output is generate by this command:

(py37) validation% tc b19.itc.csv 2019 --reform taxsim_emulation_tc.json

Here are the summary results from generating output from the two
models and comparing the output from each against the TAXSIM32 output.
(The expected comparison results, when using my model, are in the
b19.diffs-expect file.)

(py37) validation% ./test.sh b19 save
Using local TAXSIM32 executable: Compiled on 07/01/21
BEGIN taxsim execution at Thu Jul 22 08:37:00 EDT 2021
  END taxsim execution at Thu Jul 22 08:37:05 EDT 2021
Validating usita for USI-Tax-Analyzer 2.25.0
BEGIN usita execution at Thu Jul 22 08:37:09 EDT 2021
  END usita execution at Thu Jul 22 08:37:12 EDT 2021
PASSED b19

(py37) validation% cat b19.diffs-expect 
DIFFERENCES FOR OUTVAR ptax=fica  FICA liability
   97490 with 0 cent difference
    2510 with 1 cent difference
DIFFERENCES FOR OUTVAR itax=fitax Federal income tax liability
   98262 with 0 cent difference
    1738 with 1 cent difference

(py37) validation% VALIDATE_TC=1 ./test.sh b19 save
Using existing b19.i32.csv file
Using local TAXSIM32 executable: Compiled on 07/01/21
BEGIN taxsim execution at Thu Jul 22 08:38:05 EDT 2021
  END taxsim execution at Thu Jul 22 08:38:09 EDT 2021
Validating Tax-Calculator 3.2.0 using tc CLI
BEGIN tc execution at Thu Jul 22 08:38:14 EDT 2021
Tax-Calculator startup did not extrapolate your data.
  END tc execution at Thu Jul 22 08:38:46 EDT 2021
Files b19.diffs-actual and b19.diffs-expect differ
FAILED b19

(py37) validation% cat b19.diffs-actual            
DIFFERENCES FOR OUTVAR ptax=fica  FICA liability
   97490 with 0 cent difference
    2510 with 1 cent difference
DIFFERENCES FOR OUTVAR itax=fitax Federal income tax liability
   79645 with 0 cent difference
     753 with 1 cent difference
   19602 with 2+ cent difference
         Record id 87711 has abs(diff)>0.01 with diff= 17143.39
         Record id 37578 has abs(diff)>0.01 with diff= 15881.78
         Record id 42132 has abs(diff)>0.01 with diff= 15537.23
         Record id 51825 has abs(diff)>0.01 with diff= 15480.72
         Record id 91305 has abs(diff)>0.01 with diff= 15443.67
         Record id 70542 has abs(diff)>0.01 with diff= 15372.39
         Record id 4911 has abs(diff)>0.01 with diff= 15135.66
         Record id 98536 has abs(diff)>0.01 with diff= 15098.69
         Record id 33391 has abs(diff)>0.01 with diff= 15024.10

So, Tax-Calculator generates an income tax liability that differs from
that generated by TAXSIM32 in almost 20% of the cases. And many of
the differences are quite large, with some being in the $15,000 to
$17,000 range. This is why I'm suggesting that testing the accuracy
of the tax calculation logic in Tax-Calculator be given a high priority.

Let me say that, even though I have checked my own work several times,
there could be flaws in the method I developed to translate TAXSIM32
input into Tax-Calculator input. Please let me know if you find any
problems in the input data translation after you download and inspect
the b19.i32.csv and b19.itc.csv files. Also, be sure to check that
the taxsim_emulation_tc.json file I used has been correctly specified
for Tax-Calculator.

All three of the files mentioned in the previous paragraph are in a
validation.zip file that can be downloaded by clicking on this
link
. Here are the contents of that zip file:

(py37) validation% unzip -v validation.zip
Archive:  validation.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
12416945  Defl:N  2792324  78% 07-22-2021 08:37 c3058081  b19.i32.csv
13497136  Defl:N  3234787  76% 07-22-2021 08:38 a872fa12  b19.itc.csv
    2285  Defl:N      966  58% 07-19-2021 07:52 25cdecd3  taxsim_emulation_tc.json
--------          -------  ---                            -------
25916366          6028077  77%                            3 files

Let me know if you have any problems downloading this zip file.

@MattHJensen
Copy link
Contributor

MattHJensen commented Jul 29, 2021

@martinholmer, thanks very much for your note and the effort that you put into the cross-model comparison. We are investigating. If you find anything further, please let us know.

@jdebacker
Copy link
Member

Closing in favor of #2619

@jdebacker jdebacker closed this Mar 14, 2022
jdebacker added a commit that referenced this pull request May 5, 2023
Add TAXSIM-35 Validation: To replace PR #2453
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants