
Adding an extra step to extrapolation/records blowup #1110

Closed
andersonfrailey opened this issue Dec 19, 2016 · 14 comments
Comments

@andersonfrailey
Collaborator

To account for changes in the distribution of wages, CBO has an extra step in their blowup process to adjust wages to match their targeted distribution. The process goes something like this:

  • For each year:
    • Multiply the wages of each tax unit by that year's blowup factor
    • Find the sum of wages across all tax units
    • Find adjustment factors so that the top decile hits its wage share target
    • Find the total additional wages going to the top decile after applying adjustment factors
    • Find adjustment factors for all tax units below the top decile, based on the amount of additional wages going to the top, so that total wages do not change
    • Multiply wages for each tax unit by the appropriate adjustment factor based on percentile ranking
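The CBO procedure described above can be sketched in Python as follows. This is a simplified, hypothetical illustration, not CBO's or Tax-Calculator's code: the function name `top_decile_adjustment` and its arguments are made up, the top decile is found by cumulative sampling weight, and everyone below the top decile gets a single uniform factor instead of percentile-specific factors.

```python
def top_decile_adjustment(wages, weights, blowup_factor, top_share_target):
    """Blow up wages, then rescale so the top decile hits a target wage share.

    Simplified sketch (hypothetical helper): one factor for the whole top
    decile and one uniform factor for everyone below it, whereas CBO's
    factors vary by percentile rank.
    """
    # Multiply each tax unit's wages by the year's blowup factor.
    grown = [w * blowup_factor for w in wages]
    # Sum of weighted wages across all tax units.
    total = sum(g * wt for g, wt in zip(grown, weights))

    # Rank units by wages; units whose cumulative weight passes 90% form the top decile.
    order = sorted(range(len(grown)), key=lambda i: grown[i])
    cutoff = 0.9 * sum(weights)
    cum_weight = 0.0
    top = set()
    for i in order:
        cum_weight += weights[i]
        if cum_weight > cutoff:
            top.add(i)

    top_wages = sum(grown[i] * weights[i] for i in top)
    bottom_wages = total - top_wages
    if top_wages <= 0 or bottom_wages <= 0:
        raise ValueError("need positive wages both in and below the top decile")

    # Factor that makes the top decile's share of total wages equal the target.
    top_factor = top_share_target * total / top_wages
    # Additional wages going to the top decile after that adjustment.
    extra = top_wages * (top_factor - 1.0)
    # Rescale everyone below the top decile so total wages do not change.
    bottom_factor = (bottom_wages - extra) / bottom_wages

    return [g * (top_factor if i in top else bottom_factor)
            for i, g in enumerate(grown)]
```

With equal weights and wages of 10, 20, ..., 100, the top unit's share moves from about 18% to the 25% target, while total blown-up wages stay the same.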

My idea is essentially to add a stage 3 to our extrapolation process that finds these adjustment factors beforehand, so they can be read and applied in records.py. In addition to the current blowup factor, each record would be multiplied by an adjustment factor. This would add some runtime to TaxCalc, but if the adjustment factors are computed beforehand, like the blowup factors, I believe the added time would be minimal. Given how skewed some of our distributions are (see interest income here), the tradeoff could be worth it.

There are a couple of questions I can think of right off the bat that would need addressing. First, how far off does our distribution need to be to merit adding this step? Second, what would the target distribution be for the various sources of income? In the notebook linked above I compared against SOI data, which is broken down by AGI level, but there could be better options I am not familiar with.

I'd love some feedback on the general idea along with any ideas for implementation you may have.

@martinholmer @feenberg @MattHJensen @codykallen @Amy-Xu @GoFroggyRun

@codykallen
Contributor

@andersonfrailey, I think this is a valuable addition to Tax-Calculator, and it is worth the additional runtime. I think this adjustment should (at first) be applied to the most inaccurate distributions: taxable interest, itemized deductions, and perhaps pass-through income. I would prefer to target SOI distributions, but I am open to alternatives.

@andersonfrailey
Collaborator Author

Update on this issue:

I have worked out an initial solution to implementing this into both TaxData and TaxCalc. Here is an overview of steps I've taken and some initial observations.

Process

TaxData:

  1. Using IRS-SOI data, find the distribution of interest income by AGI bin (I'm using interest income here for simplicity's sake; these steps can be used for any other source of income). This is the target distribution. Up to 2014 it mirrors the distribution of interest income in the data. From 2014 through 2026 I assume the 2014 distribution holds.
  2. Find the total amount of interest income in each year of the PUF, then using the target distribution for that year find how much of that income should be in each AGI bin. This is the target amount for each bin.
  3. Determine how much income is actually in each of those AGI bins and find the factor that, applied to each record according to its AGI, makes each bin's total match its target amount.
  4. Each record is assigned a factor, and the factors are aggregated in one CSV file, similar to WEIGHTS.csv, which is read into TaxCalc and applied during the increment_year process.
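A minimal sketch of TaxData steps 1 to 4 for a single year, assuming plain Python lists of records and hypothetical AGI bin edges (the real SOI bins are finer). The names `AGI_BIN_EDGES`, `agi_bin`, `bin_ratios`, and `record_factors` are illustrative only and do not come from TaxData.

```python
from bisect import bisect_right

# Hypothetical lower edges of the AGI bins, in dollars (not the SOI bins).
AGI_BIN_EDGES = [float("-inf"), 0.0, 50_000.0, 200_000.0, 1_000_000.0]


def agi_bin(agi, edges=AGI_BIN_EDGES):
    """Index of the bin [edges[i], edges[i + 1]) that contains agi."""
    return bisect_right(edges, agi) - 1


def bin_ratios(income, weights, agi, target_shares, edges=AGI_BIN_EDGES):
    """Steps 1-3: per-bin factors that move weighted income to target shares."""
    # Actual weighted income in each AGI bin.
    actual = [0.0] * len(edges)
    for inc, wt, a in zip(income, weights, agi):
        actual[agi_bin(a, edges)] += inc * wt
    total = sum(actual)
    # Step 2: target amount per bin = total income x target share.
    targets = [total * share for share in target_shares]
    # Step 3: ratio of target to actual; bins with no income stay unadjusted.
    return [t / act if act != 0 else 1.0 for t, act in zip(targets, actual)]


def record_factors(agi, ratios, edges=AGI_BIN_EDGES):
    """Step 4: give each record the factor of the bin its (base-year) AGI is in."""
    return [ratios[agi_bin(a, edges)] for a in agi]
```

The weighted total is preserved only if the target shares sum to 1 and every bin with a nonzero target contains some income. In the scheme described above, `record_factors` would run once per year, with each year's list written as one column of the CSV file.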

TaxCalc:

  1. In records.py, add an adjustment function that applies the adjustment factor to interest income.
  2. Call that function in increment_year after calling _blowup.
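The TaxCalc side could look roughly like the toy class below. It is not the real `taxcalc.Records` API: the constructor, the dictionaries of factors keyed by year, and the use of `e00300` (the PUF name for taxable interest income) as the only adjusted variable are assumptions for illustration.

```python
class Records:
    """Toy stand-in for taxcalc's Records class (hypothetical, not the real API)."""

    def __init__(self, interest, blowup, adjust, start_year):
        self.e00300 = list(interest)  # taxable interest income per record
        self.blowup = blowup          # {year: uniform blowup factor}
        self.adjust = adjust          # {year: [per-record adjustment factor]}
        self.current_year = start_year

    def increment_year(self):
        self.current_year += 1
        self._blowup(self.current_year)
        # New stage-3 step, applied after the usual blowup.
        self._adjust(self.current_year)

    def _blowup(self, year):
        factor = self.blowup[year]
        self.e00300 = [x * factor for x in self.e00300]

    def _adjust(self, year):
        # Multiplies the current, already blown-up values. Whether factors
        # compound across years depends on how TaxData computes them (assumed here).
        factors = self.adjust[year]
        self.e00300 = [x * a for x, a in zip(self.e00300, factors)]
```

Putting the multiplication at the end of `_blowup` instead would behave identically; a separate method only keeps the two steps visibly distinct.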

I assign each record its own adjustment factor, rather than creating a file with a few factors and adding logic to TaxCalc to determine which one to apply, because the factors are based on AGI. This way TaxCalc keeps the ability to advance to a future year without calculating AGI for each year in between. There are two obvious drawbacks to this:

  1. If you look at AGI levels produced by TaxCalc over the years, you'll see that they tend to increase, which moves some records into higher AGI bins. However, this process relies solely on 2009 AGI levels as found in the PUF, before that variable is dropped. Because this bin creep is not accounted for, the distribution may not be perfectly aligned with the goal distribution when the factors are applied in TaxCalc. If anyone has a solution to this, or an idea for a different way to judge distributions, I'm open to suggestions.
  2. The additional file does increase the size of the overall TaxCalc package. With only the adjustment factors for one variable it is 54.3MB. I don't know how much of an issue this is, but it's worth noting.
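To illustrate the first drawback, the sketch below grows a record's 2009 AGI at a constant rate and lists the years in which it would have left its 2009 bin, even though its precomputed factor still comes from the 2009 bin. The bin edges and the uniform growth rate are hypothetical.

```python
from bisect import bisect_right

# Hypothetical lower edges of the AGI bins, in dollars.
EDGES = [float("-inf"), 0.0, 50_000.0, 200_000.0, 1_000_000.0]


def agi_bin(agi):
    """Index of the bin [EDGES[i], EDGES[i + 1]) that contains agi."""
    return bisect_right(EDGES, agi) - 1


def bin_creep(agi_2009, growth, years):
    """Years after 2009 in which uniformly grown AGI falls outside the 2009 bin."""
    base = agi_bin(agi_2009)
    return [t for t in range(1, years + 1)
            if agi_bin(agi_2009 * growth ** t) != base]
```

For example, `bin_creep(190_000.0, 1.05, 3)` reports that the record crosses the $200,000 edge two years out, so from then on it receives the factor of a bin it no longer belongs to.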

On the other hand, because all the logic for determining which factor is associated with which record is handled outside of TaxCalc, the additional runtime from this step is minimal. As with the first drawback, I'm open to suggestions on other ways to implement this.

Observations from initial testing

Total interest income changes only in the decimal places, by less than a dollar each year; it is slightly lower in every year except 2018:

| Year | With Adjustment | Without Adjustment | Difference |
|------|----------------:|-------------------:|-----------:|
| 2013 | 100,727,563,812.39 | 100,727,563,812.50 | -0.11 |
| 2014 | 101,836,370,195.36 | 101,836,370,195.41 | -0.05 |
| 2015 | 106,566,991,885.92 | 106,566,991,885.99 | -0.07 |
| 2016 | 111,220,611,949.97 | 111,220,611,950.07 | -0.10 |
| 2017 | 116,196,161,782.86 | 116,196,161,782.87 | -0.01 |
| 2018 | 120,771,446,746.39 | 120,771,446,746.33 | 0.06 |
| 2019 | 124,945,022,242.55 | 124,945,022,242.57 | -0.02 |
| 2020 | 129,519,617,132.46 | 129,519,617,132.51 | -0.05 |
| 2021 | 134,736,982,139.63 | 134,736,982,139.69 | -0.06 |
| 2022 | 140,357,307,602.01 | 140,357,307,602.26 | -0.26 |
| 2023 | 146,134,503,079.86 | 146,134,503,080.12 | -0.26 |
| 2024 | 152,315,177,216.31 | 152,315,177,216.63 | -0.32 |
| 2025 | 158,816,070,050.08 | 158,816,070,050.42 | -0.34 |
| 2026 | 165,637,860,591.87 | 165,637,860,592.15 | -0.28 |
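To put these differences in perspective, the relative change can be computed from the totals in the table. Every year's change is on the order of one part in 10^12, which suggests (an inference, not something tested here) accumulated floating-point rounding rather than a real change in the total. The sketch parses the table values with `decimal.Decimal` so the two-decimal figures are represented exactly.

```python
from decimal import Decimal

# (year, total with adjustment, total without adjustment), from the table.
TOTALS = [
    (2013, "100727563812.39", "100727563812.50"),
    (2014, "101836370195.36", "101836370195.41"),
    (2015, "106566991885.92", "106566991885.99"),
    (2016, "111220611949.97", "111220611950.07"),
    (2017, "116196161782.86", "116196161782.87"),
    (2018, "120771446746.39", "120771446746.33"),
    (2019, "124945022242.55", "124945022242.57"),
    (2020, "129519617132.46", "129519617132.51"),
    (2021, "134736982139.63", "134736982139.69"),
    (2022, "140357307602.01", "140357307602.26"),
    (2023, "146134503079.86", "146134503080.12"),
    (2024, "152315177216.31", "152315177216.63"),
    (2025, "158816070050.08", "158816070050.42"),
    (2026, "165637860591.87", "165637860592.15"),
]


def relative_changes(rows):
    """Map year -> (with - without) / without."""
    return {year: (Decimal(w) - Decimal(wo)) / Decimal(wo) for year, w, wo in rows}


if __name__ == "__main__":
    for year, change in sorted(relative_changes(TOTALS).items()):
        print(year, f"{change:.2e}")
```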

Overall, the distribution looks significantly more like the SOI distribution than before. Total income tax liability also increases somewhat, while total AGI actually drops a little. These results and a few other observations can be seen here. This notebook uses the weights and blowup factors currently used by TaxCalc, not those uploaded in PR #1105.

@martinholmer @MattHJensen @codykallen @Amy-Xu

@Amy-Xu
Member

Amy-Xu commented Jan 12, 2017

👍

@martinholmer
Collaborator

@andersonfrailey proposed in issue #1110:

> (1) In records.py add an adjustment function which applies the adjustment factor to interest income.
> (2) Call that function in def increment_year after calling def _blowup.
>
> I assign each record an adjustment factor rather than simply creating a file with a few factors and adding logic to TaxCalc to determine which to apply because it is based on AGI and this way TaxCalc does not lose the ability to advance to a future year without having to calculate AGI for each year in-between. There are two obvious drawbacks to this:
>
> ... [first drawback] ...
>
> The additional file does increase the size of the overall TaxCalc package. With only the adjustment factors for one variable it is 54.3MB. I don't know how much of an issue this is, but it's worth noting.

Several questions:

(a) How many variables (including interest income used in your example) do you envision adjusting? Which ones?

(b) Would all those adjustment factors go into the same new file, so that there is only one new file in the taxcalc package?

(c) Why not put the new code that applies these adjustments at the end of the _blowup function? What's the argument for a new function?

@andersonfrailey
Collaborator Author

@martinholmer asked:

> How many variables (including interest income used in your example) do you envision adjusting? Which ones?

I want to keep this focused on variables whose distributions are significantly off. Interest income is the only one I have set in stone, but @codykallen has told me the distribution of itemized deductions also needs improvement, so I'm planning to look into the individual components that go into that calculation to see if any improvements can be made. I'm open to adding any variables that contributors deem necessary.

> Would all those adjustment factors go into the same new file? So that there was only one new file in the taxcalc package.

Yes. All the factors would go into the same file.

> Why not put the new code that applies these adjustments at the end of the _blowup function? What's the argument for a new function?

The new code could be added to the end of the _blowup function. The only reason I created a new one was to make clear that these are two different steps, which matters more as the list of adjusted variables grows. Other than that, there's no need for a new function.

@martinholmer
Collaborator

@andersonfrailey, Thanks for the prompt and clear answers to my questions about issue #1110.

@MattHJensen
Contributor

@andersonfrailey, does the stage 3 influence how close we are to any stage 2 targets?

@andersonfrailey
Collaborator Author

@MattHJensen, can you clarify your question a bit? Are you asking whether stage 3 changes the aggregate totals of any of the variables targeted in stage 2? If so, it does not. All aggregate totals remain the same (with the exception of the slight changes in the targeted variables noted above).

@andersonfrailey
Collaborator Author

Looking at each of the components of itemized deductions, the one least like its SOI distribution seems to be non-cash contributions (see here). The difference is particularly noticeable for those with AGI above $10M. The other deduction items seem relatively close to their actual distributions, so I'm not sure how beneficial adding an adjustment would be.

The notebook does not have comparisons for the medical and casualty or theft loss deductions because the SOI data did not provide totals for each AGI bin for disclosure purposes.

cc: @codykallen

@martinholmer
Collaborator

@andersonfrailey, In a notebook you reference in #1110 you read in a file called soi_data.csv. Where is that file? And exactly where in the 2014 SOI tables do you get the taxable interest income totals by AGI stratum?

@codykallen
Contributor

@martinholmer, that data can be found here, in the section "All Returns: Sources of Income, Adjustments Deductions and Exemptions, and Tax Items." If you download the 2014 file under that section, taxable interest can be found in column I of the Excel file.

@andersonfrailey
Collaborator Author

@martinholmer asked:

> In a notebook you reference in #1110 you read in a file called soi_data.csv. Where is that file? And exactly where in the 2014 SOI tables do you get the taxable interest income totals by AGI stratum?

soi_data.csv is just a file I made that contains the data @codykallen provided a link to: the amount of each type of income examined in the notebook, by AGI bin. I can add a download link to it if you want.

@martinholmer
Collaborator

@codykallen and @andersonfrailey, Thanks for the pointers to the 2014 IRS-SOI data.

@martinholmer
Collaborator

The adjustment_ratios envisioned in issue #1110 have been implemented in pull request #1193.
