Impute elective DC pension contributions in PUF data #279

Merged
merged 13 commits into from
Sep 5, 2018

martinholmer
Contributor

@martinholmer martinholmer commented Aug 29, 2018

This pull request does what the title says. PUF e00200 earnings variables are net of defined-contribution (DC) pension contributions, while payroll taxes are calculated on gross earnings. As a result, payroll tax liability has been underestimated when using the PUF data, although income tax liability uses the correct earnings concept. (The e00200 variables in CPS data are gross earnings, which means that payroll tax liability is correct but income tax liability is overestimated. This pull request does nothing to fix that CPS data problem, although if the imputation procedure used here is well received it could be applied to the CPS data.) All this and other closely related topics were discussed at length in Tax-Calculator issue 1549 (opened on 2017-Sep-14) and before that in Tax-Calculator issue 1156 (opened on 2017-Jan-25).

The amount of the pension contributions in 2011, which is the amount by which PUF earnings used in payroll tax calculations are underestimated, is about $220 billion according to recently published IRS W-2 data tabulations. So, we are talking about imputing a non-trivial amount of "missing" earnings in the PUF data. The same IRS tabulations show almost 47 million individuals (not filing units) making DC pension contributions in 2011. This implies a mean (positive) pension contribution of about $4,700 per person.
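
As a quick check of the arithmetic above, the mean contribution follows from dividing the rounded total by the rounded number of contributors. The variable names are illustrative only; the inputs are the approximate figures quoted in this comment, not exact IRS values.

```python
# Approximate 2011 figures quoted above (rounded, not exact IRS values).
total_contributions = 220e9  # about $220 billion in DC pension contributions
contributors = 47e6          # almost 47 million individuals contributing

# Mean contribution among individuals with a positive contribution.
mean_contribution = total_contributions / contributors
print(f"mean contribution: ${mean_contribution:,.0f}")
```

With these rounded inputs the result is a little under $4,700, consistent with the "about $4,700" stated above.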

The details of the imputation procedure are discussed in the docstring at the top of the new puf_data/impute_pencon.py file. The basic idea is to use the W-2 data to compute the probability of a positive pension contribution for each age-wage cell and the pension contribution rate (as a fraction of wages) for each age-wage cell.
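
The basic idea can be sketched as a two-step draw per person: first decide whether the contribution is positive using the cell's probability, then apply the cell's contribution rate to wages. This is only a minimal sketch, not the code in puf_data/impute_pencon.py. The cell labels, the probabilities, the rates, and the function name `impute_pencon` are all hypothetical, and the numbers are made up rather than taken from the W-2 tabulations.

```python
import random

# Hypothetical age-wage cells keyed by (age_group, wage_group).
# All values below are made up for illustration, NOT the W-2 estimates.
PROB_POSITIVE = {
    ("under_45", "low"): 0.20,
    ("under_45", "high"): 0.55,
    ("45_plus", "low"): 0.30,
    ("45_plus", "high"): 0.65,
}
CONTRIB_RATE = {
    ("under_45", "low"): 0.03,
    ("under_45", "high"): 0.05,
    ("45_plus", "low"): 0.04,
    ("45_plus", "high"): 0.07,
}


def impute_pencon(wage, age_group, wage_group, rng):
    """Return an imputed pension contribution for one individual."""
    if wage <= 0:
        return 0.0  # no wages, so no elective contribution
    cell = (age_group, wage_group)
    # Step 1: is the contribution positive? Draw against the cell probability.
    if rng.random() < PROB_POSITIVE[cell]:
        # Step 2: contribution is the cell's rate times wages.
        return wage * CONTRIB_RATE[cell]
    return 0.0


rng = random.Random(123456)  # fixed seed so repeated runs give identical output
print(impute_pencon(50000, "45_plus", "high", rng))
```

Seeding the random number generator matters for a data-preparation script like this one, because it lets two people regenerate byte-identical output files.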

@MattHJensen @feenberg @andersonfrailey @hdoupe @Amy-Xu @donboyd5

@martinholmer
Contributor Author

@andersonfrailey, after commit a500802, the puf.csv file I generate under PR #279 on my computer has this info:

puf_data$ ls -l puf.csv
-rw-r--r--  1 mrh  staff  56415698 Aug 31 09:59 puf.csv
puf_data$ md5 puf.csv
MD5 (puf.csv) = a10091a770472254c50f8985d8839162

Can you generate the same puf.csv file on your computer?
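
For anyone without the `md5` command (it is the macOS tool; Linux typically has `md5sum`), the same checksum can be computed with Python's standard library. This is a small sketch; the helper name `file_md5` is hypothetical and not part of the taxdata repository.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns the empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_md5("puf.csv")` in the puf_data directory returns a string that can be compared with the MD5 value shown above.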

@andersonfrailey
Collaborator

@martinholmer I was able to create a PUF with the same MD5 as you after your latest commit.

@martinholmer
Contributor Author

@andersonfrailey said:

I was able to create a PUF with the same MD5 as you after your latest commit [in PR #279].

Thanks for checking, @andersonfrailey. That's good to know.
I plan to leave taxdata PR #279 open for review until next week. I'd value your comments on #279.
Meanwhile, I'll prepare a Tax-Calculator PR that incorporates the new puf.csv file and makes logic changes required by the data changes in #279.

@andersonfrailey
Collaborator

@martinholmer, just reviewed the details of the PR and it looks good. Could you go into a little more detail about the hand calibration behind the HIWAGE variables?

@martinholmer
Contributor Author

@andersonfrailey said:

Could you go into a little more detail about the hand calibration behind the HIWAGE variables?

Are the changes in commit 706830c sufficient?

@martinholmer
Contributor Author

martinholmer commented Sep 1, 2018

@andersonfrailey said:

just reviewed the details of [#279] and it looks good.

I also thought that a few days ago, but now I'm not so sure. Here is my concern.

For the old PUF data, the e00200 variable contained, for each filing unit, wages and salaries net of pension contributions (because that's what's on Form 1040). So, the computation of income taxes with PUF data is conceptually OK, but the computation of payroll taxes on wages and salaries is incorrect, because the payroll tax base is gross earnings: the sum of Form 1040 wages and salaries (as included in the e00200* PUF variables) and the new pencon_* variables.
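
The distinction between the two earnings concepts can be written out as a tiny sketch. The function names are hypothetical and the dollar amounts are made up; the only thing taken from the discussion is that the payroll tax base adds the imputed contribution back to Form 1040 wages, while the income tax concept does not.

```python
def income_tax_wages(e00200):
    """Wages for income tax purposes: Form 1040 wages, net of contributions."""
    return e00200


def payroll_tax_wages(e00200, pencon):
    """Wages for payroll tax purposes: gross earnings, so add contributions back."""
    return e00200 + pencon


# Made-up example: $50,000 of Form 1040 wages and a $3,000 DC contribution.
print(income_tax_wages(50000))         # 50000
print(payroll_tax_wages(50000, 3000))  # 53000
```

Without the pencon amount, the payroll tax base in this example would be understated by $3,000.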

PR #279 attempts to fix this problem by adding pension contribution logic as the last step in the final preparation of the puf.csv file. But I'm now wondering if this is the correct placement. Maybe the imputation of pension contributions should be the last step in the preparation of the cps-matched-puf.csv file. Whether or not this is true depends on the exact definition of the following variables in the Stage_II_targets.csv file:

"Wages and Salaries: $1 Less Than $10,000",...
"Wages and Salaries: $10,000 Less Than $20,000",...
"Wages and Salaries: $20,000 Less Than $30,000",...
"Wages and Salaries: $30,000 Less Than $40,000",...
"Wages and Salaries: $40,000 Less Than $50,000",...
"Wages and Salaries: $50,000 Less Than $75,000",...
"Wages and Salaries: $75,000 Less Than $100,000",...
"Wages and Salaries: $100,000 Less Than $200,000",...
"Wages and Salaries: $200,000 Less Than $500,000",...
"Wages and Salaries: $500,000 Less Than $1 Million",...
Wages and Salaries: $1 Million and Over,...

The numbers on these rows seem to come from the puf_stage1/SOI_estimates.csv file, right?
If so, I guess they are wages and salaries net of pension contributions, right?
If the answer to both questions is "yes", then it seems as if PR #279 is correct in assuming that the imputed amounts of pension contributions will not affect the calculation of the PUF weights. Is that the way you see it?

@MattHJensen @feenberg

@andersonfrailey
Collaborator

@martinholmer asked:

The numbers on these rows seem to come from the puf_stage1/SOI_estimates.csv file, right?

and

If so, I guess they are wages and salaries net of pension contributions, right?

The answer to your first question is yes. I believe the answer to your second question is also yes. I don't have firm confirmation of that at this time, but it's logical that the IRS would report wages in their estimates as they appear on the 1040 (net of contributions).

With that in mind, I'd say this PR is correct in calculating the contributions after we've created cps-matched-puf.csv and that it will not have an effect on the calculation of the PUF weights.

@martinholmer
Contributor Author

@andersonfrailey gave detailed responses to my questions in taxdata PR #279 and then concluded:

I'd say this PR is correct in calculating the contributions after we've created cps-matched-puf.csv and that it will not have an effect on the calculation of the PUF weights.

Thanks for your response.


2 participants