Replace filer variable with data_source #296

Merged (2 commits) on Jan 10, 2019

andersonfrailey
Collaborator

This PR resolves issue #288. It renames filer to data_source in puf.csv and removes filer from cps.csv.gz completely.
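For readers who want to see what the change amounts to at the file level, here is a minimal sketch of renaming one CSV column and dropping another. It is not the code from this PR: the helper names `rename_column` and `drop_column` and the three-column sample rows are hypothetical, and only the column names `filer` and `data_source` come from the description above.

```python
import csv
import io


def rename_column(csv_text, old, new):
    """Return csv_text with the header column `old` renamed to `new`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if old not in rows[0]:
        raise KeyError(f"column {old!r} not found")
    # Only the header row changes; the data rows are left untouched.
    rows[0] = [new if name == old else name for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def drop_column(csv_text, name):
    """Return csv_text with the column `name` removed from every row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    idx = rows[0].index(name)  # raises ValueError if the column is missing
    kept = [row[:idx] + row[idx + 1:] for row in rows]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(kept)
    return out.getvalue()


# Made-up sample rows, for illustration only (not real PUF or CPS data).
SAMPLE = "RECID,filer,e00200\n1,1,50000\n2,0,12000\n"

puf_like = rename_column(SAMPLE, "filer", "data_source")  # puf.csv treatment
cps_like = drop_column(SAMPLE, "filer")                   # cps.csv.gz treatment
```

The sketch works on in-memory text to stay self-contained; applying it to the real files would additionally require reading and writing them (and gzip handling for cps.csv.gz).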

@martinholmer
Contributor

@andersonfrailey said in taxdata pull request #296:

This PR resolves issue #288. It renames filer to data_source in puf.csv and removes filer from cps.csv.gz completely.

Perhaps I'm confused, but I thought the filer variable was meant to indicate whether or not the filing unit was required to file an income tax return, and thus, some CPS-derived units could have a value of 1 for filer. Am I confused? Are you saying that in the old puf.csv that filer equals one if and only if the unit was mainly derived from the SOI-PUF data set? In other words, are you saying that the old filer variable in the puf.csv file was equal to zero if and only if the unit was derived from the CPS?

@andersonfrailey
Collaborator Author

@martinholmer asked:

In other words, are you saying that the old filer variable in the puf.csv file was equal to zero if and only if the unit was derived from the CPS?

Correct. Any CPS units that were flagged as having to file were used in the statistical match and would end up being part of a unit primarily derived from the PUF.

@martinholmer
Contributor

@andersonfrailey said in taxdata pull request #296:

In other words, are you saying that the old filer variable in the puf.csv file was equal to zero if and only if the unit was derived from the CPS?

Correct. Any CPS units that were flagged as having to file were used in the statistical match and would end up being part of a unit primarily derived from the PUF.

Thanks for the clarification. I'd like to see if I can replicate the results of #296 on my computer.
What's the bytesize and MD5 checksum for the new puf.csv file generated with #296?
Assuming I can get the same two data files, pull request #296 will be ready for merging.

@andersonfrailey
Collaborator Author

@martinholmer
MD5: 4aa15435c319bf5e4d3427faf52384c0
Bytesize: 56415704
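One way to reproduce this kind of check locally is sketched below. The function name `file_fingerprint` is hypothetical and this is not necessarily how either participant computed the values; it relies only on Python's standard `hashlib` and `os` modules.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (md5_hex_digest, size_in_bytes) for the file at `path`."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so a large file is never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest(), os.path.getsize(path)
```

Calling `file_fingerprint("puf.csv")` on a locally generated file returns a pair that can be compared directly with the MD5 and bytesize reported above.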

@martinholmer
Contributor

@andersonfrailey described the puf.csv file generated by PR #296 as follows:

MD5: 4aa15435c319bf5e4d3427faf52384c0
Bytesize: 56415704

I can get the same puf.csv file on my computer and the new cps.csv.gz file is the same as the file included in this PR. So, everything looks good to me. I suggest you merge #296 today. Then I'll prepare a Tax-Calculator pull request that incorporates these two new data input files.

@martinholmer
Contributor

@andersonfrailey, what's the time schedule for merging taxdata PR #296?

@andersonfrailey
Collaborator Author

andersonfrailey commented Jan 10, 2019

@martinholmer, I'll merge now and distribute the new PUF shortly.

Edit: by shortly I mean when you have the associated Tax-Calculator PR ready and merged as well.
