-
-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Records variable metadata in evars.json #1179
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1179 +/- ##
==========================================
+ Coverage 98.83% 98.83% +<.01%
==========================================
Files 39 39
Lines 3002 3007 +5
==========================================
+ Hits 2967 2972 +5
Misses 35 35
Continue to review full report at Codecov.
|
@zrisher, is there any other meta data that you are planning to add to this file in the short term? For example, would it contain form and line numbers correspondences to each years' tax forms? I think we'll probably want a more general name than |
@MattHJensen asked about @zrisher pull request #1179:
I agree. The proposed JSON file contains information about both input and output variables. @MattHJensen also asked:
The form/line numbers are in both the Another problem is that too many "description" values are empty in the current version of the new JSON file. Can the best descriptions from |
file name inclusion of form line correspondence info descriptions |
@zrisher first said:
I have a strong preference for @zrisher then said:
I think that would be extremely important to include the Form/Line# in the new @zrisher finally said:
That would be very good. And if there is a difference in descriptions between |
@zrisher said:
Such a test is an excellent idea for the transition work, but isn't the idea of #1179 to replace both the And such a test is problematic when you start merging and choosing between descriptions in those two old files. What, exactly, is the goal of #1179. |
@zrisher, If the new |
@martinholmer said:
Yep, the goal is to provide a single source of truth for record variable metadata.
That is why I removed the test in the final commit c01e237. 😄
Well anything that can parse JSON could construct more user-centric documentation using the file. Clearly python can parse it, so we could build a python notebook similar to those in /docs/notebooks. |
How about simply |
Hey @martinholmer sorry I missed your first comment.
Sounds great to me!
Thinking about it more, I agree. After #1184 is polished and merged I will amend its direct mapping logic to pull data from
Will do. |
^ Rebased to latest master & fixed conflicts. |
Done.
Done.
Here are the descriptions that differ between the above that which I'm not sure how to merge:
Note that the placement of the source form name in the description is inconsistent between the variables, e.g. "General business credit (Form 3800)", "Taxes: Recapture tax - Form 4255", "Sch A: Medical and dental expenses". But since we plan to provide the tax form mapping directly via additional metadata I think we can ignore those for now. |
@andersonfrailey , could you give @zrisher guidance when you have a chance. Not urgent, but it could be a nice break from other work. |
@zrisher, Thanks for the insightful idea behind PR#1179. It does make much more sense to centralize the metadata about the variables read into and calculated by the Records class in a json file separate from the Records class source code. Because we have been so slow to act on your pull request, it has needed some updating in order to be merged into the master Tax-Calculator branch. I've done that on the martinholmer:pr-1179 branch, which appears on GitHub as Tax-Calculator pull request #1285. Thanks again for all your work on this. It is a very helpful contribution to Tax-Calculator. |
Prepare zrisher PR #1179 for merging into master branch
Harmonize records_variables.json descriptions per #1179 discussion
This PR introduces a new file
taxcalc/evars.json
that can act as a single source of truth for records variables, similar to howtaxcalc/current_law_policy.json
describes policy parameters.While declaring the variables directly as sets in the records class is simpler (and clearly doesn't take nearly as many lines), metadata about those variables has to be described separately, e.g. descriptions in
e_variable_info.csv
and types represented by holding the names in different variables.By declaring records variable data in a json file, we improve its accessibility:
This PR changes the records class to pull its variable data from
evars.json
. Tests are included that confirm the generated variable lists are the same, but they're removed in the final commit. To run the tests, checkout my branchevars-json-test
.These tests also confirm that the descriptions provided exactly match those from
e_variable_info.csv
. I believe that once this PR is incorporated,stats_summary.py
could be refactored to derive its information fromevars.json
instead and these issues could be closed: #1089, #633.Obviously the amount of provided descriptions is inadequate. We can fix that over time, but I wanted to have them exactly match
e_variable_info
initially.Tax logic is unaffected by this PR, all tests pass on my machine, and I believe coverage should be unaffected.