Update data #2599

andersonfrailey · 2021-06-17T22:07:06Z

This PR updates all of the data in Tax-Calc to be up to date with TaxData release 0.3.0. It also extends the projections out to 2031. I had to make a few changes to the tests to get (almost) all of them passing:

The tolerance for closeness of the full and subsample PUF and CPS files had to be increased
The growth factor (AINTS) for e00300 is now > 1 in 2015 (a result of updating the SOI estimates) so the relative value test had to be flipped
Pretty much all of the reform files had new results, though the differences didn't look too large
The PUF now has a few new variables. See taxdata for more details there

There are still a couple of failing tests that I don't know how to fix. The first is in test_benefits.py:

    @pytest.mark.benefits
    def test_benefits(tests_path, cps_fullsample):
        """
        Test CPS benefits.
        """
        # pylint: disable=too-many-locals
        benefit_names = ['ssi', 'mcare', 'mcaid', 'snap', 'wic',
                         'tanf', 'vet', 'housing']
        # write benefits_actual.csv file
        recs = Records.cps_constructor(data=cps_fullsample)
        start_year = recs.current_year
        calc = Calculator(policy=Policy(), records=recs, verbose=False)
        assert calc.current_year == start_year
        year_list = list()
        bname_list = list()
        benamt_list = list()
        bencnt_list = list()
        benavg_list = list()
        for year in range(start_year, Policy.LAST_BUDGET_YEAR + 1):
            calc.advance_to_year(year)
            size = calc.array('XTOT')
            wght = calc.array('s006')
            # compute benefit aggregate amounts and head counts and average benefit
            # (head counts include all members of filing unit receiving a benefit,
            #  which means benavg is f.unit benefit amount divided by f.unit size)
            for bname in benefit_names:
                ben = calc.array('{}_ben'.format(bname))
                benamt = round((ben * wght).sum() * 1e-9, 3)
                bencnt = round((size[ben > 0] * wght[ben > 0]).sum() * 1e-6, 3)
>               benavg = round(benamt / bencnt, 1)
E               FloatingPointError: invalid value encountered in double_scalars
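As a general illustration of this error class (not a diagnosis of this specific failure), NumPy raises FloatingPointError instead of returning NaN when its floating-point error handling is set to "raise" and an operation such as 0.0 / 0.0 is invalid. The sketch below assumes that mode is active and uses a hypothetical helper, safe_average, that is not part of Tax-Calc:

```python
import numpy as np


def safe_average(total, count):
    """Hypothetical helper: return total / count rounded to 1 place,
    or NaN when the count is zero or either input is NaN."""
    if count == 0 or np.isnan(total) or np.isnan(count):
        return float("nan")
    return round(total / count, 1)


with np.errstate(all="raise"):
    # An unguarded 0.0 / 0.0 on NumPy scalars raises in this mode.
    try:
        np.float64(0.0) / np.float64(0.0)
        raised = False
    except FloatingPointError:
        raised = True

    # The guarded version returns NaN instead of raising.
    guarded = safe_average(np.float64(0.0), np.float64(0.0))

print(raised, guarded)
```

A guard like this only hides the symptom. If the inputs are unexpectedly NaN or zero, the underlying data still needs checking.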

And the second is in testpolicy.py:

    def test_apply_cpi_offset(self):
        """
        Test applying the parameter_indexing_CPI_offset parameter
        without any other parameters.
        """
        pol1 = Policy()
        pol1.implement_reform(
            {"parameter_indexing_CPI_offset": {2021: -0.001}}
        )
    
        pol2 = Policy()
        pol2.adjust(
            {"parameter_indexing_CPI_offset": [
                {"year": 2021, "value": -0.001}
            ]}
        )
    
        cmp_policy_objs(pol1, pol2)
    
        pol0 = Policy()
        pol0.implement_reform({"parameter_indexing_CPI_offset": {2021: 0}})
    
        init_rates = pol0.inflation_rates()
        new_rates = pol2.inflation_rates()
    
        start_ix = 2021 - pol2.start_year
    
        exp_rates = copy.deepcopy(new_rates)
        exp_rates[start_ix:] -= pol2._parameter_indexing_CPI_offset[start_ix:]
        np.testing.assert_allclose(init_rates, exp_rates)
    
        # make sure values prior to 2021 were not affected.
        cmp_policy_objs(pol0, pol2, year_range=range(pol2.start_year, 2021))
    
        pol2.set_state(year=[2022, 2023])
>       np.testing.assert_equal(
            (pol2.EITC_c[1] / pol2.EITC_c[0] - 1).round(4),
            pol0.inflation_rates(year=2022) + (-0.001),
        )
E       AssertionError: 
E       Arrays are not equal
E       
E       Mismatched elements: 4 / 4 (100%)
E       Max absolute difference: 3.46944695e-18
E       Max relative difference: 1.75224594e-16
E        x: array([0.0198, 0.0198, 0.0198, 0.0198])
E        y: array(0.0198)

I think this last one is just an issue with array sizes.

Let me know if there are any questions or tips on fixing the failing tests.

cc @MattHJensen @jdebacker

jdebacker · 2021-06-17T22:58:36Z

@andersonfrailey Thanks for this PR!

I haven't run into these errors before, but they both look related to the data, so I'd start looking through the objects in these functions, to identify where things take on unexpected shapes/values.

andersonfrailey · 2021-06-18T15:44:50Z

Figured out the test failures. The benefits test failed because the benefits growth factors were missing from growfactors.csv for 2031, so all the benefit variables were replaced with NaN values in that year. I've opened a PR in taxdata to fix this issue, so technically this PR will bring us to taxdata version 0.3.1 once I push the taxdata bug fix.
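A gap like this could be caught before running the tests by scanning the growth factor table for blank cells. The following is a minimal sketch, assuming a CSV with a YEAR column followed by one column per growth factor. The function name and the FACTOR_A / FACTOR_B columns are hypothetical, not the real growfactors.csv layout:

```python
import csv
import io


def find_missing_growfactors(csv_text):
    """Return (year, column) pairs whose growth factor cell is blank."""
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["YEAR"])
        for column, value in row.items():
            # DictReader yields None for cells missing from a short row.
            if column != "YEAR" and (value is None or value.strip() == ""):
                missing.append((year, column))
    return missing


# Hypothetical sample: FACTOR_B has no value for the final year.
sample = """YEAR,FACTOR_A,FACTOR_B
2030,1.021,1.034
2031,1.020,
"""

missing_cells = find_missing_growfactors(sample)
print(missing_cells)
```

For the sample above, the only pair reported is 2031 with FACTOR_B.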

The policy test failures were a rounding issue. Here's the test currently:

    np.testing.assert_equal(
        (pol2.EITC_c[1] / pol2.EITC_c[0] - 1).round(4),
        pol0.inflation_rates(year=2022) + (-0.001),
    )

And the actual values for each element in that comparison:

    In [7]: pol0.inflation_rates(year=2022) + (-0.001)
    Out[7]: 0.019799999999999998
    In [9]: (pol2.EITC_c[1] / pol2.EITC_c[0] - 1).round(4)
    Out[9]: array([0.0198, 0.0198, 0.0198, 0.0198])

To fix the bug, I just need to round pol0.inflation_rates(year=2022) + (-0.001) to 4 decimal places, like we already round (pol2.EITC_c[1] / pol2.EITC_c[0] - 1). This is the test with rounding:

    np.testing.assert_equal(
        (pol2.EITC_c[1] / pol2.EITC_c[0] - 1).round(4),
        (pol0.inflation_rates(year=2022) + (-0.001)).round(4),
    )
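The mismatch can be reproduced without Tax-Calc, using the two values printed in the session above. In this minimal sketch the variable names lhs and rhs are illustrative. It also shows that the shape difference is not the problem, since the scalar is compared against every element of the array, and that a tolerance-based comparison with np.testing.assert_allclose is a possible alternative to rounding both sides:

```python
import numpy as np

# Values copied from the session above.
lhs = np.array([0.0198, 0.0198, 0.0198, 0.0198])  # already rounded to 4 places
rhs = 0.019799999999999998                        # unrounded sum

# Exact comparison fails: rhs is slightly below 0.0198.
try:
    np.testing.assert_equal(lhs, rhs)
    exact_match = True
except AssertionError:
    exact_match = False

# Rounding the scalar to 4 places makes the exact comparison pass.
np.testing.assert_equal(lhs, np.round(rhs, 4))

# Alternative: compare within a small relative tolerance instead of rounding.
np.testing.assert_allclose(lhs, rhs, rtol=1e-12)

print(exact_match)
```

Rounding both sides keeps the test's original intent of matching to 4 decimal places. The tolerance-based check is shown only as another option.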

codecov · 2021-06-18T15:58:58Z

Codecov Report

Merging #2599 (3e38302) into master (7d96420) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #2599   +/-   ##
=======================================
  Coverage   98.46%   98.46%           
=======================================
  Files          14       14           
  Lines        2611     2611           
=======================================
  Hits         2571     2571           
  Misses         40       40

Flag	Coverage Δ
unittests	`98.46% <100.00%> (ø)`


Impacted Files	Coverage Δ
taxcalc/growdiff.py	`100.00% <100.00%> (ø)`
taxcalc/policy.py	`100.00% <100.00%> (ø)`

jdebacker · 2021-06-24T13:15:24Z

@andersonfrailey I wanted to check in on the status of this PR -- we are waiting on this to be updated to TaxData 0.3.1, correct?

andersonfrailey · 2021-06-24T13:41:35Z

@jdebacker, the last commit updated it. It's ready to go now

jdebacker · 2021-06-24T15:06:07Z

@andersonfrailey What is the reason for making CPS-specific variables such as line and sequence numbers available for taxdata_puf?

andersonfrailey · 2021-06-24T21:08:39Z

@jdebacker, adding those gives users the option to find the individual CPS records that have been matched to the PUF in the raw CPS file. To date, most users have wanted the identifiers to link tax units from taxdata_cps to the raw CPS, but with the TaxData refactor, adding identifiers to the PUF was easy, so I figured it'd be better to have those available in the PUF as well.

jdebacker · 2021-06-25T00:30:17Z

@andersonfrailey Thanks for the previous reply. I now understand, and I think it makes sense to add those variables to the PUF. To be sure -- is the TaxData documentation clear in noting that these are statistically matched records and not direct matches of the same individual in the two datasets?

For the other variables now available in the PUF, such as housing_ben, tanf_ben, ssi_ben, vet_ben, wic_ben --- do you have results showing if aggregate amounts (using the PUF weights) match administrative totals?

andersonfrailey · 2021-06-25T21:24:50Z

> To be sure -- is the TaxData documentation clear in noting that these are statistically matched records and not direct matches of the same individual in the two datasets?

Yes, we have documentation that explains that these are two different datasets created in two different ways.

> For the other variables now available in the PUF, such as housing_ben, tanf_ben, ssi_ben, vet_ben, wic_ben --- do you have results showing if aggregate amounts (using the PUF weights) match administrative totals?

I don't offhand, but I'll get those together and post them.

andersonfrailey · 2021-07-06T21:45:53Z

Latest commit removes benefit variables from the PUF.

jdebacker · 2021-07-13T18:27:56Z

@andersonfrailey Thanks for the updates to this PR. It looks good to me and I'll plan to merge tomorrow after a final review unless there are suggestions otherwise.

cc @MattHJensen @Peter-Metz

jdebacker · 2021-07-14T17:26:47Z

Thank you for the contribution, @andersonfrailey. Merging.

Update data

552a4da

andersonfrailey added data help wanted in progress labels Jun 17, 2021

andersonfrailey added 2 commits June 18, 2021 11:12

fix growth factors bug

20e961f

fix policy test failure

b37eb8f

remove PUF benefits

3e38302

jdebacker mentioned this pull request Jul 14, 2021

Release notes for next taxcalc release #2597

Closed

MattHJensen merged commit 72d8e91 into PSLmodels:master Jul 17, 2021

jdebacker mentioned this pull request Dec 6, 2022

CBO baseline update #2662

Merged
