Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply Policy parameter processing/validation logic to Behavior class #1952

Merged
merged 55 commits into from
Apr 14, 2018

Conversation

hdoupe
Copy link
Collaborator

@hdoupe hdoupe commented Apr 5, 2018

This PR tries to resolve #1951 by applying the Policy parameter validation logic to the Behavior class. Fortunately, the validation code in the Policy "just worked" in the Behavior class. The attributes and methods that were copied over are:

  • implement_reform method was implemented as update_behavior
  • _validate_parameter_names_types
    • I found an interesting case where the user could pass in a boolean value for a parameter when the JSON document specified boolean_type and int_type as false. It turns out isinstance(True, (int, float)) is True. I'm not sure if this was desired behavior or not
    • The CPI logic in this method doesn't apply to the Behavior class. So, it could be removed.
  • _validate_parameter_values
  • current_law_version as baseline_version
  • reform_errors as behavior_errors
  • reform_warnings as behavior_warnings
  • _ignore_errors

The behaviors.json file had to be updated so that each parameter had the range and out_of_range messages and action. Please double check my work. I'm not very familiar with these parameters. Also, the behaviors.json document indicates that _BE_subinc_wrt_earnings is a boolean parameter, but the current validation logic seems to indicate that it could be a float or boolean or integer:

elif param == '_BE_subinc_wrt_earnings':
    if val < 0 or val > 1:
    raise ValueError(msg.format(param, nob, cyr, val))

The tbi.reform_warnings_errors function was updated to validate the behavior parameters in the user_mods dictionary. This function now returns a dictionary that looks like:

{
    'policy': {'warnings': '...', 'errors': '...'}, 
    'behavior': {'warnings': '...', 'errors': '...'}
}

I'll make the changes for this part on the PolicyBrain side.

I have a few more questions that I will ask in code comments.

@martinholmer I'm interested in your thoughts on this PR. If you agree with this approach, what do you think about refactoring the _validate_parameter_names_types and _validate_parameter_values methods into the ParametersBase class? If we do that, perhaps, it wouldn't be too hard to apply it to the other classes such as GrowFactors.

@hdoupe hdoupe changed the title Apply Policy parameter processing/validation logic to Behavior class [WIP] Apply Policy parameter processing/validation logic to Behavior class Apr 5, 2018
@hdoupe
Copy link
Collaborator Author

hdoupe commented Apr 5, 2018

I forgot to start on a clean branch. Thus, the first commit "Adds ability to use operator when validating w.r.t. other parameters" is an experiment I did a couple weeks ago and is unrelated to this PR.

"range": {"min": false, "max": true},
"out_of_range_minmsg": "",
"out_of_range_maxmsg": "",
"out_of_range_action": "stop"
},
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this parameter be boolean?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which parameter are you talking about?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. I'm referring to _BE_subinc_wrt_earnings.

last_reform_year = max(reform_years)
if last_reform_year > self.end_year:
msg = 'ERROR: {} YEAR reform provision in YEAR > end_year={}'
raise ValueError(msg.format(last_reform_year, self.end_year))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the year validation logic may be overkill for the behavior class. However, I thought it would be interesting to show how much of the Policy code could be copied over.

###################
if not self._ignore_errors and self.behavior_errors:
raise ValueError(self.behavior_errors)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Policy doesn't check whether there are error messages after doing value validation in _validate_parameter_values. Why doesn't Policy do this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking is the responsibility of the calling script.
See, for example, the basic recipe in the Cookbook.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Thanks.

'ERROR: ' + msg.format(year, name) + '\n'
)
########################

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be used by the Behavior class, but again, I thought it would be interesting to show how much code from the Policy class could simply be copied over.

'ERROR: ' +
msg.format(year, pname, pvalue[idx]) +
'\n'
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case:

# test_behavior.py
def test_validate_param_names_types_errors():
...
+    behv1 = Behavior()
+    specs1 = {2019: {'_BE_inc': [True]}} # _BE_inc should be a float (int_type = boolean_type = False)
+    with pytest.raises(ValueError):
+        behv1.update_behavior(specs1) 

did not throw a ValueError before the second part of the conditional statement was added. See policy.py for original conditional statement.

@martinholmer martinholmer changed the title [WIP] Apply Policy parameter processing/validation logic to Behavior class Apply Policy parameter processing/validation logic to Behavior class Apr 6, 2018
@martinholmer
Copy link
Collaborator

@hdoupe said:

I found an interesting case where the user could pass in a boolean value for a parameter when the JSON document specified boolean_type and int_type as false. It turns out isinstance(True, (int, float)) is True. I'm not sure if this was desired behavior or not.

Thanks for the bug report, @hdoupe. Can you provide detail in a new issue?

@hdoupe
Copy link
Collaborator Author

hdoupe commented Apr 13, 2018

All tests are passing except for the test coverage checks. There are three spots that are no longer hit by the test suite. It looks like there is not a path to them now that the _dict keywords have been removed from the Behavior/Policy constructors. I'll leave comments on my reasoning in the changes in commit 5ee4353.

@@ -94,9 +94,6 @@ def set_default_vals(self, known_years=999999):
"""
if hasattr(self, '_vals'):
for name, data in self._vals.items():
if not isinstance(name, six.string_types):
msg = 'parameter name {} is not a string'
raise ValueError(msg.format(name))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paths to this section of code:

Policy.__init__ --> ParametersBase.initialize --> Parameters.set_default_vals
Behavior.__init__ --> ParametersBase.initialize --> Parameters.set_default_vals

In both of these cases, the values are read directly from the defaults specified in current_law_policy.json and behavior.json. Thus, it is impossible to have an incorrect name unless the name specified in the defaults files is a number or something like that. A test could potentially be added for this possibility.

Policy.implement_reform -->

        self._validate_parameter_names_types(reform)
        ....
        # optionally apply cpi_offset to inflation_rates and re-initialize
        if Policy._cpi_offset_in_reform(reform):
            known_years = self._apply_reform_cpi_offset(reform)
            self.set_default_vals(known_years=known_years)

--> ParametersBase.set_default_vals

However, Policy._validate_parameter_names_types was called directly before set_default_vals. Thus, we can be certain that the names are strings and valid otherwise.

msg = 'parameter name {} not in parameter values dictionary'
raise ValueError(msg.format(name))
vals_indexed = self._vals[name].get('cpi_inflated', False)
integer_values = self._vals[name].get('integer_value')
name_plus_cpi = name + '_cpi'
Copy link
Collaborator Author

@hdoupe hdoupe Apr 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paths to this section of code:

Policy.implement_reform --> ParametersBase._update
Behavior.update_behavior --> ParametersBase._update

These are both called after calling the Policy and Behavior classes' respective _validate_param_names_types methods. Thus, we can be certain that the name is in self._vals. Also, I think an equivalent condition pname not in self._vals is already in Policy._validate_param_names_typeslines 464-468 and was added to Behavior._validate_param_names_types.

@@ -387,9 +380,6 @@ def _update(self, year_mods):
for name in unused_names:
used_names.add(name)
pname = name[:-4] # root parameter name
if pname not in self._vals:
msg = 'root parameter name {} not in values dictionary'
raise ValueError(msg.format(pname))
pindexed = year_mods[year][name]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paths to this section of code:

Policy.implement_reform --> ParametersBase._update

Policy._validate_param_names_values is called before ParametersBase._update. Policy._validate_param_names_values has a similar check:

pname = name[:-4]  # root parameter name
if pname not in param_names:
    msg = '{} {} unknown parameter name'
    self.reform_errors += (
        'ERROR: ' + msg.format(year, name) + '\n'
)

@hdoupe
Copy link
Collaborator Author

hdoupe commented Apr 13, 2018

I said:

All tests are passing except for the test coverage checks. There are three spots that are no longer hit by the test suite. It looks like there is not a path to them now that the _dict keywords have been removed from the Behavior/Policy constructors. I'll leave comments on my reasoning in the changes in commit 5ee4353.

After further consideration, it looks like these lines of code could still be hit by GrowFactors or GrowDiff since they have a _dict argument in their constructors and do not have any validation code.

@martinholmer what do you think about this?

@martinholmer
Copy link
Collaborator

@hdoupe said:

In both of these cases, the values are read directly from the defaults specified in current_law_policy.json and behavior.json. Thus, it is impossible to have an incorrect name unless the name specified in the defaults files is a number or something like that.

I agree. Now that the parameters are being input only by reading files, it seems as there is no need for this kind of run-time checking.

However, we do need to be sure that the unit tests include checks of the content of the JSON files, so that a developer does not mistakenly specify invalid data in the JSON files. Do we now have that kind of testing of the content of the current_law_policy.json and behavior.json files? Or do we need to add some testing?

@martinholmer
Copy link
Collaborator

@hdoupe said:

After further consideration, it looks like these lines of code [in the ParametersBase class] could still be hit by Growfactors or Growdiff since they have a _dict argument in their constructors and do not have any validation code.

Actually, I think the constructor for the Growfactors class does not have a dictionary argument.
But the Growdiff class does have one and does very much need one.

Does it make sense to proceed this way?
Go ahead with deleting the ParametersBase code.
This will have no effect on the Growfactors class.
And soon I'll add whatever validation checking is needed by the Growdiff class to the Growdiff class.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Apr 13, 2018

@martinholmer asked:

However, we do need to be sure that the unit tests include checks of the content of the JSON files, so that a developer does not mistakenly specify invalid data in the JSON files. Do we now have that kind of testing of the content of the current_law_policy.json and behavior.json files? Or do we need to add some testing?

I do not see any testing that does this. So we should probably add it. I can go ahead and add this. We just want to read in the defaults files and make sure that the keys are all strings, correct?

@hdoupe
Copy link
Collaborator Author

hdoupe commented Apr 13, 2018

@martinholmer said:

Actually, I think the constructor for the Growfactors class does not have a dictionary argument.

Yep, this is correct. Thanks for pointing this out.

Does it make sense to proceed this way?
Go ahead with deleting the ParametersBase code.
This will have no effect on the Growfactors class.
And soon I'll add whatever validation checking is needed by the Growdiff class to the Growdiff class.

Sounds sensible to me.

@martinholmer
Copy link
Collaborator

@hdoupe proposed this change to the _update method in parameters.py as part of #1952:

         for name in unused_names:
             used_names.add(name)
             pname = name[:-4]  # root parameter name
-            if pname not in self._vals:
-                msg = 'root parameter name {} not in values dictionary'
-                raise ValueError(msg.format(pname))

We don't want to drop these three lines, do we?

If a JSON reform file includes a parameter named bogus_cpi, we want to catch that error at runtime.

Unless I'm confused, we want to keep these three lines and add a test that raises a ValueError when the reform includes something like bogus_cpi (because bogus is not a valid policy parameter name).

@martinholmer
Copy link
Collaborator

@hdoupe said:

I do not see any testing that does this. So we should probably add it. I can go ahead and add this. We just want to read in the defaults files and make sure that the keys are all strings, correct?

You could easily add the test that each key is a string to this test_parameters.py test:

@pytest.mark.parametrize("fname",
                         [("behavior.json"),
                          ("consumption.json"),
                          ("current_law_policy.json"),
                          ("growdiff.json")])
def test_json_file_contents(tests_path, fname):
    """
    Check contents of JSON parameter files.
    """

@hdoupe
Copy link
Collaborator Author

hdoupe commented Apr 13, 2018

@martinholmer said:

We don't want to drop these three lines, do we?

If a JSON reform file includes a parameter named bogus_cpi, we want to catch that error at runtime.

Unless I'm confused, we want to keep these three lines and add a test that raises a ValueError when the reform includes something like bogus_cpi (because bogus is not a valid policy parameter name).

So, you are saying that a JSON reform file with an incorrect CPI parameter name needs this validation? Something like:

{
    "policy": {
        "_Bogus_cpi": {
            2017: true
        }
    }
}

should trigger those lines of code?

@martinholmer
Copy link
Collaborator

martinholmer commented Apr 13, 2018

@hdoupe asked:

So, you are saying that a JSON reform file with an incorrect CPI parameter name needs this validation?
Something like:

{"policy": {"_bogus_cpi": {"2017": true}}}

should trigger those lines of code?

Right.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Apr 13, 2018

@martinholmer I tried the following test:

bogus = """
{
    "policy": {
        "_Bogus_cpi": {
            "2017": true
        }
    }
}"""
bogus = Calculator.read_json_param_objects(bogus, None)
pol7 = Policy()
pol7.implement_reform(bogus['policy'])

And, I get a ValueError that is raised in Policy.implement_reform regardless of whether the lines in parameters.py are commented out or not. I think Policy._validate_param_names_types is called before Parameters._update. Policy._validate_param_names_types does a similar check to the lines you mentioned: https://github.com/open-source-economics/Tax-Calculator/blob/master/taxcalc/policy.py#L445-L450

However, I may be missing something.

@martinholmer
Copy link
Collaborator

@hdoupe said:

I get a ValueError that is raised in Policy.implement_reform regardless of whether the lines in parameters.py are commented out or not. I think Policy._validate_param_names_types is called before Parameters._update. Policy._validate_param_names_types does a similar check to the lines you mentioned: https://github.com/open-source-economics/Tax-Calculator/blob/master/taxcalc/policy.py#L445-L450

However, I may be missing something.

OK, I see. You were correct from the beginning. The Policy class implement_reform method catches this error before the ParametersBase _update method is called. Sorry, about the confusion. The way you had it before seems correct.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Apr 13, 2018

Ok, thanks for thoroughly reviewing that change. This is a complex process and it's easy for either one (or both) of us to miss something.

@martinholmer
Copy link
Collaborator

@hdoupe, pull request #1952 looks great! Thanks for all the work involved in adding validity checking for revisions in Behavior class parameters. And also, thanks for finding several needed improvements in the validity checking for Policy parameters.

@martinholmer
Copy link
Collaborator

@hdoupe, I notice one of the improvements I've made based on your efforts is causing conflicts in the parameters.py file. After you update your branch to be like the newest version of the master branch, those conflict should go away. At that stage, all the tests should pass and the code coverage report would say there are three untested statements in the behavior.py file. Your can just skip those lines in the coverage test because there are currently no parameters in behavior.json that are integer parameters. So, the lines would look something like this:

+                        elif int_param_type:
+                            if not pval_is_int:  # pragma: no cover
+                                msg = '{} {} value {} is not integer'  # pragma: no cover
+                                self.parameter_errors += (
+                                    'ERROR: ' +
+                                    msg.format(year, pname, pval) +
+                                    '\n'
+                                )  # pragma: no cover
+                        else:  # param is float type

Then I'll be happy to merge #1952. Thanks again for all the great work.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Apr 14, 2018

@martinholmer thanks! And thanks for thoroughly reviewing these changes and walking me through this part of Tax-Calculator. I enjoyed learning more about the parameter processing logic in Tax-Calculator. I think I have a much better grasp about how it works now. Also, a second set of eyes to review complex code such as this makes a big difference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Tax-Calculator behavior parameters validation
3 participants