Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to type coercion #384

Merged
merged 9 commits into from
Mar 21, 2017

Conversation

erayd
Copy link
Contributor

@erayd erayd commented Mar 13, 2017

What

  • Don't coerce already-valid types
  • Add CHECK_MODE_EARLY_COERCE to enable previous coercion behavior
  • Don't test further types if we've already validated one successfully
  • Fix reversed arguments from centralize errors #364
  • Implement every remaining coercion case from ajv's matrix
  • Rewrite coercion tests to improve readability and simplify adding future tests.

Why

See discussion in #379 regarding coercion of already-valid types and CHECK_MODE_EARLY_COERCE. Other items are performance / bugfix, or are to improve testing.

@erayd erayd changed the title [BUGFIX] Don't coerce already-valid values (fixes #379 pt. 2) [BUGFIX] Don't coerce already-valid types (fixes #379 pt. 2) Mar 13, 2017
@erayd erayd changed the title [BUGFIX] Don't coerce already-valid types (fixes #379 pt. 2) Don't coerce already-valid types (implements #379 pt. 2) Mar 13, 2017
@@ -185,6 +185,10 @@ protected function validateType(&$value, $type, $coerce = false)
}

if ('array' === $type) {
if ($coerce) {
$value = $this->toArray($value);
Copy link
Collaborator

@shmax shmax Mar 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I notice that ajv only does this conversion if coerceTypes is set to "array". Should this be controlled by another flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. I'm happy either way - would you like a flag?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly don't know. We can try without for a while, and see if the rationale for having one ever emerges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion - let's do that then. And if someone asks for it, then we can add it at that point.

I don't supposed you know whether @epoberezkin has stated his rationale for doing it that way in ajv?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I think this is the issue that spawned the feature: ajv-validator/ajv#158

They discuss it a bit, then come up with the coerceTypes:'array' idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... will have to think on this a bit. At a minimum I should write some tests to check that post-coercion validation is happening properly.

@@ -255,10 +269,27 @@ protected function toNumber($value)

protected function toInteger($value)
{
if (is_numeric($value) && (int) $value == $value) {
$numberValue = $this->toNumber($value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little risky, isn't it?

echo toInteger("5f"); // 5

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, good catch - this is why we have code reviews! Thanks :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be sorted now.

return (int) $value;
}
if ($this->getTypeCheck()->isArray($value) && count($value) === 1) {
reset($value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset returns the first element in the array, so you can skip a step and pass its results directly on line 272

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - thanks :-).

protected function toInteger($value)
{
if (is_numeric($value) && (int) $value == $value) {
return (int) $value; // cast to number
$numberValue = $this->toNumber($value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the call to toNumber get you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the other type coersion cases. It means no copy pasta, because other than integer-ness, the two methods are identical.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there anything actually functionally different? Because now we're calling is_numeric twice whereas before it was only called once

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - the second is_numeric call catches cases that fall through toNumber without being successfully coerced.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what I mean, though--if you put the code back the way it was then there is only one call to is_numeric. So I guess I'm still not clear on what is functionally different about the new code. Are additional values now being coerced to integer (eg. boolean) with this code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The new code also coerces from bool, null, and array, via the call to toNumber.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically the logic is to apply all the coercion rules for number, and then see if the result can be further coerced to an int. If it is, then return the cast int - if not, return the original value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okie-doke

*
* @param mixed $value
*
* @return array
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the return type should be array|mixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the version of PHPDoc in this project OK with that? I've had issues with types like that in the past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If it's not OK, it should be mixed)

Copy link
Contributor Author

@erayd erayd Mar 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, apparently this project doesn't use PHPDoc. I'll change to array|mixed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just mixed is fine, too. I suggested array|mixed to emphasize that array is primarily what we're interested in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your original suggestion better. If we're not using PHPdoc, then there's no reason to avoid it IMO. If such a reason crops up later, it's a pretty quick / easy change to downgrade the return type to mixed.

@erayd
Copy link
Contributor Author

erayd commented Mar 13, 2017

@shmax I think this PR now contains everything we intended it to contain. Is there anything you'd like me to add / change / remove / hide-in-the-corner-and-cry-from-shame-about?

@shmax
Copy link
Collaborator

shmax commented Mar 13, 2017

Wow, very impressive, as always! I'll take another look tomorrow when I'm fresh but for now I say LGTM

@erayd
Copy link
Contributor Author

erayd commented Mar 13, 2017

Thanks :-).

@erayd
Copy link
Contributor Author

erayd commented Mar 13, 2017

Hmm, just thought of one other thing - I need to ensure that the double-coercion for array cases is covered.

@erayd
Copy link
Contributor Author

erayd commented Mar 13, 2017

Double-coercion tests done.

@erayd erayd changed the title Don't coerce already-valid types (implements #379 pt. 2) Improvements to type coercion Mar 13, 2017
@erayd
Copy link
Contributor Author

erayd commented Mar 14, 2017

@shmax How's this?

README.md Outdated
`CHECK_MODE_EARLY_COERCE` has no effect unless used in combination with `CHECK_MODE_COERCE_TYPES`. If
enabled, type coercion will occur as soon as possible, even if the value may already be of another valid
type.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you maybe expand on this a bit to make it more clear? I know what you're talking about, but I'm not sure that someone else coming to the project for the first time would.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly.

// toType
'string' => array(
// fromType fromValue toValue valid Test Number
array('string', '"string"', 'string', true), // #0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I love the new grid, much easier to read!

Copy link
Collaborator

@shmax shmax Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually find it confusing when a code sample uses values that mimic or mirror the name of the type they are meant to be demonstrating. Instead of 'string', can we use something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure :-).

if ($valid) {
$this->assertTrue(
$validator->isValid(),
'Validation failed: ' . json_encode($validator->getErrors(), \JSON_PRETTY_PRINT)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

*/
public function testInvalidCoerceCasesUsingAssoc($input, $schema, $errors = array())
/** @dataProvider dataCoerceCases **/
public function testCoerceCases($schema, $data, $startType, $endType, $endValue, $valid, $early = false, $assoc = false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the 7th argument was just something like $flags? Then instead of passing in true you could pass in CHECK_MODE_EARLY_COERCE, which would be a little easier to read, and you don't have to keep adding more arguments to this method signature as more flags are added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea - will do.

array('boolean', 'true', true, true), // #17
array('NULL', 'null', false, true), // #18
array('array', '["true"]', true, true), // #19
array('object', '{"a":"b"}', null, false), // #20
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - well spotted!

array('integer', '45', '45', true), // #1
array('boolean', 'true', 'true', true), // #2
array('boolean', 'false', 'false', true), // #3
array('NULL', 'null', '', true), // #4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with the counting numbers in the comments? They look very nice, but it seems like the kind of thing that would be a pain to maintain going forward. What are they for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are the test number, starting from zero. PHPUnit reports those numbers when anything fails - it makes tracking down a failing test case much faster / easier when you don't have to manually count them all every time, particularly when there are this many tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's what I was originally thinking about when I suggested breaking things up into smaller tests. testEarlyCoerce, testArray, testNoCoerceNeeded, etc. Maybe not for the vanilla stuff, but at least the complex ones. This is okay, too, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you prefer them as separate methods? I like them better all in one place like this, because it's easier to compare them - but I can certainly break to one method per case if you'd rather.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you. I can live with the numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'd like to keep them the way they are please - I find it much easier to debug this way :-).

$this->assertTrue(gettype($value->negativeInteger) == 'integer');
$this->assertTrue(gettype($value->boolean) == 'boolean');
// #38 check post-coercion validation (to array)
$tests[] = array(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to break out these more complicated ones like this, but since even the complex ones have only a single property, can they be setup before you iterate over them on 83?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can. The reason I did it this way is because complex tests are more likely to be added than simple ones, and adding things to the beginning or midway through requires manually renumbering all the test cases - whereas adding something to the end is easy, and doesn't require renumbering anything.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't suggesting adding them to the big grid, only adding them after the grid but before the loop. That way you don't have to do the brain work of trying to figure out what the json string will look like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem - done.

Copy link
Collaborator

@shmax shmax Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I get it--you can't handle the complex tests the same way you do the others because they have arrays of types, so you can't slot them in the same way as the others.

Copy link
Collaborator

@shmax shmax Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or at least a few of them do. It seems like some of them could still be handled the same way as the ones in the big grid, eg:

before

// #39 check post-coercion validation (from array)
        $tests[] = array(
            '{"properties":{"propertyOne":{"type":"number"}}}',
            '{"propertyOne":["ABC"]}',
            'array', null, null, false
        );

after

// #39 check post-coercion validation (from array)
        $tests['number'][] = array(
            'array',
            ["ABC"]',
            null, null, false
        );

I'm sure I have all that in the wrong order, but hopefully you know what I mean

Copy link
Contributor Author

@erayd erayd Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - have moved test 39 to the grid. I don't think any of the others can move though - if you spot any, please let me know, but I think they all have reasons for being where they are.

@shmax
Copy link
Collaborator

shmax commented Mar 14, 2017

LGTM Fantastic work! Take a victory lap

@erayd
Copy link
Contributor Author

erayd commented Mar 14, 2017

Thanks :-). Really appreciate the work you've done reviewing all this too, and the various suggestions you've made - the result is better because of that.

@erayd
Copy link
Contributor Author

erayd commented Mar 17, 2017

Rebased.

@bighappyface
Copy link
Collaborator

@erayd conflict?

@erayd
Copy link
Contributor Author

erayd commented Mar 17, 2017

@bighappyface I think we overlapped, and I rebased before you merged the conflict. I have rebased it again.

@erayd
Copy link
Contributor Author

erayd commented Mar 21, 2017

Rebased.

@bighappyface
Copy link
Collaborator

Once more, please :)

 * Add all remaining coercion cases from ajv matrix
 * Rewrite the coercion tests to tidy things up a bit
If set, falls back to the old behavior of coercing to the first
compatible type, regardless of whether another already-valid type might
be available.
 * Fix whitespace
 * Turn $early into $extraFlags
 * Change "string" to "ABC" in string test
 * Update README.md description of CHECK_MODE_EARLY_COERCE
@erayd
Copy link
Contributor Author

erayd commented Mar 21, 2017

Done.

Copy link
Collaborator

@bighappyface bighappyface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@bighappyface bighappyface merged commit f3bfb47 into jsonrainbow:6.0.0-dev Mar 21, 2017
@erayd erayd mentioned this pull request Mar 21, 2017
@erayd erayd deleted the bugfix-379-coercion branch March 22, 2017 21:36
erayd added a commit to erayd/json-schema that referenced this pull request Mar 22, 2017
* Improve performance - don't loop over everything if already valid

* Don't coerce already-valid types (fixes jsonrainbow#379)

* Add remaining coercion cases & rewrite tests

 * Add all remaining coercion cases from ajv matrix
 * Rewrite the coercion tests to tidy things up a bit

* Add CHECK_MODE_EARLY_COERCE

If set, falls back to the old behavior of coercing to the first
compatible type, regardless of whether another already-valid type might
be available.

* Add multiple-type test that requires coercion

* \JSON_PRETTY_PRINT doesn't exist in PHP-5.3, so work around this

* Various PR cleanup stuff

 * Fix whitespace
 * Turn $early into $extraFlags
 * Change "string" to "ABC" in string test
 * Update README.md description of CHECK_MODE_EARLY_COERCE

* Move loop after complex tests definition

* Move test jsonrainbow#39 to grid jsonrainbow#15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants