-
-
Notifications
You must be signed in to change notification settings - Fork 506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON Schema Validation #2238
JSON Schema Validation #2238
Conversation
Just wanted to say thanks for a really well-thought PR! It's great you have already included documentation. Looking forward to leaving the "draft" state :) |
I let myself go through the review initially, hope you don't mind :) |
@malarzm Thanks ! I just wanted to increase the coverage before leaving the draft stage, but overall it was almost ready. I don't have much time this week but I will look into your review very soon. Thank you again :) |
Hi, thanks again @malarzm for the draft review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, thank you for taking the time to work on such a feature, I really appreciate the thought you've put into this.
I've left a few comments below from a first reading I did a little while back, but wanted to also discuss how we can make use of this feature. I see two general ways on how users might want their schema definitions:
- Set manually as implemented in this PR
- Automatically determined based upon mapping
The second is definitely more complex and IMO way outside the scope of the implementation. However, I believe that we should keep it in mind when defining how to map the feature. The current mapping might work with an automatic JSON schema generation, for example by omitting the jsonSchema
property of the annotation, or by omitting the validation-json-schema
tag in XML mappings. I don't think there's anything to change, but I wanted to bring it up if only to solicit feedback.
Speaking of annotations, I'm a little bit worried about using a JSON string in annotations. I know the idea is to be able to copy/paste the schema from somewhere else, but this point is moot if people need to quote it. Looking at it, JSON schema is (almost) compatible with the annotations, so that you could do the following:
/**
* @ODM\Document
* @ODM\Validation(
* jsonSchema={
* "required": {"name"},
* "properties": {
* "name": {
* "bsonType": "string",
* "description": "must be a string and is required"
* }
* }
* },
* action=ClassMetadata::SCHEMA_VALIDATION_ACTION_WARN,
* level=ClassMetadata::SCHEMA_VALIDATION_LEVEL_MODERATE,
* )
*/
class JsonSchemaValidated
The only change needed to make the JSON schema valid for an annotation is to change square brackets for arrays to curly brackets, as doctrine/annotations (currently) does not support using square brackets for arrays (this was suggested in doctrine/annotations#122 but closed as 2.0 was thought to make this easier but it never materialised).
However, the advantage of pasting the schema in annotations is moot if we look at PHP 8 attributes (which will eventually replace annotations):
#[ODM\Document()]
#[ODM\Validation(
jsonSchema=[
"required" => ["name"],
"properties" => [
"name" => [
"bsonType" => "string",
"description" => "must be a string and is required"
]
]
],
action=ClassMetadata::SCHEMA_VALIDATION_ACTION_WARN,
level=ClassMetadata::SCHEMA_VALIDATION_LEVEL_MODERATE,
])
class JsonSchemaValidated
In that case, pasting a string may be easier since we can single-quote it (or maybe even use HEREDOC?). Just wanted to check if you've given this some thought to make it a bit more user-friendly.
One concern I did have was that any validation errors may lead to a broken UnitOfWork that could then cause issues. This would be similar to an issue we recently found if we were to implement transaction support, but on second thought this doesn't seem to be an issue any more than it already is: a validation error would be comparable to a duplicate key exception, leading to a situation we already know and are aware of. I'll do another review of this PR and focus on test cases to make sure everything is covered.
@alcaeus Thank you very much for that detailed PR review. I will work on your suggestions as soon as I get some time off during the following days. I agree, we should definitely keep in mind the fact that we could generate that schema from metadata. That's what I was resolved to try and do when I opened #2233. But it was for sure a good suggestion by @malarzm to implement it manually first. About the annotation's specs, this is a tough choice. I actually started to implement this feature the "array way" first (without strings). I chose to fallback to the "string way" in the process of implementing it as I found some drawbacks. I'll try to summarize my thought process with a table (more details below):
The "array way"It would be great to implement it this way. As you said:
; and we wouldn't need to However I found that this dependency was still needed for the In the end the Moreover, replacing square brackets with curly braces for doctrine/annotations (PHP7 and below) is not an easy task that could be achieved with a simple search/replace as square brackets in JSON string values shouldn't be replaced, for instance: {
"regex": "[0-9]+"
} should not be transformed to: {
"regex": "{0-9}+"
} The "string way"It has a major flaw: we need to escape double quotes. However it is as simple as a search/replace so it is acceptable in my opinion. Plus, as you mentioned, we probably won't need to escape in PHP8 anymore, which is great. I haven't got to test the new PHP8 Attributes still tho so I'm not 100% sure that they accept HEREDOC, but I guess they should. $jsonSchema vs. whole validatorFurthermore, but this is probably far-fetched, I was trying at that time to interface the whole
For instance: {
$jsonSchema: { ... },
$or: [
{ phone: { $type: "string" } },
{ email: { $regex: /@mongodb\.com$/ } },
{ status: { $in: [ "Unknown", "Incomplete" ] } }
]
} This could be done in doctrine if we decide to extend the feature it in the future. However those validators are not JSON typed. As I understand it they could be composed of any Javascript type (see the The validation would have to be passed as a string in the annotation (no other choice here). This is when I fell back to the "string way". But the validation content would also need to be converted to PHP with a Javascript object parser of some sort... That's when I understood it wasn't an easy task and I should focus on Anyways, that's how I decided to use strings in the end. Sorry for the long explanation, just wanted to expose my whole thought process. But it could still be reverted, you tell me :) Validation errorsI actually didn't think too much about it as I thought the exception raised by the mongodb driver was enough, like a unique index exception. Feel free to guide me if there is some test I could add to make sure that this is enough. Thank you again for taking the time to review this PR! |
branch rebased on master |
rebased (fixed conflicts) |
@alexandre-abrioux sorry for the delay here. I'll revisit this PR in the coming days. |
@alcaeus Thanks! Just finished rebasing to upstream, I got an error on the |
Taking a short look I wasn't able to reproduce this locally, but I'll take another look when I return to review this. |
@alcaeus rebased the branch off of 2.3.x, and edited the doc:
to
Tests are green again :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, I'm really sorry for the time it has taken me to review this. Second of all, thank you so much for sticking with this and not giving up on me. Third of all, thank you for taking the time to add tests for the update command which was previously untested.
From a functional standpoint this PR is good to go, but I still have some reservations about the mapping. I'm not convinced that the annotation mapping is the best we can do, and I also think that we can make some minor improvements to the XML mapping to make it a little more contained. I'd appreciate your opinion on the suggestions, even if it seems like we're running in circles.
One more note, while looking through the changes I realised that this PR requires people to incorporate the schema for embedded documents into the documents that embed them. While this is not entirely optimal, I believe this is a sane choice for this feature, as it exposes low-level collection options. As a follow-up feature, we could think about a generator that is able to generate validation rules from other metadata (e.g. constraints from symfony/validator) and import validation rules for embedded documents.
Hi @alcaeus, Thank you for taking the time to review this, and for the very thorough and detailed response. I am going to implement the changes, but before I start digging, I have a suggestion regarding your point about the annotations. But I think we overlooked something. After re-reading all your comments, I thought later on: why not do it like this? /**
* @ODM\Document
* @ODM\Validation(
* jsonSchema=JsonSchemaValidated::JSON_SCHEMA,
* action=ClassMetadata::SCHEMA_VALIDATION_ACTION_WARN,
* level=ClassMetadata::SCHEMA_VALIDATION_LEVEL_MODERATE,
* )
*/
class JsonSchemaValidated
{
public const JSON_SCHEMA = <<<'EOT'
{
"required": ["name"],
"properties": {
"name": {
"bsonType": "string",
"description": "must be a string and is required"
}
}
}
EOT;
/** @ODM\Id */
private $id;
/** @ODM\Field(type="string") */
private $name;
} I'm really convinced it's the best of both worlds. We would get the same experience with both the XML driver and the Annotation driver: just copy-paste your schema, and you're done! I just quickly tested it in the test suite and that works. What do you think about this? |
I like it - this could also be modified to include both worlds: if the schema given to the annotation is a string, assume it's JSON and try to decode it. Afterwards, we're always expecting a valid schema. This would allow people to either copy-paste the schema from an external source without any modification at all using HEREDOC, or if they want to create the schema by writing up a PHP array they can do that as well. Feel free to implement only one of these methods, I'd be happy to expanding support in a later PR. |
Hi @alcaeus I'm very sorry for the time it took me to take action after our last conversation, I got caught in other projects. I finally implemented the requested changes:
<document name="TestDocuments\JsonSchemaValidatedDocument">
<json-schema-validation action="warn" level="moderate">
{
"required": ["name"],
"properties": {
"name": {
"bsonType": "string",
"description": "must be a string and is required"
}
}
}
</json-schema-validation>
</document>
/**
* @ODM\Document
* @ODM\Validation(
* jsonSchema=JsonSchemaValidated::JSON_SCHEMA,
* action=ClassMetadata::SCHEMA_VALIDATION_ACTION_WARN,
* level=ClassMetadata::SCHEMA_VALIDATION_LEVEL_MODERATE,
* )
*/
class JsonSchemaValidated
{
public const JSON_SCHEMA = <<<'EOT'
{
"required": ["name"],
"properties": {
"name": {
"bsonType": "string",
"description": "must be a string and is required"
}
}
}
EOT;
/** @ODM\Id */
private $id;
/** @ODM\Field(type="string") */
private $name;
}
I only implemented the JSON schema provided by the the user as a HEREDOC string. As you stated before, we can expand the feature to a schema provided as a PHP array in a later PR. Feel free to review the code one last time if needed! Thank you again for the time spent on this :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @alexandre-abrioux, and thank you for your patience with me while I struggled to find time to review this. I'll release this in the coming days. I really appreciate the time you took to build this great feature!
@alcaeus pinged me to take a look at this the other day and I just got through reading the PR from top to bottom. The first thing that stood out to me was that this was focused only on I see this was previously alluded to in #2238 (comment):
The Javascript types you're seeing there (e.g. regex) are really just due to documentation using the MongoDB shell for its example. If this page showed examples in multiple languages/drivers (as some others do), we might expect to see native language types used (or classes like As a general rule, I think it makes sense to use Extended JSON whenever we're asking users to provide JSON that needs to be converted to MongoDB data. That syntax is both valid JSON and capable of expressing all special BSON types. With respect to this PR, I think it'd make sense to rename "json-schema-validation" to "schema-validation" (consistent with the MongoDB feature name) and then solicit the following options:
This would require anyone using a JSON schema to provide the top-level |
Ah, that's a good point. @alexandre-abrioux I can make those changes if you want so you don't have to do another pass. Using extended JSON for this makes sense, as it saves us from re-declaring a good amount of syntax in the XSD (like I did for partial index filters). |
@jmikola Awesome! That is exactly what I had in mind when I first though about this feature, but I had no knowledge of the extended JSON specification, so couldn't wrap my head around how to solve this easily! @alcaeus I'll take a shot at it right away! I'd love contribute and finish this feature if I can. Just to be clear, this is what I'm headed to, using MongoDB's documentation example and using the extended JSON specification for the regex part. Please confirm me that's what you had in mind (also consider the annotation name and properties) and I'll make the changes. For the XML part: <document name="TestDocuments\SchemaValidatedDocument">
<schema-validation action="warn" level="moderate">
{
"$jsonSchema": { ... }
"$or": [
{ "phone": { "$type": "string" } },
{ "email": { "$regex": { "$regularExpression" : { "pattern": "@mongodb\.com$", "options": "" } } } },
{ "status": { "$in": [ "Unknown", "Incomplete" ] } }
]
}
</schema-validation>
</document> For the annotation part: /**
* @ODM\Document
* @ODM\Validation(
* validator=SchemaValidated::VALIDATOR,
* action=ClassMetadata::SCHEMA_VALIDATION_ACTION_WARN,
* level=ClassMetadata::SCHEMA_VALIDATION_LEVEL_MODERATE,
* )
*/
class SchemaValidated
{
public const VALIDATOR = <<<'EOT'
{
"$jsonSchema": { ... }
"$or": [
{ "phone": { "$type": "string" } },
{ "email": { "$regex": { "$regularExpression" : { "pattern": "@mongodb\\.com$", "options": "" } } } },
{ "status": { "$in": [ "Unknown", "Incomplete" ] } }
]
}
EOT;
// rest of the class code...
} |
Yep, that looks lovely! @jmikola if I'm not mistaken we should use |
Implemented the changes, looks good to me! Notes:
Don't hesitate to make edits the documentation if needed. Thanks! |
Correct.
I think it may make more sense to immediately chain |
@jmikola Thanks, it makes sense! I was afraid the PHP value wouldn't get serialized properly in I made the appropriate edits. However the static analysis test seems to fail now, but on part of the code I didn't modify. Would it be related to the recent upgrade of |
I'll take a look and fix this before merging. Again, thanks for your work here! |
Serialization support for the driver's BSON classes was implemented in 1.2.0 (PHPC-460), so there should be nothing to worry about there. 👍 |
Summary
This pull request introduces a new
@Validation
class annotation containing three options:jsonSchema
action
level
; corresponding to the three options of MongoDB collections:
validator.$jsonSchema
validationAction
validationLevel
The JSON Extension
The MongoDB Driver requires an array for the
validator
option of thecreateCollection
method andcollMod
command.We therefore have to decode the user-defined JSON schema, thus the need to add the
ext-json
to therequire-dev
section of thecomposer.json
.The
ext-json
is still not required for users to install the package, it is only added to the suggestion list.Adding a new Validation annotation vs. adding new options to the Document annotation
When I started working on this feature I chose to add three new properties to the
@Document
annotation:validationJsonSchema
validationAction
validationLevel
But I found that it would be convenient for the
validationAction
andvalidationLevel
options to be enumerations as it would help the user to realize directly during the mapping process that they made a mistake in filling those.A wrong value would trigger
doctrine/annotations
's inner logic and raise anAnnotationException
.When used with Symfony for instance users wouldn't have to wait until execution of
odm:schema:update
command to realize they made a mistake, they would know right away on cache clear which happens on every request by default on dev environment.Using enumerations was only possible by creating a whole new annotation class as the
@Document
annotation has a constructor and therefore its properties' annotations are not parsed, see https://github.com/doctrine/annotations/blob/e2623fd97c136cc7a18b467da8d8331c01de051c/lib/Doctrine/Common/Annotations/DocParser.php#L548To sum up adding
@Enum
on the annotation properties was only possible by creating a new class, thus the choice of a new@Validation
annotation. It makes the annotation-driven configuration similar to the XML-driven configuration which also contains enumerations.The validationAction / validationLevel default values issue
MongoDB version ≥ 4.4.0 allows to reset the
validationAction
andvalidationLevel
options of the collections to their default values by passing empty strings to the respective arguments of thecollMod
command.However I found that versions 4.2.0 and bellow do not care about empty strings in the
validationAction
/validationLevel
arguments of thecollMod
command. The command options will just get ignored and the collection options will not be modified.In order to reset those options during
odm:schema:update
, in case for instance the user removed theaction
/level
options of the@Validation
annotation, we need to have default values defined for those options in Doctrine.Those default values are documented and are equal the default values defined by MongoDB.
However if the MongoDB team decides to change those default values in the future it could cause confusion for our users, but it's the only solution I found to be able to support MongoDB versions < 4.4.0.
Documentation
I updated the
annotations-reference
and added a new section on thevalidation-of-documents
cookbook, I hope that's OK. Feel free to proofread it and recommend some adjustments as English is not my mother tong.TODO