$schema can change across embedded resources #914
jsonschema-core.xml
resulting behavior is implementation-defined.
The "$schema" keyword SHOULD be used in the document root schema object,
and MAY be used in the root schema objects of embedded schema resources.
It MUST NOT appear in subschemas. If absent from the document root schema,
the wording of "subschemas" is a bit confusing to me. if I can attempt to clarify with an example
$schema: draftN
$id: root
items:
  $schema: draftM
  $id: items
items is a root schema object because it's got an $id. but is items no longer a "subschema" of the root? I feel like saying it's not a subschema isn't consistent with how the term subschema is used in the rest of the spec.
Maybe "non-root schemas"? Or just "other schemas"?
(IMO) "subschema" only has context when in reference to another schema. the schema at id "items" is a subschema of the schema at id "root". The schema at id "root" is not a subschema of anything (it has no parent schema).
Historically, we've used "subschema" to indicate containment but not reference. Referenced schemas have not been historically labelled "subschemas."
Maybe it comes down to how we define a "schema document." Is that the specific file, and that's it? Or is it the file, and all of its external references? (This pertains to the change below as well.)
Agreed. Some implementations provide an interface to extract these - either as multiple documents or "bundled" together in one (potentially renaming conflicting $refs if needed). In one of my web apps I do this in a GET /json_schema/:schema_name endpoint.
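A minimal sketch of the "bundled together in one" idea, assuming every input schema has an absolute `$id`. The function name `bundle`, its parameters, and the choice to name each `$defs` entry after the last path segment of the `$id` are all hypothetical; real bundlers also deal with relative URIs, name conflicts, and reference rewriting, none of which is attempted here.

```python
from typing import Any


def bundle(root_id: str, root_dialect: str, schemas: list[dict[str, Any]]) -> dict[str, Any]:
    """Embed each schema resource under "$defs" of a new wrapper document.

    Each embedded schema keeps its own "$id" and "$schema", so it stays a
    distinct schema resource inside the bundle.
    """
    defs: dict[str, Any] = {}
    for schema in schemas:
        if "$id" not in schema:
            raise ValueError("only schemas with an $id can be embedded as resources")
        # Hypothetical naming choice: last path segment of the $id.
        name = schema["$id"].rstrip("/").rsplit("/", 1)[-1]
        if name in defs:
            raise ValueError(f"duplicate $defs name: {name}")
        defs[name] = schema
    return {"$id": root_id, "$schema": root_dialect, "$defs": defs}
```

The wrapper only carries `$id`, `$schema`, and `$defs`; it adds no constraints of its own, so the embedded resources are the only meaningful schemas in the result.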
@karenetheridge yeah, the bundling use case was ultimately what we decided to use when figuring out what to do with $id (splitting the $anchor case out and cutting a bunch of nonsensical but syntactically legal values). And that led to the idea of $id as identifying resources as opposed to just random otherwise unremarkable schema objects.
it might be useful to have a name for that "document plus all external references, transitively" concept
I would call this a "transcluded de-referenced bundle".
- Transclusion is what is done to the schemas
- De-referenced is the result of the process
- Bundle is the end product descriptor
@Relequestual I'll think on this. Does it need to go in now or can we file an issue for this terminology? If we adopt it (and I'm cautiously supportive), it should probably go in everywhere and I'd rather not add all of that in this PR.
OK I've gone back over all of this. I am about to push a commit to address @notEthan's original question about the usage of "subschema" (I agree it is unclear).
For the transcluded de-reference bundle thing, I have filed issue #935. Note that the discussion involved a lot more than the bundle use case, so it really needs to be discussed separately from this PR. We can add more terminology later, assuming my most recent commit addresses the actual confusion in the PR text.
this seems to contradict the resolution of the schemas which describe a schema, as defined by the metaschema. oof, this is hard to pick all the right words to describe properly. I'll try not to get it too wrong. referring to this schema with two specifications (draftN, draftM) describing bits of it:
$id: bar
$schema: draftN
$defs:
  foo:
    $id: foo
    $schema: draftM
when I refer to the object at #/$defs/foo - before I know what it is or that it's even a schema - I start at the root (#), described by the draftN metaschema. in there I see that #/properties/$defs/additionalProperties is a reference resolving to draftN itself, so the schema #/$defs/foo is an instance of the draftN metaschema. but I have a $schema saying it is draftM. the metaschema has been made incorrect. maybe the metaschema needs to change in some manner to allow either a schema which instantiates the metaschema itself, or a thing with a $schema which does not instantiate the metaschema.
$id: draftN
properties:
  $defs:
    additionalProperties:
      oneOf:
        - $recursiveRef: '#'
        - required: ['$schema']
that's not perfect; it doesn't actually say what kind of thing the object with a $schema is. but I think it's at least an improvement in that it's not giving an incorrect reference to a schema (the draftN metaschema) which does not apply to the instance (the draftM schema). and you can't do a $ref to the $schema since that's in the instance (the schema instantiating the metaschema), and schemas (or metaschemas) can't reference instance data.
except, of course, the keyword $schema is only defined for a schema, so if the subschema takes the second oneOf option in my weird modification above (where it has a $schema but isn't an instance of the metaschema), it's not recognized as a schema at all and $schema has no meaning. I think I must retract that idea ... in order for the implementation to recognize that subschema #/$defs/foo is an instance of the draftM metaschema, it first has to be recognized as an instance of the draftN metaschema, and then change to no longer be that.
@notEthan What you learn about ALL meta-schemas, even if they don't explicitly list it, ALWAYS include the core vocabulary. This is in part because the core vocabulary is the bootstrapping vocabulary. Processing a schema should always start with a check for
If that doesn't help, think of it this way: We defined If we're supporting changing meta-schemas while following a
Two counter-points to this:
If you're at the document root you don't need those rules to figure out if it's a resource root because you already know it's the document root- that's why So technically not all schemas, true 😛 Regarding Otherwise I'd say implementations would have to opt-in to supporting
@karenetheridge thinking about
Hang on... is this explicitly allowing @notEthan's comment suggests the former, and that worries me. I think a single resource should follow a single schema draft, allowing
@gregsdennis it can be used in the resource root, which can be "internal" in the sense of not being a document root. But an embedded resource is still a different resource, it's just stuffed into the document. Here is the use case: I have some large number of schemas. They look like this (assume that they
{
"$id": "https://example.com/schema/aaa",
"$schema": "https://json-schema.org/draft/2020-06",
...
}
{
"$id": "https://example.com/schema/bbb",
"$schema": "http://json-schema.org/draft-06",
...
}
{
"$id": "https://example.com/schema/ccc",
"$schema": "http://json-schema.org/draft-07",
...
}
etc. I want to bundle them in a single document for ease of distribution, which, as @karenetheridge notes, is something that there are tools for now. The result would be:
{
"$id": "https://example.com/schema/bundled",
"$schema": "https://json-schema.org/draft/2020-06",
"$defs": {
"aaa": {
"$id": "https://example.com/schema/aaa",
"$schema": "https://json-schema.org/draft/2020-06",
...
},
"bbb": {
"$id": "https://example.com/schema/bbb",
"$schema": "http://json-schema.org/draft-06",
...
},
"ccc": {
"$id": "https://example.com/schema/ccc",
"$schema": "http://json-schema.org/draft-07",
...
}
}
}
This should work. The presence of an Note, however, that this is NOT VALID:
{
"$id": "https://example.com/schema/whatever",
"$schema": "https://json-schema.org/draft/2020-06",
"properties": {
"foo": {
"$schema": "http://json-schema.org/draft-07",
...
}
}
}
In this example, there is no I was fairly sure we had an extensive conversation around this stuff but admittedly it would have been quite a while ago. But it's all about the bundling use case. If we need to have the whole discussion on this again then we should do it in slack. PRs are not the place to debate fundamental direction- I wrote a PR because it had been settled.
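The distinction between the valid bundle and the invalid example can be sketched as a simple check: `$schema` is acceptable at the document root or next to an `$id`, and nowhere else. This is only an illustration under loud assumptions: the function name `misplaced_schema_keywords` is made up, every nested object is naively treated as a potential schema (a real implementation would walk only the locations its dialect defines as subschemas), and JSON Pointer escaping of `~` and `/` in keys is not handled.

```python
from typing import Any, Iterator


def misplaced_schema_keywords(
    node: Any, pointer: str = "#", is_document_root: bool = True
) -> Iterator[str]:
    """Yield pointers of objects that use "$schema" without being a resource root."""
    if isinstance(node, dict):
        # Assumption: a resource root is the document root or any object with "$id".
        is_resource_root = is_document_root or "$id" in node
        if "$schema" in node and not is_resource_root:
            yield pointer
        for key, value in node.items():
            yield from misplaced_schema_keywords(value, f"{pointer}/{key}", False)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from misplaced_schema_keywords(value, f"{pointer}/{index}", False)
```

Run against a bundle whose embedded schemas all carry `$id`, the generator yields nothing; run against a document with a bare `$schema` under `properties/foo`, it yields the pointer of that object.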
I think allowing
{
"$id": "https://example.com/schema1",
"$schema": "http://json-schema.org/draft-06/schema#",
"type": "object",
"properties": {
"foo": {
"$id": "https://example.com/schema2",
"$schema": "http://json-schema.org/draft-07/schema#",
"if": "asdf"
}
}
}
When you validate this schema against the draft-06 meta-schema, semantically, it's not a schema, it's just arbitrary JSON. The neither the inner
@jdesrosiers People have in the past asked for a way to restrict the draft of the
Note that part of the point of #849 is to give a clear description of how to process schemas and meta-schemas. Which is why all of that sort of stuff has been pulled out of where it was scattered all over and consolidated into what's now section 9. So if we want to formalize stuff around crossing a resource boundary, that's where that goes. And to some degree I'm doing that anyway. If I can ever get back to that issue, which I've been trying to do for 2 weeks now. For reference, #850 is the issue for this change, and #808 is the
The only reason meta-validation works in my implementation is because I modify the schemas when they are loaded to separate embedded schemas. The schemas then get validated separately and there is no problem. But, if I validate that schema I gave before as the instance and meta-schema as the schema, it doesn't work as expected. That's why I don't think just giving guidance on how to process schemas sufficiently solves the problem. #918 introduces a way for a schema to declare that a value is a schema without saying what kind of schema. It adds a new
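One possible way to "separate embedded schemas" at load time is sketched below. It is not a description of the implementation mentioned above, whose details are not given in this thread. The function name `split_resources` is hypothetical, the document root and every embedded resource are assumed to carry an absolute `$id`, and each extracted resource is replaced in its parent by a `$ref` to that `$id`. Fragment-only `$id` values from older drafts are not special-cased.

```python
from typing import Any


def split_resources(document: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each "$id" to its schema resource, with embedded resources
    replaced in the parent by {"$ref": <their $id>}."""
    resources: dict[str, dict[str, Any]] = {}

    def extract(node: Any, is_root: bool) -> Any:
        if isinstance(node, dict):
            if "$id" in node and not is_root:
                # Resource boundary: store it separately, leave a reference behind.
                register(node)
                return {"$ref": node["$id"]}
            return {key: extract(value, False) for key, value in node.items()}
        if isinstance(node, list):
            return [extract(value, False) for value in node]
        return node

    def register(resource: dict[str, Any]) -> None:
        resources[resource["$id"]] = extract(resource, True)

    register(document)
    return resources
```

After splitting, each resource can be meta-validated on its own against whatever its own `$schema` names, which is the property the comment above relies on.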
I've agreed with one suggested change.
The push just now is a rebase to fix conflicts- nothing else has changed, waiting on Relequestual's feedback.
@jdesrosiers I somehow didn't notice this before:
I'd actually say that's exactly how it should be processed. The embedding is essentially a... I dunno, ?transport layer? convenience. The real unit of schema-ness is the schema resource, which we didn't quite settle on for 2019-09 b/c the schema resource idea appeared near the end of that process as a solution to other things. So that makes loading schemas a little more challenging, but once that is done, working with schema resources works just fine. In this way, the schema document with embedded schema resources is not, itself, really a schema. It's a package containing schemas, and the ideal option might be:
There's something in there about resolving relative Just brainstorming on this.
I can understand @jdesrosiers concerns about meta schema validation. My feeling on this is we should add requirements to the following:
I think @jdesrosiers approach in #918 is interesting, but we need a LOT more time to flesh that out, and we have time pressures to deliver THIS draft sooner.
@Relequestual at this point all of the small change requests have been addressed. Regarding the main conversation about how to handle a switched meta-schema:
JSON Schema has never, in any draft, required an error on an unrecognized meta-schema. As of 2019-09, you can cause an error through The key principle here is that We cannot have a scenario where bundling an external reference as an embedded schema resource changes the behavior from best effort ("I have no idea what this is but I'll pretend it's the standard core+validation and give it a shot") to an error.
I'm not entirely sure that I follow, and I think we should be having this debate in an issue so I'm going to file that. This PR is effectively blocked.
$schema is now definitively resource-scoped rather than document-scoped, as crossing a resource boundary is the same as following a $ref to an external resource.
Yeah, I mean that should have been obvious. Of course.
My feeling is @jdesrosiers broadly approved of the suggested change. This is a small change, identifying behaviour which was previously just 🤷‍♂️ (not defined), so I'm merging it.
Closes #808, closes #850
$schema is now definitively resource-scoped rather than document-scoped, as crossing a resource boundary is the same as following a $ref to an external resource.
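To illustrate what "resource-scoped" means in practice, here is a rough sketch that maps every object in a document to the `$schema` value that governs it, switching whenever a resource boundary (an object with `$id`) is crossed. The function name `dialect_by_location` is hypothetical. Two assumptions are baked in and are not taken from the spec text in this PR: an embedded resource without its own `$schema` inherits the enclosing resource's value, and every nested object is naively treated as a schema location.

```python
from typing import Any, Optional


def dialect_by_location(
    node: Any, inherited: Optional[str] = None, pointer: str = "#"
) -> dict[str, Optional[str]]:
    """Map each object's pointer to the "$schema" URI in effect there."""
    result: dict[str, Optional[str]] = {}
    if isinstance(node, dict):
        dialect = inherited
        if pointer == "#" or "$id" in node:
            # Resource root: its own "$schema" wins; otherwise inherit (assumption).
            dialect = node.get("$schema", inherited)
        result[pointer] = dialect
        for key, value in node.items():
            result.update(dialect_by_location(value, dialect, f"{pointer}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            result.update(dialect_by_location(value, inherited, f"{pointer}/{index}"))
    return result
```

For a root declaring draftN with an embedded resource at `#/$defs/foo` declaring draftM, the root maps to draftN and `#/$defs/foo` maps to draftM, which mirrors what following a `$ref` to an external draftM resource would give.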