-
-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix output relative locations to be consistent #938
Fix output relative locations to be consistent #938
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gregsdennis my vague recollection here is that we wanted a pointer (not a fragment-encode pointer) because this value is not resolvable as any sort of URI. It's a pointer that works on that transitively included in-memory bundle thing that I filed #935 to come up with a name for after @karenetheridge asked about it and @Relequestual said we should name.
We define the thing this pointer expresses as the "validation path" in the section on static and dynamic scopes, but don't really say anything about URIs vs raw pointers.
Anyway, I'm pretty sure from my point of view the raw pointer was intentional and the example showing a fragment is an error, because there's no base URI against which that fragment could be resolved, but I could possibly be sold on the other way around.
That response, then, might imply that the instance location shouldn't be fragment-encoded either. Another thought: Since all documents can technically have an ID, even if it's the URL used to load the JSON (per RFC 3986), I'm not bothered by the fact that we could represent both keywordLocation and instanceLocation as URIs in fragment-encoded form. |
I agree that instanceLocation and keywordLocation should be raw json pointers, not fragments -- i.e. (edit: note that absoluteKeywordLocation isn't even an "absolute URI" as per the definition in the URI RFC -- because it will likely have a fragment. It should always have a scheme and path though -- assuming that there is a base uri associated with the overall document, which isn't always true as @gregsdennis says below). |
This is not true if a schema is merely defined in code. My validator does assign a document location for I agree that this was intentionally a raw pointer. The idea is that it is intended to identify the path that was taken to get to the location. As to whether it includes a "#"... I have no big feels since I'm not sure I understand the difference. |
By “can technically have an ID”, I meant that either one is determined from the algorithm in RFC 3986 or one is assigned. I’m not saying that “there necessarily exists an ID defined for that document.” I’m not attached to either form, but my point is:
I’m not attached to either form, because URI fragment syntax form and raw JSON pointers are interchangeable, according to the JSON Pointer RFC. (I’m preempting those who’ll jump on this asking about off-topic absolute URIs: we’re just talking about the non-absolute keyword and instance locations and their URI fragment or JSON Pointer forms.) I’ll leave it up to those that wrote the spec to explain their true desires and I’ll update this PR accordingly. |
ec5fa6f
to
df7a035
Compare
I just updated with new language to use just "JSON pointers" for both keyword locations and instance locations. I also updated the examples. |
The schema in |
Including a More philosophically, having a fragment implies that there's a base URI onto which you can tack that fragment to produce a full URI that can be used the way URIs are always used. We could potentially figure out how to define such a base URI but it seems unnecessary. @karenetheridge So:
|
I just added a new commit with the output schema change to "json-pointer" from "uri-reference". |
I think we need one more fix to the schema, due to:
that is, absoluteKeywordLocation should be |
@ssilverman this looks like what you say it is. I'll defer to @gregsdennis on whether this is the desired option since this is really his area. Let me know if any of my answer about fragments vs non-fragment pointers is unclear, I'd be happy to write out examples or whatever.
|
I think that "absoluteKeywordLocation" should stick with being a URI type because it's my belief that that's its intended purpose, or maybe even change to "uri-reference" if it's just going to hold a JSON Pointer (as a fragment), but I'm not attached. It's certainly handy to have "absoluteKeywordLocation" be a "URI" type in code. I'll also note, for posterity and for any future readers reading this PR, that "URI", according to RFC 3986 (https://tools.ietf.org/html/rfc3986#appendix-A), is not the same as "absolute URI"; it's a superset.
|
If "keywordLocation" will always be a json pointer and "absoluteKeywordLocation" will always be a uri, then one (or both?) of these properties should maybe be renamed, so it is more clear that one is a URI and the other is not. We cannot say that absoluteKeywordLocation is an absolute URI because we may not have a base URI to resolve against (for example schemas being passed in directly into the implementation, not from an external source. #849 is a good place to spiral off discussion on this). (We also don't even have a format defined for "absolute-uri", which I've always thought was an interesting oversight.) |
I've thought about this more and I think that a polymorphic type here would be a bad idea, causing more work for downstream consumers. Instead, it should always be a URI -- and if we don't have an (absolute) base uri to resolve it against, then it would just be a fragment-only URI reference (e.g. "#/foo/bar/baz"). So, the "absolute" implication in the name is only on a best-effort basis, and it is not necessarily resolvable. But if the schema had an absolute $id in it, or a relative URI $id that we were able to resolve to form an absolute URI, then it would be absolute. |
9e96c21
to
bb7ebb2
Compare
bb7ebb2
to
cf59d7a
Compare
Force pushed to track master. |
This PR is good IMO; it would be great to see it merged. One more thing -- in output/schema.json, 'absoluteKeywordLocation' really ought to be a uri-reference, not a uri, since it is not guaranteed in the spec that a schema's base uri is a uri (as opposed to uri-reference) -- and often, especially with adhoc schemas, the base uri is simply ''. Otherwise, the spec should be tightened up to state that a schema's base uri must always be a uri (which will force implementations to use a default value, say |
I'm on mobile so sorry for the lack of references. @karenetheridge Regarding your last comment on the base uri is not always defined. When it's not defined by the user, the reference we use which defines how you determine the base uri, states that if it's not provided, it is application defined (after a few considerations first). In effect applications must provide a default in order to properly do URI resolutions. I've been round this roundabout a few times. What isn't clear is if such a base uri should be exposed to the end user or not. Edit: reading the other comments, I'm echoing what others have said 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
That's the thing.. in previous drafts, the base uri wouldn't necessarily ever be exposed to the user, but now we mandate a format for errors which contains absoluteKeywordLocation, which is going to involve the base uri for every schema that doesn't specify its own absolute |
http://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.10.3.2 If the user sets an
OK. I'd like to avoid this potential confusion if possible. And I feel it is possible. In terms of the whole "absolute URI" vs "absolute-uri", what we mean is an "aboslute location using the canonical URI and URI fragment". I think not giving it a name beyond an "aboslute keyword location URI" is preferable to having the potential confusion over "aboslute URI" and "absolute-uri". @karenetheridge Would you mind creating a new issue for this please? Then we can merge and resolve this PR. |
If there is no $id, there is no canonical uri (other than a base of '' and then a json pointer fragment - but that's still going to be different than keywordLocation if a $ref was traversed), but my key point here is an $id does not have to be absolute. I may declare my schema with an $id of "myschema.json" and have a $ref to "otherschema.json", and an error occuring at that $ref ought to be identified with otherschema.json. But these aren't absolute uris, despite output/schema.json insisting that absoluteKeywordLocation is
I'm not sure I understand which direction that issue should be going in. My understanding is that we define
#2 is much more straightforward; #1 has wider ramifications for implementors (what happens if we are parsing a schema and we encounter a relative uri in $id and we have no absolute base uri with which to resolve it?) And (back to my first paragraph) even if there is no $id in the schema, absoluteKeywordLocation is still different than keywordLocation if there were any $ref traversals, and knowing the short path in the schema where the error occured is still valuable. e.g. imagine a deeply recursive schema where the same $ref is traversed multiple times to arrive at the final schema location -- keywordLocation will be '..../$ref/.../$ref/.../$ref/...' which is pretty awful for a user to parse to figure out what subschema the error actually occured at. In my implementation, I always include absoluteKeywordLocation in the error output whenever a $ref has been traversed (that is, whenever keywordLocation != absoluteKeywordLocation), and in ad-hoc schemas this is just a json pointer fragment. |
The whole point of It is true that |
FWIW, I think this is horrible. It means that implementations must either force the caller to provide an absolute base uri for ad-hoc schemas, or to use a default that will just be a placeholder and make no real sense. I doubt that many implementations enforce this right now. |
This is literally how everything that uses URIs works. There is a whole section on base URI resolution covering all possible cases in RFC 3986 (which I think you know). We're not doing anything but referencing totally normal behavior. If you have no need for a real base URI, then a non-meaningful placeholder is in fact the most appropriate thing. Why force people to choose something when all that it's needed for is output? The |
But it's not needed for output. e.g. what's wrong with this? schema:
evaluated against data
the absoluteKeywordLocation provides value on top of keywordLocation, and |
100% what @handrews said in his last comment above.
@karenetheridge What output do you get if you reference another schema document or you have an embedded schema resource?
Consumers of the output shouldn't have to do URI resolution to find the keyword location. Downstream tools may mandate that the schema has an |
The only hard part is when the URI in absKeywordLocation is from an $id in the middle of the document, so you still have to hunt for the location. I like to track the ID and the base “loading” URI of the enclosing document. Also, I’ve chosen in my validator to force the user to specify at least a default base URI for each loaded schema (which is likely the retrieval URL). |
To some degree that's a schema design problem. You can write unmaintainable garabage in any system 😝 But most people will not be throwing |
In all cases, |
> > this corrects an oversight from the changes made in json-schema-org#938.
this corrects an oversight from the changes made in json-schema-org#938.
No description provided.