Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix output relative locations to be consistent #938

Merged

Conversation

ssilverman
Copy link
Member

No description provided.

Copy link
Contributor

@handrews handrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregsdennis my vague recollection here is that we wanted a pointer (not a fragment-encode pointer) because this value is not resolvable as any sort of URI. It's a pointer that works on that transitively included in-memory bundle thing that I filed #935 to come up with a name for after @karenetheridge asked about it and @Relequestual said we should name.

We define the thing this pointer expresses as the "validation path" in the section on static and dynamic scopes, but don't really say anything about URIs vs raw pointers.

Anyway, I'm pretty sure from my point of view the raw pointer was intentional and the example showing a fragment is an error, because there's no base URI against which that fragment could be resolved, but I could possibly be sold on the other way around.

@ssilverman
Copy link
Member Author

That response, then, might imply that the instance location shouldn't be fragment-encoded either.

Another thought: Since all documents can technically have an ID, even if it's the URL used to load the JSON (per RFC 3986), I'm not bothered by the fact that we could represent both keywordLocation and instanceLocation as URIs in fragment-encoded form.

@karenetheridge
Copy link
Member

karenetheridge commented May 23, 2020

I agree that instanceLocation and keywordLocation should be raw json pointers, not fragments -- i.e. "" and "/foo/bar/baz", not "#" and "#/foo/bar/baz". Only absoluteKeywordLocation should be a URI (and not a relative URI, either).

(edit: note that absoluteKeywordLocation isn't even an "absolute URI" as per the definition in the URI RFC -- because it will likely have a fragment. It should always have a scheme and path though -- assuming that there is a base uri associated with the overall document, which isn't always true as @gregsdennis says below).

@gregsdennis
Copy link
Member

gregsdennis commented May 23, 2020

all documents can technically have an ID

This is not true if a schema is merely defined in code. My validator does assign a document location for $ref dereferencing, but it doesn't assign an ID.

I agree that this was intentionally a raw pointer. The idea is that it is intended to identify the path that was taken to get to the location.

As to whether it includes a "#"... I have no big feels since I'm not sure I understand the difference.

@ssilverman
Copy link
Member Author

ssilverman commented May 23, 2020

By “can technically have an ID”, I meant that either one is determined from the algorithm in RFC 3986 or one is assigned. I’m not saying that “there necessarily exists an ID defined for that document.”

I’m not attached to either form, but my point is:

  1. All output examples in the spec use fragment syntax.
  2. The instance location is defined in the spec to MUST be in fragment form.
  3. The keyword locator is not defined in the spec to be in fragment form.
  4. The above comments imply a non-fragment form is desired.
  5. I’m trying to resolve these inconsistencies.

I’m not attached to either form, because URI fragment syntax form and raw JSON pointers are interchangeable, according to the JSON Pointer RFC. (I’m preempting those who’ll jump on this asking about off-topic absolute URIs: we’re just talking about the non-absolute keyword and instance locations and their URI fragment or JSON Pointer forms.) I’ll leave it up to those that wrote the spec to explain their true desires and I’ll update this PR accordingly.

@ssilverman ssilverman changed the title Fix output keyword relative location to be of URI fragment-encoded form Fix output relative locations to be consistent May 24, 2020
@ssilverman ssilverman force-pushed the output-keyword-relative branch 2 times, most recently from ec5fa6f to df7a035 Compare May 24, 2020 01:55
@ssilverman
Copy link
Member Author

I just updated with new language to use just "JSON pointers" for both keyword locations and instance locations. I also updated the examples.

@ssilverman ssilverman requested a review from handrews May 24, 2020 02:00
@karenetheridge
Copy link
Member

The schema in output/schema.json will need updating as well to show that instanceLocation and keywordLocation are no longer uri-references, but rather json pointers.

@handrews
Copy link
Contributor

@gregsdennis

As to whether it includes a "#"... I have no big feels since I'm not sure I understand the difference.

Including a # means it's a fragment, and therefore some escaping needs to be done to handle URI reserved characters. This is rarely needed in practice but it is a thing.

More philosophically, having a fragment implies that there's a base URI onto which you can tack that fragment to produce a full URI that can be used the way URIs are always used. We could potentially figure out how to define such a base URI but it seems unnecessary.

@karenetheridge absoluteKeywordLocation is not meant to reference RFC 3986's absolute-URI. It's analogous to absolute vs relative filesystem paths. Sadly, there are only so many words available, and not all of their usages agree anyway.

So:

  • If there is a reasonable base URI identifying a resource of a media type that accepts JSON Pointer fragments as its fragment syntax, then we can make that a URI fragment.
    • A JSON Schema resource meets this criteria (application/schema+json supports JSON Pointer fragments).
    • JSON instances do not (b/c application/json explicitly does not support JSON Pointer fragments).
    • An approach that would allow constructing a compound document that preserved the presence of $ref in paths was suggested in Allow "$ref" to take a schema, to make inlining easier? #779 but rejected, at least at the time.
  • Otherwise, we should use regular JSON Pointers.

@ssilverman
Copy link
Member Author

I just added a new commit with the output schema change to "json-pointer" from "uri-reference".

@karenetheridge
Copy link
Member

I think we need one more fix to the schema, due to:

Otherwise, we should use regular JSON Pointers.

that is, absoluteKeywordLocation should be { "type": "string", "anyOf": [ { "format": "uri" }, { "format": "json-pointer" } ] }.

@handrews
Copy link
Contributor

@ssilverman this looks like what you say it is. I'll defer to @gregsdennis on whether this is the desired option since this is really his area. Let me know if any of my answer about fragments vs non-fragment pointers is unclear, I'd be happy to write out examples or whatever.

keywordLocation could be a fragment since it's relative to an application/schema+json resource, but then we end up with a mix of stuff so maybe just making them all pointers is better.

@ssilverman
Copy link
Member Author

I think that "absoluteKeywordLocation" should stick with being a URI type because it's my belief that that's its intended purpose, or maybe even change to "uri-reference" if it's just going to hold a JSON Pointer (as a fragment), but I'm not attached. It's certainly handy to have "absoluteKeywordLocation" be a "URI" type in code.

I'll also note, for posterity and for any future readers reading this PR, that "URI", according to RFC 3986 (https://tools.ietf.org/html/rfc3986#appendix-A), is not the same as "absolute URI"; it's a superset.

  1. "Absolute URI" basically means "has a scheme, does not have a fragment, but may have anything else."
  2. "Relative reference" basically means "does not have a scheme, but may have anything else."
  3. "URI" means "absolute URI that may also have a fragment."

@karenetheridge
Copy link
Member

karenetheridge commented May 25, 2020

If "keywordLocation" will always be a json pointer and "absoluteKeywordLocation" will always be a uri, then one (or both?) of these properties should maybe be renamed, so it is more clear that one is a URI and the other is not.

We cannot say that absoluteKeywordLocation is an absolute URI because we may not have a base URI to resolve against (for example schemas being passed in directly into the implementation, not from an external source. #849 is a good place to spiral off discussion on this). (We also don't even have a format defined for "absolute-uri", which I've always thought was an interesting oversight.)

@karenetheridge
Copy link
Member

karenetheridge commented May 25, 2020

that is, absoluteKeywordLocation should be { "type": "string", "anyOf": [ { "format": "uri" }, { "format": "json-pointer" } ] }.

I've thought about this more and I think that a polymorphic type here would be a bad idea, causing more work for downstream consumers. Instead, it should always be a URI -- and if we don't have an (absolute) base uri to resolve it against, then it would just be a fragment-only URI reference (e.g. "#/foo/bar/baz"). So, the "absolute" implication in the name is only on a best-effort basis, and it is not necessarily resolvable. But if the schema had an absolute $id in it, or a relative URI $id that we were able to resolve to form an absolute URI, then it would be absolute.

@ssilverman ssilverman force-pushed the output-keyword-relative branch from bb7ebb2 to cf59d7a Compare June 11, 2020 14:52
@ssilverman
Copy link
Member Author

Force pushed to track master.

@karenetheridge
Copy link
Member

karenetheridge commented Jul 16, 2020

This PR is good IMO; it would be great to see it merged.

One more thing -- in output/schema.json, 'absoluteKeywordLocation' really ought to be a uri-reference, not a uri, since it is not guaranteed in the spec that a schema's base uri is a uri (as opposed to uri-reference) -- and often, especially with adhoc schemas, the base uri is simply ''.

Otherwise, the spec should be tightened up to state that a schema's base uri must always be a uri (which will force implementations to use a default value, say http://localhost:1234 but perhaps there are better defaults) in the absence of any direction from the user.

@Relequestual
Copy link
Member

Relequestual commented Jul 17, 2020

I'm on mobile so sorry for the lack of references.

@karenetheridge Regarding your last comment on the base uri is not always defined. When it's not defined by the user, the reference we use which defines how you determine the base uri, states that if it's not provided, it is application defined (after a few considerations first). In effect applications must provide a default in order to properly do URI resolutions. I've been round this roundabout a few times. What isn't clear is if such a base uri should be exposed to the end user or not.

Edit: reading the other comments, I'm echoing what others have said 😅

Copy link
Member

@Relequestual Relequestual left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@karenetheridge
Copy link
Member

What isn't clear is if such a base uri should be exposed to the end user or not.

That's the thing.. in previous drafts, the base uri wouldn't necessarily ever be exposed to the user, but now we mandate a format for errors which contains absoluteKeywordLocation, which is going to involve the base uri for every schema that doesn't specify its own absolute $id (and $id is permitted to be a uri-reference, not uri, even in the top level, and it may not even exist at all).

@Relequestual
Copy link
Member

Relequestual commented Jul 18, 2020

http://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.10.3.2
This information MAY be omitted only if either the relative location contains no references or **if the schema does not declare an absolute URI as its "$id"**.

If the user sets an $id, then they get that back in the absoluteKeywordLocation URI. If they haven't then they MAY get nothing, or they MAY get the application defined URI. My feeling is If they haven't set an $id, then they probably don't care about this URI in the output result, and if they DO, they should set an $id.


@karenetheridge absoluteKeywordLocation is not meant to reference RFC 3986's absolute-URI. It's analogous to absolute vs relative filesystem paths. Sadly, there are only so many words available, and not all of their usages agree anyway.

OK. I'd like to avoid this potential confusion if possible. And I feel it is possible.

In terms of the whole "absolute URI" vs "absolute-uri", what we mean is an "aboslute location using the canonical URI and URI fragment". I think not giving it a name beyond an "aboslute keyword location URI" is preferable to having the potential confusion over "aboslute URI" and "absolute-uri".

@karenetheridge Would you mind creating a new issue for this please? Then we can merge and resolve this PR.

@karenetheridge
Copy link
Member

If the user sets an $id, then they get that back in the absoluteKeywordLocation URI. If they haven't then they MAY get nothing, or they MAY get the application defined URI. My feeling is If they haven't set an $id, then they probably don't care about this URI in the output result, and if they DO, they should set an $id.

If there is no $id, there is no canonical uri (other than a base of '' and then a json pointer fragment - but that's still going to be different than keywordLocation if a $ref was traversed), but my key point here is an $id does not have to be absolute. I may declare my schema with an $id of "myschema.json" and have a $ref to "otherschema.json", and an error occuring at that $ref ought to be identified with otherschema.json. But these aren't absolute uris, despite output/schema.json insisting that absoluteKeywordLocation is format: uri not format: uri-reference.

@karenetheridge Would you mind creating a new issue for this please? Then we can merge and resolve this PR.

I'm not sure I understand which direction that issue should be going in. My understanding is that we define format: uri as "a uri with a scheme" (as per the URI RFC), and therefore absoluteKeywordLocation requires that the uri have a scheme, which is inconsistent with $id defined as format: uri-reference (a uri that does not require a scheme). There are two main ways we can resolve this:

  1. be clear in the spec that the fully resolved uri of a schema resource should be absolute (that is, if the $id is a uri-reference, it needs to have an absolute base uri provided in some way to resolve it to an absolute uri)
  2. alter output/schema.json so absoluteKeywordLocation is format: uri-reference

#2 is much more straightforward; #1 has wider ramifications for implementors (what happens if we are parsing a schema and we encounter a relative uri in $id and we have no absolute base uri with which to resolve it?)

And (back to my first paragraph) even if there is no $id in the schema, absoluteKeywordLocation is still different than keywordLocation if there were any $ref traversals, and knowing the short path in the schema where the error occured is still valuable. e.g. imagine a deeply recursive schema where the same $ref is traversed multiple times to arrive at the final schema location -- keywordLocation will be '..../$ref/.../$ref/.../$ref/...' which is pretty awful for a user to parse to figure out what subschema the error actually occured at. In my implementation, I always include absoluteKeywordLocation in the error output whenever a $ref has been traversed (that is, whenever keywordLocation != absoluteKeywordLocation), and in ad-hoc schemas this is just a json pointer fragment.

@handrews
Copy link
Contributor

The whole point of absoluteKeywordLocation is to be a full URI. "absolute" was a misnomer here because this URI will often have a fragment (I think all of us got confused by that at one point or another).

It is true that $id can take a URI-reference value, but it MUST resolve to an absolute URI (and in this case we really do mean absolute), and that is what should be used, plus a fragment if needed, in absoluteKeywordLocation.

@karenetheridge
Copy link
Member

It is true that $id can take a URI-reference value, but it MUST resolve to an absolute URI

FWIW, I think this is horrible. It means that implementations must either force the caller to provide an absolute base uri for ad-hoc schemas, or to use a default that will just be a placeholder and make no real sense. I doubt that many implementations enforce this right now.

@handrews
Copy link
Contributor

@karenetheridge

FWIW, I think this is horrible. It means that implementations must either force the caller to provide an absolute base uri for ad-hoc schemas, or to use a default that will just be a placeholder and make no real sense. I doubt that many implementations enforce this right now.

This is literally how everything that uses URIs works. There is a whole section on base URI resolution covering all possible cases in RFC 3986 (which I think you know). We're not doing anything but referencing totally normal behavior.

If you have no need for a real base URI, then a non-meaningful placeholder is in fact the most appropriate thing. Why force people to choose something when all that it's needed for is output? The about: URI scheme is suitable for this.

@karenetheridge
Copy link
Member

But it's not needed for output. e.g. what's wrong with this?

schema:

{
  $defs: {
    base: { type: string },
    nested: {
      anyOf: [
        { $ref: '#/$defs/base' },
        {
          type: object,
          properties: {
            foo: { $ref: '#/$defs/base' }
          }
        }
      ]
    }
  },
  $ref: '#/$defs/nested',
}

evaluated against data { foo: { foo: true } } gives these errors (this is copied
directly from my implementation):

[
  {
    instanceLocation: '',
    keywordLocation: '/$ref/anyOf/0/$ref/type'
    absoluteKeywordLocation: '#/$defs/base/type',
    error: 'wrong type (expected string)',
  },
  {
    instanceLocation: '/foo',
    keywordLocation: '/$ref/anyOf/1/properties/foo/$ref/type'
    absoluteKeywordLocation: '#/$defs/base/type',
    error: 'wrong type (expected string)',
  },
  {
    instanceLocation: '',
    keywordLocation: '/$ref/anyOf/1/properties'
    absoluteKeywordLocation: '#/$defs/nested/anyOf/1/properties',
    error: 'not all properties are valid',
  },
  {
    instanceLocation: '',
    keywordLocation: '/$ref/anyOf'
    absoluteKeywordLocation: '#/$defs/nested/anyOf',
    error: 'no subschemas are valid',
  }
]

the absoluteKeywordLocation provides value on top of keywordLocation, and
using a placeholder base uri would just be adding useless noise to all those properties.

@Relequestual
Copy link
Member

100% what @handrews said in his last comment above.

But it's not needed for output. e.g. what's wrong with this?

@karenetheridge What output do you get if you reference another schema document or you have an embedded schema resource?

absoluteKeywordLocation provides you the ability to know where the keyword is located, regardless of document or embedded schema resources. The original idea was so you don't have to follow reference chains, but it's TOTALLY FINE, and expected, to not provide this value if the URI is not provided (that it seems was the original intent).

Consumers of the output shouldn't have to do URI resolution to find the keyword location. Downstream tools may mandate that the schema has an $id.

@ssilverman
Copy link
Member Author

The only hard part is when the URI in absKeywordLocation is from an $id in the middle of the document, so you still have to hunt for the location. I like to track the ID and the base “loading” URI of the enclosing document. Also, I’ve chosen in my validator to force the user to specify at least a default base URI for each loaded schema (which is likely the retrieval URL).

@handrews
Copy link
Contributor

@ssilverman

The only hard part is when the URI in absKeywordLocation is from an $id in the middle of the document, so you still have to hunt for the location.

To some degree that's a schema design problem. You can write unmaintainable garabage in any system 😝

But most people will not be throwing $id around randomly. If they use it anywhere other than the document root at all, it's mostly likely under $defs and would be straightforward to find.

@karenetheridge
Copy link
Member

karenetheridge commented Jul 19, 2020

@karenetheridge What output do you get if you reference another schema document or you have an embedded schema resource?

In all cases, absoluteKeywordLocation (in my implementation) is the fully-resolved-as-best-as-we-are-able canonical URI of the present location. If there was no base uri provided and no $id anywhere in this schema or parents, it's the json pointer from the document root. If a base uri was provided (either to the implementation separately, or via an $id somewhere above the present location), then the canonical uri is that base uri combined with the json pointer path from wherever the base uri's location is (the document root or the innermost $id). It's the path of shortest traversal from something with a known canonical uri, to the present location (whereas keywordLocation is the full path the schema evaluation took to get there, which may have several $refs and even loops in it). In my implementation I always provide absoluteKeywordLocation in errors/annotations whenever it's different from keywordLocation (which is iff a canonical uri is known, inclusive-or a $ref or $recursiveRef has been traversed).

@Relequestual Relequestual merged commit 9b32946 into json-schema-org:master Aug 20, 2020
karenetheridge added a commit to karenetheridge/json-schema-spec that referenced this pull request Aug 26, 2020
>
> this corrects an oversight from the changes made in json-schema-org#938.
karenetheridge added a commit to karenetheridge/json-schema-spec that referenced this pull request Aug 26, 2020
@ssilverman ssilverman deleted the output-keyword-relative branch September 5, 2020 09:01
@gregsdennis gregsdennis added clarification Items that need to be clarified in the specification and removed Type: Bug labels Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification Items that need to be clarified in the specification core
Projects
Development

Successfully merging this pull request may close these issues.

5 participants