Fix output relative locations to be consistent #938

ssilverman · 2020-05-23T10:07:17Z

No description provided.

handrews

@gregsdennis my vague recollection here is that we wanted a pointer (not a fragment-encode pointer) because this value is not resolvable as any sort of URI. It's a pointer that works on that transitively included in-memory bundle thing that I filed #935 to come up with a name for after @karenetheridge asked about it and @Relequestual said we should name.

We define the thing this pointer expresses as the "validation path" in the section on static and dynamic scopes, but don't really say anything about URIs vs raw pointers.

Anyway, I'm pretty sure from my point of view the raw pointer was intentional and the example showing a fragment is an error, because there's no base URI against which that fragment could be resolved, but I could possibly be sold on the other way around.

ssilverman · 2020-05-23T18:41:15Z

That response, then, might imply that the instance location shouldn't be fragment-encoded either.

Another thought: Since all documents can technically have an ID, even if it's the URL used to load the JSON (per RFC 3986), I'm not bothered by the fact that we could represent both keywordLocation and instanceLocation as URIs in fragment-encoded form.

karenetheridge · 2020-05-23T18:52:24Z

I agree that instanceLocation and keywordLocation should be raw json pointers, not fragments -- i.e. "" and "/foo/bar/baz", not "#" and "#/foo/bar/baz". Only absoluteKeywordLocation should be a URI (and not a relative URI, either).

(edit: note that absoluteKeywordLocation isn't even an "absolute URI" as per the definition in the URI RFC -- because it will likely have a fragment. It should always have a scheme and path though -- assuming that there is a base uri associated with the overall document, which isn't always true as @gregsdennis says below).

gregsdennis · 2020-05-23T21:00:25Z

all documents can technically have an ID

This is not true if a schema is merely defined in code. My validator does assign a document location for $ref dereferencing, but it doesn't assign an ID.

I agree that this was intentionally a raw pointer. The idea is that it is intended to identify the path that was taken to get to the location.

As to whether it includes a "#"... I have no big feels since I'm not sure I understand the difference.

ssilverman · 2020-05-23T22:34:49Z

By “can technically have an ID”, I meant that either one is determined from the algorithm in RFC 3986 or one is assigned. I’m not saying that “there necessarily exists an ID defined for that document.”

I’m not attached to either form, but my point is:

All output examples in the spec use fragment syntax.
The instance location is defined in the spec to MUST be in fragment form.
The keyword locator is not defined in the spec to be in fragment form.
The above comments imply a non-fragment form is desired.
I’m trying to resolve these inconsistencies.

I’m not attached to either form, because URI fragment syntax form and raw JSON pointers are interchangeable, according to the JSON Pointer RFC. (I’m preempting those who’ll jump on this asking about off-topic absolute URIs: we’re just talking about the non-absolute keyword and instance locations and their URI fragment or JSON Pointer forms.) I’ll leave it up to those that wrote the spec to explain their true desires and I’ll update this PR accordingly.

ssilverman · 2020-05-24T02:00:04Z

I just updated with new language to use just "JSON pointers" for both keyword locations and instance locations. I also updated the examples.

karenetheridge · 2020-05-24T16:29:29Z

The schema in output/schema.json will need updating as well to show that instanceLocation and keywordLocation are no longer uri-references, but rather json pointers.

handrews · 2020-05-24T19:35:56Z

@gregsdennis

As to whether it includes a "#"... I have no big feels since I'm not sure I understand the difference.

Including a # means it's a fragment, and therefore some escaping needs to be done to handle URI reserved characters. This is rarely needed in practice but it is a thing.

More philosophically, having a fragment implies that there's a base URI onto which you can tack that fragment to produce a full URI that can be used the way URIs are always used. We could potentially figure out how to define such a base URI but it seems unnecessary.

@karenetheridge absoluteKeywordLocation is not meant to reference RFC 3986's absolute-URI. It's analogous to absolute vs relative filesystem paths. Sadly, there are only so many words available, and not all of their usages agree anyway.

So:

If there is a reasonable base URI identifying a resource of a media type that accepts JSON Pointer fragments as its fragment syntax, then we can make that a URI fragment.
- A JSON Schema resource meets this criteria (application/schema+json supports JSON Pointer fragments).
- JSON instances do not (b/c application/json explicitly does not support JSON Pointer fragments).
- An approach that would allow constructing a compound document that preserved the presence of $ref in paths was suggested in Allow "$ref" to take a schema, to make inlining easier? #779 but rejected, at least at the time.
Otherwise, we should use regular JSON Pointers.

ssilverman · 2020-05-25T00:15:58Z

I just added a new commit with the output schema change to "json-pointer" from "uri-reference".

karenetheridge · 2020-05-25T03:16:39Z

I think we need one more fix to the schema, due to:

Otherwise, we should use regular JSON Pointers.

that is, absoluteKeywordLocation should be { "type": "string", "anyOf": [ { "format": "uri" }, { "format": "json-pointer" } ] }.

handrews · 2020-05-25T03:39:13Z

@ssilverman this looks like what you say it is. I'll defer to @gregsdennis on whether this is the desired option since this is really his area. Let me know if any of my answer about fragments vs non-fragment pointers is unclear, I'd be happy to write out examples or whatever.

keywordLocation could be a fragment since it's relative to an application/schema+json resource, but then we end up with a mix of stuff so maybe just making them all pointers is better.

ssilverman · 2020-05-25T03:59:13Z

I think that "absoluteKeywordLocation" should stick with being a URI type because it's my belief that that's its intended purpose, or maybe even change to "uri-reference" if it's just going to hold a JSON Pointer (as a fragment), but I'm not attached. It's certainly handy to have "absoluteKeywordLocation" be a "URI" type in code.

I'll also note, for posterity and for any future readers reading this PR, that "URI", according to RFC 3986 (https://tools.ietf.org/html/rfc3986#appendix-A), is not the same as "absolute URI"; it's a superset.

"Absolute URI" basically means "has a scheme, does not have a fragment, but may have anything else."
"Relative reference" basically means "does not have a scheme, but may have anything else."
"URI" means "absolute URI that may also have a fragment."

karenetheridge · 2020-05-25T04:20:13Z

If "keywordLocation" will always be a json pointer and "absoluteKeywordLocation" will always be a uri, then one (or both?) of these properties should maybe be renamed, so it is more clear that one is a URI and the other is not.

We cannot say that absoluteKeywordLocation is an absolute URI because we may not have a base URI to resolve against (for example schemas being passed in directly into the implementation, not from an external source. #849 is a good place to spiral off discussion on this). (We also don't even have a format defined for "absolute-uri", which I've always thought was an interesting oversight.)

karenetheridge · 2020-05-25T17:32:34Z

that is, absoluteKeywordLocation should be { "type": "string", "anyOf": [ { "format": "uri" }, { "format": "json-pointer" } ] }.

I've thought about this more and I think that a polymorphic type here would be a bad idea, causing more work for downstream consumers. Instead, it should always be a URI -- and if we don't have an (absolute) base uri to resolve it against, then it would just be a fragment-only URI reference (e.g. "#/foo/bar/baz"). So, the "absolute" implication in the name is only on a best-effort basis, and it is not necessarily resolvable. But if the schema had an absolute $id in it, or a relative URI $id that we were able to resolve to form an absolute URI, then it would be absolute.

…ed form

…locations

ssilverman · 2020-06-11T14:53:25Z

Force pushed to track master.

karenetheridge · 2020-07-16T21:36:04Z

This PR is good IMO; it would be great to see it merged.

One more thing -- in output/schema.json, 'absoluteKeywordLocation' really ought to be a uri-reference, not a uri, since it is not guaranteed in the spec that a schema's base uri is a uri (as opposed to uri-reference) -- and often, especially with adhoc schemas, the base uri is simply ''.

Otherwise, the spec should be tightened up to state that a schema's base uri must always be a uri (which will force implementations to use a default value, say http://localhost:1234 but perhaps there are better defaults) in the absence of any direction from the user.

Relequestual · 2020-07-17T08:48:17Z

I'm on mobile so sorry for the lack of references.

@karenetheridge Regarding your last comment on the base uri is not always defined. When it's not defined by the user, the reference we use which defines how you determine the base uri, states that if it's not provided, it is application defined (after a few considerations first). In effect applications must provide a default in order to properly do URI resolutions. I've been round this roundabout a few times. What isn't clear is if such a base uri should be exposed to the end user or not.

Edit: reading the other comments, I'm echoing what others have said 😅

Relequestual

Looks good to me!

karenetheridge · 2020-07-17T17:19:26Z

What isn't clear is if such a base uri should be exposed to the end user or not.

That's the thing.. in previous drafts, the base uri wouldn't necessarily ever be exposed to the user, but now we mandate a format for errors which contains absoluteKeywordLocation, which is going to involve the base uri for every schema that doesn't specify its own absolute $id (and $id is permitted to be a uri-reference, not uri, even in the top level, and it may not even exist at all).

Relequestual · 2020-07-18T11:06:30Z

http://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.10.3.2
This information MAY be omitted only if either the relative location contains no references or **if the schema does not declare an absolute URI as its "$id"**.

If the user sets an $id, then they get that back in the absoluteKeywordLocation URI. If they haven't then they MAY get nothing, or they MAY get the application defined URI. My feeling is If they haven't set an $id, then they probably don't care about this URI in the output result, and if they DO, they should set an $id.

@karenetheridge absoluteKeywordLocation is not meant to reference RFC 3986's absolute-URI. It's analogous to absolute vs relative filesystem paths. Sadly, there are only so many words available, and not all of their usages agree anyway.

OK. I'd like to avoid this potential confusion if possible. And I feel it is possible.

In terms of the whole "absolute URI" vs "absolute-uri", what we mean is an "aboslute location using the canonical URI and URI fragment". I think not giving it a name beyond an "aboslute keyword location URI" is preferable to having the potential confusion over "aboslute URI" and "absolute-uri".

@karenetheridge Would you mind creating a new issue for this please? Then we can merge and resolve this PR.

karenetheridge · 2020-07-18T17:02:52Z

If the user sets an $id, then they get that back in the absoluteKeywordLocation URI. If they haven't then they MAY get nothing, or they MAY get the application defined URI. My feeling is If they haven't set an $id, then they probably don't care about this URI in the output result, and if they DO, they should set an $id.

If there is no $id, there is no canonical uri (other than a base of '' and then a json pointer fragment - but that's still going to be different than keywordLocation if a $ref was traversed), but my key point here is an $id does not have to be absolute. I may declare my schema with an $id of "myschema.json" and have a $ref to "otherschema.json", and an error occuring at that $ref ought to be identified with otherschema.json. But these aren't absolute uris, despite output/schema.json insisting that absoluteKeywordLocation is format: uri not format: uri-reference.

@karenetheridge Would you mind creating a new issue for this please? Then we can merge and resolve this PR.

I'm not sure I understand which direction that issue should be going in. My understanding is that we define format: uri as "a uri with a scheme" (as per the URI RFC), and therefore absoluteKeywordLocation requires that the uri have a scheme, which is inconsistent with $id defined as format: uri-reference (a uri that does not require a scheme). There are two main ways we can resolve this:

be clear in the spec that the fully resolved uri of a schema resource should be absolute (that is, if the $id is a uri-reference, it needs to have an absolute base uri provided in some way to resolve it to an absolute uri)
alter output/schema.json so absoluteKeywordLocation is format: uri-reference

#2 is much more straightforward; #1 has wider ramifications for implementors (what happens if we are parsing a schema and we encounter a relative uri in $id and we have no absolute base uri with which to resolve it?)

And (back to my first paragraph) even if there is no $id in the schema, absoluteKeywordLocation is still different than keywordLocation if there were any $ref traversals, and knowing the short path in the schema where the error occured is still valuable. e.g. imagine a deeply recursive schema where the same $ref is traversed multiple times to arrive at the final schema location -- keywordLocation will be '..../$ref/.../$ref/.../$ref/...' which is pretty awful for a user to parse to figure out what subschema the error actually occured at. In my implementation, I always include absoluteKeywordLocation in the error output whenever a $ref has been traversed (that is, whenever keywordLocation != absoluteKeywordLocation), and in ad-hoc schemas this is just a json pointer fragment.

handrews · 2020-07-18T18:40:35Z

The whole point of absoluteKeywordLocation is to be a full URI. "absolute" was a misnomer here because this URI will often have a fragment (I think all of us got confused by that at one point or another).

It is true that $id can take a URI-reference value, but it MUST resolve to an absolute URI (and in this case we really do mean absolute), and that is what should be used, plus a fragment if needed, in absoluteKeywordLocation.

karenetheridge · 2020-07-18T18:51:23Z

It is true that $id can take a URI-reference value, but it MUST resolve to an absolute URI

FWIW, I think this is horrible. It means that implementations must either force the caller to provide an absolute base uri for ad-hoc schemas, or to use a default that will just be a placeholder and make no real sense. I doubt that many implementations enforce this right now.

handrews · 2020-07-18T19:05:48Z

@karenetheridge

FWIW, I think this is horrible. It means that implementations must either force the caller to provide an absolute base uri for ad-hoc schemas, or to use a default that will just be a placeholder and make no real sense. I doubt that many implementations enforce this right now.

This is literally how everything that uses URIs works. There is a whole section on base URI resolution covering all possible cases in RFC 3986 (which I think you know). We're not doing anything but referencing totally normal behavior.

If you have no need for a real base URI, then a non-meaningful placeholder is in fact the most appropriate thing. Why force people to choose something when all that it's needed for is output? The about: URI scheme is suitable for this.

karenetheridge · 2020-07-18T19:20:30Z

But it's not needed for output. e.g. what's wrong with this?

schema:

{
  $defs: {
    base: { type: string },
    nested: {
      anyOf: [
        { $ref: '#/$defs/base' },
        {
          type: object,
          properties: {
            foo: { $ref: '#/$defs/base' }
          }
        }
      ]
    }
  },
  $ref: '#/$defs/nested',
}

evaluated against data { foo: { foo: true } } gives these errors (this is copied
directly from my implementation):

[
  {
    instanceLocation: '',
    keywordLocation: '/$ref/anyOf/0/$ref/type'
    absoluteKeywordLocation: '#/$defs/base/type',
    error: 'wrong type (expected string)',
  },
  {
    instanceLocation: '/foo',
    keywordLocation: '/$ref/anyOf/1/properties/foo/$ref/type'
    absoluteKeywordLocation: '#/$defs/base/type',
    error: 'wrong type (expected string)',
  },
  {
    instanceLocation: '',
    keywordLocation: '/$ref/anyOf/1/properties'
    absoluteKeywordLocation: '#/$defs/nested/anyOf/1/properties',
    error: 'not all properties are valid',
  },
  {
    instanceLocation: '',
    keywordLocation: '/$ref/anyOf'
    absoluteKeywordLocation: '#/$defs/nested/anyOf',
    error: 'no subschemas are valid',
  }
]

the absoluteKeywordLocation provides value on top of keywordLocation, and
using a placeholder base uri would just be adding useless noise to all those properties.

Relequestual · 2020-07-19T13:28:26Z

100% what @handrews said in his last comment above.

But it's not needed for output. e.g. what's wrong with this?

@karenetheridge What output do you get if you reference another schema document or you have an embedded schema resource?

absoluteKeywordLocation provides you the ability to know where the keyword is located, regardless of document or embedded schema resources. The original idea was so you don't have to follow reference chains, but it's TOTALLY FINE, and expected, to not provide this value if the URI is not provided (that it seems was the original intent).

Consumers of the output shouldn't have to do URI resolution to find the keyword location. Downstream tools may mandate that the schema has an $id.

ssilverman · 2020-07-19T18:00:19Z

The only hard part is when the URI in absKeywordLocation is from an $id in the middle of the document, so you still have to hunt for the location. I like to track the ID and the base “loading” URI of the enclosing document. Also, I’ve chosen in my validator to force the user to specify at least a default base URI for each loaded schema (which is likely the retrieval URL).

handrews · 2020-07-19T18:11:03Z

@ssilverman

The only hard part is when the URI in absKeywordLocation is from an $id in the middle of the document, so you still have to hunt for the location.

To some degree that's a schema design problem. You can write unmaintainable garabage in any system 😝

But most people will not be throwing $id around randomly. If they use it anywhere other than the document root at all, it's mostly likely under $defs and would be straightforward to find.

karenetheridge · 2020-07-19T18:54:48Z

@karenetheridge What output do you get if you reference another schema document or you have an embedded schema resource?

In all cases, absoluteKeywordLocation (in my implementation) is the fully-resolved-as-best-as-we-are-able canonical URI of the present location. If there was no base uri provided and no $id anywhere in this schema or parents, it's the json pointer from the document root. If a base uri was provided (either to the implementation separately, or via an $id somewhere above the present location), then the canonical uri is that base uri combined with the json pointer path from wherever the base uri's location is (the document root or the innermost $id). It's the path of shortest traversal from something with a known canonical uri, to the present location (whereas keywordLocation is the full path the schema evaluation took to get there, which may have several $refs and even loops in it). In my implementation I always provide absoluteKeywordLocation in errors/annotations whenever it's different from keywordLocation (which is iff a canonical uri is known, inclusive-or a $ref or $recursiveRef has been traversed).

> > this corrects an oversight from the changes made in json-schema-org#938.

this corrects an oversight from the changes made in json-schema-org#938.

handrews reviewed May 23, 2020

View reviewed changes

handrews added this to the draft-08-patch1 milestone May 23, 2020

handrews added core output labels May 23, 2020

ssilverman changed the title ~~Fix output keyword relative location to be of URI fragment-encoded form~~ Fix output relative locations to be consistent May 24, 2020

ssilverman force-pushed the output-keyword-relative branch 2 times, most recently from ec5fa6f to df7a035 Compare May 24, 2020 01:55

ssilverman requested a review from handrews May 24, 2020 02:00

ssilverman force-pushed the output-keyword-relative branch from 9e96c21 to bb7ebb2 Compare May 27, 2020 01:47

karenetheridge mentioned this pull request Oct 11, 2023

please update with fixes to 2019-09 schemas json-schema-org/website#140

Closed

ssilverman added 3 commits June 11, 2020 07:51

Fix output instance relative location to be of non-URI fragment-encod…

3ffae94

…ed form

Update output examples to use non-URI JSON pointers for non-absolute …

f1f5f63

…locations

Update the output schema with json-pointer replacing uri-reference

cf59d7a

ssilverman force-pushed the output-keyword-relative branch from bb7ebb2 to cf59d7a Compare June 11, 2020 14:52

karenetheridge mentioned this pull request Jun 19, 2020

Issue-949: Fixing inconsistencies in output schema #950

Merged

Relequestual approved these changes Jul 17, 2020

View reviewed changes

Relequestual assigned Relequestual and handrews Jul 17, 2020

handrews approved these changes Aug 19, 2020

View reviewed changes

Relequestual merged commit 9b32946 into json-schema-org:master Aug 20, 2020

karenetheridge added a commit to karenetheridge/json-schema-spec that referenced this pull request Aug 26, 2020

fix verbose output example with json-pointer replacing uri-reference

d94ac2d

> > this corrects an oversight from the changes made in json-schema-org#938.

karenetheridge added a commit to karenetheridge/json-schema-spec that referenced this pull request Aug 26, 2020

fix verbose output example with json-pointer replacing uri-reference

b749600

this corrects an oversight from the changes made in json-schema-org#938.

karenetheridge mentioned this pull request Aug 26, 2020

fix verbose output example with json-pointer replacing uri-reference #974

Merged

ssilverman deleted the output-keyword-relative branch September 5, 2020 09:01

gregsdennis added clarification Items that need to be clarified in the specification and removed Type: Bug labels Jul 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix output relative locations to be consistent #938

Fix output relative locations to be consistent #938

ssilverman commented May 23, 2020

handrews left a comment

ssilverman commented May 23, 2020

karenetheridge commented May 23, 2020 •

edited

Loading

gregsdennis commented May 23, 2020 •

edited

Loading

ssilverman commented May 23, 2020 •

edited

Loading

ssilverman commented May 24, 2020

karenetheridge commented May 24, 2020

handrews commented May 24, 2020

ssilverman commented May 25, 2020

karenetheridge commented May 25, 2020

handrews commented May 25, 2020

ssilverman commented May 25, 2020

karenetheridge commented May 25, 2020 •

edited

Loading

karenetheridge commented May 25, 2020 •

edited

Loading

ssilverman commented Jun 11, 2020

karenetheridge commented Jul 16, 2020 •

edited

Loading

Relequestual commented Jul 17, 2020 •

edited

Loading

Relequestual left a comment

karenetheridge commented Jul 17, 2020

Relequestual commented Jul 18, 2020 •

edited

Loading

karenetheridge commented Jul 18, 2020

handrews commented Jul 18, 2020

karenetheridge commented Jul 18, 2020

handrews commented Jul 18, 2020

karenetheridge commented Jul 18, 2020

Relequestual commented Jul 19, 2020

ssilverman commented Jul 19, 2020

handrews commented Jul 19, 2020

karenetheridge commented Jul 19, 2020 •

edited

Loading

Fix output relative locations to be consistent #938

Fix output relative locations to be consistent #938

Conversation

ssilverman commented May 23, 2020

handrews left a comment

Choose a reason for hiding this comment

ssilverman commented May 23, 2020

karenetheridge commented May 23, 2020 • edited Loading

gregsdennis commented May 23, 2020 • edited Loading

ssilverman commented May 23, 2020 • edited Loading

ssilverman commented May 24, 2020

karenetheridge commented May 24, 2020

handrews commented May 24, 2020

ssilverman commented May 25, 2020

karenetheridge commented May 25, 2020

handrews commented May 25, 2020

ssilverman commented May 25, 2020

karenetheridge commented May 25, 2020 • edited Loading

karenetheridge commented May 25, 2020 • edited Loading

ssilverman commented Jun 11, 2020

karenetheridge commented Jul 16, 2020 • edited Loading

Relequestual commented Jul 17, 2020 • edited Loading

Relequestual left a comment

Choose a reason for hiding this comment

karenetheridge commented Jul 17, 2020

Relequestual commented Jul 18, 2020 • edited Loading

karenetheridge commented Jul 18, 2020

handrews commented Jul 18, 2020

karenetheridge commented Jul 18, 2020

handrews commented Jul 18, 2020

karenetheridge commented Jul 18, 2020

Relequestual commented Jul 19, 2020

ssilverman commented Jul 19, 2020

handrews commented Jul 19, 2020

karenetheridge commented Jul 19, 2020 • edited Loading

karenetheridge commented May 23, 2020 •

edited

Loading

gregsdennis commented May 23, 2020 •

edited

Loading

ssilverman commented May 23, 2020 •

edited

Loading

karenetheridge commented May 25, 2020 •

edited

Loading

karenetheridge commented May 25, 2020 •

edited

Loading

karenetheridge commented Jul 16, 2020 •

edited

Loading

Relequestual commented Jul 17, 2020 •

edited

Loading

Relequestual commented Jul 18, 2020 •

edited

Loading

karenetheridge commented Jul 19, 2020 •

edited

Loading