Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

$merge and $patch for "schema extensions" #15

Closed
fge opened this issue Apr 6, 2016 · 100 comments
Closed

$merge and $patch for "schema extensions" #15

fge opened this issue Apr 6, 2016 · 100 comments

Comments

@fge
Copy link
Contributor

fge commented Apr 6, 2016

And again, those are the two only reliable mechanisms allowing for truly extending schemas in an unambiguous fashion (yes, I love unambiguous definitions). Recall:

Recall:

  • implementations would be REQUIRED to implement $merge;
  • implementations MAY implement $patch.

Why, then, define $patch? Simply because it allows for schema alterations which $merge cannot do. However, $merge answers the vast majority of cases.

Recall of the rules:

  • those keywords take precedence over all other JSON Schema keywords;
  • however, $ref still takes precedence.
@brettz9
Copy link

brettz9 commented Apr 7, 2016

I'd personally prefer requiring the more expressive one, JSON Patch by default (and either also requiring JSON Merge Patch or making it optional). Even though JSON Patch is more cumbersome, null is, imo, not that obscure of a feature to forego if support is being added for one or the other.

@awwright
Copy link
Member

awwright commented Apr 8, 2016

You'll have to clarify what you mean by "And again" because this is an entirely new issue in the tracker. Can you provide some use-cases of what problems this solves?

@fge
Copy link
Contributor Author

fge commented Apr 8, 2016

@ACubed I have mentioned it countless times on the forums, but anyway, here it goes again...

Schema at uri1 reads:

{
    "description": "base schema",
    "type": "object",
    "properties": { "p1": { "type": "string" } },
    "additionalProperties": false
}

Schema at uri2 wants to extend that schema so that p1 has a minimum length of 4, and add a property p2 of type integer:

{
    "$patch": {
        "source": { "$ref": "uri1" },
        "with": [
            { "op": "add", "path": "/properties/p1/minLength", "value": 4 },
            { "op": "add", "path": "/properties/p2", "value": { "type": "integer" } }
        ]
    }
}

That is with JSON Patch.

With JSON Merge Patch that would be:

{
    "$merge": {
        "source": { "$ref": "uri1" },
        "with": {
            "properties": { 
                "p1": { "minLength": 4 },
                "p2": { "type": "integer" }
            }
        }
    }
}

This means there is no need for the hacks of "limiting only to defined properties" etc.

Unambiguous and it Just Works(tm).

And yes, I said "again" since I have proposed this idea for more than a year.

@awwright
Copy link
Member

awwright commented Apr 8, 2016

@fge Please understand it needs to get posted here, too, for the record. Thanks!

@brettz9
Copy link

brettz9 commented Apr 8, 2016

Note that with JSON Merge Patch, null is used to indicate deletion of a value, but there is no mechanism in that specification for adding the value null. So while JSON Merge Patch is a nice option to have around since it is more compact and easier to see at a glance what the changes are, it has this deficiency regarding null values.

@fge
Copy link
Contributor Author

fge commented Apr 8, 2016

@brettz9 yes, I am fully aware of it; however, no JSON Schema keyword currently allows null as a value, so this remains a viable option.

Maybe this will not be the case in v5 though.

@epoberezkin
Copy link
Member

"type": "null" ?

@fge
Copy link
Contributor Author

fge commented Apr 8, 2016

@epoberezkin that is the JSON String "null", not JSON null

@epoberezkin
Copy link
Member

Ah. I now understand what you mean - there is no null in JSON-schema itself indeed.

@erosb
Copy link

erosb commented Apr 8, 2016

@fge "enum" does permit null, doesn't it?

@epoberezkin
Copy link
Member

null can be used in enums, but it is ok for $merge, as it replaces the whole array. Only constant will be the issue, but it's an edge-case really.

@epoberezkin
Copy link
Member

@fge, just to clarify. The result of both $merge and $patch is the patched schema, there is no assumption that the original schema gets modified, correct?

So I can create schemas like this:

{
  "id": "my-meta-schema.json",
  "$merge": {
    "source": "http://json-schema.org/draft-04/schema#",
    "with": {
      "properties": {
        "myCustomKeyword": { "type": "string" }
      }
    }
  }
}

@epoberezkin
Copy link
Member

I agree with @fge that $merge covers literally all real use cases. It doesn't allow removing nulls and adding/removing array items, but I think we can live without it. Drop $patch maybe?

@fge
Copy link
Contributor Author

fge commented Apr 8, 2016

@epoberezkin yes, the result is the patched schema; the original schema is not touched in any way, shape or form.

Of course, this means requiring a preprocessing of some sort, but then so does $ref, and it still retains priority.

@epoberezkin
Copy link
Member

With the current standard it's relatively easy to extend a schema that is not recursive without merge/patch - allOf does good enough job.

But for recursive schemas this approach simply doesn't work, so there is no mechanism to extend meta-schemas and any other recursive schemas, which is a shame... So $merge is really important.

@fge
Copy link
Contributor Author

fge commented Apr 8, 2016

@erosb only as a member in an array.

But then JSON Merge Patch doesn't allow modifications of array elements individually. If you specify an array, the whole array is replaced.

JSON Patch on the other hand allows patching of arrays without a problem.

@fge
Copy link
Contributor Author

fge commented Apr 8, 2016

@epoberezkin that is not correct. Your schema would be:

{
  "id": "my-meta-schema.json",
  "$merge": {
    "source": { "$ref": "http://json-schema.org/draft-04/schema#" },
    "with": {
      "properties": {
        "myCustomKeyword": { "type": "string" }
      }
    }
  }
}

The argument to source is a JSON Schema, not a URI to a JSON Schema.

@epoberezkin
Copy link
Member

Got it. Yes, makes sense. This way you can both patch some third-party schema and also use the same patch to patch multiple schemas without the need to put them in separate files.

@brettz9
Copy link

brettz9 commented Apr 16, 2016

FWIW, JSON Patch also has copy and move, especially useful when one wishes to keep in sync with the referenced document, however it changes... JSON Merge Patch not only can't do this but it is ambiguous as to whether a change was a result of a move or just a deletion+addition

(And incidentally, if there is ever a JSON Merge Patch 2.0, I hope it can avoid the null problem by using something like "\u0000" in place of null (and then require strings to begin with an actual NUL to be preceded by one extra NUL) as well as to allow modification of strings by allowing for inclusion of the previous string, e.g., "Prepending some text to the old value: ${current}", with \${ allowing the literal.)

@awwright
Copy link
Member

@brettz9 (Or just introduce a "delete" operation.)

@brettz9
Copy link

brettz9 commented Apr 17, 2016

@ACubed: JSON Patch has the "remove" operation, but I'm speaking of JSON Merge Patch, which doesn't have--and wouldn't work with--operations per se--it just mimics the appearance (of the altered portions) of the document (as in @fge's example)--with the exception of null which it overrides. Overriding a specific string (and providing an escape option for it, as I mentioned) would allow one to avoid the ambiguity.

@brettz9
Copy link

brettz9 commented Apr 18, 2016

The following is my ideal JSON patch/merge approach which I'll call "Power Patch". It has the full expressivity of JSON Patch, but in a more succinct form, and it can optionally include JSON Merge Patch patches and/or "Whole document" patches (explained below).

The paths mentioned below are, as in JSON Patch, JSON Pointers.

value can be any JSON value.

Order is significant, even for the keys of an object (which, incidentally, ECMAScript now assures).

mergePatchValue would behave as patch objects in JSON Merge Patch, with the exception (discussed earlier) that null could be used as an actual value to add to a document, and a single "\u0000" would trigger deletions instead (with other strings starting with NUL requiring an additional NUL to be added at the beginning). The advantage of this approach is that it allows a clear indication of where modifications are being made without needing to redefine the whole document and without needing to specify the operations (though limited to JSON Patch's add, replace, and remove).

docValue would be shaped similarly to mergePatchValue, but instead of using NUL for deletions, this object would cause any properties which had been present before and which were not specified now to be deleted. Although this might technically not be what we think of as a patch, its value lies in ensuring that each snapshot reflects the whole structure at a given moment, and without explicitly needing to delete unused items. And with its integration into "Power Patch", users have the option of switching to or from this approach as desired.

The rest of the properties (which are inspired by, and expanding on, JSON Patch), offer the advantage of allowing for moving, copying, and testing. Unlike the other approaches, by looking at any given patch, you are able to tell what parts were assumed to be already present (the paths) and what parts are new (the values). In order to allow the JSON Patch properties to work with less redundancy (as in JSON Merge Patch or the "Whole document" method), I've added basePaths which is an object whose key is a base path and whose value is itself a Power Patch object (which can therefore have its own "basePaths" for nesting paths further).

The following is a single patch showing all properties, though each property is optional. These patches could also be put within an array (or an object whose keys are versions) to indicate a sequence of patches to be applied.

{
    merge: mergePatchValue,
    whole: docValue,
    basePaths: {
        path1: {
            // Operations here would use paths (e.g., path2) relative to the base path (i.e., path1)
            add: {
                path2: value
            }
            basePaths: {
                path2: {
                    replace: {
                        path3: value
                    }
                }
            }
        }
    },
    test: {
        "path1": value
    },
    add: {
        "path1": value
    },
    replace: {
        "path1": value
    },
    move: {
        "fromPath1": "destinationPath1"
    },
    copy: {
        "fromPath1": "destinationPath1"
    },
    remove: ["path1"]
}

The only disadvantage I see to this approach (as compared to JSON Patch) is that if you want to allow some kind of custom options to a given operation, you'd have to add them as a property outside of the operation object (e.g., copyOptions). But I think the succinctness is worth it.

@brettz9
Copy link

brettz9 commented Apr 18, 2016

One other consideration: as pertains to JSON Schema and if this format were to also be used for versioning (especially for offline storage like IndexedDB), move and copy might also be used as a signal to the application that the content itself ought to be moved or copied when there is a schema change (since the other operations would not provide enough information, e.g., whether a remove+add was deleting the schema property and its associated content or just moving it).

@fge
Copy link
Contributor Author

fge commented Apr 18, 2016

@brettz9 there is a reason why JSON Patch uses arrays, and your proposal demonstrates it: you cannot rely on keys order.

In your example it is requested that basePaths be evaluated before anything else... And that won't happen!

@brettz9
Copy link

brettz9 commented Apr 18, 2016

Actually, ECMAScript is being redefined to allow for predictable iteration order: http://tc39.github.io/ecma262/#sec-ordinary-object-internal-methods-and-internal-slots-ownpropertykeys . (And FWIW, in my example, it is not basePaths alone that needs to work first--that is just an optional item, but yes, iteration order does need to be reliable, but thankfully, it appears that with IE no longer a sticking point, iteration order is becoming reliable again...)

@fge
Copy link
Contributor Author

fge commented Apr 18, 2016

@brettz9 ECMAScript maybe. JSON? Won't happen, for backwards compatibility reasons.

@brettz9
Copy link

brettz9 commented Apr 18, 2016

Yeah, thanks, @fge, I do see the JSON spec mentions it being "unordered".

(And FWIW, I may have mistakenly repeated this comment since for-in order states at https://tc39.github.io/ecma262/#sec-enumerate-object-properties (which is indicated by https://tc39.github.io/ecma262/#sec-runtime-semantics-forin-div-ofheadevaluation-tdznames-expr-iterationkind ) that , "The mechanics and order of enumerating the properties is not specified but must conform to the rules specified below" and the only rule relevant to OwnPropertyKeys is "EnumerateObjectProperties must obtain the own property keys of the target object by calling its [[OwnPropertyKeys]] internal method"--which doesn't suggest for-in must iterate in the order of those keys. However, one could apparently use Object.getOwnPropertyNames safely as that refers to https://tc39.github.io/ecma262/#sec-getownpropertykeys which relies on OwnPropertyKeys which does abide by the first-in order, more or less.)

I hope there will be a JSON 2.0 which fixes this and the separator problem.

But regardless, the format proposed here can still be readily modified to be JSON-friendly (and even preferable to my earlier version since it allows multiple operations of the same type at the same level, e.g., there could be one add at the beginning and one at the end--and it also allows easier access to the operation type, and also if custom options were needed, they could be added as a third argument):

[
    ["basePaths", [
        ["path1", [
            // Operations here would use paths (e.g., path2) relative to the base path (i.e., path1)
            ["add", [
                ["path2", value]
            ]],
            ["basePaths", [
                ["path2", [
                    ["replace", [
                        ["path3", value]
                    ]]
                ]]
            ]]
        ]]
    ]],
    ["test", [
        ["/path1", value]
    ]],
    ["add", [
        ["/path1", value]
    ]],
    ["replace", [
        ["/path1", value]
    ]],
    ["move", [
        ["/fromPath1", "/destinationPath1"]
    ]],
    ["copy", [
        ["/fromPath1", "/destinationPath1"]
    ]],
    ["remove", ["/path1"]]
]

@fge
Copy link
Contributor Author

fge commented Apr 18, 2016

@brettz9 but why?

JSON Patch is an RFC, and implementations exist. Why go to the trouble of defining another alternative?

@erayd
Copy link

erayd commented Aug 30, 2017

@handrews Per your request here:

The reason $use is unsuitable for my purposes is because I want some way of achieving full inheritance, including overriding of validation keywords.

For example, $use can't do this:

{
  "definitions": {
    "parentSchema": {
      "description": "lots of complex validation rules that I don't want to copy / paste in a zillion places",
      "properties": {
        "propertyOne": {"type": "integer"},
        "propertyTwo": {"type": "string"},
        "propertyFiveHundred": {"type":"string", "pattern": "^complexRegex$"}
      }
    }
  },
  "properties": {
    "childOne": {
      "description": "same as /definitions/parentSchema, but without propertyOne, and with a different propertyTwentyFive"
    },
    "childTwo": {
      "description": "same as /definitions/parentSchema, but without propertyEightyOne and a different regex for propertyFiveHundred"
    }
  }
}

There's nothing wrong with $use if all you want to do is validate a ton of objects against a single schema, but give them all e.g. different descriptions for UI generation. However, it's patently unsuitable if you want to reuse an existing schema with minor changes to the validation logic. $use only solves a small subset of the issues that $merge / $patch can address completely.

@handrews
Copy link
Contributor

@erayd $use is definitely not attempting to solve those problems (intentionally).

Looking at your "but without..." are you trying to add, change, and remove properties? That's a surprising definition of "inheritance" to me. Usually it should be possible to treat a child as a parent: it should validate against the parent's schema, and behave the same way semantically to the extent that the behavior is defined by the parent.

How would you describe your desired parent/child behavior?

@erayd
Copy link

erayd commented Aug 30, 2017

@handrews

Yes, I know that $use isn't attempting to solve inheritance. However, based on previous discussion that I've read here, and please correct me if I'm wrong, there seems to be a big push to select either $use, or $merge / $patch - i.e. only one of those options will be implemented, and if $use is the chosen path, then inheritance will never happen.

...it should validate against the parent's schema...

I strongly disagree with this statement. If you override something in a child schema to behave differently (e.g. change the value of pattern, or change a required type) then something that validates against the child may well be invalid against the parent.

That said, arguing the semantics of how inheritance is defined is IMO irrelevant, the important thing is that the functionality is desirable, and $use can't do it.

How would you describe your desired parent/child behavior?

Removal is a nice-to-have, but I can do without it, it just saves a bit of duplication if it's available, and I know that $merge / $patch can do it. Changing stuff is essential, as is adding things.

@handrews
Copy link
Contributor

@erayd My apologies, I kind of intentionally left the outcome vague, which I realized (at the time, but forgot momentarily) was likely to produce either/or arguments. In a lot of cases I want that. But it's definitely possible that we will do more than one thing, or nothing at all (decisively, as in close all of it and declare things out-of-scope of validation, but perhaps in scope for code generation).

As for the definition of inheritance... in most languages, an instance of a child class can be treated as an instance of the parent class. That's what inheritance means to me.

What you describe is what I would call monkey-patching (that's the term in Python, not sure how common it is elsewhere). You take a thing, perform arbitrary edits on its data and methods, and use the resulting thing. It's a very important technique for many test frameworks, but I would not consider it inheritance. That's not to dismiss it as a use case, I'm just trying to sort out terminology here.

One of the main objections to $merge/$patch from many is that it is overly powerful (monkey-patching vs classical inheritance, for lack of a better term).

@erayd
Copy link

erayd commented Aug 30, 2017

@handrews

I see what you mean. For what it's worth, implementing both $use and $merge / $patch feels messy to me, noting that $use is essentially just a restricted subset of $merge. However, I'd far rather see both of them than be stuck with $use only.

As for the definition of inheritance... in most languages, an instance of a child class can be treated as an instance of the parent class. That's what inheritance means to me.

Are you able to provide some examples of what you mean? Your definition of inheritance doesn't sound like one I've ever encountered, unless I'm badly misunderstanding you.

Most implementations of inheritance I've come across don't force children to be treatable as an instance of the parent class, they only insist that the API be the same (or a superset). For example, a method that is overridden in a child class must have a declaration and return type that is compatible with the parent, but the actual implementation of that method can be anything you want -- and when called with the same arguments may return a very different result than the parent would. C++, Java and PHP are some examples of languages that work this way.

JSON schema doesn't exactly fit the classical definition of a class very well, as it has no API to speak of - perhaps that's why we're getting muddled over definitions?

@handrews
Copy link
Contributor

@erayd it's mostly that we're using different definitions of "behave the same", I think.

I'm not talking about the implementation of methods at all, those are irrelevant. But the semantics should be compatible, meaning that the child should not violate the parent's semantics (although the parent need not entirely satisfy the child's). Of course that's not enforced by the languages, but violating that would get thrown out in code review if not sooner in any reasonable environment.

JSON Schemas are constraint systems, so a "child" set of constraints should be the same as or more restrictive than a parent. Otherwise you're just splicing stuff together and it cannot reasonably be called inheritance in my view. That doesn't mean you can't want that splicing feature, and that sort of splicing is exactly what this issue proposes. Which is probably why it's the most controversial option.

@erayd
Copy link

erayd commented Aug 30, 2017

@handrews I think I see what you're getting at now :-).

JSON schema already allows for making more restrictive subsets via allOf, but I do feel there's a place for something that allows creating derived schemas that are not purely a subset of the parent. $merge / $patch feels like an ideal solution to this for me, but I'd be open to other ways of achieving the same outcome.

Would having a keyword to disable this kind of derivation resolve some of the controversy? Perhaps something like only allowing derivation from schemas which do not have "final": true set.

@Relequestual
Copy link
Member

If we accept this proposal, we should focus on making sure we have the required tests, and note the instnaces where we expect implementers to not support and thow errors. @epoberezkin
already has tests in his implementaion.

@handrews
Copy link
Contributor

@erayd I would be more likely to support this if there is a provision for opting out. I think I'd prefer something like "$atomic": true, which implies that this is not something that is split-able (short of a supercollider ;-)

I don't like final because it implies inheritance which I obviously think is the wrong conceptual model for these keywords.

Fundamentally, I don't need this feature (particularly not if "readOnly" becomes an array, which solves my biggest problem), so my main concern is being able to prevent people from abusing it.

If we have it anyway, it would remove the need for a specific solution to re-using partial Link Description Objects, I suppose.

@Relequestual agreed on the need for really good tests.

@handrews
Copy link
Contributor

I'm trying to think about how to balance this, and the best thing I can come up with is to:

  1. Implement "$schema" as an array, which is discussed (with other options) in Understanding extended meta-schemas #314.
  2. Allow using the array form to independently signal support for plain validation or hyper-schema, plus support for "$merge"/"$patch"

This would provide a mechanism for schema authors to declare whether monkeypatching is allowed, and allow implementations to choose whether to support "$merge"/"$patch" independent of everything else. We could even allow declaring support for "$merge" and "$patch" separately. I confess that while I would like to define "$patch" if we define "$merge", in practice I only used "$merge" when I used anything like this at all. Which I have not for several years.

@handrews
Copy link
Contributor

Alternatively, as the full problem being considered in #314 is rather complex, another option would be to add support for feature declaration:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$requires": ["$merge"],
  "type": "..."
}

This is a more straightforward solution to allow schema authors and implementations to negotiate behavior. By declaring that the schema requires "$merge" support, the schema author ensures that their schema will either be interpreted correctly, or rejected with a clear error (unsupported optional keyword). "$patch" would be another optional requirement.

This is different from "format" support, as an implementation that does not validate based on "format" will still work with schemas that use "format", and since "format" is useful for conveying semantics to applications, it is still a useful keyword even when validation support is not present.

On the other hand "$merge" and "$patch" are structural keywords. Attempting to interpret a schema that uses either without supporting them will not end well :-P

But requiring everyone to implement them, particularly "$patch', is burdensome. Especially when many people have been quite clear that they/we neither need nor want this functionality.

@handrews handrews added the core label Aug 31, 2017
@erayd
Copy link

erayd commented Aug 31, 2017

@handrews I really like that idea. Seems to me that feature declaration is a good middle ground, and it allows for adding things like $merge to the standard without requiring universal support. It also makes custom extensions easier to manage.

I do feel that the feature list needs to be namespaced somehow, to ensure that custom stuff can't conflict with 'official' names for things. Possibly using something like 'org.json-schema.$merge', which is both easily understood and is already a common namespacing format.

@handrews
Copy link
Contributor

@erayd yeah, that would make sense. I was just thinking of the $ being "our" namespace, but this would be more flexible.

This could also easily allow someone to declare support for something like "$use" if they want a less powerful tool. "$use" is actually more complex to implement, as it needs to detect when it's being, er... mis-used :-) But it would give us (or a 3rd-party) the ability to add it in a way that is compatible with what we have so far.

I'm going to file the feature flag thing as it's own issue as I'm sure a lot of people ignore this issue as it's approaching triple-digit numbers of comments. I'm tempted to throw in a few more just to hit 100...

@epoberezkin
Copy link
Member

see ajv-validator/ajv-merge-patch#14,

TL;DR: allow an array of schemas to be merged as well as one schema. Users seem to need it.

@handrews
Copy link
Contributor

@epoberezkin since $merge and $patch are defined in terms of media types, their behavior should not be changed. However, it should be possible to define behavior in terms of a further media type (for instance, Kubernetes defines their own patch media type which is like merge-patch+json but merges arrays based on some sort of internal code annotations).

One option might be to drop the dual keywords and instead just use one of them, and add a third field that indicates the media type of the with clause. That would be flexible and extensible (implementations could register additional patch media types like registering additional formats).

@epoberezkin
Copy link
Member

yes, the idea is to allow array in with, should not require media type change I think

@handrews
Copy link
Contributor

At the end of the vote-a-rama, I said that I would consolidate these issues to focus the discussion for draft-08. I've filed #515 which outlines the two possible competing approaches. It's pretty abstract, but once we choose an approach, deciding which exact keyword and behavior we want should be less controversial. Therefore I'm closing this in favor of #515.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests