Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive schema #117

Closed
wants to merge 4 commits into from
Closed

Conversation

RafaeLeal
Copy link

PR to start a conversation about recursive schemas, a prerequisite for json-schema->malli conversion

I added an entry for base-registry that accepts :recursive as a schema

This enables us to define recursive schemas even without using the registry.
Something like

[:map
  [:name string?]
  [:parent {:optional true} :recursive]]

I think we need to deal to schemas like
:recursive, [:and :recursive], [:or :recursive] or anything that doesn't have any real value and can lead to StackOverflow on evaluation

I was thinking in maybe support loop/recur, with something like:

[:map 
  [:family {:malli/loop :family-member}
    [:map 
      [:name string?]
      [:parent [:recursive :family-member]]]]]

What do you think?

@RokLenarcic
Copy link
Contributor

This would only allow one to recurse schemas that are in direct line. It would not allow for mutual recursion or recursion of siblings, I think.

Imagine you are serializing a tree where nodes are organizations and people. People can have organizations as children and organizations can have people as children. How would you define such a schema in Malli?

@RafaeLeal
Copy link
Author

Interesting. I haven't thought of that.
We could make :malli/loop receive a vector on :and and :or.

[:or {:malli/loop [:organization :person]}
   [:map
    [:organization/name string?]
    [:organization/children [:recursive :person]]]
   [:map
    [:person/name string?]
    [:person/children [:recursive :organization]]]]

What do you think?

@RokLenarcic
Copy link
Contributor

RokLenarcic commented Nov 15, 2019

Well generally the easiest way to solve this is to provide a way to reference a schema a context, and have the reference be used lazily. Ideally it would work like letfn macro.

; letfn like macro
(let-schema
  [::org [:map
             [:organization/name string?]
             [:organization/member [:vector [:ref ::person]]]
   ::person [:map
                   [:person/name string?]
                   [:person/parent [:ref ::person]]
                   [:person/orgs org [:vector [:ref ::org]]]]
(def person-schema (m/schema [:ref ::person]))
(def org-schema (m/schema [:ref ::org]))

This is cute, but there's many roadblocks for this, mainly this is not trivially serializable. This solution would play well with spec because they have a global registry, so to reference another schema, you could have [:ref ::org] and it would load ::org from the global registry lazily.

I am not sure how this could be implemented, but this syntax is what I think is a desirable syntax.

Overall I wouldn't call a recursive definition a loop or malli/loop, because loop is a procedural concept, and here we have a declarative schema specification.

@ikitommi
Copy link
Member

ikitommi commented Nov 16, 2019

Quick thinking aloud:

  • If the :ref is referencing a key in the :registry, it should work ok. There should be a way to validate a (immutable) registry value to see that references are wired correctly. Don't want to blow up at runtime for a missing referenced schema
  • for local recursion, what if the schema properties had a key :registry that is merged into current registry for all child schemas? this would allow to define:
[:and
 {:registry {::org [:map
                    [:organization/name string?]
                    [:organization/member [:vector [:ref ::person]]]]
             ::person [:map
                       [:person/name string?]
                       [:person/parent [:ref ::person]]
                       [:person/orgs [:vector [:ref ::org]]]]}}
 [:ref ::person]]
  • :id, :key, :name or :ref could also be used to support local recursion:
[:map {:ref ::country}
 [:name [:enum "FI" "PO"]]
 [:neighbors [:vector [:ref ::country]]]]

@ikitommi
Copy link
Member

ikitommi commented Nov 17, 2019

More though: do we even need the :ref at all? All schemas are looked by default from the registry, so just pointing to schema name should work.

with local registry

[:and
 {:registry {::org [:map
                    [:organization/name string?]
                    [:organization/member [:vector :person]]]
             ::person [:map
                       [:person/name string?]
                       [:person/parent ::person]
                       [:person/orgs [:vector ::org]]]}}
 ::person]

self recursion

[:and
 {:registry {::country [:map
                        [:name [:enum "FI" "PO"]]
                        [:neighbors [:vector ::country]]]}
  ::country}]

shortcut form (more compact, but not sure if this is really needed?)

[:map {:ref ::country}
 [:name [:enum "FI" "PO"]]
 [:neighbors [:vector ::country]]]

@RafaeLeal
Copy link
Author

Interesting. I like the direction that we're going for.
IMO feels weird to have a :and just for having properties. I'd add a generic :with-properties or specific :with-local-registry.

Either way, we need to define what is going to happen if you define something on the local registry that is already on the registry. Probably we should just override, but this could lead to confusion on large schemas with multiple local registries.

We would probably have a lot of code like this, also:

(def schemas {::org    [:map
                        [:organization/name string?]
                        [:organization/member [:vector :person]]]
              ::person [:map
                        [:person/name string?]
                        [:person/parent ::person]
                        [:person/orgs [:vector ::org]]]})

(def person [:and {:registry schemas} ::person])
(def org [:and {:registry schemas} ::org])

We could have a function malli.core/schemas that receive this registry and return a map with the schemas with the local registries.

Something like

(malli.core/schemas
  {::org    [:map
             [:organization/name string?]
             [:organization/member [:vector :person]]]
   ::person [:map
             [:person/name string?]
             [:person/parent ::person]
             [:person/orgs [:vector ::org]]]})

Regarding the need for a :ref, we need a point for lazy-loading the rest of the schema. This is an explicit way of saying this, but we could try to do something smarter.

@ikitommi
Copy link
Member

There was an earlier discussion about :schema or :with-properties schema, then considered as not-useful-enough. Agree on possible problems with registry collisions, but having a let like functionality would be coherent with Clojure. Maybe the opinion could be lifted to the library user via new option but still would need a default: allow or fail by default. Could be decided after using the feature at some time in pre/alpha (like most of the emerged decisions at malli).

@ikitommi
Copy link
Member

to go all in in aligning with clojure as a language, there would be :let schema, allowing silent overrides and a :lazy or :recur schema for references. Not suggesting anything, just thinking aloud :)

@nilern
Copy link
Contributor

nilern commented Nov 20, 2019

TL;DR: :ref is unnecessary. Recursion through registry is trivial. Local recursion should use :letrec scoping schema and be compiled to backpatching. Local recursion might not even be necessary at all.

Recursion can be implemented by

  1. Indirection: e.g. recursive defn:s first create the Var, then compile the function with calls to deref* for names that refer to vars (the Var object is resolved at compile time). I recall that in Lua global references go through the toplevel environment hashtable, basically the same except the string lookup happens at runtime.
  2. backpatching: First create values with mutable fields set to null, then initialize them with references to each other. I imagine this is how letfn works since Scheme letrec is definitely usually implemented like this (although RnRS does not specify the null/uninitialized value).
  3. fixpointing: Y-combinator etc. which I won't go into since it is hard to explain. Often it is also too limited or even slow.

None of these methods need explicit :ref, that sort of thing is only needed if you want separate compilation of parameterized things that are later linked together without indirection**. The compiler knows where names are bound whether they are local or global (at the moment they are all global -- bound in the registry).

We should probably use indirection for recursion through globals, i.e. just look up the schema from the registry at runtime or whatever the current implementation is. This allows interactive flexibility where things just keep working with a slightly different registry, just like namespaces.

Local recursion should probably be modeled after letfn, perhaps as the traditional :letrec:

[:letrec [:tree [:or :int :branch]
          :branch [:seq :tree]]
  :tree]

It seems natural to also implement it the same way, with backpatching. That would avoid the need for nested registries and runtime lookup. It is also obvious that this sort of scoping construct will do name shadowing. And making it a named schema like this means that it is trivially serializable.

I am not sure whether we need local recursion at all. I can't think of any schema library that has that capability or a use case where it would be absolutely necessary. Type theory has a concept of first-class recursive types with a fixpoint construct but practical statically typed languages never implement that; recursive types require global (or class/module level) definitions. But if we do decide to add :letrec, why not throw :let in there too.

I agree that :loop is not declarative. Nor would it be as generally useful here as in Clojure code. It would just allow schema checking for linked lists that are large enough to cause stack overflow. We hardly ever use hand-rolled linked lists since we have vectors, sequences and all that good stuff. Also the reasons Clojure does not do tail call optimization do not apply to malli so our schema compilers could do tail call optimization 😄

I think the local registry concept is too dynamic and confusing when compared to scoping schemas. Local recursion does not need the flexibility that registries/namespaces provide. It's kind of like R versus Scheme where R is weird and slow and Scheme is so elegant and fast 😜

*: Not actually the polymorphic deref but the Var-specific method.
**: See https://people.mpi-sws.org/~dreyer/papers/recursion/popl.pdf and https://people.mpi-sws.org/~rossberg/mixml/mixml-toplas.pdf for some extremely heavy reading on this.

@RafaeLeal
Copy link
Author

I agreed that :loop is not declarative and its a procedural concept that does not fit here.

I think that local recursion is interesting because I assume that we want serializable schemas to pass them around multiple nodes/instances. If we have only support recursion referencing the registry, we may need to sync all nodes registries, which could be a problem.
But maybe I'm just overthinking it.

@ikitommi
Copy link
Member

The registry map is always passed as a value, so it can be considered as immutable. The schema defined/local registries are serialized with schemas.

@nilern
Copy link
Contributor

nilern commented Nov 21, 2019

I think that local recursion is interesting because I assume that we want serializable schemas to pass them around multiple nodes/instances. If we have only support recursion referencing the registry, we may need to sync all nodes registries, which could be a problem.

Interesting point, I was not thinking of such dynamic use cases even though they are a key novelty in this library. I do suspect that some degree of registry syncing is inevitable.

The registry map is always passed as a value, so it can be considered as immutable. The schema defined/local registries are serialized with schemas.

I agree that local registries can be implemented and with little effort. I just find that they are too low level, which has both UX and performance implications:

  1. The registry should be just an implementation detail to schema authors, not leak into the schemas so that you have to think of a stack of registries or registry merge. Whereas the :letrec construct has higher level, obvious semantics: it is just like letfn, but with schemas.
  2. Both options can be implemented with a stack/merge of registries, but the :letrec one does not have to be. I think we want a 'zero cost abstraction'; you cannot hand-code it any faster. But if you use a stack of registries, I imagine you could hand-code a faster checker/coercer/etc. by using letfn or even loop!

@ikitommi
Copy link
Member

If we start with syntax: I would like to see better alternatives to:

[:and
 {:registry {::org [:map
                    [:organization/name string?]
                    [:organization/member [:vector :person]]]
             ::person [:map
                       [:person/name string?]
                       [:person/parent ::person]
                       [:person/orgs [:vector ::org]]]}}
 ::person]

, which is IMO, clean and simple.

@nilern
Copy link
Contributor

nilern commented Nov 21, 2019

, which is IMO, clean and simple.

The :letrec one is even more so (although I am not sure if it has the right amount of brackets or braces). :and has nothing to do with local definitions. Or do you propose allowing :registry in the property map of any schema? That would be madness, so easy to miss either the local definitions or the main schema when skimming.

@ikitommi
Copy link
Member

@ikitommi
Copy link
Member

So, this is bad:

[:and
 {:registry {::org [:map
                    [:organization/name string?]
                    [:organization/member [:vector :person]]]
             ::person [:map
                       [:person/name string?]
                       [:person/parent ::person]
                       [:person/orgs [:vector ::org]]]}}
 ::person]

and this is good:?

[:letrec
 [::org [:map
         [:organization/name string?]
         [:organization/member [:vector :person]]]
  ::person [:map
            [:person/name string?]
            [:person/parent ::person]
            [:person/orgs [:vector ::org]]]]
 ::person]

@nilern
Copy link
Contributor

nilern commented Nov 21, 2019

Good and bad are such relative terms. Local registries just feel like an unnecessarily leaky and confusing abstraction, to me.

@nilern
Copy link
Contributor

nilern commented Nov 21, 2019

Since the syntactic form of schemas (i.e. vectors and keywords) can be combined and dissected in arbitrary ways, any local scoping construct will have the same variable hygiene issues as macros. Properly tackling that (or even preventing common classes of errors with ad-hoc methods) would be quite a bit of yak-shaving.

I would find it acceptable that you can add groups of recursive definitions to the registry in a "transaction" (like ML val rec / let rec). However if you prefer the first-class data usage of Schema to the register everything approach of Spec this could feel arbitrary and annoying.

@green-coder
Copy link
Member

One reason why local recursion is useful and not just interesting, it is that it allows a better composition of schemas, in the exact same way Clojure's s-expressions compose together.

Another advantage is that on large schemas, local recursion allows the user to think locally. I can easily imagine people using Malli to model their entire front end's data store, where many independant business logic parts are sitting next to each other, some of them being recursive data structures that other parts don't have to know.

@ikitommi
Copy link
Member

ikitommi commented Jun 18, 2020

Did a quick poke on this, using :ref schema and :id property on schemas. Both -validate and -explain can be made lazy, so the impl is quite trivial. Methods like -accept is eager and need to have local bookkeeping not to walk over same models again and run into infinite loop.

But, currently looks like this:

(def ConsCell
  [:maybe {:id :cons}
   [:tuple int? [:ref :cons]]])

(m/explain ConsCell [1 [2 [3 [4 nil]]]])
; => nil

(m/explain ConsCell [1 [2 [3 [4]]]])
;{:schema [:maybe {:id :cons} [:tuple int? [:ref :cons]]],
; :value [1 [2 [3 [4]]]],
; :errors (#Error{:path [2 2 0 2 2 0 2 2 0 2],
;                 :in [1 1 1],
;                 :schema [:tuple int? [:ref :cons]],
;                 :value [4],
;                 :type :malli.core/tuple-size})}

Qurestion: if the same :id is used with different levels, should it throw or should the last one win? (like with clojure let)

@rschmukler
Copy link
Contributor

My gut would be to throw instead of letting last writer win. I could imagine that being a very confusing "bug" to try and track down when using the library. I don't see much upside in allowing the flexibility of overwriting - but maybe others see a use case I'm not seeing?

@green-coder
Copy link
Member

I think that the inner definition should win. If treat it as error, it means that the user can no longer blindly combine schemas together.

@ikitommi
Copy link
Member

Mutual recursion needs a way to define schema graphs. Maybe something like:

(explain
  [:registry
   {:schemas [[:maybe {:id :ping} [:tuple [:eq "ping"] [:ref :pong]]]
              [:maybe {:id :pong} [:tuple [:eq "pong"] [:ref :pong]]]]}
   [:ref :pong]]
  ["ping" ["pong" ["ping" nil]]])
: => nil

or:

(explain
  [:registry
   {:schemas {:ping [:maybe [:tuple [:eq "ping"] [:ref :pong]]]
              :pong [:maybe [:tuple [:eq "pong"] [:ref :pong]]]}}
   [:ref :pong]]
  ["ping" ["pong" ["ping" nil]]])
: => nil

Will make a separate PR of the impl. let's keep this open as the history.& discussion.

@rschmukler
Copy link
Contributor

Regarding the ability to blindly combine schemas together - that's a good consideration.

Is the worry for libraries that both provide schemas, or do you anticipate seeing this at the application level?

It's also worth considering that a user wouldn't be "stuck" - at the application level, I'd argue that they should just rename one of the IDs (or keep them in separate registries). At the library level they'd be required to perhaps modify one of the schemas using some of the malli.utils functions.

I'm not necessarily saying that we favor this approach, but just wanted to point out that unlike something like spec, developers can get around two libraries providing the same IDs. I would expect that registries are controlled by applications, not libraries.

Having to do the above work-around would definitely be work - and boilerplatey work at that - but it's work that could be directed by malli's error messages, as opposed to perhaps confusingly have one library unknowingly overwrite the other.

I'm not super strong in my convictions here - just enumerating some thoughts out loud.

@green-coder
Copy link
Member

green-coder commented Jun 19, 2020

@ikitommi I would recommend the second one, as order between bindings is not needed. AFAIK, the first one has no practical use case. I had initially implemented the first one in minimallist, but then changed it to the second one after some trials.

@ikitommi
Copy link
Member

ikitommi commented Jun 22, 2020

Not happy that the Option 2 would adds ambiguity. What is a schema has also :id?

[:registry
 {:registry 
  {:ping [:maybe {:id :pong} [:tuple [:= "ping"] [:ref :pong]]] ;; :ping or :pong?
   :pong [:maybe [:tuple [:= "pong"] [:ref :pong]]]}}
 [:ref :pong]]

I think the local definition (e.g. the key in :registry should win here), resulting:

{:ping [:maybe {:id :pong} [:tuple [:= "ping"] [:ref :pong]]]
 :pong [:maybe [:tuple [:= "pong"] [:ref :pong]]]}

, BUT if the :id attribute is not removed, it will cause problems when walking the schemas as the code might think the first one is still :pong. One option would be to rewrite the registry here, but it's extra code and doing that silently would be bad idea IMO, might not be the resolution user would expect. I'm starting to agree with @rschmukler that the default should be to throw. The action (to throw or to make the last one win) could be configured via options?

There would be three ways to do recursion, too many ways?

  1. schema-defined: using schema :id as :ref target
  2. local: using :registry schema :registry map keys as :ref target
  3. registry-based: using the normal schema registry entries

the 3 is a must anyway and I believe 2 is too - to allow sending schema graphs over the wire. 1 would be nice to avoid boilerplate, e.g. instead of:

[:registry
 {:schemas {:cons [:maybe [:tuple int? [:ref :cons]]]}}
 [:ref :cons]]

one could say:

[:maybe {:id :cons}
 [:tuple int? [:ref :cons]]]

If all three would be supported, the precedence could be 2->3->1 and the default would be to throw on overriding definitions? using fully qualified keys (or gensym) would allow fully local names that don't conflict.

Far from being sure how to implement this, so comments most welcome.

@ikitommi
Copy link
Member

ikitommi commented Jun 22, 2020

Related, from Slack:

About recursive schemas. It seems that because of those, should to to add a (-children [this]) method into Schema protocol. Why? Currently, there is generic m/children function that returns the Schema AST for the children. The AST is unaware of any instance bindings such as local recursion targets. Given a Schema:

[:maybe {:id :cons}
 [:tuple int? [:ref :cons]]]

, the generator for :tuple just sees children of (int? [:ref :cons]) and doesn’t know what :cons refers to and fails.

We could hack around this by pulling out options that created :tuple and using those to re-create the child schemas, providing all the needed local context. But, this is error-prone as one needs to wire-up all generators, visitors etc. using the original options from the father schema. Returning actual schemas instead of AST. m/map-entries already returns schemas, not AST:

(-> (m/map-entries [:map [:x int?]]) (first) (last) (type))
; => :malli.core/schema

(-> (m/children [:map [:x int?]]) (first) (last) (type))
; => clojure.lang.Symbol

(currently passing the local context via options, which seems to be a good way to do it)

Anyway, this works now:

malli-rec

with recursion, one can set the recursion limit, just like with spec. One setting, but each :ref is counted separatly, defaulting to 10.

@ikitommi
Copy link
Member

ikitommi commented Jun 22, 2020

One more thing related to silent overrides. There are two thing: overriding schemas in registry and just overriding :ref targets. Silent override of schemas in registry would enable things like this:

(m/validate
  [:registry
   {:registry {:and (m/-or-schema)}}
   [:and int? string?]]
  "fail")
; => true

@hkupty
Copy link

hkupty commented Jun 22, 2020

I've been experimenting writing a json-schema -> malli layer (future PR might come if I'm successful) and I guess what I'm missing is this. The {:id :thing}, [:ref :thing] pair maps well to json-schema, so I guess that'd be the way to go. Looking forward to this being merged.

@ikitommi ikitommi mentioned this pull request Jun 22, 2020
6 tasks
@ikitommi
Copy link
Member

looking forward to that @hkupty ! The new draft PR in #209 .

@ikitommi
Copy link
Member

Checked the mutual recursion case, looks like quite simple things: eaea510:

(mg/generate
  ::ping
  {:registry (mr/composite-registry
               (m/default-schemas)
               {::ping [:maybe [:tuple [:= "ping"] [:ref ::pong]]]
                ::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]})
   :size 7, :seed 86})
; => ["ping" ["pong" ["ping" ["pong" ["ping" nil]]]]]

tested also a fail-fast on ambiguity with refs, so instead of:

(mg/generate
  ::ping
  {:registry (mr/composite-registry
               (m/default-schemas)
               {::ping [:maybe {:id ::pong} [:tuple [:= "ping"] [:ref ::pong]]]
                ::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]})
   :size 7, :seed 86})
; => ["ping" ["ping" ["ping" ["ping" ["ping" nil]]]]]

the default code will throw instead:

Execution error (ExceptionInfo) at malli.core/fail! (core.cljc:80).
:malli.core/ambiguous-ref {:type :ref, :ref :user/pong}

@ikitommi
Copy link
Member

One idea about the ambiguity. Should there be a way to define some ref as being local? having an explicit local reference type, would allow fully local recursion (and normal references) without having errors with registry-level things. Something like:

1) new :local type:

(mg/generate
  ::ping
  {:registry (mr/composite-registry
               (m/default-schemas)
               {::ping [:maybe {:id ::pong} [:tuple [:= "ping"] [:local ::pong]]] ;; looks just local refs
                ::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]})
   :size 7, :seed 86})
; => ["ping" ["ping" ["ping" ["ping" ["ping" nil]]]]]

2) type-tagging :ref

(mg/generate
  ::ping
  {:registry (mr/composite-registry
               (m/default-schemas)
               {::ping [:maybe {:id ::pong} [:tuple [:= "ping"] [:ref {:type :local} ::pong]]] ;; looks just local refs
                ::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]})
   :size 7, :seed 86})
; => ["ping" ["ping" ["ping" ["ping" ["ping" nil]]]]]

@ikitommi
Copy link
Member

This works now too:

(mg/generate
  [:registry
   {:registry {::ping [:maybe [:tuple [:= "ping"] [:ref ::pong]]]
               ::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]}}
   [:ref ::ping]]
  {:size 7, :seed 86})
; => ["ping" ["pong" ["ping" ["pong" ["ping" nil]]]]]

comments/suggestions on :registry syntax welcome.

love when things are simple to implement: f7f760e.

@hkupty
Copy link

hkupty commented Jun 23, 2020

Seems like the ambiguity issue is regarding a "name" reference and an "id" reference. Wouldn't introducing wording like "local" make things confusing? if [:ref ::thing] tries to a schema either named as ::thing or with a ::thing id attribute, maybe something like [:ref {:lookup :id} ::thing]? or maybe always lookup to id, requiring ref-able schemas to have {:id ::thing}?

@rschmukler
Copy link
Contributor

I personally lean towards the [:ref {:type :local}] rather than the new :local type - since it seems to me that it is the same type with just a different configuration of how it behaves. This feels consistent with other property based modifications in the library (eg. the latest options in :string for ensuring length, trimming, etc.).

Regarding @hkupty's comment, I think as proposed by @ikitommi the :id is required for something to be locally referable. I think it may not be worth the overhead of making this field configurable as I imagine it could be extra confusing to have different schemas using different properties for it. That makes the consumer of [:ref] not only have to be aware of the value of the property, but the property itself.

I do wonder if the property :id is perhaps to generic / overloaded. Maybe :malli/local-id or something? Other libraries (or applications) may want to use the :id key for their own purposes, so I think we should consider it before squatting on the :id property of a top level schema.

@ikitommi
Copy link
Member

ikitommi commented Jun 24, 2020

Re-read the reference handling from JSON Schema. which supports all kind of pointers: relative, absolute and cross-document (absolute + relative).

{
  "$id": "http://example.com/path/to/user.json",
  
  "type": "object",
  "properties": {
    "email": {
      "$ref": "#/definitions/personal/email"
    },
    "birthday": {
      "$ref": "#/definitions/personal/birthday"
    },
    "settings": {
      "$ref": "user-settings.json#/definitions/settings"
    },
    "info": {
      "$ref": "../info.json#"
    },
    "root": {
      "$ref": "/other/path/to/schema.json#/definitions/root"
    },
    "external": {
      "$ref": "http://external.example.com/some-schema.json#/definitions/name"
    }
  },
  
  "definitions": {
    "personal": {
      "email": {
        "type": "string",
        "format": "email"
      },
      "birthday": {
        "type": "string",
        "format": "date"
      }
    }
  }
}

Some takeways:

  • the :definitions (e.g. :registry) can be defined at any level => should we do this too?
  • each schema document or element can have id property (was $id earlier), which can be relative or absolute
    • I believe when JSON Schema documents are loaded, all the id should be collected so that they can be referenced
  • absolute pointers can have two parts: pointer to the document + # + relative pointer within the document (or the id)
  • relative pointers are local within a JSON Schema document, allow pointing anywhere within a document BUT not outside od the current document: if you have nested :definitions, you can't point upwards unless using absolute pointer with the upward schema id + local reference there.
    • the current design of Malli differs here: :local definitions are nested, it pics the closest one, which could be anywhere in the schema tree upwards, e.g.
(m/validate
  [:registry {:registry {::id [:string {:min 1}]}}
   [:registry {:registry {::id2 [:string {:min 2}]}}
    [:ref ::id]]] ;; from level up
  "1")
; => true

I think the absolute pointing could be introduces later (if needed) by giving registries an scope or an id/name. Besides the default registry, one could have multiple co-existing registries and pointing to a scoped/named registry would be done using a path sequence, e.g. [registry-id, registry-key]

;; reference to default registry
:email ;; eager
[:ref :email] ;; lazy
[:ref {:type :registry} :email] ;; full notion

;; reference to email from registry named `:users`
[:ref [:users :email]] ;; always lazy
[:ref {:type :registry} [:users :email]] ;; full notion

;; local reference
[:ref {:type :local} :email] ;; always lazy

@ikitommi
Copy link
Member

ikitommi commented Jun 25, 2020

Greeting from the (morning) hammock. And big thanks for giving feedback on the issue. Would like to get this right from the beginning.

Some developments:

1) registries everywhere

Pushed a small spike about allowing the :registry keyword to exist in any schema properties, like in JSON Schema. So, instead of:

(mg/generate
  [:registry
   {:registry {::ping [:maybe [:tuple [:= "ping"] [:ref ::pong]]]
               ::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]}}
   ::ping])

One could say:

(mg/generate
  [:and
   {:registry {::ping [:maybe [:tuple [:= "ping"] [:ref ::pong]]]
               ::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]}}
   ::ping])

I'm happier with this one.

Any thoughts on this?

2) registry nesting and masking

Also, added option to create Schemas without checking the :refs, used in to create forms of the :registry so that it survives the de/serialization process.

With the current design, nested registries (with holes) work ok and it uses the let-style silent overrides:

(mg/generate
  [:and {:registry {::pong [:maybe [:tuple neg-int? [:ref ::ping]]]
                    ::ping [:and {:description "effectively masked"} any?]}}
   [:and {:registry {::ping [:maybe [:tuple pos-int? [:ref ::pong]]]}}
    ::ping]]
  {:seed 1})
; => [2 [-1 [2 nil]]]

As a side-note, one could override any core schema like :and with the local registries. This might not be a good thing(?). One option would be to limit the registry entries to fully qualified keywords, "the user schemas" or do some other separation of "syntax" and "user" schemas.

any thoughts on this?

3) need for local :refs

This was the target:

[:maybe {:id :cons}
 [:tuple int? [:ref :cons]]])

But this seems the suggested way to do it:

[:maybe {:malli/id :cons}
 [:tuple int? [:ref {:type :local} :cons]]])

Which is IMO worst than the official:

[:and {:registry {::cons [:maybe [:tuple int? [:ref ::cons]]]}}
 ::cons]

I think the local ref thing should either be removed or the ambiguity rules should be changed so that it would also override and would be standalone and simple:

[:maybe {:malli/id ::cons}
 [:tuple int? [:ref ::cons]]]

thoughts?

@rschmukler
Copy link
Contributor

rschmukler commented Jun 25, 2020

  1. Registries as schema properties make sense to me. I like it.

  2. My gut is that I wouldn't restrict users to having to use namespaced keywords. I understand that it's protecting from unintentional overrides, but to me it feels like stepping away from local registries and towards global ones. The benefit of the local registry is that I don't have to be concerned with what the global state is, nor do I have to modify my conventions (eg. un-namespaced keywords) to make something work. If we are enforcing namespaced keywords anyway, why not just have a global registry?

  3. Personally, I like the idea of doing away with the :mali/id or :id and just using :ref and registry based lookup to allow recursive schemas. I don't see much advantage in introducing a separate resolution mechanism for :ref, especially if we have local registries.

I think

[:and {:registry {::cons [:maybe [:tuple int? [:ref ::cons]]]}}
 ::cons]

makes the most sense. It is a bit explicit, but, recursive schemas are advanced, and this seems simpler than a separate resolution mechanism. I think any author who is writing recursive schemas will also have a good grasp of registries.

As an aside, it might be worth adding a shorthand to allow something like:

[{:registry {::cons [:maybe [:tuple int? [:ref ::cons]]}
  ::cons]

This shorthand would be useful for any schema that needs to anonymously define a property, as I have had a few cases where I've needed [:and properties schema] crop up in applications.

@ikitommi
Copy link
Member

Thanks for the comments. Implemented the recursion in #209. Removed the local recursion in favour of using local registries and added eager :schema and :malli.core/schema reference types.

The registry shorthand notation: Doing a generic walk (using m/accept) and adding :title property to all schemas would mean that the registry shorthand would get a new Schema :title registered so I think it's better to use the explicit :registry key.

The [:and properties schema] can be now presented as [:schema properties schema] and has always been available as [schema properties] (e.g. [string? {:title "string"}]).

Will run some more tests, if no new comments arrive, will merge the recursion to master and close both issues.

cheers.

@ikitommi
Copy link
Member

thanks @RafaeLeal for the original draft, and for everyone for the comments. merged #209 master. Post-comments still welcome.

@ikitommi ikitommi closed this Jun 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants