-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recursive schema #117
Recursive schema #117
Conversation
This would only allow one to recurse schemas that are in direct line. It would not allow for mutual recursion or recursion of siblings, I think. Imagine you are serializing a tree where nodes are organizations and people. People can have organizations as children and organizations can have people as children. How would you define such a schema in Malli? |
Interesting. I haven't thought of that. [:or {:malli/loop [:organization :person]}
[:map
[:organization/name string?]
[:organization/children [:recursive :person]]]
[:map
[:person/name string?]
[:person/children [:recursive :organization]]]] What do you think? |
Well generally the easiest way to solve this is to provide a way to reference a schema a context, and have the reference be used lazily. Ideally it would work like
This is cute, but there's many roadblocks for this, mainly this is not trivially serializable. This solution would play well with I am not sure how this could be implemented, but this syntax is what I think is a desirable syntax. Overall I wouldn't call a recursive definition a |
Quick thinking aloud:
[:and
{:registry {::org [:map
[:organization/name string?]
[:organization/member [:vector [:ref ::person]]]]
::person [:map
[:person/name string?]
[:person/parent [:ref ::person]]
[:person/orgs [:vector [:ref ::org]]]]}}
[:ref ::person]]
[:map {:ref ::country}
[:name [:enum "FI" "PO"]]
[:neighbors [:vector [:ref ::country]]]] |
More though: do we even need the with local registry[:and
{:registry {::org [:map
[:organization/name string?]
[:organization/member [:vector :person]]]
::person [:map
[:person/name string?]
[:person/parent ::person]
[:person/orgs [:vector ::org]]]}}
::person] self recursion[:and
{:registry {::country [:map
[:name [:enum "FI" "PO"]]
[:neighbors [:vector ::country]]]}
::country}] shortcut form (more compact, but not sure if this is really needed?) [:map {:ref ::country}
[:name [:enum "FI" "PO"]]
[:neighbors [:vector ::country]]] |
Interesting. I like the direction that we're going for. Either way, we need to define what is going to happen if you define something on the local registry that is already on the registry. Probably we should just override, but this could lead to confusion on large schemas with multiple local registries. We would probably have a lot of code like this, also: (def schemas {::org [:map
[:organization/name string?]
[:organization/member [:vector :person]]]
::person [:map
[:person/name string?]
[:person/parent ::person]
[:person/orgs [:vector ::org]]]})
(def person [:and {:registry schemas} ::person])
(def org [:and {:registry schemas} ::org]) We could have a function Something like (malli.core/schemas
{::org [:map
[:organization/name string?]
[:organization/member [:vector :person]]]
::person [:map
[:person/name string?]
[:person/parent ::person]
[:person/orgs [:vector ::org]]]}) Regarding the need for a |
There was an earlier discussion about |
to go all in in aligning with clojure as a language, there would be |
TL;DR: Recursion can be implemented by
None of these methods need explicit We should probably use indirection for recursion through globals, i.e. just look up the schema from the registry at runtime or whatever the current implementation is. This allows interactive flexibility where things just keep working with a slightly different registry, just like namespaces. Local recursion should probably be modeled after [:letrec [:tree [:or :int :branch]
:branch [:seq :tree]]
:tree] It seems natural to also implement it the same way, with backpatching. That would avoid the need for nested registries and runtime lookup. It is also obvious that this sort of scoping construct will do name shadowing. And making it a named schema like this means that it is trivially serializable. I am not sure whether we need local recursion at all. I can't think of any schema library that has that capability or a use case where it would be absolutely necessary. Type theory has a concept of first-class recursive types with a fixpoint construct but practical statically typed languages never implement that; recursive types require global (or class/module level) definitions. But if we do decide to add I agree that I think the local registry concept is too dynamic and confusing when compared to scoping schemas. Local recursion does not need the flexibility that registries/namespaces provide. It's kind of like R versus Scheme where R is weird and slow and Scheme is so elegant and fast 😜 *: Not actually the polymorphic |
I agreed that I think that local recursion is interesting because I assume that we want serializable schemas to pass them around multiple nodes/instances. If we have only support recursion referencing the registry, we may need to sync all nodes registries, which could be a problem. |
The registry map is always passed as a value, so it can be considered as immutable. The schema defined/local registries are serialized with schemas. |
Interesting point, I was not thinking of such dynamic use cases even though they are a key novelty in this library. I do suspect that some degree of registry syncing is inevitable.
I agree that local registries can be implemented and with little effort. I just find that they are too low level, which has both UX and performance implications:
|
If we start with syntax: I would like to see better alternatives to: [:and
{:registry {::org [:map
[:organization/name string?]
[:organization/member [:vector :person]]]
::person [:map
[:person/name string?]
[:person/parent ::person]
[:person/orgs [:vector ::org]]]}}
::person] , which is IMO, clean and simple. |
The |
So, this is bad: [:and
{:registry {::org [:map
[:organization/name string?]
[:organization/member [:vector :person]]]
::person [:map
[:person/name string?]
[:person/parent ::person]
[:person/orgs [:vector ::org]]]}}
::person] and this is good:? [:letrec
[::org [:map
[:organization/name string?]
[:organization/member [:vector :person]]]
::person [:map
[:person/name string?]
[:person/parent ::person]
[:person/orgs [:vector ::org]]]]
::person] |
Good and bad are such relative terms. Local registries just feel like an unnecessarily leaky and confusing abstraction, to me. |
Since the syntactic form of schemas (i.e. vectors and keywords) can be combined and dissected in arbitrary ways, any local scoping construct will have the same variable hygiene issues as macros. Properly tackling that (or even preventing common classes of errors with ad-hoc methods) would be quite a bit of yak-shaving. I would find it acceptable that you can add groups of recursive definitions to the registry in a "transaction" (like ML |
One reason why local recursion is useful and not just interesting, it is that it allows a better composition of schemas, in the exact same way Clojure's s-expressions compose together. Another advantage is that on large schemas, local recursion allows the user to think locally. I can easily imagine people using Malli to model their entire front end's data store, where many independant business logic parts are sitting next to each other, some of them being recursive data structures that other parts don't have to know. |
Did a quick poke on this, using But, currently looks like this: (def ConsCell
[:maybe {:id :cons}
[:tuple int? [:ref :cons]]])
(m/explain ConsCell [1 [2 [3 [4 nil]]]])
; => nil
(m/explain ConsCell [1 [2 [3 [4]]]])
;{:schema [:maybe {:id :cons} [:tuple int? [:ref :cons]]],
; :value [1 [2 [3 [4]]]],
; :errors (#Error{:path [2 2 0 2 2 0 2 2 0 2],
; :in [1 1 1],
; :schema [:tuple int? [:ref :cons]],
; :value [4],
; :type :malli.core/tuple-size})} Qurestion: if the same |
My gut would be to throw instead of letting last writer win. I could imagine that being a very confusing "bug" to try and track down when using the library. I don't see much upside in allowing the flexibility of overwriting - but maybe others see a use case I'm not seeing? |
I think that the inner definition should win. If treat it as error, it means that the user can no longer blindly combine schemas together. |
Mutual recursion needs a way to define schema graphs. Maybe something like: (explain
[:registry
{:schemas [[:maybe {:id :ping} [:tuple [:eq "ping"] [:ref :pong]]]
[:maybe {:id :pong} [:tuple [:eq "pong"] [:ref :pong]]]]}
[:ref :pong]]
["ping" ["pong" ["ping" nil]]])
: => nil or: (explain
[:registry
{:schemas {:ping [:maybe [:tuple [:eq "ping"] [:ref :pong]]]
:pong [:maybe [:tuple [:eq "pong"] [:ref :pong]]]}}
[:ref :pong]]
["ping" ["pong" ["ping" nil]]])
: => nil Will make a separate PR of the impl. let's keep this open as the history.& discussion. |
Regarding the ability to blindly combine schemas together - that's a good consideration. Is the worry for libraries that both provide schemas, or do you anticipate seeing this at the application level? It's also worth considering that a user wouldn't be "stuck" - at the application level, I'd argue that they should just rename one of the IDs (or keep them in separate registries). At the library level they'd be required to perhaps modify one of the schemas using some of the I'm not necessarily saying that we favor this approach, but just wanted to point out that unlike something like spec, developers can get around two libraries providing the same IDs. I would expect that registries are controlled by applications, not libraries. Having to do the above work-around would definitely be work - and boilerplatey work at that - but it's work that could be directed by malli's error messages, as opposed to perhaps confusingly have one library unknowingly overwrite the other. I'm not super strong in my convictions here - just enumerating some thoughts out loud. |
@ikitommi I would recommend the second one, as order between bindings is not needed. AFAIK, the first one has no practical use case. I had initially implemented the first one in minimallist, but then changed it to the second one after some trials. |
Not happy that the Option 2 would adds ambiguity. What is a schema has also [:registry
{:registry
{:ping [:maybe {:id :pong} [:tuple [:= "ping"] [:ref :pong]]] ;; :ping or :pong?
:pong [:maybe [:tuple [:= "pong"] [:ref :pong]]]}}
[:ref :pong]] I think the local definition (e.g. the key in {:ping [:maybe {:id :pong} [:tuple [:= "ping"] [:ref :pong]]]
:pong [:maybe [:tuple [:= "pong"] [:ref :pong]]]} , BUT if the There would be three ways to do recursion, too many ways?
the 3 is a must anyway and I believe 2 is too - to allow sending schema graphs over the wire. 1 would be nice to avoid boilerplate, e.g. instead of: [:registry
{:schemas {:cons [:maybe [:tuple int? [:ref :cons]]]}}
[:ref :cons]] one could say: [:maybe {:id :cons}
[:tuple int? [:ref :cons]]] If all three would be supported, the precedence could be 2->3->1 and the default would be to throw on overriding definitions? using fully qualified keys (or Far from being sure how to implement this, so comments most welcome. |
Related, from Slack: About recursive schemas. It seems that because of those, should to to add a [:maybe {:id :cons}
[:tuple int? [:ref :cons]]] , the generator for We could hack around this by pulling out options that created (-> (m/map-entries [:map [:x int?]]) (first) (last) (type))
; => :malli.core/schema
(-> (m/children [:map [:x int?]]) (first) (last) (type))
; => clojure.lang.Symbol (currently passing the local context via options, which seems to be a good way to do it) Anyway, this works now: with recursion, one can set the recursion limit, just like with spec. One setting, but each |
One more thing related to silent overrides. There are two thing: overriding schemas in registry and just overriding (m/validate
[:registry
{:registry {:and (m/-or-schema)}}
[:and int? string?]]
"fail")
; => true |
I've been experimenting writing a json-schema -> malli layer (future PR might come if I'm successful) and I guess what I'm missing is this. The |
Checked the mutual recursion case, looks like quite simple things: eaea510: (mg/generate
::ping
{:registry (mr/composite-registry
(m/default-schemas)
{::ping [:maybe [:tuple [:= "ping"] [:ref ::pong]]]
::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]})
:size 7, :seed 86})
; => ["ping" ["pong" ["ping" ["pong" ["ping" nil]]]]] tested also a fail-fast on ambiguity with refs, so instead of: (mg/generate
::ping
{:registry (mr/composite-registry
(m/default-schemas)
{::ping [:maybe {:id ::pong} [:tuple [:= "ping"] [:ref ::pong]]]
::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]})
:size 7, :seed 86})
; => ["ping" ["ping" ["ping" ["ping" ["ping" nil]]]]] the default code will throw instead: Execution error (ExceptionInfo) at malli.core/fail! (core.cljc:80).
:malli.core/ambiguous-ref {:type :ref, :ref :user/pong} |
One idea about the ambiguity. Should there be a way to define some ref as being local? having an explicit local reference type, would allow fully local recursion (and normal references) without having errors with registry-level things. Something like: 1) new :local type:(mg/generate
::ping
{:registry (mr/composite-registry
(m/default-schemas)
{::ping [:maybe {:id ::pong} [:tuple [:= "ping"] [:local ::pong]]] ;; looks just local refs
::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]})
:size 7, :seed 86})
; => ["ping" ["ping" ["ping" ["ping" ["ping" nil]]]]] 2) type-tagging :ref(mg/generate
::ping
{:registry (mr/composite-registry
(m/default-schemas)
{::ping [:maybe {:id ::pong} [:tuple [:= "ping"] [:ref {:type :local} ::pong]]] ;; looks just local refs
::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]})
:size 7, :seed 86})
; => ["ping" ["ping" ["ping" ["ping" ["ping" nil]]]]] |
This works now too: (mg/generate
[:registry
{:registry {::ping [:maybe [:tuple [:= "ping"] [:ref ::pong]]]
::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]}}
[:ref ::ping]]
{:size 7, :seed 86})
; => ["ping" ["pong" ["ping" ["pong" ["ping" nil]]]]] comments/suggestions on love when things are simple to implement: f7f760e. |
Seems like the ambiguity issue is regarding a "name" reference and an "id" reference. Wouldn't introducing wording like "local" make things confusing? if |
I personally lean towards the Regarding @hkupty's comment, I think as proposed by @ikitommi the I do wonder if the property |
Re-read the reference handling from JSON Schema. which supports all kind of pointers: relative, absolute and cross-document (absolute + relative). {
"$id": "http://example.com/path/to/user.json",
"type": "object",
"properties": {
"email": {
"$ref": "#/definitions/personal/email"
},
"birthday": {
"$ref": "#/definitions/personal/birthday"
},
"settings": {
"$ref": "user-settings.json#/definitions/settings"
},
"info": {
"$ref": "../info.json#"
},
"root": {
"$ref": "/other/path/to/schema.json#/definitions/root"
},
"external": {
"$ref": "http://external.example.com/some-schema.json#/definitions/name"
}
},
"definitions": {
"personal": {
"email": {
"type": "string",
"format": "email"
},
"birthday": {
"type": "string",
"format": "date"
}
}
}
} Some takeways:
(m/validate
[:registry {:registry {::id [:string {:min 1}]}}
[:registry {:registry {::id2 [:string {:min 2}]}}
[:ref ::id]]] ;; from level up
"1")
; => true I think the absolute pointing could be introduces later (if needed) by giving registries an scope or an id/name. Besides the default registry, one could have multiple co-existing registries and pointing to a scoped/named registry would be done using a path sequence, e.g. ;; reference to default registry
:email ;; eager
[:ref :email] ;; lazy
[:ref {:type :registry} :email] ;; full notion
;; reference to email from registry named `:users`
[:ref [:users :email]] ;; always lazy
[:ref {:type :registry} [:users :email]] ;; full notion
;; local reference
[:ref {:type :local} :email] ;; always lazy |
Greeting from the (morning) hammock. And big thanks for giving feedback on the issue. Would like to get this right from the beginning. Some developments: 1) registries everywherePushed a small spike about allowing the (mg/generate
[:registry
{:registry {::ping [:maybe [:tuple [:= "ping"] [:ref ::pong]]]
::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]}}
::ping]) One could say: (mg/generate
[:and
{:registry {::ping [:maybe [:tuple [:= "ping"] [:ref ::pong]]]
::pong [:maybe [:tuple [:= "pong"] [:ref ::ping]]]}}
::ping]) I'm happier with this one. Any thoughts on this? 2) registry nesting and maskingAlso, added option to create Schemas without checking the With the current design, nested registries (with holes) work ok and it uses the (mg/generate
[:and {:registry {::pong [:maybe [:tuple neg-int? [:ref ::ping]]]
::ping [:and {:description "effectively masked"} any?]}}
[:and {:registry {::ping [:maybe [:tuple pos-int? [:ref ::pong]]]}}
::ping]]
{:seed 1})
; => [2 [-1 [2 nil]]] As a side-note, one could override any core schema like any thoughts on this? 3) need for local :refsThis was the target: [:maybe {:id :cons}
[:tuple int? [:ref :cons]]]) But this seems the suggested way to do it: [:maybe {:malli/id :cons}
[:tuple int? [:ref {:type :local} :cons]]]) Which is IMO worst than the official: [:and {:registry {::cons [:maybe [:tuple int? [:ref ::cons]]]}}
::cons] I think the local ref thing should either be removed or the ambiguity rules should be changed so that it would also override and would be standalone and simple: [:maybe {:malli/id ::cons}
[:tuple int? [:ref ::cons]]] thoughts? |
I think [:and {:registry {::cons [:maybe [:tuple int? [:ref ::cons]]]}}
::cons] makes the most sense. It is a bit explicit, but, recursive schemas are advanced, and this seems simpler than a separate resolution mechanism. I think any author who is writing recursive schemas will also have a good grasp of registries. As an aside, it might be worth adding a shorthand to allow something like: [{:registry {::cons [:maybe [:tuple int? [:ref ::cons]]}
::cons] This shorthand would be useful for any schema that needs to anonymously define a property, as I have had a few cases where I've needed |
Thanks for the comments. Implemented the recursion in #209. Removed the local recursion in favour of using local registries and added eager The registry shorthand notation: Doing a generic walk (using The Will run some more tests, if no new comments arrive, will merge the recursion to master and close both issues. cheers. |
thanks @RafaeLeal for the original draft, and for everyone for the comments. merged #209 master. Post-comments still welcome. |
PR to start a conversation about recursive schemas, a prerequisite for json-schema->malli conversion
I added an entry for base-registry that accepts
:recursive
as a schemaThis enables us to define recursive schemas even without using the registry.
Something like
I think we need to deal to schemas like
:recursive
,[:and :recursive]
,[:or :recursive]
or anything that doesn't have any real value and can lead to StackOverflow on evaluationI was thinking in maybe support loop/recur, with something like:
What do you think?