Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try adding support for JsonNode and full resolution of Object Ids, without need for @JsonIdentityInfo #120

Open
cowtowncoder opened this issue Feb 14, 2019 · 16 comments
Labels
yaml Issue related to YAML format backend

Comments

@cowtowncoder
Copy link
Member

(note: lots of background, and follow-up for #98)

So: handling of YAML Anchors, References, is problematic since: JSON has no directly equivalent notion. And although similar logical concept has been added by Jackson -- Object Id, via @JsonIdentityInfo -- it is both bit awkward to use and limited in scope (i.e. not handling all cases).

But it might be possible to implement full, no-annotation style resolution just for "Tree Model" (that is, JsonNode). Since there is already

JsonParser.canReadObjectId()

to indicate if format is capable of expressing Object Ids natively (returning true for YAML), it might be possible to combine this with another setting that essentially indicates that automatic resolution should be used.
I am not 100% sure if this is indeed doable, especially with 2.x, but I think it is worth exploring.
And with Jackson 3.0 it should be even more likely to be doable.

So. If and when I have time to investigate this, I'd like to see if this can be made.

The obvious corollary, then, would be that much of YAML processing would first use readTree() (and equivalents) to resolve all references, and then either use that or use mapper.treeToValue() to get to actual POJOs, ones that need not handle anchors/refs at all.

@cowtowncoder cowtowncoder added the yaml Issue related to YAML format backend label Feb 14, 2019
@nkiesel
Copy link

nkiesel commented Feb 14, 2019

Interesting (I'm still trying to process all the info from #120 and #121). A JSON array and a JSON object are both JsonNode, correct?

For reference, here is a (shortened) version of a YAML file i'm trying to parse:

vertices:
  - &v1
      name: V1
      description: First vertex
  - &v2
      name: V2
      description: Second vertex
edges:
  - [*v1, *v2]

@nkiesel
Copy link

nkiesel commented Feb 14, 2019

Should this not also get the 3.x label?

@cowtowncoder
Copy link
Member Author

cowtowncoder commented Feb 14, 2019

@nkiesel wrt 3.x probably, but I have not yet made up my mind that it is impossible to do with 2.10.

And as to array, object, yes, JsonNode is the base type of ArrayNode, ObjectNode. YAML can be mapped quite easily to those, although there is no place to store Object Ids or Type Ids.
Come to think of that, this might be problematic wrt latter.

I still can't read YAML fluently but would this be similar to what you'd expect to see in tree model (equivalent to this json)?

{
  "vertices" : [{
      { "name" : "V1", .... },
      { "name" : "V2", .... }
   }]
  "edges" : [ { ... v1 properties... } , { ... v2 properties... } ]
}

This also underscores one obvious challenge I think I mentioned in the other issue: handling of cyclic dependencies... JsonNode does not handle those well. Or, rather, callers might not.
But on plus side, object model itself theoretically allows them; and actual values would not need to be copied either.

@nkiesel
Copy link

nkiesel commented Feb 15, 2019

Your JSON is close, you just have one extra pair of {} inside the vertices array and a missing pair of [] inside the edges:

{
  "vertices": [
        {"name":"V1","description":"First vertex"},
        {"name":"V2","description":"Second vertex"}
  ],
  "edges": [
    [ {"name":"V1","description":"First vertex"}, {"name":"V2","description":"Second vertex"} ]
  ]
}

@nkiesel
Copy link

nkiesel commented Feb 15, 2019

Attached are 2 little Python scripts (renamed to ".txt" to be able to attach them) I use for converting between JSON and YAML. One caveat is that j2y does not try to find and create references (arguably correct; just because 2 nodes are the same might not mean that they should be treated as identical)
j2y.txt
y2j.txt

@nkiesel
Copy link

nkiesel commented Feb 15, 2019

BTW: YAML only allows to reference "anchors" that were defined earlier and thus does not allow forward references. Thus, you cannot create cycles using YAML anchors and aliases.

@nkiesel
Copy link

nkiesel commented Feb 15, 2019

Just found this in the YAML 1.2 specification: "The alias refers to the most recent preceding node having the same anchor.". Thus, these anchors do not have to be unique in the YAML document. This could put a dent in your "object id" idea, no?

@cowtowncoder
Copy link
Member Author

Interesting: I did not realize that only backward references are allowed. This does simplify life a little bit; however, ability to reuse anchors would be quite problematic as Jackson assumes uniqueness of ids (within scope, usually type). But I wonder if reuse is commonly used....

@nkiesel
Copy link

nkiesel commented Feb 19, 2019

I guess I'm underestimating the problem but if we would only cover anchors in parsing, then it seems what would be needed to manage an updateable Map<String, JsonNode> and replace all references with their map value as the YAML is parsed.

Generating YAML with anchors is much harder because it would require to detect common JsonNode. Also, a given PoJo could be converted into multiple semantically equivalent YAML strings.

@cowtowncoder
Copy link
Member Author

The problem is not complex at logical level, but rather at implementation. Databind that deals with Java objects (including representations like JsonNode) are fully format-agnostic, and operate through limited set of shared abstractions, modelled based on original JSON use case.
System is designed to support forward references as well (something not need with YAML, apparently), which is good except that it does complicate handling.

@nkiesel
Copy link

nkiesel commented Feb 25, 2019

Understood.

Regarding re-use of forward references: not in my use cases, I exclusively use them to have a more compact - and easier to understand - structure. But not sure if that is the case generally. I could envision a YAML file where someone uses these references to always point to the "current item of type ".

@jpassaro
Copy link

Regarding re-use of forward references: not in my use cases, I exclusively use them to have a more compact - and easier to understand - structure.

Seconded!

@bentmann
Copy link

combine this with another setting that essentially indicates that automatic resolution should be used.

Just to emphasize the importance of being able to disable automatic resolution: The feature can be misused and leads to fun stuff like https://nvd.nist.gov/vuln/detail/CVE-2017-18640

@cowtowncoder
Copy link
Member Author

@bentmann Just to make sure: I have no intention of enabling anything that would resolve references to external resources, but only to document nodes within same input document. I am well aware of security issues that come from resolving to external resources (many xml issues for that, some yaml too as you rightly point out).

@bentmann
Copy link

It's not just about resolving external references but generally defending against expansion bombs, cf. https://en.wikipedia.org/wiki/Billion_laughs_attack#Variations

@cowtowncoder
Copy link
Member Author

Ah. Yes, there's that too, good point (I am familiar with that problem from xml context).

It would only be applicable if Object Id was resolved by expanding contents; my original thinking was (... I think) to retain actual identity and use loops. This would not expand size of content, although code using such structures (possibly cyclic graphs, if YAML [and more specifically, SnakeYAML] allows this -- I am not sure YAML does actually) would need to be aware that to avoid problems at level.

So ability to prevent expansion probably makes sense no matter what.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
yaml Issue related to YAML format backend
Projects
None yet
Development

No branches or pull requests

4 participants