Issue 494 - Define "map" XDM type #511
@cmathis, @harleensahni, @lrosenthol, @ogoldman, please take a look. This proposal is based on the discussion of the initial design proposed by @cmathis, modified after a conversation I had with @lrosenthol and @ogoldman earlier this week.
Doesn't the meta:xdmType = map already tell you it should be treated as a map? For example:
If a customer sends data in the form of JSON, it would look like:

The consumer can see that mapField should be interpreted as a map and store all the key/value pairs that way. The other way this could be stored is as a struct, which is more like a traditional object. In that case, could we keep using meta:xdmType (which everyone is looking for) and instead have meta:xdmType = object imply that this data should be stored as a dynamically growing struct/object?
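To make this alternative concrete, here is a minimal Python sketch of how a consumer might branch on meta:xdmType alone if both `map` and `object` were allowed values, as suggested above. This is not what the PR adopts; the function names and the example field are hypothetical, and the "storage" shapes (a sorted list of key/value pairs for a map, a plain dict for a struct) are illustrative stand-ins for real columns.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any, Dict, List, Tuple, Union

Row = Union[Dict[str, Any], List[Tuple[str, Any]]]


def storage_kind(field_schema: Mapping[str, Any]) -> str:
    """Classify an object-typed field using meta:xdmType alone (proposed, not adopted)."""
    xdm_type = field_schema.get("meta:xdmType")
    if xdm_type == "map":
        return "map"      # arbitrary keys: keep every key/value pair
    if xdm_type == "object":
        return "struct"   # a traditional object with named fields
    raise ValueError(f"not an object-like meta:xdmType: {xdm_type!r}")


def to_storage(field_schema: Mapping[str, Any], value: Mapping[str, Any]) -> Row:
    """Shape one JSON value the way this hypothetical consumer would store it."""
    if storage_kind(field_schema) == "map":
        # A map column holds (key, value) entries; sort them for stable output.
        return sorted(value.items())
    return dict(value)  # struct: kept as an object with named fields


# Hypothetical field definition and instance, for illustration only.
map_field = {"type": "object", "meta:xdmType": "map",
             "additionalProperties": {"type": "string"}}
print(to_storage(map_field, {"b": "2", "a": "1"}))  # [('a', '1'), ('b', '2')]
```

The trade-off raised in the replies is visible here: the consumer only ever reads one key, but meta:xdmType stops being purely a refinement of scalar JSON types.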
Hey @cmathis, after speaking with @lrosenthol and @ogoldman, a couple of concerns were raised with that approach:
So what is the xdmType for a field defined as:
There isn't one, @cmathis; that's the point. We don't want to use
So let's say I have an XDM schema like this:
If I'm trying to map this schema into a Parquet schema, how do I know that

On the registry side, we used to not have an xdmType for JSON Schema type:array and type:object fields, and that resulted in tickets in which users flagged this as a bug and demanded that every field have an xdmType. I cannot surface fields unless every field exposes an explicit data type in the same way.
The way you know what should be treated as a struct vs. a map is
We are being 100% consistent.
I understand. I guess I was considering a "map" as a type of scalar (which I agree it's not). I just wish it were a little more intuitive. Could we use meta:storageHint for everything: xdm:storeAsByte, xdm:storeAsLong, xdm:storeAsDate, etc.? This is really what users are using the xdmType for today: how should I represent this field in my database, Parquet, etc.
I'm with @cmathis on this one: call it storageHint, and use it consistently, instead of xdmType. (Otherwise, we're not really using the JSON Schema type system; we're using two type systems at once. In that case we may as well introduce a map type. But I fear this will be endlessly complicated to maintain.)
@cmathis, is there something we can do with the naming of the two keys that might help you?
@ogoldman, the (known) problem with the JSON Schema typing model is that it is limited, and the JSON-LD community is solving it the wrong way (IMO) via the
@lrosenthol, this may be one of those areas where we pull in Product Management or some of the end users (individuals from Vasanthi Holtcamp's data on-boarding team) to get their opinion. When we started talking about Map support in XDM, I assumed it would be in the form of an xdmType=map. I may be the only one who thought this way. It would be interesting to see how PM was planning to define a map field in the UI editor, and whether they planned on adding a Map type next to the existing Integer, Long, Double, etc.

This may be a separate discussion, but as @ogoldman mentioned, the xdmType has introduced a second type system, which has been a pain. As you know, on the XC side users expect an XDM schema to be translatable into Postgres tables, Parquet, Cosmos DB, Java objects, you name it. This is what demanded the creation of a richer set of data types (xdmType), but defining these types in JSON Schema is not an intuitive process for users. According to the documentation (https://wiki.corp.adobe.com/pages/viewpage.action?spaceKey=DMSArchitecture&title=XDM+Architecture#XDMArchitecture-XDMDataTypes), XDM supports long, date, short, integer, etc. I'm constantly responding to Jira tickets that say "I tried to define a schema with
Can we circle back on why

It seems like a natural fit for map fields, as it would still be metadata that made it easy to understand what was implied by the
I definitely feel that we need to keep the definition of "meta:xdmType" for scalar properties. As @lrosenthol observed, that tag is intended to give clear and specific semantics to the JSON types, where the JSON type system is too broadly defined compared to the type systems we interoperate with. Those definitions are type definitions, in the same way that the JSON types are, although they are carefully defined to also be fully compatible with the underlying JSON types. For me, the only question is whether a "map" is really a new type, or is just a modifier on the object type. This PR says the latter: maps are still objects, but with an unconstrained keyset, which may be desirable to store differently than a regular object. But we could also go with the other interpretation, that map is a type distinct from object, in which case we would include it directly in "xdmType".
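The "modifier on the object type" reading can be sketched in a few lines of Python, assuming a map is recognized as an object schema with no declared properties whose extra keys all share one value schema (via the standard JSON Schema `additionalProperties` keyword), plus a separate hint. The key name `meta:storageHint` and the value `"map"` are assumptions borrowed from earlier in the thread; the key the PR actually uses may differ.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

# Hypothetical key name for the storage hint; the PR's actual key may differ.
HINT_KEY = "meta:storageHint"


def has_unconstrained_keyset(schema: Mapping[str, Any]) -> bool:
    """True for an object schema with no declared properties whose extra keys share one value schema."""
    return (
        schema.get("type") == "object"
        and not schema.get("properties")
        and isinstance(schema.get("additionalProperties"), Mapping)
    )


def is_map(schema: Mapping[str, Any]) -> bool:
    """Under this reading a map is still a JSON object; the hint only modifies storage."""
    return has_unconstrained_keyset(schema) and schema.get(HINT_KEY) == "map"
```

Note that `"type"` stays `"object"` throughout, so a validator that ignores the hint still accepts the same instances; only storage-aware consumers change behavior.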
So let's say that we opened up

1 - Why do we allow it for objects and not arrays?

So either we document what we have today (+
@lrosenthol, I'm not sure what you mean by oddity number 1. Arrays in XDM are defined as fields with

Yes, because we are using JSON Schema to represent richer data types, we are going to need documentation no matter what we do. Users need to know that sometimes a raw

Anyone opposed to me posting a poll in the #xdm-questions channel to see what the broader audience would expect to see?
Just spoke with @kstreeter a little about this. If we go the route of using the xdmType for defining richer scalar types only, then every consumer of XDM schemas (ETL connectors, Data Ingestion, Unified Profile, Campaign, etc.) that needs to map the fields to a storage format will need to do the following:
You basically have three locations to check now. If everyone feels like this is the best way to go, then so be it, but let's please send this out as a formal announcement to all platform users so they are aware of this requirement. We will need to update the ETL vendor documentation as well. This needs to be published as the formal way to find the storage format (some would say the actual data type) so I do not get tickets against the registry saying that every field is supposed to have an xdmType, which is the way it has been.
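The step list in the previous comment was not preserved. Assuming the three locations are meta:xdmType, the storage hint, and the plain JSON Schema type, checked in that order, a consumer's resolver might look like the Python sketch below. The hint key `meta:storageHint`, the xdmType names in the table, and the target labels (`int64`, `map`, `struct`, ...) are all assumptions for illustration; they are not taken from the PR or from any Parquet library API.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

HINT_KEY = "meta:storageHint"  # hypothetical key name; the PR's may differ

# Illustrative target labels only; not tied to any particular Parquet library.
XDM_SCALARS = {"long": "int64", "int": "int32", "short": "int16",
               "byte": "int8", "date": "date"}
JSON_TYPES = {"string": "string", "number": "double", "integer": "int64",
              "boolean": "boolean", "array": "list", "object": "struct"}


def resolve_storage_type(schema: Mapping[str, Any]) -> str:
    """Check the three assumed locations in order and return a storage label."""
    # 1. A richer scalar type declared via meta:xdmType wins.
    xdm_type = schema.get("meta:xdmType")
    if xdm_type in XDM_SCALARS:
        return XDM_SCALARS[xdm_type]
    # 2. Otherwise a storage hint may change how an object is laid out.
    if schema.get(HINT_KEY) == "map":
        return "map"
    # 3. Fall back to the plain JSON Schema type.
    json_type = schema.get("type")
    if json_type in JSON_TYPES:
        return JSON_TYPES[json_type]
    raise ValueError(f"cannot resolve a storage type for {schema!r}")
```

Putting all three checks behind one function is one way a shared library or the registry could spare every consumer from reimplementing the lookup order.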
Corrected typos, no semantic change.
@cmathis, @kstreeter, @lrosenthol -> I agree with the PR as it stands. Just catching up on this. The salient question to me after reading the thread is whether we decide to:

My perspective is that we need to adhere consistently to the best technical design (a). Intermingling schema definition with value formats is wrong. Usability is important, but this approach would just make people who understand JSON Schema/JSON-LD feel that they cannot trust their natural understanding of XDM. Usability will have to be delivered by helpful UX and, for APIs, by effective validation with complete error messages.

@cfraser, regarding your comment about what we will do if/when we have other non-scalar types: the pattern for when we have a concrete new complex type is to externalize it as its own schema file for inclusion, via an
Catching up on this as well and a few comments:
@cmathis That is incorrect.
Since I seem to be outnumbered here, I am willing to concede on using
Thanks @lrosenthol, that is reasonable; we will exclude arrays.
This PR addresses issue #494. It introduces the concept of a "storage hint" that can be applied to a model or property definition to give storage and transmission systems additional hints on how to efficiently handle the data, without changing the semantic meaning of the underlying property.
The initial use of this is to support storage of "map-like" data in XDM. This support is described in the docs.
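Because the hint can sit on a model definition or on any property definition, a consumer may want to locate every hinted definition before building a storage layout. The Python sketch below walks `properties` recursively and yields the dotted path of each definition hinted as a map. The key name `meta:storageHint`, the value `"map"`, the `<root>` label, and the example fragment are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping
from typing import Any

# Hypothetical key name for the storage hint; the PR's actual key may differ.
HINT_KEY = "meta:storageHint"


def map_hinted_paths(schema: Mapping[str, Any], path: str = "") -> Iterator[str]:
    """Yield the dotted path of every definition (model or property) hinted as a map."""
    if schema.get(HINT_KEY) == "map":
        yield path or "<root>"  # a hint on the model definition itself
    for name, sub in schema.get("properties", {}).items():
        child = f"{path}.{name}" if path else name
        yield from map_hinted_paths(sub, child)


# Hypothetical fragment: one nested property carries the hint, one does not.
example = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "custom": {
            "type": "object",
            "properties": {
                "attributes": {
                    "type": "object",
                    HINT_KEY: "map",
                    "additionalProperties": {"type": "string"},
                }
            },
        },
    },
}
print(list(map_hinted_paths(example)))  # ['custom.attributes']
```

The sketch walks only `properties`; it does not descend into array `items` or resolve `$ref`s, which keeps it short and sidesteps the array question settled above.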