Is there a need for both an authored and a canonical manifests? #31

Closed
iherman opened this issue Aug 13, 2019 · 51 comments
Comments

@iherman
Member

iherman commented Aug 13, 2019

At the moment, there is an authored and a canonical manifest, with a separate canonicalization step to transform the authored manifest into the canonical one. The goal is to allow the author to express data more succinctly (eg, use only simple file names instead of complete LinkedResource instances or person names instead of Person structures).

It was raised, in #11, that the price being paid for having this is too high:

For the few people typing these by hand, [...] but for the vast majority of implementers (i.e. CMS's generating these manifests), I think they'd find the consistency of using the canonical representation (and the lack of "overhead" of needing it to be canonicalized every time...) to be a win. (#11 (comment))

Question: do we want to simplify the manifest by removing this extra step and defining the manifest purely in terms of what is currently called the canonical manifest?

@iherman iherman changed the title Is there a need for an authored and canonical manifest? Is there a need for an authored and canonical manifests? Aug 13, 2019
@iherman iherman changed the title Is there a need for an authored and canonical manifests? Is there a need for both an authored and a canonical manifests? Aug 13, 2019
@iherman
Member Author

iherman commented Aug 13, 2019

My own opinion: as a technical person, I would be happy to use a canonical manifest only, i.e., to remove the notion of an authoring manifest. It would undeniably simplify the spec and make the technical content cleaner.

This is, however, a usability issue, and the question really depends on what the community would accept; I do not have the necessary experience to have an informed opinion on this.

@laudrain

IMO, I don't see "people typing these by hand" (@BigBlueHat).
Our ebooks are built using data transformation processes or export plugins, so I'm in favor of only one manifest expression, the canonical one.

@llemeurfr
Contributor

For reading systems, the mapping from the authored manifest to the canonical one is undoubtedly some work. We're doing exactly that, from the "authored" Readium Webpub manifest to the in-memory object processed by reading software.

For instance
"author": "James Joyce"
will become the in-memory equivalent of
"author": [
{ "name": ["en": "James Joyce"],
"identifier": null,
"sortAs": null,
"links": null
}
]

This is not a huge amount of work, however. Those curious can look at the Readium Mobile / Kotlin code:
https://github.com/readium/r2-shared-kotlin/blob/develop/r2-shared/src/main/java/org/readium/r2/shared/Contributor.kt#L76
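
For readers who do not want to follow the Kotlin link, the string-to-object mapping described above can be sketched in a few lines of JavaScript. This is only an illustration, not the Readium code: the function name `expandContributors` is hypothetical, the default language "en" is an assumption, and the language map is written as a plain object (`{ "en": ... }`) so that the result is valid JSON.

```javascript
// Hypothetical helper (not from Readium or any spec): expand a compact
// "author" value into the fuller in-memory shape shown above.
// Assumption: names without a language are filed under defaultLang.
function expandContributors(value, defaultLang = "en") {
  // Accept either a single value or an array of values.
  const items = Array.isArray(value) ? value : [value];
  return items.map((item) => {
    // A bare string is treated as the contributor's name.
    const obj = typeof item === "string" ? { name: item } : { ...item };
    // A plain-string name becomes a language map keyed by defaultLang.
    const name =
      typeof obj.name === "string" ? { [defaultLang]: obj.name } : obj.name;
    // Absent properties are made explicit as null, as in the example above.
    return {
      name,
      identifier: obj.identifier ?? null,
      sortAs: obj.sortAs ?? null,
      links: obj.links ?? null,
    };
  });
}

console.log(JSON.stringify(expandContributors("James Joyce"), null, 2));
```

Values that are already objects pass through with only the missing properties filled in, so a tool that emits the full form pays nothing extra.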

A JSON schema capable of dealing with flexible structures is also doable; this is done e.g. in https://github.com/readium/webpub-manifest/blob/master/schema/contributor-object.schema.json.

In conclusion yes, there is a price to this flexibility but developers can overcome it.
So the question is more: would the group feel better specifying
"author": [ { "name": ["en": "James Joyce"]}] ?

Re. other shortcuts in the authored manifest, there is also the question of JSON-LD types (e.g. "type": "LinkedResource"). The @type associated with the property definition in the context file is enough to define the type and I don't see why "type": "LinkedResource" should be added in each resource.
The type issue only happens for contributors, as we can't know for sure if it is a Person or Organization from the context. But that is another story.

@iherman
Member Author

iherman commented Aug 13, 2019

@llemeurfr

In conclusion yes, there is a price to this flexibility but developers can overcome it.
So the question is more: would the group feel better specifying
"author": [ { "name": ["en": "James Joyce"]}] ?

Exactly. That is the core question.

@mattgarrish
Member

Wasn't the main motivator for the canonicalization steps (relating to expanding values, at least) that schema.org metadata values aren't rigidly enforced, so whether machines generate the manifest or not, there's the problem of whatever creates the file following general conventions, like: "author": "So and So"

I thought the idea was to be practical and have the steps to sanitize the data for the user agent rather than fail on it or not process it, since the compact forms don't seem to hinder seo processing?

But if we go this route, we should be thorough about it and also remove the steps about obtaining information from the html document that references/embeds the manifest.

We should figure out why we want/need full json-ld conformance, too, when we don't expect user agents to be json-ld processors.

@iherman
Member Author

iherman commented Aug 13, 2019

Wasn't the main motivator for the canonicalization steps (relating to expanding values, at least) that schema.org metadata values aren't rigidly enforced, so whether machines generate the manifest or not, there's the problem of whatever creates the file following general conventions, like: "author": "So and So"

That was certainly part of it. I suspect schema.org has some similar procedure as our canonicalization.

That being said, if we decide we are stricter than schema.org, this does not "harm" our data vis-à-vis schema.org.

But if we go this route, we should be thorough about it and also remove the steps about obtaining information from the html document that references/embeds the manifest.

Oops, that is true. If we do not do any canonicalization at all, this is a consequence... The biggest "loss" would be the potential reuse of the <title> element.

We should figure out why we want/need full json-ld conformance, too, when we don't expect user agents to be json-ld processors.

By "full conformance" I presume you mean being able to use all JSON-LD features, right? At this moment I do not see any reason for having it…

@dauwhe

dauwhe commented Aug 13, 2019

In case of conflict, consider users over authors over implementors over specifiers over theoretical purity.

@mattgarrish
Member

By "full conformance" I presume you mean being able to use all JSON-LD features, right? At this moment I do not see any reason for having it…

Right, we've never said that a user agent must be a fully-conforming json-ld processor, only that it be able to process the json in the manifest into an internal structure. Maybe I'm wrong, but json-ld only seemed to enter the equation as a means of allowing search crawlers to get at the information without duplicating the metadata. (Is that (still) a primary use case?)

In terms of discrepancies between the graphs, sure, I don't think it's ideal, either, but I thought the point was finding a balance? If you don't care about the seo angle, or it's not relevant to your format, you aren't burdened with strict authoring of json-ld. If it does matter, you can choose to be stricter in authoring.

But maybe this is where layering comes in if we want to formalize this duality.

@iherman
Member Author

iherman commented Aug 13, 2019

Maybe I'm wrong, but json-ld only seemed to enter the equation as a means of allowing search crawlers to get at the information without duplicating the metadata. (Is that (still) a primary use case?)

That is certainly the main motivation, yes.

One could imagine a full JSON-LD engine which would then also allow a simple and frictionless extensibility of the metadata by adding new terms, vocabularies, etc, using the JSON-LD facilities. This is what, e.g., Verifiable Claims do. But I do not see that as a use case for the publication manifest, at least not in this version.

@mattgarrish
Member

But I do not see that as a use case for the publication manifest, at least not in this version.

I'm admittedly not too keen on another major overhaul of the specification at this stage. We got where we are because there wasn't consensus on requiring strict authoring, so reversing course now seems a bit fraught. We'd be undoing a lot.

@BigBlueHat
Member

schema.org has some similar procedure as our canonicalization.

Schema.org provides no algorithm for processing--it's only a vocabulary definition and documentation.

However, SEO bots and tools (i.e. SDTT) which consume it attempt to "clean up" things like "author": "me" by defaulting to Schema.org's Thing class for anything they're not sure about. That works for SEO, because it's like horseshoes and hand grenades--close enough is close enough.

We, however, have a host of consumers for these manifest documents: SEO bots, publisher metadata/management systems, CMS's, metadata management for distribution, personal archives, reading systems, etc.

Consequently, the clearest and most complete manifest is the one that should go into the publication itself. Prior "author-friendly" formats can exist (like YAML => JSON or Markdown => HTML), but have a pre-publication use (vs. post-publication existence).

@BigBlueHat
Member

One could imagine a full JSON-LD engine which would then also allow a simple and frictionless extensibility of the metadata by adding new terms, vocabularies, etc, using the JSON-LD facilities. This is what, e.g., Verifiable Claims do. But I do not see that as a use case for the publication manifest, at least not in this version.

Publishers want this very thing, and many of us use graph-based data formats (JSON-LD chief among them) to accomplish this extensibility while still maintaining interoperability (see also Web Annotations, VCs, etc).

@mattgarrish
Member

mattgarrish commented Aug 13, 2019

Could the presence/absence of the context be a trigger to canonicalization and/or the media type of the manifest? For example:

sweet, sweet json-ld --> <link rel="publication" href="manifest.json" media-type="application/ld+json"/>
ah, canonicalize me!!! --> <link rel="publication" href="manifest.json" media-type="application/json"/>

If a context is set, data is strictly interpreted and no canonicalization occurs. If the context is not set, the data is assumed to be json and the canonicalization steps have to be run to obtain the common data structure.

It would then be up to implementations to allow one or both serializations, and authors to decide which they prefer.
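
A rough JavaScript sketch of the trigger proposed above, to make the rule concrete. Everything here is hypothetical: the function name `needsCanonicalization` is invented, and the choice to skip canonicalization only when both the `application/ld+json` media type and an `@context` are present is one possible reading of the "and/or" in the proposal.

```javascript
// Hypothetical decision function; not part of any specification.
// Assumption: strict (no canonicalization) requires BOTH the JSON-LD
// media type and an @context in the manifest.
function needsCanonicalization(mediaType, manifest) {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const type = String(mediaType).split(";")[0].trim().toLowerCase();
  const hasContext = manifest != null && manifest["@context"] !== undefined;
  const strict = type === "application/ld+json" && hasContext;
  return !strict;
}

// Strict JSON-LD: interpreted as-is.
console.log(
  needsCanonicalization("application/ld+json", { "@context": "https://schema.org" })
);
// Plain JSON: canonicalization steps have to run.
console.log(needsCanonicalization("application/json", { author: "So and So" }));
```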

Right now we seem to be stuck on trying to force some measure of json-ld on simple authoring, and then arguing over the inevitable lack of perfection that results. If people actually want something less than json-ld, let's just give it to them. Otherwise, abandon the idea.

It would also be good to hear who prefers which option either on a call or by email survey, so we have a practical idea of where the group is on this. This discussion could go on a long time if we're just discussing pros and cons.

@iherman
Member Author

iherman commented Aug 14, 2019

It would also be good to hear who prefers which option either on a call or by email survey, so we have a practical idea of where the group is on this. This discussion could go on a long time if we're just discussing pros and cons.

+1

@iherman
Member Author

iherman commented Aug 14, 2019

(Summarizing, to help WG members who were not part of the discussions so far but whose opinion is necessary at this point)

There are two questions that need a final and urgent decision in order to move forward.

  1. Do we allow the usage of full JSON-LD for the publication manifest, or only a restricted "subset" (or shape) thereof? Put another way, do we expect reading systems that use the manifest to include a full JSON-LD processor? (This is, in fact, issue #32, "Should we use JSON schemas as part of the spec?")
  2. Do we need the differentiation (and corresponding conversion method) between an "authored" manifest and a "canonical" manifest, where the former is a simplified version of the latter (e.g., allowing the author to use simple conventions to express the manifest information in its full complexity)? (See @llemeurfr's example in #31 (comment) for an illustration.)

At the moment, the draft:

  1. uses a subset of JSON-LD only and does not rely on a full JSON-LD processor
  2. contains both the "authored" and the "canonical" manifest

Giving a clear yes or no answer to both questions is necessary at this point to move ahead with the draft; otherwise we are stuck.

(Note that our primary goal is audiobooks at this point, although we should look forward to other usages of the manifest.)

Cc @wareid @GarthConboy

@mattgarrish
Member

My preference is still to leave this alone given the work it's taken to get here.

I only wonder if the idea of generating a "canonical manifest" needs some additional clarification. We describe the process as though an actual json-ld document has to be the end result, but I don't believe this is required. It's just a way of explaining the process and resulting data structure.

In other words, where we say in the canonical manifest definition:

The Canonical Publication Manifest is a version of the manifest created by user agents when they process the authored manifest

Isn't what we really mean more like:

The Canonical Publication Manifest is the final internal representation of manifest data created by user agents when they process the authored manifest

A user agent should have the option to do things differently than the lifecycle algorithm, like read the manifest into an internal data structure and then sanitize the data by the rules, never creating a "canonical manifest" in the sense of there still being a json-ld representation.

That's at least what confuses me about the idea that canonicalization represents an unnecessary step. Even if we had full json-ld, the process can't go away as it only takes out some expansion steps. You still have to get the manifest, internalize it and sanitize the data (checking properties are set, values are conforming, etc.).
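
To illustrate the point that the "internalize and sanitize" work remains whatever the authored syntax, here is a minimal JavaScript sketch. It is an assumption-laden illustration, not the lifecycle algorithm: the function name `internalize` is hypothetical, and the two rules it applies (a non-empty `name`, and `readingOrder` entries that are either strings or objects with a string `url`) are chosen only as examples.

```javascript
// Hypothetical sketch: read a manifest straight into an internal structure
// and sanitize it, without ever producing a second JSON(-LD) document.
// The rules below are illustrative assumptions, not spec requirements.
function internalize(manifestText) {
  const data = JSON.parse(manifestText); // throws on malformed JSON
  const problems = [];

  // Check that a property is set and has a conforming value.
  if (typeof data.name !== "string" || data.name.trim() === "") {
    problems.push("name: missing or not a non-empty string");
  }

  // Accept a single value or an array; [].concat() handles both.
  const rawOrder =
    data.readingOrder === undefined ? [] : [].concat(data.readingOrder);
  const readingOrder = [];
  rawOrder.forEach((entry, i) => {
    if (typeof entry === "string") {
      readingOrder.push({ url: entry }); // expand the compact form
    } else if (
      entry !== null &&
      typeof entry === "object" &&
      typeof entry.url === "string"
    ) {
      readingOrder.push({ ...entry });
    } else {
      problems.push(`readingOrder[${i}]: dropped non-conforming entry`);
    }
  });

  return { name: data.name, readingOrder, problems };
}

const result = internalize(
  '{"name":"Moby-Dick","readingOrder":["c1.html",{"url":"c2.html"},42]}'
);
console.log(result.readingOrder.length, result.problems);
```

Only the string-to-object expansion in the middle would disappear if authors were required to write the full form; the parsing and checking around it would stay.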

@llemeurfr
Contributor

We describe the process as though an actual json-ld document has to be the end result, but I don't believe this is required. It's just a way of explaining the process and resulting data structure.

You're right in that the result will be an in-memory object reflecting the structure of the "canonical" manifest.

@dauwhe

dauwhe commented Aug 14, 2019

No matter how we do things, life is going to be complicated for an ordinary web developer:

var foo = { "author": "Herman Melville" };
var bar = { "author": [ "Herman Melville" ] };
var baz = { "author": [ { "name": "Herman Melville" }, { "type": "person" } ] };

>> foo.author
<< "Herman Melville"

>> bar.author
<< Array [ "Herman Melville" ]

>> baz.author
<< Array [ {…}, {…} ]

>> bar.author[0]
<< "Herman Melville"

>> foo.author[0]
<< "H"

>> baz.author[0]
<< Object { name: "Herman Melville" }

>> baz.author[0].name
<< "Herman Melville"

@mattgarrish
Member

mattgarrish commented Aug 14, 2019

baz should be:

var baz = { "author": [ { "name": "Herman Melville" , "type": "Person" } ] };

But I wouldn't call processing json all that complicated if I can do it. :) It's just a bit tedious in that you always have to test what kind of data you've encountered:

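if (Array.isArray(baz.author)) {
  // one or more values; each entry may itself be a string or an object
}
else if (typeof(baz.author) === 'object') {
  // a single object, e.g. { "name": ..., "type": "Person" }
}
else {
  // a bare string such as "Herman Melville"
}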
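
The three-way test above can be folded into one small helper so that calling code always sees the same shape. The sketch below is only an illustration: `toEntityArray` is a hypothetical name, and putting a bare string into a `name` property is an assumption based on the examples in this thread.

```javascript
// Hypothetical helper: reduce any of the three authored forms
// (string, array, object) to an array of objects.
function toEntityArray(value) {
  if (Array.isArray(value)) {
    // Entries of an array may themselves be strings or objects.
    return value.map((v) => (typeof v === "string" ? { name: v } : v));
  } else if (value !== null && typeof value === "object") {
    return [value];
  } else {
    // Assumption: a bare value is the entity's name.
    return [{ name: String(value) }];
  }
}

const foo = { author: "Herman Melville" };
const bar = { author: ["Herman Melville"] };
const baz = { author: [{ name: "Herman Melville", type: "Person" }] };

// After normalization the same access path works for all three forms.
for (const manifest of [foo, bar, baz]) {
  console.log(toEntityArray(manifest.author)[0].name);
}
```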

@llemeurfr
Contributor

@dauwhe @mattgarrish good to see your developer skills. Did you have a look at the code linked from #31 (comment)?

@mattgarrish
Member

Yes, and I agree that processing isn't the big challenge here.

I also agree that there isn't a lot of value to having to author information that can be inferred, like types. The JSON-LD specification appears to agree, too.

But I've maybe become jaded by experiences in epub where no matter how elaborate we've tried to make the metadata, the extra information just hasn't proven useful. How many systems need to know whether an author is a person, organization or thing?

@iherman
Member Author

iherman commented Aug 14, 2019

The processing is not a big deal. I have done this in https://github.com/iherman/WPManifest (it includes stuff that is irrelevant by now, ie, getting hold of the manifest itself as described in WPUB, and it may not be up-to-date).

@iherman
Member Author

iherman commented Aug 14, 2019

But I've maybe become jaded by experiences in epub where no matter how elaborate we've tried to make the metadata, the extra information just hasn't proven useful. How many systems need to know whether an author is a person, organization or thing?

As far as I could see, schema.org processors (at least the Structured Data Testing Tool) complain if the type is not explicit. I believe this was the only reason we required it.

But, as @llemeurfr said in another comment, whether the type is required or not is another issue, let us not mix it with the fundamental questions above...

@dauwhe

dauwhe commented Aug 14, 2019

I really want to avoid "author": [ { "name": ["en": "James Joyce"]}] just because it's hard to write and read.

Aside from questions of syntax, what benefit do we get from this extra information? Are we obligated to research the name of every author? What does it mean for a proper name to have a language associated with it? Is Yann Martel fr-CA or en-CA?

@laudrain

@dauwhe nobody will have to write this by hand.
On your "aside", internationalization is an issue: there may be several languages and scripts for the same author. And globally this is a question of metadata quality.

@mattgarrish
Member

mattgarrish commented Aug 14, 2019

As far as I could see, schema.org processors (at least the Structured Data Testing Tool) complain if the type is not explicit.

From what I see the testing tool just defaults to Thing whereas we default to Person.
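
The difference in defaults is easy to picture with a tiny JavaScript sketch. It does not reproduce what the testing tool or any reading system actually does internally; `applyDefaultType` is a hypothetical name, and the only thing modeled is the defaulting rule described above (Thing in one case, Person in the other).

```javascript
// Hypothetical illustration: the same authored value under two
// different defaulting rules.
function applyDefaultType(entity, defaultType) {
  // An explicit type always wins; the default only fills a gap.
  return { ...entity, type: entity.type ?? defaultType };
}

const authored = { name: "Herman Melville" };
const asSeoTool = applyDefaultType(authored, "Thing");
const asManifest = applyDefaultType(authored, "Person");

console.log(asSeoTool.type, asManifest.type);
```

An author who cares about the two consumers agreeing can always state the type explicitly, in which case neither default applies.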

The concerns about different graphs are real, but I'm just not swayed that the compact forms will cause much real harm in practice, and you can always avoid them. We were trying to be flexible by going this route, just as schema.org metadata processors have to be.

But I agree we don't need to get into all the details. I'm only raising this to agree with your second point that keeping the simplifications allowed in the authored manifest is fine with me.

@dauwhe

dauwhe commented Aug 14, 2019

nobody will have to write this by hand.

I have personally edited probably thousands of EPUB package files.

Even if the majority of manifests are created by tools, I still think it's important to maintain as much human-readability as possible. It helps with troubleshooting and makes it easier for developers, who are also human :)

@mattgarrish
Member

I still think it's important to maintain as much human-readability as possible.

I would phrase this more as it's important to maintain an authoring syntax that people are already familiar with when authoring schema.org metadata.

I believe we've achieved that in allowing strings and defining how to make objects from them. Let's not go further astray.

@llemeurfr
Contributor

Re. allowing "property":"value" AND "property":["value","value"], did you spot that JSON-LD itself sets a bad example, e.g. in https://www.w3.org/TR/json-ld11/#specifying-the-type with
"@type": "http://schema.org/Person"
and
"@type": [
"http://schema.org/Person",
"http://xmlns.com/foaf/0.1/Person"
]
?

A clarification on @mattgarrish's reference to the JSON-LD spec. From what I understand, the referenced section is about how a JSON-LD processor can infer the type of an object from the properties it contains. This is not the use case I was talking about: in fact I was thinking about "type coercion" but discovered that JSON-LD does not support it for complex types, ref. w3c/json-ld-syntax#31. This would have made the JSON-LD context an equivalent of an RDF Schema, which IMHO would have been smart.

@iherman
Member Author

iherman commented Aug 14, 2019

I was thinking about "type coercion" but discovered that JSON-LD does not support it for complex types, ref. w3c/json-ld-syntax#31.

Indeed. JSON-LD is "only" an RDF serialization, and such inferences are in RDF's purview.

@BigBlueHat
Member

If the context is not set, the data is assumed to be json and the canonicalization steps have to be run to obtain the common data structure.

If the context is not set (i.e. it doesn't include http://schema.org/ at least), then there's no SEO value...

Also, that discovery step would be required or a unique MIME media type (beyond application/json) would need to be used. Whereas with JSON-LD-based documents, the vocabulary is referenced from within the content itself (i.e. in @context).

A user agent should have the option to do things differently than the lifecycle algorithm, like read the manifest into an internal data structure and then sanitize the data by the rules, never creating a "canonical manifest" in the sense of there still being a json-ld representation.

Perhaps this is the core of the confusion/tension. The canonicalization algorithm is meant to provide a "canonical manifest," but that "manifest" isn't actually ever...manifest. It only exists (according to the spec currently) as an "internal representation of the data structure." That's not what JSON-LD is for...it's what WebIDL and internal APIs are for.

From the introduction:

This section describes the steps a user agent follows to process an authored manifest into an internal representation of the data structure it contains.

So, I'd conclude (per the aim of this issue) that...

  • there should be an "authored manifest"
  • there should not be a "canonical manifest"
  • (but) there should be an "internal representation" defined via WebIDL describing expected internal APIs that implementers should consistently use (assuming we want compatibility at that layer)

Alternatively, there might only be one "manifest" format/style and UA's can define whatever internal representation they want/need.

@BigBlueHat
Member

Re. allowing "property":"value" AND "property":["value","value"], did you spot that JSON-LD itself sets a bad example, e.g. in https://www.w3.org/TR/json-ld11/#specifying-the-type with
"@type": "http://schema.org/Person"
and
"@type": [
"http://schema.org/Person",
"http://xmlns.com/foaf/0.1/Person"
]
?

That example's correct, @llemeurfr JSON-LD supports multiple @type values per subject (i.e. @id).

You are correct about JSON-LD not supporting node "type coercion" (i.e. you can't turn a "Thing" into a "Person").

Anything making those sorts of additions or changes to the original data (like the canonicalization algorithm or the Structured Data Testing Tool) is doing so beyond what can be understood via a JSON-LD context or even a JSON Schema--as they both describe expectations or understandings of the data...not transformations.

@iherman
Member Author

iherman commented Aug 15, 2019

So, I'd conclude (per the aim of this issue) that...

  • there should be an "authored manifest"
  • there should not be a "canonical manifest"
  • (but) there should be an "internal representation" defined via WebIDL describing expected internal APIs that implementers should consistently use (assuming we want compatibility at that layer)

That is a viable approach indeed, although I am not yet sure how to change what is there editorially. But surely that can be done, @mattgarrish and I can look into this editorial option...

@iherman
Member Author

iherman commented Aug 15, 2019

Related to the possible consensus on re-branding the canonicalization: we use WebIDL as some sort of a data structure description language. It is not ideal; nobody is/was fully happy with it, because WebIDL is usually used to describe an API rather than a plain data structure.

However... does anybody have a better idea? Something that is clean and easily readable by programmers and does not give the impression that we bind this to a single programming language (although the latter can be mitigated by a clever explanation...).

If I give up the programming-language-independent view, then I would consider TypeScript (but I am not very familiar with it): it is close enough to JavaScript that people should understand it, but it has information about the datatypes, which JavaScript does not have.

Any ideas?

@danielweck @rdeltour @llemeurfr (as users, afaik, of Typescript in Readium...)

@danielweck
Member

TypeScript, sure why not.
ReasonML could fit the bill too.
https://reasonml.github.io/

But if "canonicalization" is an algorithm, then why not use pseudo-code in the same way that HTML5 defines processing model / parser logic?

@iherman
Member Author

iherman commented Aug 15, 2019

Well... the way I understand it, the HTML's logic is how to produce a DOM entry which is defined by... WebIDL.

The current algorithm describes how the representation of the (JSON) manifest is transformed into a representation of another JSON document. Instead, we can explicitly refer to a data structure defined in some language. We could keep that target as the WebIDL explicitly in the algorithm, and that would work. (This is analogous to the HTML spec.) Except that WebIDL was not liked, so if we have a better alternative, then it may be worth taking it.

It is the first type I hear of ReasonML, to be honest...

@danielweck
Member

It is the first type I hear of ReasonML

For some reason, this type typo makes me smile every time :)

@mattgarrish
Member

If the context is not set (i.e. it doesn't include http://schema.org/ at least), then there's no SEO value...

We're not making web publications, though, so I'm not sure how important a use case this is, at least at this time.

But I'll try to make some changes to reflect the discussions in this thread before Monday. On the road for a couple more days.

@iherman
Member Author

iherman commented Aug 15, 2019

We're not making web publications, though, so I'm not sure how important a use case this is, at least at this time.

I think that it is important, no matter how it is used, that our vocabulary reuses another well known vocabulary as schema. It allows the usage of the manifest in a SEO setting, if needed...

@BigBlueHat
Member

Except that WebIDL was not liked, so if we have a better alternative, then it may be worth taking it.

If our WebIDL were describing an internal data model / API for user agents, then WebIDL would be the right choice--and would make sense (I'd reckon) to the TAG and developers alike.

Essentially, once a manifest is consumed by a UA, developers should expect compatible data representations within that UA to match the expressed WebIDL--which I think maps to what the canonical manifest was attempting to state/provide (afaict).

@mattgarrish
Member

I think that it is important ... that our vocabulary reuses another well known vocabulary as schema.

Sure, I'm not suggesting we drop it. All I asked was whether it was necessary for every implementation to author json-ld, or whether the context can be inferred, and processing carried out, if a pure json file is authored.

But it sounds like if we clarify the canonical "manifest" then there isn't such an issue with using compact expressions, in which case I don't care anymore. :)

@HadrienGardeur
Member

Here's my take on this:

  • the spec and associated JSON Schema should only define the authored manifest
  • the concept of a canonical manifest and the steps to generate it should be dropped from the document
  • the idea of a "canonical manifest" should only exist through WebIDL
  • that said WebIDL doesn't make a lot of sense for the use case being considered for audiobooks right now (primarily an interchange format between authors/publishers/distributors/retailers)
  • given the current scope (audiobooks), JSON-LD doesn't really matter that much, this will be treated as JSON during the ingestion process at various retailers/distributors

@iherman
Member Author

iherman commented Aug 16, 2019

@HadrienGardeur

I think that does reflect the current consensus we are heading for, except for:

  • the idea of a "canonical manifest" should only exist through WebIDL

I think the spec does require defining precisely how the WebIDL is created from the manifest, due to the fact that there are a number of cases when different manifest forms produce the same WebIDL expression (the usage or not of an array is a typical case). This is what the current canonicalization spec section does, and should be reformulated as producing the WebIDL instead. (I believe @mattgarrish will, eventually, change that part of the spec when he is back from traveling.)

@llemeurfr
Contributor

llemeurfr commented Aug 16, 2019

Re. @iherman's request: I think we would have a hard time finding a better way than WebIDL for expressing in a clear way the object model of a manifest. UML class diagrams are not widely understood and are not accessible; Typescript is a specific language and we don't want to be language specific. WebIDL was not created for such a thing, but it does the trick.

@iherman
Member Author

iherman commented Aug 16, 2019

Typescript is a specific language and we don't want to be language specific

That is correct... Nevertheless: does anybody know of a precedent (not necessarily in W3C land) for a specification that uses Typescript as such a specification language?

@BigBlueHat
Member

* JSON-LD doesn't really matter that much, this will be treated as JSON during the ingestion process at various retailers/distributors

If we go with a "plain JSON" document, it will need its own media type so that the terms (and meaning) can be mapped to the vocabulary it's using (which would only be defined in prose in a spec). It will also need to provide its own extension mechanism--if we expect publishers to extend it with their own idiosyncratic data (and they will).

In the end, there will be much repeating of what JSON-LD provides--which is why so many specifications ship JSON-based specs with JSON-LD contexts +/- their own media types when additional processing semantics is required (beyond either JSON or JSON-LD processing).

@HadrienGardeur
Member

In the end, there will be much repeating of what JSON-LD provides--which is why so many specifications ship JSON-based specs with JSON-LD contexts +/- their own media types when additional processing semantics is required (beyond either JSON or JSON-LD processing).

Sure, I'm not suggesting that we drop JSON-LD, just pointing out that these documents won't get processed as such.

IMO using JSON-LD and having a specific media type is the way to go.

@iherman
Member Author

iherman commented Aug 19, 2019

This issue was discussed in a meeting.

  • RESOLVED: (1) only a shape of JSON-LD is required; this will be further defined through an (informative) reference to a JSON schema. This should close issue #32. (2) instead of the canonical manifest only an internal data structure is used, and the canonicalization algo. maps onto this. This closes issue #31
Transcript: manifest discussion issue
Wendy Reid: See Issue #31 “Is there a need for both an authored and a canonical manifests”
Wendy Reid: See Issue #32 “Should we use JSON schemas as part of the spec?”
Wendy Reid: See summary of the discussions and decisions to be taken, read here:
Do we allow the usage of full JSON-LD for the publication manifest, or only a restricted “subset” (or shape) thereof. Put it another way, do we expect reading systems that use the manifest to include a full JSON-LD processor? (This is, in fact, issue #32.)
Do we need the differentiation (and corresponding conversion method) between an “authored” manifest and a “canonical” manifest, where the former is a simplified version of the latter (e.g., allowing the author to use simple convention to express the manifest information in its full complexity)? (See the example of @llemeurfr’s example in #31 (comment) to illustrate it)
Ivan Herman: we moved a bit last week and getting to a consensus among those who discussed all this. The proposed answer to the first question is no, ie, we would use just a specific “shape” of JSON-LD. There would be an informal reference to a JSON-Schema to define that shape.
George Kerscher: A fully implemented json processor would be able to process this subset and so would a reading system so we have this covered, yes?
Ivan Herman: yes.
… the other issue is canonical manifest. The proposed consensus is to get rid of the term ‘canonical’ manifest.
… There is already a (WebIDL) definition in the document that is used by the processor. Matt is working with the conversion algorithm that says here is the manifest and here is how I use it to convert it into the data structure defined by WebIDL. I think that the discussion on the issue shows that we are in agreement
… matt’s work is not yet done and so this will not be in the first public working draft
… the question is if there is a consensus
Wendy Reid: are there any comments?
Benjamin Young: the overall direction is the right one but we do need to review the writing when it’s done because it will clarify some of the confusion
… shape is a better term in this case than subset
Proposed resolution: (1) only a shape of JSON-LD is required; this will be further defined through an (informative) reference to a JSON schema. This should close issue #32. (2) instead of the canonical manifest only an internal data structure is used, and the canonicalization algo. maps onto this. This closes issue #31 (Ivan Herman)
Ivan Herman: +1
Wendy Reid: +1
Luc Audrain: +1
Nellie McKesson: +1
Deborah Kaplan: +1
Matt Garrish: +1
Rachel Comerford: +1
Tim Cole: +1
Romain Deltour: +1
Mateus Teixeira: +1
Geoff Jukes: +1
Benjamin Young: +1
Marisa DeMeglio: +1
Laurent Le Meur: +1
Bill Kasdorf: +1
Joshua Pyle: +1
Resolution #5: (1) only a shape of JSON-LD is required; this will be further defined through an (informative) reference to a JSON schema. This should close issue #32. (2) instead of the canonical manifest only an internal data structure is used, and the canonicalization algo. maps onto this. This closes issue #31