Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the need for wrapping comment-open/comment-close text in data blocks #97

Merged
merged 8 commits into from
Dec 1, 2018

Conversation

BigBlueHat
Copy link
Member

@BigBlueHat BigBlueHat commented Nov 28, 2018

Addresses the concerns expressed in #96.

This PR also includes putting the <base> handling content in its own section, and clarifies the extraction of a single, targeted data block. I can move those to separate PRs if desired, as they do introduce requirements unrelated to the topic in #96. They do clarify/specify the use of targeting vs. not targeting specific data blocks. But my apologies for the mixed PR.


Preview | Diff

index.html Outdated
of the script element within the HTML document located by a URL (see [[!DOM]]).
A <a>JSON-LD processor</a> MUST extract only the specified data block's contents
parsing it as a standalone <a>JSON-LD document</a>
and MUST NOT create a <a>dataset</a>.</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why must it not create a dataset; what does it create if not a dataset?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would want a targeted extraction to extract exactly what's expressed in the data block, and not alter it by putting it within a dataset.

If I'm extracting all data blocks into a single "thing", then I'd expect that thing to be a dataset.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, wording needs to be changed, as the embedded JSON-LD may, itself, create a dataset, which should not be prohibited. I think you mean that an quads extracted from a targeted script should not be combined into a graph/dataset comprised of other markup in the document, including other JSON-LD script elements, embedded Mircodata, RDFa, or whatever.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 63017bd in this branch should resolve this, and I think we can merge.

@gkellogg
Copy link
Member

This will require a corresponding change in the API doc.

gkellogg added a commit to w3c/json-ld-api that referenced this pull request Nov 29, 2018
@iherman
Copy link
Member

iherman commented Nov 29, 2018

I am fine with this. Thanks to @BigBlueHat and @gkellogg!

index.html Outdated
"http://schema.org/name": [{"@value": "Encoding Issues"}],
"http://schema.org/description": [
{"@value": "Issues list such as unescaped &lt;/script&gt; or --&gt;"}
{"@value": "Issues list such as unescaped </script> or -- >"}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my other comment--would expansion of the raw JSON-LD result in this variation or would those retain the entities from the original?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "other comment" (since it's from a different repo's PR review) w3c/json-ld-api@b969d20#r31502674

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original included JSON-LD in a script element with escaped entities. When extracted, and subsequently processed, the entities are resolved.

There's some trickery (-- > vs. -->) because of issues in the spec itself, as -- can not appear in a comment, which it would otherwise. Code in both common.js and extract-examples explicitly manages this.

index.html Outdated
@@ -8434,7 +8434,7 @@ <h3>Graph Containers</h3>
JSON-LD content can be easily embedded in HTML [[HTML52]] by placing
it in a <a data-cite="HTML52/semantics-scripting.html#the-script-element">Script element</a> with the <code>type</code> attribute set to
<code>application/ld+json</code>. Doing so creates a
<a data-cite="https://www.w3.org/TR/html52/semantics-scripting.html#data-block">data block</a>.</p>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Meant to go back and fix that once I saw how the magic worked. 😉

BigBlueHat and others added 7 commits November 30, 2018 10:35
JSON-LD script elments MUST only contain valid JSON.

HTML entity-based character encoding is used to avoid
accidental script tag parsing problems--as described in
the note in Restrictions for contents of <script> elements:
https://www.w3.org/TR/html52/semantics-scripting.html#restrictions-for-contents-of-script-elements
I was not able to sort out when only the first script element
would be chosen as the preceding text talks about always creating
a data set.
@gkellogg gkellogg force-pushed the html-data-block-tweaks branch from 4861a36 to a52fb4c Compare November 30, 2018 18:42
@gkellogg
Copy link
Member

@BigBlueHat can we pull this separate from w3c/json-ld-api#51? We need to resolve the dataset description, but handling entity escapes in script tags seems to be a bigger knot that I'd rather not hole the rest of this PR request up for, and maybe should be handled differently in w3c/json-ld-api#51 as a separate issue as well.

…ing combined with any other content from the same HTML document.
@iherman
Copy link
Member

iherman commented Dec 1, 2018

This issue was discussed in a meeting.

  • No actions or resolutions
View the transcript JSON-LD script elements (data blocks) MUST only contain valid JSON
Benjamin Young: #96
Benjamin Young: issue related to #97
Gregg Kellogg: what I pushed up there addresses the different aspects of the HTML script content.
… HTML provides 2 mechanisms: specific backslash escapes in HTML comments, or HTML entities
… Benjamin proposed to remove the comments, to keep it valid JSON, consistently with the declared content-type.
Benjamin Young: it’s similar to injecting HTML tags in a <textarea>, without breaking the textarea.
Rob Sanderson: to be sure i understand, the issue is if you have: <script>{"label": "

@gkellogg
Copy link
Member

gkellogg commented Dec 1, 2018

Merging, as the remaining issue is separated to #100.

@gkellogg gkellogg merged commit b97a16d into master Dec 1, 2018
@gkellogg gkellogg deleted the html-data-block-tweaks branch December 1, 2018 22:37
gkellogg added a commit to w3c/json-ld-api that referenced this pull request Dec 1, 2018
@iherman
Copy link
Member

iherman commented Dec 8, 2018

This issue was discussed in a meeting.

  • RESOLVED: close #97 as fixed with recent merges; other issues raised in #100
View the transcript 4.1. Remove the need for wrapping comment-open/comment-close text in data blocks
Benjamin Young: #97
Gregg Kellogg: most of these are related to PRs just discussed, but not all
Gregg Kellogg: #96
Benjamin Young: current draft talks now about html entities for escaping
Gregg Kellogg: given that it has to be valid json-ld, how do we deal with escaped entities
… leaves us with problem of how to represent <script> tags and comments in json?
… so we could close 97 and leave 100 for further discussion
Benjamin Young: one option is to mark propose closing and discuss more later
Gregg Kellogg: DanBri involved and noted in discussion (of this or another related issue 100)
Proposed resolution: close #97 as fixed with recent merges; other issues raised in #100 (Benjamin Young)
Gregg Kellogg: +1
David Newbury: +1
Tim Cole: +1
Benjamin Young: +1
Ivan Herman: +1
Resolution #2: close #97 as fixed with recent merges; other issues raised in #100

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants