Remove the need for wrapping comment-open/comment-close text in data blocks #97

BigBlueHat · 2018-11-28T21:09:52Z

Addresses the concerns expressed in #96.

This PR also includes putting the <base> handling content in its own section, and clarifies the extraction of a single, targeted data block. I can move those to separate PRs if desired, as they do introduce requirements unrelated to the topic in #96. They do clarify/specify the use of targeting vs. not targeting specific data blocks. But my apologies for the mixed PR.

Preview | Diff

gkellogg · 2018-11-28T22:55:17Z

index.html

+      of the script element within the HTML document located by a URL (see [[!DOM]]).
+      A <a>JSON-LD processor</a> MUST extract only the specified data block's contents
+      parsing it as a standalone <a>JSON-LD document</a>
+      and MUST NOT create a <a>dataset</a>.</p>


Why must it not create a dataset; what does it create if not a dataset?

I would want a targeted extraction to extract exactly what's expressed in the data block, and not alter it by putting it within a dataset.

If I'm extracting all data blocks into a single "thing", then I'd expect that thing to be a dataset.

Okay, wording needs to be changed, as the embedded JSON-LD may, itself, create a dataset, which should not be prohibited. I think you mean that an quads extracted from a targeted script should not be combined into a graph/dataset comprised of other markup in the document, including other JSON-LD script elements, embedded Mircodata, RDFa, or whatever.

Commit 63017bd in this branch should resolve this, and I think we can merge.

gkellogg · 2018-11-28T22:56:19Z

This will require a corresponding change in the API doc.

…f script escape sequences. This supports w3c/json-ld-syntax#97.

iherman · 2018-11-29T08:46:26Z

I am fine with this. Thanks to @BigBlueHat and @gkellogg!

BigBlueHat · 2018-11-29T18:08:35Z

index.html

        "http://schema.org/name": [{"@value": "Encoding Issues"}],
        "http://schema.org/description": [
-          {"@value": "Issues list such as unescaped &lt;/script&gt; or --&gt;"}
+          {"@value": "Issues list such as unescaped </script> or -- >"}


Similar to my other comment--would expansion of the raw JSON-LD result in this variation or would those retain the entities from the original?

This "other comment" (since it's from a different repo's PR review) w3c/json-ld-api@b969d20#r31502674

The original included JSON-LD in a script element with escaped entities. When extracted, and subsequently processed, the entities are resolved.

There's some trickery (-- > vs. -->) because of issues in the spec itself, as -- can not appear in a comment, which it would otherwise. Code in both common.js and extract-examples explicitly manages this.

BigBlueHat · 2018-11-29T18:09:15Z

index.html

@@ -8434,7 +8434,7 @@ <h3>Graph Containers</h3>
    JSON-LD content can be easily embedded in HTML [[HTML52]] by placing
    it in a <a data-cite="HTML52/semantics-scripting.html#the-script-element">Script element</a> with the <code>type</code> attribute set to
    <code>application/ld+json</code>. Doing so creates a
-    <a data-cite="https://www.w3.org/TR/html52/semantics-scripting.html#data-block">data block</a>.</p>


Thanks. Meant to go back and fix that once I saw how the magic worked. 😉

JSON-LD script elments MUST only contain valid JSON. HTML entity-based character encoding is used to avoid accidental script tag parsing problems--as described in the note in Restrictions for contents of <script> elements: https://www.w3.org/TR/html52/semantics-scripting.html#restrictions-for-contents-of-script-elements

I was not able to sort out when only the first script element would be chosen as the preceding text talks about always creating a data set.

gkellogg · 2018-11-30T18:45:58Z

@BigBlueHat can we pull this separate from w3c/json-ld-api#51? We need to resolve the dataset description, but handling entity escapes in script tags seems to be a bigger knot that I'd rather not hole the rest of this PR request up for, and maybe should be handled differently in w3c/json-ld-api#51 as a separate issue as well.

…ing combined with any other content from the same HTML document.

iherman · 2018-12-01T07:54:55Z

This issue was discussed in a meeting.

No actions or resolutions

View the transcript

JSON-LD script elements (data blocks) MUST only contain valid JSON
Benjamin Young: #96
Benjamin Young: issue related to #97
Gregg Kellogg: what I pushed up there addresses the different aspects of the HTML script content.
… HTML provides 2 mechanisms: specific backslash escapes in HTML comments, or HTML entities
… Benjamin proposed to remove the comments, to keep it valid JSON, consistently with the declared content-type.
Benjamin Young: it’s similar to injecting HTML tags in a <textarea>, without breaking the textarea.
Rob Sanderson: to be sure i understand, the issue is if you have: <script>{"label": "

gkellogg · 2018-12-01T22:37:09Z

Merging, as the remaining issue is separated to #100.

…f script escape sequences. This supports w3c/json-ld-syntax#97.

iherman · 2018-12-08T05:56:28Z

This issue was discussed in a meeting.

RESOLVED: close #97 as fixed with recent merges; other issues raised in #100

View the transcript

4.1. Remove the need for wrapping comment-open/comment-close text in data blocks
Benjamin Young: #97
Gregg Kellogg: most of these are related to PRs just discussed, but not all
Gregg Kellogg: #96
Benjamin Young: current draft talks now about html entities for escaping
Gregg Kellogg: given that it has to be valid json-ld, how do we deal with escaped entities
… leaves us with problem of how to represent <script> tags and comments in json?
… so we could close 97 and leave 100 for further discussion
Benjamin Young: one option is to mark propose closing and discuss more later
Gregg Kellogg: DanBri involved and noted in discussion (of this or another related issue 100)
Proposed resolution: close #97 as fixed with recent merges; other issues raised in #100 (Benjamin Young)
Gregg Kellogg: +1
David Newbury: +1
Tim Cole: +1
Benjamin Young: +1
Ivan Herman: +1
Resolution #2: close #97 as fixed with recent merges; other issues raised in #100

gkellogg reviewed Nov 28, 2018

View reviewed changes

gkellogg added a commit to w3c/json-ld-api that referenced this pull request Nov 29, 2018

Remove support for embedded JSON-LD inside XML comments and the use o…

b969d20

…f script escape sequences. This supports w3c/json-ld-syntax#97.

gkellogg approved these changes Nov 29, 2018

View reviewed changes

BigBlueHat commented Nov 29, 2018

View reviewed changes

gkellogg mentioned this pull request Nov 29, 2018

Add example and description for issue #56. #95

Merged

azaroth42 approved these changes Nov 30, 2018

View reviewed changes

BigBlueHat and others added 7 commits November 30, 2018 10:35

Rephrase embedded intro; focus on data blocks

db4998a

Remove confusing Otherwise clause

4a1089c

I was not able to sort out when only the first script element would be chosen as the preceding text talks about always creating a data set.

Add requirements when targeting one data block

169deb3

Move base element handling into dedicated section

3e9c788

Update script and example for unescaping HTML.

7716b18

Fix some HTML errors and deal with - - >

a52fb4c

gkellogg force-pushed the html-data-block-tweaks branch from 4861a36 to a52fb4c Compare November 30, 2018 18:42

Replace bit about targeted scripts not creating a dataset with not be…

63017bd

…ing combined with any other content from the same HTML document.

iherman mentioned this pull request Dec 1, 2018

JSON-LD script elements (data blocks) MUST only contain valid JSON #96

Closed

gkellogg mentioned this pull request Dec 1, 2018

Handling HTML character references in embedded JSON-LD #100

Closed

gkellogg merged commit b97a16d into master Dec 1, 2018

gkellogg deleted the html-data-block-tweaks branch December 1, 2018 22:37

gkellogg added a commit to w3c/json-ld-api that referenced this pull request Dec 1, 2018

Remove support for embedded JSON-LD inside XML comments and the use o…

57e7c9c

…f script escape sequences. This supports w3c/json-ld-syntax#97.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove the need for wrapping comment-open/comment-close text in data blocks #97

Remove the need for wrapping comment-open/comment-close text in data blocks #97

BigBlueHat commented Nov 28, 2018 •

edited by pr-preview bot

Loading

gkellogg Nov 28, 2018

BigBlueHat Nov 29, 2018

gkellogg Nov 29, 2018

gkellogg Nov 30, 2018

gkellogg commented Nov 28, 2018

iherman commented Nov 29, 2018

BigBlueHat Nov 29, 2018

BigBlueHat Nov 29, 2018

gkellogg Nov 29, 2018

BigBlueHat Nov 29, 2018

gkellogg commented Nov 30, 2018

iherman commented Dec 1, 2018

gkellogg commented Dec 1, 2018

iherman commented Dec 8, 2018

Remove the need for wrapping comment-open/comment-close text in data blocks #97

Remove the need for wrapping comment-open/comment-close text in data blocks #97

Conversation

BigBlueHat commented Nov 28, 2018 • edited by pr-preview bot Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gkellogg commented Nov 28, 2018

iherman commented Nov 29, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gkellogg commented Nov 30, 2018

iherman commented Dec 1, 2018

gkellogg commented Dec 1, 2018

iherman commented Dec 8, 2018

BigBlueHat commented Nov 28, 2018 •

edited by pr-preview bot

Loading