manifest processing model, what if null base URL? (related to origin issue) #12
If the manifest is embedded, the only way this can happen (see w3c/wpub#321 (comment)) is if the value of …
One step further in https://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-baseURI:
Related to the original question: I am fine with modifying the processing model to state that if this happens, processing stops.
In my original comment I mentioned …
@danielweck I must admit I do not understand your remark about the data URL. Can you give a somewhat more detailed example of what this would look like and what it would mean?
In the following edge-case example, the … Please ignore the lack of character escaping; this is pseudo-code:
Let's not try to explain why such convoluted markup would exist in the first place. Let's just handle the edge case regardless of its possible causes. I see two options:
(2) of course sounds like a viable and reasonable option, except that I would expect many reading systems to want to parse and interpret the manifest directly for publication purposes, without relying on a full-blown JSON-LD processor; i.e., relying on one may be an issue. On (1): yes, there are discussions in the JSON-LD WG, but on (other) edge cases of embedding a manifest (e.g., whether certain HTML terms must be escaped within the script element). I am not sure whether this type of edge case has been discussed at all. Yes, the WebPub model can isolate itself, but I think it is better to align with the JSON-LD WG. Bottom line, I think this question should be raised in the JSON-LD WG. I can of course raise the issue, but it may be better if you did it (on https://github.com/w3c/json-ld-syntax/issues). Do you know what the baseURI value will be on the DOM element for <script>? Will it be null (which is what I expect)?
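For reference, a reading system that skips the JSON-LD processor can read an embedded manifest with little more than `JSON.parse`. This is a minimal sketch under stated assumptions: `scriptText` is the text content of the `<script type="application/ld+json">` element (extracting it from the DOM is not shown), entries of `readingOrder` and `resources` are either URL strings or objects with a `url` property, and `collectLinkUrls` is a hypothetical name. The collected values stay plain relative strings, so such a reader still has to pick a base URL on its own.

```javascript
// Hypothetical helper: collect link URLs from an embedded manifest using plain
// JSON parsing, without any JSON-LD expansion or base-URL handling.
function collectLinkUrls(scriptText) {
  // Throws a SyntaxError if the script content is not valid JSON.
  const manifest = JSON.parse(scriptText);
  const urls = [];
  for (const key of ["readingOrder", "resources"]) {
    // Accept either a single value or an array of values.
    const entries = manifest[key] === undefined ? [] : [].concat(manifest[key]);
    for (const entry of entries) {
      if (typeof entry === "string") {
        urls.push(entry); // bare URL string
      } else if (entry && typeof entry.url === "string") {
        urls.push(entry.url); // object with a url property
      }
    }
  }
  // Values are returned exactly as written: relative URLs are NOT resolved.
  return urls;
}
```

Resolving these strings is exactly where the null-base question resurfaces.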
Quick test:

```html
<html>
<body>
  <!-- iframe is not a void element, so it needs an explicit closing tag -->
  <iframe
    width="100%"
    height="100%"
    src="data:text/html;base64,CjxodG1sPgo8aGVhZD4KPGJhc2UgaHJlZj0iaHR0cHM6Ly9kb21haW4ub3JnL3BhdGgvIiAvPgoKPHNjcmlwdCBpZD0ic2NyaXB0IiB0eXBlPSJ0ZXh0L2phdmFzY3JpcHQiPgogIGRvY3VtZW50LmFkZEV2ZW50TGlzdGVuZXIoIkRPTUNvbnRlbnRMb2FkZWQiLCBmdW5jdGlvbihldmVudCkgewogICAgY29uc29sZS5sb2coIkRPTUNvbnRlbnRMb2FkZWQiKTsKICAgIAogICAgLy8gd2luZG93LmxvY2F0aW9uLm9yaWdpbiB0b28KICAgIGxldCB0MSA9ICJ3aW5kb3cub3JpZ2luOiAiICsgd2luZG93Lm9yaWdpbjsKICAgIGNvbnNvbGUubG9nKHQxKTsKICAgIGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCJfMSIpLmlubmVySFRNTCA9IHQxOwogICAgCiAgICBsZXQgdDIgPSAiZG9jdW1lbnQuYmFzZVVSSTogIiArIGRvY3VtZW50LmJhc2VVUkk7CiAgICBjb25zb2xlLmxvZyh0Mik7CiAgICBkb2N1bWVudC5nZXRFbGVtZW50QnlJZCgiXzIiKS5pbm5lckhUTUwgPSB0MjsKCiAgICBsZXQgdDMgPSAibG9jYXRpb24uaHJlZjogIiArIGxvY2F0aW9uLmhyZWY7CiAgICBjb25zb2xlLmxvZyh0Myk7CiAgICBkb2N1bWVudC5nZXRFbGVtZW50QnlJZCgiXzMiKS5pbm5lckhUTUwgPSB0MzsKCiAgICBsZXQgdDQgPSAic2NyaXB0LmJhc2VVUkk6ICIgKyBkb2N1bWVudC5nZXRFbGVtZW50QnlJZCgic2NyaXB0IikuYmFzZVVSSTsKICAgIGNvbnNvbGUubG9nKHQ0KTsKICAgIGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCJfNCIpLmlubmVySFRNTCA9IHQ0OwogIH0pOwo8L3NjcmlwdD4KPC9oZWFkPgo8Ym9keT4KPGgxIGlkPSJfMSI+MTwvaDE+CjxoMSBpZD0iXzMiPjM8L2gxPgo8aDEgaWQ9Il8yIj4yPC9oMT4KPGgxIGlkPSJfNCI+NDwvaDE+CjwvYm9keT4KPC9odG1sPg=="
  ></iframe>
</body>
</html>
<!--
Decoded source of the base64 data: URL above:

<html>
<head>
<base href="https://domain.org/path/" />
<script id="script" type="text/javascript">
  document.addEventListener("DOMContentLoaded", function(event) {
    console.log("DOMContentLoaded");
    // window.location.origin too
    let t1 = "window.origin: " + window.origin;
    console.log(t1);
    document.getElementById("_1").innerHTML = t1;
    let t2 = "document.baseURI: " + document.baseURI;
    console.log(t2);
    document.getElementById("_2").innerHTML = t2;
    let t3 = "location.href: " + location.href;
    console.log(t3);
    document.getElementById("_3").innerHTML = t3;
    let t4 = "script.baseURI: " + document.getElementById("script").baseURI;
    console.log(t4);
    document.getElementById("_4").innerHTML = t4;
  });
</script>
</head>
<body>
<h1 id="_1">1</h1>
<h1 id="_3">3</h1>
<h1 id="_2">2</h1>
<h1 id="_4">4</h1>
</body>
</html>
-->
```

Result:
If the … Based on this simple experiment, I am starting to wonder whether …; just like opaque … Thoughts?
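Part of the experiment can be reproduced outside a browser with the WHATWG `URL` API (a global in both Node.js and browsers). This is a minimal sketch, assuming that without the `<base>` element the `data:` URL itself would serve as the document's base URL; the short `data:` URL here is a stand-in, not the base64 payload from the test page, and `tryResolve` is a hypothetical helper.

```javascript
// Stand-in for the iframe's document URL, plus the <base href> from the test page.
const documentUrl = "data:text/html,<p>hello</p>";
const explicitBase = "https://domain.org/path/";

// Resolve a relative reference against a base; null means resolution failed.
function tryResolve(relative, base) {
  try {
    return new URL(relative, base).href;
  } catch {
    return null;
  }
}

const report = {
  // A data: URL has an opaque origin, which serializes as the string "null".
  origin: new URL(documentUrl).origin,
  // A data: URL has an opaque path, so it cannot act as a base for "manifest.json".
  withoutBase: tryResolve("manifest.json", documentUrl),
  // An absolute <base href> makes relative resolution work again.
  withBase: tryResolve("manifest.json", explicitBase),
};
console.log(report);
```

In other words, an explicit absolute base repairs URL resolution, but the origin stays opaque either way.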
Pulling this out of @danielweck's long comment for easier reference:
I came to a similar conclusion, so I wholeheartedly agree. Although weird, the example with … I think for both this issue and w3c/wpub#321 we should try to find a blanket formulation in the processing model stating that if a processing step runs into an error (or an OWP-related error?), then processing stops and there is no manifest. (We could add a note giving examples of such situations, referring to the origin problem or the null baseURI problem, but that should only be an informal note.) I am not sure exactly how to formulate that, but maybe @mattgarrish can come up with the best terminology...
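A minimal sketch of such a blanket rule, with hypothetical step names: the steps run in order, and the first one that throws ends processing with no manifest instead of a partial one. Whether every error should count, or only OWP-related ones, is left open above; this sketch treats every thrown error the same way.

```javascript
// Run processing steps in order; the first thrown error means "no manifest".
function runManifestProcessing(steps, initial) {
  let context = initial;
  for (const step of steps) {
    try {
      context = step(context);
    } catch (err) {
      // Null base URL, opaque origin, malformed JSON, ...: all stop processing.
      return { manifest: null, stoppedAt: step.name, reason: err.message };
    }
  }
  return { manifest: context, stoppedAt: null, reason: null };
}

// Hypothetical steps, for illustration only.
function parseJson(ctx) {
  return { ...ctx, data: JSON.parse(ctx.text) };
}

function requireBase(ctx) {
  // Matches both null and undefined.
  if (ctx.baseURL == null) {
    throw new Error("base URL is null");
  }
  return ctx;
}
```

The informal note could then list the null base URL and opaque origin cases as examples of errors that trigger this rule.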
N.B. I have raised an explicit issue with the JSON-LD WG (w3c/json-ld-syntax#103), a.k.a. passing the buck :-)
Thanks Ivan! Let me also clarify this statement:
If the former (i.e. the WP specification describes "parsing" rules, probably as an extension to the JSON-LD processing model), then the manifest algorithm must be clear about what happens when an absolute URL cannot be resolved:
All this in a new setting, where we are "only" talking about the strict vocabulary and not the processing models anymore... Looking at the canonicalization algorithm, the only place where the base is used is step 11, i.e., where relative URLs are turned into absolute ones. I see two simple options:
In fact, the consequence of (2) is still (1), in the sense that the processor specification should still define what a relative URI means within the publication. How is that formally defined in EPUB? I am mildly in favor of (2), i.e., allowing an explicit base setting but falling back on the processor behavior if it is not used. Note that if we decide on (1), that makes #11 moot as well.
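A sketch of option (2), under two assumptions this thread does not settle: the explicit base is written as a JSON-LD `@base` inside an object in `@context`, and the processor passes in its own base URL (or `null` when it has none). All function names are hypothetical; `absolutizeReadingOrder` stands in for the step that turns relative URLs into absolute ones and only handles `readingOrder`.

```javascript
// Look for an explicit base, assumed to be a JSON-LD "@base" in a context object.
function explicitBaseOf(manifest) {
  const ctx = manifest["@context"];
  const contexts = ctx === undefined ? [] : [].concat(ctx);
  for (const c of contexts) {
    if (c && typeof c === "object" && typeof c["@base"] === "string") {
      return c["@base"];
    }
  }
  return null;
}

// Prefer the explicit base; otherwise fall back on the processor-supplied base.
function chooseBase(manifest, processorBase) {
  const explicit = explicitBaseOf(manifest);
  if (explicit !== null) {
    // A relative @base is resolved against the processor base; with no
    // processor base it must be absolute, otherwise new URL() throws.
    return processorBase === null
      ? new URL(explicit).href
      : new URL(explicit, processorBase).href;
  }
  return processorBase; // may be null: processor behavior decides
}

// Turn relative readingOrder URLs (strings or { url } objects) into absolute ones.
function absolutizeReadingOrder(manifest, processorBase) {
  const base = chooseBase(manifest, processorBase);
  if (base === null) {
    throw new Error("no base URL: cannot make URLs absolute");
  }
  const ro = manifest.readingOrder === undefined ? [] : [].concat(manifest.readingOrder);
  return ro.map((entry) => new URL(typeof entry === "string" ? entry : entry.url, base).href);
}
```

Under option (1) the `explicitBaseOf` lookup would simply disappear and only the processor-supplied base would remain.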
Sorry, the correct link is https://w3c.github.io/pub-manifest/#canonical-manifest
I certainly wouldn't want to fork. Options (1) and (2) amount to being silent about the issue in the canonicalization...
This issue was discussed in a meeting.
5. Issue #12: Manifest processing model, what if null base URL?
Garth Conboy: Is Daniel on the call to talk about (?) … issue 12 Manifest processing model, what if null base URL?
Garth Conboy: See Issue #12
Wendy Reid: I need to read this over before I have any opinions… I think we can save this one for discussion. Maybe Ivan has more info?
Ivan Herman: Related to what I said before - at the moment we have the publication manifest, where the base comes from is up to the various profiles… … it was all about what happens if web content has an iframe, what is the base URL? … we haven’t solved this issue, but it’s not relevant any more for the manifest…
Garth Conboy: Was that a ‘leave to TPAC’ or ‘close now’?
Ivan Herman: Leave to TPAC…
Garth Conboy: We’ll have Laurent with us at TPAC, so that makes sense.
This issue was discussed in a meeting.
Wendy Reid: #12
Wendy Reid: this is my favorite issue! … what if there’s a null base URL? … in light of recent changes to the specification, we have gotten rid of the canonicalization model algorithm … so maybe this is a non-issue
Benjamin Young: we don’t know where these json files are used … we don’t have an origin now … if LPF would be to go to REC, we might have to figure out how the base url is calculated … but until this JSON file is related to some HTML document that can express a base URL, we don’t need to say anything … it’s blank/null by default … there are other concerns, but this issue is not an issue
Proposed resolution: Close Issue #12, the canonicalization algorithm has been removed, origin is no longer a concern for Publication Manifest (Wendy Reid)
Benjamin Young: before we vote … the canonicalization thing has not been removed but renamed … maybe leave that bit out … just say it’s a json data document thingy. might not be at a URL
Ralph Swick: do you want to capture bigbluehat’s thought that this will be a concern in the future when the manifest is included in some future transfer protocol(s)
Proposed resolution: Close Issue #12, the canonicalization algorithm has been changed, origin is no longer a concern for Publication Manifest, but should be considered for specifications concerning discovery (Wendy Reid)
Benjamin Young: +1
Wendy Reid: +1
Laurent Le Meur: +1
Gregorio Pellegrino: +1
Juan Corona: +1
Dave Cramer: +1 with an error of 1
Brady Duga: +1
Toshiaki Koike: +1
Charles LaPierre: +1
Resolution #3: Close Issue #12, the canonicalization algorithm has been changed, origin is no longer a concern for Publication Manifest, but should be considered for specifications concerning discovery
Issue originally raised in the "opaque origin" conversation:
w3c/wpub#321 (comment)