-
Notifications
You must be signed in to change notification settings - Fork 408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SVG DTD not accepted #1114
Comments
B.t.w., formally, the latest Recommendation is still SVG 1.1. SVG2 is a Candidate Recommendation. |
I guess this issue should be discussed in the EPUB CG? EPUBCheck conforms to the EPUB spec here. I personally agree the rule is probably too strict, especially for SVG used as images and not as content docs. But any deviation from the spec should be approved by the CG 😊 |
This would have to be a change to the core spec, as external references are strictly forbidden from DTDs per https://www.w3.org/publishing/epub3/epub-spec.html#confreq-xml-extmarkupdecl It was put in at least in part for security reasons, as I recall, but perhaps there's a case to allow specific external references outside of HTML files. Definitely needs discussion. |
A similar issue may actually come up with MathML, if the DTD is used: I would think that the spec (and check) should relax on the standard, official DTD-s for the accepted content types. I am not sure why a reading system would even look at those DTDs in the first place. |
Have any publishers complained? If no publishers complain, I do not want to fix EPUB3. |
I am not a publisher:-), but I stumbled on this issue through a script that turns W3C recommendations into EPUB. I realize that is a small and not-very-important example. However, we see (at last!) a number of tools producing EPUB files from, e.g., popular editors (Apple Pages, Google Doc, LibreOffice, soon MS Word). Which is of course great, it makes EPUB becoming some sort of commodity; production of EPUB is not, and should not, be done only by bona fide publishers. However, alongside this evolution, we will also see, for example, more SVG content coming into EPUB from tools like Adobe Illustrator. (The specific SVG file that hit me was an Illustrator output.) That means such issues will come. B.t.w., such a change can be done in a very mild way, without affecting any deployed content. The essence of the restriction may remain in place; the only exception would be to ignore the official DTD-s of contents that are listed in the EPUB spec (MathML, SVG, SMIL, ...). I do not see any harm doing that. |
Is this also a matter of making EPUBs self-contained? If you need a network request to parse an XML file, how would the EPUB work in the proverbial train tunnel? The security implications seem non-trivial: https://en.wikipedia.org/wiki/XML_external_entity_attack |
Ya, but the solution in epub tends to be. Reading systems should be the ones not resolving external entity references. There's not much value in restricting authoring of such references, as if they're being used maliciously are you really running them through epubcheck? |
I think maintaining the restriction may be a good idea. However, there may be a set of well known, and standard DTD, whose parsing is probably unnecessary anyway, ie, that serve only some sort of identification. That should include the DTD-s for the content documents that are allowed in EPUB in the first place, ie, SVG, MathML, or SMIL (I did not check whether all of them depend on entities). |
In EPUBCheck, we use a URL resolver and local copies of DTDs and other references. I guess a similar approach would be taken by a reading system to prevent unnecessary network requests. As for security implications, they're real indeed, but again this is the kind of things that are better dealt with at the software level than at the spec level, IMO. For instance, a few years ago, EPUBCheck was reported a vulnerability to XML external entity attack (CVE-2016-9487), which we fixed in version 4.0.2. The spec didn't prevent this happening. |
This is a very old and annoying issue to me. The original XML people tried to solve it, but it still bites us. Allowing external DTD subsets might open a can of worms, but we can probably avoid them. First, the standard DTD for SVG 1.1 does not define parsed entities (unless people customize the DTD). Thus, XML processors that do not examine the external DTD subset have no problems in parsing SVG documents. Second, this is not true for MathML. A number of parsed entities (such as ↔ for U+2194 (LEFT RIGHT ARROW)) are defined or borrowed by the MathML DTD. But the restriction in EPUB 3.2 prohibits the use of such parsed entities in EPUB, unless people define parsed entities in the internal DTD subset So, we can safely assume that people use Unicode characters rather than parsed entities of MathML. Is this true? > @TzviyaSiegman Here is my proposal.
P.S. Some XML geeks might use the internal DTD subset for defining parsed entities. We might want to require a declaration in the internal DTD subset for each of the parsed entities used in the XML document (unless they are defined in the XML recommendation.) |
I would certainly hope so... MathML 3, which is the latest recommendation of MathML, does not require the use of DTD-s (just as SVG1.2 will also do away with a required DTD). As you said, the specific mathematical characters can be taken care of by their Unicode equivalents these days. (What I do not know, however, is what various converters, like LaTeX->MathML, do. I would think that most of the authors provide their equations in LaTeX, which is then converted into MathML. Some of those tools may be older and rely on DTD-s, just like the SVG files generated by Adobe Illustrator...) |
SMIL also uses DTD-s with entities. Do we allow explicit SMIL files in an EPUB? As far as I can see SSML does not require DTD-s :-) |
Media Overlay uses SMIL documents. But MO defines a very small subset of SMIL, which does not contain entities. |
But the DTD does... Ie, even if a small subset is used but the specific SMIL file includes the DTD then we have a problem. It is exactly the same problem as with mine with SVG which started this thread :-( |
We could also adopt an approach similar to the HTML Standard’s XML parsing rules:
|
But this problem is theoretical, since nobody would like to create SMIL documents containing document type declarations. I also think that references to parsed entities are not needed in MO documents.
Makes sense.
Are you proposing to allow references to entities defined or borrowed by the MathML DTD? If we allow them, XML processors that do not retrieve external DTD subsets cannot handle them.
I also think that XML processors that retrieve external DTD subsets are not very useful for EPUB. But I am not sure if we have to say "should not attempt to retrieve" them, since somebody might want to validate SVG or MathML using DTDs. |
Should this issue be addressed by publishing errata to EPUB 3.2 or by the next version of EPUB 3 (by the upcoming WG)? |
If there is an erratum for EPUB 3.2, it ought to be taken up by the upcoming WG anyway. I do not think there is an urgency... |
I've discovered this old issue yesterday and may be missing something, but:
for instance for excluding this:
In conclusion, I don't see the point of this issue. |
This statement is, however, a bit ambiguous:
Two ways of reading this is that the
At the minimum, this ambiguity must be clarified in the spec if indeed only the second item is meant. That should then probably be followed up in the EPUB Issue.
This is the issue on epubcheck and not on the spec. There is a practical problem, as described in the original description of this issue. This issue is very much relevant until epubcheck is not changed to accept bona fide SVG files as valid. If your interpretation is correct, then maybe the EPUB Issue can be closed although I think that, at the minimum, a clarification in the spec is necessary. (We should stick to one issue to discuss all this, it is not helpful to switch between two different repositories...) |
You are not the only person confused by the definition of DTDs. A DTD is a pair of an external DTD subset and an internal DTD subset. Laurent's example document contains a DTD, which has an internal DTD subset only. No external entities can occur in XML documents unless they are defined by DTDs. |
@murata2makoto I certainly yield to your XML knowledge. But then
|
I agree with your first point. Do not understand your second. |
If your interpretation of the EPUB Spec text is correct, then there is a bug in epubckeck. It should not flag an SVG DTD as an invalid content; it does today. |
We could solve the issue by looking at it from another point of view: XML external entities can either be used by an XML instance (as in my previous sample) or by the DTD itself, as in the modularized SVG 1.1 DTD. The first case is harmful, the second is not (the reason being that reading systems don't validate XML instances before display). We solve the issue by re-phrasing the constraint "External identifiers MUST NOT appear in the document type declaration" as (the context being the Publication Resource) "It MUST NOT make use of XML external entities defined in a document type declaration". EPUBCheck will have to be modified to accept the presence of external entities in DTD and block only if such entites are used in XML instances. |
(Admin comment) @llemeurfr linking this to the ongoing discussion on w3c/epub-specs#1338, in particular w3c/epub-specs#1338 (comment)... We should really continue this discussion on one thread only... |
Just a quick clarification, as I read the discussion of the past few days: The EPUB 3.2 XML Conformance states that for any XML publication resource:
This unambiguously says that the SVG doctype declaration given by @iherman as an example is disallowed, which EPUBCheck reports (correctly, for now). I don't see what's to be "interpreted" here, or what's the bug in EPUBCheck. Am I missing something? Note that I'm not saying the spec is good there. The spec can be made less restrictive (I personally believe it should), but then again, that's for the WG to decide. I'm marking that issue as blocked, as it depends on updating the spec. |
Well, looking at #1114 (comment), #1114 (comment), #1114 (comment) it seems that things are not that clear. |
OK, so let's put this differently 😊:
Is it the second bullet above that you find ambiguous? 🤔 All I'm saying is that the EPUB conformance criteria doesn’t even goes into looking at the DTD content. It only says, "the doctype must not have a public identifier nor a system identifier". Regarding the comments you reference:
You say (slightly reorganizing your comment):
The first one is true, as a corollary to the original statement. But really, the original statement suffice in itself to reject the SVG 1.1 doctype, with no need to a special "way of reading" or "interpretation" 😊 |
@rdeltour o.k., I agree that the issue here can be closed. I got a bit messed up by some of the reactions on this issue, to be honest. My real issue is w3c/epub-specs#1338, i.e., that the current restrictions in the EPUB spec is (in my view) unreasonably stringent. |
This issue was discussed in a meeting of the EPUB WG.
View the transcriptProblem with XML content requirements and external entities: make MathML and SVG1.1 invalidDave Cramer: w3c/epub-specs#1338 Dave Cramer: #1114 Dave Cramer: External entities. Dave Cramer: https://www.w3.org/publishing/epub32/epub-spec.html#confreq-xml-extmarkupdecl Dave Cramer: Lots of XML statements in specs: External identifiers MUST NOT appear in the document type declaration [XML]. Dave Cramer: This is a problem. … Breaks normal things — like inline SVG with DOCTYPE. … I think that’s bad. … Forcing folks to edit machine created content seems like a waste of effort. Brady Duga: Agree with the summary and history Dave Cramer: EPUB RS’s are not validating SVG’s with DOCTYPE of against that. … Should not expect an RS to run an validator. … Hope RS’s don’t need to be validating XML parsers. … Maybe just say “ignore DOCTYPE’s” — don’t use content that depending on processing DTD features Brady Duga: Stunned if anybody other than epubcheck is doing said validation. Wendy Reid: +1 we do this too Dave Cramer: Assuming WV’s doing to such validation. Brady Duga: What is this XML of which you speak? … RS’s likely display HTML, not XML, content. … Natively not XHTML in RS’s. Wendy Reid: +1 Dave Cramer: Principle in HTML that we should try to match reality — maybe we should do the same in here? Zheng Xu (徐征): +1 to not restrict Dave Cramer: Resolve tonight restricting DTDs in XML documents that are part of an epub. Garth Conboy: I think we should try it! Wendy Reid: In line with what Dave has say it — try it for now. It will come done to testing too. … Can that bridge when we come to it. Proposed resolution: remove bullet “External identifiers MUST NOT appear in the document type declaration [XML].” Add reading system conformance saying that reading systems SHOULD NOT validate against DTDs. (Dave Cramer) Garth Conboy: +1 Wendy Reid: +1 Dave Cramer: RS’s have no validation requirement against DTDs that may be found. Shinya Takami (高見真也): +1 Brady Duga: +1 Yu-Wei Chang (Yanni): +1 Ben Schroeter: +1 Wendy Reid: If you use inline SVG’s out of ID (and other commercial tools) you won’t get unexpected errors. Matthew Chan: 0 Wendy Reid: Found W3C listing DTDs that should be used; seems strange to prohibit. Laura Brady: +1 Wendy Reid: No objection; relax restriction on external identifiers in DTDs. Resolution #2: remove bullet “External identifiers MUST NOT appear in the document type declaration [XML].” Add reading system conformance saying that reading systems SHOULD NOT validate against DTDs. |
EPUB 3.3. now allows a reserved set of external identifiers in doctype declarations of documents with select media types. See: https://www.w3.org/TR/epub-33/#app-identifiers-allowed This commit: - adds those as special cases to the XML parser code - totally removes entity fetching for EPUB 3.3 - keeps forbidding external entities in the internal subset Fix #1192, Fix #1114
(This seems to be related to #173 (comment), although not to
rdf:RDF
, which is the original topic of that issue.)The W3C SVG Logo is not accepted by epubcheck, complaining about "External DTD entities are not allowed.". That is problematic.
The SVG file itself does not include any external DTD entity. However, the Logo file does contain:
Indeed, the DTD file itself contains loads of entities. It is also correct that the latest version of SVG (SVG2) has removed the reference to a DTD (see change section). But flagging this is as an error does create backward incompatibility issues with bona fide SVG1.1 content which did rely on the DTD (e.g., all examples in SVG1.1 include that DTD reference, but also all SVG content produced by earlier versions of Adobe Illustrator, to take just this example). Note that all browsers accept such SVG 1.1 content without further ado.
I would think that epubcheck should simply ignore the SVG DTD and shouldn't report any error or warning on that one.
The text was updated successfully, but these errors were encountered: