misrendered PDF (unreadable trailing "d") #5507
Comments
This seems like a datatracker's HTMLized PDF generation issue since |
There is also an issue with the asterisks in the list following the figure. The short answer here is that weasyprint's CSS support is real bad. We need to find a hack that makes it work. |
Here are reviewer comments on https://datatracker.ietf.org/doc/pdf/draft-ietf-jsonpath-base-13
Honestly, I have no idea why we are torturing people with this form of pdfization if it is so easy to do the real thing. |
You mean PDFize the HTML? Happy to switch to that, but users will need to be OK with that (past feedback wanted the text version PDFs.) |
I'm not sure I understand the terminology, but I don't understand why we have to have a different (and vastly inferior) rendering from the (mostly) debugged one that is provided by xml2rfc. |
(Trying to interpret the terminology: |
They are based on the plaintextified HTML using @martinthomson's CSS. (Same as is shown for HTMLized.) |
Yes. Again: I don't see the point to do this when we can do the real thing. Of course, using typewriter style conceals the fact that we cannot do standard typography correctly yet. |
As I said, we can "do the real thing". It will require the community to agree that this is what they want. Past feedback indicated they wanted PDFs of text versions. |
Note that this isn't an alternative. We could produce both real and fake PDFs, if the latter are really needed, just like we have html and "htmlized" (which no longer is). |
If we can remove codepaths and alternate representations in favor of a smaller, easier to explain set, then we should do that. But I don't think we can. Right now we do not require submission in xml. If someone provides only plain text, we do htmlize and then pdfize that. We cannot use xml2rfc to produce html or pdf. We can't really stop providing the -ized formats when we do have v3 xml, because that would force people (and systems like wikipedia) to have to learn to point to different types of things depending on the underlying arcana of the submission, which is really a non-starter. So, I don't think we can make less, and I cringe at the proposal to make more because of the confusion it causes. |
On 2023-04-20, at 00:07, Robert Sparks ***@***.***> wrote:
> If we can remove codepaths and alternate representations in favor of a smaller, easier to explain set, then we should do that. But I don't think we can.
> Right now we do not require submission in xml. If someone provides only plain text, we do htmlize and then pdfize that.
That is fine for me — punishes the plain text submitters as they deserve :-)
> We cannot use xml2rfc to produce html or pdf.
> We can't really stop providing the -ized formats when we do have v3 xml, because that would force people (and systems like wikipedia) to have to learn to point to different types of things depending on the underlying arcana of the submission, which is really a non-starter.
I’m having a hard time believing that would be a problem for PDF.
I don’t even think this would be a problem for HTML.
The RFC editor can provide proper HTML and proper PDF for those RFCs that have that (8650+), and get by with -ized surrogates for the others.
What is so special about datatracker that it can’t do that?
> So, I don't think we can make less, and I cringe at the proposal to make more because of the confusion it causes.
I don’t know why we need the confusion between htmlized plaintext, typewriterized html, and real html. Having plaintext’s page numbers might be a reason, but we have botched that, too.
Grüße, Carsten
|
And remember that the reason this ticket exists appears to be that the CSS that comes with typewriterized html blows the little mind of weasyprint. Getting rid of typewriterized HTML would solve this problem right there. |
i don't know how to solve this problem, but i appreciate that y'all are looking into it. Maybe it's worth looping in the WeasyPrint developers as well? @grewn0uille, perhaps you have some hints about how the datatracker can align its CSS with what WeasyPrint supports? or maybe WeasyPrint can use the current datatracker CSS as a source of feature requests/plans for improvement? |
We could of course PDFize the text versions, but that would mean no links in PDF and ASCII figures. |
Hi @dkg, |
@martinthomson, can you provide a minimized reproducer to @grewn0uille, or at least point him to the inputs (html + CSS) passed to weasyprint for draft-ietf-lamps-e2e-mail-guidance-06 by the datatracker? |
Two issues appear to be going on here:
The figure (1) is drawn using flexbox. There are three items there. The first is a 3ch gutter, generated with a My theory is that the offending line ( I might lean more toward the font metric hypothesis, because it looks like one of these line drawing characters is being substituted from a different font, as there is a small discontinuity that doesn't show in a browser. However, there are other examples further down the document that lose 2 or more characters, which suggests either that the theory doesn't hold or that the substitute font has very different metrics.
As for a reproducer, try this:
<style>div, pre { margin: 0; border: 0; padding: 0; font-family: monospace; }</style>
<div style="width: 72ch">
<div style="flex-wrap: nowrap; align-items: end; display: flex;">
<pre style="flex: 0 0 content; max-width: 72ch;">
Cryptographic Protections: none
H └┬╴multipart/mixed
J ├─╴[protected part, may be arbitrary MIME subtree]
L └─╴[footer, typically text/plain]
</pre>
<div style="flex: 0 0 1ch;">x</div>
</div>
</div>
This isn't perfect, because it doesn't cut off the text, but it at least shows that the x is rendered in the wrong place.
For the list (2), we're just using a marker. The list is pretty simply styled with |
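For anyone who wants to try the reproducer outside the datatracker, here is a minimal Python sketch that wraps the figure text in the same flex layout and hands it to WeasyPrint. The names `FIGURE`, `build_reproducer` and `render_pdf` are made up for this example; the only WeasyPrint call used is `HTML(string=...).write_pdf(...)`, and the sketch assumes WeasyPrint is installed separately. It has not been checked against the datatracker's actual pipeline.

```python
import html

# Figure text copied from the reproducer in this thread.
FIGURE = (
    "Cryptographic Protections: none\n"
    "H └┬╴multipart/mixed\n"
    "J ├─╴[protected part, may be arbitrary MIME subtree]\n"
    "L └─╴[footer, typically text/plain]\n"
)


def build_reproducer(figure: str, width_ch: int = 72) -> str:
    """Return a standalone HTML page placing `figure` in the flex layout."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">\n'
        "<style>div, pre { margin: 0; border: 0; padding: 0; "
        "font-family: monospace; }</style></head><body>\n"
        f'<div style="width: {width_ch}ch">\n'
        '<div style="flex-wrap: nowrap; align-items: end; display: flex;">\n'
        f'<pre style="flex: 0 0 content; max-width: {width_ch}ch;">\n'
        f"{html.escape(figure, quote=False)}"
        "</pre>\n"
        '<div style="flex: 0 0 1ch;">x</div>\n'
        "</div>\n</div>\n</body></html>\n"
    )


def render_pdf(page: str, out_path: str) -> None:
    """Render `page` to a PDF file with WeasyPrint.

    WeasyPrint is a third-party package; it is imported here so that
    build_reproducer() stays usable without it.
    """
    from weasyprint import HTML

    HTML(string=page).write_pdf(out_path)
```

Calling `render_pdf(build_reproducer(FIGURE), "reproducer.pdf")` should produce a one-page PDF in which the position of the trailing `x` can be compared with what a browser shows for the same HTML.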
Another issue here might be related to the fonts used on the datatracker. I note that the pdf of draft -06 embeds six fonts:
I don't know my pdf details well enough to know which embedded fonts were used in each section, or for each calculation, but it might be worth looking into whether the presence of specific fonts causes (or minimizes) the problem. |
It's not a font issue. I have a PR in #5688 that uses the normal fonts when generating the PDF, and the issue is still there. (But the document line-wraps now as it should at least.) |
This is what I now see:
I don't actually know where |
This problem is fixed by 0a03e3d. For the record: it only happens on the longest line of preformatted text, only when the previous line includes non-ASCII characters, and only when preformatted block’s width is set to maximum content width (in this case, it’s a flex item).
I’ll check that, but it probably comes from WeasyPrint’s limited support of the flexbox layout.
It’s used as a fallback font for characters that are not included in Noto Mono (for example ↧ or ⇩). |
This issue has already been reported: Kozea/WeasyPrint#1557. |
WeasyPrint supports |
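The trigger conditions given above (longest line of a preformatted block, previous line containing non-ASCII characters) are easy to scan for in other drafts. The following is a rough Python sketch of such a check, not anything taken from WeasyPrint or the datatracker: `suspect_lines` is a hypothetical helper, it measures line length in code points rather than rendered width, and it cannot see the third condition (the block's width being set to its maximum content width), which depends on the CSS.

```python
from typing import List


def suspect_lines(pre_text: str) -> List[int]:
    """Return 0-based indexes of lines that match the reported trigger.

    A line is flagged when it is as long as the longest line in the block
    and the line directly before it contains a non-ASCII character.
    """
    lines = pre_text.splitlines()
    if not lines:
        return []
    longest = max(len(line) for line in lines)
    return [
        i
        for i in range(1, len(lines))
        if len(lines[i]) == longest and not lines[i - 1].isascii()
    ]
```

Run on the figure from the reproducer earlier in the thread, this flags the line starting with `J`, since it is the longest line and the line before it contains box-drawing characters.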
Thanks for looking into this @liZe. |
On Mon 2023-05-29 06:21:21 -0700, Guillaume Ayoub wrote:
> For the record: it only happens on the longest line of preformatted
> text, only when the previous line includes non-ASCII characters, and
> only when preformatted block’s width is set to maximum content width
> (in this case, it’s a flex item).
Wow, this isn't just a corner case. It's a corner case of a corner case
of a corner case. Thank you for tracking this down and fixing it,
Guillaume.
|
Describe the issue
Section 4.1.2.2 of the PDF form of draft-ietf-lamps-e2e-mail-guidance-06 makes it look like there is a missing "d".
The text in that section says:
But it renders without the trailing "d" on application/pgp-encrypted.
I can't replicate this misrendering with my own toolchain (this pdf was generated by the datatracker directly), so i don't know what specifically is causing the problem.