Character set warning after page break #2453

Closed
pwaehnert opened this issue Oct 4, 2023 · 6 comments

@pwaehnert

When sufficiently long text is followed by a block image, such that the image must be placed on a new page, I get a warning about characters that cannot be fully converted. The minimal example is a bit dull:

// test.adoc
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt
...
ipsum dolor sit amet.

// repeat that previous paragraph as often as necessary such that the following block image is placed on a new page

image:test.png[]

This file must be converted by the following call:

> asciidoctor-pdf -a pdf-theme=base -w -v -t test.adoc
asciidoctor: WARNING: The following text could not be fully converted to the Windows-1252 character set:
| ⁣?

The automatic page break seems to be the problem. But interestingly enough the base theme seems to play an important role too, since omitting it fixes the issue.

@mojavelinux
Member

Please provide a full reproducible example so I can run it. If this is, in fact, an issue, it seems to depend on a very specific set of circumstances that I don't want to have to spend time trying to figure out how to reproduce.

@pwaehnert
Author

Here's a minimal Asciidoctor file:

// test.adoc
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

image:test.png[]

And this might be an example image test.png:

[image: test.png]

This minimal example is converted by the following command line call:

> asciidoctor-pdf -a pdf-theme=base -w -v -t test.adoc
asciidoctor: WARNING: The following text could not be fully converted to the Windows-1252 character set:
| ?

I'm using Windows 10. I suspect the warning won't appear on GNU/Linux or macOS.

@pwaehnert
Author

I also tried to minimize the base theme. It is indeed possible to reduce the theme to an empty file, but it is still essential to reference a theme, even an empty one. I think that the default values without any given theme contain something special that triggers this encoding warning.

@mojavelinux
Member

mojavelinux commented Oct 5, 2023

I see what's happening here. The text for the fragment of an inline image is temporarily set to a placeholder character (\u2063) in order to reserve space where the image will be inserted. That placeholder character is never rendered. However, when the text containing that fragment is advanced to a new page, it causes the text to be normalized again. If the font is an AFM font, as is the case when using the base theme, it checks that the character can be encoded into the Windows 1252 character set, a requirement of using an AFM font. In this case, \u2063 cannot be encoded. This is when the converter normally looks for a fallback character. However, the fallback character for \u2063 is not defined in the map. (See https://github.com/asciidoctor/asciidoctor-pdf/blob/v2.3.9/lib/asciidoctor/pdf/ext/prawn/font/afm.rb#L6-L12).

The fix that is needed here is to add a fallback character for \u2063 to the aforementioned map so that the text normalization operation succeeds. That character is never rendered, so the fallback value doesn't really matter.
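The encoding step described above can be reproduced with plain Ruby. The following is a minimal sketch, not the converter's actual code: the method name `to_windows_1252` and the `FALLBACKS` map are hypothetical stand-ins for the map in afm.rb, and the space used as the fallback value is an arbitrary choice, since the placeholder is never rendered.

```ruby
# U+2063 (invisible separator), the placeholder character named above.
PLACEHOLDER = "\u2063"

# Hypothetical fallback map for illustration; the real map lives in afm.rb.
FALLBACKS = { PLACEHOLDER => ' ' }.freeze

# Encode text to Windows-1252, as required for AFM fonts.
# Raises Encoding::UndefinedConversionError when a character has neither
# a Windows-1252 mapping nor an entry in the fallback map.
def to_windows_1252(text, fallbacks = {})
  text.encode('Windows-1252', fallback: fallbacks)
end

text = "before#{PLACEHOLDER}after"

begin
  to_windows_1252(text)
rescue Encoding::UndefinedConversionError => e
  # Without a fallback entry the conversion fails, which is the
  # situation that leads to the warning.
  puts "cannot encode: #{e.message}"
end

# With a fallback entry for the placeholder, the conversion succeeds.
puts to_windows_1252(text, FALLBACKS).inspect
```

Ruby's `String#encode` accepts a Hash for the `fallback:` option and raises when the lookup returns nil, which mirrors the "no fallback defined in the map" case.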

I will apply a fix and add a test for this.

With that said, the base theme is not intended to be used directly. Rather, it is intended as a theme that you extend to add your own fonts and styles. The base theme defaults to AFM fonts, but these fonts are extremely limited. You are encouraged to use TrueType fonts, as described in the docs at https://docs.asciidoctor.org/pdf-converter/latest/theme/font-support/.

If you use the default theme instead of the base theme, you would not receive this warning. Also, this is just a pedantic/verbose warning, not an error. It just communicates when the converter encounters a character that it cannot deal with. If that character isn't important, which is the case here, it does not impact the result.

@mojavelinux mojavelinux self-assigned this Oct 5, 2023
@mojavelinux mojavelinux added the bug label Oct 5, 2023
@mojavelinux mojavelinux added this to the v2.3.x milestone Oct 5, 2023
mojavelinux added a commit to mojavelinux/asciidoctor-pdf that referenced this issue Oct 5, 2023
mojavelinux added a commit that referenced this issue Oct 5, 2023
@pwaehnert
Author

Thank you for the explanation and the immediate bug fix! Are there any plans for when version 2.3.10 will be published?

I don't use the base theme directly but extend it in my own theme. But you're right, I didn't override the font properties.

I already suspected that this warning doesn't mean much. But I'd like to set the failure level to warnings in order to catch other bugs early. For example, if images are missing, I'd like to abort the conversion. It is very tedious to scan through our large documentation manually to spot those missing images.

@mojavelinux
Member

mojavelinux commented Oct 5, 2023

> Are there any plans for when version 2.3.10 will be published?

When I get to it. Though I am interested in getting a release out as the fixes are starting to accumulate.

> But I'd like to set the failure level to warnings in order to catch other bugs early.

You can still do that; just drop the -v flag. The -v flag adds pedantic warnings, which shouldn't be tied to fail-fast behavior because they are not guaranteed to indicate problems. In other words, they are intended to be informational.
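The distinction can be illustrated with a small stdlib-only sketch. This is not how Asciidoctor implements its failure level; `SeverityTrackingLogger`, `convert`, and `failed?` are hypothetical names, and the two messages are made up for illustration. It only shows why a pedantic warning that is emitted solely in verbose mode would trip a warning-level failure check, while a genuine warning trips it either way.

```ruby
require 'logger'
require 'stringio'

# Logger that remembers the highest severity it was asked to log.
class SeverityTrackingLogger < Logger
  attr_reader :max_severity

  def add(severity, message = nil, progname = nil, &block)
    severity ||= UNKNOWN
    @max_severity = severity if @max_severity.nil? || severity > @max_severity
    super
  end
end

# Hypothetical conversion step: the pedantic message is emitted only in
# verbose mode, while the missing-image message is always emitted.
def convert(logger, verbose:, missing_image: false)
  logger.warn 'text could not be fully converted to Windows-1252' if verbose
  logger.warn 'image to embed not found' if missing_image
end

# True when any logged message reached the given failure level.
def failed?(logger, failure_level = Logger::WARN)
  !logger.max_severity.nil? && logger.max_severity >= failure_level
end

quiet = SeverityTrackingLogger.new(StringIO.new)
convert(quiet, verbose: false)
puts "non-verbose run failed: #{failed?(quiet)}"

verbose = SeverityTrackingLogger.new(StringIO.new)
convert(verbose, verbose: true)
puts "verbose run failed: #{failed?(verbose)}"
```

Under these assumptions, a run without the verbose flag fails only on warnings such as a missing image, which is the behavior the author is after.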
