Fix PDF/A conformance when data item contains TAB character #1949

Merged

Conversation

hvbtup
Contributor

@hvbtup hvbtup commented Oct 23, 2024

When a Data item contains TAB control characters, this change replaces them with spaces in the transform method.
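
A minimal sketch of the kind of replacement described above. This is not the actual BIRT code: the class name `TabSanitizer` and the method name `replaceTabs` are hypothetical, chosen only for illustration, and each TAB is assumed to map to exactly one space.

```java
// Hypothetical helper for illustration; not the actual BIRT implementation.
final class TabSanitizer {
    private TabSanitizer() {
        // static utility, no instances
    }

    /**
     * Returns the given text with every TAB (U+0009) replaced by a single
     * space. A null input, or text without any TAB, is returned unchanged.
     */
    static String replaceTabs(String text) {
        if (text == null || text.indexOf('\t') < 0) {
            return text; // nothing to replace
        }
        return text.replace('\t', ' ');
    }
}
```

A call such as `TabSanitizer.replaceTabs(value)` would be applied to the data item's text before it reaches the PDF output.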

@speckyspooky
Contributor

Hello @hvbtup,
I took a look at your change and I'm a little confused: you are trying to fix PDF/A,
but your change will affect all PDF output, not only PDF/A.

So this changes the behavior for every PDF document, and in some cases the TAB character should be kept rather than removed across the board.


@hvbtup
Contributor Author

hvbtup commented Oct 23, 2024

There is no use case where a TAB character makes sense in a PDF document.

The rendering of a TAB character, if it exists in the text content of a PDF, is not specified. When I tested my report (before I changed the code), the TAB character caused no spacing between the characters before and after it, and it didn't work when I tried to copy the text from the PDF into e.g. a text editor or MS Word.

In contrast to a space character or a letter etc, a TAB character does not have a logical width property, which is the main reason why it cannot work in PDF.

It seems that TAB characters cannot really be rendered in a PDF. Usually this merely causes unspecified behavior, but for the PDF/A format it is a bug.

This is an issue not only with BIRT, but with other applications as well.
See also

I know that replacing a TAB with a SPACE is usually not what the text author intends. What they actually want is "advance the insertion cursor to the next tab stop position". This concept stems from the age of typewriters, decades ago.
Today, we find it in two forms:

  • With a fixed-width font, advance to the next character position that is a multiple of N. The value of N used to be 8 by convention, but today it usually depends on the environment (it might be 4 in an editor).

  • In a word processor, we can explicitly define TAB stops, e.g. at 1.27cm, 4cm and 6cm. The expected behavior is similar to the fixed-width case: advance the cursor to the next defined stop position.

Neither of these concepts is supported by BIRT or by PDF.
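
For illustration only, the fixed-width variant of the tab-stop rule can be sketched as follows. This is not something BIRT does and it is not part of this change; the class name `TabExpander`, the method `expandTabs` and the `tabWidth` parameter are hypothetical, and the sketch assumes that every character occupies exactly one column.

```java
// Hypothetical illustration of fixed-width tab stops; not BIRT code.
final class TabExpander {
    private TabExpander() {
        // static utility, no instances
    }

    /**
     * Expands each TAB to as many spaces as are needed to reach the next
     * column that is a multiple of tabWidth. Columns are counted per char
     * and reset at every line feed.
     */
    static String expandTabs(String text, int tabWidth) {
        if (tabWidth <= 0) {
            throw new IllegalArgumentException("tabWidth must be positive");
        }
        StringBuilder out = new StringBuilder(text.length());
        int column = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                // distance from the current column to the next tab stop
                int spaces = tabWidth - (column % tabWidth);
                for (int s = 0; s < spaces; s++) {
                    out.append(' ');
                }
                column += spaces;
            } else if (c == '\n') {
                out.append(c);
                column = 0; // a new line starts at column 0
            } else {
                out.append(c);
                column++;
            }
        }
        return out.toString();
    }
}
```

With a proportional font the column count says nothing about the rendered width, which is why this approach does not carry over to PDF layout.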

Adding TAB stop support to BIRT would probably require more or less difficult changes in the layout logic code, and I assume it would be impossible for other emitters.

Replacing TAB characters with spaces for PDF output is a reasonable second-best solution IMHO.

@hvbtup hvbtup merged commit a26bf12 into eclipse-birt:master Oct 24, 2024
3 checks passed
@hvbtup hvbtup deleted the Fix-PDF/A-TAB-character#1948 branch October 24, 2024 12:21
@speckyspooky speckyspooky added the BugFix Change to correct issues label Oct 24, 2024
@speckyspooky speckyspooky added this to the 4.18 milestone Oct 24, 2024
@speckyspooky
Contributor

Yes, you are right. There is no really good alternative to the TAB character for PDF.
Making the TAB replacement configurable wouldn't really be helpful, because the only option would be to use multiple spaces.
Other options will not work with other emitters.
Excel, for example, handles the TAB character very differently: you see the tab on the Excel edit screen, but at cell level it is completely removed.
