tagpdf support #27

Open
u-fischer opened this issue Jul 1, 2021 · 12 comments
Labels
long term (This will be kept open for a long time), other package (Used with other packages)

Comments

@u-fischer

We are working on a project to enhance LaTeX so that it can produce tagged PDF.
https://www.latex-project.org/news/2020/11/30/tagged-pdf-FS-study/

For a tabular, this means that one needs to add commands quite similar to HTML table commands to cells and rows.

So to successfully tag a tabular, one needs at least

  • places to inject tagging code at the beginning and end of a cell and of a row (at the end even if the row is not fully filled)
  • a way to identify header rows and header columns as the code is different there.
  • a way to mark decorative elements like lines as "artifacts".

The code for the cells and rows should ideally have access to data such as the current row and column number.

It would be nice if tabularray would add suitable hooks for this.
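
For illustration, the kind of interface requested here can be sketched with the LaTeX kernel's generic hook commands (`\NewHook`, `\UseHook`, `\AddToHook`). The hook names `demo/row/before` and `demo/row/after`, the counter `demorow`, and the macro `\DemoRow` are hypothetical and exist only for this sketch; they are not tabularray's interface. The sketch only shows how a package could expose injection points and let client code read the current row number:

```latex
\documentclass{article}
% Hypothetical hook names and a toy \DemoRow macro, for illustration only.
\NewHook{demo/row/before}
\NewHook{demo/row/after}
\newcounter{demorow}
% A "row" that runs the hooks around its content and counts itself.
\newcommand\DemoRow[1]{%
  \stepcounter{demorow}%
  \UseHook{demo/row/before}#1\UseHook{demo/row/after}\par}
% Client code (for example tagging code) injects material at both ends
% and can read the current row number through the counter.
\AddToHook{demo/row/before}{[row \thedemorow: }
\AddToHook{demo/row/after}{]}
\begin{document}
\DemoRow{alpha}
\DemoRow{beta}
\end{document}
```

In a real table package the hooks would presumably also have to run for partially filled rows, and decorative rules would need separate handling as artifacts, which this toy example does not attempt.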

@lvjr
Owner

lvjr commented Jul 1, 2021

Sorry I know little about these at this time. I have given you write access to this repository. Please feel free to add anything you want.

@u-fischer
Author

Thanks for the invitation. I'm sorry, I don't have the time right now to think about it. In the project, the handling of tabulars is scheduled for a later phase for a good reason: it is not trivial.

But I think it is important that your code considers not only whether the visual appearance is right but also how the structure of the table is encoded. This matters if one wants to copy and paste a table or export it to HTML, or if people want to define layouts in a CSS-like manner, e.g. "make all header cells bolder".

@lvjr
Owner

lvjr commented Jul 1, 2021

Yes, it is useful. I will leave this issue open and hope to come back for it one day.

lvjr added the "feature request" and "future plan 🚀" labels on Jul 1, 2021
lvjr removed the "feature request" label on Jul 2, 2021
@u-fischer
Author

Here is a very simple example (it needs a current tagpdf, version 0.9). It marks up a table with one column, which has a header and two rows. I think it gives an impression of the code we need to inject (the real code is even longer, as I left out a few details such as attributes).

If you compile the LaTeX source below and then upload the PDF at https://ngpdf.com/loadFile, you can inspect the derived HTML, which will look something like this:

<!DOCTYPE html>
<html><head>
<title>test-utf8</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
</head>
<body lang="en-US">
 <div data-pdf-se-type="Document">
  <table data-pdf-se-type="Table">
   <thead data-pdf-se-type="THead">
    <tr data-pdf-se-type="TR">
     <th data-pdf-se-type="TH">Header</th>
    </tr>
   </thead>
   <tr data-pdf-se-type="TR">
    <td data-pdf-se-type="TD">row1</td>
   </tr>
   <tr data-pdf-se-type="TR">
    <td data-pdf-se-type="TD">row2</td>
   </tr>
  </table>
 </div>
</body></html>

\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{uncompress}
\documentclass{article}
\usepackage{tagpdf,array}
\tagpdfsetup{activate}

\begin{document}

\tagstructbegin{tag=Table}
\begin{tabular}{l}
\tagstructbegin{tag=THead}%
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TH}%
\tagmcbegin{tag=TH}%
Header
\tagmcend
\tagstructend
\tagstructend
\tagstructend
\\
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TD}%
\tagmcbegin{tag=TD}%
row1
\tagmcend
\tagstructend
\tagstructend
\\
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TD}%
\tagmcbegin{tag=TD}%
row2
\tagmcend
\tagstructend
\tagstructend
\end{tabular}
\tagstructend

\end{document}
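
The repeated begin/end pairs in the example above could be wrapped in a small helper. The following is a minimal sketch, assuming it is placed in the preamble of that example; the macro name `\TaggedCell` is hypothetical, and only tagpdf commands already used above appear in it:

```latex
% Hypothetical helper: #1 is the structure tag (TH or TD), #2 the cell text.
% It opens the structure element and the marked-content chunk, typesets
% the text, and closes both again in reverse order.
\newcommand\TaggedCell[2]{%
  \tagstructbegin{tag=#1}%
  \tagmcbegin{tag=#1}%
  #2%
  \tagmcend
  \tagstructend}

% Usage inside the tabular of the example above (one data row):
%   \tagstructbegin{tag=TR}%
%   \TaggedCell{TD}{row1}%
%   \tagstructend
%   \\
```

Such a wrapper only shortens manual markup. It does not remove the need for hooks, since a table package would still have to call the tagging code for every cell and row itself.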

@lvjr
Owner

lvjr commented Jul 3, 2021

Yes, it is very interesting.

@lvjr
Owner

lvjr commented Nov 30, 2022

I will close this issue; further comments can be left in issue #197.

lvjr closed this as not planned on Nov 30, 2022
lvjr added the "duplicate" label and removed the "future plan 🚀" label on Nov 30, 2022
lvjr added the "other package" label and removed the "duplicate" label on Feb 11, 2023
lvjr changed the title from "hooks to add structure informations" to "tagpdf support" on Feb 11, 2023
@lvjr
Owner

lvjr commented Feb 11, 2023

I have decided to reopen this issue to record experiments with tagpdf here.

lvjr reopened this on Feb 11, 2023
lvjr added a commit that referenced this issue Feb 11, 2023
@lvjr
Owner

lvjr commented Feb 11, 2023

With the newly added public hooks and variables (#197) in trial/tabularray.sty, we can now correctly tag <table>, <tr> and <td>, as done in the above commit.


<!DOCTYPE html>
<html><head>
<title>test-tagpdf-01</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body lang="en-US">
 <div data-pdf-se-type="Document" id="ID.001">
  <p data-pdf-se-type="P" id="ID.002"><span id="page-0" role="doc-pagebreak"></span>Some text.</p>
  <table data-pdf-se-type="Table" id="ID.003">
   <tbody><tr data-pdf-se-type="TR" id="ID.004">
    <td data-pdf-se-type="TD" id="ID.005"><p data-pdf-se-type="P" id="ID.006">Alpha</p></td>
    <td data-pdf-se-type="TD" id="ID.007"><p data-pdf-se-type="P" id="ID.008">Beta</p></td>
    <td data-pdf-se-type="TD" id="ID.009"><p data-pdf-se-type="P" id="ID.010">Gamma</p></td>
    <td data-pdf-se-type="TD" id="ID.011"><p data-pdf-se-type="P" id="ID.012">Delta</p></td>
   </tr>
   <tr data-pdf-se-type="TR" id="ID.013">
    <td data-pdf-se-type="TD" id="ID.014"><p data-pdf-se-type="P" id="ID.015">Epsilon</p></td>
    <td data-pdf-se-type="TD" id="ID.016"><p data-pdf-se-type="P" id="ID.017">Zeta</p></td>
    <td data-pdf-se-type="TD" id="ID.018"><p data-pdf-se-type="P" id="ID.019">Eta</p></td>
    <td data-pdf-se-type="TD" id="ID.020"><p data-pdf-se-type="P" id="ID.021">Theta</p></td>
   </tr>
   <tr data-pdf-se-type="TR" id="ID.022">
    <td data-pdf-se-type="TD" id="ID.023"><p data-pdf-se-type="P" id="ID.024">Iota</p></td>
    <td data-pdf-se-type="TD" id="ID.025"><p data-pdf-se-type="P" id="ID.026">Kappa</p></td>
    <td data-pdf-se-type="TD" id="ID.027"><p data-pdf-se-type="P" id="ID.028">Lambda</p></td>
    <td data-pdf-se-type="TD" id="ID.029"><p data-pdf-se-type="P" id="ID.030">Mu</p></td>
   </tr>
  </tbody></table>
  <p data-pdf-se-type="P" id="ID.031">More text.</p>
 </div>
</body></html>

@Witiko

Witiko commented Oct 8, 2024

The Markdown package for TeX currently uses package tabularray to render CSV tables through the package csvsimple. However, we also wish to support PDF tagging and the package tabularray is listed as incompatible in latex3/tagging-project#177 and in https://latex3.github.io/tagging-project/tagging-status/:

[Image: screenshot of the tagging-status listing for tabularray]

@lvjr: You seem to have made some effort over the past two years to support PDF tagging, in #197 and in #27 (comment). To what extent would you say the package supports PDF tagging, and what are your plans going forward?

@lvjr
Owner

lvjr commented Oct 8, 2024

@Witiko I always keep tabularray minimal and extensible. Therefore the actual tex4ht/lwarp/tagpdf code can and should be maintained by other people. The code in the trial folder will get them started, and they can open pull requests if more hooks or other code adjustments are needed.

@u-fischer
Author

Well, hooks are nice, but they can be used by everyone, so you lose precise control over the places where the code is inserted. In the example below, the xxxx inserted by someone else ends up outside the structure.

Another problem with tagging is that we tag paragraphs automatically. This means that if you insert something into a paragraph, or start a paragraph, you have to keep careful track of whether it should open a structure and whether you have to close an MC-chunk. So the example below works fine when I deactivate paragraph tagging and put the tblr in a paragraph of its own, but it fails with paragraph tagging active. Tracking down what is going on here, and which \par and \leavevmode need special handling, requires tests and some knowledge of the code, and is not easily done from the outside.

So the following code can get you started, but if you want to support the tagged PDF project, you will have to delve a bit deeper.

\DocumentMetadata{uncompress,pdfversion=2.0,pdfstandard=ua-2,testphase={phase-III}} 
\documentclass{book}
\usepackage{tabularray}
\UseTblrLibrary{hook}
\begin{document}
\ExplSyntaxOn
% for testing the colspan attribute:
\tagpdfsetup{
  role/new-attribute = {tblr-colspan-2}{/O /Table /ColSpan~2}}
   
\AddToHook{tabularray/cell/before}{xxxx} %foreign code not in the structure
   
\AddToHook{tabularray/trial/before}{\SuspendTagging{\tblr}}
\AddToHook{tabularray/trial/after}{\ResumeTagging{\tblr}}
\AddToHook{tabularray/table/before}{\tagstructbegin{tag=Table}}
\AddToHook{tabularray/table/after}{\tagstructend}
\AddToHook{tabularray/row/before}{\tagstructbegin{tag=TR}}
\AddToHook{tabularray/row/after}{\tagstructend}
\AddToHook{tabularray/cell/before}
 {\bool_if:NF \lTblrCellOmittedBool 
   { 
    \int_compare:nNnTF \lTblrCellColSpanTl > {1}
      {
       \tagstructbegin{tag=TD,attribute-class=tblr-colspan-\lTblrCellColSpanTl}\tagmcbegin{}
      }
      {
        \tagstructbegin{tag=TD}\tagmcbegin{}
      }
   }
 }
\AddToHook{tabularray/cell/after}
 {\bool_if:NF \lTblrCellOmittedBool {\tagmcend\tagstructend}}

\ExplSyntaxOff

\tagpdfparaOff % with paragraph tagging active, the tagging of the tblr fails

\begin{tblr}{ll}
 a & b\\
 c    \\
 \SetCell[c=2]{c} d 
\end{tblr}

\tagpdfparaOn


\end{document}


@lvjr
Owner

lvjr commented Nov 24, 2024

I have decided to upload tagpdf-tabularray.sty from the trial folder to CTAN as a standalone package next year, stating that it works only partially, for simple tables, and is unmaintained.

There are two benefits: (1) anyone can try the package; (2) anyone can adopt the package.

lvjr added the "long term" label on Dec 8, 2024