Pass HTML through to TeX4ht #90

Closed
8 tasks done
Tracked by #120
Witiko opened this issue Aug 8, 2021 · 16 comments
Labels
feature request latex Related to the LaTeX interface and implementation lua Related to the Lua interface and implementation tex4ht Related to support for the TeX4ht system for converting documents written in TeX/LaTeX/ConTeXt/etc. tug 2021 Related to the TUG 2021 conference
Milestone

Comments

@Witiko
Owner

Witiko commented Aug 8, 2021

Since version 2.3.0, the Markdown package has supported the html Lua option, which makes the Lua parser recognize display and inline HTML. Since version 2.10.0, inline HTML comments have been actionable, but other HTML nodes have been removed during the conversion to TeX. In order to enable the conversion from Markdown to HTML using @michal-h21's make4ht without the loss of HTML, as discussed in #63 (comment), we should:

We should add these definitions to markdown.dtx, not to the TeX4ht literate sources, since users may completely change the semantics of the HTML nodes by redefining the renderers. In that case, we don't want the renderers to produce \HCode.
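A minimal sketch of what such a definition could look like on the LaTeX side. The renderer name `\markdownRendererInlineHtmlTag` and its single argument (the raw tag) are assumptions made for illustration; this comment does not fix the final interface. Paragraph handling is left out on purpose.

```latex
\documentclass{article}
\usepackage[html]{markdown}
% Assumption: inline HTML tags are passed to a renderer named
% \markdownRendererInlineHtmlTag, with the raw tag as #1.
\ifdefined\HCode
  % TeX4ht is active: pass the tag through to the HTML output.
  \def\markdownRendererInlineHtmlTag#1{\HCode{#1}}
\else
  % Any other engine or format (e.g. pdfTeX): drop the tag.
  \def\markdownRendererInlineHtmlTag#1{}
\fi
\begin{document}
\begin{markdown}
Some <b>bold</b> text.
\end{markdown}
\end{document}
```

Because this is an ordinary renderer definition, a user who wants different semantics for HTML nodes can override it in their own preamble, and no `\HCode` is produced in that case.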

@Witiko Witiko added feature request lua Related to the Lua interface and implementation latex Related to the LaTeX interface and implementation labels Aug 8, 2021
@Witiko Witiko added this to the 2.11.0 milestone Aug 8, 2021
@Witiko Witiko self-assigned this Aug 8, 2021
@Witiko Witiko changed the title Pass-through HTML to TeX4ht Pass HTML through to TeX4ht Aug 8, 2021
@Witiko Witiko removed their assignment Aug 8, 2021
@Witiko Witiko added the tug 2021 Related to the TUG 2021 conference label Aug 8, 2021
@michal-h21

That's a great idea! I already tested the Markdown package with TeX4ht and it works in most cases. I think that I experienced some errors, but I don't remember what the cause was.

Anyway, regarding passing HTML from Markdown, I think that just \HCode isn't sufficient, because it won't correctly close paragraphs. You will need something like this:

\documentclass{article}
\ifdefined\HCode
\newcommand\htmlcode[1]{\ifvmode\IgnorePar\EndP\fi\HCode{#1}}
\else
\newcommand\htmlcode[1]{}
\fi
\begin{document}

Hello 

\htmlcode{<div>}\htmlcode{<span>}world\htmlcode{</span>}\htmlcode{</div>}

Text continues, we may also try an \htmlcode{<i>}inline HTML\htmlcode{</i>}.

\end{document}

It produces the following HTML with make4ht:

<!-- l. 9 --><p class='noindent'>Hello
</p>
   <div><span>world</span></div>
<!-- l. 13 --><p class='indent'>   Text continues, we may also try an <i>inline HTML</i>.
</p>

Without \ifvmode\IgnorePar\EndP\fi, you would get <p> around the <div> element, which is invalid HTML.

@Witiko Witiko modified the milestones: 2.11.0, 2.12.0 Oct 1, 2021
@Witiko Witiko modified the milestones: 2.12.0, 2.13.0, 2.14.0 Dec 30, 2021
@Witiko
Owner Author

Witiko commented Feb 7, 2022

@michal-h21 I have added todos for this issue (see #90 (comment)) and I plan to tackle them by the end of the month.

I have a couple of concerns:

  • Adding a renderer for inline HTML seems simple enough. However, for multi-paragraph block elements, it may be easier to pass the HTML around in an auxiliary file, so that it can be appended as-is to the output without expansion. Is there any such mechanism in TeX4ht? Otherwise, it may be practical to only support inline HTML for the moment.

  • Since TeX4ht is not a format, it does not make much sense to include it in unit tests. However, it would be useful to have an example document for make4ht. I will write one, but I would appreciate a review. In the long run, I want to convert the user manual from Pandoc to TeX4ht, so that we are fully self-sufficient, but that does not seem terribly important right now.

Witiko added a commit that referenced this issue Feb 10, 2022
Witiko added a commit that referenced this issue Feb 10, 2022
@michal-h21

@Witiko yes, it is possible to input the contents of an HTML file using the \special command. There is still an issue with paragraphs: they need to be closed before inputting the HTML snippet and reopened afterwards.

The following code shows the concept:

\documentclass{article}
\begin{document}

This is text from the TeX file

\ifvmode\IgnorePar\fi\EndP
\special{t4ht*<hello.html}
\par\ShowPar

Here continues text from the TeX file
\end{document}

The \ifvmode\IgnorePar\fi\EndP is needed to close the paragraph, and \par\ShowPar opens a new paragraph after the inputted file. \special{t4ht*<filename} includes the file.
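The same idea can be wrapped in a macro with a no-op fallback, mirroring the earlier \htmlcode example. This is only a sketch: the macro name `\htmlfile` is hypothetical, and the body just reuses the commands from the code above.

```latex
\documentclass{article}
\ifdefined\HCode
  % TeX4ht is active: close the current paragraph, include the
  % HTML file as-is, then allow paragraphs to be opened again.
  \newcommand\htmlfile[1]{%
    \ifvmode\IgnorePar\fi\EndP
    \special{t4ht*<#1}%
    \par\ShowPar}
\else
  % Outside TeX4ht, silently ignore the HTML file.
  \newcommand\htmlfile[1]{}
\fi
\begin{document}

This is text from the TeX file

\htmlfile{hello.html}

Here continues text from the TeX file
\end{document}
```

With this wrapper, the same document also compiles outside TeX4ht, where the HTML file is simply skipped.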

I can surely help with the example.

@Witiko
Owner Author

Witiko commented Feb 11, 2022

@michal-h21 Thank you, this is much appreciated!

@Witiko Witiko modified the milestones: 2.14.0, 2.15.0 Feb 27, 2022
Witiko added a commit that referenced this issue Feb 27, 2022
@Witiko Witiko modified the milestones: 2.15.0, 2.14.0 Feb 27, 2022
Witiko added a commit that referenced this issue Feb 27, 2022
Witiko added a commit that referenced this issue Feb 27, 2022
Witiko added a commit that referenced this issue Feb 27, 2022
Witiko added a commit that referenced this issue Feb 27, 2022
Witiko added a commit that referenced this issue Feb 27, 2022
@Witiko
Owner Author

Witiko commented Feb 27, 2022

@michal-h21 In b00280b, I added an example document. The good news is that the HTML pass-through works as expected:

Here is some <b>HTML code</b> mixed *with Markdown*. In pdf \TeX, the HTML code
will be silently ignored, whereas in \TeX 4ht, the HTML code will be passed
through to the output:

<table border="1">
  <tr>
    <td>Emil</td>
    <td>Tobias</td>
    <td>Linus</td>
  </tr>
  <tr>
    <td>16</td>
    <td>14</td>
    <td>10</td>
  </tr>
</table>

output of the above code

@Witiko
Owner Author

Witiko commented Feb 27, 2022

The bad news is that with up-to-date TeX Live 2021, the compilation produces some errors that seem pretty arcane to me:

[Screenshot of the compilation errors]

To see what was going on, I ran the following command. The error log indicates that at some point \cur:rule is not defined, but I did not dig deeper.

$ htlatex latex.tex "xhtml,html5,mathml,charset=utf-8" " -cunihtf -utf8" "" -shell-escape

output of the above command

Worse yet, make4ht seems to crash our continuous integration with TeX Live 2018 and 2019:

$ make4ht -m clean latex
Output dir: 	
Compiler: 	latex
Latex options: 	 -jobname=latex 
tex4ht.sty :	xhtml,
tex4ht	
build_file	latex.mk4
Output format	html5
/usr/local/texlive/2018/bin/x86_64-linux/make4ht:62: attempt to index local 'formatter' (a nil value)

Since I would like to release today or tomorrow, I plan to merge commit 44affc3 and ship Markdown 2.14.0 without an example document for TeX4ht, which I would merge later when ready. However, if we could get the example document in order today or tomorrow at the latest, we could ship it with Markdown 2.14.0 already.

Witiko added a commit that referenced this issue Feb 27, 2022
@michal-h21

@Witiko this issue seems to be caused by tables. \cur:rule is used by the TeX4ht configuration for Booktabs, but you will get the underlying error even when you comment out Booktabs in the LaTeX document.

I think this issue is caused by these lines in markdown.sty:

\@ifpackageloaded{booktabs}{
  \let\markdownLaTeXTopRule\toprule
  \let\markdownLaTeXMidRule\midrule
  \let\markdownLaTeXBottomRule\bottomrule
}{
  \let\markdownLaTeXTopRule\hline
  \let\markdownLaTeXMidRule\hline
  \let\markdownLaTeXBottomRule\hline
}

As TeX4ht overwrites \hline, \toprule, and the rest of the table rule commands with code that inserts HTML code, you get an error when you save their unpatched versions in \markdownLaTeXTopRule etc. Simply using \AtBeginDocument around this code block seems to fix this. It also has the advantage that it will support Booktabs loaded after the Markdown package.
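For illustration, the wrapped block could look roughly like this. It is a sketch of the suggestion as a fragment of a package file (where @ is already a letter), not necessarily the code that ends up in markdown.sty.

```latex
% Defer the choice of rule commands until \begin{document}, so that
% (a) booktabs is detected even when it is loaded after this package and
% (b) the rule commands are saved at \begin{document} rather than at
%     package load time.
\AtBeginDocument{%
  \@ifpackageloaded{booktabs}{%
    \let\markdownLaTeXTopRule\toprule
    \let\markdownLaTeXMidRule\midrule
    \let\markdownLaTeXBottomRule\bottomrule
  }{%
    \let\markdownLaTeXTopRule\hline
    \let\markdownLaTeXMidRule\hline
    \let\markdownLaTeXBottomRule\hline
  }%
}
```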

Witiko added a commit that referenced this issue Feb 28, 2022
@Witiko
Owner Author

Witiko commented Feb 28, 2022

Always a pleasure to see someone more capable at work! I added \AtBeginDocument as suggested and replaced \let with \def for further robustness in 03a444a. A quick search shows that make4ht won't work in TeX Live 2018 and 2019, so I replaced your make4ht command with htlatex in the Makefile in 67a5830, but mentioned the corresponding make4ht command in a comment. Once we drop support for TeX Live 2018 and 2019, we can replace htlatex with make4ht.
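The difference between \let and \def that matters here can be seen in a small standalone document. The macro names below are made up for the demonstration and have nothing to do with the package itself.

```latex
\documentclass{article}
\newcommand\original{old}
\let\viaLet\original       % copies the meaning \original has right now
\def\viaDef{\original}     % looks \original up again at every use
\renewcommand\original{new}% stands in for a later patch, e.g. by TeX4ht
\begin{document}
\viaLet\ vs.\ \viaDef      % typesets "old vs. new"
\end{document}
```

With \def, the rule macros pick up whatever definition the rule commands have at the time a table is typeset, regardless of when they were patched.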

@Witiko
Owner Author

Witiko commented Feb 28, 2022

The example document seems passable except for the imbalanced <center> tag that we can see already in the output of make4ht in #90 (comment) and which causes all the text below the blockquote to be centered:

> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.

   <div class="center-quotation"><table class="quotation" 
border="0" cellpadding="0" cellspacing="15"><tr><td> 
        <blockquote class="quotation">
     <!--l. 87--><p class="indent" >     This is the first level of quoting.
</p>
        <div class="center-quotation"><table class="quotation" 
border="0" cellpadding="0" cellspacing="15"><tr><td> 
       <blockquote class="quotation">
    <!--l. 89--><p class="indent" >     This is nested blockquote.</p></blockquote>
     </td></tr></table></center>
        Back to the first level.</blockquote>
</td></tr></table></center>

image of the above code

It seems like TeX4ht should close the opening <div class="center-quotation"> with a closing </div> rather than </center>?

@michal-h21

@Witiko this is strange, it shouldn't produce any <table class="quotation">, we use just <blockquote> for quotations. And the sample from your previous message produces this on my system:

        <blockquote class='quotation'>
     <!-- l. 3 --><p class='noindent'>This is the first level of quoting.
       </p><blockquote class='quotation'>
    <!-- l. 5 --><p class='indent'>     This is nested blockquote.</p></blockquote>
     <!-- l. 7 --><p class='indent'>     Back to the first level.</p></blockquote>

@Witiko
Owner Author

Witiko commented Feb 28, 2022

@michal-h21 Interesting. I have the following minimal example document example.tex:

\documentclass{article}
\usepackage{markdown}
\begin{document}
\begin{markdown}

> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.

\end{markdown}
\end{document}

Running htlatex example "xhtml,fn-in,html5,charset=utf-8" " -cunihtf -utf8" '' -shell-escape or make4ht --shell-escape example fn-in produces the following output with up-to-date TeX Live 2021:

   <div class="center-quotation"><table class="quotation" 
border="0" cellpadding="0" cellspacing="15"><tr><td> 
        <blockquote class="quotation">
     <!--l. 3--><p class="indent" >    This is the first level of quoting.
</p>
        <div class="center-quotation"><table class="quotation" 
border="0" cellpadding="0" cellspacing="15"><tr><td> 
       <blockquote class="quotation">
    <!--l. 5--><p class="indent" >    This is nested blockquote.</p></blockquote>
     </td></tr></table></center>
        Back to the first level.</blockquote>
</td></tr></table></center>

@michal-h21

I've found that this happens with the amsmath package for me. It seems to come from the configuration for the aligned environment. I need to investigate it a bit, but it seems like a bug on the TeX4ht side.

@Witiko
Owner Author

Witiko commented Feb 28, 2022

Thank you. In that case, I think we can merge and release this branch.

@michal-h21

I hope so. I've already found the cause of the HTML validation error, but I am still not sure why we execute this code when Amsmath is loaded.

@Witiko Witiko closed this as completed in 2f5dcba Feb 28, 2022
@michal-h21

I've also found the underlying issue: there was a line that configured quotation as if it were a math environment when Amsmath is loaded. I've fixed that in the TeX4ht sources.

@Witiko
Owner Author

Witiko commented Feb 28, 2022

For future reference, the issue has been fixed in revisions 1085 and 1086 of TeX4ht.

@Witiko Witiko added the tex4ht Related to support for the TeX4ht system for converting documents written in TeX/LaTeX/ConTeXt/etc. label Sep 16, 2022