Support LaTeX environments in Markdown -> HTML conversion #1938

Closed
juliangilbey opened this issue Feb 11, 2015 · 35 comments

Comments

@juliangilbey

The following piece of LaTeX-enriched markdown:

This is some math.

\begin{aligned}
x&=1\label{eq:1}\\
y&=2
\end{aligned}

End of math. \eqref{eq:1}

converts beautifully to LaTeX with pandoc -f markdown -t latex. However, when converting to html5, even with the --mathjax option, I can't figure out any way to persuade pandoc to maintain the aligned environment or the \eqref, despite the fact that MathJax can handle these.

Any suggestions?

Thanks!

@jgm
Owner

jgm commented Feb 12, 2015

  1. Math in pandoc needs to be inside $..$ or $$..$$ delimiters. Your example worked for latex/pdf output because pandoc passes through raw tex to these formats (but not to HTML).
  2. Labels and references don't work with pandoc math.
  3. It occurs to me that it might make sense to pass through raw latex environments to HTML in the special case where --mathjax is used. This would solve your problem nicely.

@timtylin
Contributor

Actually, surround the \begin{aligned}...\end{aligned} block with $$..$$ delimiters, and surround \eqref{eq:1} with inline $..$ delimiters. Then output to HTML with MathJax. That should work.

@timtylin
Contributor

Oh, and make sure you enable standalone mode (-s).

@nkalvi

nkalvi commented Feb 12, 2015

Very good!

This

This is some math.

$$
\begin{aligned}
x&=1\label{eq:1}\\
y&=2
\end{aligned}
$$

End of math. $\eqref{eq:1}$

with

pandoc math.txt -t html -s -o test.html --mathjax=https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML

Displays as this in Safari (as mentioned above, labels don't work)

[Screenshot: rendered output in Safari, 2015-02-11, 7:33 pm]

@timtylin
Contributor

Ah, the (???) is actually a recent MathJax bug.

See: mathjax/MathJax#1020

@nkalvi

nkalvi commented Feb 12, 2015

It does work as expected with the following changes (@timtylin, the bug you mentioned is limited to multi-line labels, and there's a workaround for it):

  1. Change aligned to align
  2. Include the MathJax configuration needed for equation numbering in the HTML header

Modified source:

This is some math.

$$
\begin{align}
x&=1\label{eq:1}\\
y&=2
\end{align}
$$

End of math. $\eqref{eq:1}$

Addition to HTML header:

  <script type="text/x-mathjax-config">
    MathJax.Hub.Config({ TeX: { equationNumbers: {autoNumber: "all"} } });
  </script>

Pandoc command:

pandoc math.txt -t html -s -o test.html --mathjax=https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML -H mathjax-header-include.txt 

Result:
[Screenshot: rendered output with equation numbering, 2015-02-11, 10:16 pm]

@juliangilbey
Author

Duh. I meant to originally post the following:

This is some math.

\begin{align}
x&=1\label{eq:1}\\
y&=2
\end{align}

End of math. \eqref{eq:1}

since, of course, the aligned environment only works within a $$...$$ section (and this works fine with pandoc). The align environment could be replaced by align*, equation, gather and so on. The align environment does automatic equation numbering, which is very nice, and mathjax can handle this.
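
To make the distinction concrete, here is a small LaTeX illustration (not part of the original report; it assumes the amsmath package is loaded). The aligned environment has to be nested inside math mode, while align opens display math on its own and numbers its rows, which is what makes \label and \eqref useful:

```latex
% aligned: only valid inside math mode, here display math via \[ ... \]
\[
\begin{aligned}
x &= 1 \\
y &= 2
\end{aligned}
\]

% align: starts display math itself and numbers each row,
% so it must NOT be wrapped in $$ ... $$ or \[ ... \] in LaTeX
\begin{align}
x &= 1 \label{eq:1} \\
y &= 2
\end{align}
See \eqref{eq:1}.
```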

@juliangilbey
Author

(I would reopen this but I don't know how to - would it be better to open a new issue with the correct report?)

@nkalvi

nkalvi commented Feb 12, 2015

I'm not sure why you'd want to reopen it. Wouldn't the method suggested above work for you?

@jgm
Owner

jgm commented Feb 12, 2015

I think it might be worth preserving the suggestion I made, that raw LaTeX
blocks should be passed through to HTML when --mathjax is used.


@nkalvi

nkalvi commented Feb 13, 2015

It looks like that's what Pandoc is doing - am I wrong? Here's the output:

<body>
<p>This is some math.</p>
<p><span class="math">\[
\begin{align}
x&amp;=1\label{eq:1}\\
y&amp;=2
\end{align}
\]</span></p>
<p>End of math. <span class="math">\(\eqref{eq:1}\)</span></p>
</body>

Pardon me if I'm not getting it - I'm quite new to Pandoc and LaTeX. It'd be helpful if you could post what the desired output is.

@jgm
Owner

jgm commented Feb 13, 2015

% pandoc -f markdown -t html --mathjax
\begin{aligned}
x = 1 & y = 2\\
\end{aligned}
^D

(output is empty)

Note: I'm talking about raw latex environments that are not included
in $$ delimiters.


@nkalvi

nkalvi commented Feb 13, 2015

Thanks for clarifying.

@juliangilbey
Author

Ah, I see - that does work (putting stuff in $$...$$). It's a shame that the raw environments are not passed through, but it's not a show-stopper. It's also weird that \eqref's have to be placed in $...$ signs.

@nkalvi

nkalvi commented Feb 13, 2015

$...$ is 'in line' with the specs :)
http://docs.mathjax.org/en/latest/options/tex2jax.html

inlineMath: [['\\(','\\)']]
Array of pairs of strings that are to be used as in-line math delimiters. The first in each pair is the initial delimiter and the second is the terminal delimiter. You can have as many pairs as you want. For example,

inlineMath: [ ['$','$'], ['\\(','\\)'] ]
would cause tex2jax to look for $...$ and \(...\) as delimiters for inline mathematics. (Note that the single dollar signs are not enabled by default because they are used too frequently in normal text, so if you want to use them for math delimiters, you must specify them explicitly.)

@juliangilbey
Author

Yes, but \eqref is a text-mode LaTeX command: it's a reference to an equation number, not a piece of mathematics. :-)

@nkalvi

nkalvi commented Feb 13, 2015

Aha, you're right. MathJax doesn't require $...$ for references outside inline maths. So rewriting it (since it will be stripped away in html output) produces the desired output:

End of math. <span>\\eqref{eq:1}</span>

Now I understand even better jgm's suggestion about passing raw LaTeX blocks.

@juliangilbey
Author

The problem is actually not solved by $$\begin{equation}...\end{equation}$$ and similar: while this is parsed correctly by mathjax when it is converted to html5, when the markdown is converted to latex, the resulting LaTeX is again $$\begin{equation}...\end{equation}$$, on which LaTeX barfs, as \begin{equation} starts by trying to enter math mode, so LaTeX throws up an error.

So it looks like the only solution is to allow pandoc to pass LaTeX blocks and \eqref{...} etc. to HTML raw when --mathjax is specified. (Perhaps with some other command line parameter to control this behaviour, eg -f markdown+raw_tex?)

@bpj

bpj commented Feb 13, 2015 via email

@nkalvi

nkalvi commented Feb 13, 2015

@juliangilbey Could you please give examples of input and output? When I tried some samples with http://johnmacfarlane.net/pandoc/try/ and http://www.tlhiv.org/ltxpreview/ it seems to work fine.

Markdown input:

$$
 \frac{1}{\displaystyle 1+
   \frac{1}{\displaystyle 2+
   \frac{1}{\displaystyle 3+x}}} +
 \frac{1}{1+\frac{1}{2+\frac{1}{3+x}}}
$$

Output from pandoc:

\[
 \frac{1}{\displaystyle 1+
   \frac{1}{\displaystyle 2+
   \frac{1}{\displaystyle 3+x}}} +
 \frac{1}{1+\frac{1}{2+\frac{1}{3+x}}}
\]

LaTeX preview:
[Screenshot: LaTeX preview of the rendered fraction, 2015-02-13, 11:19 am]

@bpj

bpj commented Feb 13, 2015

The filter I suggested is here:

https://gist.github.com/baf84ac52dd47205e5cb

Requires perl and some (listed) CPAN modules.

@jgm
Owner

jgm commented Feb 13, 2015

+++ Julian Gilbey [Feb 13 15 07:58 ]:

The problem is actually not solved by
$$\begin{equation}...\end{equation}$$ and similar: while this is parsed
correctly by mathjax when it is converted to html5, when the markdown
is converted to latex, the resulting LaTeX is again
$$\begin{equation}...\end{equation}$$, on which LaTeX barfs, as
\begin{equation} starts by trying to enter math mode, so LaTeX throws
up an error.

So it looks like the only solution is to allow pandoc to pass LaTeX
blocks and \eqref{...} etc. to HTML raw when --mathjax is specified.
(Perhaps with some other command line parameter to control this
behaviour, eg -f markdown+raw_tex?)

There is already an extension for raw tex in the markdown
reader (it's enabled by default). So all that would be
required would be passing through raw tex when output is
HTML and --mathjax is used.

This would be a very easy thing to add.

In the meantime, you could write a filter that finds
RawInline (Format "latex") and RawBlock (Format "latex") elements and converts them to raw HTML,
properly escaped. This too would be easy, and it
wouldn't require any changes in pandoc itself.
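
A minimal sketch of the kind of filter described above, written in Python with only the standard library. It is an illustration, not code from pandoc or from anyone in this thread. It assumes that raw elements appear in pandoc's JSON as objects of the form {"t": "RawBlock", "c": [format, text]} (and likewise for RawInline); the sample document and the function names are hypothetical.

```python
import html
import json


def latex_to_html(node):
    # Turn a raw LaTeX element into a raw HTML element with &, < and > escaped.
    # Any other node is returned unchanged.
    if isinstance(node, dict) and node.get("t") in ("RawBlock", "RawInline"):
        fmt, text = node["c"]
        if fmt in ("latex", "tex"):
            return {"t": node["t"], "c": ["html", html.escape(text, quote=False)]}
    return node


def walk(node):
    # Rebuild the JSON tree, applying latex_to_html to every object on the way.
    if isinstance(node, list):
        return [walk(child) for child in node]
    if isinstance(node, dict):
        node = latex_to_html(node)
        return {key: walk(value) for key, value in node.items()}
    return node


# Hypothetical sample document in the assumed JSON shape.
sample = {
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {"t": "RawBlock",
         "c": ["latex", "\\begin{align}\nx&=1\\label{eq:1}\\\\\ny&=2\n\\end{align}"]},
        {"t": "Para", "c": [
            {"t": "Str", "c": "See"},
            {"t": "Space"},
            {"t": "RawInline", "c": ["latex", "\\eqref{eq:1}"]},
        ]},
    ],
}

converted = walk(sample)
print(json.dumps(converted, indent=2))
```

To use something like this as an actual filter, the sample would be replaced by reading the document with json.load(sys.stdin) and writing the result with json.dump(..., sys.stdout), and the script would be passed to pandoc via --filter.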

@bpj

bpj commented Feb 13, 2015

On 2015-02-13 17:09, BPJ wrote:

One could easily write a filter which recognises the .math class on
codeblocks and codespans and converts the code text to a RawBlock or
RawInline with the right format label and wraps it in the right <span class="math"> w/o <p> for HTML output.

I did a colossal blooper!

Since I don't do math myself I omitted the LaTeX math delimiters
in the first version of my filter! Corrected now:

https://gist.github.com/bpj/baf84ac52dd47205e5cb#file-pandoc-wrap-raw-pl

@jgm wrote:

In the meantime, you could write a filter that finds
RawInline (Format "latex") and RawBlock (Format "latex") elements and converts them to raw HTML,
properly escaped. This too would be easy, and it
wouldn't require any changes in pandoc itself.

I think my approach with tagged 'code' may have its use.
For one thing it lets you be selective about which LaTeX
Raw* elements you want to include in HTML.

/bpj

@jgm jgm reopened this Feb 14, 2015
@jgm jgm closed this as completed in 4f0c5c3 Feb 26, 2015
@benstevens48

Hi,

I've had a look at the code for this fix and I don't think it's quite right. In order for MathJax to interpret the raw latex you are outputting to HTML, the latex needs to be inside math delimiters. So where you have written

blockToHtml opts (RawBlock f str)
  | f == Format "html" = return $ preEscapedString str
  | f == Format "latex" =
      case writerHTMLMathMethod opts of
           MathJax _  -> do modify (\st -> st{ stMath = True })
                            return $ toHtml str

I think it should say

...
return $ toHtml $ "\\[" ++ str ++ "\\]"

and correspondingly for the inline case. Ideally I think they should also be put inside the appropriate html span as when you write a math block.
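
The wrapping proposed here can be sketched outside pandoc as two small Python helpers. This is only an illustration of the suggested output shape, not the writer's actual code: the function names are hypothetical, and the span class "math" is taken from the HTML output quoted earlier in this thread.

```python
import html


def wrap_display(tex):
    # Escape the LaTeX for HTML and put it in \[ ... \] inside a math span.
    escaped = html.escape(tex, quote=False)
    return '<p><span class="math">\\[\n' + escaped + '\n\\]</span></p>'


def wrap_inline(tex):
    # Same idea for inline material, using \( ... \) delimiters.
    escaped = html.escape(tex, quote=False)
    return '<span class="math">\\(' + escaped + '\\)</span>'


block = wrap_display("\\begin{align}\nx&=1\\label{eq:1}\\\\\ny&=2\n\\end{align}")
inline = wrap_inline("\\eqref{eq:1}")
print(block)
print(inline)
```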

Sorry if I have misinterpreted your code but I hope what I've said is correct.

Ben

@jgm
Owner

jgm commented Mar 2, 2015

Thanks, this may be correct. This change was mostly intended for things like

\begin{equation}
e = mc^2
\end{equation}

which, in LaTeX, would NOT be placed inside math delimiters ($$..$$ or \[..\]), and for things like \ref{eqn:3}. If you use these in MathJax, do you write the following instead?

\[
\begin{equation}
e = mc^2
\end{equation}
\]

$\ref{eqn:3}$


@benstevens48

Yes, I'm pretty sure that MathJax scans the page looking for math delimiters and then processes the stuff inside them, so anything that is not inside math delimiters will just be ignored. The fact that things like the equation environment should not be inside math delimiters in LaTeX is why we were struggling to get both to work, hence the use of the filter as a workaround!

So, yes, as you said, for mathjax to work, in the html, you write

    \[
    \begin{equation}
    e = mc^2
    \end{equation}
    \]

and

    $\ref{eqn:3}$

or, to be consistent with pandoc's delimiters for mathjax elsewhere,

    \(\ref{eqn:3}\)

I hope this is correct.

Ben

@jgm
Owner

jgm commented Mar 2, 2015

Well, let's confirm that this is correct before continuing, since in LaTeX it wouldn't be correct to do things this way...


@benstevens48

Hi,
Sorry, but it seems that both methods actually work with MathJax, so you can mostly ignore everything I said! I do find this sentence in the MathJax getting started guide a bit misleading, though: 'Mathematics that is written in TeX or LaTeX format is indicated using math delimiters that surround the mathematics, telling MathJax what part of your page represents mathematics and what is normal text.' There is a potential issue: any LaTeX commands such as \newpage that MathJax doesn't recognise will just be left as plain text on the page, whereas if they are inside math delimiters it is possible to define a \newcommand in the MathJax configuration to deal with them. So it might be better to put the delimiters in, as that gives more flexibility, but I'm not sure what the official MathJax best practice is. Sorry for not fully checking this earlier!
Ben

@Thell

Thell commented May 31, 2015

@juliangilbey This issue has come up before, multiple times. Early last year I ran into it and got pretty much the same response regarding a filter and such. The result was similar to yours... 😞 So, to scratch my itch, a patch was submitted that changes behavior only for the latex target (since the latex target is where the problem exists), without side effects for any of the other targets.

If you don't mind patching yourself, I've been using it over a year now with great results.

Essentially all it does is strip the $$ or \[ tokens from latex math environments when the target is latex; so just surrounding your latex math environment with the mathjax tokens makes all the targets happy. 😄
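
The idea can be sketched in a few lines of Python. This is not the patch itself (which lives inside pandoc), only an illustration of the stripping rule; the list of environments and the function name are assumptions made for the example.

```python
import re

# Environments that enter math mode on their own (assumed, non-exhaustive list).
TOP_LEVEL_ENVS = ("equation", "align", "gather", "multline", "eqnarray")

# A whole string of the form \begin{env} ... \end{env}, optionally starred.
ENV_BLOCK = re.compile(
    r"\\begin\{(?P<env>[A-Za-z]+\*?)\}.*\\end\{(?P=env)\}", re.DOTALL
)


def strip_math_delimiters(tex):
    # Drop $$...$$ or \[...\] when they only wrap a top-level math environment.
    # Anything else (including aligned, which needs the delimiters) is unchanged.
    body = tex.strip()
    for opener, closer in (("$$", "$$"), ("\\[", "\\]")):
        long_enough = len(body) >= len(opener) + len(closer)
        if long_enough and body.startswith(opener) and body.endswith(closer):
            inner = body[len(opener):-len(closer)].strip()
            match = ENV_BLOCK.fullmatch(inner)
            if match and match.group("env").rstrip("*") in TOP_LEVEL_ENVS:
                return inner
    return tex


wrapped_align = "$$\n\\begin{align}\nx&=1\\label{eq:1}\\\\\ny&=2\n\\end{align}\n$$"
wrapped_aligned = "$$\n\\begin{aligned}\nx&=1\\\\\ny&=2\n\\end{aligned}\n$$"
print(strip_math_delimiters(wrapped_align))    # delimiters removed
print(strip_math_delimiters(wrapped_aligned))  # left as written
```

Applied only when the output format is LaTeX, a rule like this lets the same Markdown source keep its $$ delimiters for MathJax while producing valid LaTeX.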

[update]
The tex-math-consume-escapes branch has been rebased onto the latest pandoc master. If desired a pull request can be submitted.

@mseri

mseri commented Sep 7, 2015

+1 for PR

@diazona

diazona commented Sep 9, 2015

@Thell I'd also like to see this pulled into pandoc proper

@Thell

Thell commented Sep 14, 2015

@mseri and @diazona we'll need to see what @jgm wants. There are currently quite a few outstanding issues and pull requests and the latest release does at least allow passage of raw blocks (which helps with html/latex targets) so I'm guessing it will be a while unless we can come up with a non edge-case usage example.

@rreece

rreece commented May 9, 2018

With Pandoc 2.2, I'm still having this issue. Naked math latex environments do not make it to the html from pandoc. Note that in order to be processed by mathjax properly, the equation, align, ... environments would need to be wrapped in

<p><span class="math display">
...
</span></p>

Any further advice on how to produce proper html and latex from the same markdown? How should one mark up equations in markdown to support both outputs?

@jgm
Owner

jgm commented May 9, 2018

@rreece - please give a specific example of a math environment that isn't handled properly (with full instructions for how to reproduce the issue). And it's probably better to open a new issue, referring to this one, since this one is closed.

@rreece

rreece commented May 9, 2018

Thanks for the reply @jgm! I've submitted a new issue: #4640.
