math support #6
Comments
It should be relatively straightforward, but nothing is ever simple. Take Not on the critical path for rustdoc, but worth doing. |
Is this in commonmark? |
@vyp No, however it is one of the more commonly requested extensions. See http://talk.commonmark.org/t/mathematics-extension/457 for more discussion. |
Oh sorry, I just saw #4. Intention was that commonmark may have guidelines on cases like |
@raphlinus Ah yes, I'm familiar with that thread. Second commenter there. |
I'm very open to doing this. By far the most helpful thing from the community would be a precise spec. One got suggested in a reddit thread. But are these the right rules? Would the rules for |
I've also made a table to summarise some of the syntaxes. I've only focused on the platforms that are likely to be used by mathsy people, since mathsy people are the ones likely to use this feature anyway. Feel free to add a comment there if there's anything relevant I've missed!
Not by itself, but there might be more than one dollar sign in a sentence (say if you're making a list of prices) |
That wouldn't be a problem I think. A space before an ending
|
There could be spaces in the middle of it, I assume. So this can happen.
|
@notriddle yes that's what I had in mind. Also:
However these are pretty contorted cases. In this thread on the CommonMark forums John MacFarlane seems to agree that the dollar syntax works just fine:
|
If you replace those |
What @jeanm said :) |
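The dollar-sign worries above (lists of prices, spaces next to a delimiter) are what Pandoc's documented rule for `$` math addresses: the opening `$` must be immediately followed by a non-space character, and the closing `$` must be immediately preceded by a non-space character and must not be followed by a digit. Below is a minimal sketch of that rule for single-`$` inline math only. `findInlineMath`, `isOpener` and `findCloser` are hypothetical helper names, not part of pulldown-cmark or Pandoc, and display math with `$$` is deliberately left out.

```javascript
// Hypothetical helper: returns the TeX source of each inline `$...$` span.
function findInlineMath(text) {
  const spans = [];
  let i = 0;
  while (i < text.length) {
    if (text[i] === '\\') { i += 2; continue; } // skip an escaped character such as \$
    if (text[i] === '$' && isOpener(text, i)) {
      const close = findCloser(text, i + 1);
      if (close !== -1) {
        spans.push(text.slice(i + 1, close));
        i = close + 1;
        continue;
      }
    }
    i += 1;
  }
  return spans;
}

// An opening `$` must be immediately followed by a non-space character.
// A following `$` is rejected because `$$` (display math) is out of scope here.
function isOpener(text, i) {
  const next = text[i + 1];
  return next !== undefined && next !== '$' && !/\s/.test(next);
}

// A closing `$` must be preceded by a non-space character
// and must not be followed by a digit.
function findCloser(text, from) {
  for (let j = from; j < text.length; j++) {
    if (text[j] === '\\') { j += 1; continue; } // skip an escaped character
    if (text[j] !== '$') continue;
    const next = text[j + 1];
    if (/\s/.test(text[j - 1])) continue;
    if (next !== undefined && /[0-9]/.test(next)) continue;
    return j;
  }
  return -1;
}
```

With this rule a sentence such as "It costs $5 and $10 today." yields no math spans, because the only candidate closing `$` is preceded by a space.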
For me, as mentioned in pikelet-lang/pikelet#109, what I'd really want is the ability to add HTML classes to code (both inline and block). Something like:
Inline math: {katex}`\Gamma \vdash x : \tau`
```{katex}
\Gamma \vdash x : \tau
```
With that I could then run KaTeX on all the things with that class. That way at least GitHub wouldn't completely bork my maths in the previews. |
In heradoc I have implemented the following three ways of creating latex-math, such that it still renders well in plain CommonMark renderers without math support: Inline-math: |
Summarizing and continuing discussion from #100: Whatever math syntax is chosen should piggyback off of The rest of this post concerns mainly KaTeX since I have much more experience using KaTeX than MathJax. Probably, the most powerful solution would use an ahead-of-time translation (to MathML + JS? KaTeX pre-render? RRIR?) rather than relying on display-time JS translation, but a minimal solution should rely on the preexisting translation tools. Interestingly, for block syntax, no extra ahead-of-time processing needs to be done for display-time integration with KaTeX. The markdown
```math
1 + 2
```
is translated to the HTML
```html
<pre><code class="language-math">1 + 2
</code></pre>
```
and can be translated with KaTeX via JS like
```js
// Copy into an array first: getElementsByClassName returns a live collection,
// and replacing nodes while iterating over it would skip elements.
for (const math of Array.from(document.getElementsByClassName("language-math"))) {
  const span = document.createElement('span');
  katex.render(math.innerText, span, {displayMode: true});
  math.parentNode             // <pre>
      .parentNode             // context
      .replaceChild(span,     // new
          math.parentNode);   // old
}
```
into the following DOM:
```html
<span>
  <span class="katex-display">
    <span class="katex">
      <span class="katex-mathml">
        <math>
          <semantics>
            <mrow>
              <mn>1</mn>
              <mo>+</mo>
              <mn>2</mn>
            </mrow>
            <annotation encoding="application/x-tex">1 + 2</annotation>
          </semantics>
        </math>
      </span>
      <span class="katex-html" aria-hidden="true">
        <span class="strut" style="height: 0.64444em;"></span>
        <span class="strut bottom" style="height: 0.72777em; vertical-align: -0.08333em;"></span>
        <span class="base">
          <span class="mord">1</span>
          <span class="mord rule" style="margin-right: 0.222222em;"></span>
          <span class="mbin">+</span>
          <span class="mord rule" style="margin-right: 0.222222em;"></span>
          <span class="mord">2</span>
        </span>
      </span>
    </span>
  </span>
</span>
```
which is the correct way for displaying KaTeX-rendered display-style text, and the exact same output generated with KaTeX's You can make the argument that co-opting the Similarly, My next comment will discuss more "first-class" support and what that would mean. |
CommonMark Wiki on math extensions: https://github.com/commonmark/commonmark-spec/wiki/Deployed-Extensions#math If you don't care about fallback behavior and are willing to change the parse tree in degenerate cases,
It's for this reason plus fallback that I really think any solution should actually lean on "code literal" syntax. This gives us growable fences for free. So then the "reasonable" choices per that are bracket-style What do I actually recommend? For now, I think
JavaScript to do both dollar-style and bracket-style:
```js
(() => {
  const todo = []; // don't mutate document while iterating
  function processMath() {
    if (arguments.length == 3) {
      const [prev, code, displayMode] = arguments;
      prev.splitText(prev.textContent.length - 1).remove();
      code.childNodes[0].splitText(code.textContent.length - 2).remove();
      const span = document.createElement('span');
      katex.render(code.textContent, span, {displayMode: displayMode, throwOnError: false});
      code.parentNode.replaceChild(span, code);
    } else if (arguments.length == 4) {
      const [prev, code, next, displayMode] = arguments;
      prev.splitText(prev.textContent.length - 1 - displayMode).remove();
      next.splitText(1 + displayMode); next.remove();
      const span = document.createElement('span');
      katex.render(code.textContent, span, {displayMode: displayMode, throwOnError: false});
      code.parentNode.replaceChild(span, code);
    } else {
      throw Error(`Wrong number of arguments to ${processMath}`);
    }
  }
  for (const code of document.getElementsByTagName('code')) {
    const prev = code.previousSibling;
    const next = code.nextSibling;
    if (prev && prev.nodeType === Node.TEXT_NODE) {
      // dollar style
      if (next && next.nodeType === Node.TEXT_NODE) {
        if (/\$\$$/.test(prev.textContent) && /^\$\$/.test(next.textContent)) {
          todo.push(() => processMath(prev, code, next, true));
          continue;
        }
        if (/\$$/.test(prev.textContent) && /^\$/.test(next.textContent)) {
          todo.push(() => processMath(prev, code, next, false));
          continue;
        }
      }
      // bracket style (start outside, end inside)
      if (/\[$/.test(prev.textContent) && /\\\]$/.test(code.textContent)) {
        todo.push(() => processMath(prev, code, true));
        continue;
      }
      if (/\($/.test(prev.textContent) && /\\\)$/.test(code.textContent)) {
        todo.push(() => processMath(prev, code, false));
        continue;
      }
    }
  }
  for (const f of todo) f();
})()
```
|
See also https://github.com/cben/mathdown/wiki/math-in-markdown with math syntaxes from a lot of markdown implementations. While dollars and other TeX-like syntaxes seem most common, I second the recommendation for literal-based syntax, for example like GitLab, especially if you're considering embedding other non-markdown syntaxes (mermaid etc). |
On a more philosophical level, I guess the fear I have with something like:
..is how do you distinguish between talking about |
There is a discussion on making that distinction: |
Thanks to @CAD97, @cben and @brendanzab for the excellent considerations and resources! It looks to me that there seems to be some consensus on using code fences for display math, although it's not yet clear what language specifier would be best. This seems like a good idea: it degrades gracefully on renderers without math support. The alternatives like The inline case is more diverse and more difficult. Dollar affixes seem to be most widespread and will be very familiar to LaTeX users. There are some concerns about unintended math spans from the natural use of dollar signs, but from the discussion earlier (January of 2017) in this thread I understand that with the right heuristics this can be managed in a way like it is for emphasis. Personally, I am partial to code span based syntax, like Since there is no clear consensus on math syntax, we could even opt to not introduce any additional syntax or semantics. This is already possible for displays by letting users decide the language that makes sense for them, bypassing the issue flagged by @brendanzab. We could do something similar for inline math by exposing the length of the delimiters for code spans. The decision to interpret double backticked code spans as math or not is then left to the user. This would eliminate the need for a math option entirely. |
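To make the "expose the delimiter length" idea concrete, here is a minimal sketch of a code-span scanner that reports how many backticks delimited each span, so that a caller can decide, for example, that double-backtick spans are math. It follows the CommonMark rule that a backtick run is closed by the next run of exactly the same length, and that one leading and one trailing space are stripped when both are present. `scanCodeSpans` is a hypothetical name, and the sketch ignores backslash escapes, block structure, and everything else a real parser such as pulldown-cmark handles.

```javascript
// Hypothetical helper: lists code spans together with their delimiter length.
function scanCodeSpans(text) {
  const spans = [];
  const runs = /`+/g;
  let open;
  while ((open = runs.exec(text)) !== null) {
    const ticks = open[0].length;
    const start = runs.lastIndex;
    // Look for a closing backtick run of exactly the same length.
    const closer = /`+/g;
    closer.lastIndex = start;
    let close;
    while ((close = closer.exec(text)) !== null) {
      if (close[0].length === ticks) break;
    }
    if (close === null) continue; // unmatched run: treated as literal backticks
    // Line endings become spaces; one space is stripped from each side
    // when the content both starts and ends with a space.
    let content = text.slice(start, close.index).replace(/\n/g, ' ');
    if (content.startsWith(' ') && content.endsWith(' ') && content.trim() !== '') {
      content = content.slice(1, -1);
    }
    spans.push({ ticks, content });
    runs.lastIndex = closer.lastIndex; // resume scanning after the closing run
  }
  return spans;
}

// The caller, not the parser, decides what a given delimiter length means.
function mathSpans(text) {
  return scanCodeSpans(text).filter((span) => span.ticks === 2).map((span) => span.content);
}
```

Under this convention ordinary single-backtick code spans are untouched, and a renderer without math support still shows the double-backtick spans as code.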
I have been meaning to find the time for writing up a proposal for, With that in place we could get a graceful mechanism for dealing with inline code span language identifiers via Also @brendanzab I seem to recall (though really should check!) that the "info-string" of code fences can contain information besides the language of some form after the space, up-to newline. I hadn't seen this in pulldown_cmark though. Such that one could imagine something similar to: ```latex render |
Per the CommonMark dingus at least, the Of course, any solution that uses the block info-string doesn't work for inline, so the general extension mechanism (whether it be |
Given that |
I currently preprocess the math blocks using KaTeX (yes, there is a Rust binding), translating them into HTML and then passing the processed text to pulldown-cmark. This is an ugly approach, I guess, but it works.
I also tried deferred processing with MathJax, but it may mix things up (underlines, for example) and end up in a terrible mess.
As for the difficulties, I think inserting such blocks will make things more context-dependent and slow down the SIMD routine used by the parser, but I did not get a chance to go into the details.
Anyway, I am really hoping to have math support. I think it is a significant drawback for this crate not to have it.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
…On Monday, April 5th, 2021 at AM 2:49, Tim ***@***.***> wrote:
Given that `$` and `$$` are the de facto standard and Pandoc hasn't experienced any issues with that syntax and there's an implementation of it is there a compelling reason to do anything else?
|
Hey, I was going through the discussion, and I have a suggestion which might not work, or might solve a couple of problems:
So for inline math, we can do
which can then be processed by a math-processing step to produce the appropriate LaTeX (:sweat_smile:) The escaping of this can also follow a simple format, writing a
I feel this, along with @CAD97's code-block math, can solve the math problem, and also allow others to extend the syntax as required and solve additional problems, such as adding support for a custom id extension or other things people need in their own crates. As I said at the start, this might not work, so I would really like to hear opinions on it. Thanks :) |
Sorry to bother, but @marcusklaas, is this still active? 😅😅 If so, can you please take a look at my comment above? Thanks :) |
@YJDoc2: as an extensible extension syntax, your suggested syntax is not bad. But I think the primary concern is that it is yet another different and completely non-standard way of doing things which is incompatible with all other implementations. I think the primary question is not what a completely new syntax would look like but rather why the widely supported |
The big one, even if you solve the "two Instead, most markdown renderers which "support" math syntax do it the naive (and problematic) way, which you can already do with pulldown-cmark: just render the markdown, then run KaTeX or MathJax on the output. This works okayish for very simple stuff, but you will quickly run into clashes where the math is interpreted as markdown. For minimal support, a markdown engine has to treat But it gets wrinkly, quickly. You also have to understand (a subset of) your embedded language in order to understand when it ends. For MathJax style math, this means at least allowing To be honest, I'm beginning to think the only somewhat reasonable approach (for |
I agree with @CAD97 almost entirely about pre-parsing, The reason that I like the consistent-attribute-syntax extension + code fences, is that it gives something for a pre-parser of I need a larger subset of latex than either KaTeX/MathJax support (i.e. I convert markdown to TeX, and the markdown contains code fences with TeX code to be rendered). The same problem occurs if say you want to embed a graph in markdown and pass it to a graph-drawing algorithm for rendering. So my preference is to punt on |
@ratmice : if passing the data through (mostly) unmodified is the only goal (and not human readability, since it's an internal step after preparsing), then the code block technique is sufficient. You can use however many A general extension point for markdown would be great to have, but I don't think pulldown-cmark should be defining one. (That would be the domain of CommonMark, the actual specification.) |
@CAD97 Yes, and I do agree. The specific extension I was referring to is https://talk.commonmark.org/t/consistent-attribute-syntax/272 It was mentioned it in my comment a long time ago, I probably should have linked to it again so people didn't have to search through the plethora of comments, apologies. I haven't quite followed what is going on with this extension or if there exists any other sufficient proposals though. |
I wish I had kept some record in https://github.com/cben/mathdown/wiki/math-in-markdown of which tools take such buggy shortcuts. I've followed bug trackers in many tools, and the general trajectory seems to be:
Running the math render before markdown is also buggy! You'll render math inside indented literal blocks, fenced literal blocks, HTML islands (including literals like => You really do need the markdown parser to understand the And yes nesting: |
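The two failure modes described above (math mangled by the markdown pass, or math rendered inside code when it is processed first) are why workarounds usually protect math spans before the markdown pass while skipping code. A minimal sketch of that workaround follows, assuming `$...$` and `$$...$$` delimiters. `protectMath`, `restoreMath`, the `@@MATHn@@` placeholder format and the `render` callback are all hypothetical, and the regular expression only approximates code spans and fenced blocks (it ignores indented code and HTML blocks), so this does not replace real parser support.

```javascript
// Alternatives are tried in order at each position:
//   1. backslash escape (e.g. \$)        -> left alone
//   2. fenced code block                 -> left alone
//   3. code span (capture group 1)       -> left alone
//   4. display math, capture group 2
//   5. inline math, capture group 3 (no space just inside the delimiters,
//      no digit right after the closing `$`)
const MATH_OR_SKIP =
  /\\[\s\S]|^`{3}[^\n]*\n[\s\S]*?^`{3}[ \t]*$|(`+)[\s\S]*?[^`]\1(?!`)|\$\$([\s\S]+?)\$\$|\$([^\s$](?:[^$\n]*?[^\s$])?)\$(?!\d)/gm;

// Replace each math span with a placeholder the markdown pass will not touch.
function protectMath(src) {
  const math = [];
  const text = src.replace(MATH_OR_SKIP, (match, _ticks, display, inline) => {
    const tex = display !== undefined ? display : inline;
    if (tex === undefined) return match; // escape or code: keep as written
    math.push({ tex, displayMode: display !== undefined });
    return `@@MATH${math.length - 1}@@`;
  });
  return { text, math };
}

// After markdown rendering, swap the placeholders for rendered math.
// `render` is supplied by the caller and receives { tex, displayMode }.
function restoreMath(html, math, render) {
  return html.replace(/@@MATH(\d+)@@/g, (_, n) => render(math[Number(n)]));
}
```

A caller would run its markdown renderer on `text` and then call `restoreMath` with a `render` function that wraps a math renderer, for instance one built on KaTeX's `katex.renderToString`. A placeholder that also occurs in the user's own text would collide, which is one more reason the thread argues for support inside the parser.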
Well, I count pandoc and MathOverflow as "widely" in this instance. It is the closest thing we have to a standard on the matter. I certainly have a lot of documents which use the syntax, and I'm interested in being able to parse them with Rust.
I realize it's tricky, but pandoc does it, so we know it's not impossible! And it works well too.
But if parsing the |
just implement one, better than nothing. |
I suggest using the math delimiter from this talk:
Using them has the following benefits:
|
These benefits are good points. @Netsaver, I'm curious, is there any markdown software that specifically recognizes these delimiters? Googling, I also found you described the same syntax in pbek/QOwnNotes#529 but it's unclear to me whether there is a markdown processor actually requiring It's certainly possible to use that as just a convention, which begs the question of whether a tool should care (why not recognize just dollars and let users insert braces if they wish?), but that loses benefits 1 and 2... |
GFM now supports math notation using $$:
(it is there, check the edit history... but it's not actually rendering on my end 🙃)
What's somewhat interesting is that GFM doesn't allow escaping |
Yeah I assume we've all seen this critique of Github's implementation. It makes me all the more convinced that @Eumeryx's suggestion is a great idea. @cben that article should answer your question "why not recognize just dollars and let users insert braces if they wish"! |
The critique of GitHub's implementation now admits that "Most of this has been fixed now". The issues raised there were not at all fundamental. My two-penneth: the highest priority for In my experience as a cryptographic engineer, there is a very significant usability gap between the All of the syntactic ambiguities with |
There’s some good news here: as of Chrome if you’re not familiar, it’s an XML-style format for displaying equations that can just be in an HTML block and rendered properly by the browser. So, this sidesteps the KaTeX vs. MathJax issue and instead means we just need a TeX->MathML parser, of which there are many (I don’t know of any written in Rust, though). Test page for MathML if you want to check your browser: https://www.w3.org/Math/testsuite/mml2-testsuite/index.html (the files in TortureTests/Complexity are the best overview tests) |
For the task of TeX->MathML, this crate is available: https://crates.io/crates/latex2mathml. License is MIT. There haven't been any updates since 2020, but it looks like it's a "done" crate. If a dependency is an issue, the entire crate could likely be pulled into this repository. |
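To illustrate what a TeX->MathML step produces, here is a deliberately tiny toy translator. It is an assumption-laden sketch, not the latex2mathml crate's algorithm or API: `toyTexToMathML` is a hypothetical name, and it handles only numbers, single-letter identifiers and a handful of operators, throwing on anything else (commands such as `\frac` need a real parser).

```javascript
// Escape characters that are special in XML text content.
function escapeXml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Toy translator: numbers -> <mn>, letters -> <mi>, operators -> <mo>.
function toyTexToMathML(tex, displayMode = false) {
  const tokens = tex.match(/\d+(?:\.\d+)?|[a-zA-Z]|[+\-=<>*\/()]/g) || [];
  // Refuse input the toy grammar does not fully cover instead of dropping it.
  if (tokens.join('') !== tex.replace(/\s+/g, '')) {
    throw new Error('unsupported TeX in toy translator: ' + tex);
  }
  const body = tokens.map((tok) => {
    if (/^\d/.test(tok)) return `<mn>${tok}</mn>`;
    if (/^[a-zA-Z]$/.test(tok)) return `<mi>${tok}</mi>`;
    return `<mo>${escapeXml(tok)}</mo>`;
  }).join('');
  const display = displayMode ? ' display="block"' : '';
  return `<math${display}><mrow>${body}</mrow></math>`;
}
```

For the input `1 + 2` this yields the same `<mrow>` of `<mn>`, `<mo>`, `<mn>` elements that appears in the KaTeX output earlier in the thread, which a MathML-capable browser can render without any client-side JavaScript.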
This is a complex but really useful feature. I will review the open pull requests and the different implementations in other parsers like mdBook relatively soon, but the final implementation will take some time. |
wow, this thread has been open for 8 years! do we now have a way to just skip $ and double $$ pairs and let browser parse the math on client side ? |
I opened a proposal to the commonmark spec to add support for display blocks, which would theoretically allow math as well as mermaid/graphviz: commonmark/commonmark-spec#745 |
Closed by #734. |
This simply means enabling the use of `$` and `$$` delimiters (inline and block-level respectively, as in Pandoc) to very conveniently leverage something like MathJax. Hoedown also supports this.