Fix bug when there is "<" in the math formula #514

zasdfgbnm · 2017-01-18T02:34:22Z

The easiest way to trigger the bug is to add a markdown cell containing $$<k'>$$.
When the notebook is converted to HTML, the "<" is left unescaped,
so browsers parse it as the start of an HTML tag.

takluyver · 2017-01-18T10:45:19Z

nbconvert/filters/markdown_mistune.py

@@ -104,15 +104,20 @@ def header(self, text, level, raw=None):
        html = super(IPythonRenderer, self).header(text, level, raw=raw)
        return add_anchor(html)

+    def escape_lt(self,text):
+        return text.replace('<','&lt;')


Shouldn't we also be translating >? There's a function html.escape() which we might want to use.

I didn't find any bug triggered by ">". But it is a good idea to escape all these special characters.
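For reference, a minimal sketch of the difference between the single replace in the diff and the standard library's html.escape(), using the formula from the PR description. The variable names are illustrative only and are not taken from nbconvert.

```python
import html

formula = "$$<k'>$$"

# The approach in the diff above: only "<" is rewritten.
lt_only = formula.replace("<", "&lt;")

# html.escape with quote=False rewrites &, < and >, and leaves ' and " alone.
escaped = html.escape(formula, quote=False)

print(lt_only)   # $$&lt;k'>$$
print(escaped)   # $$&lt;k'&gt;$$
```

Passing quote=False keeps the apostrophe in k' intact, which matters if the escaped text is later compared against the original formula.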

takluyver · 2017-01-18T10:46:10Z

Could you add a test for this in nbconvert.filters.tests.test_markdown? Thanks!

zasdfgbnm · 2017-01-18T16:13:11Z

I will add the test

mpacer · 2017-01-18T18:16:28Z

nbconvert/filters/markdown_mistune.py

@@ -104,15 +104,20 @@ def header(self, text, level, raw=None):
        html = super(IPythonRenderer, self).header(text, level, raw=raw)
        return add_anchor(html)

+    def escape_lt(self,text):
+        return text.replace('<','&lt;')
+
    # Pass math through unaltered - mathjax does the rendering in the browser


This comment will no longer be true: we're deciding to follow a different standard than MathJax in this case, as MathJax does not always treat "<" and ">" as characters to be escaped (which leads to the "bugs" that this PR is trying to address).

zasdfgbnm · 2017-01-19T00:44:20Z

It should work now.

takluyver · 2017-01-19T10:23:32Z

Thanks. This looks OK to me now, but I'll let @michaelpacer have another look over it.

mpacer · 2017-01-19T02:00:35Z

nbconvert/filters/tests/test_markdown.py

@@ -171,7 +190,8 @@ def test_markdown2html_math_paragraph(self):
        ]

        for case in cases:
-            self.assertIn(case, markdown2html(case))
+            s = markdown2html(case)
+            self.assertIn(case,self._unescape(s))


Do we expect markdown2html() anywhere else to be returning unescaped values? Is there any way to systematically detect that expectation (vs. not)?

I ask because it seems a little off to use a private function in our test suite just to make the tests pass the way they did before. It almost seems like we should change the test's expected output (in this case, the case value) rather than use this _unescape, so that our tests don't need any special treatment in order to pass. That way we're more explicit about the expected behaviour.

That said, if there's any purpose to having the unescaping function, we should probably surface it in the regular code base and test it separately.

It's a bit ugly, but it was a bit of an ugly test before - it checks that the Markdown rendering doesn't change these samples. So it only works with samples that don't use any Markdown syntax (other than the math syntax we add). I wouldn't particularly ask @zasdfgbnm to fix this. Obviously we welcome improvements to the test, but that's probably best as a later PR.

Yes, that was the original test. Currently this transformation does not leave these samples unchanged, which is why the unescaping needs to happen. I get that, to make the test pass as it is, it's easier not to have to concern yourself with the escaped bits.

Also, if the idea of this test is that the math processing leaves markdown unaffected, we should probably include some <, > and & in the markdown section and not automatically unescape them (because the math processing is leaking out into the markdown processing).

Also, this test should include a \begin{} … \end{} block (i.e., including the newlines around the declaration), since, not being wrapped in any number of $, it lacks the easiest cues for delimiting math vs. Markdown.

mpacer · 2017-01-19T02:01:21Z

nbconvert/filters/tests/test_markdown.py

+        # all the "<", ">", "&" must be escaped correctly
+        cases = [ "$a<b&b<lt$",
+                  "$a<b&lt;b>a;a-b<0$",
+                  "$<k'>$"]


None of these cases take into account displayed math (i.e., between $$…$$). This problem probably applies equally there.

These also don't take into account cases where there are no $…$ delimiters of any kind but where LaTeX commands are included in their own paragraph; see #232 and the above test.

If you unify this test with the above test, this may address the problem.

Also, one of the common cases where & is used in LaTeX is in tables, so it may be worthwhile to include a table example in the test.
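The extra cases suggested here (inline math, $$…$$, a bare \begin{}…\end{} block, and a table using &) could be sketched as input/expected pairs that state the escaped output explicitly. This is only an illustration: escape_math is a hypothetical stand-in built on html.escape, not nbconvert's markdown2html, and the real renderer may wrap or alter the surrounding text.

```python
import html


def escape_math(text):
    """Hypothetical stand-in for the math-escaping step; not nbconvert's API."""
    return html.escape(text, quote=False)


# (input, expected escaped output) pairs
cases = [
    # inline math
    ("$a<b$", "$a&lt;b$"),
    # display math
    ("$$<k'>$$", "$$&lt;k'&gt;$$"),
    # environment on its own lines, with no $ delimiters
    ("\\begin{equation}\na<b\n\\end{equation}",
     "\\begin{equation}\na&lt;b\n\\end{equation}"),
    # table-like environment, where & separates columns
    (r"$$\begin{array}{cc} a & b \\ c & d \end{array}$$",
     r"$$\begin{array}{cc} a &amp; b \\ c &amp; d \end{array}$$"),
    # nothing to escape: passed through unchanged
    ("$a + b = c$", "$a + b = c$"),
]

for source, expected in cases:
    assert escape_math(source) == expected
```

Writing the expected string out in full keeps the test independent of any unescaping helper, at the cost of more verbose cases.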

+1 to adding a few more test cases.

mpacer · 2017-01-19T02:02:17Z

nbconvert/filters/tests/test_markdown.py

-    def test_markdown2html_math(self):
-        # Mathematical expressions should be passed through unaltered
+    def test_markdown2html_math_noescape(self):
+        # Mathematical expressions not containing <, >, & should be passed through unaltered


It seems like we should unify this with the other test so that we can have a unified set of tests that check for a single set of expected behaviour that covers both those things that should not be escaped and those things that should be.

Unifying them is probably ideal, but I'd merge a PR if that was the only thing outstanding.

mpacer · 2017-01-20T01:38:52Z

I just realised that I had had those comments in a sort of "staged" sense for the past day or so… Apologies for appearing to have gone "dark" on this. And thank you @takluyver for saying that explicitly…because the implication that I hadn't had another look at it helped me realise that I had never "submitted" my review.

I didn't make these "required changes" because I think that they're not absolutely necessary but would be better.

In particular, if @takluyver thinks that we don't need to

- unify the tests, since their separation no longer seems to make sense
- avoid using a private function in our tests to make them pass in a way that would not pass if it were the expected behaviour of the main library
- cover all the test cases that the escaping is supposed to cover (e.g., within \begin{}… \end{} and $$…$$)

then I'm cool with merging.

zasdfgbnm · 2017-01-20T21:00:08Z

@michaelpacer You are actually making good suggestions. What I was doing with the tests was just trying not to make too many changes to existing code. If @takluyver thinks it is better to restructure several tests in a simple PR, I have no problem doing this job.

mpacer · 2017-01-21T00:04:58Z

@zasdfgbnm That instinct makes sense and in general I'm guessing is a good practice. I just think in this case it may merit a bigger change to the tests.

That said, some of my comments can be addressed without changing the structure of the tests just by changing the set of cases in test_markdown2html_math_escape.

zasdfgbnm · 2017-02-06T05:41:11Z

@takluyver @michaelpacer Sorry for the long wait. I was busy the past few days. I merged the nonescape and escape cases for math formulas. I also added test cases for $$...$$ and a table, as @michaelpacer suggested. I didn't do anything to _unescape, because I'm not sure exactly what to do to improve it.

zasdfgbnm · 2017-02-06T05:43:32Z

Actually, I don't think _unescape is something ugly that exists only to make the tests pass, as @michaelpacer said. Python 3 has an unescape in the standard library that does the same thing, but Python 2 does not. That is the only reason I wrote my own _unescape. See https://wiki.python.org/moin/EscapingHtml for details. What makes things ugly is not the design of the test, but the fact of having two incompatible versions of Python.

takluyver · 2017-02-20T17:20:53Z

Thanks @zasdfgbnm - I'm happy enough to live with the test being a bit ugly for now.

Fix bug when there is "<" in the math formula

656f67f

zasdfgbnm mentioned this pull request Jan 18, 2017

issues with latex < #504

Open

takluyver reviewed Jan 18, 2017

mpacer reviewed Jan 18, 2017

zasdfgbnm added 3 commits January 18, 2017 15:46

Use html.escape to escape <, > and &

abd54ca

Add tests for escape

change escape and unescape for compatibility

8864bd6

fix assertNotRegex for compatibility of python 2.7

6b9b481

mpacer reviewed Jan 20, 2017

zasdfgbnm added 3 commits February 5, 2017 23:45

add more test cases

85e6ee2

merge escape & nonescape cases

9c16783

handle eqn with different beginnings, multiline

8b80daf

takluyver added this to the 5.2 milestone Feb 20, 2017

takluyver merged commit ec51671 into jupyter:master Feb 20, 2017

zasdfgbnm deleted the patch-1 branch February 20, 2017 17:37

parente mentioned this pull request Jul 10, 2018

nbviewer fail to render latex < in some cases jupyter/nbviewer#647

Closed

eric-wieser mentioned this pull request May 28, 2020

Escape HTML characters in latex output strings #1278

Merged
