Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug when there is "<" in the math formula #514

Merged
merged 7 commits into from
Feb 20, 2017
Merged

Fix bug when there is "<" in the math formula #514

merged 7 commits into from
Feb 20, 2017

Conversation

zasdfgbnm
Copy link
Contributor

The easiest way to trigger the bug is to add a markdown cell and put $$&lt;k'&gt;$$.
When the notebook is converted to html, the "<" is left unescaped,
which makes browsers parse this as a html tag

The easiest way to trigger the bug is to add a markdown cell and put $$<k'>$$.
When the notebook is converted to html, the "<" is left unescaped,
which makes browsers parse this as a html tag
@zasdfgbnm zasdfgbnm mentioned this pull request Jan 18, 2017
@@ -104,15 +104,20 @@ def header(self, text, level, raw=None):
html = super(IPythonRenderer, self).header(text, level, raw=raw)
return add_anchor(html)

def escape_lt(self,text):
return text.replace('<','&lt;')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we also be translating >? There's a function html.escape() which we might want to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bug triggered by ">". But it is a good idea to escape all these special characters.

@takluyver
Copy link
Member

Could you add a test for this in nbconvert.filters.tests.test_markdown? Thanks!

@zasdfgbnm
Copy link
Contributor Author

I will add the test

@@ -104,15 +104,20 @@ def header(self, text, level, raw=None):
html = super(IPythonRenderer, self).header(text, level, raw=raw)
return add_anchor(html)

def escape_lt(self,text):
return text.replace('<','&lt;')

# Pass math through unaltered - mathjax does the rendering in the browser
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will no longer be true, we're deciding to follow a different standard than MathJax in this case, as MathJax does not always treat "<" and ">" as characters to be escaped (which leads to the "bugs" that this PR is trying to address).

@zasdfgbnm
Copy link
Contributor Author

It should work now.

@takluyver
Copy link
Member

Thanks. This looks OK to me now, but I'll let @michaelpacer have another look over it.

@@ -171,7 +190,8 @@ def test_markdown2html_math_paragraph(self):
]

for case in cases:
self.assertIn(case, markdown2html(case))
s = markdown2html(case)
self.assertIn(case,self._unescape(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect markdown2html() anywhere else to be returning unescaped values? Is there any way to systematically detect that expectation (vs. not)?

I ask because it seems a little off to be using a private function in our testing suite just to make tests pass in the way that they were before. It almost seems like we should change the test's expected output (in this case, the case value) rather than to use this _unescape so our tests don't need any special treatment in order to pass. This way we're more explicit about the expected behaviour.

That said, if there's any purpose to having the unescaping function, we should probably surface it in the regular code base and test it separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit ugly, but it was a bit of an ugly test before - it checks that the Markdown rendering doesn't change these samples. So it only works with samples that don't use any Markdown syntax (other than the math syntax we add). I wouldn't particularly ask @zasdfgbnm to fix this. Obviously we welcome improvements to the test, but that's probably best as a later PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was the original test before. Currently this transformation does not leave these samples unchanged, which is why the unescaping needs to happen. I get that to make the test pass as it is it's easier to not have to concern yourself with the escaped bits.

Also, if the idea of this test is that the math processing leaves markdown unaffected, we should probably include some <, > and & in the markdown section and not automatically unescape them (because the math processing is leaking out into the markdown processing).

Also, this test should include a

\begin{}
… 
\end{}

(i.e., including the new lines around the declaration) since, not being wrapped in any number of $, there is an absence of the easiest cues to delimiting math vs. markdown.

# all the "<", ">", "&" must be escaped correctly
cases = [ "$a<b&b<lt$",
"$a<b&lt;b>a;a-b<0$",
"$<k'>$"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these cases take into account displayed math (i.e., between $$…$$). This problem probably applies equally there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These also don't take into account cases where there are no $…$ delimiters of any kind but where there are LaTeX commands included in their own paragraph see: #232 and the above test.

If you unify this test with the above test, this may address the problem.

Also, one of the common cases where & is used in LaTeX is in the context of tables, it may be worthwhile to include a table example in the test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to adding a few more test cases.

def test_markdown2html_math(self):
# Mathematical expressions should be passed through unaltered
def test_markdown2html_math_noescape(self):
# Mathematical expressions not containing <, >, & should be passed through unaltered
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we should unify this with the other test so that we can have a unified set of tests that check for a single set of expected behaviour that covers both those things that should not be escaped and those things that should be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unifying them is probably ideal, but I'd merge a PR if that was the only thing outstanding.

@mpacer
Copy link
Member

mpacer commented Jan 20, 2017

I just realised that I had had those comments in a sort of "staged" sense for the past day or so… Apologies for appearing to have gone "dark" on this. And thank you @takluyver for saying that explicitly…because the implication that I hadn't had another look at it helped me realise that I had never "submitted" my review.

I didn't make these "required changes" because I think that they're not absolutely necessary but would be better.

In particular, if @takluyver thinks that we don't need to

  1. unify the tests since their separation no longer seems to make sense
  2. not use a private function in our tests to make them pass in a way that would not pass if it were expected behaviour in the main library
  3. cover all the test cases that this the escaping is supposed to cover (e.g., within \begin{}… \end{} and $$…$$)

then I'm cool with merging.

@zasdfgbnm
Copy link
Contributor Author

@michaelpacer You are actually making good suggestions. What I was doing on the test was just trying not to make too much changes on existing code. If @takluyver thinks it is better to restructure several tests in a simple PR, I have no problem doing this job.

@mpacer
Copy link
Member

mpacer commented Jan 21, 2017

@zasdfgbnm That instinct makes sense and in general I'm guessing is a good practice. I just think in this case it may merit a bigger change to the tests.

That said, some of my comments can be addressed without changing the structure of the tests just by changing the set of cases in test_markdown2html_math_escape.

@zasdfgbnm
Copy link
Contributor Author

@takluyver @michaelpacer Sorry for the long wait. I was busy the past days. I merged the nonescape cases and escape cases for math formula. I also added test case for $$...$$ and table as @michaelpacer says. I didn't do anything to the _unescape, because I have no idea on exactly what to do to improve it.

@zasdfgbnm
Copy link
Contributor Author

zasdfgbnm commented Feb 6, 2017

Actually I don't think _unescape is something ugly only to make test passes as @michaelpacer said. On python 3 there is a unescape on official library that does the same thing, but it's not on python 2. That's all the reason I write my own _unescape. See https://wiki.python.org/moin/EscapingHtml for detail. What makes things ugly is not the design of test, but the fact of having two incompatible version of python.

@takluyver takluyver added this to the 5.2 milestone Feb 20, 2017
@takluyver
Copy link
Member

Thanks @zasdfgbnm - I'm happy enough to live with the test being a bit ugly for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants