-
Notifications
You must be signed in to change notification settings - Fork 567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug when there is "<" in the math formula #514
Conversation
The easiest way to trigger the bug is to add a markdown cell and put $$<k'>$$. When the notebook is converted to html, the "<" is left unescaped, which makes browsers parse this as a html tag
@@ -104,15 +104,20 @@ def header(self, text, level, raw=None): | |||
html = super(IPythonRenderer, self).header(text, level, raw=raw) | |||
return add_anchor(html) | |||
|
|||
def escape_lt(self,text): | |||
return text.replace('<','<') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we also be translating >
? There's a function html.escape() which we might want to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any bug triggered by ">". But it is a good idea to escape all these special characters.
Could you add a test for this in |
I will add the test |
@@ -104,15 +104,20 @@ def header(self, text, level, raw=None): | |||
html = super(IPythonRenderer, self).header(text, level, raw=raw) | |||
return add_anchor(html) | |||
|
|||
def escape_lt(self,text): | |||
return text.replace('<','<') | |||
|
|||
# Pass math through unaltered - mathjax does the rendering in the browser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will no longer be true, we're deciding to follow a different standard than MathJax in this case, as MathJax does not always treat "<" and ">" as characters to be escaped (which leads to the "bugs" that this PR is trying to address).
It should work now. |
Thanks. This looks OK to me now, but I'll let @michaelpacer have another look over it. |
@@ -171,7 +190,8 @@ def test_markdown2html_math_paragraph(self): | |||
] | |||
|
|||
for case in cases: | |||
self.assertIn(case, markdown2html(case)) | |||
s = markdown2html(case) | |||
self.assertIn(case,self._unescape(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect markdown2html()
anywhere else to be returning unescaped values? Is there any way to systematically detect that expectation (vs. not)?
I ask because it seems a little off to be using a private function in our testing suite just to make tests pass in the way that they were before. It almost seems like we should change the test's expected output (in this case, the case
value) rather than to use this _unescape
so our tests don't need any special treatment in order to pass. This way we're more explicit about the expected behaviour.
That said, if there's any purpose to having the unescaping function, we should probably surface it in the regular code base and test it separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit ugly, but it was a bit of an ugly test before - it checks that the Markdown rendering doesn't change these samples. So it only works with samples that don't use any Markdown syntax (other than the math syntax we add). I wouldn't particularly ask @zasdfgbnm to fix this. Obviously we welcome improvements to the test, but that's probably best as a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that was the original test before. Currently this transformation does not leave these samples unchanged, which is why the unescaping needs to happen. I get that to make the test pass as it is it's easier to not have to concern yourself with the escaped bits.
Also, if the idea of this test is that the math processing leaves markdown unaffected, we should probably include some <
, >
and &
in the markdown section and not automatically unescape them (because the math processing is leaking out into the markdown processing).
Also, this test should include a
\begin{}
…
\end{}
(i.e., including the new lines around the declaration) since, not being wrapped in any number of $
, there is an absence of the easiest cues to delimiting math vs. markdown.
# all the "<", ">", "&" must be escaped correctly | ||
cases = [ "$a<b&b<lt$", | ||
"$a<b<b>a;a-b<0$", | ||
"$<k'>$"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these cases take into account displayed math (i.e., between $$…$$
). This problem probably applies equally there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These also don't take into account cases where there are no $…$
delimiters of any kind but where there are LaTeX commands included in their own paragraph see: #232 and the above test.
If you unify this test with the above test, this may address the problem.
Also, one of the common cases where &
is used in LaTeX is in the context of tables, it may be worthwhile to include a table example in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to adding a few more test cases.
def test_markdown2html_math(self): | ||
# Mathematical expressions should be passed through unaltered | ||
def test_markdown2html_math_noescape(self): | ||
# Mathematical expressions not containing <, >, & should be passed through unaltered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should unify this with the other test so that we can have a unified set of tests that check for a single set of expected behaviour that covers both those things that should not be escaped and those things that should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unifying them is probably ideal, but I'd merge a PR if that was the only thing outstanding.
I just realised that I had had those comments in a sort of "staged" sense for the past day or so… Apologies for appearing to have gone "dark" on this. And thank you @takluyver for saying that explicitly…because the implication that I hadn't had another look at it helped me realise that I had never "submitted" my review. I didn't make these "required changes" because I think that they're not absolutely necessary but would be better. In particular, if @takluyver thinks that we don't need to
then I'm cool with merging. |
@michaelpacer You are actually making good suggestions. What I was doing on the test was just trying not to make too much changes on existing code. If @takluyver thinks it is better to restructure several tests in a simple PR, I have no problem doing this job. |
@zasdfgbnm That instinct makes sense and in general I'm guessing is a good practice. I just think in this case it may merit a bigger change to the tests. That said, some of my comments can be addressed without changing the structure of the tests just by changing the set of cases in |
@takluyver @michaelpacer Sorry for the long wait. I was busy the past days. I merged the nonescape cases and escape cases for math formula. I also added test case for |
Actually I don't think |
Thanks @zasdfgbnm - I'm happy enough to live with the test being a bit ugly for now. |
The easiest way to trigger the bug is to add a markdown cell and put$$<k'>$$ .
When the notebook is converted to html, the "<" is left unescaped,
which makes browsers parse this as a html tag