Parsedown fails with long text #443

Open
h2652797 opened this issue Oct 14, 2016 · 11 comments

@h2652797

h2652797 commented Oct 14, 2016

If you visit the Parsedown Demo and paste in this example text, the server fails.

The text is around 23,000 characters, and includes a single asterisk on the first line. This occurs with any text of a similar length or greater. I'm assuming Parsedown reads the opening asterisk and fails when it needs to search for the closing asterisk in such a long string.

I use Parsedown for user comments on my site. Comments are stored in a standard MySQL TEXT field, which can hold values longer than the text above, and users occasionally include a symbol like an asterisk in their comment without a closing one. When this happens, the comments page fails to load.

@brolaugh

I did a test on my local machine with Parsedown-extra version 0.7.1. This was my output.

example*
example
example
example
example
example
...

@neil-yoga-crypto

It's probably due to the fact that the Apache 2.4.7 server configuration of the demo blocks POST requests that are longer than [x] MB.

@h2652797
Author

> It's probably due to the fact that the Apache 2.4.7 server configuration of the demo blocks POST requests that are longer than [x] MB.

I don't think that's the case, because larger text blocks are accepted without an issue. For example, here's the original text with 4 more asterisks added throughout. It now works fine if you paste it into the demo.

There cannot be more than N characters between an opening and a closing asterisk, or the script fails. It may be an issue with Apache or the server configuration (the demo fails the same way as my server does), or Parsedown might need to be modified to handle these kinds of situations.

@PhrozenByte
Contributor

The maximum input length depends on your PHP configuration (in particular pcre.backtrack_limit and pcre.recursion_limit), see https://secure.php.net/manual/de/pcre.configuration.php
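
A minimal sketch of checking those settings from PHP, assuming PHP 7.1 or later. safePregMatch is a hypothetical helper, not part of Parsedown: preg_match() returns false (not 0) when PCRE gives up, for example after hitting pcre.backtrack_limit, and the helper turns that into an exception so a failure no longer looks like "no match". It cannot help with a hard crash such as a segfault, because the process dies before preg_match() returns.

```php
<?php
// Hypothetical helper (not Parsedown API): distinguish "no match" (0)
// from a PCRE failure (false), e.g. an exhausted backtrack or recursion limit.
function safePregMatch(string $pattern, string $subject, ?array &$matches = null): int
{
    $result = preg_match($pattern, $subject, $matches);
    if ($result === false) {
        // preg_last_error() returns a code such as PREG_BACKTRACK_LIMIT_ERROR.
        throw new RuntimeException('preg_match failed, PCRE error code ' . preg_last_error());
    }
    return $result;
}

// The limits documented on the PCRE configuration page; both are
// changeable at runtime with ini_set() if the defaults are too low.
echo 'pcre.backtrack_limit = ', ini_get('pcre.backtrack_limit'), PHP_EOL;
echo 'pcre.recursion_limit = ', ini_get('pcre.recursion_limit'), PHP_EOL;
```

Raising the limits only moves the threshold; a pattern that backtracks heavily on long input will still fail on a long enough comment.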

cebe added a commit to cebe/parsedown that referenced this issue Oct 20, 2016
issue erusev#443

Seems to fail on some PHP versions, but not on PHP 7 for me locally.
Sending this PR to see what Travis says.
@cebe
Contributor

cebe commented Oct 20, 2016

I have added a test in #444, and it looks like this file causes PHP versions < 7.0 to segfault: https://travis-ci.org/erusev/parsedown/builds/169156782

@cebe
Contributor

cebe commented Oct 20, 2016

See also https://bugs.php.net/bug.php?id=45735

@cebe
Contributor

cebe commented Oct 20, 2016

It is likely caused by the regex for emphasis and strong matching, resulting in something similar to what is described here: http://www.regular-expressions.info/catastrophic.html

cebe added a commit to cebe/parsedown that referenced this issue Oct 31, 2016
fixes erusev#443 by checking whether the required markers are available in the
text before applying the regex on it.
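
The idea in that commit message can be sketched as a cheap pre-check. This is not the code from #444: hasClosingMarker and matchEmphasis are hypothetical names, and the simplified pattern in the usage lines only stands in for Parsedown's real emphasis regex. If no second marker appears after the opening one, the regex cannot match, so it is never run.

```php
<?php
// Hypothetical sketch of the pre-check idea (not the actual patch in #444).
function hasClosingMarker(string $text, string $marker): bool
{
    // Look for another marker after the opening one at offset 0.
    return strlen($text) > 1 && strpos($text, $marker, 1) !== false;
}

function matchEmphasis(string $text, string $regex): ?string
{
    // Without a closing marker the regex cannot match, so skip it entirely.
    if ($text === '' || !hasClosingMarker($text, $text[0])) {
        return null;
    }
    return preg_match($regex, $text, $m) === 1 ? $m[1] : null;
}

// Usage with a simplified stand-in pattern (not Parsedown's regex):
$long = '*' . str_repeat('x ', 12000);                 // opening asterisk only
var_dump(matchEmphasis($long, '/^[*]([^*]+)[*]/'));    // NULL, regex never runs
var_dump(matchEmphasis('*foo*', '/^[*]([^*]+)[*]/'));  // string(3) "foo"
```

A check like this only covers input without an end marker: once a closing asterisk appears anywhere in the text, the check passes and the expensive regex still runs.
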
@cebe
Contributor

cebe commented Oct 31, 2016

I have submitted a fix in #444

@h2652797
Author

h2652797 commented Nov 2, 2016

Thanks cebe, that would fix the initial example. I'm assuming it would still crash if we added a marker to the end, such as here, but at least it resolves certain cases.

@erusev erusev added the priority label Nov 2, 2016
@cebe
Contributor

cebe commented Nov 3, 2016

@h2652797 thanks for noting this. Need to figure out why that happens, maybe the correct fix would need to adjust the regex.

@gene-sis
Contributor

Here is a regex approach that works for both cases (without end marker and with end marker).
It should ignore escaped backslashes, escaped asterisks and other escaped characters.
It uses \x5c for the backslash to avoid additional backslash escaping in PHP.
The regex is around line 1527:

    protected $EmRegex = array(
        // Possessive quantifiers (++) never give back what they matched; \x5c is a backslash.
        '*' => '/^[*]((?:[^*\x5c]++|[*]{2}[^*]++[*]{2}|\x5c{2}|\x5c\*|\x5c)++)[*](?![*])/',
    );

It passes the current tests.
Similar regexes should be possible for strong and underscore emphasis.
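
A small harness for trying the proposed '*' pattern on its own, outside Parsedown. emphasisBody is a hypothetical wrapper, and the long inputs only mimic the shape of the report (an opening asterisk followed by about 24,000 characters, with and without a closing asterisk). Because every repetition in the pattern is possessive and the pattern is anchored, a missing end marker makes the match fail after a single pass instead of backtracking through every possible split of the text.

```php
<?php
// The '*' emphasis regex proposed above, copied verbatim. In a single-quoted
// PHP string, \x5c reaches PCRE unchanged and matches one backslash.
const EM_STAR = '/^[*]((?:[^*\x5c]++|[*]{2}[^*]++[*]{2}|\x5c{2}|\x5c\*|\x5c)++)[*](?![*])/';

// Hypothetical wrapper: returns the emphasized text, or null if no match.
function emphasisBody(string $text): ?string
{
    $result = preg_match(EM_STAR, $text, $m);
    if ($result === false) {
        throw new RuntimeException('PCRE error ' . preg_last_error());
    }
    return $result === 1 ? $m[1] : null;
}

$long = '*' . str_repeat('x ', 12000);        // opening asterisk, 24,000 chars
var_dump(emphasisBody($long) === null);       // bool(true): no end marker
var_dump(strlen(emphasisBody($long . '*')));  // int(24000): with end marker
```
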

I suppose there are still some CommonMark rules for strong emphasis and emphasis that Parsedown breaks. Are there plans to follow those rules?
