Supporting non-ASCII characters as markers #435
Comments
I'm new to Parsedown, so this may well be off the mark, but is the HTML interface you're using encoded as UTF-8, some other encoding, or none at all? I've found in the past that if the HTML interface isn't set to UTF-8, the text sent to PHP via either GET or POST isn't UTF-8-encoded either. It might be something to look into.
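To test the encoding hypothesis, one could validate the submitted text before it reaches the parser. This is a minimal sketch that assumes the `mbstring` extension is available. `isUtf8()` is a hypothetical helper. The second sample assumes the form was submitted from a Windows-1252 page, where the curly quotes are the single bytes 0x93 and 0x94.

```php
<?php
// Hypothetical helper: true only if $input is valid UTF-8.
function isUtf8(string $input): bool
{
    return mb_check_encoding($input, 'UTF-8');
}

// Curly quotes encoded as UTF-8 (three bytes each) pass the check.
var_dump(isUtf8("“quote”"));        // bool(true)

// The same quotes as Windows-1252 bytes are not valid UTF-8.
var_dump(isUtf8("\x93quote\x94"));  // bool(false)
```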
Sorry, I meant to follow up on this, as I ended up with a nice fix. Several PHP functions lack UTF-8 support, which, as I understand it, would mean rewriting core Parsedown functions to support UTF-8. Emanuil replied suggesting a workaround: alias the UTF-8 characters with ASCII ones. That is, replace the UTF-8 tokens with ASCII characters, then use those ASCII characters as tokens inside the Parsedown definitions. I needed to support curly quotes as tokens. It looks like this:
Note that I opted for a non-visible ASCII character (in my case, character 31) to avoid having to figure out what would happen if anyone typed it; no one will. (A list of non-visible control characters is available at https://en.wikipedia.org/wiki/ASCII#Control_characters, though inserting them may prove a bit trickier.) I hope this helps.
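The aliasing step can be sketched as follows. Several details are assumptions: the helper names, the `strtr()` approach, and the use of character 30 for the closing quote. Only character 31 comes from the comment above.

```php
<?php
// Hypothetical alias table: each curly quote maps to a single-byte
// ASCII control character that is unlikely to appear in real input.
const CURLY_QUOTE_ALIASES = [
    "“" => "\x1F", // opening quote -> ASCII 31 (unit separator), as in the comment
    "”" => "\x1E", // closing quote -> ASCII 30 (record separator), an assumption
];

// Replace the multibyte markers with their single-byte aliases before parsing.
function aliasMarkers(string $text): string
{
    return strtr($text, CURLY_QUOTE_ALIASES);
}

// Restore the original characters in any text that still contains aliases.
function unaliasMarkers(string $text): string
{
    return strtr($text, array_flip(CURLY_QUOTE_ALIASES));
}

$input   = "He said “hello” to me.";
$aliased = aliasMarkers($input);

// Each alias is one byte, so byte-oriented functions such as strpbrk() find it.
var_dump(strpbrk($aliased, "\x1F") === "\x1Fhello\x1E to me."); // bool(true)
var_dump(unaliasMarkers($aliased) === $input);                  // bool(true)
```

In the Parsedown extension, the single-byte alias would then presumably be registered as the marker in place of `“`. `unaliasMarkers()` covers any aliased text that ends up in the output unchanged.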
May be fixed by #513.
Hi there,
Thank you for the library; I was able to add support for custom Markdown in a matter of minutes.
However, when trying to use curly quotes (`“”`) as markers, I get the following error:

If I use an ASCII character for the marker (e.g. the `@` sign), it works just fine. Here is the code with both variants:

In other words, it seems non-ASCII characters are not supported as markers.
Is it a PHP issue?
My understanding is that the `preg_*` functions do support UTF-8 and that it's fine to use UTF-8 strings as array keys. Indeed, this produces the expected result:
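The following minimal sketch checks both claims. It is not the snippet from the issue. `$inlineTypes` and the handler names are hypothetical, and the `/u` pattern modifier is what tells PCRE to treat the pattern and subject as UTF-8.

```php
<?php
// A UTF-8 string works as an array key: PHP compares string keys byte for byte.
$inlineTypes = ['“' => ['CurlyQuote'], '@' => ['Mention']];
var_dump($inlineTypes['“']);

// With the /u modifier, preg_* functions treat pattern and subject as UTF-8,
// so each curly quote is matched as a single character.
preg_match('/“([^”]*)”/u', 'say “hello” twice', $matches);
var_dump($matches[1]); // string(5) "hello"
```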
Inside Parsedown's `line` method, before the `foreach`, `var_dump( $this->InlineTypes['“'] );` works fine (the curly quote is displayed properly in the dump), but `var_dump( $marker );` produces mojibake instead of the expected curly quote character.

So correct me if I'm wrong, but it seems to me this is an issue with the library itself.
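The mojibake is consistent with byte-level string indexing. The snippet below assumes the marker is read with `$excerpt[0]`, as described further down, and that `mbstring` is installed. The sample string is hypothetical.

```php
<?php
$excerpt = "“quoted” text";

// String offsets address bytes, so [0] is only the first of the three
// UTF-8 bytes (E2 80 9C) that encode the opening curly quote.
$byteMarker = $excerpt[0];
var_dump(bin2hex($byteMarker));   // string(2) "e2"

// mb_substr() counts characters, so it returns the whole quote.
$charMarker = mb_substr($excerpt, 0, 1, 'UTF-8');
var_dump($charMarker === "“");    // bool(true)
```

A lone `\xE2` byte is not a complete character, so dumping it shows mojibake. It is also not a key in an array keyed by `“`.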
Inside the library
What can we do to support UTF-8?
I tried to look at the code.
Before the `foreach` (L. 1004), I could use `$marker = mb_substr( $excerpt, 0, 1, 'UTF8');` instead of `$marker = $excerpt[0];`, and `$markerPosition = mb_strpos($text, $marker);` instead of `$markerPosition = strpos($text, $marker);`. But that's not enough, and the `Undefined index` on the `foreach` persists.

It's not clear to me whether `strpbrk()` supports UTF-8. I'm stuck there. Can you please advise?
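Both points can be checked in isolation. This sketch assumes, as the issue describes, three steps: the loop finds a marker with `strpbrk()` over a string of marker characters (presumably Parsedown's marker list), takes the marker from the excerpt, and slices the text with byte-based `substr()`. `$inlineTypes` and the sample text are hypothetical, and `mbstring` is required. The sketch shows that `strpbrk()` matches individual bytes, so a non-marker character sharing a leading byte (here `”`) can stop the scan. It also shows that `mb_strpos()` returns a character offset that no longer lines up with `substr()`.

```php
<?php
// Hypothetical marker registry: only '@' and the opening curly quote are markers.
$inlineTypes = ['@' => ['Mention'], '“' => ['CurlyQuote']];
$markerList  = implode('', array_keys($inlineTypes)); // '@' plus bytes E2 80 9C

// strpbrk() treats $markerList as a set of single bytes, so the closing
// quote '”' (E2 80 9D) matches on its first byte E2 although it is not a marker.
$text    = 'café ”not a marker” @user';
$excerpt = strpbrk($text, $markerList);

// Taking a whole character instead of one byte still yields an unregistered key,
// which would explain why "Undefined index" persists after switching to mb_substr().
$marker = mb_substr($excerpt, 0, 1, 'UTF-8');
var_dump($marker, isset($inlineTypes[$marker]));  // string(3) "”" and bool(false)

// strpos() counts bytes, mb_strpos() counts characters; substr() expects bytes.
$bytePos = strpos($text, $marker);                 // 6 ('é' is two bytes)
$charPos = mb_strpos($text, $marker, 0, 'UTF-8');  // 5
var_dump(substr($text, 0, $bytePos));              // string(6) "café "
var_dump(substr($text, 0, $charPos));              // string(5) "café" (space lost)
```

So `strpbrk()` itself is byte-oriented, which is why the aliasing workaround sidesteps the problem: a single-byte marker cannot be confused with part of another character.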