This issue was moved to a discussion.

Support Chinese Character in $slug #955

Closed
ash3T opened this issue Jan 24, 2023 · 4 comments
Labels
question General questions about the project or usage

Comments

@ash3T

ash3T commented Jan 24, 2023

Hi there,

I don't know much about PHP.

I'm using a website codebase that builds a $slug from titles. When I use Chinese characters in a file title or album title, the resulting names come out empty or invalid. I went through all the code and found that your library is in the vendor folder.

After checking /core/vendor/league/commonmark/src/Normalizer/SlugNormalizer.php, I suspect your PHP code doesn't support Chinese characters, because other functions on the website, such as search, work fine with them.

I'm not sure whether the above makes sense.

Thanks anyway.

@ash3T ash3T added the question General questions about the project or usage label Jan 24, 2023
@colinodell
Member

Hi there!

Unfortunately I'm not very familiar with CJK scripts and best practices for "sluggifying" them. The default slug normalizer in this project relies on Unicode data to determine which characters should be kept or removed:

// Try removing characters other than letters, numbers, and marks.
$slug = \preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $slug) ?? $slug;

Specifically, we only keep characters that fall into one of the following character classes:

  • \p{L} (letters)
  • \p{Nd} (decimal numbers)
  • \p{Nl} (letter numbers)
  • \p{M} (marks)
  • - (the literal - character)

It would seem that the CJK characters you're using don't fall into any of those categories :-/
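
A quick way to check how the pattern treats a given title is to run it directly. The sketch below copies the regular expression from the snippet above; the helper names and sample titles are made up for illustration. Note that Unicode assigns Han ideographs and kana to the "Lo" (other letter) general category, which \p{L} includes, so those characters survive this particular filter.

```php
<?php
// Hypothetical helper: applies only the character filter quoted above,
// not the rest of the library's normalizer.
function stripNonSlugChars(string $title): string
{
    // Same pattern as in the comment: keep letters, decimal and letter
    // numbers, marks, and the literal "-"; remove everything else.
    return preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $title) ?? $title;
}

// Hypothetical helper: true if a single character is in a Unicode letter category.
function isUnicodeLetter(string $char): bool
{
    return preg_match('/^\p{L}$/u', $char) === 1;
}

// Sample titles are invented for this sketch.
foreach (['Hello, World!', '中文标题', 'お名前 2023'] as $title) {
    printf("%s => %s\n", $title, stripNonSlugChars($title));
}

var_dump(isUnicodeLetter('中')); // bool(true)
```

If a title still comes out empty after this step, the characters are probably being dropped somewhere else in the pipeline, for example in the application's own slug code.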

I'd be open to changing this to include CJK characters somehow, so long as:

  • The regular expression doesn't become too complex for maintainers like me (who lack the familiarity with CJK in Unicode)
  • It follows best practices for users of those languages and produces similar results as other sluggifiers
  • At least two people can help verify that the updated implementation looks correct

In the meantime, you can always replace the built-in slug normalizer with your own :)
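
As a rough sketch of what such a replacement could look like, the class below applies the same character filter and falls back to a short hash when nothing survives it. The class name and the `section-` prefix are hypothetical. It is written standalone so that it runs without the library; to plug it into CommonMark it would need to implement the library's text normalizer interface and be registered through the slug normalizer configuration, so check the project's documentation for the exact names.

```php
<?php
// Hypothetical class; not part of league/commonmark.
final class FallbackSlugNormalizer
{
    /**
     * @param array<string, mixed> $context Unused here; kept only so the
     *                                      method shape resembles a normalizer.
     */
    public function normalize(string $text, array $context = []): string
    {
        // Trim and lowercase (requires the mbstring extension).
        $slug = mb_strtolower(trim($text), 'UTF-8');

        // Collapse runs of whitespace into a single dash.
        $slug = preg_replace('/\s+/u', '-', $slug) ?? $slug;

        // Keep letters, numbers, marks, and dashes, as in the pattern above.
        $slug = preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $slug) ?? $slug;

        // Assumption for this sketch: never return an empty slug.
        if ($slug === '') {
            return 'section-' . substr(md5($text), 0, 8);
        }

        return $slug;
    }
}

$normalizer = new FallbackSlugNormalizer();
echo $normalizer->normalize('  Hello World '), "\n"; // hello-world
echo $normalizer->normalize('中文 标题'), "\n";       // 中文-标题
```

The hash fallback trades readability for a guarantee that every heading gets a non-empty, stable slug; a transliteration library would be the alternative if readable ASCII slugs are required.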

@ash3T
Author

ash3T commented Jan 25, 2023 via email

@ptmkenny

ptmkenny commented Mar 4, 2023

I have a Japanese website where I use Commonmark and I had a workaround for this for Commonmark version 1. However, after updating to version 2 and using PHP 8.2, I find that Japanese characters are evaluated properly and I don't need the workaround anymore.

For example,

[お名前.com](https://www.お名前.jp)

is now working with no special config.

So, my guess is that PHP 8's handling of CJK improved at some point.

@colinodell
Member

It looks like https://github.com/cocur/slugify has support for Chinese characters (Pinyin) so I'd recommend using that.

@thephpleague thephpleague locked and limited conversation to collaborators May 11, 2023
@colinodell colinodell converted this issue into discussion #979 May 11, 2023
