Extended unicode characters discarded from auto heading IDs #56
Comments
As I commented on #57, I think the current behavior is enough for a default implementation. If you need heading IDs that suit you better, you can use the WithIDs option. And if you think your auto heading ID generation logic may be useful to other people, you can package it as an extension and publish it on GitHub or elsewhere. Because I am Japanese (and use CJK characters), I agree with your point. Still, I think the current behavior is enough for a default, and I would like to keep the default implementation as simple as possible. Again, goldmark is an extensible library. You are welcome to publish your auto heading ID generation logic as an extension :)
Hello @yuin, I've been reading the goldmark code to see whether I can develop an extension for more complete auto heading ID generation. Here are a few questions regarding this:
@jkboxomine, you are overthinking it. All you have to do is implement the interface that parser.WithIDs accepts. Users who want to use your auto heading ID generation logic will then use your library like this:
ctx := parser.NewContext(parser.WithIDs(yourlib.NewYourAutoHeadingGenerator()))
markdown := goldmark.New(goldmark.WithParserOptions(parser.WithAutoHeadingID()))
err := markdown.Convert(source, &b, parser.WithContext(ctx))
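For anyone wondering what such a generator might look like, here is a minimal sketch that uses only the standard library. The type `unicodeIDs` and its constructor are hypothetical names. The method set is an assumption modeled on goldmark's `parser.IDs` interface, which, as far as I know, also passes a node kind to `Generate`; that parameter is left out here so the sketch runs without goldmark, so check the exact signatures in the `parser` package before adapting it.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// unicodeIDs is a hypothetical generator that keeps Unicode letters and
// digits instead of discarding everything outside ASCII.
type unicodeIDs struct {
	used map[string]bool // IDs already handed out or registered
}

func newUnicodeIDs() *unicodeIDs {
	return &unicodeIDs{used: map[string]bool{}}
}

// Generate builds an ID from the heading text and makes it unique
// by appending -1, -2, ... when the base ID is already taken.
func (g *unicodeIDs) Generate(value []byte) []byte {
	var b strings.Builder
	for _, r := range strings.ToLower(string(value)) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
		case unicode.IsSpace(r) || r == '-':
			b.WriteByte('-')
		}
		// Everything else (punctuation, symbols) is dropped.
	}
	base := strings.Trim(b.String(), "-")
	if base == "" {
		base = "heading" // fallback when nothing usable remains
	}
	id := base
	for i := 1; g.used[id]; i++ {
		id = fmt.Sprintf("%s-%d", base, i)
	}
	g.used[id] = true
	return []byte(id)
}

// Put records an ID that was assigned explicitly, so that Generate
// never returns it again.
func (g *unicodeIDs) Put(value []byte) {
	g.used[string(value)] = true
}

func main() {
	ids := newUnicodeIDs()
	for _, h := range []string{"Résumé", "Résumé", "日本語 の 見出し"} {
		fmt.Printf("%s\n", ids.Generate([]byte(h)))
	}
}
```

Running this prints `résumé`, `résumé-1`, and `日本語-の-見出し`, so accented and CJK headings keep all of their letters.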
@jkboxomine Please let me know when you have published your library by opening a PR that adds it to the README :)
The "minimal defaults" approach is legitimate, but could it at least not strip accented characters, and instead "slugify" them by removing the accents? é > e, œ > oe, etc. For now we find ourselves with letters missing from words, and the URLs are gibberish, which hurts readability as well as SEO.
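To make the request concrete, here is a rough sketch of that kind of folding, using only the standard library. The function `slugify` and the table `foldTable` are hypothetical, and the table is deliberately tiny; a real implementation would more likely rely on Unicode normalization (for example `golang.org/x/text/unicode/norm`) than on a hand-written map.

```go
package main

import (
	"fmt"
	"strings"
)

// foldTable is a small illustrative mapping from accented letters and
// ligatures to ASCII. It is an assumption for this sketch, not a complete table.
var foldTable = map[rune]string{
	'à': "a", 'â': "a", 'ä': "a",
	'ç': "c",
	'é': "e", 'è': "e", 'ê': "e", 'ë': "e",
	'î': "i", 'ï': "i",
	'ô': "o", 'ö': "o",
	'ù': "u", 'û': "u", 'ü': "u",
	'œ': "oe", 'æ': "ae",
}

// slugify lowercases the heading, folds known accented letters to ASCII,
// turns runs of spaces and hyphens into one hyphen, and drops the rest.
func slugify(heading string) string {
	var b strings.Builder
	lastDash := true // true at the start so the slug never begins with '-'
	for _, r := range strings.ToLower(heading) {
		if repl, ok := foldTable[r]; ok {
			b.WriteString(repl)
			lastDash = false
			continue
		}
		switch {
		case (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9'):
			b.WriteRune(r)
			lastDash = false
		case r == ' ' || r == '-':
			if !lastDash {
				b.WriteByte('-')
				lastDash = true
			}
		}
	}
	return strings.TrimRight(b.String(), "-")
}

func main() {
	fmt.Println(slugify("Café œuvre"))
	fmt.Println(slugify("Élève modèle"))
}
```

With this table, "Café œuvre" becomes `cafe-oeuvre` and "Élève modèle" becomes `eleve-modele`, instead of losing the accented letters altogether.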
Goldmark 1.1.8 only takes one-byte code points (ASCII) into account when generating auto heading IDs. It simply discards extended Latin characters (2 bytes in UTF-8) and other international characters (3 bytes or more).
https://github.com/yuin/goldmark/blob/master/parser/parser.go#L83-L85
On multilingual sites, this produces incomplete heading IDs.