Add text segmentation for extended grapheme clusters - part 1 #2

Merged
merged 1 commit into from
Oct 22, 2023

lukewilliamboswell
Collaborator

This PR:

  • Sets up the infrastructure to generate the internal text segmentation modules from Unicode Character Database files
  • Includes a script to run the code generation and test the generated files from the repository root
  • Includes most of the parser logic for reading the code point and grapheme break property (GBP) from the GraphemeBreakProperty-15.1.0.txt data file
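
For readers unfamiliar with the data file, here is a minimal sketch of the kind of line parsing the last bullet describes. It is written in Python purely for illustration and is not the Roc parser from this PR. The function name `parse_gbp_line` is hypothetical. It assumes the usual Unicode Character Database layout: a code point or a `first..last` range, a `;`, the property value, and an optional `#` comment.

```python
from typing import Optional, Tuple


def parse_gbp_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Parse one line of a UCD property file (hypothetical helper).

    Returns (first_code_point, last_code_point, property_value),
    or None for blank lines and comment-only lines.
    """
    # Everything after '#' is a comment; drop it before parsing.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None

    # The remaining text is "<code point or range> ; <property>".
    code_points, prop = (field.strip() for field in data.split(";", 1))

    # A range is written "first..last"; a single code point stands alone.
    if ".." in code_points:
        first, last = code_points.split("..", 1)
    else:
        first = last = code_points

    # Code points are hexadecimal without a prefix.
    return int(first, 16), int(last, 16), prop
```

A single code point comes back as a range whose first and last values are equal, which lets later code treat every entry uniformly.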

_ -> trimmed

expect removeTrailingSlash "abc " == "abc"
expect removeTrailingSlash " abc/package/ " == "abc/package"
Contributor

I love quick-and-easy tests like this! 🤗

'D' -> 13
'E' -> 14
'F' -> 15
_ -> 0
Contributor

This can totally be in a package someday...or maybe a builtin? 🤔

Contributor

@rtfeldman rtfeldman left a comment

Looks great! Super exciting to see that we already have Unicode data files as the source of truth, and that we're parsing them in Roc! 😻 😻 😻

@rtfeldman rtfeldman merged commit 6015f81 into roc-lang:main Oct 22, 2023