Add text segmentation for extended grapheme clusters - part 1 #2

lukewilliamboswell · 2023-10-22T03:15:40Z

This PR:

- Sets up the infrastructure to generate the internal text segmentation modules from Unicode Character Database files
- Includes a script that runs the code generation and tests the generated files from the repository root
- Includes most of the parser logic for reading the code point and GBP (grapheme break property) from the GraphemeBreakProperty-15.1.0.txt data file
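
For readers unfamiliar with the data file: each data line in GraphemeBreakProperty-15.1.0.txt is a code point or a `first..last` range, then `;`, then the property value, then a `#` comment. The following is only a rough sketch of that first parsing step, not the code in this PR. The names `stripComment` and `parseLine` and the `NoData` tag are hypothetical, and the syntax assumes the Roc builtins of the time (`Str.splitFirst`, `Str.trim`).

```roc
# Hypothetical helper: drop everything from the first '#' onward.
stripComment : Str -> Str
stripComment = \line ->
    Str.splitFirst line "#"
    |> Result.map .before
    |> Result.withDefault line

# Hypothetical helper: split "0600..0605 ; Prepend # ..." into its two fields.
# Blank lines and comment-only lines have no ';' and yield Err NoData.
parseLine : Str -> Result { codePoints : Str, property : Str } [NoData]
parseLine = \line ->
    when Str.splitFirst (stripComment line) ";" is
        Ok { before, after } ->
            Ok { codePoints: Str.trim before, property: Str.trim after }

        Err NotFound -> Err NoData

expect parseLine "0600..0605    ; Prepend # Cf   [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE" == Ok { codePoints: "0600..0605", property: "Prepend" }
expect parseLine "# comment only" == Err NoData
```

The `codePoints` field is left as a string here; turning `0600` or `0600..0605` into numbers is a separate step.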

rtfeldman · 2023-10-22T14:25:32Z

ucd/GBP.roc

+        _ -> trimmed
+
+expect removeTrailingSlash "abc  " == "abc"
+expect removeTrailingSlash "  abc/package/  " == "abc/package"


I love quick-and-easy tests like this! 🤗

rtfeldman · 2023-10-22T14:27:52Z

ucd/GBP.roc

+        'D' -> 13
+        'E' -> 14
+        'F' -> 15
+        _ -> 0


This can totally be in a package someday...or maybe a builtin? 🤔
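
As an illustration of what such a package helper might look like, here is a minimal sketch. Only the `'D'`, `'E'`, `'F'` branches and the `_ -> 0` fallback are visible in the diff above; the function names `hexDigitValue` and `hexBytesToU32`, the remaining branches, and the fold are assumptions, not the code under review.

```roc
# Hypothetical: map one uppercase ASCII hex digit to its value.
# Unknown bytes fall back to 0, mirroring the `_ -> 0` branch in the diff.
hexDigitValue : U8 -> U32
hexDigitValue = \byte ->
    when byte is
        '0' -> 0
        '1' -> 1
        '2' -> 2
        '3' -> 3
        '4' -> 4
        '5' -> 5
        '6' -> 6
        '7' -> 7
        '8' -> 8
        '9' -> 9
        'A' -> 10
        'B' -> 11
        'C' -> 12
        'D' -> 13
        'E' -> 14
        'F' -> 15
        _ -> 0

# Hypothetical: fold the digits left to right, most significant digit first.
hexBytesToU32 : List U8 -> U32
hexBytesToU32 = \bytes ->
    List.walk bytes 0 \acc, byte -> acc * 16 + hexDigitValue byte

expect hexBytesToU32 ['0', '6', '0', '0'] == 0x0600
expect hexBytesToU32 ['1', 'F', '6', '0', '0'] == 0x1F600
expect hexBytesToU32 [] == 0
```

The Unicode data files write code points in uppercase hex, so this sketch leaves out lowercase digits. Silently mapping unknown bytes to 0 keeps the function total, but a reusable package would probably want to return a `Result` instead.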

rtfeldman

Looks great! Super exciting to see that we already have Unicode data files as the source of truth, and that we're parsing them in Roc! 😻 😻 😻

WIP add grapheme segmentation

b82b2f0

rtfeldman approved these changes Oct 22, 2023

rtfeldman merged commit 6015f81 into roc-lang:main Oct 22, 2023
