A string type for Rust that is not required to be valid UTF-8.
-
Updated
Jan 2, 2025 - Rust
A string type for Rust that is not required to be valid UTF-8.
A high-performance wrapper around Intl.Segmenter for efficient text segmentation. This class resolves memory handling issues seen with large strings and "maximum call stack exceeded" exceptions that occur when strings exceed 40-50k characters. Enhances performance by 50-500x. Only ~70 loc (with comments) and no dependencies.
An implementation of the Unicode algorithm for breaking code point sequences into extended grapheme clusters as specified in UAX #29. This library supports version 16.0 of the Unicode Standard.
A lightweight, intuitive wrapper around Intl.Segmenter for seamless segment-aware string operations in TypeScript and JavaScript
Check Unicode strings for possible confusing graphemes
A tool for generating sub-word (phone or grapheme) level embeddings from an HTK-style MLF ASR corpus
terminal width of Unicode 16.0+Emoji strings in nanoseconds
Working backwards through a deep convolutional network, to recreate the input image - and identify areas for improvement.
Add a description, image, and links to the graphemes topic page so that developers can more easily learn about it.
To associate your repository with the graphemes topic, visit your repo's landing page and select "manage topics."