API for accessing Unicode properties #148
Comments
@kpozin suggested that we could also have an API on @zbraniecki suggested looking at prior art like "is_ascii_whitespace" in the standard library.
@echeran I'm looking for an API to query the Line_Break property value for a given code point, e.g. see icu4x/experimental/segmenter/src/line_breaker.rs (lines 72 to 76 in cd4b7c5).
Is this issue tracking the implementation of such an API?
Yes.
Does this issue depend on #883?
In part, but the non-binary enumerated property API you are requesting above needs additional work. In particular, it cannot be done until CodePointTrie is done.
Besides the Line_Break property, #943 also needs this API to map a code point to various other Unicode properties, such as Word_Break and Grapheme_Cluster_Break.
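As a rough illustration of the kind of lookup being requested in the comments above (code point in, enumerated property value out), here is a minimal sketch. It uses a sorted range table with binary search as a stand-in for CodePointTrie, which is not the data structure this issue plans to use. The names `LineBreak`, `LINE_BREAK_RANGES`, and `line_break_of` are hypothetical, and the table covers only a handful of ASCII code points for demonstration, so the function returns `None` for anything outside it.

```rust
use std::cmp::Ordering;

/// Hypothetical subset of Line_Break property values (not the full set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineBreak {
    Alphabetic,     // AL
    CarriageReturn, // CR
    LineFeed,       // LF
    Numeric,        // NU
    Space,          // SP
}

/// Demo table of (start, end inclusive, value), sorted by start.
/// Assumption: a real implementation would load complete data from a
/// data provider instead of hard-coding a few ASCII ranges.
const LINE_BREAK_RANGES: &[(u32, u32, LineBreak)] = &[
    (0x000A, 0x000A, LineBreak::LineFeed),
    (0x000D, 0x000D, LineBreak::CarriageReturn),
    (0x0020, 0x0020, LineBreak::Space),
    (0x0030, 0x0039, LineBreak::Numeric),
    (0x0041, 0x005A, LineBreak::Alphabetic),
    (0x0061, 0x007A, LineBreak::Alphabetic),
];

/// Returns the Line_Break value for `c`, or `None` if `c` is not in the demo table.
pub fn line_break_of(c: char) -> Option<LineBreak> {
    let cp = c as u32;
    LINE_BREAK_RANGES
        .binary_search_by(|&(start, end, _)| {
            if cp < start {
                Ordering::Greater // this range lies above the code point
            } else if cp > end {
                Ordering::Less // this range lies below the code point
            } else {
                Ordering::Equal // the code point falls inside this range
            }
        })
        .ok()
        .map(|i| LINE_BREAK_RANGES[i].2)
}
```

A trie would give constant-time lookups over the whole code point space; the range table here only shows the shape of the call a segmenter might make.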
I added a list of sub-issues to the OP.
All six parts of this issue are done! Closing as fixed.
Sub-issues:
With UnicodeSet (#91) and UCPTrie (#132) coming along, we should start thinking about what the API will look like for accessing Unicode properties.
A simple and clean solution would be a bunch of functions returning either UnicodeSet or UCPTrie, such as:
These functions would pull from the data provider. The data provider produces serialized sets or tries, and these functions are pretty thin wrappers that convert the serialized format to a Rust UnicodeSet or UCPTrie.
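To make the "thin wrapper over the data provider" idea concrete, here is a minimal sketch for one binary property. Everything in it is an assumption for illustration: the `DataProvider` trait, the `StaticProvider` test double, the `PropertyError` type, the inversion-list encoding, and the function name `get_ascii_hex_digit_property` are hypothetical and do not reflect the actual ICU4X types. The `UnicodeSet` here is a stand-in for the type from #91, and a function returning a UCPTrie would presumably follow the same pattern.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum PropertyError {
    MissingData,
    MalformedData,
}

/// Stand-in for UnicodeSet: a sorted inversion list
/// [start0, end0_exclusive, start1, end1_exclusive, ...].
#[derive(Debug)]
pub struct UnicodeSet {
    inv_list: Vec<u32>,
}

impl UnicodeSet {
    /// Validates the serialized form: even length, strictly increasing.
    pub fn from_inversion_list(inv_list: Vec<u32>) -> Result<Self, PropertyError> {
        let well_formed =
            inv_list.len() % 2 == 0 && inv_list.windows(2).all(|w| w[0] < w[1]);
        if well_formed {
            Ok(Self { inv_list })
        } else {
            Err(PropertyError::MalformedData)
        }
    }

    pub fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        // An odd number of boundaries at or below cp means cp is inside a range.
        self.inv_list.partition_point(|&b| b <= cp) % 2 == 1
    }
}

/// Hypothetical provider interface that hands out serialized sets by key.
pub trait DataProvider {
    fn load_inversion_list(&self, key: &str) -> Option<Vec<u32>>;
}

/// In-memory provider used only for this example.
pub struct StaticProvider(pub HashMap<&'static str, Vec<u32>>);

impl DataProvider for StaticProvider {
    fn load_inversion_list(&self, key: &str) -> Option<Vec<u32>> {
        self.0.get(key).cloned()
    }
}

/// Thin wrapper: fetch the serialized set and convert it to a UnicodeSet.
pub fn get_ascii_hex_digit_property<P: DataProvider>(
    provider: &P,
) -> Result<UnicodeSet, PropertyError> {
    let raw = provider
        .load_inversion_list("ASCII_Hex_Digit")
        .ok_or(PropertyError::MissingData)?;
    UnicodeSet::from_inversion_list(raw)
}

/// Builds a provider holding ASCII_Hex_Digit: 0-9, A-F, a-f.
pub fn demo_provider() -> StaticProvider {
    let mut data = HashMap::new();
    data.insert("ASCII_Hex_Digit", vec![0x30, 0x3A, 0x41, 0x47, 0x61, 0x67]);
    StaticProvider(data)
}
```

A caller would write something like `get_ascii_hex_digit_property(&provider)?.contains('f')`. Keeping the wrapper this thin leaves deserialization and validation as the only work done per call, which is the property the proposal seems to be after.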
Thoughts?
@markusicu @macchiati @srl295 @EvanJP