"char": add a type and function for Unicode Character Categories #1348
The module-in-progress called 'unicode::' in libstd is where I was going to sketch out an interface to libicu. Deciding membership is not actually simple for most of the character classes, and ICU handles this well. I guess we can expose it under core::char if everyone's cool with adopting a dependency on libicu?
libicu provides many additional desirable features, and it is probably present on most computers (Python uses it, so it should be fine for us). Do we want to provide public libicu bindings, or just use it internally in modules like "char", "str", etc.?
To implement the functions in Rust's "char" module correctly using libicu, I think we only need to call functions like "u_isspace()", "u_isdigit()" and "u_forDigit()" (http://icu-project.org/apiref/icu4c/uchar_8h.html). We wouldn't need full libicu bindings (including its many constant definitions) yet.
I think we should go the libicu route. See #1370
Can we reopen this? We don't depend on libicu any more, but there's still no easy way to find a character's category.
Sorry to comment on such an old thread, but I just implemented much of the UCD (v9.0.0) here. It depends on neither libicu nor the standard library, so hopefully it should be easy to use in other projects (though it's probably not as reliable as ICU).
For Unicode character categories, see http://www.fileformat.info/info/unicode/category/index.htm
Haskell's standard library defines a type "GeneralCategory" and a function ("generalCategory") that returns a character's "GeneralCategory".
Their implementation is in the Data.Char source linked below.
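For comparison, a rough Rust analogue of such a type and lookup function might look like the sketch below. It is an illustration, not Haskell's code: the enum lists the 30 Unicode General_Category values (variant names follow the Unicode long names, e.g. Uppercase_Letter), and "general_category" does a binary search over a sorted table of (first, last, category) ranges. The table is a hypothetical, hand-written one covering ASCII only; a real table would be generated from the UCD, and anything it does not cover returns None here.

```rust
use std::cmp::Ordering;

/// The 30 Unicode General_Category values, with their two-letter codes.
/// Variant names follow the Unicode long names for each value.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GeneralCategory {
    UppercaseLetter,      // Lu
    LowercaseLetter,      // Ll
    TitlecaseLetter,      // Lt
    ModifierLetter,       // Lm
    OtherLetter,          // Lo
    NonspacingMark,       // Mn
    SpacingMark,          // Mc
    EnclosingMark,        // Me
    DecimalNumber,        // Nd
    LetterNumber,         // Nl
    OtherNumber,          // No
    ConnectorPunctuation, // Pc
    DashPunctuation,      // Pd
    OpenPunctuation,      // Ps
    ClosePunctuation,     // Pe
    InitialPunctuation,   // Pi
    FinalPunctuation,     // Pf
    OtherPunctuation,     // Po
    MathSymbol,           // Sm
    CurrencySymbol,       // Sc
    ModifierSymbol,       // Sk
    OtherSymbol,          // So
    SpaceSeparator,       // Zs
    LineSeparator,        // Zl
    ParagraphSeparator,   // Zp
    Control,              // Cc
    Format,               // Cf
    Surrogate,            // Cs
    PrivateUse,           // Co
    Unassigned,           // Cn
}

/// Looks up `c` in a sorted table of (first, last, category) ranges.
/// Returns None for code points the table does not cover.
pub fn general_category(c: char) -> Option<GeneralCategory> {
    use self::GeneralCategory::*;
    // Hypothetical hand-written table for ASCII only; a generated table
    // would cover all of U+0000..=U+10FFFF.
    const TABLE: &[(u32, u32, GeneralCategory)] = &[
        (0x00, 0x1F, Control), (0x20, 0x20, SpaceSeparator),
        (0x21, 0x23, OtherPunctuation), (0x24, 0x24, CurrencySymbol),
        (0x25, 0x27, OtherPunctuation), (0x28, 0x28, OpenPunctuation),
        (0x29, 0x29, ClosePunctuation), (0x2A, 0x2A, OtherPunctuation),
        (0x2B, 0x2B, MathSymbol), (0x2C, 0x2C, OtherPunctuation),
        (0x2D, 0x2D, DashPunctuation), (0x2E, 0x2F, OtherPunctuation),
        (0x30, 0x39, DecimalNumber), (0x3A, 0x3B, OtherPunctuation),
        (0x3C, 0x3E, MathSymbol), (0x3F, 0x40, OtherPunctuation),
        (0x41, 0x5A, UppercaseLetter), (0x5B, 0x5B, OpenPunctuation),
        (0x5C, 0x5C, OtherPunctuation), (0x5D, 0x5D, ClosePunctuation),
        (0x5E, 0x5E, ModifierSymbol), (0x5F, 0x5F, ConnectorPunctuation),
        (0x60, 0x60, ModifierSymbol), (0x61, 0x7A, LowercaseLetter),
        (0x7B, 0x7B, OpenPunctuation), (0x7C, 0x7C, MathSymbol),
        (0x7D, 0x7D, ClosePunctuation), (0x7E, 0x7E, MathSymbol),
        (0x7F, 0x7F, Control),
    ];
    let cp = c as u32;
    TABLE
        .binary_search_by(|&(first, last, _)| {
            if last < cp {
                Ordering::Less // whole range lies below `cp`
            } else if first > cp {
                Ordering::Greater // whole range lies above `cp`
            } else {
                Ordering::Equal // first <= cp <= last
            }
        })
        .ok()
        .map(|i| TABLE[i].2)
}
```

Storing ranges rather than one entry per code point keeps the table small, since most categories come in long contiguous runs.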
I propose to write a Python script that does something similar.
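A generator along those lines would read UnicodeData.txt, whose lines are semicolon-separated with the code point (hex) in field 0, the name in field 1 and the two-letter General_Category in field 2, and collapse runs of consecutive code points into ranges. The proposal is a Python script; to keep the examples in one language, the sketch below is in Rust, and "build_ranges" is a hypothetical name. It handles the "<..., First>"/"<..., Last>" line pairs that UnicodeData.txt uses for large blocks, and leaves out code points missing from the file (which are unassigned, category Cn).

```rust
/// A run of consecutive code points sharing one General_Category,
/// e.g. (0x41, 0x5A, "Lu") for 'A'..='Z'.
pub type Range = (u32, u32, String);

/// Builds category ranges from UnicodeData.txt-formatted text.
/// Field 0 is the code point in hex, field 1 the name, field 2 the
/// two-letter General_Category. Code points missing from the input
/// (category Cn) are simply not emitted.
pub fn build_ranges(data: &str) -> Vec<Range> {
    let mut ranges: Vec<Range> = Vec::new();
    let mut block_start: Option<u32> = None; // set by a "<..., First>" line
    for line in data.lines() {
        let fields: Vec<&str> = line.split(';').collect();
        if fields.len() < 3 {
            continue; // skip blank or malformed lines
        }
        let cp = match u32::from_str_radix(fields[0], 16) {
            Ok(cp) => cp,
            Err(_) => continue,
        };
        let (name, cat) = (fields[1], fields[2]);
        if name.ends_with(", First>") {
            // Remember where the block starts; its "Last" line closes it.
            block_start = Some(cp);
            continue;
        }
        let first = if name.ends_with(", Last>") {
            block_start.take().unwrap_or(cp)
        } else {
            cp
        };
        // Extend the previous range if it is contiguous and has the same category.
        if let Some(last) = ranges.last_mut() {
            if last.1 + 1 == first && last.2 == cat {
                last.1 = cp;
                continue;
            }
        }
        ranges.push((first, cp, cat.to_string()));
    }
    ranges
}
```

The output could then be printed as a Rust array literal and checked in as the lookup table, so the library itself needs no parsing at run time.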
Having such a type and function in Rust would let us implement the functions in the "char" module correctly. See http://haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/src/Data-Char.html
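With a category lookup in place, several "char" predicates reduce to checks on the category; Haskell's Data.Char, for instance, defines predicates such as isPunctuation and isSymbol by matching on the GeneralCategory value. A minimal sketch, assuming the category is available as its two-letter code (for example from a generated table); "MajorClass" and the function names are hypothetical:

```rust
/// Major class of a General_Category, taken from the first letter of its
/// two-letter code (L, M, N, P, S, Z, C).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MajorClass {
    Letter,
    Mark,
    Number,
    Punctuation,
    Symbol,
    Separator,
    Other, // C*: control, format, surrogate, private use, unassigned
}

/// Maps a two-letter code such as "Lu" or "Pd" to its major class.
/// Returns None for an empty or unrecognized code.
pub fn major_class(code: &str) -> Option<MajorClass> {
    match *code.as_bytes().first()? {
        b'L' => Some(MajorClass::Letter),
        b'M' => Some(MajorClass::Mark),
        b'N' => Some(MajorClass::Number),
        b'P' => Some(MajorClass::Punctuation),
        b'S' => Some(MajorClass::Symbol),
        b'Z' => Some(MajorClass::Separator),
        b'C' => Some(MajorClass::Other),
        _ => None,
    }
}

// Hypothetical predicates in the spirit of Haskell's isLetter,
// isPunctuation, isSymbol and isSeparator.
pub fn is_letter(code: &str) -> bool {
    major_class(code) == Some(MajorClass::Letter)
}

pub fn is_punctuation(code: &str) -> bool {
    major_class(code) == Some(MajorClass::Punctuation)
}

pub fn is_symbol(code: &str) -> bool {
    major_class(code) == Some(MajorClass::Symbol)
}

pub fn is_separator(code: &str) -> bool {
    major_class(code) == Some(MajorClass::Separator)
}
```

Note that purely category-based checks are not the same as predicates defined on derived properties: Unicode's Alphabetic property, for example, also includes letter numbers (Nl) and characters marked Other_Alphabetic, so "is_letter" above is narrower than an Alphabetic test.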