Chained Transliterator ID parsing #3991

Open
skius opened this issue Sep 1, 2023 · 0 comments
Labels
C-transliterator Component: transliterator C-unicode Component: Props, sets, tries

skius commented Sep 1, 2023

Depends on the runtime parsing discussed in #3849.

In ICU4C/J, a transliterator can be loaded not only by a single ID but also by chaining several transliterators (including filters) together. Example: [a-z] ; [a] Remove ; Latin-Greek/BGN. Such a chain is equivalent (up to whitespace) to the transform rule source obtained by applying chain.split(";").map(|elt| format!(":: {elt} ;")).collect::<String>(), e.g. :: [a-z] ; :: [a] Remove ; :: Latin-Greek/BGN ;, so the same data struct can be reused (at the cost of only a few empty VZVs).
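A minimal sketch of that mapping, to make the equivalence concrete. The function name chain_to_rule_source is hypothetical and not part of ICU4X. The sketch also assumes that `;` never occurs inside an element (for example inside a set such as `[;]`), and it trims whitespace so the output matches the example above exactly.

```rust
/// Hypothetical helper (not an ICU4X API): turns a chained transliterator ID
/// into the equivalent transform rule source.
///
/// Assumption: `;` only ever separates elements; it never appears inside one
/// (e.g. inside a set like `[;]`), so a plain split is enough for this sketch.
fn chain_to_rule_source(chain: &str) -> String {
    chain
        .split(';')
        // Drop the whitespace around each element of the chain.
        .map(str::trim)
        // Ignore empty elements, e.g. those produced by a trailing `;`.
        .filter(|elt| !elt.is_empty())
        // Each element becomes one `:: <id> ;` rule.
        .map(|elt| format!(":: {elt} ;"))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Calling chain_to_rule_source("[a-z] ; [a] Remove ; Latin-Greek/BGN") yields :: [a-z] ; :: [a] Remove ; :: Latin-Greek/BGN ;, which could then be handed to whatever runtime rule parser comes out of #3849.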

This is primarily a convenience feature for runtime construction: it spares users from writing a dummy source file containing the mapping explained above. Because these chains use legacy IDs, whereas ICU4X data uses BCP-47 IDs, the whole issue of mapping legacy IDs to BCP-47 IDs applies (#3891). I suggest supporting chains of BCP-47 IDs instead of chains of legacy IDs. Support for this is also on the roadmap for ICU: https://unicode-org.atlassian.net/browse/ICU-22474

@skius skius added the C-unicode Component: Props, sets, tries label Sep 1, 2023
@Manishearth Manishearth added the C-transliterator Component: transliterator label Sep 21, 2023
sffc added a commit that referenced this issue Sep 5, 2024
…or docs test (#5483)

A more docs-friendly version of the test introduced in #5469

Related: #3991