Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script fallback priority; improve language fallback priority #5344

Merged
merged 8 commits into from
Aug 13, 2024

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented Aug 8, 2024

Manishearth
Manishearth previously approved these changes Aug 8, 2024
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems good to me but would also like shane to take a look

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Algorithm needs some tweaks

components/locale/src/fallback/algorithms.rs Outdated Show resolved Hide resolved
components/locale/src/fallback/algorithms.rs Outdated Show resolved Hide resolved
components/locale/src/fallback/algorithms.rs Outdated Show resolved Hide resolved
components/locale/src/fallback/algorithms.rs Outdated Show resolved Hide resolved
@robertbastian
Copy link
Member Author

Btw this is mainly inspired by #3972 which has some prior discussion.

@robertbastian robertbastian requested a review from sffc August 13, 2024 13:37
@sffc
Copy link
Member

sffc commented Aug 13, 2024

  • @sffc - There are a few things we should definitly hit:
    • sr-ME
    • sr-Latn-ME (actually no, see below)
    • und-Latn(-ME?)
  • @robertbastian I don't think there is ever data in und-Latn-ME. Once we infer the script from the region, we can throw it away.
  • @sffc We need to reach sr-Latn-ME even though we don't store data in sr-ME because CLDR sometimes puts data in sr-Latn-ME. We currently run the same fallbacker in both datagen (CLDR land) and runtime (ICU4X land). If we want to change this behavior, the SourceProvider should automatically load sr-Latn-ME data when sr-ME is requested of it.
  • @robertbastian That's what it does. See add_script_extended in the Cldr code:
    if self.0.serde_cache.file_exists(&path)? {
  • @sffc OK great. So we then shouldn't need to reach sr-Latn-ME in either language or script fallback, right?
  • @robertbastian - We are in language and I'm not in my script PR
  • @sffc OK, let's not retain the region then when going to the script
  • @sffc Also note that the subdivision is basically part of the region. So we shouldn't retain the subdivision.
  • TODO:
    • remove redundant step in language fallback (sr-Latn-ME)
    • remove und-variant step in script fallback (script fallback should end with a script, like region fallback ends with a region)

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation: script fallback priority ended up being language fallback priority with one or two extra things at the end. We might be able to do some more code sharing in a future cleanup PR.

@sffc sffc changed the title Script fallback priority Add script fallback priority; improve language fallback priority Aug 13, 2024
@robertbastian robertbastian merged commit 52bd97a into unicode-org:main Aug 13, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants