Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-20187 remove obsolete locale ID canonicalization mappings #418

Merged
merged 1 commit into from
Feb 14, 2019

Conversation

markusicu
Copy link
Member

https://unicode-org.atlassian.net/browse/ICU-20187

approved:2018-11-14 remove obsolete locale ID canonicalization mappings

drop mappings for

  • EURO (which Java dropped long ago) & PREEURO (an ICU invention)
  • ".Net names" (incomplete, and which .Net dropped long ago)
  • "Linux names" (similar to .Net names)
  • "old ICU names"
  • "Solaris variants" PINYIN (conflicts with an IANA variant) & STROKE
  • "POSIX names" C & POSIX (still handled by uprv_getPOSIXIDForCategory())

@pedberg-icu
Copy link
Contributor

How does this affect handling of the en_US_POSIX locale which is still supported by CLDR and imported into ICU, and which is still used (by Apple, for one)? Will it affect the ability to use that locale with that locale ID? Will that locale still be included in eg. getLocales()? How will the changes affect canonicalization of that locale ID?

@markusicu
Copy link
Member Author

How does this affect handling of the en_US_POSIX locale

I am not changing anything about that locale, except:

How will the changes affect canonicalization of that locale ID?

ICU uloc_canonicalize() will no longer map "C" or "POSIX" to "en_US_POSIX", but uprv_getPOSIXIDForCategory() still does so.

icu4c/source/common/putil.cpp Show resolved Hide resolved
icu4c/source/common/putil.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@FrankYFTang FrankYFTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments about all the tests. will look at the implementation next.

Copy link
Contributor

@FrankYFTang FrankYFTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about the implementation.

icu4c/source/common/putil.cpp Outdated Show resolved Hide resolved
icu4c/source/common/putil.cpp Outdated Show resolved Hide resolved
icu4c/source/common/putil.cpp Outdated Show resolved Hide resolved
icu4c/source/common/putil.cpp Outdated Show resolved Hide resolved
icu4c/source/common/putil.cpp Outdated Show resolved Hide resolved
icu4c/source/common/putil.cpp Show resolved Hide resolved
icu4c/source/common/putil.cpp Outdated Show resolved Hide resolved
yumaoka
yumaoka previously approved these changes Feb 13, 2019
Copy link
Member

@yumaoka yumaoka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor comment.

icu4c/source/test/intltest/calregts.cpp Show resolved Hide resolved
srl295
srl295 previously approved these changes Feb 13, 2019
@markusicu markusicu dismissed stale reviews from srl295 and yumaoka via ca72da3 February 14, 2019 16:38
@markusicu markusicu force-pushed the long-obsolete-variants branch from 9b35ea6 to ca72da3 Compare February 14, 2019 16:38
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@markusicu
Copy link
Member Author

sorry, did a rebase on reflex... will address feedback in a few minutes

@markusicu
Copy link
Member Author

PTAL

yumaoka
yumaoka previously approved these changes Feb 14, 2019
Copy link
Member

@yumaoka yumaoka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

Copy link
Contributor

@FrankYFTang FrankYFTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@markusicu markusicu merged commit 1afef30 into unicode-org:master Feb 14, 2019
@markusicu markusicu deleted the long-obsolete-variants branch February 14, 2019 20:27
} CanonicalizationMap;

/**
* A map to canonicalize locale IDs. This handles a variety of
* different semantic kinds of transformations.
*/
static const CanonicalizationMap CANONICALIZE_MAP[] = {
{ "", "en_US_POSIX", NULL, NULL }, /* .NET name */
{ "c", "en_US_POSIX", NULL, NULL }, /* POSIX name */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the cause of regression https://unicode-org.atlassian.net/browse/ICU-20575 - "c" should NOT have been removed here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ICU uloc_canonicalize() will no longer map "C" or "POSIX" to "en_US_POSIX", but uprv_getPOSIXIDForCategory() still does so.

two different comparisons were conflated. the strcmp("C") was to check whether the posix layer was returning real data or not. The mapping still needed to be done in uloc_canonicalize()

srl295 added a commit to srl295/icu that referenced this pull request Apr 25, 2019
Regression was in 1afef30
PR unicode-org#418 [ICU-20187]

- We dropped the mapping from "C" in uloc_canonicalize,
  but then putil did not handle cases where a codepage was
  set (such as C.UTF-8).

- Add an additional check in uprv_getDefaultLocaleID() for
  locales that end up as "C" or "POSIX" after removing codepage
  suffix.

- Also fix regression where aa@bb would become aa__BB__BB
  (incorrectly doubled __BB)
srl295 added a commit that referenced this pull request Apr 25, 2019
Regression was in 1afef30
PR #418 [ICU-20187]

- We dropped the mapping from "C" in uloc_canonicalize,
  but then putil did not handle cases where a codepage was
  set (such as C.UTF-8).

- Add an additional check in uprv_getDefaultLocaleID() for
  locales that end up as "C" or "POSIX" after removing codepage
  suffix.

- Also fix regression where aa@bb would become aa__BB__BB
  (incorrectly doubled __BB)
srl295 added a commit to srl295/icu that referenced this pull request Apr 25, 2019
Regression was in 1afef30
PR unicode-org#418 [ICU-20187]

- We dropped the mapping from "C" in uloc_canonicalize,
  but then putil did not handle cases where a codepage was
  set (such as C.UTF-8).

- Add an additional check in uprv_getDefaultLocaleID() for
  locales that end up as "C" or "POSIX" after removing codepage
  suffix.

- Also fix regression where aa@bb would become aa__BB__BB
  (incorrectly doubled __BB)

(cherry picked from commit 075cefb)
srl295 added a commit that referenced this pull request Apr 26, 2019
Regression was in 1afef30
PR #418 [ICU-20187]

- We dropped the mapping from "C" in uloc_canonicalize,
  but then putil did not handle cases where a codepage was
  set (such as C.UTF-8).

- Add an additional check in uprv_getDefaultLocaleID() for
  locales that end up as "C" or "POSIX" after removing codepage
  suffix.

- Also fix regression where aa@bb would become aa__BB__BB
  (incorrectly doubled __BB)

(cherry picked from commit 075cefb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants