CLDR-14453 Adding IANA zone.tab mapping in timezone.xml #3105
This looks like great progress, thanks @yumaoka! Are there other changes you have planned beyond what's in here already?
Coincidentally, @anba and I are discussing whether the ECMAScript spec should refer to CLDR instead of IANA as a "source of truth" for which IDs are available in ECMAScript and which ones are canonical. See tc39/ecma402#806.
@anba - you may want to review this PR too.
common/bcp47/timezone.xml
<type name="ugkla" description="Kampala, Uganda" alias="Africa/Kampala"/>
<type name="umawk" description="Wake Island, U.S. Minor Outlying Islands" alias="Pacific/Wake"/>
<type name="umjon" description="Johnston Atoll, U.S. Minor Outlying Islands" alias="Pacific/Johnston"/>
<type name="umjon" description="Johnston Atoll, U.S. Minor Outlying Islands" deprecated="true" iana="ushnl"/>
iana="ushnl"
Did you mean preferred="ushnl"?
You're correct. Fixed the error.
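For readers unfamiliar with the attributes discussed here, the following is a minimal sketch of how a consumer might follow `deprecated="true"` plus `preferred` to reach a non-deprecated short ID. The `umjon` entry mirrors the corrected line in this review; the `ushnl` entry and the helper name `resolve_short_id` are assumptions added only so the example can run, not code from CLDR.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only. The "ushnl" entry is an assumed target entry,
# included so that the lookup below has something to resolve to.
SAMPLE = """
<key name="tz">
  <type name="umawk" description="Wake Island, U.S. Minor Outlying Islands" alias="Pacific/Wake"/>
  <type name="umjon" description="Johnston Atoll, U.S. Minor Outlying Islands" deprecated="true" preferred="ushnl"/>
  <type name="ushnl" description="Honolulu, United States" alias="Pacific/Honolulu"/>
</key>
"""

ROOT = ET.fromstring(SAMPLE)


def resolve_short_id(root, short_id):
    """Follow preferred= links from deprecated entries (hypothetical helper)."""
    types = {t.get("name"): t for t in root.iter("type")}
    seen = set()
    while short_id in types and short_id not in seen:
        seen.add(short_id)
        entry = types[short_id]
        preferred = entry.get("preferred")
        if entry.get("deprecated") != "true" or not preferred:
            return short_id
        short_id = preferred
    # Unknown ID or a cycle in the data: return the last ID reached.
    return short_id


print(resolve_short_id(ROOT, "umjon"))
print(resolve_short_id(ROOT, "umawk"))
```

Entries that are not deprecated resolve to themselves, so the helper can be applied uniformly to any short ID.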
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
common/dtd/ldmlBCP47.dtd
@@ -69,6 +69,9 @@ CLDR data files are interpreted according to the LDML specification (http://unic
<!ATTLIST type since CDATA #IMPLIED >
<!--@MATCH:version-->
<!--@METADATA-->
<!ATTLIST type iana CDATA #IMPLIED >
<!--@MATCH:any-->
This would be better as a regex that matched the structure. eg something like [A-Za-z_]+(/[A-Za-z_]+)*
@macchiati Updated. Please check.
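As a rough check of what the pattern suggested above would accept, here is a small sketch that applies it literally with Python's `re` module. The helper name `matches` is hypothetical, and the pattern is the reviewer's "something like" suggestion, not necessarily the one that was finally committed.

```python
import re

# The pattern exactly as suggested in the review comment.
SUGGESTED = re.compile(r"[A-Za-z_]+(/[A-Za-z_]+)*")


def matches(zone_id):
    """Return True if the whole ID fits the suggested structure."""
    return SUGGESTED.fullmatch(zone_id) is not None


for zone_id in ["Africa/Kampala", "America/Argentina/Buenos_Aires",
                "Etc/GMT+1", "America/Port-au-Prince", "EST5EDT"]:
    print(zone_id, matches(zone_id))
```

Taken literally, the pattern accepts multi-segment IDs made of letters and underscores but rejects IDs containing digits, hyphens, or plus signs, so a production pattern would likely need a wider character class.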
I just noticed that WET, EET, CET, MET are canonical in IANA but are missing from timezone.xml. I assume these should be added? Other than those, are there any other Zones in IANA that are missing from timezone.xml? |
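One way to answer the "are any other Zones missing" question is a set difference between the IANA Zone names and the IDs listed in `alias` attributes. The sketch below assumes `alias` holds a space-separated list of IDs; the sample XML, the sample list of IANA names, and the helper names are illustrative assumptions, not real data dumps.

```python
import xml.etree.ElementTree as ET

# Illustrative inputs only: a tiny timezone.xml-like fragment and a
# hand-picked list of IANA Zone names.
SAMPLE_XML = """
<key name="tz">
  <type name="ugkla" description="Kampala, Uganda" alias="Africa/Kampala"/>
  <type name="umawk" description="Wake Island, U.S. Minor Outlying Islands" alias="Pacific/Wake"/>
</key>
"""
SAMPLE_IANA_ZONES = ["Africa/Kampala", "Pacific/Wake", "WET", "EET", "CET", "MET"]


def aliases_in(xml_text):
    """Collect every ID mentioned in an alias attribute (hypothetical helper)."""
    found = set()
    for entry in ET.fromstring(xml_text).iter("type"):
        found.update((entry.get("alias") or "").split())
    return found


def missing_zones(iana_zone_names, xml_text):
    """Return IANA names that no alias attribute mentions, sorted."""
    return sorted(set(iana_zone_names) - aliases_in(xml_text))


print(missing_zones(SAMPLE_IANA_ZONES, SAMPLE_XML))
```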
If the intent is for zones of the same country code (SJ in this case) to always share the same canonical ID, then resolving to Arctic/Longyearbyen does sound like the correct behavior here. Maybe this should prompt a change to the definition of which IDs CLDR considers canonical? I wrote an initial guess at how this could be described. Would the rules below work?
(EDIT: changed above to accommodate links that might not correspond to a country code, like Etc/Universal) |
Right. This is the reason why it's not an alias of Europe/Berlin.
This one is a bit tricky. I did not make
On the other hand, both Etc/UTC and Etc/GMT are CLDR canonical zones. Theoretically they are different, and quite a few people argue they should be different. Unlike EST, this distinction does not come from legacy system requirements; CLDR handles them as separate zones. I think it's probably easier to spell out all exceptions, because these exceptions for legacy zones probably won't change in the future.
@@ -69,6 +69,9 @@ CLDR data files are interpreted according to the LDML specification (http://unic
<!ATTLIST type since CDATA #IMPLIED >
<!--@MATCH:version-->
<!--@METADATA-->
<!ATTLIST type iana CDATA #IMPLIED >
Looks good.
Your point is right. I'm now wondering if it still makes sense to keep these legacy zone IDs. CLDR is not the source of offset transition rules, but it provides localized names. If we don't have a CLDR zone ID corresponding to IANA zone If an application that depends on CLDR wants to show a localized name for a time zone, such an application would exclude these "legacy" zones. Such an application may limit the set of zones to those from zone.tab. EST, CST6CDT, CET... these are not included in zone.tab. For the above reasons, I now think we probably don't need these legacy zones (EST5EDT and some others) as part of the CLDR canonical set. If we want a consistent policy, we have two options.
Adding WET and some others is relatively easy. I will bring this question to CLDR team, and handle it in a separate PR. |
There's one more oddball canonical ID: "Factory". Should we also omit that one? I admit that I don't know what that Zone is for. Do you?
👍 I like the idea of removing weird legacy IDs like "PST8PDT", because it'd clean up the output of ECMAScript's I'd also want to hear what @sffc and @anba think about this proposal.
I assume by "deprecate" you mean that we'd make those IDs into aliases of other canonical IDs. If this assumption is correct, then this seems reasonable to me. I assume that single-offset POSIX names like CET would, per your comment above, be resolved to "Etc/GMT*" names, while the 4 multiple-offset POSIX names would be resolved to their appropriate counterparts like "America/New_York" or "America/Chicago"?
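To make the "deprecate means alias" reading concrete, here is a minimal sketch of resolving multiple-offset POSIX-style IDs to regional IDs. The table is an assumption that illustrates the proposal under discussion, pairing the two POSIX-style names mentioned in this thread with the two regional IDs named above; it is not data taken from CLDR, and `to_primary` is a hypothetical helper.

```python
# Hypothetical alias table sketching the proposal; not taken from CLDR data.
PROPOSED_ALIASES = {
    "EST5EDT": "America/New_York",
    "CST6CDT": "America/Chicago",
}


def to_primary(zone_id, aliases=PROPOSED_ALIASES):
    """Map a legacy ID to its proposed primary ID; other IDs pass through."""
    return aliases.get(zone_id, zone_id)


print(to_primary("EST5EDT"))
print(to_primary("Europe/Berlin"))
```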
I'm OK with spelling out exceptions, but I'd also like to (if possible) document general principles or rules that drive these exceptions. This will be helpful to explain the exceptions to others. Are the rules below (adapted from tc39/ecma402#825) an accurate way to document the changes planned? These include the exceptions noted above and my understanding of your proposed POSIX solution. Each Zone in the IANA Time Zone Database must be primary in CLDR ("primary" means that it is either listed in an
(EDIT: changed "canonical" to "primary" in the text above to match ECMAScript terminology as well as terms used in https://unicode-org.atlassian.net/browse/ICU-22452) |
@justingrant I will discuss this with the CLDR folks and decide what to do for CET, etc. I'm leaning toward deprecating the existing legacy IDs. The policy for maintaining CLDR canonical zones could simply be documented in the process doc - https://cldr.unicode.org/development/updating-codes/update-time-zone-data-for-zoneparser
Do the bullet points at the end of #3105 (comment) match your opinion of how these CET, PST8PDT, etc. IDs should be mapped to primary IDs? If yes, then I'll update tc39/ecma402#825 to align with those bullet points.
👍
👍 |
Here's more info about this Zone, from https://mm.icann.org/pipermail/tz/2023-August/033032.html:
As mentioned in #3105 (comment), |
CLDR-14453
ALLOW_MANY_COMMITS=true