Specify canonicalization algorithms for Intl enumeration #726

Closed
ptomato opened this issue Nov 2, 2022 · 2 comments · Fixed by #889

ptomato commented Nov 2, 2022

Once Intl.supportedValuesOf becomes part of the specification, apply the changes proposed in tc39/proposal-intl-enumeration#49.

@ptomato ptomato added the editorial Involves an editorial fix label Nov 2, 2022
@sffc sffc added this to the ES 2023 milestone Nov 3, 2022
@sffc sffc added s: help wanted Status: help wanted; needs proposal champion c: meta Component: intl-wide issues labels May 2, 2023

sffc commented May 2, 2023

@ben-allen to sync with @ptomato and coordinate on a PR.

@justingrant

Note that time zones (unlike the other things that Intl.supportedValuesOf enumerates) are also specified in 262, because non-UTC IANA time zones (and hence the need to canonicalize them) can exist in non-402 implementations of ECMAScript.

@gibson042 and I have been working on an editorial PR for 262 (tc39/ecma262#3035) to specify how time zone canonicalization works there. (With editorial PRs for 402 and Temporal to follow if the 262 PR is accepted.) The summary of the PR is:

  • Add an implementation-defined AO AvailableTimeZoneIdentifiers which returns a list of {Identifier, CanonicalIdentifier} records.
  • Add a non-implementation-defined AO GetAvailableTimeZoneIdentifier(id) which returns the record where Identifier ASCII-case-insensitively matches id, or ~empty~.
  • Replace CanonicalizeTimeZoneIdentifier with calls to GetAvailableTimeZoneIdentifier(id).CanonicalIdentifier.
  • Replace IsAvailableTimeZoneIdentifier with checks that GetAvailableTimeZoneIdentifier(id) is not ~empty~.
  • Rename DefaultTimeZone to SystemTimeZoneIdentifier to match naming of other related AOs. (And because there's no "default" time zone in Temporal.)
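
To make the shape of these AOs concrete, here is a minimal C++ sketch, not spec text. The record type, the three sample entries, and the helper AsciiCaseInsensitiveEquals are assumptions made for illustration; a real implementation would derive the list from IANA and/or CLDR data, and std::nullopt stands in for ~empty~.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Mirrors the {Identifier, CanonicalIdentifier} record described above.
struct TimeZoneIdentifierRecord {
  std::string identifier;
  std::string canonicalIdentifier;
};

// Stand-in for the implementation-defined AO. The entries are illustrative
// only (assumption); an engine would generate them from time zone data.
inline const std::vector<TimeZoneIdentifierRecord>& AvailableTimeZoneIdentifiers() {
  static const std::vector<TimeZoneIdentifierRecord> records = {
      {"Africa/Abidjan", "Africa/Abidjan"},
      {"Etc/UTC", "UTC"},
      {"UTC", "UTC"},
  };
  return records;
}

// Hypothetical helper: lower-cases ASCII letters only, leaving other bytes alone.
inline char AsciiToLower(char c) {
  return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical helper: ASCII-case-insensitive equality of two identifiers.
inline bool AsciiCaseInsensitiveEquals(std::string_view a, std::string_view b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
           return AsciiToLower(x) == AsciiToLower(y);
         });
}

// Non-implementation-defined AO: returns the matching record, or
// std::nullopt (playing the role of ~empty~) when no identifier matches.
inline std::optional<TimeZoneIdentifierRecord> GetAvailableTimeZoneIdentifier(
    std::string_view id) {
  for (const auto& record : AvailableTimeZoneIdentifiers()) {
    if (AsciiCaseInsensitiveEquals(record.identifier, id)) return record;
  }
  return std::nullopt;
}
```

With this shape, the old CanonicalizeTimeZoneIdentifier corresponds to reading canonicalIdentifier from the returned record, and the old IsAvailableTimeZoneIdentifier corresponds to calling has_value() on the result.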

I don't know if this approach is relevant to other things that Intl.supportedValuesOf enumerates, but Richard and I would be happy to coordinate with @ben-allen and @ptomato if similar idioms could be used for those other enumerations too.

BTW, here are a few reasons for the set of AOs above:

  • Simplify and consolidate AOs and reduce the number of implementation-defined AOs required (instead of one for enumeration, another for canonicalization, another for case normalization, etc.)
  • Simplify 402, Temporal, and (if accepted) proposal-canonical-tz spec text.
  • Expose both canonical and non-canonical IDs, either as a full list or as an individual pair, to support a wider range of use-cases without having to change existing behavior or add new AOs. For example, had these AOs been in place earlier, then Temporal, Intl.supportedValuesOf, and proposal-canonical-tz wouldn't need to change or add any AOs to work with time zone IDs.
  • Narrow the scope of 262 text that needs to be overridden in 402, because overrides introduce complexity for readers and implementers. Similarly, reduce the scope of Temporal text that overrides 262 and/or 402.
  • Encourage implementers to think about caching and/or hard-coding the list of IDs and using their indexes for canonicalization, instead of fetching IDs one at a time and storing strings in internal slots. Doing this could make ZonedDateTime, TimeZone, and DateTimeFormat types more space-efficient, as in this pseudo-C++ code:
struct TimeZoneIdRecord {
  const unsigned short idIndex; // could also be a 10-bit field
  const unsigned short canonicalIdIndex; // could also be a 10-bit field
};

// Everything below populated via automated build step using IANA and/or CLDR data

const unsigned short TIMEZONE_ID_COUNT = 579;

const char* sortedTimeZoneIds[TIMEZONE_ID_COUNT] = {
 "Africa/Abidjan",
 "Africa/Accra",
 // . . . 
};

// for case-normalized comparisons
const char* lowerCaseTimeZoneIds[TIMEZONE_ID_COUNT] = {
 "africa/abidjan",
 "africa/accra",
 // . . . 
};

const TimeZoneIdRecord sortedTimeZoneIdMap[TIMEZONE_ID_COUNT] = {
 { 0, 0 },  // example of a canonical ID
 { 1, 1 },
 // . . .
 { 203, 16 }, // example of a non-canonical ID
 // . . .
};
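
A lookup over tables like these could be sketched as follows. This is only an illustration under stated assumptions: the four-entry tables are hand-written rather than generated, both string tables are assumed to share one index order (sorted by lower-cased form), and the names FindIdIndex, CanonicalIdFor, and kNotFound are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <string_view>

struct TimeZoneIdRecord {
  std::uint16_t idIndex;
  std::uint16_t canonicalIdIndex;
};

// Tiny illustrative tables (assumption); a build step would generate the real ones.
constexpr std::size_t kIdCount = 4;
constexpr const char* kSortedIds[kIdCount] = {
    "Africa/Abidjan", "Africa/Accra", "Etc/UTC", "UTC"};
constexpr const char* kLowerCaseIds[kIdCount] = {
    "africa/abidjan", "africa/accra", "etc/utc", "utc"};
constexpr TimeZoneIdRecord kIdMap[kIdCount] = {
    {0, 0},  // canonical ID
    {1, 1},  // canonical ID
    {2, 3},  // non-canonical ID: "Etc/UTC" maps to "UTC"
    {3, 3},  // canonical ID
};

constexpr int kNotFound = -1;

// Binary-searches the lower-cased table; returns the ID's index or kNotFound.
inline int FindIdIndex(std::string_view id) {
  std::string lowered(id);
  for (char& c : lowered) {
    if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
  }
  const std::string_view key(lowered);
  const auto* first = std::begin(kLowerCaseIds);
  const auto* last = std::end(kLowerCaseIds);
  const auto* it = std::lower_bound(
      first, last, key, [](const char* entry, std::string_view k) {
        return std::string_view(entry) < k;
      });
  if (it == last || std::string_view(*it) != key) return kNotFound;
  return static_cast<int>(it - first);
}

// Maps a valid ID index to its canonical ID string without storing any
// per-object strings; callers only need to keep the small integer index.
inline const char* CanonicalIdFor(int idIndex) {
  return kSortedIds[kIdMap[idIndex].canonicalIdIndex];
}
```

An object such as a time zone would then only need to hold the integer returned by FindIdIndex, recovering either spelling on demand from the shared tables.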

ptomato added a commit to ptomato/ecma402 that referenced this issue May 8, 2024
AvailableCalendars should return all possible aliases, so that other
places in the spec (e.g. in the future, validating a string calendar ID in
Temporal) can use them to determine whether a given input value is valid.
This input value can subsequently be canonicalized by another abstract
operation, CanonicalizeCalendar.

In Intl.supportedValuesOf(), on the other hand, we should not return all
possible aliases, so we filter them out using CanonicalizeCalendar before
returning the list of AvailableCalendars codes as an array to the caller.

See tc39/proposal-intl-enumeration#49. This is the
part of that PR that I consider relevant for the future integration of
Temporal. The time zone parts were already done as part of tc39#876. If
desired, I could implement the rest of that PR, adding
CanonicalizeCollation, CanonicalizeCurrency, CanonicalizeNumberingSystem,
and CanonicalizeUnit as well.

Closes: tc39#726
ptomato added a commit to ptomato/ecma402 that referenced this issue Jun 4, 2024
AvailableCalendars should return all possible aliases, so that other
places in the spec (e.g. in the future, validating a string calendar ID in
Temporal) can use them to determine whether a given input value is valid.
This input value can subsequently be canonicalized by another abstract
operation, CanonicalizeUValue, which we can use in several other places.

In Intl.supportedValuesOf(), on the other hand, we should not return all
possible aliases, so we filter them out using CanonicalizeUValue before
returning the list of AvailableCalendars codes as an array to the caller.

See tc39/proposal-intl-enumeration#49. This is the
part of that PR that I consider relevant for the future integration of
Temporal. The time zone parts were already done as part of tc39#876. If
desired, I could implement the rest of that PR, adding
CanonicalizeCollation, CanonicalizeCurrency, CanonicalizeNumberingSystem,
and CanonicalizeUnit as well.

Closes: tc39#726
gibson042 pushed a commit to ptomato/ecma402 that referenced this issue Jun 4, 2024
AvailableCalendars should return all possible aliases, so that other
places in the spec (e.g. in the future, validating a string calendar ID in
Temporal) can use them to determine whether a given input value is valid.
This input value can subsequently be canonicalized by another abstract
operation, CanonicalizeUValue, which we can use in several other places.

In Intl.supportedValuesOf(), on the other hand, we should not return all
possible aliases, so we filter them out using CanonicalizeUValue before
returning the list of AvailableCalendars codes as an array to the caller.

See tc39/proposal-intl-enumeration#49. This is the
part of that PR that I consider relevant for the future integration of
Temporal. The time zone parts were already done as part of tc39#876. If
desired, I could implement the rest of that PR, additionally supporting
collation, currency, numbering system, and unit canonicalization.

Closes: tc39#726
gibson042 pushed a commit that referenced this issue Jun 4, 2024
#889)

AvailableCalendars returns all supported values, including aliases (which will
be needed for calendar ID validation in Temporal).

Intl.supportedValuesOf("calendar"), however, continues to return only canonical
values, internally filtering out aliases by using a new CanonicalizeUValue
operation that also has applications elsewhere in the spec.

Closes: #726
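
The filtering described in this commit message can be sketched roughly as below. This is not the spec text: the calendar list and the alias table are small illustrative samples (assumptions), and CanonicalizeUValue here is a simplified stand-in that only knows the "ca" key.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Stand-in for AvailableCalendars: returns canonical values and aliases alike.
// The entries are illustrative samples, not a complete or authoritative list.
inline std::vector<std::string> AvailableCalendars() {
  return {"gregory", "islamic-civil", "islamicc", "ethiopic-amete-alem",
          "ethioaa"};
}

// Simplified stand-in for CanonicalizeUValue: maps an alias to its canonical
// value for the "ca" key and returns every other input unchanged.
inline std::string CanonicalizeUValue(const std::string& ukey,
                                      const std::string& uvalue) {
  static const std::map<std::string, std::string> calendarAliases = {
      {"islamicc", "islamic-civil"},
      {"ethiopic-amete-alem", "ethioaa"},
  };
  if (ukey != "ca") return uvalue;
  const auto it = calendarAliases.find(uvalue);
  return it == calendarAliases.end() ? uvalue : it->second;
}

// Sketch of the "calendar" branch of Intl.supportedValuesOf: keep only the
// values that are already canonical, then return them in sorted order.
inline std::vector<std::string> SupportedCalendarValues() {
  std::vector<std::string> result;
  for (const std::string& calendar : AvailableCalendars()) {
    if (CanonicalizeUValue("ca", calendar) == calendar) {
      result.push_back(calendar);
    }
  }
  std::sort(result.begin(), result.end());
  return result;
}
```

Validation code (for example, a future Temporal calendar ID check) could still consult the full AvailableCalendars list, aliases included, and canonicalize afterwards.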