Add members for localization #1101

Merged
merged 1 commit into from
Oct 8, 2024

Conversation

christianliebel
Copy link
Member

@christianliebel christianliebel commented Oct 9, 2023

Closes #1077, closes #1078, closes #1080, closes #1085, closes #1087, closes #1086, closes #1084, closes #1088, closes #676

This change (choose at least one, delete ones that don't apply):

  • Adds new normative requirements

Implementation commitment (delete if not making normative changes):

If change is normative, and it adds or changes a member:

Commit message:

Add members for localization

Person merging, please make sure that commits are squashed with one of the following as a commit message prefix:

  • chore:
  • editorial:
  • BREAKING CHANGE:
  • And use none if it's a normative change

@aarongustafson
Copy link
Collaborator

I'd prefer adding the "_localized" suffix across the board as it makes it explicit what the member's purpose is.

@marcoscaceres
Copy link
Member

marcoscaceres commented Nov 3, 2023

Yeah, I guess it does make sense to make it "_localized" as this can only be used for that.

@dmurph
Copy link
Collaborator

dmurph commented Nov 3, 2023

From editors meeting:

  • We also want to localize the icons, but because icons already ends in an s, that makes it a little weird.
  • Because of that, let's use *_localized for all of these fields.
  • icons_localized won't have any triplets; we'll just reuse the parsing algorithm for icons (same JSON structure). But, like the others, it will be a lang-string dictionary.
  • These can also apply within shortcuts items, so that those can be localized too.

TPAC discussion here

@marcoscaceres
Copy link
Member

Instead of defining new members, let's instead define *_localized member pattern that is either a "text localizable member" or an "image-resource localizable member" (i.e., icons or for shortcuts). That way we don't need to define the algorithms over and over again.

@marcoscaceres
Copy link
Member

We already have defined localizable members so we can already reuse that.

@marcoscaceres
Copy link
Member

marcoscaceres commented Nov 3, 2023

Here is what a shortcuts member might look like with some _localized members sprinkled in:

{
  "shortcuts": [
    {
      "name": "Play Later",
      "name_localized": {
        "fr": "Écouter plus tard"
      },
      "description": "View the list of podcasts you saved for later",
      "description_localized": {
        "fr": { "lang": "en", "dir": "ltr", "value": "English description because that's part of our brand." }
      },
      "url": "/play-later",
      "icons": [
        {
          "src": "/icons/play-later.svg",
          "type": "image/svg+xml"
        }
      ],
      "icons_localized": {
        "fr": [
          {
            "src": "/icons/fr/play-later.svg",
            "type": "image/svg+xml"
          }
        ]
      } 
    },
    {
      "name": "Subscriptions",
      "description": "View the list of podcasts you listen to",
      "description_localized": {
        "fr": "Consultez la liste des podcasts que vous écoutez."
      },
      "url": "/subscriptions?sort=desc"
    }
  ]
}

Note: updated to include a triple example.
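
A minimal sketch of how a consumer might normalize the entries of a text `*_localized` member like the ones in the example above, where a value is either a plain string or a `{ value, lang, dir }` object. The helper name `normalizeLocalizedText` and the choice to inherit the dictionary key as `lang` are assumptions made for illustration; this is not the algorithm from the specification.

```javascript
// Hypothetical helper (not from the spec): turn one entry of a text
// *_localized member into a uniform { value, lang, dir } object.
function normalizeLocalizedText(languageKey, entry) {
  if (typeof entry === "string") {
    // Plain string: assume the text is written in the language of its key.
    // dir is left undefined so the caller can apply its own default.
    return { value: entry, lang: languageKey, dir: undefined };
  }
  if (entry !== null && typeof entry === "object" && typeof entry.value === "string") {
    return {
      value: entry.value,
      // An explicit lang wins over the key, e.g. English text shown to French users.
      lang: typeof entry.lang === "string" ? entry.lang : languageKey,
      dir: typeof entry.dir === "string" ? entry.dir : undefined,
    };
  }
  return undefined; // Anything else is treated as invalid and ignored.
}

const sampleDescriptionLocalized = {
  fr: { lang: "en", dir: "ltr", value: "English description because that's part of our brand." },
};
const normalizedFrench = normalizeLocalizedText("fr", sampleDescriptionLocalized.fr);
console.log(normalizedFrench.lang, normalizedFrench.dir);
```

Normalizing early means the rest of the code only ever deals with one shape, regardless of which form the manifest author used.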

@marcoscaceres
Copy link
Member

Question is if we should allow localizing various URLs... that gets a bit messy in places, but could be doable.

@christianliebel
Copy link
Member Author

@marcoscaceres @dmurph Thanks. Let's also localize urls. I'll resume working on this soon.

@mgiuca
Copy link
Collaborator

mgiuca commented Nov 6, 2023

LTTP... thanks for working on this Christian!

Let's also localize urls. I'll resume working on this soon.

Wait, why are we localizing URLs? That seems undesirable to me. It means that shortcuts can go to different places depending on the language, and we need to update functionality that may be cached (that isn't just string/icon data). This may add non-trivial implementation complexity (it would require a deeper analysis to understand to what degree).

Would the use case for this be that you can have URLs with different ?lang= query parameters to match the user's language? I think I'd prefer just letting the website use the Accept-Language HTTP feature for this.
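
For comparison, a rough sketch of the server-side alternative mentioned above: picking a language from the `Accept-Language` request header. The function name `pickLanguageFromHeader` and its parameters are hypothetical, and the matching is deliberately simple (exact tag, then primary language subtag).

```javascript
// Hypothetical helper: choose one of the languages a site supports from an
// Accept-Language header value such as "fr-CH, fr;q=0.9, en;q=0.8".
function pickLanguageFromHeader(headerValue, supported, fallback) {
  const ranges = headerValue
    .split(",")
    .map((part) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam ? Number.parseFloat(qParam.slice(2)) : 1;
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q };
    })
    .filter((range) => range.tag !== "" && range.q > 0)
    .sort((a, b) => b.q - a.q); // Highest preference first.

  for (const { tag } of ranges) {
    if (tag === "*") return fallback;
    // Exact match first, then the primary language subtag ("fr-ch" -> "fr").
    const match =
      supported.find((s) => s.toLowerCase() === tag) ||
      supported.find((s) => s.toLowerCase() === tag.split("-")[0]);
    if (match) return match;
  }
  return fallback;
}

console.log(pickLanguageFromHeader("fr-CH, fr;q=0.9, en;q=0.8", ["en", "fr"], "en"));
```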

@christianliebel
Copy link
Member Author

Wait, why are we localizing URLs? That seems undesirable to me.

Maybe we can also loop in @aphillips to see whether it makes sense to localize URLs.

@aphillips
Copy link
Contributor

Icons, graphics, or remote content (help pages, for example) are sometimes varied by locale or by region (with locale serving as a poor proxy for region). This might be done, for example, because the icon contains some text or because a graphic shows a culturally-linked image (personal images, national costume, post box shapes, etc. etc.) that the user wishes to localize. Or it might be because functionality or defaults differ (sorting based on pronunciation instead of name for Chinese, for example)

We don't know why the user might want to localize the icon or shortcut (or whatever).

I agree that this can be abused and there might be reasons not to allow some fields to be localized, although I'd probably think about health warnings first?

@aarongustafson
Copy link
Collaborator

I think I'd prefer just letting the website use the Accept-Language HTTP feature for this.

Probably the most elegant, but it’s not within the realm of possibility for a lot of orgs and site types (thinking static sites, for example).

Would the use case for this be that you can have URLs with different ?lang= query parameters to match the user's language?

See also MDN style links where the language code is embedded in the URL path.

Icons, graphics, or remote content (help pages, for example) are sometimes varied by locale or by region (with locale serving as a poor proxy for region). This might be done, for example, because the icon contains some text or because a graphic shows a culturally-linked image (personal images, national costume, post box shapes, etc. etc.) that the user wishes to localize. Or it might be because functionality or defaults differ (sorting based on pronunciation instead of name for Chinese, for example)

We don't know why the user might want to localize the icon or shortcut (or whatever).

Agree on all of these.

@benfrancis
Copy link
Member

benfrancis commented Nov 14, 2023

Just one data point, but as a past precedent the similar Web of Things (WoT) Thing Description specification landed on titles and descriptions members of Thing (each of type MultiLanguage) in addition to title and description for this use case. This is the case for both Thing Description 1.0 (W3C Recommendation) and Thing Description 1.1 (W3C Proposed Recommendation).

I personally don't like that solution because I prefer using HTTP content negotiation with an Accept-Language header as per the suggestion in the current Web Application Manifest Working Draft, rather than creating an extremely verbose manifest with (theoretically) up to thousands of different languages. However, as has been pointed out it's not always possible to use content negotiation (e.g. on static site hosting like GitHub Pages). The Thing Description specification therefore offers both as alternative approaches.

If consistency between W3C specifications is considered important, then names, short_names and descriptions would make sense. That doesn't work for icons, but that member is already different because it's an array of ImageResources that the user agent can select from. Language could potentially just be another criterion for selecting an image.

I note that in HTML the <link> element has a hreflang attribute, so presumably <link rel="icon" href="/icons/fr/play-later.svg" hreflang="fr"> is valid (though likely currently ignored by user agents). For the (slightly unusual) case of localising app icons, an equivalent might be to add a lang member to ImageResource.

Example:

{
  "lang": "en",
  "dir": "ltr",
  "name": "Super Racer 3000",
  "names": {
    "fr-FR": "Super Coureur 3000",
    "es-ES": "Súper Corredor 3000"
  },
  "short_name": "Racer3K",
  "short_names": {
    "fr-FR": "Coureur3K",
    "es-ES": "Corredor3K"
  },
  "icons": [
    {
      "src": "icon.png",
      "sizes": "64x64",
      "type": "image/png"
    },
    {
      "src": "icon-fr.png",
      "sizes": "64x64",
      "type": "image/png",
      "lang": "fr-FR"
    },
    {
      "src": "icon-es.png",
      "sizes": "64x64",
      "type": "image/png",
      "lang": "es-ES"
    }
  ],
  "scope": "/",
  "id": "superracer",
  "start_url": "/start.html",
  "display": "fullscreen",
  "orientation": "landscape",
  "theme_color": "aliceblue",
  "background_color": "red"
}

Note that ImageResource also has a label member which could then also be localised this way, if a localised accessible description of the icon is needed!
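
A sketch of what "language as another selection criterion" could look like for the example above. The `lang` member on an image resource is only the proposal made in this comment, and the function name `selectIconsForLanguage` and its fallback order are assumptions for illustration.

```javascript
// Hypothetical: filter an icons array by a proposed per-icon "lang" member.
// Order of preference: exact tag, same primary language, then untagged icons.
function selectIconsForLanguage(icons, userLanguage) {
  const wanted = userLanguage.toLowerCase();
  const primary = wanted.split("-")[0];
  const tagged = icons.filter((icon) => typeof icon.lang === "string");

  const exact = tagged.filter((icon) => icon.lang.toLowerCase() === wanted);
  if (exact.length > 0) return exact;

  const sameLanguage = tagged.filter(
    (icon) => icon.lang.toLowerCase().split("-")[0] === primary
  );
  if (sameLanguage.length > 0) return sameLanguage;

  // No localized icon: fall back to the icons without a lang member.
  return icons.filter((icon) => typeof icon.lang !== "string");
}

const racerIcons = [
  { src: "icon.png", sizes: "64x64", type: "image/png" },
  { src: "icon-fr.png", sizes: "64x64", type: "image/png", lang: "fr-FR" },
  { src: "icon-es.png", sizes: "64x64", type: "image/png", lang: "es-ES" },
];
console.log(selectIconsForLanguage(racerIcons, "fr-FR").map((icon) => icon.src));
```

The usual size and type selection would then run on whatever subset this step returns.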

One question: How does dir interact with the localised members? Can it safely be derived from language? The Thing Description specification has a lot to say on that topic, which I can't say I fully understand, but they use the Strings on the Web: Language and Direction Metadata W3C Note for guidance.

Hope this helps.

@mgiuca
Copy link
Collaborator

mgiuca commented Nov 16, 2023

Perhaps we should clarify what "localize URLs" means.

Are we talking about:

  • Localizing all fields that are URLs? (e.g. start_url, scope, potential future ones like the home scope of tabbed apps).
  • Localizing all fields called "url", like the ones in icons and shortcuts?
  • Something else?

I think we generally agree that we need to localize icons, but we're doing that at the icons level, not at the level of the url within each icon: that is, icons_localized holds a localized version of each icon dict, rather than icons holding icon dicts that include a url_localized. Allowing both would create two ways to do the same thing, which isn't ideal.

Maybe it makes sense for URLs like shortcuts to be able to change based on language, but really the point of this initiative (I thought) was to be able to localize your app's metadata that gets displayed at the OS level, like name and icon, not to solve the problem of localizing all the content in the app. (If you can't configure your server to serve content based on headers, you can still make your service worker return localized content based on Accept-Language.) An app that relies purely on the manifest URLs to display content in the user's language is likely to be quite brittle.

I think we could run into major headaches if we allow scope to be localized. And by extension, start_url. (e.g. scope must be a superset of start_url - what happens if that's true in some languages but not others?)
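
To make the concern concrete, here is a sketch that checks, per language, whether the start URL still falls inside the scope. Note that `scope_localized` and `start_url_localized` are purely hypothetical members used to illustrate the problem; they are not part of this PR. The "within scope" test is a simplified version (same origin, path prefix match), and the function names are made up for this example.

```javascript
// Simplified "within scope" check: same origin and the path starts with the scope path.
function isWithinScope(targetUrl, scopeUrl) {
  return (
    targetUrl.origin === scopeUrl.origin &&
    targetUrl.pathname.startsWith(scopeUrl.pathname)
  );
}

// Hypothetical members: scope_localized and start_url_localized do not exist in the spec.
function findScopeMismatches(manifestUrl, manifest) {
  const languages = new Set([
    ...Object.keys(manifest.scope_localized ?? {}),
    ...Object.keys(manifest.start_url_localized ?? {}),
  ]);
  const mismatches = [];
  for (const lang of languages) {
    // Fall back to the unlocalized member when a language has no override.
    const scope = new URL(manifest.scope_localized?.[lang] ?? manifest.scope, manifestUrl);
    const startUrl = new URL(
      manifest.start_url_localized?.[lang] ?? manifest.start_url,
      manifestUrl
    );
    if (!isWithinScope(startUrl, scope)) mismatches.push(lang);
  }
  return mismatches;
}

const hypotheticalManifest = {
  scope: "/app/",
  start_url: "/app/start.html",
  scope_localized: { fr: "/fr/app/" },
  start_url_localized: { fr: "/fr/app/start.html", de: "/de/start.html" },
};
console.log(findScopeMismatches("https://example.com/manifest.json", hypotheticalManifest));
```

In this made-up manifest the German start URL falls outside the (unlocalized) scope, which is exactly the kind of per-language inconsistency described above.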

I prefer if we just start with name, short_name, description and icons and go from there as needed.

How does dir interact with the localised members? Can it safely be derived from language?

This was discussed at TPAC (search for "dir") - unfortunately when I asked this question, the answer was not recorded ("?"). From memory, @aphillips pointed out that we may not know the language's direction because languages are not set in stone. I think the overwhelmingly common case will be a known language, which means we should be able to derive dir from lang in all practical cases (and probably default to ltr if we don't recognize the language - the overwhelming majority of languages are LTR). We should have the dir member for being explicit, but I think in 99.9% of cases the site should not need to specify dir as it can be derived from language.

@marcoscaceres
Copy link
Member

Supportive of what @mgiuca said above... let's start small and go from there (and definitely let's not have multiple ways of doing the same thing, especially with URLs). And yes, let's keep the localizable members restricted to a small set (including image objects, not members within those objects).

@aphillips
Copy link
Contributor

@mgiuca noted:

From memory, @aphillips pointed out that we may not know the language's direction because languages are not set in stone. I think the overwhelmingly common case will be a known language, which means we should be able to derive dir from lang in all practical cases (and probably default to ltr if we don't recognize the language - the overwhelming majority of languages are LTR). We should have the dir member for being explicit, but I think in 99.9% of cases the site should not need to specify dir as it can be derived from language.

Your specification should not derive direction from language unless there is no other alternative. This is not because languages are mutable.

You may permit an item that lacks separate direction metadata to attempt to use the language to estimate the direction or to act as a hint, but this should not be the default way of doing it. In fact, I18N recommends making the direction auto when the dir is not present at the item or document-default level instead of using directional estimation based on language. We explicitly recommend using auto instead of ltr as the default, since an unlabeled string that starts with a strongly RTL character is probably trying to tell you something 😉.

We have extensive guidance in https://www.w3.org/TR/string-meta/ and we're working on an update to our guidance about manifests here (I hope to land this PR on Thursday) which you may find useful here.
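
A small sketch of the recommendation above: use the item's own `dir`, then the document-level default, and otherwise `auto` rather than `ltr`. The function names are hypothetical. `estimateFirstStrong` is only a rough approximation of first-strong detection based on a handful of right-to-left scripts; it is not the Unicode Bidirectional Algorithm and is included just to show why an unlabeled string can still signal its direction.

```javascript
const KNOWN_DIRECTIONS = ["ltr", "rtl", "auto"];

// Item-level dir wins, then the document-level default, then "auto".
function resolveDirection(itemDir, documentDir) {
  if (KNOWN_DIRECTIONS.includes(itemDir)) return itemDir;
  if (KNOWN_DIRECTIONS.includes(documentDir)) return documentDir;
  return "auto";
}

// Approximation only: a short, incomplete list of right-to-left scripts.
const RTL_SCRIPT_LETTER = /[\p{Script=Arabic}\p{Script=Hebrew}\p{Script=Syriac}\p{Script=Thaana}]/u;

// Look at the first letter in the string and guess a direction from its script.
function estimateFirstStrong(text) {
  for (const ch of text) {
    if (!/\p{L}/u.test(ch)) continue; // Skip digits, spaces, punctuation.
    return RTL_SCRIPT_LETTER.test(ch) ? "rtl" : "ltr";
  }
  return "ltr"; // No letters at all.
}

console.log(resolveDirection(undefined, undefined), estimateFirstStrong("لوکسیتان"));
```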

@dmurph dmurph mentioned this pull request May 2, 2024
@dmurph
Copy link
Collaborator

dmurph commented May 2, 2024

Manifest Working session notes:

This seems to be the concluded format (copied from Marcos's comment above) with triple examples:

{
  ...
  "dir": "ltr",
  "lang": "en-x-marcos",
  ...
  "shortcuts": [
    {
      "name": "Play Later",
      "name_localized": {
        "fr": "Écouter plus tard"
      },
      "description": "View the list of podcasts you saved for later",
      "description_localized": {
         "en":  { "value": "My App, hey!", "dir": "ltr", "lang": "fr"},
         "en-GB": { "value": "My App, eh wut?", "dir": "ltr"},
         "fr": "string",
         "ar": { "value": "...", "dir": "rtl" }
      },
      "url": "/play-later",
      "icons": [
        {
          "src": "/icons/play-later.svg",
          "type": "image/svg+xml"
        }
      ],
      "icons_localized": {
        "fr": [
          {
            "src": "/icons/fr/play-later.svg",
            "type": "image/svg+xml"
          }
        ]
      } 
    },
    {
      "name": "Subscriptions",
      "description": "View the list of podcasts you listen to",
      "description_localized": {
        "fr": "Consultez la liste des podcasts que vous écoutez."
      },
      "url": "/subscriptions?sort=desc"
    }
  ],
   ...
}

TPAC notes from this are here
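
A sketch of how a user agent might pick an entry from a `*_localized` dictionary like the one above. It assumes a lookup-style fallback in the spirit of RFC 4647 (try the full tag, then progressively shorter prefixes); whether the specification matches languages this way is an assumption, and `lookupLocalized` is a hypothetical name.

```javascript
// Hypothetical lookup: "en-GB" tries "en-gb", then "en"; returns undefined if
// nothing matches so the caller can fall back to the unlocalized member.
function lookupLocalized(localized, userLanguage) {
  // Language tags are case-insensitive, so index the keys in lower case.
  const byLowerKey = new Map(
    Object.entries(localized ?? {}).map(([key, value]) => [key.toLowerCase(), value])
  );
  const subtags = userLanguage.toLowerCase().split("-");
  while (subtags.length > 0) {
    const candidate = subtags.join("-");
    if (byLowerKey.has(candidate)) return byLowerKey.get(candidate);
    subtags.pop();
    // Also drop a trailing single-character subtag such as "x".
    if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) subtags.pop();
  }
  return undefined;
}

const workingSessionDescriptions = {
  en: { value: "My App, hey!", dir: "ltr", lang: "fr" },
  "en-GB": { value: "My App, eh wut?", dir: "ltr" },
  fr: "string",
  ar: { value: "...", dir: "rtl" },
};
console.log(lookupLocalized(workingSessionDescriptions, "en-GB"));
console.log(lookupLocalized(workingSessionDescriptions, "de")); // no match
```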

@tomayac
Copy link
Contributor

tomayac commented May 3, 2024

I suppose the stray "lang": "fr" in your comment before isn't intended:

"en":  { "value": "My App, hey!", "dir": "ltr", "lang": "fr"},

I could edit your comment directly, but wanted to make sure it's indeed a copy/paste first.

@dmurph
Copy link
Collaborator

dmurph commented Jun 6, 2024

I suppose the stray "lang": "fr" in your comment before isn't intended:

"en":  { "value": "My App, hey!", "dir": "ltr", "lang": "fr"},

I could edit your comment directly, but wanted to make sure it's indeed a copy/paste first.

This actually is intended - the example might not be great, but we need the ability for a string to be displayed in a different language A when showing for language B. For example, if a product name or logo etc was a character in a different language, the dev can make sure it renders in the desired language for the user's chosen display language.

@marcoscaceres
Copy link
Member

@christianliebel can you check #676 ... and update the description of this issue as closing that bug, as well as any other bugs this will close?

christianliebel added a commit to christianliebel/manifest-incubations that referenced this pull request Jun 8, 2024
@calidion
Copy link

calidion commented Jun 25, 2024

hello, everyone

I would suggest that localization be a native feature for all strings.

Hence we don't need extra fields for localization.

just to enhance the parser to parse new strings.

My suggestion is that strings can be divided into two types:

  1. the normal primitive strings defined by different charsets
 "Hello world!"
or
 "你好世界!"
  2. the enhanced strings, which include i18n features and are in JSON format, like this:
"lang": "en",   // fallback language if the browser finds no matching locale string
"name":  {
  "en":  "Web App",    // fallback for all en-* browsers or for general en-* users
  "en-US":  "Web App",   // specific string for localized en
  "zh": "网站应用", // fallback for all zh-* browsers or for general zh-* users
  "zh-CN": "网站应用", // specific string for localized zh
  ...
}

@@ -2298,7 +2556,9 @@ <h3>
</h3>
<p>
The <dfn>application's name</dfn> is derived from either the
[=manifest/name=] member or [=manifest/short_name=] member.
[=manifest/name=] member or [=manifest/short_name=] member. The user
Copy link
Member

This needs more clarification as to which one wins.

Copy link
Collaborator

In Chromium we use them depending on UX needs. It's not implemented, but I would like to follow rules similar to those for extensions: short_name is truncated to 12 characters and name is truncated to 75.

This sets expectations for devs, and allows the user agent to show an app name where there might not be much space.

Copy link
Member Author

Could we discuss that in a separate issue? It used to be like this and is not directly related to l10n.

Copy link
Member

How they are treated by the OS is implementation specific. I don't think we need to say anything here.

Copy link
Collaborator

IDK if it's true that these are only used by the OS - the user agent uses these strings too. Happy for it to be a non-normative note - truncating characters is actually fraught with weirdness due to multi-character sequences now etc, so it would be helpful to document here limits to prevent those breakages.

User agents can't just allow any string here to be displayed in full, as it'll cause weird UX issues with dialogs / menu items / etc. This was the purpose of, say, short_name.

But - I certainly don't want to block this. And perhaps you are right here @christianliebel that this can be addressed in a separate issue / pull request.

Copy link
Contributor

I'll second @dmurph's truncating characters is actually fraught with weirdness.

Example. 🏴󠁧󠁢󠁳󠁣󠁴󠁿

The Scottish flag emoji is 7 code points. The JavaScript string length is 14. It's one grapheme (user perceived character, i.e. screen position). If the short_name limit is short_name.length < 13, the flag won't fit...

As a reminder, I18N's guidelines about text truncation apply to length limits. One has to be careful about what one is counting (bytes, code points, graphemes) to ensure that some languages are not disadvantaged because of how their writing system works (emoji makes for a great demo, but there are languages that use combining marks similar to the way emoji sequences work that have the same counting issue)
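
The counts quoted above can be reproduced with a short sketch. It uses `Intl.Segmenter`, which is available in current browsers and recent Node.js versions; the helper name `truncateByGraphemes` and the idea of limiting by graphemes rather than code units are illustrative assumptions, not a proposal for the spec.

```javascript
// Scotland flag: U+1F3F4 followed by tag characters "gbsct" and a cancel tag.
const scotlandFlag = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0073}\u{E0063}\u{E0074}\u{E007F}";

const codeUnits = scotlandFlag.length; // UTF-16 code units, what String.length counts
const codePoints = [...scotlandFlag].length; // Unicode code points
const graphemeSegmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = [...graphemeSegmenter.segment(scotlandFlag)].length; // user-perceived characters

console.log({ codeUnits, codePoints, graphemes });

// Hypothetical helper: keep at most maxGraphemes user-perceived characters,
// never cutting through the middle of a sequence.
function truncateByGraphemes(text, maxGraphemes) {
  const segments = [...graphemeSegmenter.segment(text)];
  return segments
    .slice(0, maxGraphemes)
    .map((s) => s.segment)
    .join("");
}

console.log(truncateByGraphemes("Racer" + scotlandFlag + "3K", 6));
```

A limit expressed in code units would either reject this flag or split it; counting graphemes keeps the sequence intact.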

Copy link
Member Author

Created #1145 for further discussion.

@calidion
Copy link

I don't think putting all languages into one file is a good idea.
And I have never seen compatibility work on the web.
Almost no newly created web pages can be loaded in old browsers.
Backward compatibility is meaningless in most cases.

@calidion
Copy link

Appending _localized is definitely not a good choice for localizing a web application or a website.
There should be a general localization schema for both web applications and websites that makes it easy to introduce multi-language support
and reduces the burden currently carried by most backend web servers and frontend libraries/frameworks.

@christianliebel
Copy link
Member Author

There should be a general localization schema for both a web application and/or a web site that can easily introduce multi-language support.

@calidion While a general localization schema for web applications and websites may be beneficial, this PR is solely focused on adding localization capabilities specifically to the Web Application Manifest rather than solving broader multi-language support for web applications or websites. I think the WICG Proposals repo would be the right spot to propose and discuss a more general solution.

@calidion
Copy link

@christianliebel
Thanks for the information.

Still, I hope this feature can be put on hold for a while until broad agreement is reached.

Copy link
Collaborator

@dmurph dmurph left a comment

Still LGTM, one suggestion to change the example.

</p>
<aside class="example" title="Localizing the application name">
<pre class="json">
{
Copy link
Collaborator

To make it obvious why one might want a different 'lang' for an entry here, perhaps we should use a company that would have opinions about how its name is pronounced in other languages.

L'Occitane is a good example: on Wikipedia you can see that it sometimes uses the French version of its name, while at other times the name is translated. For cases like English and German, screen readers should read the name with a French pronunciation.

Suggested change
{
{
"lang": "fr",
"dir": "ltr",
"name": "L'Occitane",
"name_localized": {
"en": { "value": "L'Occitane", "lang": "fr" },
"de": { "value": "L'Occitane", "lang": "fr" },
"zh": "歐舒丹",
"en-GB": { "value": "L'Occitane en Provence", "lang": "fr" },
"fr": "L'Occitane",
"ar": { "value": "لوکسیتان", "dir": "rtl" }
}
}

Copy link
Member Author

If it's okay to use real-world brands/examples, that's fine with me. I also have the "Just Eat" example here, where the brand even differs between de-DE and de-CH: #1101 (comment)

@marcoscaceres WDYT?

Copy link
Member

Sounds good, but don't use a real company name... they might not approve.

Copy link
Member

The example below seems fine to me... @dmurph?

Copy link
Member

@marcoscaceres marcoscaceres left a comment

This looks ok to me.

Add trimming of any text strings.

@aphillips
Copy link
Contributor

Why did you add trimming of localized string values? Leading or trailing whitespace is sometimes significant in localization. Often it is a signal of an I18N bug, but still... users sometimes count on the whitespace being present.

@christianliebel
Copy link
Member Author

@aphillips The browser engines trim the values for the default representation fields (example name: Chromium, Gecko, WebKit), so we wanted to treat the localized values the same way. If you are still at TPAC today, we can also discuss this in person. The WebApps WG meets in Huntington (4 Concourse Level) today.

@aphillips
Copy link
Contributor

No, that answers the question. I'm "at" TPAC, but have a cold, so I'm staying away from giving it to you all. Happy to jump on your Zoom connection if there is more to discuss, though.

Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
@christianliebel christianliebel merged commit 75ae2b4 into main Oct 8, 2024
2 checks passed
@christianliebel christianliebel deleted the l10n branch October 8, 2024 11:58
github-actions bot added a commit that referenced this pull request Oct 8, 2024
SHA: 75ae2b4
Reason: push, by christianliebel

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@mgiuca
Copy link
Collaborator

mgiuca commented Oct 8, 2024

Amazing work, Christian! This has been an extremely long standing and complex issue, it's great that you worked your way through it.

@aarongustafson
Copy link
Collaborator

Yes, congrats. Amazing work!

@dmurph
Copy link
Collaborator

dmurph commented Oct 11, 2024

This is really awesome @christianliebel! Thanks for your work here and being so persistent!

@jcayzac
Copy link
Member

jcayzac commented Oct 26, 2024

Thanks @christianliebel!
