No mechanism to indicate what the "default language" of a description is [I18N] #635

aphillips · 2019-05-07T16:20:26Z

Section 5.2.1 "Thing"
https://cdn.staticaly.com/gh/w3c/wot-thing-description/TD-TAG-review/index.html?env=dev#thing

Provides additional (human-readable) information based on a default language.

The optional field description is described as above, but there appears to be no mechanism defined for declaring what language the "default language" is. It is possible that the JSON-LD @context mechanism could be used to supply an @language for a description. If that is the preferred or intended mechanism, it should be called out. Otherwise there should be a mechanism, possibly at the document level, for declaring the default language using a BCP47 language tag.


mkovatsc · 2019-05-07T16:38:30Z

Thank you for your review! We have been working hard recently and updated the spec already to document the @language mechanism accordingly.

We also added text on the possibility to use content negotiation such as the Accept-Language header field of HTTP.

I will cite the assertions in this Issue.

mkovatsc · 2019-05-07T23:43:40Z

We need to rewrite some of the text after the table in 5.3.1.1 Thing. I started sketching the new text, statement by statement:

The @context name-value pair MUST contain the string https://www.w3.org/2019/td/v1 either directly when of type string or as first element when of type Array.

When @context is an Array, the string https://www.w3.org/2019/td/v1 MAY be followed by elements of type anyURI or Map in any order.

Maps contained in an @context Array MAY have name-value pairs,
where the value is a namespace IRI of type anyURI and the name a Term or prefix defined for that namespace,
while it is RECOMMENDED to include only one Map in the Array that holds all defined name-value pairs.

One Map contained in an @context Array SHOULD contain a name-value pair,
where the name is @language and the value a well-formed language tag as defined by [[!BCP47]],
which defines the default language for the Thing Description instance.

The default language is used to compute the base direction for all human-readable values except for MultiLanguage Maps:

...continue with the bullet point list
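
To illustrate the statements sketched above, here is a minimal Python sketch of how a consumer might read the default language from @context. It is not taken from the spec or from any TD library: the function name `default_language`, the `ex` prefix with its example.com namespace, and the sample description text are all assumptions made for illustration.

```python
from typing import Optional

TD_CONTEXT = "https://www.w3.org/2019/td/v1"


def default_language(td: dict) -> Optional[str]:
    """Return the @language declared in the TD's @context, or None.

    Hypothetical helper: checks that the TD context URI is present either
    as the plain string value or as the first Array element, then looks
    for an @language entry in any Map that follows it.
    """
    ctx = td.get("@context")
    if isinstance(ctx, str):
        if ctx != TD_CONTEXT:
            raise ValueError("@context must be the TD context URI")
        return None  # a plain string cannot carry @language
    if not isinstance(ctx, list) or not ctx or ctx[0] != TD_CONTEXT:
        raise ValueError("TD context URI must be the first Array element")
    for entry in ctx[1:]:
        if isinstance(entry, dict) and "@language" in entry:
            return entry["@language"]
    return None


# Illustrative TD fragment; the "ex" prefix and its namespace are made up.
example_td = {
    "@context": [
        TD_CONTEXT,
        {"ex": "https://example.com/ns#", "@language": "de-AT"},
    ],
    "description": "Eine Lampe",
}

print(default_language(example_td))
```

The sketch does not check that the tag is well-formed per BCP47; a real processor would need to add that step.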

aphillips · 2019-05-08T01:10:22Z

The default language is used to compute the base direction for all human-readable values except for MultiLanguage Maps:

While this reflects the current state of affairs in JSON standards-based document formats, it's not a particularly desirable recommendation and the I18N WG is actively working to find a "better path". In addition, I'd point out that "compute the base direction" needs a definition. Therefore I'd suggest that you reference our document String-Meta, particularly the best practice documented at #script_subtag, which describes how one would do this. In keeping with the allowed-but-not-loveable nature, I'd suggest:

The default language MAY be used to compute the base direction [[String-Meta]] for human-readable text values not otherwise associated with a language tag (such as MultiLanguage Maps).

mkovatsc · 2019-05-08T03:25:05Z

See #643 (comment) which is how it continues (indicated by "...continue with the bullet point list")

mmccool · 2019-05-08T06:01:00Z

One issue is that we expect the JSON-LD 1.1 WG to add some means to specify text direction explicitly in an @context, perhaps with the addition of an @dir tag. That would be great, but unfortunately it does not exist yet and the JSON-LD 1.1 draft spec explicitly calls out that it does not currently provide a means to explicitly specify text direction. However, we don't want to define our own way of doing it since then we would be in potential conflict with whatever the JSON-LD 1.1 group decides, and we want TDs to work with generic JSON-LD processors. So, one reason we decided on the "infer from a language tag with a script subtag as necessary" approach is that if there is a way to specify text direction explicitly as metadata in the final JSON-LD 1.1 standard, we can allow it (and give it priority if that metadata is present) and update our spec to just use the current infer-from-the-language-and-script-tags as a fallback plan.

mkovatsc · 2019-05-08T15:29:37Z

@aphillips , please have a look at the new definitions in 5.3.1.1 Thing (after the table) and 5.3.1.7 MultiLanguage.

aphillips · 2019-05-08T15:31:21Z

@mkovatsc My bad for not looking at the bulleted list.

@mmccool I fully agree with your comment and appreciate the care the WG applied here.

aphillips · 2019-05-08T15:43:08Z

In 5.3.1.1 I see:

where the name is the Term @language and the value a well-formed language tag as defined by [BCP47], potentially including a script subtag (e.g., en, de, ja, zh-Hans, zh-Hant, az-Arab).

The call out about script subtags seems overly specific. Do you really need to call that out? It's on your mind now because of the thread about direction, but I think it's a distraction. I would also show some examples with region subtags and maybe even a variant. Perhaps:

where the name is the Term @language and the value a well-formed language tag as defined by [BCP47] (e.g., en, de-AT, gsw-CH, zh-Hans, zh-Hant-HK, sl-nedis).

aphillips · 2019-05-08T15:59:05Z

On the direction-computing text in 5.3.1.1, I have these comments:

approach is misspelled.
The following quoted text forces a default of LTR. I would probably encourage using "first strong" detection instead:

If no language tag is given, the base direction MUST be assumed to be LTR (left-to-right). This implies that if the language used in human-readable text uses a script that is written RTL (right-to-left), the default language needs to be specified explicitly, so that an appropriate base direction can be inferred.

I would reduce the MUST to SHOULD. I would also allow CLDR's "likely subtag" algorithm to be used. This is especially helpful for Chinese cases where some systems use region subtags to imply the script (e.g. zh-CN => zh-Hans-CN etc.). Note that Azerbaijani is (if rarely these days) also written in Cyrillic (az-Cyrl).

In cases where a language can be written in more than one script with different base directions, the corresponding language tag given in @language or MultiLanguage Maps MUST include a script subtag, so that an appropriate base direction can be inferred. An example is Azeri, which is written LTR when Latin script is used (specified using az-Latn) and RTL when Arabic script is used (specified using az-Arab).
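
As a rough sketch of what "infer the base direction from the language tag" could look like, the Python below first looks for an explicit script subtag and otherwise falls back to a likely script for the language. Both lookup tables are tiny illustrative stand-ins that I made up for this example: a real implementation would take the script data from a script registry and the fallback from CLDR's likely-subtags data. The function names are also assumptions.

```python
from typing import Optional

# Illustrative subset of scripts that are written right-to-left.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo", "Adlm"}

# Tiny stand-in for CLDR "likely subtags": language -> most likely script.
LIKELY_SCRIPT = {
    "ar": "Arab", "fa": "Arab", "ur": "Arab", "he": "Hebr",
    "en": "Latn", "de": "Latn", "az": "Latn", "zh": "Hans",
}


def script_subtag(tag: str) -> Optional[str]:
    """Return the script subtag (four letters) of a language tag, if any."""
    for sub in tag.split("-")[1:]:
        if len(sub) == 1:
            break  # singleton: an extension or private-use section starts
        if len(sub) == 4 and sub.isalpha():
            return sub.title()  # normalize case, e.g. "arab" -> "Arab"
    return None


def base_direction(tag: Optional[str]) -> Optional[str]:
    """Return "rtl" or "ltr", or None when nothing can be inferred."""
    if not tag:
        return None
    language = tag.split("-")[0].lower()
    script = script_subtag(tag) or LIKELY_SCRIPT.get(language)
    if script is None:
        return None  # caller may fall back to first-strong detection
    return "rtl" if script in RTL_SCRIPTS else "ltr"


for t in ("az-Latn", "az-Arab", "zh-Hant-HK", "ar"):
    print(t, base_direction(t))
```

Returning None rather than a hard-coded LTR leaves room for the first-strong fallback discussed in this thread.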

I think the following recommendations are counter productive. I like that you point out the problem, but the types of strings used here might very naturally include brand names, trademarks, version numbers, etc.

TD Processors should also be aware of certain special cases that can arise in processing bidirectional text. In particular, producers of TDs should avoid numbers with embedded spaces in bidirectional text. Strings starting with embedded text using a script with a writing direction opposite to that of the base direction (for example, English words embedded in Arabic text) or with multidigit numbers should be avoided if possible.

I would instead provide guidance to producers and consumers, perhaps as follows:

TD Processors should be aware of certain special cases when processing bidirectional text. They should take care to use bidi isolation when presenting strings to users, particularly when embedding in surrounding text. Mixed direction text can occur in any language, even when the language is properly identified.

TD producers should attempt to provide mixed direction strings in a way that can be displayed successfully by a naive user agent. For example, if an RTL string begins with an LTR run (such as a number or a brand or trade name in Latin script), including an RLM character at the start of the string or wrapping opposite direction runs in bidi controls can assist in proper display.
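
To make the producer and consumer guidance concrete, here is a simplified Python sketch of first-strong detection, of prefixing an RLM, and of bidi isolation. It uses only the standard `unicodedata` module. The helper names are assumptions, and the detection is a simplification: it ignores the Unicode Bidirectional Algorithm's rule about skipping characters inside isolates.

```python
import unicodedata
from typing import Optional

RLM = "\u200f"  # RIGHT-TO-LEFT MARK
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def first_strong(text: str) -> Optional[str]:
    """Guess the base direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None  # no strong character, e.g. digits only


def isolate(text: str) -> str:
    """Wrap text in FSI ... PDI before embedding it in surrounding text."""
    return f"{FSI}{text}{PDI}"


# An Arabic string that begins with a Latin-script name.
mixed = "WoT \u0645\u0631\u062d\u0628\u0627"

print(first_strong(mixed))        # the leading Latin run wins
print(first_strong(RLM + mixed))  # the RLM is the first strong character
```

The two calls show why a producer might add the RLM: without it, a naive first-strong consumer would treat the string as left-to-right.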

@r12a any comments?

aphillips · 2019-05-08T16:05:48Z

On 5.3.1.7:

Same comment about script subtags as I mentioned above.
Should there be a requirement that language tags not be repeated? (e.g. you can't have two strings with the tag en-GB)
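
If such a requirement were added, a consumer could check it while parsing. The Python sketch below is only an assumption about how that might be done: it uses the standard `json` module's `object_pairs_hook` to see every name in the object before duplicates are silently dropped, and it compares tags case-insensitively because BCP47 tags are case-insensitive. The function name and the sample strings are made up.

```python
import json


def parse_multilanguage(raw: str) -> dict:
    """Parse a flat MultiLanguage Map, rejecting repeated language tags."""

    def reject_duplicates(pairs):
        seen = set()
        for tag, _text in pairs:
            key = tag.lower()  # "en-GB" and "en-gb" are the same tag
            if key in seen:
                raise ValueError(f"duplicate language tag: {tag}")
            seen.add(key)
        return dict(pairs)

    return json.loads(raw, object_pairs_hook=reject_duplicates)


print(parse_multilanguage('{"en-GB": "Colour", "de": "Farbe"}'))
```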

mkovatsc · 2019-05-08T17:17:55Z

Pushing an update with your proposed changes in 5 min.

mmccool · 2019-05-09T14:40:24Z

We should at least add an assertion that a script subtag should not be included when it is not necessary.
Processors SHOULD use first-strong detection in the absence of other information.
Coordinate this with a MAY: first-strong detection may be used even when language information is available, or when that information cannot be used (e.g., on a constrained device).
Remove "avoid strings starting with ..."; make sure String-Meta is referenced as a guide.

Look again at this suggestion:
I would instead provide guidance to producers and consumers, perhaps as follows:

TD Processors should be aware of certain special cases when processing bidirectional text. They should take care to use bidi isolation when presenting strings to users, particularly when embedding in surrounding text. Mixed direction text can occur in any language, even when the language is properly identified.

TD producers should attempt to provide mixed direction strings in a way that can be displayed successfully by a naive user agent. For example, if an RTL string begins with an LTR run (such as a number or a brand or trade name in Latin script), including an RLM character at the start of the string or wrapping opposite direction runs in bidi controls can assist in proper display.

mkovatsc · 2019-05-09T15:13:17Z

Remove "avoid strings starting with ..."; make sure String-Meta is referenced as a guide

Note that I removed the critical text on avoiding some text constructs already. We have this NOTE, which should be replaced by the processor/producer paragraphs, I guess:

Great care has to be given when assigning bidirectional text to the human-readable metadata of Thing Descriptions. Producers of such texts are advised to include bidi controls as appropriate to try to ensure proper display. Consumers of such texts are advised to apply bidi isolation when including human-readable metadata of TDs in other text (e.g., for Web user interface). Strings on the Web: Language and Direction Metadata [string-meta] provides some guidance and illustrates a number of pitfalls when using bidirectional text.

aphillips · 2019-09-26T17:11:32Z

I've reviewed all of the edits pertaining to the original issue here (defining the default language) and the various other suggestions on this thread. I'm satisfied with the results.

sebastiankb · 2019-09-27T15:29:20Z

Thank you for your feedback. I will close this issue.

aphillips mentioned this issue May 7, 2019

No mechanism to indicate what the "default language" of a description is w3c/i18n-activity#677

Closed

aphillips changed the title ~~No mechanism to indicate what the "default language" of a description is~~ No mechanism to indicate what the "default language" of a description is [I18N] May 7, 2019

mkovatsc added the PR needed label May 7, 2019

mmccool added the i18n-needs-resolution Issue the Internationalization Group has raised and looks for a response on. label May 8, 2019

mkovatsc added a commit that referenced this issue May 8, 2019

Improve assertions for #635

906b922

mkovatsc added Needs review Issue was fixed, but is still open for post-merge reviews and removed PR needed labels May 8, 2019

mmccool added the by CR transition label May 9, 2019

mmccool removed the by CR transition label May 9, 2019

sebastiankb closed this as completed Sep 27, 2019
