diff --git a/.github/workflows/auto-publish.yml b/.github/workflows/auto-publish.yml index 02c13e7..8062232 100644 --- a/.github/workflows/auto-publish.yml +++ b/.github/workflows/auto-publish.yml @@ -6,7 +6,7 @@ on: jobs: main: name: Build, Validate and Deploy - runs-on: ubuntu-22.04 + runs-on: ubuntu-latest permissions: contents: write steps: diff --git a/README.md b/README.md index e028bf4..4b4b54b 100644 --- a/README.md +++ b/README.md @@ -1,177 +1,350 @@ -# Explainer for the TODO API - -**Instructions for the explainer author: Search for "todo" in this repository and update all the -instances as appropriate. For the instances in `index.bs`, update the repository name, but you can -leave the rest until you start the specification. Then delete the TODOs and this block of text.** - -This proposal is an early design sketch by [TODO: team] to describe the problem below and solicit -feedback on the proposed solution. It has not been approved to ship in Chrome. - -TODO: Fill in the whole explainer template below using https://tag.w3.org/explainers/ as a -reference. Look for [brackets]. - -## Proponents - -- [Proponent team 1] -- [Proponent team 2] -- [etc.] 
- -## Participate -- https://github.com/explainers-by-googlers/[your-repository-name]/issues -- [Discussion forum] - -## Table of Contents [if the explainer is longer than one printed page] - - - - - -- [Introduction](#introduction) -- [Goals](#goals) -- [Non-goals](#non-goals) -- [User research](#user-research) -- [Use cases](#use-cases) - - [Use case 1](#use-case-1) - - [Use case 2](#use-case-2) -- [[Potential Solution]](#potential-solution) - - [How this solution would solve the use cases](#how-this-solution-would-solve-the-use-cases) - - [Use case 1](#use-case-1-1) - - [Use case 2](#use-case-2-1) -- [Detailed design discussion](#detailed-design-discussion) - - [[Tricky design choice #1]](#tricky-design-choice-1) - - [[Tricky design choice 2]](#tricky-design-choice-2) -- [Considered alternatives](#considered-alternatives) - - [[Alternative 1]](#alternative-1) - - [[Alternative 2]](#alternative-2) -- [Stakeholder Feedback / Opposition](#stakeholder-feedback--opposition) -- [References & acknowledgements](#references--acknowledgements) - - - -## Introduction - -[The "executive summary" or "abstract". -Explain in a few sentences what the goals of the project are, -and a brief overview of how the solution works. -This should be no more than 1-2 paragraphs.] +# Explainer for the Web Translation API + +_This proposal is an early design sketch by the Chrome translate API team to describe the problem below and solicit feedback on the proposed solution. It has not been approved to ship in Chrome._ + +Browsers are increasingly offering language translation to their users. Such translation capabilities can also be useful to web developers. This is especially the case when the browser's built-in translation abilities cannot help, such as: + +* translating user input or other interactive features; +* pages with complicated DOMs which trip up browser translation; +* providing in-page UI to start the translation; or +* translating content that is not in the DOM, e.g. 
spoken content. + +To perform translation in such cases, web sites currently have to either call out to cloud APIs, or bring their own translation models and run them using technologies like WebAssembly and WebGPU. This proposal introduces a new JavaScript API for exposing a browser's existing language translation abilities to web pages, so that if present, they can serve as a simpler and less resource-intensive alternative. ## Goals -[What is the **end-user need** which this project aims to address? Make this section short, and -elaborate in the Use cases section.] +Our goals are to: -## Non-goals +* Help web developers perform real-time translations (e.g. of user input). +* Help web developers perform real-time language detection. +* Guide web developers to gracefully handle failure cases, e.g. translation not being available or possible. +* Harmonize well with existing browser and OS translation technology ([Brave](https://support.brave.com/hc/en-us/articles/8963107404813-How-do-I-use-Brave-Translate), [Chrome](https://support.google.com/chrome/answer/173424?hl=en&co=GENIE.Platform%3DDesktop#zippy=%2Ctranslate-selected-text), [Edge](https://support.microsoft.com/en-us/topic/use-microsoft-translator-in-microsoft-edge-browser-4ad1c6cb-01a4-4227-be9d-a81e127fcb0b), [Firefox](https://support.mozilla.org/en-US/kb/website-translation), [Safari](https://9to5mac.com/2020/12/04/how-to-translate-websites-with-safari-mac/)), e.g. by allowing on-the-fly downloading of different languages instead of assuming all are present from the start. +* Allow a variety of implementation strategies, including on-device vs. cloud-based translation, while keeping these details abstracted from developers. -[If there are "adjacent" goals which may appear to be in scope but aren't, -enumerate them here. This section may be fleshed out as your design progresses and you encounter necessary technical and other trade-offs.] 
+The following are explicit non-goals: -## User research +* We do not intend to force every browser to ship language packs for every language combination, or even to support translation at all. It would be conforming to implement this API by always returning `"no"` from `canTranslate()`, or to implement this API entirely by using cloud services instead of on-device translation. +* We do not intend to provide guarantees of translation quality, stability, or interoperability between browsers. These are left as quality-of-implementation issues, similar to the [shape detection API](https://wicg.github.io/shape-detection-api/). (See also a [discussion of interop](https://www.w3.org/reports/ai-web-impact/#interop) in the W3C "AI & the Web" document.) -[If any user research has been conducted to inform your design choices, -discuss the process and findings. User research should be more common than it is.] +The following are potential goals we are not yet certain of: -## Use cases +* Allow web developers to know whether translations are done on-device or using cloud services. This would allow them to guarantee that any user data they feed into this API does not leave the device, which can be important for privacy purposes. (Similarly, we might want to allow developers to request on-device-only translation, in case a browser offers both varieties.) +* Allow web developers to know some identifier for the translation model in use, separate from the browser version. This would allow them to allowlist or blocklist specific models to maintain a desired level of quality. -[Describe in detail what problems end-users are facing, which this project is trying to solve. A -common mistake in this section is to take a web developer's or server operator's perspective, which -makes reviewers worry that the proposal will violate [RFC 8890, The Internet is for End -Users](https://www.rfc-editor.org/rfc/rfc8890).] 
+Both of these potential goals could be detrimental to interoperability, so we want to investigate further how important such functionality is to developers before settling on the right tradeoff. -### Use case 1 +## Examples -### Use case 2 +Note that in this API, languages are represented as [BCP 47](https://www.rfc-editor.org/info/bcp47) language tags, as already used by the existing JavaScript `Intl` API or the HTML `lang=""` attribute. Examples: `"ja"`, `"en"`, `"de-AT"`, `"zh-Hans-CN"`. - +See [below](#language-tag-handling) for more on the details of how language tags are handled in this API, and the [appendix](#appendix-converting-between-language-tags-and-human-readable-strings) for some helper code that converts between language tags and human-readable strings. -## [Potential Solution] +### For a known source language -[For each related element of the proposed solution - be it an additional JS method, a new object, a new element, a new concept etc., create a section which briefly describes it.] +If the source language is known, using the API looks like this: ```js -// Provide example code - not IDL - demonstrating the design of the feature. +const canTranslate = await translation.canTranslate({ + sourceLanguage: "en", + targetLanguage: "ja" +}); + +if (canTranslate !== "no") { + const translator = await translation.createTranslator({ + sourceLanguage: "en", + targetLanguage: "ja" + }); + + console.assert(translator.sourceLanguage === "en"); + console.assert(translator.targetLanguage === "ja"); + + const text = await translator.translate("Hello, world!"); + const readableStreamOfText = await translator.translateStreaming(` + Four score and seven years ago our fathers brought forth, upon this...`); +} else { + // Use alternate methods +} +``` + +### For an unknown source language -// If this API can be used on its own to address a user need, -// link it back to one of the scenarios in the goals section. 
+If the source language is unknown, the same APIs can be called without the `sourceLanguage` option. The return type of the resulting translator object's `translate()` and `translateStreaming()` methods will change to include the best guess at the detected language, and a confidence level between 0 and 1: -// If you need to show how to get the feature set up -// (initialized, or using permissions, etc.), include that too. +```js +const canTranslate = await translation.canTranslate({ targetLanguage: "ja" }); + +if (canTranslate !== "no") { + const translator = await translation.createTranslator({ targetLanguage: "ja" }); + + console.assert(translator.sourceLanguage === null); + console.assert(translator.targetLanguage === "ja"); + + const { + detectedLanguage, + confidence, + result + } = await translator.translate(someUserText); + + // Renamed while destructuring so the names do not clash with the ones above. + // stream is a ReadableStream + const { + detectedLanguage: streamDetectedLanguage, + confidence: streamConfidence, + result: stream + } = await translator.translateStreaming(longerUserText); +} +``` + +If the language cannot be detected, then the return value will be `{ detectedLanguage: null, confidence: 0, result: null }`. + +### Downloading new languages + +In the above examples, we're always testing if the `canTranslate()` method returns something other than `"no"`. Why isn't it a simple boolean? The answer is that the return value can be one of three possibilities: + +* `"no"`: it is not possible for this browser to translate as requested +* `"readily"`: the browser can readily translate as requested +* `"after-download"`: the browser can perform the requested translation, but only after it downloads appropriate material. 
+ +To see how to use this, consider an expansion of the above example: + +```js +const canTranslate = await translation.canTranslate({ targetLanguage: "is" }); + +if (canTranslate === "readily") { + const translator = await translation.createTranslator({ targetLanguage: "is" }); + doTheTranslation(translator); +} else if (canTranslate === "after-download") { + // Since we're in the "after-download" case, creating a translator will start + // downloading the necessary language pack. + const translator = await translation.createTranslator({ targetLanguage: "is" }); + + translator.ondownloadprogress = progressEvent => { + updateDownloadProgressBar(progressEvent.loaded, progressEvent.total); + }; + await translator.ready; + removeDownloadProgressBar(); + + doTheTranslation(translator); +} else { + // Use alternate methods +} ``` -[Where necessary, provide links to longer explanations of the relevant pre-existing concepts and API. -If there is no suitable external documentation, you might like to provide supplementary information as an appendix in this document, and provide an internal link where appropriate.] +Note that `await translator.ready` is not necessary; if it's omitted, calls to `translator.translate()` or `translator.translateStreaming()` will just take longer to fulfill (or reject). But it can be convenient. -[If this is already specced, link to the relevant section of the spec.] +If the download fails, then `downloadprogress` events will stop being emitted, and the `ready` promise will be rejected with a "`NetworkError`" `DOMException`. Additionally, any calls to `translator`'s methods will reject with the same error. -[If spec work is in progress, link to the PR or draft of the spec.] +### Language detection -[If you have more potential solutions in mind, add ## Potential Solution 2, 3, etc. sections.] +Apart from translating between languages, the API can offer the ability to detect the language of text, with confidence levels. 
-### How this solution would solve the use cases +```js +if (await translation.canDetect() !== "no") { + const detector = await translation.createDetector(); + + const results = await detector.detect("Hello, world!"); + for (const result of results) { + console.log(result.detectedLanguage, result.confidence); + } +} +``` -[If there are a suite of interacting APIs, show how they work together to solve the use cases described.] +If no language can be detected with reasonable confidence, this API returns an empty array. -#### Use case 1 +### Listing supported languages -[Description of the end-user scenario] +To get a list of languages which the current browser can translate, we can use the following code: ```js -// Sample code demonstrating how to use these APIs to address that scenario. +for (const { language, availability } of await translation.supportedLanguages()) { + let text = languageTagToHumanReadable(language, "en"); // see appendix + if (availability === "after-download") { + text += "*"; + } + + languageDropdown.append(new Option(text, language)); +} ``` -#### Use case 2 +Here `availability` is either `"after-download"` or `"readily"`. + +## Detailed design + +### Full API surface in Web IDL + +```webidl +[Exposed=(Window,Worker)] +interface Translation { + Promise<TranslationAvailability> canTranslate(TranslationLanguageOptions options); + Promise<LanguageTranslator> createTranslator(TranslationLanguageOptions options); + + Promise<TranslationAvailability> canDetect(); + Promise<LanguageDetector> createDetector(); + + Promise<sequence<AvailableLanguage>> supportedLanguages(); +}; + +[Exposed=(Window,Worker)] +interface LanguageTranslator : EventTarget { + readonly attribute Promise<undefined> ready; + attribute EventHandler ondownloadprogress; + + readonly attribute DOMString? sourceLanguage; + readonly attribute DOMString targetLanguage; + + Promise<(DOMString or ResultWithLanguageDetection)> translate(DOMString input); + Promise<(ReadableStream or StreamingResultWithLanguageDetection)> translateStreaming(DOMString input); +}; + +[Exposed=(Window,Worker)] +interface LanguageDetector : EventTarget { + readonly attribute Promise<undefined> ready; + attribute EventHandler ondownloadprogress; + + Promise<sequence<LanguageDetectionResult>> detect(DOMString input); +}; + +partial interface mixin WindowOrWorkerGlobalScope { + readonly attribute Translation translation; +}; + +enum TranslationAvailability { "readily", "after-download", "no" }; + +dictionary TranslationLanguageOptions { + required DOMString targetLanguage; + DOMString sourceLanguage; +}; -[etc.] +dictionary AvailableLanguage { + DOMString language; + TranslationAvailability availability; +}; -## Detailed design discussion +dictionary LanguageDetectionResult { + DOMString? detectedLanguage; + double confidence; +}; -### [Tricky design choice #1] +dictionary ResultWithLanguageDetection : LanguageDetectionResult { + DOMString? result; +}; -[Talk through the tradeoffs in coming to the specific design point you want to make.] +dictionary StreamingResultWithLanguageDetection : LanguageDetectionResult { + ReadableStream? result; +}; +``` + +### Language tag handling + +If a browser supports translating from `ja` to `en`, does it also support translating from `ja` to `en-US`? What about `en-GB`? What about the (discouraged, but valid) `en-Latn`, i.e. English written in the usual Latin script? But translation to `en-Brai`, English written in the Braille script, is different entirely. + +Tentatively, pending consultation with internationalization and translation API experts, we propose the following model. Each user agent has a list of (language tag, availability) pairs, which is the same one returned by `translation.supportedLanguages()`. Only exact matches for entries in that list will be used for the API. 
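
The exact-match model can be sketched in a few lines of plain JavaScript. This is only an illustration, not part of the proposed API: `supportedPairs` and `lookupAvailability()` are hypothetical names, the list contents are made up, and the sketch assumes tags are compared as given, without any canonicalization.

```javascript
// Hypothetical stand-in for a user agent's internal list of
// (language tag, availability) pairs. The entries are invented for illustration.
const supportedPairs = [
  { language: "ja", availability: "readily" },
  { language: "is", availability: "after-download" },
];

// Exact-match lookup: a tag is supported only if it appears verbatim in the list.
// There is no fallback from a longer tag such as "ja-JP" to "ja".
function lookupAvailability(pairs, languageTag) {
  const entry = pairs.find(pair => pair.language === languageTag);
  return entry ? entry.availability : "no";
}

console.log(lookupAvailability(supportedPairs, "ja"));    // "readily"
console.log(lookupAvailability(supportedPairs, "is"));    // "after-download"
console.log(lookupAvailability(supportedPairs, "ja-JP")); // "no"
```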
+ +So for example, consider a browser which supports `en`, `zh-Hans`, and `zh-Hant`. Then we would have the following results: ```js -// Illustrated with example code. +await translation.canTranslate({ targetLanguage: "en" }); // "readily" or "after-download" +await translation.canTranslate({ targetLanguage: "en-US" }); // "no" + +await translation.canTranslate({ targetLanguage: "zh-Hans" }); // "readily" or "after-download" +await translation.canTranslate({ targetLanguage: "zh" }); // "no" ``` -[This may be an open question, -in which case you should link to any active discussion threads.] +To improve interoperability and best meet developer expectations, we can mandate in the specification that browsers follow the best practices outlined in BCP 47, especially around [extended language subtags](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.1.2), such as: + +* always returning canonical forms instead of aliases; +* correctly distinguishing script support (e.g. `zh-Hant`) from country support (e.g. `zh-TW`); and +* avoiding including redundant script information (e.g. `en-Latn`). + +### Downloading + +The current design envisions that the following operations will _not_ cause downloads of language packs or other material like a language detection model: + +* `translation.canTranslate()` +* `translation.canDetect()` +* `translation.supportedLanguages()` + +The following _can_ cause downloads. In all cases, whether or not a call will initiate a download can be detected beforehand by checking the return value of the corresponding `canXYZ()` call. + +* `translation.createTranslator()` +* `translation.createDetector()` + +After a developer has a `LanguageTranslator` or `LanguageDetector` object created by these methods, further calls are not expected to cause any downloads. (Although they might require internet access, if the implementation is not entirely on-device.) + +## Privacy considerations + +This proposal as-is has privacy issues, which we are actively thinking about how to address. 
They are all centered around how sites that use this API might be able to uniquely fingerprint the user. -### [Tricky design choice 2] +The most obvious identifier in the current API design is the list of supported languages, and especially their availability status (`"no"`, `"readily"`, or `"after-download"`). For example, as of the time of this writing [Firefox supports 9 languages](https://www.mozilla.org/en-US/firefox/features/translate/), which can each be [independently downloaded](https://support.mozilla.org/en-US/kb/website-translation#w_configure-installed-languages). With a naive implementation, each language's downloaded-or-not status contributes one bit, so this gives 9 bits of identifying information (up to 2^9 = 512 distinguishable states), which various sites can all correlate. -[etc.] +Some sort of mitigation may be necessary here. We believe this is adjacent to other areas that have seen similar mitigation, such as the [Local Font Access API](https://github.com/WICG/local-font-access/blob/main/README.md). Possible techniques are: -## Considered alternatives +* Grouping language packs to reduce the number of bits, so that downloading one language also downloads others in its group. +* Partitioning download status by top-level site, and introducing a fake download (which takes time but does not actually download anything) for the second and any later site that requests an already-downloaded language pack. +* Only exposing a fixed set of languages to this API, e.g. based on the user's locale. -[This should include as many alternatives as you can, -from high level architectural decisions down to alternative naming choices.] +Another way in which this API might enlarge the web's fingerprinting surface is if translation and language detection models are updated separately from browser versions. In that case, differing results from different versions of the model provide additional fingerprinting bits beyond those already provided by the browser's major version number. 
Mandating that older browser versions not receive updates or be able to download models from too far into the future might be a possible remediation for this. -### [Alternative 1] +## Alternatives considered and under consideration -[Describe an alternative which was considered, -and why you decided against it.] +### Streaming input support -### [Alternative 2] +Although the API contains support for streaming output of a translation, via the `translateStreaming()` API, it doesn't support streaming input. Should it? -[etc.] +We believe it should not, for now. In general, translation works best with more context; feeding more input into the system over time can produce very different results. For example, translating "彼女の話を聞いて、驚いた" to English would give "I was surprised to hear her story". But if you then streamed in another chunk, so that the full sentence was "彼女の話を聞いて、驚いたねこが逃げた", the result changes completely to "Upon hearing her story, the surprised cat ran away." This doesn't fit well with how streaming APIs behave generally. -## Stakeholder Feedback / Opposition +In other words, even if web developers are receiving a stream of input (e.g. over the network or from the user), they need to take special care in how they present such updating-over-time translations to the user. We shouldn't treat this as a usual stream-to-string or stream-to-stream API, because that will rarely be useful. -[Implementors and other stakeholders may already have publicly stated positions on this work. If you can, list them here with links to evidence as appropriate.] +That said, we are aware of [research](https://arxiv.org/abs/2005.08595) on translation algorithms which are specialized for this kind of setting, and attempt to mitigate the above problem. It's possible we might want to support this sort of API in the future, if implementations are excited about implementing that research. 
This should be possible to fit into the existing API surface, possibly with some extra feature-detection API. -- [Implementor A] : Positive -- [Stakeholder B] : No signals -- [Implementor C] : Negative +### Flattening the API and reducing async steps -[If appropriate, explain the reasons given by other implementors for their concerns.] +The current design requires multiple async steps to do useful things: -## References & acknowledgements +```js +const translator = await translation.createTranslator(options); +const text = await translator.translate(sourceText); + +const detector = await translation.createDetector(); +const results = await detector.detect(sourceText); +``` + +Should we simplify these down with convenience APIs that do both steps at once? + +We're open to this idea, but we think the existing complexity is necessary to support the design wherein translation and language detection models might not be already downloaded. By separating the two stages, we allow web developers to perform the initial creation-and-possibly-downloading steps early in their page's lifecycle, in preparation for later, hopefully-quick calls to APIs like `translate()`. + +Another possible simplification is to make some of the more informational APIs, namely `canTranslate()`, `canDetect()`, and `supportedLanguages()`, synchronous instead of asynchronous. This would be implementable by having the browser proactively load the information about supported languages into the main thread's process, upon creation of the global object. We think this is not worthwhile, though, as it imposes a non-negligible cost on all global object creation. + +### Separating language detection and translation + +As discussed in [For an unknown source language](#for-an-unknown-source-language), we support performing both language detection and translation to the best-guess language at the same time, in one API. 
This slightly complicates the `translate()` and `translateStreaming()` APIs, by giving them polymorphic return types. -[Your design will change and be informed by many people; acknowledge them in an ongoing way! It helps build community and, as we only get by through the contributions of many, is only fair.] +We could instead require that developers always supply a `sourceLanguage`, and if they want to detect it ahead of time, they could use the `detect()` API. -[Unless you have a specific reason not to, these should be in alphabetical order.] +We're open to this simplification, but suspect it would be worse for efficiency, as it bakes in a requirement of multiple traversals over the input text, mediated by JavaScript code. We plan to investigate whether multiple traversals over the input are necessary anyway according to the latest research, in which case this simplification would probably be preferable. -Many thanks for valuable feedback and advice from: +## Stakeholder feedback -- [Person 1] -- [Person 2] -- [etc.] +* W3C TAG: to be requested +* Browser engines: + * Chromium: prototyping behind a flag + * Gecko: to be requested + * WebKit: to be requested +* Web developers: Chrome has received private enthusiasm for such an API, and is working on gathering public evidence of such enthusiasm. + +## Appendix: converting between language tags and human-readable strings + +This code already works today and is not new to this API proposal. It is likely useful in conjunction with this API, for example when building user interfaces. 
+ +```js +function languageTagToHumanReadable(languageTag, targetLanguage) { + const displayNames = new Intl.DisplayNames([targetLanguage], { type: "language" }); + return displayNames.of(languageTag); +} + +languageTagToHumanReadable("ja", "en"); // "Japanese" +languageTagToHumanReadable("zh", "en"); // "Chinese" +languageTagToHumanReadable("zh-Hant", "en"); // "Traditional Chinese" +languageTagToHumanReadable("zh-TW", "en"); // "Chinese (Taiwan)" + +languageTagToHumanReadable("en", "ja"); // "英語" +``` diff --git a/index.bs b/index.bs index 2d8010b..1b370be 100644 --- a/index.bs +++ b/index.bs @@ -1,28 +1,21 @@ Introduction {#intro} ===================== For now, see the [explainer]([REPOSITORYURL]). - -See [https://garykac.github.io/procspec/](https://garykac.github.io/procspec/), -[https://dlaliberte.github.io/bikeshed-intro/index.html](https://dlaliberte.github.io/bikeshed-intro/index.html), -and [https://speced.github.io/bikeshed/](https://speced.github.io/bikeshed/) to get started on your -specificaton.