New Target Language Code

Joel Lonbeck edited this page Jun 30, 2016 · 47 revisions

NOTE: for the most part the questionnaire used to collect information about a new language is generic enough to be used for other things with little change to the implementation. This is by design in order to be as forward compatible as possible.

This page describes the processes around creating a new target language code in translationStudio. For an overview of the entire process please see the Google Doc.

Questionnaire API

When a new target language code is required, users must first complete a questionnaire. The purpose of this questionnaire is twofold:

  • avoid needlessly creating new target language codes by suggesting existing language codes to the user
  • collect the information necessary for correctly identifying the represented target language.

The questionnaire itself will be available on the server via an API. However, the latest version will commonly exist in data bundled in the app.

Endpoint

http://td.unfoldingword.org/api/questionnaire/

You can edit this draft at http://www.jsoneditoronline.org/?id=27ff9243200aaee0552729c5ee05df5f.

Sample data:

{
  "languages": [
    {
      "name": "English",
      "dir": "ltr",
      "slug": "en",
      "questionnaire_id": 1,
      "data_fields": {
        "ln": 0,
        "cc": 4,
        "ld": 3
      },
      "language_data": {
        "ln": 0,
        "ld": 3
      },
      "questions": [
        {
          "id": 2,
          "text": "some question",
          "help": "this is some help text",
          "required": true,
          "input_type": "string|boolean|date",
          "sort": 2,
          "depends_on": 1
        }
      ]
    }
  ]
}

Field descriptions

  • id - DB primary key according to translationDatabase
  • text - Question string
  • help - Info that may be helpful to user (optional)
  • input_type - Type of user input interface
  • required - Whether the user must answer this question (true) or may skip it (false)
  • sort - The order in which the question appears
  • depends_on - Id of the question that must be answered before this one

The data_fields object indicates certain properties that may be defined in the answer to a question. Each key is a property and each value is the id of the question whose answer supplies it. For example, ln (language name) can be determined from the answer to question 0.
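The lookup described above can be sketched as follows. This is a minimal illustration, not the app's implementation: the helper name resolve_data_fields is hypothetical, and the answer shape (question_id and text) is borrowed from the stored request format shown later on this page.

```python
def resolve_data_fields(data_fields, answers):
    """Map each data field (e.g. "ln") to the answer of the question it points at."""
    by_question = {a["question_id"]: a["text"] for a in answers}
    return {
        field: by_question[question_id]
        for field, question_id in data_fields.items()
        if question_id in by_question  # fields whose question is unanswered are skipped
    }


data_fields = {"ln": 0, "cc": 4, "ld": 3}
answers = [
    {"question_id": 0, "text": "Example Language"},
    {"question_id": 3, "text": "ltr"},
]
print(resolve_data_fields(data_fields, answers))
# {'ln': 'Example Language', 'ld': 'ltr'}
```

Question 4 has no answer in this example, so cc is simply left out of the result.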

Language Code Generation

If a user completes the questionnaire without identifying the target language, a new temporary target language code will be generated for them. These language codes use the qaa prefix followed by the private use code x, e.g. qaa-x-. After the x we add a unique code that represents the target language.

The algorithm for generating the language code is as follows:

  1. Retrieve the device id (UUID for mobile devices, MAC address for PCs).
  2. Retrieve the current system time in milliseconds.
  3. Concatenate the device id with the current time.
  4. Calculate the sha1 value of the result of the previous step. (Note: the result is alphanumeric.)
  5. Take the first 6 characters of the sha1 value to be used in the temporary target language code.
  6. Concatenate the code prefix and private use code with the generated language code and make the entire string lowercase, e.g. qaa-x-[6 character code].

Example

device = "c48267e3-a035-4e0b-a634-4803cfe300af";
time = 1457372113338;
concatedString = device + time;
hash = sha1(concatedString);
code = hash.subString(0, 6);
newTargetLanguageCode = "qaa-x-" + code.toLowerCase();
print newTargetLanguageCode;
$ "qaa-x-47eaa8"
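For readers who want something runnable, here is a minimal Python sketch of the same steps. The function name generate_temp_language_code is hypothetical, and encoding the concatenated string as UTF-8 before hashing is an assumption, since this page does not specify an encoding.

```python
import hashlib
import time
import uuid


def generate_temp_language_code(device_id, timestamp_ms):
    """Build a temporary code of the form qaa-x-<first 6 characters of the sha1>."""
    seed = f"{device_id}{timestamp_ms}"                       # step 3: concatenate
    digest = hashlib.sha1(seed.encode("utf-8")).hexdigest()  # step 4: sha1 (UTF-8 assumed)
    return ("qaa-x-" + digest[:6]).lower()                    # steps 5 and 6


# A random UUID stands in for the real device id in this sketch.
code = generate_temp_language_code(str(uuid.uuid4()), int(time.time() * 1000))
print(code)
```

The same device id and timestamp always produce the same code, so uniqueness relies on the two inputs differing between requests.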

Algorithm Background

One constraint of new target language codes is they must be unique. In order to provide sufficiently unique codes we must use 6 characters in the private portion of the IETF code. Without going into the math, a 6 character code has less than 0.1% chance of encountering duplicate language codes. A 5 character code has greater than 1% chance. And a 4 character code has a 30% chance of encountering duplicates.

Using a 6 character code will allow the generation of statistically unique target language codes.
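The math behind figures like these is the birthday problem. The sketch below computes the exact collision probability for a given number of generated codes, assuming codes are uniformly distributed over the 16 symbols of a hexadecimal SHA-1 digest. This page does not state how many codes its percentages assume, so the value of 200 used here is purely illustrative, and the function name collision_probability is hypothetical.

```python
def collision_probability(num_codes, code_length, alphabet_size=16):
    """Exact birthday-problem probability that at least two codes are identical."""
    space = alphabet_size ** code_length
    if num_codes > space:
        return 1.0  # more codes than possible values: a duplicate is certain
    p_unique = 1.0
    for i in range(num_codes):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


# num_codes = 200 is an illustrative assumption, not a figure from this page.
for length in (4, 5, 6):
    print(length, f"{collision_probability(200, length):.3%}")
```

Each additional character multiplies the number of possible codes by 16, which is why the probability drops so quickly between 4 and 6 characters.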

Recording Questionnaires

Once a questionnaire has been completed, the following information should be collected and stored on the device within a single JSON file:

  • the answers to the questions
  • the generated target language code
  • a new UUID that will identify this particular request for a new target language code, i.e. the completed questionnaire and related information
  • the data_fields from the questionnaire so other devices can infer properties of the language from the answers.

submitted_at indicates the time when the request was submitted. Once submitted the request does not need to be submitted again. This value should only be set after the request has been successfully submitted to the server.

{
  "request_id": "0d41289b-736b-44d1-823e-37878026a876",
  "temp_code":"qaa-x-47eaa8",
  "data_fields": {
    "ln": 0,
    "cc": 4,
    "ld": 3
  },
  "questionnaire_id": 1,
  "app": "ts-android|ts-desktop|tr|td",
  "requester": "", -- name of the translator who completed the questionnaire
  "submitted_at": yyyymmddhhmmss,
  "answers": [
    {
      "question_id": 1,
      "text": "some answer"
    },
    ...
  ]
}
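A minimal sketch of assembling such a record is shown below. The function name build_language_request is hypothetical, and leaving submitted_at as null until the server accepts the request is an assumption; this page only says the value must not be set before a successful submission.

```python
import json
import uuid


def build_language_request(temp_code, questionnaire, answers, app, requester):
    """Assemble the record stored on the device after a questionnaire is completed."""
    return {
        "request_id": str(uuid.uuid4()),  # new UUID identifying this request
        "temp_code": temp_code,
        "data_fields": questionnaire["data_fields"],
        "questionnaire_id": questionnaire["questionnaire_id"],
        "app": app,
        "requester": requester,
        "submitted_at": None,  # assumption: empty until submission succeeds
        "answers": [{"question_id": qid, "text": text} for qid, text in answers.items()],
    }


questionnaire = {"questionnaire_id": 1, "data_fields": {"ln": 0, "cc": 4, "ld": 3}}
request = build_language_request(
    "qaa-x-47eaa8", questionnaire, {1: "some answer"}, "ts-android", "Example Translator"
)
print(json.dumps(request, indent=2))
```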

The temp language code will also be inserted into the temp_target_language table so the user can find it in the list of available languages next time they create a target translation.

Storing Questionnaires

Users may fill out the questionnaire multiple times. Therefore it is important that the device can hold several questionnaire results (language requests) at a time.

Storing language requests has two parts:

  1. A target translation that uses a new target language code must include the related language request in its repository. That is to say, a copy will be placed into the target translation directory.
  2. Due to the previous condition language requests must also be stored independent of target translations so that they can be copied into new target translations as needed at a later date.

In the first case the language requests must be copied into the target translation directory as a file named new_language.json. A target translation can only ever have a single new_language.json.

In the second case the language requests must be saved to the data path (as appropriate by platform) in a folder named new_languages and as a file named with whatever language code was generated.

Example:

/my/data/path/new_languages/qaa-x-47eaa8.json
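Both storage locations can be sketched in a few lines. The function name store_language_request and its parameters are hypothetical; the file and folder names (new_languages, new_language.json) come from this page.

```python
import json
import tempfile
from pathlib import Path


def store_language_request(request, data_path, target_translation_dir=None):
    """Save a request under new_languages/ and optionally copy it into a translation."""
    stored = Path(data_path) / "new_languages" / f"{request['temp_code']}.json"
    stored.parent.mkdir(parents=True, exist_ok=True)
    stored.write_text(json.dumps(request, indent=2), encoding="utf-8")
    if target_translation_dir is not None:
        # A target translation only ever has a single new_language.json.
        copy = Path(target_translation_dir) / "new_language.json"
        copy.write_text(stored.read_text(encoding="utf-8"), encoding="utf-8")
    return stored


with tempfile.TemporaryDirectory() as root:
    translation = Path(root) / "translation"
    translation.mkdir()
    path = store_language_request({"temp_code": "qaa-x-47eaa8"}, root, translation)
    print(path.name)                                     # qaa-x-47eaa8.json
    print((translation / "new_language.json").exists())  # True
```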

NOTE: in the future users may be allowed to change the target language of a translation. In this case the language request will be removed from that target translation. It will not, however, be removed from the data path, since other future translations may still use it.

Submitting Language Requests

Language requests that are located in the data path should be submitted at a minimum in the following cases:

  • When submitting a request to publish.
  • When backing up a target translation.
  • When checking for updates to the library.

Language requests may also be submitted while performing any other action that requires the network, or through a manual submission initiated by the user.

The API endpoint will receive the new language information at which point administrators on the server will be able to verify the information.

When a questionnaire has been successfully submitted to the server it should be updated to indicate that it has been submitted by setting the value of submitted_at in the stored questionnaire response. Any target translation that uses that temporary language code should also have its new_language.json updated in the same way.

If, at any time in the future, a target translation is imported whose language code matches one of the stored language codes, its new_language.json must also be updated as described above.
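Updating the stored request and every copy inside a target translation might look like the sketch below. The function name mark_submitted is hypothetical. Storing the timestamp as a yyyymmddhhmmss string in UTC is an assumption; this page gives the digit layout but not the JSON type or time zone.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def mark_submitted(request_files, submitted_at=None):
    """Write submitted_at into each request file and return the value used."""
    stamp = submitted_at or datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    for path in map(Path, request_files):
        request = json.loads(path.read_text(encoding="utf-8"))
        request["submitted_at"] = stamp
        path.write_text(json.dumps(request, indent=2), encoding="utf-8")
    return stamp


with tempfile.TemporaryDirectory() as root:
    stored = Path(root) / "qaa-x-47eaa8.json"    # copy in the data path
    embedded = Path(root) / "new_language.json"  # copy in a target translation
    for path in (stored, embedded):
        path.write_text(json.dumps({"temp_code": "qaa-x-47eaa8"}), encoding="utf-8")
    mark_submitted([stored, embedded], "20160630120000")
    print(json.loads(embedded.read_text(encoding="utf-8"))["submitted_at"])  # 20160630120000
```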

Endpoint

http://td.unfoldingword.org/api/questionnaire/

Cleaning up Language Requests

A new catalog http://td.unfoldingword.org/api/templanguages/assignment/changed/ will be introduced to the API that will dynamically map temporary language codes to the real ones. When an update comes down from the API the following operations will be performed:

  • all newly mapped language requests will be removed from the device data path.
  • all target translations using the old temporary language codes will be updated to use the new code
  • all language requests will be removed from the target translations mentioned in the previous step
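The three operations above can be sketched against in-memory data. The response format of the catalog is not described on this page, so the plain temp-code-to-approved-code dictionary used here is an assumption, as are the function name apply_code_assignments and the shapes of the other arguments.

```python
def apply_code_assignments(assignments, stored_requests, target_translations):
    """Apply temp-code -> approved-code assignments to in-memory app state.

    assignments:         dict of temp code -> approved code (assumed shape)
    stored_requests:     dict of temp code -> language request in the data path
    target_translations: list of dicts with "language" and optional "new_language"
    """
    for temp_code, approved_code in assignments.items():
        stored_requests.pop(temp_code, None)  # remove from the data path
        for translation in target_translations:
            if translation["language"] == temp_code:
                translation["language"] = approved_code  # switch to the new code
                translation.pop("new_language", None)    # remove the embedded request


stored = {"qaa-x-47eaa8": {"temp_code": "qaa-x-47eaa8"}}
translations = [{"language": "qaa-x-47eaa8", "new_language": {"temp_code": "qaa-x-47eaa8"}}]
apply_code_assignments({"qaa-x-47eaa8": "en"}, stored, translations)
print(stored, translations)  # {} [{'language': 'en'}]
```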

Presenting Questionnaire to Users

  • Users will be presented with three questions at a time. The only exception is that questions which depend on another question will be displayed on the same page as that question, separate from other questions.
  • Questions that depend on another question will be disabled until the dependency has been answered.
  • Each time a page of questions has been answered, the app will check whether any of those questions matches one of the data fields. If one does, the app will attempt to identify and suggest the language the user is looking for.
  • If at any time the user accepts a suggested language the questionnaire will abort and the chosen language will be used.
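One reading of the paging rules is sketched below: independent questions are grouped three to a page in sort order, while a question and everything that depends on it share a page of their own. The function name paginate is hypothetical, and the sketch assumes depends_on is null or absent for independent questions, always refers to an existing id, and never forms a cycle.

```python
def paginate(questions, page_size=3):
    """Group questions into pages; a question and its dependents get their own page."""
    by_id = {q["id"]: q for q in questions}

    def root_of(q):
        # Follow depends_on links up to the independent question at the top.
        while q.get("depends_on") is not None:
            q = by_id[q["depends_on"]]
        return q["id"]

    ordered = sorted(questions, key=lambda q: q["sort"])
    groups = {}
    for q in ordered:
        groups.setdefault(root_of(q), []).append(q)

    pages, current = [], []
    for q in ordered:
        if q.get("depends_on") is not None:
            continue  # shown on the page of the question it depends on
        group = groups[q["id"]]
        if len(group) > 1:  # has dependents: flush the open page, then add its own
            if current:
                pages.append(current)
                current = []
            pages.append(group)
        else:
            current.append(q)
            if len(current) == page_size:
                pages.append(current)
                current = []
    if current:
        pages.append(current)
    return pages


questions = [
    {"id": 1, "sort": 1, "depends_on": None},
    {"id": 2, "sort": 2, "depends_on": 1},
    {"id": 3, "sort": 3, "depends_on": None},
    {"id": 4, "sort": 4, "depends_on": None},
    {"id": 5, "sort": 5, "depends_on": None},
]
print([[q["id"] for q in page] for page in paginate(questions)])  # [[1, 2], [3, 4, 5]]
```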

Importing Language Requests

Language requests may be added to the device by importing a target translation from online, locally, or from a peer on the network. In all cases we'll need to check if the target translation contains a language request.

It is required that target translations undergo proper migration before an import occurs. Therefore language requests should be checked for at the end of the migration task.

If a target translation contains a new_language.json file a few checks need to be made:

  • Has the language request already been approved?
      • YES - migrate the target translation to the approved language code and we are done. If, however, changing the language of the target translation would result in a conflict with another target translation, we do nothing and stop processing this language request.
      • NO - continue below
  • Does the app already contain that language request in the data path?
      • NO - store the language request in the data path and we are done.
      • YES - continue below
  • Has the stored language request been submitted while the one in the target translation has not?
      • YES - mark the language request in the target translation as submitted and we are done.
      • NO - continue below
  • Has the language request in the target translation been submitted while the stored one has not?
      • YES - mark the stored language request as submitted and we are done.

When storing the language request in the data path the file should be copied into the data path and named with the temporary language code found inside the request. The temporary language should also be added to the db so the user can use it in other translations as well. In this way the new language requests may propagate to other devices for submission.
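The import checks can be expressed as a single decision function. This is a sketch under assumptions: the function name reconcile_imported_request, the returned action labels, and the dictionary shapes are all hypothetical, and the conflict check for an approved code is left to the caller.

```python
def reconcile_imported_request(imported, stored_requests, approved_codes):
    """Decide what to do with a new_language.json found in an imported translation.

    imported:        the language request from the target translation
    stored_requests: dict of temp code -> request already in the data path
    approved_codes:  dict of temp code -> approved code (assumed shape)
    """
    code = imported["temp_code"]
    if code in approved_codes:
        return "migrate_translation"  # caller does nothing if this would conflict
    stored = stored_requests.get(code)
    if stored is None:
        stored_requests[code] = imported  # store it in the data path
        return "stored_request"
    if stored.get("submitted_at") and not imported.get("submitted_at"):
        imported["submitted_at"] = stored["submitted_at"]
        return "marked_imported_submitted"
    if imported.get("submitted_at") and not stored.get("submitted_at"):
        stored["submitted_at"] = imported["submitted_at"]
        return "marked_stored_submitted"
    return "no_change"


stored = {"qaa-x-47eaa8": {"temp_code": "qaa-x-47eaa8", "submitted_at": "20160630120000"}}
incoming = {"temp_code": "qaa-x-47eaa8", "submitted_at": None}
print(reconcile_imported_request(incoming, stored, {}))  # marked_imported_submitted
```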