More intelligent removing of language variants #207

pavel-karatsiuba · 2023-03-04T17:06:52Z

Right now, with the --withoutLanguageVariants parameter, the script does not create a ZIM for any language that has variants. This is not good, because with this parameter we will never get, for example, simulations for the Chinese language.

For languages with variants, I propose keeping the variant with the largest number of simulations. For example, the current output has 3 Chinese ZIM files: Chinese (Hong Kong) contains 33 simulations, Chinese (Simplified) contains 221 simulations and Chinese (Traditional) contains 223 simulations.
My proposal is not to remove all 3, but to keep at least Chinese (Traditional), which has the most simulations, and rename it to cn, without the language variant suffix.
The list below shows the languages that have variants, with the number of simulations for each. It also includes the same languages without a variant, where they exist. With the parameter as it works today, all the languages with variants in this list are removed.

"ar": {
    "name": "Arabic",
    "count": 194
},
"ar_MA": {
    "name": "Arabic (Morocco)",
    "count": 96
},
"ar_SA": {
    "name": "Arabic (Saudi Arabia)",
    "count": 119
},
"zh_HK": {
    "name": "Chinese (Hong Kong)",
    "count": 33
},
"zh_CN": {
    "name": "Chinese (Simplified)",
    "count": 221
},
"zh_TW": {
    "name": "Chinese (Traditional)",
    "count": 223
},
"en": {
    "name": "English",
    "count": 224
},
"en_CA": {
    "name": "English (Canada)",
    "count": 1
},
"en_GB": {
    "name": "English (United Kingdom)",
    "count": 1
},
"ku": {
    "name": "Kurdish",
    "count": 41
},
"ku_TR": {
    "name": "Kurdish (Turkey)",
    "count": 63
},
"pt": {
    "name": "Portuguese",
    "count": 170
},
"pt_BR": {
    "name": "Portuguese (Brazil)",
    "count": 224
},
"es": {
    "name": "Spanish",
    "count": 224
},
"es_CO": {
    "name": "Spanish (Colombia)",
    "count": 17
},
"es_MX": {
    "name": "Spanish (Mexico)",
    "count": 208
},
"es_PE": {
    "name": "Spanish (Peru)",
    "count": 224
},
"es_ES": {
    "name": "Spanish (Spain)",
    "count": 48
},
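
The proposed rule (for each language, keep the entry with the most simulations and publish it under the bare language code) could be sketched roughly as follows. This is only an illustration: the names LanguageInfo, Catalog, baseLanguage and collapseVariants are hypothetical and not taken from the scraper's code, and the tie-breaking behaviour (the entry listed first wins) is an assumption.

```typescript
// Hypothetical shape of one entry in the list above.
interface LanguageInfo {
  name: string;
  count: number;
}

type Catalog = Record<string, LanguageInfo>;

// Base language of a code: "zh_TW" -> "zh", "ar" -> "ar".
function baseLanguage(code: string): string {
  const i = code.indexOf("_");
  return i === -1 ? code : code.slice(0, i);
}

// Group entries by base language and keep, for each group, the entry with
// the highest simulation count, stored under the base code.
function collapseVariants(catalog: Catalog): Catalog {
  const result: Catalog = {};
  for (const code of Object.keys(catalog)) {
    const info = catalog[code];
    const base = baseLanguage(code);
    const best = result[base];
    // Strictly greater: on a tie, the entry seen first is kept (assumption).
    if (best === undefined || info.count > best.count) {
      result[base] = info;
    }
  }
  return result;
}

// A few entries copied from the list above.
const sample: Catalog = {
  zh_HK: { name: "Chinese (Hong Kong)", count: 33 },
  zh_CN: { name: "Chinese (Simplified)", count: 221 },
  zh_TW: { name: "Chinese (Traditional)", count: 223 },
  pt: { name: "Portuguese", count: 170 },
  pt_BR: { name: "Portuguese (Brazil)", count: 224 },
};

const collapsed = collapseVariants(sample);
// collapsed.zh is the Chinese (Traditional) entry (223 simulations),
// collapsed.pt is the Portuguese (Brazil) entry (224 simulations).
```

Note that with the full list, es and es_PE both have 224 simulations, so the result for Spanish depends on how ties are broken.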

kelson42 · 2023-03-05T10:46:28Z

@pavel-karatsiuba Agree. But please hardcode this (basically zh_CN -> zh), so that if for some reason we do not want to follow that rule for a specific language, we can easily change it.
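
A hardcoded mapping of this kind might look like the sketch below. It is an assumption about the shape only, not the code that was eventually merged: VARIANT_FOR_BASE and resolveOutputCode are hypothetical names, and only the zh entry comes from the comment above.

```typescript
// Hypothetical hardcoded table: base language code -> the variant that is
// published under that base code. Edit by hand to override a language.
const VARIANT_FOR_BASE = new Map<string, string>([
  ["zh", "zh_CN"],
]);

// Returns the code a language should be published under, or null when the
// language should be dropped.
function resolveOutputCode(code: string): string | null {
  const i = code.indexOf("_");
  const base = i === -1 ? code : code.slice(0, i);
  // Without a table entry, the bare language code itself is the one kept.
  const chosen = VARIANT_FOR_BASE.get(base) ?? base;
  return code === chosen ? base : null;
}

const kept = ["zh_HK", "zh_CN", "zh_TW", "ar", "ar_MA"]
  .map(resolveOutputCode)
  .filter((c): c is string => c !== null);
// kept is ["zh", "ar"]
```

Keeping the choice in a table means a single entry can be changed without touching the selection logic.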

pavel-karatsiuba added the enhancement label Mar 4, 2023

pavel-karatsiuba mentioned this issue Mar 5, 2023

Duplicate phet ZIM files #204

Closed

kelson42 assigned pavel-karatsiuba Mar 5, 2023

kelson42 added this to the 2.4.1 milestone Mar 5, 2023

pavel-karatsiuba mentioned this issue Mar 10, 2023

Use language with variants with biggest count of simulations. other should be removed. #211

Merged

kelson42 closed this as completed in #211 Mar 19, 2023
