More intelligent removing of language variants #207

Closed
pavel-karatsiuba opened this issue Mar 4, 2023 · 1 comment · Fixed by #211

@pavel-karatsiuba

Right now, with the --withoutLanguageVariants parameter, the script does not create ZIM files for languages with variants. This is a problem because with this parameter we will never see, for example, simulations for Chinese.

For languages with variants, I propose keeping the variant with the most simulations. For example, the current output has three Chinese ZIM files: Chinese (Hong Kong) with 33 simulations, Chinese (Simplified) with 221, and Chinese (Traditional) with 223.
My proposal is not to remove all three but to keep at least Chinese (Traditional), which has the most simulations, and rename it to zh, without the variant suffix.
The list below shows the languages that have variants, with the simulation count for each. It also includes the same languages without variants where they exist. With this parameter, all of the entries with variants are currently removed.

"ar": {
    "name": "Arabic",
    "count": 194
},
"ar_MA": {
    "name": "Arabic (Morocco)",
    "count": 96
},
"ar_SA": {
    "name": "Arabic (Saudi Arabia)",
    "count": 119
},
"zh_HK": {
    "name": "Chinese (Hong Kong)",
    "count": 33
},
"zh_CN": {
    "name": "Chinese (Simplified)",
    "count": 221
},
"zh_TW": {
    "name": "Chinese (Traditional)",
    "count": 223
},
"en": {
    "name": "English",
    "count": 224
},
"en_CA": {
    "name": "English (Canada)",
    "count": 1
},
"en_GB": {
    "name": "English (United Kingdom)",
    "count": 1
},
"ku": {
    "name": "Kurdish",
    "count": 41
},
"ku_TR": {
    "name": "Kurdish (Turkey)",
    "count": 63
},
"pt": {
    "name": "Portuguese",
    "count": 170
},
"pt_BR": {
    "name": "Portuguese (Brazil)",
    "count": 224
},
"es": {
    "name": "Spanish",
    "count": 224
},
"es_CO": {
    "name": "Spanish (Colombia)",
    "count": 17
},
"es_MX": {
    "name": "Spanish (Mexico)",
    "count": 208
},
"es_PE": {
    "name": "Spanish (Peru)",
    "count": 224
},
"es_ES": {
    "name": "Spanish (Spain)",
    "count": 48
},
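
A minimal sketch of the selection rule proposed above, written in Python purely for illustration (it is not the scraper's actual code). It groups codes by the part before the underscore and keeps the entry with the highest simulation count. The function name `pick_representatives` and the tie-break (prefer the plain code when counts are equal) are assumptions, and the data is a subset of the list above.

```python
from collections import defaultdict

# Subset of the counts listed above.
catalog = {
    "zh_HK": {"name": "Chinese (Hong Kong)", "count": 33},
    "zh_CN": {"name": "Chinese (Simplified)", "count": 221},
    "zh_TW": {"name": "Chinese (Traditional)", "count": 223},
    "ku": {"name": "Kurdish", "count": 41},
    "ku_TR": {"name": "Kurdish (Turkey)", "count": 63},
    "es": {"name": "Spanish", "count": 224},
    "es_PE": {"name": "Spanish (Peru)", "count": 224},
}


def pick_representatives(catalog):
    """Map each base language code to the code with the most simulations."""
    families = defaultdict(list)
    for code in catalog:
        base = code.split("_")[0]  # "zh_TW" -> "zh", "es" -> "es"
        families[base].append(code)

    chosen = {}
    for base, codes in families.items():
        # Highest count wins; on a tie the plain base code is preferred
        # (assumed tie-break, not stated in the proposal).
        chosen[base] = max(codes, key=lambda c: (catalog[c]["count"], c == base))
    return chosen


print(pick_representatives(catalog))
```

With this subset, Chinese resolves to zh_TW, Kurdish to ku_TR, and Spanish stays on the plain es entry because of the tie-break.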
@kelson42

kelson42 commented Mar 5, 2023

@pavel-karatsiuba Agreed. But please hardcode this (basically zh_CN -> zh), so that if for some reason we do not want to follow that rule for a specific language, we can easily change it.
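
One way the hardcoded approach could look, sketched in Python for illustration only. The table `VARIANT_TO_BASE` and the function `output_code` are hypothetical names, and the single entry is the zh_CN -> zh example from this comment; the actual fix may differ.

```python
# Hypothetical hardcoded table: variant code -> plain code it is published under.
# Edit this table to change the rule for a specific language.
VARIANT_TO_BASE = {
    "zh_CN": "zh",
}


def output_code(code, without_variants):
    """Return the code to publish under, or None if the language is skipped."""
    if not without_variants:
        return code  # flag not set: keep every language as-is
    if "_" not in code:
        return code  # plain language, nothing to strip
    # Variant: keep it only if it is explicitly mapped to a plain code.
    return VARIANT_TO_BASE.get(code)


for code in ["zh_CN", "zh_TW", "es", "es_MX"]:
    print(code, "->", output_code(code, without_variants=True))
```

Keeping the mapping as explicit data means an exception for one language is a one-line change rather than a change to the selection logic.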
