Skip to content

Commit

Permalink
Improve update-locales script and fix locale processing bug (#23240)
Browse files Browse the repository at this point in the history
The locales of Gitea has been broken for long time, till now, it's still
not fully fixed.

One of the root problems is that the `ini` library is quite quirky and
the `update-locales` script doesn't work well for all cases.


This PR fixes the `update-locales` script to make it satisfy `ini`
library and the crowdin.

See the comments for more details.

The `locale_zh-CN.ini` is an example, it comes from crowdin and is
processed by the new `update-locales.sh`. Especially see the `feed_of`:
https://github.com/go-gitea/gitea/pull/23240/files#diff-321f6ca4eae1096eba230e93c4740f9903708afe8d79cf2e57f4299786c4528bR268
  • Loading branch information
wxiaoguang authored Mar 2, 2023
1 parent ce73492 commit d72462d
Show file tree
Hide file tree
Showing 2 changed files with 225 additions and 13 deletions.
45 changes: 40 additions & 5 deletions build/update-locales.sh
Original file line number Diff line number Diff line change
@@ -1,14 +1,49 @@
#!/bin/sh
#!/bin/bash

set -e

SED=sed

if [[ $OSTYPE == 'darwin'* ]]; then
# for macOS developers, use "brew install gnu-sed"
SED=gsed
fi

if [ ! -f ./options/locale/locale_en-US.ini ]; then
echo "please run this script in the root directory of the project"
exit 1
fi

mv ./options/locale/locale_en-US.ini ./options/

# Make sure to only change lines that have the translation enclosed between quotes
sed -i -r -e '/^[a-zA-Z0-9_.-]+[ ]*=[ ]*".*"$/ {
s/^([a-zA-Z0-9_.-]+)[ ]*="/\1=/
s/\\"/"/g
# the "ini" library for locale has many quirks
# * `a="xx"` gets `xx` (no quote)
# * `a=x\"y` gets `x\"y` (no unescaping)
# * `a="x\"y"` gets `"x\"y"` (no unescaping, the quotes are still there)
# * `a='x\"y'` gets `x\"y` (no unescaping, no quote)
# * `a="foo` gets `"foo` (although the quote is not closed)
# * 'a=`foo`' works like single-quote
# crowdin needs the strings to be quoted correctly and doesn't like incomplete quotes
# crowdin always outputs quoted strings if there are quotes in the strings.

# this script helps to unquote the crowdin outputs for the quirky ini library
# * find all `key="...\"..."` lines
# * remove the leading quote
# * remove the trailing quote
# * unescape the quotes
# * eg: key="...\"..." => key=..."...
$SED -i -r -e '/^[-.A-Za-z0-9_]+[ ]*=[ ]*".*"$/ {
s/^([-.A-Za-z0-9_]+)[ ]*=[ ]*"/\1=/
s/"$//
s/\\"/"/g
}' ./options/locale/*.ini

# * if the escaped line is incomplete like `key="...` or `key=..."`, quote it with backticks
# * eg: key="... => key=`"...`
# * eg: key=..." => key=`..."`
$SED -i -r -e 's/^([-.A-Za-z0-9_]+)[ ]*=[ ]*(".*[^"])$/\1=`\2`/' ./options/locale/*.ini
$SED -i -r -e 's/^([-.A-Za-z0-9_]+)[ ]*=[ ]*([^"].*")$/\1=`\2`/' ./options/locale/*.ini

# Remove translation under 25% of en_us
baselines=$(wc -l "./options/locale_en-US.ini" | cut -d" " -f1)
baselines=$((baselines / 4))
Expand Down
Loading

0 comments on commit d72462d

Please sign in to comment.