ko: refresh all messages #1058

Closed

Collaborator

@mgeisler mgeisler commented Aug 9, 2023

This refreshes all messages with the latest messages from the course:

MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot"}}' mdbook build -d po
msgmerge --update po/ko.po po/messages.pot

Part of #925.

@mgeisler
Collaborator Author

mgeisler commented Aug 9, 2023

After this PR, there are new fuzzy messages:

% msgfmt -o /dev/null --statistics po/ko.po
1328 translated messages, 225 fuzzy translations, 217 untranslated messages.

We should fix those in smaller follow-up PRs.

@mgeisler mgeisler enabled auto-merge (squash) August 9, 2023 13:25
@mgeisler
Collaborator Author

mgeisler commented Aug 9, 2023

@jiyongp and @jooyunghan, would it be useful if I set up a job which does this every week or every two weeks?

@jiyongp
Collaborator

jiyongp commented Aug 9, 2023

> @jiyongp and @jooyunghan, would it be useful if I set up a job which does this every week or every two weeks?

Every two weeks, or perhaps monthly?

@mgeisler
Collaborator Author

mgeisler commented Aug 9, 2023

> > @jiyongp and @jooyunghan, would it be useful if I set up a job which does this every week or every two weeks?
>
> Every two weeks, or perhaps monthly?

That could be a good frequency as well!

In general, running msgmerge simply surfaces what has already happened: some messages on course-structure.html are in English right now because they have changed since we last ran msgmerge. We don't see that when looking at ko.po because the file hasn't been refreshed recently. So the question is whether an occasional automatic refresh would be useful for you as translators.

@jiyongp
Collaborator

jiyongp commented Aug 9, 2023

re: automatic refresh

Well, if we can avoid that, I'd rather do manual refreshes about once a month (the interval could be longer or shorter depending on my bandwidth), as I think a slightly outdated doc is better than an up-to-date doc where English and Korean are mixed.

FYI: I was in the process of fixing the 200+ fuzzy translations first. When I am done, I will submit the change. Then adding the missing translations will be done in a separate PR.

@mgeisler
Collaborator Author

mgeisler commented Aug 9, 2023

> Well, if we can avoid that, I'd rather do manual refreshes about once a month (the interval could be longer or shorter depending on my bandwidth), as I think a slightly outdated doc is better than an up-to-date doc where English and Korean are mixed.

The infrastructure does not support this right now: we publish all languages using the up-to-date English Markdown files. We could of course change this and this is the topic of google/mdbook-i18n-helpers#16.

> FYI: I was in the process of fixing the 200+ fuzzy translations first. When I am done, I will submit the change. Then adding the missing translations will be done in a separate PR.

I'm happy to drop the PR here — it was auto-generated and has no special value to me. I only put it up to get the PO file aligned with the current texts in case that was easier for people to work with.

However, when fixing the 200+ fuzzy messages, I still suggest doing this in two steps: one PR with the auto-generated update (essentially this PR) and one or more PRs which remove the fuzzy markers and translate new messages.

@jiyongp
Collaborator

jiyongp commented Aug 10, 2023

Publishing all translations at the same time doesn't sound right. Why can't we publish each language independently from each other?

Just to clarify: does the below process work?

  1. Run msgmerge --update po/xx.po po/messages.pot. Make this a commit.
  2. Edit the PO file to remove fuzzy markers. Make this a new commit.
  3. Continue editing the PO file to add translations for new messages. Make this yet another commit.
  4. Send a PR containing the three commits.

Step 1 is triggered manually by a human translator (e.g., me). Correct?

@mgeisler
Collaborator Author

> Publishing all translations at the same time doesn't sound right. Why can't we publish each language independently from each other?

We could do this, it would just complicate the publishing pipeline. GitHub pages are published by essentially uploading a zip file. We currently build this zip file using what we find in main at the time of publication.

To publish older versions of the translation, we would likely git checkout the commit where the translation was most recently touched and generate the HTML using those sources. The publish.yml file driving this would still be from the current main, and we would still be using mdbook, mdbook-i18n-helpers, and mdbook-exerciser from current main, so we would have to rely on those being able to deal with the old PO files. That's the kind of complexity I'm thinking of.

Alternatively, we could generate a release tarball per language and upload that to GitHub. We would then download these frozen tarballs at publication time. That solves the problem of new tools having to stay compatible with old sources, but it introduces a maintainability problem: if I fix a bug in book.js, I won't have an easy way to rebuild the old release artifacts with the new JavaScript code.

> Just to clarify: does the below process work?
>
>   1. Run msgmerge --update po/xx.po po/messages.pot. Make this a commit.
>   2. Edit the PO file to remove fuzzy markers. Make this a new commit.
>   3. Continue editing the PO file to add translations for new messages. Make this yet another commit.
>   4. Send a PR containing the three commits.
>
> Step 1 is triggered manually by a human translator (e.g., me). Correct?

Yes, that is the correct workflow. Should we document this better in TRANSLATIONS.md?

In step 2, removing the fuzzy markers means "look at the English text and update the translation to match". In the ideal case, the changes will be small and so the diff from msgmerge will be helpful! The diff will show something like

+#, fuzzy
 msgid ""
- "The course takes four days"
+ "The course takes three days"

and thus immediately tell you what to fix in the translation. This is an argument for letting translators do the updating (instead of a cron job).

However, this tactic only works if a) the translations are complete or nearly complete and b) the translators run this regularly. The PR here is showing the problematic case where a ton of updates go in at once and then the diff is no longer useful to anybody.
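To illustrate why a small diff is so helpful, here is a minimal Python sketch that reduces an old and a new msgid to just the words that changed. The function name changed_words is hypothetical, and the approach (word-level comparison with the standard difflib module) is an assumption for illustration, not something msgmerge does itself.

```python
import difflib


def changed_words(old_msgid: str, new_msgid: str) -> list[tuple[str, str]]:
    """Return (old, new) word runs that differ between two msgid strings."""
    old_words, new_words = old_msgid.split(), new_msgid.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Inserted or deleted runs show up with an empty string on one side.
            changes.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


print(changed_words("The course takes four days", "The course takes three days"))
# prints [('four', 'three')]
```

When hundreds of messages change in a single refresh, a list like this stops being a quick hint and turns into a review of the whole file, which is the problematic case described above.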

@jiyongp
Collaborator

jiyongp commented Aug 10, 2023

Sorry, I don't understand. Let's assume po/ko.po has no fuzzy or missing translation, but was last updated 1 month ago. Since then the English pages have been updated and you want to publish it. You build a release. That release has up-to-date English, and (still) 1 month old Korean. To the Korean readers, there will be no change.

Later on, I follow the four steps and update po/ko.po, leaving no fuzzy or missing translations. You build another release. This time, both English and Korean are up to date.

Can this be done?

@mgeisler
Collaborator Author

> Sorry, I don't understand. Let's assume po/ko.po has no fuzzy or missing translation, but was last updated 1 month ago. Since then the English pages have been updated and you want to publish it. You build a release. That release has up-to-date English, and (still) 1 month old Korean. To the Korean readers, there will be no change.

I think the missing piece here is that the Korean translation (the ko.po file) doesn't exist without the Markdown files. The translation system takes the (English) Markdown files as the starting point. Each paragraph is then looked up in the PO file and replaced with the translation. But the PO file on its own is not enough.
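The lookup can be pictured with a minimal Python sketch. This is a simplified model, not the actual mdbook-i18n-helpers code: the render function, the dictionary standing in for ko.po, and the sample strings are all assumptions made for illustration.

```python
def render(paragraphs: list[str], catalog: dict[str, str]) -> list[str]:
    """Replace each English paragraph with its translation when one exists."""
    return [catalog.get(paragraph) or paragraph for paragraph in paragraphs]


# Hypothetical catalog, last refreshed when the course took four days.
stale_catalog = {
    "Welcome": "환영합니다",
    "The course takes four days": "강의는 4일 동안 진행됩니다",
}

# Current English sources: the second paragraph has changed since then.
current_markdown = ["Welcome", "The course takes three days"]

print(render(current_markdown, stale_catalog))
# prints ['환영합니다', 'The course takes three days']
```

Because the changed English paragraph no longer matches any key in the catalog, it falls through untranslated. That is how a complete but stale PO file still yields a page that mixes Korean and English.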

To do what you suggest, we need to archive the translations when we publish them. We need the archive to be able to publish the site since the publish action overwrites any existing content previously published.

If we automatically publish a comprehensive-rust-XX.zip file when the translation changes, then the publication step would become something like this:

mdbook build  # gives us the English html
for lang in ko pt-BR; do
    wget https://github.com/google/comprehensive-rust/releases/latest/download/comprehensive-rust-$lang.zip
done

That would populate the working directory with the new English HTML and the HTML from all the translations (a separate GitHub action would generate these zip files).

The actual publication to GitHub Pages can then happen afterwards.

@jiyongp
Collaborator

jiyongp commented Aug 10, 2023

Aha, that's what I was missing. The markdown is like an app, and these po files are localized resources that the app uses. So, we can't build an up-to-date app showing English and 1 month-old app showing Korean at the same time.

It however is a bit odd since in our case the po files have most (or all?) information. Having to depend on the markdown feels unfortunate.

I feel like the rendered HTML pages in the translated languages should be stored in this git project (i.e. the final step of the translation is to generate the pages). Then the act of releasing the language will be just copying the HTML pages to the web server.

@mgeisler
Collaborator Author

> Aha, that's what I was missing. The markdown is like an app, and these po files are localized resources that the app uses. So, we can't build an up-to-date app showing English and 1 month-old app showing Korean at the same time.

Yes, that's the right analogy!

> It however is a bit odd since in our case the po files have most (or all?) information. Having to depend on the markdown feels unfortunate.

You are correct: today, the PO files happen to be lossless. However, I'm working on removing most of the Markdown from the files. I have an example in this mdbook-xgettext test. Notice how we can translate

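|        | Types   | Literals     |
|--------|---------|--------------|
| Arrays | [T; N]  | [20, 30, 40] |
| Tuples | (), ... | (), ('x',)   |

into

|        | TYPES   | LITERALS     |
|--------|---------|--------------|
| ARRAYS | [T; N]  | [20, 30, 40] |
| TUPLES | (), ... | (), ('x',)   |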

without having any | or - in the catalog (the PO file). I feel that will be a nice step forward and make the translations more maintainable.
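A minimal Python sketch of the idea, under stated assumptions: the table is a simple pipe table with no escaped | inside cells, and the helper names split_row, table_messages, and translate_table are hypothetical. The real tool works on parsed Markdown events rather than on raw lines, so this only models the outcome: translators see cell text, and the table structure stays in the source.

```python
TABLE = """\
|        | Types   | Literals     |
|--------|---------|--------------|
| Arrays | [T; N]  | [20, 30, 40] |"""


def split_row(row: str) -> list[str]:
    """Split one pipe-table row into its stripped cell texts."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def is_separator(cells: list[str]) -> bool:
    """True for the |---|---| row that only carries table structure."""
    return all(cell and set(cell) <= set("-:") for cell in cells)


def table_messages(table: str) -> list[str]:
    """Collect the cell texts that would go into the catalog."""
    messages = []
    for row in table.splitlines():
        cells = split_row(row)
        if not is_separator(cells):
            messages.extend(cell for cell in cells if cell)
    return messages


def translate_table(table: str, translate) -> str:
    """Rebuild the table, translating cell text and keeping the structure."""
    out = []
    for row in table.splitlines():
        cells = split_row(row)
        if is_separator(cells):
            out.append(row)  # structure only, never shown to translators
        else:
            out.append("| " + " | ".join(translate(c) if c else c for c in cells) + " |")
    return "\n".join(out)


print(table_messages(TABLE))
print(translate_table(TABLE, str.upper))
```

Using str.upper as the stand-in translation mirrors the upper-cased table in the example above.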

> I feel like the rendered HTML pages in the translated languages should be stored in this git project (i.e. the final step of the translation is to generate the pages). Then the act of releasing the language will be just copying the HTML pages to the web server.

Precisely, I think we can do exactly that. I think many people are surprised by the current system, and trading stale information for a more complete translation seems good. We write on each page when it was last updated, and we could possibly put a bit of JavaScript code on the page to detect if the English page is newer.
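The staleness check itself is a simple date comparison. The sketch below shows the logic in Python rather than in-page JavaScript. The stale_pages function, the page names, and the dates are all hypothetical, and it assumes a last-updated date is available for both the English and the translated version of each page.

```python
from datetime import date


def stale_pages(english_updated: dict[str, date],
                translation_updated: dict[str, date]) -> list[str]:
    """Return pages whose English version is newer than the translation."""
    return sorted(
        page
        for page, english_date in english_updated.items()
        # A page missing from the translation counts as stale.
        if translation_updated.get(page, date.min) < english_date
    )


# Made-up example data.
english = {
    "welcome.html": date(2023, 8, 1),
    "course-structure.html": date(2023, 8, 9),
}
korean = {
    "welcome.html": date(2023, 8, 1),
    "course-structure.html": date(2023, 7, 9),
}

print(stale_pages(english, korean))
# prints ['course-structure.html']
```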

@mgeisler
Collaborator Author

Let me close this for now since the update will happen together with the new translations.

@mgeisler mgeisler closed this Aug 11, 2023
auto-merge was automatically disabled August 11, 2023 14:29

Pull request was closed

@mgeisler mgeisler deleted the ko-refresh branch August 24, 2023 10:04