Segment context #36

Closed
wants to merge 25 commits into from
25 commits
f4134bb
Merge Pontoon integration
kaedroho Dec 17, 2019
dbfc9e7
Add some test for git repository writer
kaedroho Dec 18, 2019
d9417db
Add base class for extracted values
kaedroho Dec 18, 2019
b1a69c4
Don't extract ManyToOneRels that don't come from ParentalKey
kaedroho Dec 18, 2019
d0ba740
Merge branch 'segments-refactor' into versioning-base
kaedroho Jan 2, 2020
2ce3031
Merge remote-tracking branch 'origin/merge-pontoon' into versioning-base
kaedroho Jan 2, 2020
ba45280
Added generic version models
kaedroho Dec 16, 2019
4a01d17
New location models
kaedroho Dec 16, 2019
50eb212
Add migration to new location models
kaedroho Dec 18, 2019
10d1c9a
Pontoon: Rename revision => page_revison
kaedroho Dec 18, 2019
d336a54
Pontoon: Add new revision fields
kaedroho Dec 18, 2019
9aacee8
Remove paragraph indices from content paths
kaedroho Dec 3, 2019
fa91773
Pontoon: Update logic to use new revision fields
kaedroho Dec 18, 2019
461e04d
Pontoon: Remove old revision fields
kaedroho Dec 18, 2019
ae53759
Pontoon: Change PontoonResource primary key
kaedroho Dec 18, 2019
4b5ec73
Update translation memory utils to use new location models
kaedroho Dec 20, 2019
129d472
Delete old location models
kaedroho Dec 20, 2019
30b6e8e
Implement serialization of non-page objects into translatable revision
kaedroho Dec 20, 2019
8784c7d
Moved translation code into translation memory
kaedroho Dec 28, 2019
7f72808
Treat related objects as separate resources
kaedroho Dec 23, 2019
7492719
Update pofile translation engine to support related objects
kaedroho Jan 1, 2020
f4c38ce
Update google translate translation engine to support related objects
kaedroho Jan 2, 2020
3f46017
Merge branch 'versioning' into segment-context-base
kaedroho Jan 2, 2020
2fe3a4f
Merge branch 'content-path-stability' into segment-context-base
kaedroho Jan 2, 2020
f3163c4
Implement segment context
kaedroho Jan 2, 2020
79 changes: 79 additions & 0 deletions docs/pontoon.md
@@ -0,0 +1,79 @@
# wagtail-localize-pontoon

Use [Pontoon](https://pontoon.mozilla.org/) as a translation engine for [wagtail-localize](https://github.com/kaedroho/wagtail-localize).

**Note: This project will be merged into `wagtail-localize` soon.**

## Installation

Install both `wagtail-localize` and `wagtail-localize-pontoon`, then add the following to your `INSTALLED_APPS`:

```python
INSTALLED_APPS = [
...
'wagtail_localize',
'wagtail_localize.translation_memory',
'wagtail_localize_pontoon',
...
]
```

Then set the following settings:

- `WAGTAILLOCALIZE_PONTOON_GIT_URL`: the URL of an empty git repository where `wagtail-localize-pontoon` will push source strings and Pontoon will push back translations.
- `WAGTAILLOCALIZE_PONTOON_GIT_CLONE_DIR`: the local directory where the git repository will be checked out.
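
As an illustration, the two settings might look like this in a Django settings module. Both values are hypothetical placeholders; substitute your own repository URL and a directory the site process can write to.

```python
# settings.py (fragment). Both values are hypothetical placeholders.

# An empty repository that both this site and Pontoon can push to
WAGTAILLOCALIZE_PONTOON_GIT_URL = "git@github.com:example-org/example-site-strings.git"

# Where the site keeps its local checkout of that repository
WAGTAILLOCALIZE_PONTOON_GIT_CLONE_DIR = "/var/lib/example-site/pontoon-repo"
```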

## Configuring page types

Any page types that need to be translatable must inherit from `TranslatablePageMixin` and have `translated_fields` set to a list of field
names that are translatable.

To migrate existing page types to be translatable, use the `BootstrapTranslatableMixin` class from `wagtail-localize` to help create the migrations. See the docstring on that class for details.

## Running initial sync

When adding `wagtail-localize-pontoon` to an existing site, you first need to submit any existing content to Pontoon manually.

First, run the `sync_languages` command. This creates `Language` objects for all of the languages defined in your `LANGUAGES` setting.

Then run the `submit_whole_site_to_pontoon` management command. This generates submissions for all live translatable pages on the site.

Finally, run the `sync_pontoon` management command. This pushes the source strings to Pontoon.

## How it works

This relies heavily on `wagtail-localize`'s `translation_memory` module to track which source strings need to be translated in order to create/update a translated version of a page.

### Creating submissions

Pages are submitted to Pontoon when the English (US) version of any translatable page is published. Nothing is uploaded to git at that point (that happens the next time the `sync_pontoon` management command runs), but the following happens when the page is published:

- All translatable segments are extracted from the page and saved into the `translation_memory.Segment` model. This model holds unique source strings; the locations where these strings appear on actual pages are stored in the `translation_memory.SegmentPageLocation` model.
- A `wagtail_localize_pontoon.PontoonResourceSubmission` is created to note which page revision needs to be submitted to Pontoon.
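
The split between unique source strings and the places they appear can be pictured with a small in-memory model. This is a sketch only: `TranslationMemory`, `record_page()` and the tuple layout of `locations` are hypothetical stand-ins for the `Segment` and `SegmentPageLocation` tables, not their real fields.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationMemory:
    """Hypothetical in-memory stand-in for Segment / SegmentPageLocation."""

    # source text -> segment id; each unique string is stored only once
    segments: dict = field(default_factory=dict)
    # (page_revision_id, path, segment_id) rows: where each string appears
    locations: list = field(default_factory=list)

    def get_or_create_segment(self, text):
        if text not in self.segments:
            self.segments[text] = len(self.segments) + 1
        return self.segments[text]

    def record_page(self, page_revision_id, extracted):
        """`extracted` is a list of (path, text) pairs from one page revision."""
        for path, text in extracted:
            segment_id = self.get_or_create_segment(text)
            self.locations.append((page_revision_id, path, segment_id))


tm = TranslationMemory()
tm.record_page(1, [("title", "Welcome"), ("body.0", "Read more")])
tm.record_page(2, [("title", "About us"), ("body.0", "Read more")])
print(len(tm.segments), len(tm.locations))  # → 3 4
```

Because `"Read more"` appears on both pages, it is stored once and referenced by two location rows, so a single translation of it can serve both pages.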

Note: The `submit_whole_site_to_pontoon` command runs this process for all live translatable pages.

### Pushing source strings to Pontoon

The `sync_pontoon` command first fetches the git repo, then checks for and imports any translated strings. This is covered in the next section.

After any new translations are ingested, it rewrites the source `.pot` file and all of the locale `.po` files based on the records in `PontoonResourceSubmission` and the segments/translations in translation memory.

Note: All source strings are added into both the source `.pot` file and each locale-specific `.po` file, because Pontoon will not send back translations unless the source strings exist in both places.
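
The rule that every source string appears in both the template and each locale file can be sketched with a tiny PO renderer. The helper names `po_escape()` and `render_po()` are hypothetical, and the output omits the header entry a real PO file carries; `polib`, which the `pontoon` extra already depends on, is the more robust choice for producing real files.

```python
def po_escape(text):
    """Escape a string for use inside a double-quoted PO field."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_po(source_strings, translations=None):
    """Render a .pot (translations=None) or a locale .po file.

    Every source string becomes an entry in both cases, so the locale file
    mirrors the template; untranslated entries get an empty msgstr.
    """
    translations = translations or {}
    blocks = []
    for msgid in source_strings:
        msgstr = translations.get(msgid, "")
        blocks.append(f'msgid "{po_escape(msgid)}"\nmsgstr "{po_escape(msgstr)}"\n')
    return "\n".join(blocks)


sources = ["Welcome", 'Say "hi"']
pot = render_po(sources)
fr_po = render_po(sources, {"Welcome": "Bienvenue"})
print(fr_po)
```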

If a segment is no longer used on a page, it is removed from the source `.pot` file. If a translation already existed for that string, it may be left in the locale-specific `.po` files, but it will be flagged as obsolete.
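
The obsolete flag corresponds to the PO format's `#~ ` prefix for obsolete entries. A minimal sketch, assuming translations are held in a plain `dict` mapping msgid to msgstr; `render_locale_po()` is a hypothetical helper, not the project's writer.

```python
def po_escape(text):
    """Escape a string for use inside a double-quoted PO field."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_locale_po(current_sources, translations):
    """Render a locale .po file.

    Strings still in use are written as normal entries. Strings dropped from
    the source that already have a translation are kept as obsolete entries
    (prefixed with "#~ "), so the existing translation is not thrown away.
    """
    lines = []
    for msgid in current_sources:
        msgstr = translations.get(msgid, "")
        lines += [f'msgid "{po_escape(msgid)}"', f'msgstr "{po_escape(msgstr)}"', ""]
    for msgid, msgstr in translations.items():
        if msgid not in current_sources and msgstr:
            lines += [f'#~ msgid "{po_escape(msgid)}"', f'#~ msgstr "{po_escape(msgstr)}"', ""]
    return "\n".join(lines)


translations = {"Welcome": "Bienvenue", "Old tagline": "Ancien slogan"}
print(render_locale_po(["Welcome", "Contact"], translations))
```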

### Pulling translations from Pontoon

At the beginning of the `sync_pontoon` command, the git repo is fetched and if there are any changes, a diff is performed between the new remote `HEAD` and the local `HEAD`.

If any of the locale PO files have been modified, they will be parsed and any new/changed translations saved in the `translation_memory.SegmentTranslation` model.
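
Picking out which locale files to import from the diff can be sketched as a filter over the changed paths. This assumes a hypothetical `locales/<locale>/<resource>.po` layout in the git repository and takes the changed paths as plain strings (for instance, collected from a GitPython diff, GitPython being one of the `pontoon` extra's dependencies); the real repository layout may differ.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def changed_locale_files(changed_paths, locales_dir="locales"):
    """Group changed locale .po files by locale code.

    Assumes a hypothetical layout of <locales_dir>/<locale>/<resource>.po.
    Anything else (source templates, unrelated files) is ignored.
    """
    by_locale = defaultdict(list)
    for raw in changed_paths:
        path = PurePosixPath(raw)
        parts = path.parts
        if len(parts) == 3 and parts[0] == locales_dir and path.suffix == ".po":
            by_locale[parts[1]].append(path.name)
    return dict(by_locale)


# Paths as they might come out of a diff between local HEAD and the new remote HEAD
diff_paths = [
    "templates/home.pot",
    "locales/fr/home.po",
    "locales/de/home.po",
    "locales/fr/about.po",
    "README.md",
]
print(changed_locale_files(diff_paths))
```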

After a locale PO file is imported, the translation progress of the associated page is checked by querying the `translation_memory.{Segment,SegmentTranslation}` models. If the page is ready to be translated, the translated version of the page is created or updated and then published.
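
The progress check amounts to asking whether any of the page's segments still lack a translation in the target locale. A minimal in-memory sketch, assuming "ready" means every segment has a translation; `page_is_ready()` and the set of `(segment_id, locale)` pairs standing in for `SegmentTranslation` are hypothetical, and the real query may use a different criterion.

```python
def page_is_ready(segment_ids, translated, locale):
    """Return True if every segment used on the page is translated into `locale`.

    segment_ids: ids of the segments extracted from the source page revision.
    translated: set of (segment_id, locale) pairs, a hypothetical stand-in
    for rows in the SegmentTranslation table.
    """
    return all((segment_id, locale) in translated for segment_id in segment_ids)


translated = {(1, "fr"), (2, "fr"), (1, "de")}
print(page_is_ready([1, 2], translated, "fr"))  # → True
print(page_is_ready([1, 2], translated, "de"))  # → False
```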

If a page is ready to be translated but its parent has not been translated into the target language, the translation is delayed until the parent is translated.
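
The delay rule can be sketched as repeated passes over the ready pages, publishing any page whose parent already exists in the target language and deferring the rest. `publish_order()` and its arguments are hypothetical illustrations of the rule, not the command's actual code.

```python
def publish_order(ready_pages, parent_of, already_translated):
    """Return (published, deferred) page ids for one sync run.

    ready_pages: ids of pages whose segments are all translated.
    parent_of: maps page id -> parent page id (None for a root page).
    already_translated: ids of pages that already exist in the target locale.
    """
    translated = set(already_translated)
    pending = list(ready_pages)
    published = []
    progress = True
    # Keep passing over the pending pages while at least one gets published,
    # because publishing a parent can unblock its children.
    while pending and progress:
        progress = False
        for page in list(pending):
            parent = parent_of.get(page)
            if parent is None or parent in translated:
                translated.add(page)
                published.append(page)
                pending.remove(page)
                progress = True
    return published, pending


parent_of = {"home": None, "blog": "home", "post": "blog", "team": "about"}
print(publish_order(["post", "blog", "team"], parent_of, {"home"}))
# → (['blog', 'post'], ['team'])
```

Here `team` stays deferred because its parent `about` has not been translated yet.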

### Caveats

- Any edits on translated pages will be overwritten if the original page is updated and translated again.
- If a page is translated but one of the strings is later edited in Pontoon, the new version of the string will not be pulled through automatically. The original page must be re-submitted to Pontoon first.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -13,3 +13,4 @@ nav:
- Getting Started:
- Setup: setup.md
- Tutorial: tutorial.md
- How to Guides: pontoon.md
9 changes: 9 additions & 0 deletions setup.py
@@ -42,6 +42,15 @@
'testing': [
'psycopg2>=2.6',
],
'pontoon': [
'polib>=1.1,<2.0',
'pygit2>=0.28,<0.29',
'gitpython>=3.0,<4.0',
'toml>=0.10,<0.11',
],
'google_translate': [
'googletrans>=2.4,<3.0',
],
},
zip_safe=False,
)
2 changes: 1 addition & 1 deletion tox.ini
@@ -25,7 +25,7 @@ envlist = py{37}-dj{22,master}-wa{26,27}-{postgres}
ignore = D100,D101,D102,D103,D105,D200,D202,D204,D205,D209,D400,D401,E303,E501,W503,N805,N806

[testenv]
install_command = pip install -e ".[testing]" -U {opts} {packages}
install_command = pip install -e ".[testing,pontoon,google_translate]" -U {opts} {packages}
commands = coverage run testmanage.py test

basepython =
204 changes: 111 additions & 93 deletions wagtail_localize/segments/__init__.py
@@ -1,12 +1,69 @@
from collections import Counter

from django.contrib.contenttypes.models import ContentType
from django.forms.utils import flatatt
from django.utils.html import escape

from .html import extract_html_elements, restore_html_elements


class SegmentValue:
class BaseValue:
def __init__(self, path, order=0):
self.path = path
self.order = order

def clone(self):
"""
Clones this segment. Must be overridden in subclass.
"""
raise NotImplementedError

def with_order(self, order):
"""
Sets the order of this segment.
"""
clone = self.clone()
clone.order = order
return clone

def wrap(self, base_path):
"""
Appends a component to the beginning of the path.

For example:

>>> s = SegmentValue('field', 'foo')
>>> s.wrap('wrapped')
SegmentValue('wrapped.field', 'foo')
"""
new_path = base_path

if self.path:
new_path += "." + self.path

clone = self.clone()
clone.path = new_path
return clone

def unwrap(self):
"""
Pops a component from the beginning of the path. Reversing .wrap().

For example:

>>> s = SegmentValue('wrapped.field', 'foo')
>>> s.unwrap()
'wrapped', SegmentValue('field', 'foo')
"""
first_component, *remaining_components = self.path.split(".")
new_path = ".".join(remaining_components)

clone = self.clone()
clone.path = new_path
return first_component, clone


class SegmentValue(BaseValue):
class HTMLElement:
"""
Represents the position of an inline element within an HTML segment value.
@@ -48,12 +105,17 @@ def __eq__(self, other):
def __repr__(self):
return f"<SegmentValue.HTMLElement {self.identifier} '{self.element_tag}' at [{self.start}:{self.end}]>"

def __init__(self, path, text, html_elements=None, order=0):
self.path = path
self.order = order
def __init__(self, path, text, html_elements=None, **kwargs):
self.text = text
self.html_elements = html_elements

super().__init__(path, **kwargs)

def clone(self):
return SegmentValue(
self.path, self.text, html_elements=self.html_elements, order=self.order
)

@classmethod
def from_html(cls, path, html):
text, elements = extract_html_elements(html)
@@ -70,46 +132,6 @@ def from_html(cls, path, html):

return cls(path, text, html_elements)

def with_order(self, order):
"""
Sets the order of this segment.
"""
return SegmentValue(self.path, self.text, self.html_elements, order=order)

def wrap(self, base_path):
"""
Appends a component to the beginning of the path.

For example:

>>> s = SegmentValue('field', "The text")
>>> s.wrap('relation')
SegmentValue('relation.field', "The text")
"""
new_path = base_path

if self.path:
new_path += "." + self.path

return SegmentValue(new_path, self.text, self.html_elements, order=self.order)

def unwrap(self):
"""
Pops a component from the beginning of the path. Reversing .wrap().

For example:

>>> s = SegmentValue('relation.field', "The text")
>>> s.unwrap()
'relation', SegmentValue('field', "The text")
"""
base_path, *remaining_components = self.path.split(".")
new_path = ".".join(remaining_components)
return (
base_path,
SegmentValue(new_path, self.text, self.html_elements, order=self.order),
)

@property
def html(self):
if not self.html_elements:
@@ -200,77 +222,73 @@ def __repr__(self):
return '<SegmentValue {} "{}">'.format(self.path, self.html)


class TemplateValue:
def __init__(self, path, format, template, segment_count, order=0):
self.path = path
self.order = order
class TemplateValue(BaseValue):
def __init__(self, path, format, template, segment_count, **kwargs):
self.format = format
self.template = template
self.segment_count = segment_count

def with_order(self, order):
"""
Sets the order of this segment.
"""
super().__init__(path, **kwargs)

def clone(self):
return TemplateValue(
self.path, self.format, self.template, self.segment_count, order=order
self.path, self.format, self.template, self.segment_count, order=self.order
)

def wrap(self, base_path):
"""
Appends a component to the beginning of the path.
def is_empty(self):
return self.template in ["", None]

For example:
def __eq__(self, other):
return (
isinstance(other, TemplateValue)
and self.path == other.path
and self.format == other.format
and self.template == other.template
and self.segment_count == other.segment_count
)

>>> s = TemplateValue('field', 'html', "<text position=\"0\">, 1)
>>> s.wrap('relation')
TemplateValue('relation.field', 'html', "<text position=\"0\">, 1)
"""
new_path = base_path
def __repr__(self):
return "<TemplateValue {} format:{} {} segments>".format(
self.path, self.format, self.segment_count
)

if self.path:
new_path += "." + self.path

return TemplateValue(
new_path, self.format, self.template, self.segment_count, order=self.order
)
class RelatedObjectValue(BaseValue):
def __init__(self, path, content_type, translation_key, **kwargs):
self.content_type = content_type
self.translation_key = translation_key

def unwrap(self):
"""
Pops a component from the beginning of the path. Reversing .wrap().
super().__init__(path, **kwargs)

For example:
@classmethod
def from_instance(cls, path, instance):
model = instance.get_translation_model()
return cls(
path, ContentType.objects.get_for_model(model), instance.translation_key
)

>>> s = TemplateValue('relation.field', 'html', "<text position=\"0\">, 1)
>>> s.unwrap()
'relation', TemplateValue('field', 'html', "<text position=\"0\">, 1)
"""
base_path, *remaining_components = self.path.split(".")
new_path = ".".join(remaining_components)
return (
base_path,
TemplateValue(
new_path,
self.format,
self.template,
self.segment_count,
order=self.order,
),
def get_instance(self, locale):
return self.content_type.get_object_for_this_type(
translation_key=self.translation_key, locale=locale
)

def clone(self):
return RelatedObjectValue(
self.path, self.content_type, self.translation_key, order=self.order
)

def is_empty(self):
return self.template in ["", None]
return self.content_type is None and self.translation_key is None

def __eq__(self, other):
return (
isinstance(other, TemplateValue)
isinstance(other, RelatedObjectValue)
and self.path == other.path
and self.format == other.format
and self.template == other.template
and self.segment_count == other.segment_count
and self.content_type == other.content_type
and self.translation_key == other.translation_key
)

def __repr__(self):
return "<TemplateValue {} format:{} {} segments>".format(
self.path, self.format, self.segment_count
return "<RelatedObjectValue {} {} {}>".format(
self.path, self.content_type, self.translation_key
)
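
The refactor above moves `with_order()`, `wrap()` and `unwrap()` into `BaseValue`, implemented on top of `clone()`, so each subclass (`SegmentValue`, `TemplateValue`, `RelatedObjectValue`) only has to override `clone()`. The standalone sketch below mirrors that pattern with a hypothetical `TextValue` subclass so it runs without Django; it is a simplified illustration, not the module itself.

```python
class BaseValue:
    def __init__(self, path, order=0):
        self.path = path
        self.order = order

    def clone(self):
        # Subclasses must return a copy of themselves
        raise NotImplementedError

    def with_order(self, order):
        clone = self.clone()
        clone.order = order
        return clone

    def wrap(self, base_path):
        # Prepend base_path as the first component of the path
        clone = self.clone()
        clone.path = base_path + "." + self.path if self.path else base_path
        return clone

    def unwrap(self):
        # Split off the first path component (the reverse of wrap())
        first_component, *remaining = self.path.split(".")
        clone = self.clone()
        clone.path = ".".join(remaining)
        return first_component, clone


class TextValue(BaseValue):
    """Hypothetical minimal subclass: only clone() needs to be written."""

    def __init__(self, path, text, **kwargs):
        self.text = text
        super().__init__(path, **kwargs)

    def clone(self):
        return TextValue(self.path, self.text, order=self.order)

    def __repr__(self):
        return f"TextValue({self.path!r}, {self.text!r}, order={self.order})"


value = TextValue("field", "foo")
wrapped = value.wrap("relation").with_order(3)
print(wrapped)  # → TextValue('relation.field', 'foo', order=3)
print(wrapped.unwrap())  # → ('relation', TextValue('field', 'foo', order=3))
```

Because every operation goes through `clone()`, the original `value` is never mutated, and a new value type picks up all three path/order helpers for free.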