Back translation transformation #534

cogeid · 2021-10-01T09:37:43Z

What does this PR do?

Summary

This PR adds a back-translation transformation, which uses the MarianMT model to translate input texts into a target language and back.

Additions

Added a sentence-level transformation module within the transformations module
Added the back-translation transformation

Here is a snippet of code demonstrating back-translation for text augmentation:
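
As a minimal sketch of the idea (not the code from this PR), assuming the Hugging Face `transformers` classes `MarianMTModel` and `MarianTokenizer` and the `Helsinki-NLP/opus-mt-en-es` / `Helsinki-NLP/opus-mt-es-en` checkpoints. The helper names `load_marian_translator` and `back_translate` are hypothetical:

```python
from typing import Callable, List

# A translator maps a batch of sentences to a batch of translated sentences.
Translator = Callable[[List[str]], List[str]]


def load_marian_translator(model_name: str) -> Translator:
    """Build a batch translator from a MarianMT checkpoint.

    The checkpoint name is an assumption of this sketch; weights are
    downloaded on first use.
    """
    # Imported lazily so the rest of the sketch runs without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(texts: List[str]) -> List[str]:
        batch = tokenizer(texts, return_tensors="pt", padding=True)
        generated = model.generate(**batch)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate


def back_translate(
    texts: List[str], to_target: Translator, to_source: Translator
) -> List[str]:
    """Translate into the target language, then back into the source language."""
    return to_source(to_target(texts))
```

Calling `back_translate(["What on earth are you doing?"], load_marian_translator("Helsinki-NLP/opus-mt-en-es"), load_marian_translator("Helsinki-NLP/opus-mt-es-en"))` would run an English to Spanish to English round trip. The result is usually a paraphrase of the input rather than an exact copy, which is what makes it useful for augmentation.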

Hanyu-Liu-123

Looks good to me! I've also added an option called chained_back_translation that allows users to repeatedly translate between the target language and source language given a language model. @cogeid
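
A rough sketch of what chaining could look like, assuming each round goes from the source language to one pivot language and back. The function name, the `translate(text, src_lang, tgt_lang)` signature, and the `pivot_langs` parameter are hypothetical and not taken from the PR:

```python
from typing import Callable, List

# Hypothetical signature: translate(text, src_lang, tgt_lang) -> translated text.
TranslateFn = Callable[[str, str, str], str]


def chained_back_translate(
    text: str,
    translate: TranslateFn,
    source_lang: str,
    pivot_langs: List[str],
) -> str:
    """Run one source -> pivot -> source round trip per pivot language."""
    current = text
    for pivot in pivot_langs:
        current = translate(current, source_lang, pivot)  # forward pass
        current = translate(current, pivot, source_lang)  # back to the source
    return current
```

Each extra round tends to move the text further from the original wording, so the length of `pivot_langs` acts as a knob for how aggressive the augmentation is.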

qiyanjun · 2021-10-12T21:21:55Z

@cogeid can you check why some tests failed?

qiyanjun · 2021-10-14T20:37:33Z

@cogeid

=================================== FAILURES ===================================
_ test_command_line_list[list_augmentation_recipes-textattack list augmentation-recipes-tests/sample_outputs/list_augmentation_recipes.txt] _

name = 'list_augmentation_recipes'
command = 'textattack list augmentation-recipes'
sample_output_file = 'tests/sample_outputs/list_augmentation_recipes.txt'

    @pytest.mark.parametrize("name, command, sample_output_file", list_test_params)
    def test_command_line_list(name, command, sample_output_file):
        desired_text = open(sample_output_file).read().strip()

        # Run command and validate outputs.
        result = run_command_and_get_result(command)

        assert result.stdout is not None
        assert result.stderr is not None

        stdout = result.stdout.decode().strip()
        print("stdout =>", stdout)
        stderr = result.stderr.decode().strip()
        print("stderr =>", stderr)

>       assert stdout == desired_text
E       AssertionError: assert '\x1b[94mback...NetAugmenter)' == '\x1b[94mchar...NetAugmenter)'
E         + back_trans (textattack.augmentation.BackTranslationAugmenter)
E           charswap (textattack.augmentation.CharSwapAugmenter)
E           checklist (textattack.augmentation.CheckListAugmenter)
E           clare (textattack.augmentation.CLAREAugmenter)
E           eda (textattack.augmentation.EasyDataAugmenter)
E           embedding (textattack.augmentation.EmbeddingAugmenter)
E           wordnet (textattack.augmentation.WordNetAugmenter)

tests/test_command_line/test_list.py:28: AssertionError
----------------------------- Captured stdout call -----------------------------
stdout => back_trans (textattack.augmentation.BackTranslationAugmenter)
charswap (textattack.augmentation.CharSwapAugmenter)
checklist (textattack.augmentation.CheckListAugmenter)
clare (textattack.augmentation.CLAREAugmenter)
eda (textattack.augmentation.EasyDataAugmenter)
embedding (textattack.augmentation.EmbeddingAugmenter)
wordnet (textattack.augmentation.WordNetAugmenter)
stderr => 2021-10-14 02:03:42.569883: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.6.15/x64/lib
2021-10-14 02:03:42.569927: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

cogeid · 2021-10-14T20:50:13Z

I have been working on a fix. This seems like a test issue rather than a bug in the code.
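
The pytest diff in the failure log suggests the stored sample output simply lacks the new `back_trans` line. A small sketch with `difflib` reproduces that comparison on the recipe names from the log. Treating an update of `tests/sample_outputs/list_augmentation_recipes.txt` as the fix is an assumption here, since the later "fix pytest" commits are not shown in this thread:

```python
import difflib
from typing import List


def diff_outputs(expected: str, actual: str) -> List[str]:
    """Return only the added ('+ ') and removed ('- ') lines between two outputs."""
    delta = difflib.ndiff(expected.splitlines(), actual.splitlines())
    return [line for line in delta if line.startswith(("+ ", "- "))]


# Recipe list as stored in the sample output file before this PR (per the log).
expected = "\n".join(
    [
        "charswap (textattack.augmentation.CharSwapAugmenter)",
        "checklist (textattack.augmentation.CheckListAugmenter)",
        "clare (textattack.augmentation.CLAREAugmenter)",
        "eda (textattack.augmentation.EasyDataAugmenter)",
        "embedding (textattack.augmentation.EmbeddingAugmenter)",
        "wordnet (textattack.augmentation.WordNetAugmenter)",
    ]
)

# Output of the command once the new recipe is registered.
actual = "back_trans (textattack.augmentation.BackTranslationAugmenter)\n" + expected

print(diff_outputs(expected, actual))
```

The only difference is the added `back_trans` line, which is consistent with the command working as intended and the expected output being stale.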

donggrant · 2021-10-14T22:08:21Z

I also tried testing these changes earlier and have been getting internal errors in PyTest. Sometimes the test also freezes in the middle or slows down. I've attached the output of one run below. Not sure if this behavior is normal.

(textattackenv) root@DG9XW4Q2:~/cs_research/TextAttack# make test
python -m pytest --dist=loadfile -n auto
============================================================================== test session starts ===============================================================================
platform linux -- Python 3.7.11, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /root/cs_research/TextAttack, configfile: pytest.ini, testpaths: tests
plugins: xdist-2.4.0, forked-1.3.0
gw0 [58] / gw1 [58] / gw2 [58] / gw3 [58]
.............................[gw1] node down: Not properly terminated
F
replacing crashed worker gw1
gw0 [58] / gw4 ok / gw2 [58] / gw3 [58].[gw2] node down: Not properly terminated

replacing crashed worker gw2
gw0 [58] / gw4 [58] / gw5 [58] / gw3 [58][gw4] node down: Not properly terminated
F
replacing crashed worker gw4
gw0 [58] / gw6 [58] / gw5 [58] / gw3 [58][gw5] node down: Not properly terminated
F
replacing crashed worker gw5
gw0 [58] / gw6 [58] / gw7 [58] / gw3 [58]F...........

qiyanjun · 2021-10-15T15:05:26Z

@cogeid this looks ready for merge!

Last question: why is the default lang for the translate function not "en"?

cogeid · 2021-10-15T16:22:50Z

The language parameter in the translate function is the target language. The function assumes that the "input" parameter is in English, so by default it translates to Spanish and then back to English.
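
To illustrate why a default of "en" would not make sense for a target-language parameter, here is a small sketch that derives a forward and backward checkpoint name from the target language. The helper name `marian_model_names` and the `Helsinki-NLP/opus-mt-{src}-{tgt}` naming pattern are assumptions for illustration, and not every language pair necessarily has a published checkpoint:

```python
from typing import Tuple


def marian_model_names(lang: str = "es", source: str = "en") -> Tuple[str, str]:
    """Return (forward, backward) checkpoint names for a round trip.

    `lang` is the target language; the input text is assumed to be in `source`.
    """
    if lang == source:
        # Translating English to English would leave the text unchanged.
        raise ValueError("target language must differ from the source language")
    forward = f"Helsinki-NLP/opus-mt-{source}-{lang}"
    backward = f"Helsinki-NLP/opus-mt-{lang}-{source}"
    return forward, backward


print(marian_model_names())
```

With the default arguments this selects an English to Spanish model for the forward pass and a Spanish to English model for the return pass.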

qiyanjun · 2021-10-15T17:25:31Z

@cogeid makes sense. Ready to merge!

diegoc added 2 commits October 1, 2021 05:33

Add sentence level transformation: back-translation

8cf9dd4

update format

bef0501

cogeid requested a review from Hanyu-Liu-123 October 1, 2021 09:38

Add test and chained_back_translation

6045f5c

Hanyu-Liu-123 approved these changes Oct 5, 2021

Hanyu-Liu-123 added 2 commits October 4, 2021 23:14

Update back_translation.py

e2c3d04

Add Back Translation Augmentation Recipe

316bc0e

diegoc added 2 commits October 15, 2021 01:15

fix pytest

ebeb3df

pytest fix 2

830de53

qiyanjun merged commit c5c10f5 into master Oct 15, 2021

jxmorris12 deleted the back-translation-transformation branch October 17, 2021 16:35

agver0 mentioned this pull request Nov 5, 2021

Fixed the pytest assertion error #565

Closed

7 tasks
