Back translation transformation #534
Looks good to me! I've also added an option called chained_back_translation that lets users repeatedly translate between the target and source languages using a given language model. @cogeid
@cogeid, can you check why some tests failed?
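As a rough illustration of what "repeatedly translate between the target language and source language" means, here is a minimal sketch. It is not the PR's code: the function name back_translate_chained, the Translator signature (texts, src_lang, tgt_lang) and the rounds parameter are all hypothetical, and a toy translator that only tags each hop stands in for a real model so the chaining order is visible.

```python
from typing import Callable, List

# Hypothetical translator interface for this sketch: (texts, src_lang, tgt_lang) -> texts.
Translator = Callable[[List[str], str, str], List[str]]


def back_translate_chained(
    texts: List[str],
    translate: Translator,
    src_lang: str = "en",
    tgt_lang: str = "es",
    rounds: int = 1,
) -> List[str]:
    """Run src -> tgt -> src back-translation `rounds` times in a row,
    feeding each round's output into the next round."""
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    current = list(texts)
    for _ in range(rounds):
        current = translate(current, src_lang, tgt_lang)  # into the target language
        current = translate(current, tgt_lang, src_lang)  # back to the source language
    return current


# Toy translator that only records each hop, so no model is needed.
def tagging_translator(texts: List[str], src: str, tgt: str) -> List[str]:
    return [f"{t}|{src}>{tgt}" for t in texts]


print(back_translate_chained(["hi"], tagging_translator, rounds=2))
# ['hi|en>es|es>en|en>es|es>en']
```

Each extra round pushes the text through the translation model again, which tends to drift further from the original wording; whether the real option returns only the final text or also the intermediate rounds is not stated in this thread.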
=================================== FAILURES ===================================
_ test_command_line_list[list_augmentation_recipes-textattack list augmentation-recipes-tests/sample_outputs/list_augmentation_recipes.txt] _

name = 'list_augmentation_recipes'
command = 'textattack list augmentation-recipes'
sample_output_file = 'tests/sample_outputs/list_augmentation_recipes.txt'

    @pytest.mark.parametrize("name, command, sample_output_file", list_test_params)
    def test_command_line_list(name, command, sample_output_file):
        desired_text = open(sample_output_file).read().strip()

        # Run command and validate outputs.
        result = run_command_and_get_result(command)

        assert result.stdout is not None
        assert result.stderr is not None

        stdout = result.stdout.decode().strip()
        print("stdout =>", stdout)
        stderr = result.stderr.decode().strip()
        print("stderr =>", stderr)

>       assert stdout == desired_text
E       AssertionError: assert '\x1b[94mback...NetAugmenter)' == '\x1b[94mchar...NetAugmenter)'
E         + back_trans (textattack.augmentation.BackTranslationAugmenter)
E           charswap (textattack.augmentation.CharSwapAugmenter)
E           checklist (textattack.augmentation.CheckListAugmenter)
E           clare (textattack.augmentation.CLAREAugmenter)
E           eda (textattack.augmentation.EasyDataAugmenter)
E           embedding (textattack.augmentation.EmbeddingAugmenter)
E           wordnet (textattack.augmentation.WordNetAugmenter)

tests/test_command_line/test_list.py:28: AssertionError
----------------------------- Captured stdout call -----------------------------
stdout => back_trans (textattack.augmentation.BackTranslationAugmenter)
charswap (textattack.augmentation.CharSwapAugmenter)
checklist (textattack.augmentation.CheckListAugmenter)
clare (textattack.augmentation.CLAREAugmenter)
eda (textattack.augmentation.EasyDataAugmenter)
embedding (textattack.augmentation.EmbeddingAugmenter)
wordnet (textattack.augmentation.WordNetAugmenter)
stderr => 2021-10-14 02:03:42.569883: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.6.15/x64/lib
2021-10-14 02:03:42.569927: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
I have been working on a fix; this seems to be a test issue rather than a bug in the code. (In reply to qiyanjun, Thu, Oct 14, 2021, 4:37 PM.)
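The assertion diff above marks only one line with "+": the new back_trans recipe now printed by textattack list augmentation-recipes. That suggests the expected sample file simply predates the new augmenter, so the likely fix is to add that entry to tests/sample_outputs/list_augmentation_recipes.txt (keeping whatever color codes the file already uses). The helper below is a hypothetical sketch, not part of the TextAttack test suite, for confirming that the added line is the only difference; it strips ANSI color codes first because the exact escape sequences in the sample file are not visible here.

```python
import difflib
import re

# Matches ANSI color sequences such as "\x1b[94m" and "\x1b[0m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    return ANSI_ESCAPE.sub("", text)


def lines_missing_from_sample(expected: str, actual: str) -> list:
    """Return lines printed by the command that the sample output file lacks."""
    diff = difflib.ndiff(
        strip_ansi(expected).strip().splitlines(),
        strip_ansi(actual).strip().splitlines(),
    )
    # ndiff prefixes lines that appear only in the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


# Recipe list as it appears in the failing assertion (color codes omitted).
expected = "\n".join([
    "charswap (textattack.augmentation.CharSwapAugmenter)",
    "checklist (textattack.augmentation.CheckListAugmenter)",
    "clare (textattack.augmentation.CLAREAugmenter)",
    "eda (textattack.augmentation.EasyDataAugmenter)",
    "embedding (textattack.augmentation.EmbeddingAugmenter)",
    "wordnet (textattack.augmentation.WordNetAugmenter)",
])
actual = "\x1b[94mback_trans\x1b[0m (textattack.augmentation.BackTranslationAugmenter)\n" + expected

print(lines_missing_from_sample(expected, actual))
# ['back_trans (textattack.augmentation.BackTranslationAugmenter)']
```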
I also tested these changes earlier and have been getting internal errors in pytest. Sometimes the test run also freezes partway through or slows down. I've attached the output of one run below. I'm not sure whether this behavior is normal.
@cogeid this looks ready to merge! Last question: why is the default lang for the translate function not "en"?
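The language parameter of the translate function is the target language, not the source language. The function assumes the "input" parameter is English text, translates it to Spanish, and then translates it back to English, so the default is Spanish rather than "en". (In reply to qiyanjun, Fri, Oct 15, 2021, 11:05 AM.)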
@cogeid Makes sense. Ready to merge!
What does this PR do?
Summary
This PR adds a back-translation transformation, which uses a MarianMT model to translate input text into another language and then back into the original language.
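To make the MarianMT round trip concrete, here is a minimal sketch using the Hugging Face transformers Marian classes directly. It is an illustration under assumptions, not the code added by this PR: the helper names are hypothetical, and the checkpoint choice (Helsinki-NLP/opus-mt-en-ROMANCE forward, Helsinki-NLP/opus-mt-ROMANCE-en backward) is assumed. Because the forward checkpoint covers several Romance languages, each input is prefixed with a >>xx<< token that selects the output language; the default "es" mirrors the thread's explanation that the target language, not "en", is the parameter.

```python
from functools import lru_cache
from typing import List, Optional

# Assumed MarianMT checkpoints on the Hugging Face Hub.
FORWARD_MODEL = "Helsinki-NLP/opus-mt-en-ROMANCE"
BACKWARD_MODEL = "Helsinki-NLP/opus-mt-ROMANCE-en"


def add_target_token(texts: List[str], target_lang: str) -> List[str]:
    """Prefix each sentence with the >>xx<< token that multilingual
    Marian models use to choose the output language."""
    return [f">>{target_lang}<< {text}" for text in texts]


@lru_cache(maxsize=None)
def load_marian(model_name: str):
    # Imported lazily so this module loads without transformers installed;
    # cached so each checkpoint is downloaded and built only once.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    return tokenizer, model


def marian_translate(texts: List[str], model_name: str,
                     target_lang: Optional[str] = None) -> List[str]:
    tokenizer, model = load_marian(model_name)
    if target_lang is not None:
        texts = add_target_token(texts, target_lang)
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def back_translate(texts: List[str], target_lang: str = "es") -> List[str]:
    """English -> target_lang -> English; target_lang is the intermediate language."""
    intermediate = marian_translate(texts, FORWARD_MODEL, target_lang)
    # The ROMANCE-en checkpoint always produces English, so no token is needed.
    return marian_translate(intermediate, BACKWARD_MODEL)
```

Calling back_translate(["What a great movie!"]) downloads both checkpoints on first use (MarianTokenizer also needs sentencepiece, and return_tensors="pt" needs PyTorch) and returns a paraphrased English sentence; the exact wording depends on the model, so no output is shown here.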
Additions
Here is a snippet of code demonstrating back-translation for text augmentation: