Back translation transformation #534

Merged: 7 commits, Oct 15, 2021

Contributor

@cogeid cogeid commented Oct 1, 2021

What does this PR do?

Summary

This PR adds a back-translation transformation, which uses a MarianMT model to translate the input text into a target language and back.

Additions

  • Added a sentence-level transformation module within the transformations module
  • Added the back-translation transformation

Here is a snippet of code demonstrating back-translation for text augmentation:
[Screenshot: Screen Shot 2021-10-01 at 5 32 31 AM]
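
A minimal sketch of the round trip described above, not the code from this PR. It assumes the Hugging Face `transformers` MarianMT classes; the helper names `load_marian_translator` and `back_translate`, and the checkpoint names in the usage comment, are illustrative assumptions rather than what the PR necessarily uses.

```python
from typing import Callable, List

# A translator maps a batch of strings to a batch of translated strings.
Translator = Callable[[List[str]], List[str]]


def load_marian_translator(model_name: str) -> Translator:
    """Build a translator from a MarianMT checkpoint (weights download on first use)."""
    # Imported lazily so the round-trip logic below can be used without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(texts: List[str]) -> List[str]:
        batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**batch)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate


def back_translate(texts: List[str], to_pivot: Translator, to_source: Translator) -> List[str]:
    """Translate to a pivot language and back, yielding paraphrases of the inputs."""
    return to_source(to_pivot(texts))


# Usage (assumed checkpoint names; requires network access and the transformers package):
# en_to_fr = load_marian_translator("Helsinki-NLP/opus-mt-en-fr")
# fr_to_en = load_marian_translator("Helsinki-NLP/opus-mt-fr-en")
# print(back_translate(["The weather is nice today."], en_to_fr, fr_to_en))
```

Passing the two translators in as callables keeps the round-trip logic separate from model loading, so it can be tried with any pair of translation functions.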

@cogeid cogeid requested a review from Hanyu-Liu-123 October 1, 2021 09:38
Collaborator

@Hanyu-Liu-123 Hanyu-Liu-123 left a comment

Looks good to me! I've also added an option called chained_back_translation that allows users to repeatedly translate between the target language and source language given a language model. @cogeid
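
As a rough illustration of the chaining idea in the comment above (repeated source → target → source passes), here is a small sketch. The function name `chained_back_translate` and the `rounds` parameter are hypothetical, and the actual `chained_back_translation` option in this PR may behave differently.

```python
from typing import Callable, List

# A translator maps a batch of strings to a batch of translated strings.
Translator = Callable[[List[str]], List[str]]


def chained_back_translate(
    texts: List[str],
    to_target: Translator,
    to_source: Translator,
    rounds: int = 1,
) -> List[str]:
    """Apply source -> target -> source translation `rounds` times in a row."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    for _ in range(rounds):
        # Each pass feeds the previous round's output back through both translators.
        texts = to_source(to_target(texts))
    return texts
```

Each extra round gives the translators another chance to rephrase the text, at the cost of drifting further from the original wording.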

@qiyanjun
Member

@cogeid can you check why some tests failed?

@qiyanjun
Member

@cogeid

=================================== FAILURES ===================================
_ test_command_line_list[list_augmentation_recipes-textattack list augmentation-recipes-tests/sample_outputs/list_augmentation_recipes.txt] _

name = 'list_augmentation_recipes'
command = 'textattack list augmentation-recipes'
sample_output_file = 'tests/sample_outputs/list_augmentation_recipes.txt'

    @pytest.mark.parametrize("name, command, sample_output_file", list_test_params)
    def test_command_line_list(name, command, sample_output_file):
        desired_text = open(sample_output_file).read().strip()

        # Run command and validate outputs.
        result = run_command_and_get_result(command)

        assert result.stdout is not None
        assert result.stderr is not None

        stdout = result.stdout.decode().strip()
        print("stdout =>", stdout)
        stderr = result.stderr.decode().strip()
        print("stderr =>", stderr)

>       assert stdout == desired_text
E       AssertionError: assert '\x1b[94mback...NetAugmenter)' == '\x1b[94mchar...NetAugmenter)'
E         + back_trans (textattack.augmentation.BackTranslationAugmenter)
E           charswap (textattack.augmentation.CharSwapAugmenter)
E           checklist (textattack.augmentation.CheckListAugmenter)
E           clare (textattack.augmentation.CLAREAugmenter)
E           eda (textattack.augmentation.EasyDataAugmenter)
E           embedding (textattack.augmentation.EmbeddingAugmenter)
E           wordnet (textattack.augmentation.WordNetAugmenter)

tests/test_command_line/test_list.py:28: AssertionError
----------------------------- Captured stdout call -----------------------------
stdout => back_trans (textattack.augmentation.BackTranslationAugmenter)
charswap (textattack.augmentation.CharSwapAugmenter)
checklist (textattack.augmentation.CheckListAugmenter)
clare (textattack.augmentation.CLAREAugmenter)
eda (textattack.augmentation.EasyDataAugmenter)
embedding (textattack.augmentation.EmbeddingAugmenter)
wordnet (textattack.augmentation.WordNetAugmenter)
stderr => 2021-10-14 02:03:42.569883: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.6.15/x64/lib
2021-10-14 02:03:42.569927: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
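
Reading the log above: the captured stdout starts with the `back_trans` line, while the desired text starts with `charswap`, so the command now lists the new recipe and tests/sample_outputs/list_augmentation_recipes.txt presumably does not include it yet. A small sketch for surfacing such lines follows; the helper `unexpected_lines` is hypothetical and not part of TextAttack, and it strips ANSI color codes (such as the `\x1b[94m` seen in the assertion) before comparing.

```python
import difflib
import re

# Matches ANSI color sequences such as "\x1b[94m" and "\x1b[0m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def _clean(text):
    """Split text into lines with color codes and surrounding whitespace removed."""
    return [ANSI_ESCAPE.sub("", line).strip() for line in text.strip().splitlines()]


def unexpected_lines(actual, expected):
    """Return lines that appear in `actual` but not in `expected`."""
    diff = difflib.ndiff(_clean(expected), _clean(actual))
    # ndiff prefixes lines unique to its second argument with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]
```

Called with the command's stdout as `actual` and the contents of the sample output file as `expected`, it would return the lines that likely need to be added to the sample file.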

@cogeid
Contributor Author

cogeid commented Oct 14, 2021 via email

@donggrant

donggrant commented Oct 14, 2021

I also tried testing these changes earlier and have been getting internal errors in PyTest. Sometimes the test also freezes in the middle or slows down. I've attached the output of one run below. Not sure if this behavior is normal.

(textattackenv) root@DG9XW4Q2:~/cs_research/TextAttack# make test
python -m pytest --dist=loadfile -n auto
============================================================================== test session starts ===============================================================================
platform linux -- Python 3.7.11, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /root/cs_research/TextAttack, configfile: pytest.ini, testpaths: tests
plugins: xdist-2.4.0, forked-1.3.0
gw0 [58] / gw1 [58] / gw2 [58] / gw3 [58]
.............................[gw1] node down: Not properly terminated
F
replacing crashed worker gw1
gw0 [58] / gw4 ok / gw2 [58] / gw3 [58].[gw2] node down: Not properly terminated

replacing crashed worker gw2
gw0 [58] / gw4 [58] / gw5 [58] / gw3 [58][gw4] node down: Not properly terminated
F
replacing crashed worker gw4
gw0 [58] / gw6 [58] / gw5 [58] / gw3 [58][gw5] node down: Not properly terminated
F
replacing crashed worker gw5
gw0 [58] / gw6 [58] / gw7 [58] / gw3 [58]F...........

@qiyanjun
Member

@cogeid this looks ready for merge!

Last question: why is the default lang for the translate function not "en"?

@cogeid
Contributor Author

cogeid commented Oct 15, 2021 via email

@qiyanjun
Member

@cogeid makes sense.. ready to merge!

@qiyanjun qiyanjun merged commit c5c10f5 into master Oct 15, 2021
@jxmorris12 jxmorris12 deleted the back-translation-transformation branch October 17, 2021 16:35
@agver0 agver0 mentioned this pull request Nov 5, 2021