
Added sentence reordering transformation #48

Merged (17 commits) on Jul 5, 2021

Conversation

zijwang (Contributor) commented Jun 20, 2021

No description provided.

kaustubhdhole (Collaborator)

Hi @zijwang , thank you very much for your changes and your PR!

Sentence shuffling seems to be a great addition to NL-Augmenter, but I believe it can be made stronger. Currently it can easily change the meaning of the text when applied (especially in narration, news articles, stories, etc.). So I would like to suggest one change: at least substitute pronouns with their actual referents, so that cases like the one below can be handled:

John is a great person. He resides in Australia. Peter is a wonderful person. He resides in India.

There is a wonderful recent work on sentence decontextualisation that is very relevant here: it attempts to convert sentences into decontextualised counterparts that make sense without their surrounding context. https://arxiv.org/pdf/2102.05169.pdf

I would recommend

  1. either using a coref model (probably neural-coref https://github.com/huggingface/neuralcoref to keep consistency with spacy; others should also be fine)
  2. or using this sort of decontextualising model
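As a toy illustration of the pronoun-substitution idea (a minimal sketch: the coreference clusters here are hand-written for the example above, whereas a real implementation would obtain them from a coref model; the function and variable names are mine, not any library's API):

```python
# Toy sketch: substitute pronouns with their antecedents before shuffling,
# using precomputed coreference information (a real implementation would
# get this from a coref model such as neuralcoref or an AllenNLP predictor).

def resolve_pronouns(sentences, coref_map):
    """Replace each pronoun in sentence i with its antecedent.

    coref_map maps a sentence index to a list of (pronoun, antecedent)
    pairs; only the first occurrence of each pronoun is replaced.
    """
    resolved = []
    for i, sent in enumerate(sentences):
        for pronoun, antecedent in coref_map.get(i, []):
            sent = sent.replace(pronoun, antecedent, 1)
        resolved.append(sent)
    return resolved

sentences = [
    "John is a great person.",
    "He resides in Australia.",
    "Peter is a wonderful person.",
    "He resides in India.",
]
# Hand-written clusters, for illustration only.
coref_map = {1: [("He", "John")], 3: [("He", "Peter")]}
print(resolve_pronouns(sentences, coref_map))
# → ['John is a great person.', 'John resides in Australia.',
#    'Peter is a wonderful person.', 'Peter resides in India.']
```

After this substitution, shuffling the sentences no longer detaches the pronouns from their referents.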

The following is my overall review:

Correctness: All checks have passed - please pull "main" into your branch once, since we added a facility to allow multiple generations. There's a minor change in the syntax of the outputs.
Interface: I think you've chosen the correct interface.
Specificity: The change is not task-specific, but it is generic and useful.
Novelty: Sentence shuffling or any similar approach has not been implemented in NL-Augmenter so far.
Adding New Libraries: The new libraries need to be added to a requirements.txt inside the transformation's own folder.
Test Cases: 5 test cases have been added.
Evaluating Robustness: Robustness evaluation has been conducted.

Thanks again for your changes!

zijwang (Contributor, Author) commented Jun 27, 2021

Thanks for the review and suggestions, @kaustubhdhole !

Adding coreference resolution makes sense. I added NeuralCoref to the code and updated the README and the requirements file. Please let me know what you think.

Note that the NeuralCoref package seems to have compatibility issues with certain versions of SpaCy (see a few issues in NeuralCoref's repo). Following this github issue, I added the source code link to the requirements file, which hopefully resolves the problem.
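For reference, pip requirements files can point at a git repository directly, which is what such a "source code link" amounts to; an entry of roughly this shape (the exact ref/commit pin used in the PR is not shown here):

```text
# requirements.txt (illustrative): install neuralcoref from source
git+https://github.com/huggingface/neuralcoref.git
```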

zijwang (Contributor, Author) commented Jun 29, 2021

Hi @kaustubhdhole , I tried to add NeuralCoref to the dependency list, but it looks like the pipeline has failed multiple times. Could you take a look and let me know what the correct config is? Thanks!

kaustubhdhole (Collaborator) commented Jun 29, 2021

The pipeline seems to be failing because neuralcoref only works on spacy==2.1.0, from what I read. So we will have to skip neuralcoref until they start supporting later versions. Would you like to try the decontextualizing model instead, or any HF coref model?

https://colab.research.google.com/github/google-research/language/blob/master/language/decontext/decontextualization_demo.ipynb#scrollTo=KUGNUKuZ5Vky

They seem to be using tensorflow-text.

kaustubhdhole (Collaborator)

Adding @vgtomahawk here since he had used neuralcoref earlier.

zijwang (Contributor, Author) commented Jun 29, 2021

@kaustubhdhole Actually, I have got both NeuralCoref + SpaCy 2.2.x and NeuralCoref + SpaCy 2.3.x working locally. Could you check whether the pipeline has any special settings?

Also, I have seen the error Fatal Python error: Segmentation fault locally with the same setup as the pipeline. This could be related to memory/disk limits. Could you also take a look at that?

zijwang (Contributor, Author) commented Jul 3, 2021

Hey @kaustubhdhole and @vgtomahawk ! Just want to ping you again to see whether you have any idea about the cause of the pipeline error. Let me know if there is anything I can help with. Thanks!

kaustubhdhole (Collaborator)

Hi @zijwang

The error is due to the spacy and neuralcoref versions clashing. NeuralCoref seems to work only with spacy 2.1.0, and if we downgrade spacy, then checklist would not work. I would suggest using a different coref resolver than neuralcoref.

zijwang (Contributor, Author) commented Jul 3, 2021

Hi @kaustubhdhole ! Sure, I will look into alternatives and let you know once I have updated the code.

Meanwhile, as mentioned above, I did get neuralcoref working with spacy 2.2.x and 2.3.x locally on an Ubuntu machine. It does require compiling from source, though. Let me know if you know a way I could do that using easy_install, and I could also try that out.

zijwang (Contributor, Author) commented Jul 4, 2021

Hi @kaustubhdhole ! I updated the coref model to this one using AllenNLP. The pipeline seems to be working now. Could you take a look?
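Whichever coref backend ends up being used, the reordering step itself is simple and backend-independent; a minimal stdlib sketch of that step (function and variable names are mine, not the PR's actual code):

```python
import random

def reorder_sentences(sentences, seed=0):
    """Return a shuffled copy of the sentence list.

    A seeded Random instance makes the perturbation reproducible,
    which matters for repeatable robustness evaluations.
    """
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled

# Coref-resolved sentences survive reordering with their meaning intact.
doc = [
    "John is a great person.",
    "John resides in Australia.",
    "Peter is a wonderful person.",
    "Peter resides in India.",
]
print(" ".join(reorder_sentences(doc, seed=42)))
```

The full transformation would run coreference resolution (and sentence splitting) before this shuffle, as discussed above.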

kaustubhdhole (Collaborator) commented Jul 4, 2021

Thank you @zijwang for your continuous efforts! I think this PR is qualitatively in great shape! There are three minor changes, mostly to improve speed:

  1. To avoid conflicts with others' requirements.txt, you can add your requirements in a new file inside the "sentence_reordering" folder, like this.

  2. The loading of the coref model (predictor = Predictor.from_path..) and other heavy loading can be moved into the constructor, so that the transformation does not take up memory during import.

  3. The NLTK sentence splitter can be replaced with the spacy one, which is already present in NL-Augmenter.

This snippet has been taken from here.

```python
import spacy

nlp = spacy.load('en_core_web_sm')

text = 'My first birthday was great. My 2. was even better.'
sentences = [sent for sent in nlp(text).sents]
```
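Point 2 above (moving heavy loading into the constructor) follows a standard pattern; a stubbed sketch where the expensive model load is replaced by a placeholder (class and method names are illustrative, not the PR's actual code):

```python
class SentenceReordering:
    """Transformation whose heavy resources load in __init__, not at import.

    Importing the module stays cheap; the cost is paid only when the
    transformation object is actually constructed.
    """

    def __init__(self):
        # In the real transformation this would be something like
        # Predictor.from_path(...) and spacy.load(...); stubbed here.
        self.predictor = self._load_heavy_model()

    def _load_heavy_model(self):
        return object()  # stand-in for an expensive model load

    def generate(self, text):
        # Placeholder for coref resolution + sentence shuffling.
        return [text]

transformation = SentenceReordering()
print(transformation.generate("A simple example."))
```

Keeping model loads out of module scope means that merely importing the transformations package (e.g. to list available transformations) does not pull every model into memory.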

Also, you might want to update the README file, which still mentions "neuralcoref".

Thanks again for your efforts!

zijwang (Contributor, Author) commented Jul 4, 2021

Thanks for the quick reply @kaustubhdhole ! I have updated the code and README.

vgtomahawk self-requested a review July 4, 2021 23:15

vgtomahawk (Collaborator) left a comment


Hi @zijwang , thanks once again for going above and beyond to incorporate changes for this PR! I didn't notice @kaustubhdhole had tagged me here earlier, but I had faced similar issues with the neuralcoref lib and never managed to get it to work above that specific version of spacy - it's commendable that you even managed to pull that off locally.

I think it's nice that we finally did not take the decontextualization route (though it is a very pertinent thing to discuss in this context), since it may have enlarged the sentences and damaged coherence/naturalness. Though it would indeed be super-interesting future work to think about how we can use decontextualization to do some doc-level stuff [not necessarily shuffling].

I don't think I have much else to add, really.

My overall review:

Correctness: I think things seem good on this front.
Interface: I think the interface chosen here is apt and pertinent.
Specificity: The change is applicable to any task as long as the inputs are multi-sentence, i.e. non-trivial documents.

Novelty: It is novel w.r.t. NL-Augmenter. Even outside NL-Augmenter, I've rarely seen implementations that take care to fix coreference after doing the shuffling.

Btw, just FYI - I think this PR has already been heavy and I am not asking you or anyone to address this, but there are still certain phenomena other than coref that can go wrong after one does shuffling [as you might know]. One prominent one is ellipsis:

S: 1. All of the Ross family has red hair, except Henry. 2. Henry has blonde hair that is very curly. 3. Henry's father often teases Henry's mother about the mailman. 4. The mailman has blonde, curly hair, but he is very ugly. 5. His dad's teasing makes Henry feel bad.

S': 1. All of the Ross family has red hair, except Henry. 2. His dad's teasing about the mailman makes Henry feel very bad. 3. This is because the mailman has blonde, curly hair, but he is very ugly. 4. Henry also has blonde hair that is very curly. 5. Henry's father often teases Henry's mother about the mailman.

(Note that this example is not my own but from an arxiv preprint I know.)

You can see that "about the mailman", which was elided in the original, had to be added back in here. However, unfortunately, even basic ellipsis detection tools are poorly developed today, IMO.

Adding New Libraries: This has already been discussed in the previous review, nothing new to add here

Test Cases: 6 test cases are present, sufficient per the requirements.

Evaluating Robustness: Appropriate robustness evaluation has been carried out.

Thanks again, and sorry for not chiming into the discussion earlier!!

zijwang (Contributor, Author) commented Jul 5, 2021

Thanks a lot @vgtomahawk for the review! The ellipsis problem is indeed hard to resolve. I added a note on this point to the README.

It seems this task has already got 2 approvals. Is there anything I should do to merge it into master?

vgtomahawk merged commit fec76ce into GEM-benchmark:main on Jul 5, 2021
vgtomahawk (Collaborator) commented Jul 5, 2021

> Thanks a lot @vgtomahawk for the review! The ellipsis problem is indeed hard to resolve. I added a note on this point to the README.

I think building a complete sentence shuffler that addresses all such doc-level phenomena would be an amazing open problem to work on - I basically know just these two [ellipsis/coref], but there are many more. Thanks for updating the limitations!!

> It seems this task has already got 2 approvals. Is there anything I should do to merge it into master?

Done! Merged!

zijwang (Contributor, Author) commented Jul 10, 2021

FYI @vgtomahawk there is a new ACL paper on a very similar topic: https://arxiv.org/pdf/2107.03448.pdf

kaustubhdhole (Collaborator)

Thank you very much for your changes. Your changes are merged. As NLP has largely focussed on high-resource languages and domains, we also encourage you, if possible, to add a transformation in your mother tongue and/or a low-resource language, which might need more attention from the NLP community. This is not mandatory.
