Improved Tars Context #3063

Merged
merged 2 commits into from
Jan 27, 2023

helpmefindaname
Member

This PR fixes #3024 and, while doing so, adds the following changes:

  • TARS models (classification & NER) no longer fail on labels that consist of only 1 character.
  • Models that predict on modified sentences (TARSClassifier, TARSTagger and RelationClassifier) now also make use of the context and can therefore be trained and tested in a FLERT manner.
  • Empty sentences are no longer discarded from the context chain. Since they are empty, they do not change the context.
  • Context can be disabled by setting sentence._has_context = True without setting sentence._next_sentence or sentence._previous_sentence. That way, batch prediction on context-independent sentences will work.
  • Context is considered to be set if only "_next_sentence" is set, so the first sentence of a document won't be reset again.
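To make the points about empty sentences and disabled context more concrete, here is a minimal sketch using a toy stand-in class. It is not flair's actual Sentence implementation. Only the attribute names _previous_sentence, _next_sentence and _has_context are taken from the list above; ToySentence, link and the body of left_context are assumptions made for illustration.

```python
from typing import List, Optional


class ToySentence:
    """Hypothetical stand-in for flair.data.Sentence (illustration only)."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self._previous_sentence: Optional["ToySentence"] = None
        self._next_sentence: Optional["ToySentence"] = None
        self._has_context = False

    def left_context(self, max_tokens: int) -> List[str]:
        """Collect up to max_tokens tokens from the preceding sentences."""
        collected: List[str] = []
        sentence = self._previous_sentence
        while sentence is not None and len(collected) < max_tokens:
            # An empty sentence adds no tokens, but the walk continues past it.
            collected = sentence.tokens + collected
            sentence = sentence._previous_sentence
        return collected[-max_tokens:] if max_tokens > 0 else []


def link(sentences: List[ToySentence]) -> None:
    """Hypothetical helper: chain the sentences of one document together."""
    for previous, following in zip(sentences, sentences[1:]):
        previous._next_sentence = following
        following._previous_sentence = previous
    for sentence in sentences:
        sentence._has_context = True


document = [
    ToySentence("to boycott British lamb .".split()),
    ToySentence([]),  # the empty sentence stays in the chain
    ToySentence("Peter Blackburn".split()),
]
link(document)
print(document[2].left_context(3))

# Context disabled: the flag is set, but no neighbours are attached.
standalone = ToySentence("Peter Blackburn".split())
standalone._has_context = True
print(standalone.left_context(5))
```

In this toy version, the empty sentence in the middle neither breaks the chain nor contributes tokens, and the standalone sentence yields no context at all because it has no neighbours.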

@alanakbik
Collaborator

@helpmefindaname thanks for improving this!

The TARS context expansion only partly works, however. The problem is that TransformerBaseEmbeddings' __expand_sentence_with_context is called after the TARS sentence is created. As a result, the context is placed around the sentence plus the TARS label, rather than around the sentence alone.

To illustrate, run this script:

# imports for the TARS sequence tagger example
from flair.data import Sentence, Dictionary
from flair.datasets import CONLL_03
from flair.embeddings import TransformerWordEmbeddings
from flair.models import FewshotClassifier, TARSTagger

# quick init of corpus and label dictionary
corpus = CONLL_03(in_memory=False)
label_dictionary = Dictionary()
label_dictionary.add_item('LOC')

# init transformer embeddings with context
embeddings: TransformerWordEmbeddings = TransformerWordEmbeddings('distilbert-base-uncased', use_context=5)

# init few-shot tagger
tart_tagger: FewshotClassifier = TARSTagger(
    embeddings=embeddings,
    num_negative_labels_to_sample=1,
    prefix=True,
    task_name='ner',
    label_type='ner',
    label_dictionary=label_dictionary,
)

# get tars formatted sentence
sentence = corpus.train[1]
print("\n - This is the original sentence:")
print(sentence.text)

tars_sentence: Sentence = tart_tagger._get_tars_formatted_sentence('LOC', sentence)

print("\n - This is the TARS sentence:")
print(tars_sentence.text)

# expand sentence as done in def __expand_sentence_with_context(self, sentence)
left_context = tars_sentence.left_context(5)
right_context = tars_sentence.right_context(5)
expanded_sentence = left_context + tars_sentence.tokens + right_context

print("\n - This is the expanded TARS sentence:")
print(" ".join(token.text for token in expanded_sentence))

This will print:

 - This is the original sentence:
Peter Blackburn

 - This is the TARS sentence:
LOC [SEP] Peter Blackburn

 - This is the expanded TARS sentence:
to boycott British lamb . LOC [SEP] Peter Blackburn BRUSSELS 1996-08-22 The European Commission

However, this expanded TARS sentence is suboptimal, because the left context ends up in front of the TARS label. It should be

  • LOC [SEP] to boycott British lamb . Peter Blackburn BRUSSELS 1996-08-22 The European Commission, instead of
  • to boycott British lamb . LOC [SEP] Peter Blackburn BRUSSELS 1996-08-22 The European Commission
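The difference between the two orderings can be sketched on plain token lists, without flair. The function names tars_then_context and context_then_tars are hypothetical and only illustrate the order of operations; they do not correspond to any flair API.

```python
from typing import List


def tars_then_context(label: str, sentence: List[str],
                      left: List[str], right: List[str]) -> List[str]:
    """Prefix the label first, then wrap context around the result."""
    tars_sentence = [label, "[SEP]"] + sentence
    return left + tars_sentence + right


def context_then_tars(label: str, sentence: List[str],
                      left: List[str], right: List[str]) -> List[str]:
    """Wrap context around the sentence first, then prefix the label."""
    expanded = left + sentence + right
    return [label, "[SEP]"] + expanded


sentence = "Peter Blackburn".split()
left = "to boycott British lamb .".split()
right = "BRUSSELS 1996-08-22 The European Commission".split()

print(" ".join(tars_then_context("LOC", sentence, left, right)))
print(" ".join(context_then_tars("LOC", sentence, left, right)))
```

The first call reproduces the ordering printed by the script, the second the ordering that keeps the label at the very front of the input.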

I am not sure what the best solution here could be. One idea that comes to mind is to separate the context from the main text with separator tokens, i.e. something like:

  • to boycott British lamb . [CONTEXT] LOC [SEP] Peter Blackburn [CONTEXT] BRUSSELS 1996-08-22 The European, or
  • to boycott British lamb . [SEP] LOC [SEP] Peter Blackburn [SEP] BRUSSELS 1996-08-22 The European ...
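As a rough sketch of the separator idea on plain token lists: the helper name mark_context is hypothetical, and "[CONTEXT]" is an illustrative marker that would presumably have to be made known to the tokenizer before it could be used this way, which is an assumption rather than something this PR implements.

```python
from typing import List


def mark_context(label: str, sentence: List[str],
                 left: List[str], right: List[str],
                 marker: str = "[CONTEXT]") -> List[str]:
    """Keep the label next to the sentence and fence off the context."""
    tars_sentence = [label, "[SEP]"] + sentence
    return left + [marker] + tars_sentence + [marker] + right


sentence = "Peter Blackburn".split()
left = "to boycott British lamb .".split()
right = "BRUSSELS 1996-08-22 The European Commission".split()

# Variant with a dedicated context marker
print(" ".join(mark_context("LOC", sentence, left, right)))
# Variant that reuses [SEP] as the marker
print(" ".join(mark_context("LOC", sentence, left, right, marker="[SEP]")))
```

Either way, the label stays adjacent to the sentence it refers to, and the model gets an explicit signal for where the context begins and ends.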

What do you think?

@alanakbik
Collaborator

@helpmefindaname thanks for adding this! I'll merge this now and may fix the context issue later in a separate PR.

@alanakbik merged commit 2f017ea into master on Jan 27, 2023
@alanakbik deleted the fix_tars_context branch on January 27, 2023 12:48