
Adding fine-tuning models to LUKE #18353

Merged: 5 commits into huggingface:main on Aug 1, 2022

Conversation

ikuyamada
Contributor

What does this PR do?

This pull request adds the following four fine-tuning models to LUKE:

  • LukeForMultipleChoice
  • LukeForQuestionAnswering
  • LukeForSequenceClassification
  • LukeForTokenClassification

LUKE was initially developed to solve entity-related NLP tasks; however, the model can also be used as a BERT-like pretrained model, and it has therefore frequently been used to solve common NLP tasks such as text classification and question answering.
This pull request aims to enable users to easily solve such NLP tasks using LUKE.

Following BERT and RoBERTa, this pull request also adds the classifier_dropout property to LukeConfig.
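As a rough sketch of the intended usage (the checkpoint name, num_labels, and classifier_dropout values below are illustrative assumptions, and the classification head is freshly initialized rather than fine-tuned), the new classes follow the usual task-specific API:

```python
import torch
from transformers import LukeTokenizer, LukeForSequenceClassification

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
# num_labels and classifier_dropout are illustrative values; classifier_dropout
# is the new config option added in this PR.
model = LukeForSequenceClassification.from_pretrained(
    "studio-ousia/luke-base", num_labels=2, classifier_dropout=0.1
)

inputs = tokenizer("LUKE also works as a BERT-like encoder.", return_tensors="pt")
labels = torch.tensor([1])

outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)
```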

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@NielsRogge

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 29, 2022

The documentation is not available anymore as the PR was closed or merged.

Contributor

@NielsRogge NielsRogge left a comment


Thanks for adding this! Looks good to me.

@NielsRogge NielsRogge requested a review from sgugger July 29, 2022 12:45
Collaborator

@sgugger sgugger left a comment


Thanks for adding those new task-specific models. Please look at the implementation of other such models in the library, as your implementation contains a couple of issues:

  • you force return_dict=True for the base model, but we have this flag because some optimization modules do not support dict outputs. The flag should be passed along, and the code should then be adapted to deal with base model outputs that are either tuples or ModelOutput.
  • when returning a tuple, only the non-None values should be kept

Then you are adding the models to the auto-mappings and using the standard examples in their docstrings, but this model expects a lot more inputs, so I'm not sure this is something we want. For instance, the masked-LM model was not added to the AutoModelForMaskedLM API in the original PR because its labels are not the same as those of other masked-LM models.

I've left comments for this on the SequenceClassification model, but they apply to all the models you are adding.
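For context, here is a minimal sketch of the return_dict handling and None-filtering convention that other task-specific models in the library follow (the head class below is hypothetical and is not the code added in this PR; the checkpoint and layer names are assumptions):

```python
from torch import nn
from transformers import LukeModel


class ExampleLukeClassificationHead(nn.Module):
    """Hypothetical head illustrating the two review points above."""

    def __init__(self, checkpoint="studio-ousia/luke-base", num_labels=2):
        super().__init__()
        self.luke = LukeModel.from_pretrained(checkpoint)
        self.dropout = nn.Dropout(self.luke.config.hidden_dropout_prob)
        self.classifier = nn.Linear(self.luke.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None, return_dict=None):
        # Pass return_dict through instead of forcing return_dict=True on the base model.
        if return_dict is None:
            return_dict = self.luke.config.use_return_dict
        outputs = self.luke(input_ids, attention_mask=attention_mask, return_dict=return_dict)

        # Index 0 is last_hidden_state whether outputs is a tuple or a ModelOutput.
        pooled = self.dropout(outputs[0][:, 0, :])
        logits = self.classifier(pooled)

        loss = None
        if labels is not None:
            loss = nn.CrossEntropyLoss()(logits, labels)

        if not return_dict:
            # When returning a tuple, keep only the non-None values.
            return tuple(v for v in (loss, logits) if v is not None)
        return {"loss": loss, "logits": logits}
```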

src/transformers/models/auto/modeling_auto.py

@@ -230,6 +231,7 @@
 ("led", "LEDForConditionalGeneration"),
 ("longformer", "LongformerForMaskedLM"),
 ("longt5", "LongT5ForConditionalGeneration"),
+("luke", "LukeForMaskedLM"),
Collaborator

This model wasn't added in the auto-API on purpose as it does not have the same API as the other ones (it expects different labels in particular).

@ikuyamada
Contributor Author

@sgugger Thank you so much for your detailed comments!

you force return_dict=True for the base model, but we have this flag because some optimization modules do not support dict outputs. The flag should be passed along, and the code should then be adapted to deal with base model outputs that are either tuples or ModelOutput.

I understand that we should not modify the return_dict value in the task-specific classes. However, the existing task-specific classes of LUKE (i.e., LukeForMaskedLM, LukeForEntityClassification, LukeForEntityPairClassification, and LukeForEntitySpanClassification) are also implemented by passing return_dict=True to the base class.

If we fix the code of these existing classes in this pull request, there is one issue: entity_last_hidden_state, which is used in the existing task-specific classes, is positioned after hidden_states and attentions in BaseLukeModelOutput and BaseLukeModelOutputWithPooling. Therefore, to identify the numerical index of entity_last_hidden_state in a tuple output, we would need code that checks whether output_attentions and output_hidden_states are activated. To deal with this, it might be a good idea to move entity_last_hidden_state before the optional hidden-states and attention fields, but that would break backward compatibility. I would appreciate any advice or tips on how to address this.

The new task-specific classes proposed in this pull request use only last_hidden_state and pooled_output, so the problem above arises only if we fix the existing task-specific classes.
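To make the indexing issue concrete, here is a small sketch (it assumes the tuple layout of BaseLukeModelOutputWithPooling with None entries dropped, i.e. last_hidden_state and pooler_output first, then the optional fields, then the entity fields; the helper function is hypothetical):

```python
# Assumed tuple layout when return_dict=False and None entries are dropped:
#   (last_hidden_state, pooler_output, [hidden_states], [attentions],
#    entity_last_hidden_state, [entity_hidden_states], [entity_attentions])
# Bracketed entries are present only when the corresponding flag is set, so the
# position of entity_last_hidden_state shifts with output_hidden_states/output_attentions.

def entity_last_hidden_state_index(output_hidden_states: bool, output_attentions: bool) -> int:
    index = 2  # last_hidden_state and pooler_output always come first
    if output_hidden_states:
        index += 1
    if output_attentions:
        index += 1
    return index
```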

This model uses a lot of new inputs. Are we sure the standard example which will be included in the docstring will actually work (e.g., are all those entity inputs completely optional)?

The entity input of LUKE is completely optional. The model can run with only word-based inputs without issues.
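For illustration, the two call patterns might look like this (the sentence and character spans mirror the standard LUKE documentation example; treat this as a sketch):

```python
from transformers import LukeTokenizer, LukeModel

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeModel.from_pretrained("studio-ousia/luke-base")

text = "Beyoncé lives in Los Angeles."

# Word-only input: entity inputs are omitted entirely.
outputs = model(**tokenizer(text, return_tensors="pt"))

# Optional entity input: character spans for "Beyoncé" and "Los Angeles".
entity_spans = [(0, 7), (17, 28)]
outputs_with_entities = model(**tokenizer(text, entity_spans=entity_spans, return_tensors="pt"))
```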

@sgugger
Collaborator

sgugger commented Aug 1, 2022

Ah, I hadn't caught that the base model was doing it, and I see why it's more practical with the entity hidden states. Let's leave it as is for now, and we will fix it in the future if a user opens an issue and actually needs the parts of PyTorch that require return_dict=False.

Can you just filter out the None values as asked? Then we should be good to merge.

@ikuyamada
Contributor Author

@sgugger Thanks! I've pushed the commit that filters out None entries from the return values when return_dict is set to False. Please let me know if there are any issues.

@sgugger sgugger merged commit 62098b9 into huggingface:main Aug 1, 2022
@sgugger
Collaborator

sgugger commented Aug 1, 2022

Thanks again!
