
Adding fine-tuning models to LUKE #18353

Merged: 5 commits into huggingface:main on Aug 1, 2022

Conversation

ikuyamada
Contributor

What does this PR do?

This pull request adds the following four fine-tuning models to LUKE:

  • LukeForMultipleChoice
  • LukeForQuestionAnswering
  • LukeForSequenceClassification
  • LukeForTokenClassification

LUKE was initially developed to solve entity-related NLP tasks; however, the model can also be used as a BERT-like pretrained model, and it has therefore frequently been used to solve common NLP tasks such as text classification and question answering.
This pull request aims to enable users to easily solve such NLP tasks using LUKE.

Following BERT and RoBERTa, this pull request also adds the classifier_dropout property to LukeConfig.
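As a rough sketch of the intended usage (the checkpoint name, num_labels, and classifier_dropout values below are illustrative assumptions, and the classification head is freshly initialized rather than fine-tuned), the new classes follow the usual task-specific API:

```python
import torch
from transformers import LukeTokenizer, LukeForSequenceClassification

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
# num_labels and classifier_dropout are illustrative values; classifier_dropout
# is the new config option added in this PR.
model = LukeForSequenceClassification.from_pretrained(
    "studio-ousia/luke-base", num_labels=2, classifier_dropout=0.1
)

inputs = tokenizer("LUKE also works as a BERT-like encoder.", return_tensors="pt")
labels = torch.tensor([1])

outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)
```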

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@NielsRogge

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 29, 2022

The documentation is not available anymore as the PR was closed or merged.

Contributor

@NielsRogge NielsRogge left a comment


Thanks for adding this! Looks good to me.

@NielsRogge NielsRogge requested a review from sgugger July 29, 2022 12:45
Collaborator

@sgugger sgugger left a comment


Thanks for adding those new task-specific models. Please look at the implementation of other such models in the library, as your implementation contains a couple of issues:

  • you force return_dict=True for the base model, but we have this flag because some optimization modules do not support dict outputs. The flag should be passed along, and the code should then be adapted to deal with base model outputs that are either tuples or ModelOutput.
  • when returning a tuple, only the non-None values should be kept

Then you are adding the models to the auto-mappings and using the standard examples in their docstrings, but this model expects a lot more inputs, so I'm not sure this is something we want. For instance, the masked-LM model was not added to the AutoModelForMaskedLM API in the original PR because its labels are not the same as those of other masked-LM models.

I've left comments for this on the SequenceClassification model, but they apply to all the models you are adding.
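For context, here is a minimal sketch of the return_dict handling and None-filtering convention that other task-specific models in the library follow (the head class below is hypothetical and is not the code added in this PR; the checkpoint and layer names are assumptions):

```python
from torch import nn
from transformers import LukeModel


class ExampleLukeClassificationHead(nn.Module):
    """Hypothetical head illustrating the two review points above."""

    def __init__(self, checkpoint="studio-ousia/luke-base", num_labels=2):
        super().__init__()
        self.luke = LukeModel.from_pretrained(checkpoint)
        self.dropout = nn.Dropout(self.luke.config.hidden_dropout_prob)
        self.classifier = nn.Linear(self.luke.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None, return_dict=None):
        # Pass return_dict through instead of forcing return_dict=True on the base model.
        if return_dict is None:
            return_dict = self.luke.config.use_return_dict
        outputs = self.luke(input_ids, attention_mask=attention_mask, return_dict=return_dict)

        # Index 0 is last_hidden_state whether outputs is a tuple or a ModelOutput.
        pooled = self.dropout(outputs[0][:, 0, :])
        logits = self.classifier(pooled)

        loss = None
        if labels is not None:
            loss = nn.CrossEntropyLoss()(logits, labels)

        if not return_dict:
            # When returning a tuple, keep only the non-None values.
            return tuple(v for v in (loss, logits) if v is not None)
        return {"loss": loss, "logits": logits}
```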

src/transformers/models/auto/modeling_auto.py

@@ -230,6 +231,7 @@
 ("led", "LEDForConditionalGeneration"),
 ("longformer", "LongformerForMaskedLM"),
 ("longt5", "LongT5ForConditionalGeneration"),
+("luke", "LukeForMaskedLM"),
Collaborator

This model wasn't added in the auto-API on purpose as it does not have the same API as the other ones (it expects different labels in particular).

@ikuyamada
Contributor Author

@sgugger Thank you so much for your detailed comments!

you force return_dict=True for the base model, but we have this flag because some optimization modules do not support dict outputs. The flag should be passed along, and the code should then be adapted to deal with base model outputs that are either tuples or ModelOutput.

I understand that we should not modify the return_dict value in the task-specific classes. However, the existing task-specific classes of LUKE (i.e., LukeForMaskedLM, LukeForEntityClassification, LukeForEntityPairClassification, and LukeForEntitySpanClassification) are also implemented by passing return_dict=True to the base class.

If we fix the code of these existing classes in this pull request, there is one issue: entity_last_hidden_state, which is used in the existing task-specific classes, is positioned after hidden_states and attentions in BaseLukeModelOutput and BaseLukeModelOutputWithPooling. Therefore, to identify the numerical index of entity_last_hidden_state in a tuple output, we would need code that checks whether output_attentions and output_hidden_states are activated. To deal with this, it might be a good idea to move entity_last_hidden_state before the optional hidden-states and attention fields, but that would break backward compatibility. I would appreciate any advice or tips on how to address this.

The new task-specific classes proposed in this pull request use only last_hidden_state and pooled_output, so the problem above arises only if we fix the existing task-specific classes.
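To make the indexing issue concrete, here is a small sketch (it assumes the tuple layout of BaseLukeModelOutputWithPooling with None entries dropped, i.e. last_hidden_state and pooler_output first, then the optional fields, then the entity fields; the helper function is hypothetical):

```python
# Assumed tuple layout when return_dict=False and None entries are dropped:
#   (last_hidden_state, pooler_output, [hidden_states], [attentions],
#    entity_last_hidden_state, [entity_hidden_states], [entity_attentions])
# Bracketed entries are present only when the corresponding flag is set, so the
# position of entity_last_hidden_state shifts with output_hidden_states/output_attentions.

def entity_last_hidden_state_index(output_hidden_states: bool, output_attentions: bool) -> int:
    index = 2  # last_hidden_state and pooler_output always come first
    if output_hidden_states:
        index += 1
    if output_attentions:
        index += 1
    return index
```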

This model uses a lot of new inputs. Are we sure the standard example which will be included in the docstring will actually work (e.g., are all those entity inputs completely optional)?

The entity input of LUKE is completely optional. The model can run with only word-based inputs without issues.
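For illustration, the two call patterns might look like this (the sentence and character spans mirror the standard LUKE documentation example; treat this as a sketch):

```python
from transformers import LukeTokenizer, LukeModel

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeModel.from_pretrained("studio-ousia/luke-base")

text = "Beyoncé lives in Los Angeles."

# Word-only input: entity inputs are omitted entirely.
outputs = model(**tokenizer(text, return_tensors="pt"))

# Optional entity input: character spans for "Beyoncé" and "Los Angeles".
entity_spans = [(0, 7), (17, 28)]
outputs_with_entities = model(**tokenizer(text, entity_spans=entity_spans, return_tensors="pt"))
```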

@sgugger
Collaborator

sgugger commented Aug 1, 2022

Ah, I hadn't caught that the base model was doing it, and I see why it's more practical with the entity hidden states. Let's leave it as is for now, and we will fix it in the future if a user opens an issue and actually needs the parts of PyTorch that require return_dict=False.

Can you just filter out the None values as asked? Then we should be good to merge.

@ikuyamada
Contributor Author

@sgugger Thanks! I've pushed the commit that filters out None entries from the return values when return_dict is set to False. Please let me know if there are any issues.

@sgugger sgugger merged commit 62098b9 into huggingface:main Aug 1, 2022
@sgugger
Collaborator

sgugger commented Aug 1, 2022

Thanks again!
