Adding fine-tuning models to LUKE #18353
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks for adding this! Looks good to me.
Thanks for adding those new task-specific models. Please look at the implementation of other such models in the library, as your implementation contains a couple of issues:
- You force return_dict=True for the base model, but we have this flag because some optimization modules do not support dict outputs. The flag should be passed along, and the code should then be adapted to deal with base model outputs that are either tuples or ModelOutput (see the sketch below).
- When returning a tuple, only the non-None values should be kept.
Then you are adding the models to the auto-mapping and using the standard examples in their docstrings, but this model expects a lot more inputs, so I'm not sure this is something we want. For instance, the masked-lm model was not added to the AutoModelForMaskedLM API in the original PR because its labels are not the same as other masked LM models.
I've left comments for this on the SequenceClassification model, but they apply to all the models you are adding.
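For illustration only, here is a minimal sketch of the pattern described above; it is not the code from this PR. The helper name head_forward, the classifier call, and the use of the first output position are assumptions made for the example. The point is that return_dict is passed along to the base model and None entries are dropped from a tuple return.

```python
import torch.nn.functional as F


def head_forward(base_model, classifier, input_ids, attention_mask=None,
                 labels=None, return_dict=True):
    # Pass the flag along instead of forcing return_dict=True on the base model.
    outputs = base_model(input_ids, attention_mask=attention_mask, return_dict=return_dict)

    # Integer indexing works whether `outputs` is a ModelOutput or a plain tuple.
    sequence_output = outputs[0]
    logits = classifier(sequence_output[:, 0, :])

    loss = None
    if labels is not None:
        loss = F.cross_entropy(logits, labels)

    if not return_dict:
        # When returning a tuple, keep only the non-None values.
        return tuple(v for v in (loss, logits) + tuple(outputs[1:]) if v is not None)
    # The real models return a ModelOutput subclass here; a dict stands in for the sketch.
    return {"loss": loss, "logits": logits}
```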
@@ -230,6 +231,7 @@
("led", "LEDForConditionalGeneration"),
("longformer", "LongformerForMaskedLM"),
("longt5", "LongT5ForConditionalGeneration"),
("luke", "LukeForMaskedLM"),
This model wasn't added in the auto-API on purpose as it does not have the same API as the other ones (it expects different labels in particular).
@sgugger Thank you so much for your detailed comments!
I understand that we should not force return_dict=True and should instead pass the flag along; the existing LUKE task-specific classes behave the same way. If we fix the code of these existing classes in this pull request, there is one issue: the base LukeModel itself forces return_dict=True, since its output also contains the entity hidden states. The new task-specific classes proposed in this pull request use only the word hidden states.
The entity input of LUKE is completely optional. The model can run with only word-based inputs without issues.
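As a small sanity check of that point (assuming the public studio-ousia/luke-base checkpoint), the base model runs on word inputs alone, with no entities or entity_spans passed to the tokenizer:

```python
from transformers import LukeModel, LukeTokenizer

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeModel.from_pretrained("studio-ousia/luke-base")

# Word-based inputs only: no entities or entity_spans are provided.
inputs = tokenizer("LUKE can run on plain text as well.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```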
Ah, I hadn't caught that the base model was doing it, and I see why it's more practical with the entity hidden states. Let's leave it as is for now then, and we will fix that in the future if a user opens an issue and actually needs the parts of PyTorch that do not support dict outputs. Can you just filter out the non-None values as asked? Then we should be good to merge.
@sgugger Thanks! I've pushed a commit that filters out None entries from the return values when return_dict=False.
Thanks again!
What does this PR do?
This pull request adds the following four fine-tuning models to LUKE:
LukeForMultipleChoice
LukeForQuestionAnswering
LukeForSequenceClassification
LukeForTokenClassification
LUKE was initially developed to solve entity-related NLP tasks; however, the model can also be used as a BERT-like pretrained model and has therefore frequently been used to solve common NLP tasks such as text classification and question answering.
This pull request aims to enable users to easily solve such NLP tasks using LUKE.
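As a rough sketch of the intended usage (a hypothetical example, not the docstring added in the PR; the checkpoint name and label count are assumptions), one of the new heads could be used like any other sequence-classification model:

```python
import torch
from transformers import LukeForSequenceClassification, LukeTokenizer

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeForSequenceClassification.from_pretrained("studio-ousia/luke-base", num_labels=2)

inputs = tokenizer("This movie was great!", return_tensors="pt")
labels = torch.tensor([1])

outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)  # scalar loss, logits of shape (1, 2)
```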
Following BERT and RoBERTa, this pull request also adds the classifier_dropout property to LukeConfig.
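A minimal sketch, assuming the property is exposed on LukeConfig exactly as described (the value 0.2 is arbitrary):

```python
from transformers import LukeConfig

# If classifier_dropout is unset, the heads typically fall back to
# hidden_dropout_prob, mirroring the convention in BertConfig / RobertaConfig.
config = LukeConfig(classifier_dropout=0.2)
print(config.classifier_dropout)  # 0.2
```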
Who can review?
@NielsRogge